Python code for Tokenization

In natural language processing, it is often necessary to parse sentences and analyze them, and tokenization is a key step in doing so. Python's string method split() divides the given text into tokens based on a given delimiter (separator); when no delimiter is given, it splits on whitespace.

The following code splits the given text and generates a list of tokens.

if __name__ == '__main__':

    # No separator: split on any run of whitespace
    text = 'This is a text for testing tokenization'
    tokens = text.split()
    print(tokens)

    # Explicit single-space separator
    tokens = text.split(' ')
    print(tokens)

    # Incorrect separator: '|' does not occur in the text
    tokens = text.split('|')
    print(tokens)

    # With more than one space between the words
    text = 'This  is  a  text  for  testing  tokenization'
    tokens = text.split()
    print(tokens)

    # A single-space separator leaves empty strings between the words
    tokens = text.split(' ')
    print(tokens)

    # A two-space separator matches the spacing in the text
    tokens = text.split('  ')
    print(tokens)

    # Comma-separated text
    text = 'This,is,a,text,for,testing,tokenization'
    tokens = text.split(',')
    print(tokens)

Running the above program produces the following output.

['This', 'is', 'a', 'text', 'for', 'testing', 'tokenization']
['This', 'is', 'a', 'text', 'for', 'testing', 'tokenization']
['This is a text for testing tokenization']
['This', 'is', 'a', 'text', 'for', 'testing', 'tokenization']
['This', '', 'is', '', 'a', '', 'text', '', 'for', '', 'testing', '', 'tokenization']
['This', 'is', 'a', 'text', 'for', 'testing', 'tokenization']
['This', 'is', 'a', 'text', 'for', 'testing', 'tokenization']
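
The empty strings in the fifth output line appear because a single-space separator treats every space as a boundary, so two consecutive spaces enclose an empty token. As a minimal sketch, assuming the empty tokens are not wanted, they can be filtered out after splitting. The helper name split_nonempty is hypothetical and not part of the program above.

```python
def split_nonempty(text, separator):
    """Split text on separator and drop empty tokens (hypothetical helper)."""
    return [token for token in text.split(separator) if token]


# Two spaces between the words, as in the multi-space case above
text = 'This  is  a  text  for  testing  tokenization'
tokens = split_nonempty(text, ' ')
print(tokens)
```

For whitespace alone, calling split() without a separator already has this effect; the filter is mainly useful for other separators such as commas.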

How to read an Excel file in Python

The following Python program reads data from a spreadsheet.

'''
Created on 15-Mar-2013

@author: Robin
'''
from xlrd import open_workbook

def readWorksheet(sheetIO):
    # Read row by row
    for rownum in range(sheetIO.nrows):
        rowValues = sheetIO.row_values(rownum)
        # First column holds the roll number, second column the name
        rollNo = rowValues[0]
        name = rowValues[1]
        print(rollNo, name)

if __name__ == '__main__':
    # Open the workbook and select the worksheet by its name
    bookTestData = open_workbook("C:/pythontests/sample-spreadsheet.xls")
    sheetIO = bookTestData.sheet_by_name('Sheet1')
    readWorksheet(sheetIO)

In the above code, the xlrd package is used to access the workbook. Note that the Excel file should have the .xls extension.

In the above example, the name of the Excel file is sample-spreadsheet.xls. The name of the worksheet is Sheet1, and it has two columns, rollNo and name.
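
The row-by-row loop can also collect the values instead of printing them. The sketch below assumes the two-column layout described above with no header row, and relies only on the nrows attribute and the row_values method used in the program. FakeSheet and collect_records are hypothetical names, and FakeSheet is a stand-in so the example runs without a workbook; with xlrd, the sheet returned by sheet_by_name would be passed instead. The int conversion assumes roll numbers are whole numbers, since xlrd returns numeric cells as floats.

```python
class FakeSheet(object):
    """Hypothetical stand-in for an xlrd sheet, for illustration only."""

    def __init__(self, rows):
        self._rows = rows
        self.nrows = len(rows)

    def row_values(self, rownum):
        return self._rows[rownum]


def collect_records(sheet):
    """Return a list of (rollNo, name) tuples, one per row."""
    records = []
    for rownum in range(sheet.nrows):
        row_values = sheet.row_values(rownum)
        # Assumption: roll numbers are whole numbers stored as numeric cells
        roll_no = int(row_values[0])
        name = row_values[1]
        records.append((roll_no, name))
    return records


# Made-up sample rows in the rollNo, name layout
sheet = FakeSheet([[1.0, 'Asha'], [2.0, 'Ravi']])
records = collect_records(sheet)
print(records)
```

Returning a list keeps the reading step separate from whatever is done with the data afterwards, such as sorting or writing it elsewhere.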