RULE-BASED POS TAGGING

RULE-BASED PART-OF-SPEECH TAGGING

Rule-based part-of-speech tagging is the oldest approach to tagging and relies on hand-written rules. A rule-based tagger first consults a dictionary or lexicon to get the possible tags for each word. When a word has more than one possible tag, hand-written rules are used to identify the correct one.
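The lexicon lookup step can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tagger: the LEXICON contents, the tag names (DET, NOUN, VERB, AUX), the helper names and the default guess for unknown words are all assumptions made for the example.

```python
# Toy lexicon: each word maps to the set of tags it may take.
# Words and tag names are illustrative, not taken from a real tagset.
LEXICON = {
    "the": {"DET"},
    "can": {"NOUN", "VERB", "AUX"},
    "rusts": {"VERB"},
}


def candidate_tags(word, lexicon=LEXICON, default=frozenset({"NOUN"})):
    """Return the possible tags for a word.

    Words missing from the lexicon get a default guess (an assumption
    of this sketch, not a general rule).
    """
    return set(lexicon.get(word.lower(), default))


def ambiguous_words(sentence):
    """List the words that have more than one candidate tag."""
    return [w for w in sentence.split() if len(candidate_tags(w)) > 1]
```

Here ambiguous_words("the can rusts") picks out only "can", the one word whose lexicon entry lists several tags; those are the words the hand-written rules then have to resolve.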

Disambiguation is done by analysing the linguistic features of the word, its preceding word, its following word and other aspects of the context. For example, if the preceding word is an article, then the word in question must be a noun. This information is coded in the form of rules.
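The article example can be written as a single hand-coded rule applied after lexicon lookup. The sketch below is only an illustration under stated assumptions: the lexicon entries, the tag names and the functions rule_noun_after_article and tag_sentence are hypothetical, and a real tagger would use many such rules and a far larger lexicon.

```python
# Toy lexicon with one ambiguous word ("book"); entries are illustrative.
LEXICON = {
    "the": {"DET"},
    "a": {"DET"},
    "i": {"PRON"},
    "book": {"NOUN", "VERB"},
    "flight": {"NOUN"},
}


def rule_noun_after_article(prev_tags, tags):
    """If the preceding word is unambiguously an article, keep only NOUN."""
    if prev_tags == {"DET"} and "NOUN" in tags:
        return {"NOUN"}
    return tags


def tag_sentence(words):
    """Look up candidate tags, then apply the rule left to right."""
    tagged = []
    prev_tags = set()
    for word in words:
        # Unknown words default to NOUN (an assumption of this sketch).
        tags = set(LEXICON.get(word.lower(), {"NOUN"}))
        tags = rule_noun_after_article(prev_tags, tags)
        tagged.append((word, tags))
        prev_tags = tags
    return tagged
```

With these toy entries, "book" in "the book" is reduced to NOUN, while "book" in "I book a flight" keeps both candidate tags because this one rule does not apply; further rules would be needed to resolve it.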

The rules may be written as context-pattern rules or as regular expressions compiled into finite-state automata, which are then intersected with lexically ambiguous sentence representations. TAGGIT, the first large rule-based tagger, used context-pattern rules. It used a set of 71 tags and 3,300 disambiguation rules, which disambiguated 77% of the words in the million-word Brown University corpus.