CORPUS LINGUISTICS

WHAT IS CORPUS LINGUISTICS?

Corpus linguistics is a branch of linguistics that analyzes language using a large collection of naturally occurring texts, known as a corpus. It complements traditional approaches rather than replacing them. Corpus linguistics gets its real power from the use of computers for analysis.

CHARACTERISTICS OF CORPUS LINGUISTICS

CORPUS ANNOTATION

Corpus annotation is an area of corpus linguistics concerned with adding linguistic information to corpus texts. The resulting annotated corpus is extremely useful for corpus-based machine translation. Annotation by hand is a painful and time-consuming process, so corpus annotation is usually done either automatically or semi-automatically. NLP tools such as lemmatizers and part-of-speech taggers are used for this purpose. As these tools are not fully accurate, manual correction is required to make the annotated corpus reliable.
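The semi-automatic workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the lexicon TOY_LEXICON, the tag names, and the functions annotate and apply_corrections are hypothetical and stand in for a real lemmatizer and part-of-speech tagger. The automatic pass tags what it can and flags the rest for a human annotator.

```python
import re

# Hypothetical toy lexicon mapping word forms to (lemma, POS tag).
# A real tagger would be trained on data; this is for illustration only.
TOY_LEXICON = {
    "the": ("the", "DET"),
    "cats": ("cat", "NOUN"),
    "sat": ("sit", "VERB"),
    "on": ("on", "ADP"),
    "mat": ("mat", "NOUN"),
}

def annotate(text):
    """Automatic pass: return (token, lemma, tag, needs_review) tuples."""
    tokens = re.findall(r"[a-z']+", text.lower())
    annotated = []
    for token in tokens:
        if token in TOY_LEXICON:
            lemma, tag = TOY_LEXICON[token]
            annotated.append((token, lemma, tag, False))
        else:
            # Unknown word: keep the form as its lemma and flag it.
            annotated.append((token, token, "UNK", True))
    return annotated

def apply_corrections(annotated, corrections):
    """Manual pass: corrections maps a token index to (lemma, tag)."""
    result = list(annotated)
    for index, (lemma, tag) in corrections.items():
        token = result[index][0]
        result[index] = (token, lemma, tag, False)
    return result

rows = annotate("The cats sat on the sofa")
# Indices of tokens the automatic pass could not tag.
flagged = [i for i, row in enumerate(rows) if row[3]]
# A human annotator supplies the missing analysis for those tokens.
corrected = apply_corrections(rows, {5: ("sofa", "NOUN")})

for token, lemma, tag, needs_review in corrected:
    print(token, lemma, tag, needs_review)
```

Keeping a needs_review flag on each token is one simple way to limit manual work to the cases the tool could not handle, instead of rechecking the whole corpus.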

ADVANTAGES OF CORPUS LINGUISTICS

APPLICATIONS OF CORPUS LINGUISTICS

Corpus linguistics is used to study a wide variety of topics within linguistics. Corpus-based techniques make it possible to study core areas such as lexicography and grammar. Dictionary makers use these techniques to include information about the most common uses of a word, the frequency of related words, and the contexts in which words and meanings are most commonly found. Corpus-based techniques allow sociolinguists to investigate dialect and register patterns. Corpus linguistics is also used in the study of language acquisition. With corpora of learner language, studies can be based on a large number of learners, and it is possible to examine general patterns across learners. Corpus-based studies are also applied in educational linguistics to design effective materials and activities for the classroom.
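As a rough illustration of the frequency information that dictionary makers draw on, the sketch below builds a word frequency list from a tiny sample. The three sentences in SAMPLE_TEXTS and the function names frequency_list and per_million are assumptions made for this example. A real study would use a corpus of many thousands or millions of words.

```python
from collections import Counter
import re

def frequency_list(texts):
    """Count how often each word form occurs across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

def per_million(counts, word):
    """Normalize a raw count to occurrences per million tokens."""
    total = sum(counts.values())
    return counts[word] / total * 1_000_000 if total else 0.0

# Hypothetical sample; not taken from any real corpus.
SAMPLE_TEXTS = [
    "The bank raised interest rates.",
    "We walked along the river bank.",
    "The bank is closed on Sunday.",
]

counts = frequency_list(SAMPLE_TEXTS)
for word, count in counts.most_common(5):
    print(word, count, round(per_million(counts, word)))
```

Normalizing to a per-million rate is a common way to compare frequencies between corpora of different sizes, since raw counts alone depend on how much text was collected.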

ANALYTICAL TOOLS

Corpusbench: It is used for word counts, concordancing, and grammatical and morphological analyses.
LEXA: It is a powerful shareware program for concordancing and simple tagging.
MicroConcord: It is used for word counts, concordancing, and syntactic and morphological analyses.
TACT: It is a shareware program for frequency counts, concordancing, and collocations.
WordCruncher: It is a concordancing program that also produces frequency lists and collocations.
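
To make the terms concordancing and collocation concrete, here is a minimal Python sketch of both operations. It is not how any of the programs listed above is implemented. The function names, the window sizes, and the sample sentences are assumptions chosen for illustration.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def concordance(tokens, keyword, width=3):
    """Keyword-in-context lines: (left context, keyword, right context)."""
    lines = []
    for i, token in enumerate(tokens):
        if token == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, token, right))
    return lines

def collocates(tokens, keyword, window=2):
    """Count words that occur within `window` tokens of the keyword."""
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == keyword:
            start = max(0, i - window)
            counts.update(tokens[start:i] + tokens[i + 1:i + 1 + window])
    return counts

# Hypothetical sample text. Sentence boundaries are ignored here,
# so context windows may run across sentences.
SAMPLE = "Strong tea is popular. She drinks strong tea daily. He likes weak tea."

tokens = tokenize(SAMPLE)
kwic = concordance(tokens, "tea", width=2)
near_tea = collocates(tokens, "tea", window=1)

for left, keyword, right in kwic:
    print(f"{left:>15} [{keyword}] {right}")
print(near_tea.most_common(3))
```

A concordance shows every occurrence of a word with its surrounding context, while the collocate counts summarize which neighbours recur. Dedicated tools typically add statistical measures of association on top of raw co-occurrence counts.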