Corpus Linguistics

What is Corpus Linguistics?

Corpus linguistics is a branch of linguistics that analyzes language using a large collection of naturally occurring texts, known as a corpus. It complements traditional approaches rather than replacing them. Its real power comes from using computers to carry out the analysis.

Characteristics Of Corpus Linguistics

    It is empirical, analyzing the actual patterns of use in natural texts.
    It utilizes a corpus as the basis for analysis.
    It makes use of computers for analysis. It uses both automatic and interactive techniques.
    It depends on both quantitative and qualitative analytical techniques.

Corpus Annotation

Corpus annotation is an area of corpus linguistics concerned with adding linguistic information to corpus texts. The resulting annotated corpus is extremely useful for corpus-based machine translation. Annotating by hand is a painful and time-consuming process, so corpus annotation is usually done automatically or semi-automatically, with NLP tools such as lemmatizers and part-of-speech taggers. Because these tools are not fully accurate, manual correction is still required to make the annotated corpus accurate.
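
The semi-automatic workflow described above can be sketched as follows. This is only an illustration, not a real tagger: the lexicon TOY_LEXICON, the functions annotate and apply_corrections, and the tag names are all assumptions made for this example. A lookup table stands in for the lemmatizer and part-of-speech tagger, and any token it cannot handle is flagged for manual correction.

```python
import re

# Hypothetical toy lexicon mapping word forms to (lemma, part-of-speech tag).
# A real lemmatizer and tagger would replace this lookup.
TOY_LEXICON = {
    "the": ("the", "DET"),
    "cat": ("cat", "NOUN"),
    "cats": ("cat", "NOUN"),
    "sat": ("sit", "VERB"),
    "on": ("on", "ADP"),
    "mat": ("mat", "NOUN"),
}


def annotate(text):
    """Automatic pass: one record per token; unknown tokens are flagged."""
    records = []
    for token in re.findall(r"[A-Za-z]+", text):
        entry = TOY_LEXICON.get(token.lower())
        if entry is None:
            records.append(
                {"token": token, "lemma": None, "pos": None, "needs_review": True}
            )
        else:
            lemma, pos = entry
            records.append(
                {"token": token, "lemma": lemma, "pos": pos, "needs_review": False}
            )
    return records


def apply_corrections(records, corrections):
    """Manual pass: fill flagged records from corrections keyed by lowercased token."""
    for rec in records:
        fix = corrections.get(rec["token"].lower())
        if rec["needs_review"] and fix is not None:
            rec["lemma"], rec["pos"] = fix
            rec["needs_review"] = False
    return records


annotated = annotate("The cats sat on the purple mat")
# Tokens the automatic pass could not annotate.
pending = [r["token"] for r in annotated if r["needs_review"]]
# A human annotator supplies the missing analysis.
annotated = apply_corrections(annotated, {"purple": ("purple", "ADJ")})
```

Keeping a needs_review flag on every record makes it easy to see how much of the corpus still depends on human judgement after the automatic pass.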

Advantages Of Corpus Linguistics

    Linguistic analysis is consistent and reliable.
    Because computers are used, it is possible to identify and analyze complex patterns of language use.
    Computers allow linguists to store and analyse a larger database of natural language.
    Interaction between computers and linguists gives a double advantage: while computers manage the data, linguists can make difficult linguistic judgements.

Applications of Corpus Linguistics

Corpus linguistics is used to study a wide variety of topics within linguistics. Corpus-based techniques make it possible to study core areas of linguistic structure such as lexicography and grammar. Dictionary makers use these techniques to include information about the most common uses, the frequency of related words, and the contexts in which words and meanings are most commonly found. Corpus-based techniques allow sociolinguists to investigate dialect and register patterns. Corpus linguistics is also used to study language acquisition. With corpora of learners' language, studies can be based on a large number of learners, and general patterns across learners can be examined. Corpus-based studies are also applied in educational linguistics to design effective materials and activities for the classroom.
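
As a rough illustration of the kind of evidence a dictionary maker might draw from a corpus, the sketch below builds a frequency list and counts the words that occur near a chosen word. The sample sentence, the function names (tokenize, frequency_list, collocates), and the window size are assumptions made for this example; a real study would use a much larger corpus and usually a statistical association measure rather than raw counts.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def frequency_list(tokens):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def collocates(tokens, node, window=2):
    """Count the words found within `window` tokens of each occurrence of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


# Tiny made-up sample standing in for a corpus.
sample = "strong tea and strong coffee; she drinks strong tea daily"
tokens = tokenize(sample)
freq = frequency_list(tokens)
near_strong = collocates(tokens, "strong", window=1)
```

Here freq ranks words by how often they occur, and near_strong records which words appear directly beside "strong", which is the raw material for describing typical contexts of use.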

Analytical Tools

Corpusbench: It is used for word counts, concordancing, and grammatical and morphological analyses.
LEXA: It is a powerful shareware program for concordancing and simple tagging.
MicroConcord: It is used for word counts, concordancing, and syntactic and morphological analyses.
TACT: It is a shareware program for frequency counts, concordancing, and collocations.
WordCruncher: It is a concordancing program. It also produces frequency lists and collocations.
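
Concordancing, which most of the tools above provide, lists every occurrence of a word together with its surrounding context. The sketch below shows the general idea as a keyword-in-context (KWIC) listing. It does not reproduce how any of the listed programs work; the function kwic, its width parameter, and the sample text are assumptions made for this example.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit of keyword."""
    tokens = re.findall(r"\w+", text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append((left, tok, right))
    return lines


# Tiny made-up sample standing in for a corpus.
sample = "A corpus is a collection of texts. Each corpus is built for a purpose."
hits = kwic(sample, "corpus", width=2)

# Print the hits with the keyword aligned in a column.
for left, node, right in hits:
    print(f"{left:>20}  {node}  {right}")
```

Aligning the keyword in one column lets a linguist scan the contexts on either side and spot recurring patterns by eye.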

For Further Study

Corpus Linguistics

Corpus Linguistics: Method, Theory and Practice (Cambridge Textbooks in Linguistics)