Computational Linguistics

Computational linguistics is often treated as a synonym of natural language processing. Its main task is to construct programs that process words and texts in natural language. It is an interdisciplinary field.

The term computational linguistics was coined by David Hays, a member of the Automatic Language Processing Advisory Committee (ALPAC). The final report of ALPAC proposed a new field called computational linguistics and held that machine translation should be considered a short-term engineering goal. This progression from machine translation to computational linguistics occurred in 1974.

A computer system is considered linguistic if it uses data or procedures that are both language dependent and large. Therefore, not every program that processes natural language text is related to linguistics. Word processors, for example, process natural language text, but they are not sufficiently language dependent.

Extra-linguistic Universe

Extra-linguistic universe means the subjectively understood world of each individual, which provides him with things to talk about. It can consist of his actual or imagined personal experiences, observed or reported experiences of others, visions, aspirations and so on. A lot of what is needed to understand these is shared by all mankind, which is what makes communication possible across individual and group boundaries.

If a person, for example, says to a rock, “Move out of the way”, he might not have performed an illocutionary act to the satisfaction of the philosopher. But that is beside the point as far as the linguist is concerned. For the linguist, it is as good a command sentence (expressed by the imperative mood in this particular case) as any, whether it is addressed to a man, a dog or a rock.

Universal Grammar

Introduction To Universal Grammar

Universal grammar is the brainchild of Noam Chomsky. In contrast to the taxonomic approach of traditional grammar, universal grammar adopts a cognitive approach, that is, one concerned with the processes by which human beings come to know the world. Human beings have tacit (i.e. subconscious) knowledge of grammar: they know how to form and interpret expressions in their native language, but they may not be able to explain how they do so, because they have no conscious awareness of the processes involved.

Competence and Performance

Chomsky says that native speakers have grammatical competence (i.e. tacit knowledge) in their native language. He distinguishes competence from performance: competence is knowledge of the language, while performance is the actual use of language in concrete situations. Universal grammar is concerned with competence, in that it states what someone must know to have competence in a language. Performance is properly studied in psycholinguistics. In theory, Universal Grammar (UG) generalises from the grammars of particular I-languages (i.e. internalised linguistic systems) to the grammars of all possible natural I-languages.

Universal Grammar Theory

Universal grammar is a theory of knowledge, not a theory of behaviour. It is mainly concerned with the internal structure of the human mind. Universal grammar theory holds that the speaker knows a set of principles that apply to all languages, and parameters that vary from one language to another. The theory makes precise statements about properties of the mind based on specific evidence. It is important to note that the theory attempts to integrate grammar, mind and language at every moment.
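
The split between principles and parameters can be pictured with a toy sketch in Python. Everything in it is an illustrative assumption rather than part of the theory's formal statement: the function name `build_phrase`, the `head_initial` flag (standing in for a commonly cited word-order parameter), and the example words are all chosen only for this illustration.

```python
# Toy illustration: one shared principle, one language-specific parameter.
# Principle (assumed the same for every language): a phrase is built
# from a head and its complement.
# Parameter (assumed to vary by language): whether the head comes
# before or after the complement.

def build_phrase(head, complement, head_initial):
    """Return the words of a phrase in the order the parameter setting requires."""
    if head_initial:
        return [head, complement]
    return [complement, head]

# Hypothetical parameter settings for two languages.
PARAMETERS = {
    "English": {"head_initial": True},
    "Japanese": {"head_initial": False},
}

# Verb phrase meaning "read books": verb is the head, object is the complement.
english_vp = build_phrase("read", "books", **PARAMETERS["English"])
japanese_vp = build_phrase("yomu", "hon-o", **PARAMETERS["Japanese"])

print(" ".join(english_vp))
print(" ".join(japanese_vp))
```

The same function serves both languages; only the stored setting differs, which is the sense in which a speaker is said to know shared principles plus language-particular parameter values.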

Chomsky’s Questions on Linguistics

The following questions from Chomsky summarize the aims of linguistics.

1. What constitutes knowledge of language? The linguist's duty is to describe what people know about language.
2. How is such knowledge acquired? The linguist has to discover how people acquire this knowledge.
3. How is such knowledge put to use? The linguist has to see how people use the knowledge of language they have acquired.

Sometimes a fourth question is added.

4. What are the physical mechanisms that serve as the material basis for this system of knowledge and for the use of this knowledge? There must be some physical correlate of this mental knowledge. That is, there should be a link between mind and brain.

I-Language and E-Language

Chomsky distinguishes Externalized (E-) language from Internalized (I-) language. E-language linguistics aims to collect samples of language and then to describe their properties. The linguist's task is to bring order to the set of external facts that make up the language. The resulting grammar is described in terms of properties of such data through ‘structures’ or ‘patterns’.

I-language linguistics, on the other hand, is concerned with what a speaker knows about a language and where this knowledge comes from. It treats language as an internal property of the human mind rather than something external. Chomsky’s theories fall within the I-language tradition and aim at exploring the mind rather than the environment. I-language theory claims that establishing the knowledge itself logically precedes studying how people acquire and use that knowledge.

Chomsky also introduced the term pragmatic competence: knowledge of how language is related to the situation in which it is used. Knowledge of language use is different from knowledge of language itself, so it may be possible to have grammatical competence without pragmatic competence.

What is grammar?


Grammar studies the way in which words and morphemes join to form meaningful sentences. A grammar is a set of constraints on the possible sequences of symbols, expressed as rules or principles. Syntax is the basic ingredient of grammar. A grammar thus distinguishes one set of sentences from another, for example the sequences its rules allow from those they do not.
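
The idea of a grammar as rules that constrain sequences of symbols can be sketched with a tiny rewrite-rule recognizer in Python. This is a minimal illustration under stated assumptions, not a description of any particular grammar formalism: the rule set `RULES`, the four-word `LEXICON`, and the function names `parse` and `is_sentence` are all hypothetical.

```python
# Toy grammar: rewrite rules that constrain which word sequences count as sentences.
RULES = {
    "S": [["NP", "VP"]],
    "NP": [["Name"], ["Det", "N"]],
    "VP": [["V"], ["V", "NP"]],
}

# Hypothetical mini-lexicon mapping words to grammatical categories.
LEXICON = {
    "Joe": "Name",
    "the": "Det",
    "song": "N",
    "sings": "V",
}

def parse(symbol, words, start):
    """Return every position where `symbol` can end if it begins at `start`."""
    if symbol not in RULES:  # a lexical category such as Name or V
        if start < len(words) and LEXICON.get(words[start]) == symbol:
            return {start + 1}
        return set()
    ends = set()
    for expansion in RULES[symbol]:
        positions = {start}
        for part in expansion:
            # Extend every partial match by one more symbol of the expansion.
            positions = {e for p in positions for e in parse(part, words, p)}
        ends |= positions
    return ends

def is_sentence(text):
    """True if the whole word sequence can be derived from S."""
    words = text.split()
    return len(words) in parse("S", words, 0)

print(is_sentence("Joe sings"))
print(is_sentence("sings Joe"))
```

Under these rules "Joe sings" and "Joe sings the song" are accepted while "sings Joe" is rejected, so the grammar separates the set of allowed sequences from the rest.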

Fundamental Units

There are five fundamental units of grammatical structure: morpheme, word, phrase, clause, and sentence. The morpheme is the lowest unit. Morphemes join to form words. Phrases and clauses are groups of words: a phrase does not have a subject and predicate, whereas a clause has its own subject and predicate. In the sentence Joe sings, Joe is the subject and sings is the predicate. A sentence is also a group of words that conveys some meaning.
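
The way these units nest can be sketched as a small set of Python data classes. This is only one possible modelling, and a simplified one: the class names, the `text` properties, and the split of sings into the morphemes "sing" and "s" are assumptions made for the illustration.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Word:
    morphemes: List[str]  # lowest units, e.g. ["sing", "s"]

    @property
    def text(self):
        return "".join(self.morphemes)

@dataclass
class Phrase:
    words: List[Word]  # a group of words with no subject or predicate of its own

    @property
    def text(self):
        return " ".join(word.text for word in self.words)

@dataclass
class Clause:
    subject: Phrase    # unlike a phrase, a clause has a subject...
    predicate: Phrase  # ...and a predicate

    @property
    def text(self):
        return f"{self.subject.text} {self.predicate.text}"

@dataclass
class Sentence:
    clauses: List[Clause]

    @property
    def text(self):
        # Simplification: clauses are joined by a space, ignoring conjunctions.
        return " ".join(clause.text for clause in self.clauses)

joe = Word(["Joe"])
sings = Word(["sing", "s"])
clause = Clause(subject=Phrase([joe]), predicate=Phrase([sings]))
sentence = Sentence([clause])

print(sentence.text)
print(clause.subject.text, "|", clause.predicate.text)
```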

Note that what is described above is called traditional grammar. Subject, predicate, etc. are called grammatical functions. Parts of speech such as verb, noun and adjective are called grammatical categories.

Computational grammars are those that are meant for natural language processing. They should be detailed, precise and exhaustive. They should be descriptive grammars so that computers can correctly interpret and apply them.

Corpus Linguistics

What is Corpus Linguistics?

Corpus linguistics is a branch of linguistics that uses a large collection of natural texts, known as a corpus, for analysis. It complements traditional approaches. Corpus linguistics gets its real power from using computers for analysis.

Characteristics Of Corpus Linguistics

    It is empirical, analyzing the actual patterns of use in natural texts.
    It utilizes a corpus as the basis for analysis.
    It makes use of computers for analysis. It uses both automatic and interactive techniques.
    It depends on both quantitative and qualitative analytical techniques.
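
The quantitative side of these characteristics can be illustrated with a word-frequency count in Python. This is a minimal sketch: the three-sentence `CORPUS` is a made-up stand-in for a large text collection, and the simple `tokenize` function (lowercase alphabetic tokens only) is an assumption, not a recommended tokenization scheme.

```python
import re
from collections import Counter

# Hypothetical three-sentence corpus standing in for a large text collection.
CORPUS = [
    "The cat sat on the mat.",
    "The dog sat on the log.",
    "A cat and a dog met.",
]

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

tokens = [tok for line in CORPUS for tok in tokenize(line)]
frequencies = Counter(tokens)

# Quantitative view: raw count and relative frequency per 1,000 tokens.
for word, count in frequencies.most_common(3):
    print(word, count, round(1000 * count / len(tokens), 1))
```

The counts themselves are the automatic, quantitative step; deciding what a frequent pattern means remains a qualitative judgement for the linguist.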

Corpus Annotation

Corpus annotation is an area of corpus linguistics concerned with annotating corpus texts with linguistic information. The resulting annotated corpus is extremely useful for corpus-based machine translation. Annotation by hand is a painful and time-consuming process, so corpus annotation is usually done either automatically or semi-automatically. NLP tools such as lemmatizers and part-of-speech taggers are used for this purpose. As these tools are not fully accurate, manual correction is required to make the corpus accurate.
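
The semi-automatic workflow described above, an automatic pass followed by manual correction, might look roughly like the following Python sketch. It is a toy under stated assumptions: the lookup table `LEXICON`, the tag names, and the functions `annotate` and `apply_corrections` are hypothetical, and real lemmatizers and taggers are far more sophisticated than a dictionary lookup.

```python
# Hypothetical lexicon: word form -> (lemma, part-of-speech tag).
LEXICON = {
    "the": ("the", "DET"),
    "dogs": ("dog", "NOUN"),
    "bark": ("bark", "VERB"),
    "barks": ("bark", "VERB"),
    "loudly": ("loudly", "ADV"),
}

def annotate(tokens):
    """Automatic pass: attach a lemma and tag to each token; unknown words get 'UNK'."""
    annotated = []
    for token in tokens:
        lemma, tag = LEXICON.get(token.lower(), (token.lower(), "UNK"))
        annotated.append({"token": token, "lemma": lemma, "tag": tag})
    return annotated

def apply_corrections(annotated, corrections):
    """Manual pass: overwrite fields at the token positions a human annotator flagged."""
    fixed = [dict(entry) for entry in annotated]  # copy, so the automatic output is kept
    for index, changes in corrections.items():
        fixed[index].update(changes)
    return fixed

automatic = annotate("The bark peels".split())
# The toy tool mis-tags 'bark' (a noun here) and does not know 'peels'.
corrected = apply_corrections(automatic, {
    1: {"tag": "NOUN"},
    2: {"lemma": "peel", "tag": "VERB"},
})

for entry in corrected:
    print(entry["token"], entry["lemma"], entry["tag"])
```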

Advantages Of Corpus Linguistics

    Linguistic analysis is consistent and reliable.
    As computers are used, it is possible to identify and analyze complex patterns of language use.
    Computers allow linguists to store and analyse larger databases of natural language.
    Interaction between computers and linguists gives double advantage: while computers manage data, linguists can make difficult linguistic judgements.

Applications of Corpus Linguistics

Corpus linguistics is used to study a wide variety of topics within linguistics. Corpus-based techniques make it possible to study core areas of linguistic structure such as lexicography and grammar. Dictionary makers use these techniques to include information about the most common uses, the frequency of related words, and the contexts in which words and meanings are most commonly found. Corpus-based techniques allow sociolinguists to investigate dialect and register patterns. Corpus linguistics is also used in the study of language acquisition. With corpora of learners' language, studies can be based on large numbers of learners, and it is possible to examine general patterns across learners. Corpus-based studies are applied in educational linguistics to design effective materials and activities for the classroom.

Analytical Tools

Corpusbench: It is used for word counts, concordancing, grammatical and morphological analyses.
LEXA: It is a powerful shareware program for concordancing and simple tagging.
MicroConcord: It is used for word counts, concordancing, syntactic and morphological analyses.
TACT: It is a shareware program for frequency counts, concordancing, and collocations.
WordCruncher: It is a concordancing program. It also produces frequency lists and collocations.
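
Two of the operations these tools provide, concordancing and collocation counts, can be sketched in a few lines of Python. This is not how any of the listed programs is implemented; the sample `TEXT`, the window size, and the function names `concordance` and `collocates` are assumptions made for the illustration.

```python
import re
from collections import Counter

# Hypothetical sample text standing in for a corpus.
TEXT = (
    "Strong tea is common here. She likes strong tea in the morning. "
    "He prefers strong coffee, but strong tea is cheaper."
)

def tokenize(text):
    """Lowercase the text and return its alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def concordance(tokens, keyword, width=2):
    """Return keyword-in-context lines: up to `width` tokens either side of each hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines

def collocates(tokens, keyword):
    """Count the words that immediately follow `keyword`."""
    return Counter(nxt for cur, nxt in zip(tokens, tokens[1:]) if cur == keyword)

tokens = tokenize(TEXT)

for line in concordance(tokens, "tea"):
    print(line)
print(collocates(tokens, "strong").most_common())
```

A concordance shows each occurrence of a word in its immediate context, while the collocation count summarizes which words tend to appear next to it.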

