Contextual word similarity is the task of identifying different types of similarity between words, and it is one of the goals of natural language processing. Statistical approaches are commonly used to compute the degree of similarity between words. In these approaches a word is represented by a co-occurrence vector in which each entry corresponds to another word in the lexicon, and the value of the entry is the frequency with which the two words occur together in the corpus. A similarity or distance measure is then applied to a pair of vectors to compute the similarity between the corresponding pair of words.
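One way to write this representation down is given here. The notation (f, w_i, n) and the choice of cosine as the similarity measure are illustrative assumptions; the text does not commit to a particular measure. Here f(w, w_i) stands for the frequency with which word w occurs together with the i-th word of a lexicon of n words.

```latex
\vec{w} = \bigl( f(w, w_1),\; f(w, w_2),\; \dots,\; f(w, w_n) \bigr)
\qquad
\operatorname{sim}(u, v) = \cos(\vec{u}, \vec{v})
  = \frac{\sum_{i=1}^{n} f(u, w_i)\, f(v, w_i)}
         {\sqrt{\sum_{i=1}^{n} f(u, w_i)^{2}}\;\sqrt{\sum_{i=1}^{n} f(v, w_i)^{2}}}
```

Under this choice, two words whose co-occurrence counts are proportional to each other score 1, and two words that share no co-occurring words score 0.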
Corpus-based Approach
A typical corpus-based approach to computing word similarity represents each word by a set of its co-occurrence statistics. It assumes that the meaning of a word is related to its patterns of co-occurrence with other words in the text. Harris proposed this assumption, known as the distributional hypothesis. According to Harris, the meaning of entities, and the meaning of the grammatical relations among them, is related to the restrictions on how these entities combine with other entities. Words that resemble each other in meaning will therefore have similar co-occurrence patterns with other words. For example, "text" and "sentence" both co-occur frequently with verbs such as "read", "write", "edit" and "process". To capture this similarity, each word is represented by a co-occurrence vector that records the statistics of its co-occurrence with all other words in the lexicon. Word similarity is then computed by applying a vector similarity measure to the two corresponding co-occurrence vectors.
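The procedure can be sketched in Python as a minimal illustration. Several details here are assumptions rather than part of the approach as described: the toy corpus is invented, co-occurrence is counted within a fixed window of two tokens on either side, raw counts are used without weighting, and cosine is picked as the vector similarity measure. The names cooccurrence_vectors and cosine are hypothetical helpers.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it.

    The window size is an illustrative assumption, not prescribed by the text.
    """
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start = max(0, i - window)
            end = min(len(tokens), i + window + 1)
            for j in range(start, end):
                if j != i:  # a word is not counted as co-occurring with itself
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    # A Counter returns 0 for missing keys, so unshared words add nothing.
    dot = sum(count * v[w] for w, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Invented toy corpus, loosely following the "text"/"sentence" example.
toy_corpus = [
    "read the text",
    "read the sentence",
    "edit the text",
    "edit the sentence",
    "write the text",
    "process the sentence",
    "drink the coffee",
]

vectors = cooccurrence_vectors(toy_corpus)
sim_text_sentence = cosine(vectors["text"], vectors["sentence"])
sim_text_coffee = cosine(vectors["text"], vectors["coffee"])
print(f"text ~ sentence: {sim_text_sentence:.3f}")
print(f"text ~ coffee:   {sim_text_coffee:.3f}")
```

On this toy corpus "text" and "sentence" score higher than "text" and "coffee", because they share the verbs "read" and "edit". The frequent word "the" still inflates both scores, which is one reason a practical system would likely weight or filter the raw counts.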