## Markov Models

# Markov Models: Overview

Markov models extract linguistic knowledge automatically from large corpora and use it for POS tagging. Markov models are an alternative to laborious and time-consuming manual tagging.

# Markov Property

The name Markov model is derived from the term Markov property. The Markov property is an assumption that makes the system tractable to analyze. According to the Markov property, given the current state of the system, the future evolution of the system is independent of its past. The Markov property is assured if the transition probabilities are given by exponential distributions with constant failure or repair rates.

Assume that there is a sequence of random variables in which the value of each variable depends on the previous elements of the sequence. In many cases, the value of the present variable alone is sufficient to predict the next one. That is, future elements depend only on the present element and not on the past elements.
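This dependence only on the present state can be sketched with a small simulation; the tag set and transition probabilities below are illustrative assumptions, not values estimated from any corpus.

```python
import random

# Toy first-order Markov chain over POS-like states. The next state is
# sampled using only the current state, never the earlier history.
TRANSITIONS = {
    "START": {"Det": 0.8, "N": 0.2},
    "Det":   {"N": 1.0},
    "N":     {"V": 0.7, "N": 0.3},
    "V":     {"END": 1.0},
}

def next_state(current):
    """Sample the next state from the current state alone (Markov property)."""
    states, probs = zip(*TRANSITIONS[current].items())
    return random.choices(states, weights=probs)[0]

def sample_chain():
    """Walk the chain from START until END, collecting the visited states."""
    state, chain = "START", []
    while True:
        state = next_state(state)
        if state == "END":
            return chain
        chain.append(state)
```

Note that `next_state` never looks at how the chain arrived at `current`; that restriction is exactly the Markov property.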

# What is Markov Model?

A Markov model is essentially a finite-state machine. Each state has two probability distributions: the probability of emitting a symbol and the probability of moving to a particular next state. From one state, the Markov model emits a symbol and then moves to another state.

The objective of a Markov model tagger is to find the optimal sequence of tags T = {t1, t2, t3, …, tn} for the word sequence W = {w1, w2, w3, …, wn}, that is, to find the most probable tag sequence for a given word sequence.

If we assume that the probability of a tag depends only on the one previous tag, the resulting model is called a bigram model. Each state in the bigram model corresponds to a POS tag. The probability of moving from one POS state to another can be represented as P(t_{i}|t_{j}), and the probability of a word being emitted from a particular tag state can be represented as P(w_{i}|t_{j}). Assume that the sentence "The rabbit runs" is to be tagged. The word *The* is a determiner, so it can be annotated with a tag, say Det; *rabbit* is a noun, so its tag can be N; and *runs* is a verb, so its tag can be V. So we get the tagged sentence as

The|Det rabbit|N runs|V

Given this model, P(Det N V | The rabbit runs) is estimated, up to normalization by the probability of the word sequence (which does not affect the choice of tags), as

P(Det | START) * P(N | Det) * P(V | N) * P(The | Det) * P(rabbit | N) * P(runs | V)
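With toy probabilities (assumed here purely for the arithmetic, not taken from any corpus), this product can be computed directly:

```python
# Assumed toy probabilities for the six factors in the product above.
trans = {("START", "Det"): 0.8, ("Det", "N"): 0.9, ("N", "V"): 0.7}
emit = {("The", "Det"): 0.6, ("rabbit", "N"): 0.001, ("runs", "V"): 0.005}

words = ["The", "rabbit", "runs"]
tags = ["Det", "N", "V"]

p, prev = 1.0, "START"
for w, t in zip(words, tags):
    p *= trans[(prev, t)] * emit[(w, t)]  # P(t | prev) * P(w | t)
    prev = t
print(p)  # the product of all six factors
```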

This is how the probabilities required for the Markov model are derived.
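Once the transition and emission probabilities are in hand, the most probable tag sequence is conventionally found with the Viterbi algorithm (not covered in the text above). A minimal sketch, with toy tables assumed purely for illustration:

```python
def viterbi(words, tags, trans, emit):
    """Return the most probable tag sequence for words.

    trans[(t_prev, t)] = P(t | t_prev); emit[(w, t)] = P(w | t).
    Missing table entries are treated as probability 0.
    """
    # best[t]: probability of the best tag path ending in t so far.
    best = {t: trans.get(("START", t), 0.0) * emit.get((words[0], t), 0.0)
            for t in tags}
    backptr = []
    for w in words[1:]:
        new_best, ptrs = {}, {}
        for t in tags:
            prob, prev = max(
                (best[p] * trans.get((p, t), 0.0) * emit.get((w, t), 0.0), p)
                for p in tags)
            new_best[t], ptrs[t] = prob, prev
        best, backptr = new_best, backptr + [ptrs]
    # Follow the back-pointers from the best final tag.
    path = [max(best, key=best.get)]
    for ptrs in reversed(backptr):
        path.append(ptrs[path[-1]])
    return path[::-1]

# Toy tables (assumptions, not trained values).
tags = ["Det", "N", "V"]
trans = {("START", "Det"): 0.9, ("START", "N"): 0.1,
         ("Det", "N"): 0.9, ("Det", "V"): 0.1,
         ("N", "N"): 0.2, ("N", "V"): 0.8,
         ("V", "N"): 0.5, ("V", "V"): 0.5}
emit = {("The", "Det"): 0.6, ("rabbit", "N"): 0.1, ("runs", "V"): 0.2}

print(viterbi(["The", "rabbit", "runs"], tags, trans, emit))  # → ['Det', 'N', 'V']
```

The algorithm keeps only the best path into each state at each step, so its cost grows linearly with sentence length rather than exponentially with the number of possible tag sequences.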

# Hidden Markov Models (HMM)

Hidden Markov Models (HMMs) are so called because the state transitions are not observable. HMM taggers require only a lexicon and untagged text for training. Hidden Markov Models aim to build a language model automatically with little effort. Disambiguation is done by assigning the more probable tag. For example, the word *help* will be tagged as a noun rather than a verb if it comes after an article, because the probability of a noun is much higher than that of a verb in this context.
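The disambiguation of *help* amounts to comparing two products; the numbers here are illustrative assumptions only.

```python
# Two hypotheses for "help" after an article (Det). The probabilities
# are assumed for illustration, not estimated from data.
p_noun = 0.7 * 0.002   # P(N | Det) * P(help | N)
p_verb = 0.05 * 0.004  # P(V | Det) * P(help | V)
best_tag = "N" if p_noun > p_verb else "V"
print(best_tag)  # → N
```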

In an HMM, we observe only a probabilistic function of the state sequence. At the beginning of the tagging process, some initial tag probabilities are assigned to the HMM. Then, in each training cycle, this initial setting is refined using the Baum-Welch re-estimation algorithm.
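Baum-Welch re-estimation is built on the forward (and backward) probabilities of the observed words. A minimal forward-algorithm sketch, again with toy tables assumed purely for illustration:

```python
def forward(words, tags, trans, emit):
    """P(words) under the HMM, summed over all possible tag sequences."""
    # alpha[t]: probability of the words seen so far, ending in tag t.
    alpha = {t: trans.get(("START", t), 0.0) * emit.get((words[0], t), 0.0)
             for t in tags}
    for w in words[1:]:
        alpha = {t: sum(alpha[p] * trans.get((p, t), 0.0) for p in tags)
                    * emit.get((w, t), 0.0)
                 for t in tags}
    return sum(alpha.values())

# Toy tables (assumptions, not trained values).
tags = ["Det", "N", "V"]
trans = {("START", "Det"): 0.9, ("START", "N"): 0.1,
         ("Det", "N"): 0.9, ("N", "V"): 0.8, ("N", "N"): 0.2}
emit = {("The", "Det"): 0.6, ("rabbit", "N"): 0.1, ("runs", "V"): 0.2}

print(forward(["The", "rabbit", "runs"], tags, trans, emit))
```

Unlike Viterbi, which keeps only the single best path, the forward pass sums over all tag paths; Baum-Welch uses these totals to re-estimate the transition and emission probabilities from untagged text.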

**NEXT:** Maximum Entropy Method