Bayesian analysis

Consider a text classification problem in which texts are given in a bag-of-words representation.
For instance, the text x = “BDA exam is very very easy” is represented as x = {BDA, exam, is, very,
very, easy}. Each word in a document is sampled independently from the same distribution. A popular
machine learning approach to text classification is the Naive Bayes model, which predicts the
probability p(y|x) of a text x belonging to a class y using Bayes' theorem. It assumes that, given
the class, the features x_i (the words in the text) are independent and identically distributed, so
the class-conditional distribution p(x|y) factorizes over the words. Assume the vocabulary size is V.
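Written out, the model described above amounts to the following factorization. The document length n and the notation x = (x_1, ..., x_n) are notational assumptions introduced here for illustration only.

```latex
p(y \mid x) = \frac{p(y)\, p(x \mid y)}{p(x)}
\;\propto\; p(y) \prod_{i=1}^{n} p(x_i \mid y)
```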

(b) Consider the spam classification problem in which each email text belongs to one of two classes,
“ham” or “spam”. Here is the training data:

ham d1: “good”
ham d2: “very good”
spam d3: “bad”
spam d4: “very bad”
spam d5: “very bad very bad”

Now consider the test document d6: “good? bad very bad”. Assuming the vocabulary is {very, good, bad},
so that V = 3 (treat “good?” the same as “good”), to which class does d6 belong under maximum
likelihood estimation of the class-conditional distribution and class distribution parameters?

The likelihood function is just a product of three binomial distributions, which can be written analytically.
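The computation asked for in part (b) can be sketched as follows. This is a minimal sketch, not an official solution. It assumes the usual multinomial reading of Naive Bayes: the maximum likelihood estimate of p(w|y) is the relative frequency of word w among all words in the training documents of class y, and the estimate of p(y) is the fraction of training documents in class y. The names `train`, `vocab`, `test_doc` and `mle_score` are illustrative and do not come from the original question.

```python
from collections import Counter
from fractions import Fraction

# Training data from part (b): (class label, document text).
train = [
    ("ham", "good"),
    ("ham", "very good"),
    ("spam", "bad"),
    ("spam", "very bad"),
    ("spam", "very bad very bad"),
]
vocab = ["very", "good", "bad"]
test_doc = "good bad very bad"  # d6, with "good?" treated as "good"

# Count documents per class and word occurrences per class.
doc_counts = Counter(label for label, _ in train)
word_counts = {label: Counter() for label in doc_counts}
for label, text in train:
    word_counts[label].update(text.split())


def mle_score(label):
    """Unnormalized posterior p(y) * prod_i p(x_i | y) with MLE parameters."""
    # Assumed MLE of the class prior: share of training documents in the class.
    score = Fraction(doc_counts[label], len(train))
    # Assumed MLE of p(w | y): word count divided by total words in the class.
    total_words = sum(word_counts[label][w] for w in vocab)
    for word in test_doc.split():
        score *= Fraction(word_counts[label][word], total_words)
    return score


scores = {label: mle_score(label) for label in doc_counts}
for label, score in scores.items():
    print(label, score)
```

Under these assumptions both scores are exactly zero: “bad” never occurs in a ham document and “good” never occurs in a spam document, so each product contains a zero factor. Pure maximum likelihood estimation therefore does not separate the two classes for d6.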

