Before going into Bayes' theorem and posterior probability, let us first understand a few terms we are going to use:-

Conditional Probability and Independence:-

A conditional probability is the probability of one event given that another event has occurred. In the die-toss example, the probability of event A, three dots showing, is P(A) = 1/6 on a single toss. But what if we know that event B, at least three dots showing, occurred? Then there are only four possible outcomes, one of which is A. The probability of A = {3} is 1/4, given that B = {3, 4, 5, 6} occurred. The conditional probability of A given B is written P(A|B), and in general P(A|B) = P(A and B) / P(B).

Event A is independent of B if the conditional probability of A given B is the same as the unconditional probability of A. That is, they are independent if P(A|B) = P(A). In the die-toss example, P(A) = 1/6 and P(A|B) = 1/4, so the events A and B are not independent.
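These numbers are easy to verify by enumerating the six equally likely outcomes; below is a minimal Python sketch, where the event definitions are just the ones from the die-toss example above:

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}        # equally likely faces of a fair die
A = {3}                              # event A: three dots showing
B = {3, 4, 5, 6}                     # event B: at least three dots showing

def prob(event):
    # probability of an event under the uniform distribution on the die
    return Fraction(len(event & outcomes), len(outcomes))

p_A = prob(A)                        # 1/6
p_B = prob(B)                        # 2/3
p_A_given_B = prob(A & B) / p_B      # P(A|B) = P(A and B) / P(B) = 1/4

print(p_A, p_A_given_B)              # 1/6 1/4
print(p_A == p_A_given_B)            # False -> A and B are not independent
```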

Total Probability:-

The events H1, . . . , Hn form a partition of the sample space S (they are mutually exclusive and together exhaust S) and are usually called hypotheses; it follows that P(H1) + · · · + P(Hn) = 1 (= P(S)). Let the event of interest A happen under any of the hypotheses Hi with a known (conditional) probability P(A|Hi). Assume, in addition, that the probabilities of the hypotheses H1, . . . , Hn are known. Then P(A) can be calculated using the total probability formula: P(A) = P(A|H1)P(H1) + · · · + P(A|Hn)P(Hn).

The probability of A is the weighted average of the conditional probabilities P(A|Hi) with weights P(Hi).

Below is a good example to understand total probability under hypotheses:-

Out of 100 coins, one has heads on both sides. One coin is chosen at random and flipped two times. What is the probability of getting (a) two heads? (b) two tails?

(a) Let A be the event that two heads are obtained. Denote by H1 the event (hypothesis) that a fair coin was chosen. The hypothesis H2 (the complement of H1) is the event that the two-headed coin was chosen. Then P(A) = P(A|H1)P(H1) + P(A|H2)P(H2) = 1/4 * 99/100 + 1 * 1/100 = 103/400 = 0.2575.

(b) Let B be the event that two tails are obtained. As before, H1 is the event (hypothesis) that a fair coin was chosen and H2 that the two-headed coin was chosen. Here P(B|H2) = 0, because a coin with heads on both sides can never land tails. So P(B) = P(B|H1)P(H1) + P(B|H2)P(H2) = 1/4 * 99/100 + 0 = 99/400 = 0.2475.
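Both answers can be reproduced with exact arithmetic; here is a minimal Python sketch using the priors and conditional probabilities from the example above:

```python
from fractions import Fraction

# priors for the two hypotheses
p_fair = Fraction(99, 100)           # H1: a fair coin was chosen
p_two_headed = Fraction(1, 100)      # H2: the two-headed coin was chosen

# (a) two heads: P(A|H1) = 1/4, P(A|H2) = 1
p_two_heads = Fraction(1, 4) * p_fair + 1 * p_two_headed
print(p_two_heads, float(p_two_heads))   # 103/400 0.2575

# (b) two tails: P(B|H1) = 1/4, P(B|H2) = 0
p_two_tails = Fraction(1, 4) * p_fair + 0 * p_two_headed
print(p_two_tails, float(p_two_tails))   # 99/400 0.2475
```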

Bayes Formula:-

Let the event of interest A happen under any of the hypotheses Hi with a known (conditional) probability P(A|Hi). Assume, in addition, that the probabilities of the hypotheses H1, . . . , Hn are known (prior probabilities). Then the conditional (posterior) probability of the hypothesis Hi, i = 1, 2, . . . , n, given that event A happened, is P(Hi|A) = P(A|Hi)P(Hi) / P(A), where P(A) = P(A|H1)P(H1) + · · · + P(A|Hn)P(Hn).

Assume that out of N coins in a box, one has heads on both sides (“two-headed”). Let's say that a coin is selected at random from the box and, without inspecting it, flipped k times. All k times the coin landed heads up. What is the probability that the two-headed coin was selected? Denote by Ak the event that the randomly selected coin lands heads up k times. The hypotheses are H1, the coin is two-headed, and H2, the coin is fair. It is easy to see that P(H1) = 1/N and P(H2) = (N − 1)/N. The conditional probabilities are P(Ak|H1) = 1 for any k, and P(Ak|H2) = 1/2^k. By the total probability formula, P(Ak) = 1 * 1/N + (1/2^k) * (N − 1)/N, and by the Bayes formula the posterior probability of the two-headed coin is P(H1|Ak) = P(Ak|H1)P(H1) / P(Ak) = 2^k / (2^k + N − 1).

For N = 1,000,000 and k = 1, 2, . . . , 30 the graph of posterior probabilities is given in the figure below. It is interesting that our prior probability P(H1) = 0.000001 jumps to a posterior probability of 0.9991 after observing 30 heads in a row.
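The curve is easy to reproduce from the formula above; here is a minimal Python sketch (matplotlib is used only for the plot and can be dropped):

```python
import matplotlib.pyplot as plt

N = 1_000_000                        # number of coins in the box
ks = list(range(1, 31))              # observed runs of k heads in a row

# posterior P(H1 | Ak) = 2^k / (2^k + N - 1), from the Bayes formula above
posteriors = [2**k / (2**k + N - 1) for k in ks]

print(posteriors[0], posteriors[-1])     # ~2e-06 for k=1, ~0.9991 for k=30

plt.plot(ks, posteriors, marker="o")
plt.xlabel("k (heads in a row)")
plt.ylabel("posterior probability of the two-headed coin")
plt.show()
```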

Let us now build a basic model using Naive Bayes in Python, and then look at some applications of Naive Bayes algorithms.
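As a starting point, here is a minimal Gaussian Naive Bayes sketch using scikit-learn and its built-in iris dataset; the dataset and split parameters are chosen only for illustration and are not the model from the linked repository:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

# a small, well-known dataset, used here only for illustration
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# fit a Gaussian Naive Bayes classifier and evaluate on held-out data
model = GaussianNB()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, predictions))
```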

1. Naive Bayes is an eager learning classifier and it is fast. Thus, it can be used for making predictions in real time. You can browse my git here to get an idea of this classifier.

2. Multinomial Naive Bayes: This algorithm is also well known for its multi-class prediction capability. Here we can predict the probability of multiple classes of the target variable. You can browse here to get more of an idea about the application of this classifier.

3. Text classification/ Spam Filtering/ Sentiment Analysis: Naive Bayes classifiers are mostly used in text classification (due to good results on multi-class problems and the independence assumption) and have a higher success rate than many other algorithms. As a result, they are widely used in spam filtering (identifying spam e-mail) and sentiment analysis (in social media analysis, to identify positive and negative customer sentiments). See the basic text classifier here, and a minimal sketch after this list.

4. Recommendation System: A Naive Bayes classifier and collaborative filtering together build a recommendation system that uses machine learning and data mining techniques to filter unseen information and predict whether a user would like a given resource or not. You can browse here (a work in progress) for this classifier.
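As mentioned in item 3, here is a minimal text-classification sketch with Multinomial Naive Bayes; the tiny corpus and labels are made-up toy data, purely for illustration:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# toy training corpus (made-up examples, for illustration only)
texts = [
    "win a free prize now", "limited offer click here",
    "meeting scheduled for monday", "please review the attached report",
]
labels = ["spam", "spam", "ham", "ham"]

# bag-of-words features + Multinomial Naive Bayes in one pipeline
classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(texts, labels)

print(classifier.predict(["free prize offer"]))        # expected: 'spam'
print(classifier.predict(["report for the meeting"]))  # expected: 'ham'
```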

Happy Machine Learning…
