Analytics, Data Science, Exploratory Data Analysis, Hadoop

Approach to execute Machine Learning project, “Halt the Hate”…

Disclaimer: The analysis done in this project touches on a sensitive issue in India, so I am not trying to convince anybody to trust my model. A real human society is so complex that “all the things may be interconnected in a different way than in the model.” Imagine you are presented with a dataset of “Hate Crimes” in India and asked how to minimize these crimes by analyzing other factors. This is the problem I am setting out to analyze and solve with minimal resources. Some may say that education and providing jobs to youth in India by […]

Analytics, Bigdata, Data Science, Exploratory Data Analysis, Machine Learning

Bayesian-posterior imagination and applications…

Before going into Bayes and posterior probability, let us first understand a few terms we are going to use:- Conditional Probability:- Conditional Probability and Independence:- A conditional probability is the probability of one event given that another event has occurred. In the “die-toss” example, the probability of event A, three dots showing, is P(A) = 1/6 on a single toss. But what if we know that event B, at least three dots showing, occurred? Then there are only four possible outcomes, one of which is A. The probability of A = {3} is 1/4, given that B = {3, 4, 5, 6} occurred. […]
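The die-toss numbers above can be checked with a few lines of Python. This is only a minimal sketch, assuming a fair six-sided die with equally likely outcomes; the helper name `prob` is made up for illustration and is not part of the original article.

```python
from fractions import Fraction

# Sample space of a single toss of a fair die (assumed equally likely outcomes).
outcomes = {1, 2, 3, 4, 5, 6}

def prob(event, given=None):
    """Probability of `event`, optionally conditioned on the event `given`."""
    # Conditioning shrinks the sample space to the outcomes inside `given`.
    space = outcomes if given is None else (given & outcomes)
    return Fraction(len(event & space), len(space))

A = {3}            # exactly three dots showing
B = {3, 4, 5, 6}   # at least three dots showing

p_a = prob(A)                   # unconditional probability of A
p_a_given_b = prob(A, given=B)  # probability of A once B is known to have occurred
print(p_a, p_a_given_b)
```

Counting outcomes this way only works because every face is equally likely; for a loaded die each outcome would need its own weight.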

Best Practices, Bigdata, Data Science, Exploratory Data Analysis, Machine Learning

Ordinary least squares regression (OLSR)

Ordinary least squares regression (OLSR): Credited to Carl Friedrich Gauss, who dated his work on it to 1795, it is considered one of the earliest known general prediction methods. OLSR is a linear modeling technique. It is used for estimating the unknown parameters of a linear regression model, and its goal is to minimize the sum of the squared differences between the observed values of the response variable and the values predicted from the explanatory variables. Ordinary least squares regression is also known as ordinary least squares or least squared errors regression. Let’s start with a linear regression model like the one below:- Here are a few terms we use when we […]
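To make the least-squares idea concrete, here is a small sketch for the simplest case, one explanatory variable and an intercept, using the standard closed-form solution. The function names `ols_fit` and `sse` and the sample numbers are illustrative assumptions, not taken from the original article.

```python
def ols_fit(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def sse(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted responses."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, made up purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = ols_fit(xs, ys)
print("intercept:", b0, "slope:", b1, "SSE:", sse(xs, ys, b0, b1))
```

Any other choice of intercept and slope gives a larger sum of squared errors on the same data, which is exactly what "least squares" means.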

Bigdata, Data Science, Exploratory Data Analysis, Machine Learning

ROC curve and performance parameters of a classification model…

When we evaluate a model, we analyze a few parameters to verify its performance. These parameters summarize the performance of our model using the confusion matrix. Among the most frequently used performance parameters are Accuracy, Precision, Recall and F1 score. Let me give you an idea of what they are in this article, so that when we talk about our model in the next articles you will not be confused by the terms. So let’s say our model is ready and we want to know how good it is. These terms help the audience of our hypothesis understand how good the predictions are. […]
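As a quick illustration of how these four parameters fall out of a binary confusion matrix, here is a minimal sketch using the standard definitions. The function name `classification_metrics` and the counts are hypothetical, chosen only for the example.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts: true positives, false positives, false negatives, true negatives.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Returning 0.0 for an undefined precision or recall is a design choice made here for simplicity; some libraries raise a warning instead.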

Data Science, Exploratory Data Analysis, Machine Learning

Understanding distribution functions…

This article helps you understand distribution functions and their usage in Exploratory Data Analysis in Data Science. In the next article, I’ll take you through some practical uses of the terms defined here on my sample project. Exploratory Data Analysis is the combination of many small tasks, such as data cleansing, data munging and creating visualizations, to understand the value in data. By studying the distribution of data, we try to extract value out of it. Distribution is also important when the data is ready for analysis and we have received another set of sample data; then we do […]
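One distribution function that comes up often in exploratory work is the empirical cumulative distribution function, which gives the fraction of observations at or below a value. The sketch below is an assumption about the kind of function the article has in mind; the name `make_ecdf` and the sample data are made up for illustration.

```python
from bisect import bisect_right

def make_ecdf(sample):
    """Build the empirical CDF of `sample`: F(x) = share of values <= x."""
    ordered = sorted(sample)
    n = len(ordered)

    def ecdf(x):
        # bisect_right counts how many sorted values are <= x.
        return bisect_right(ordered, x) / n

    return ecdf

# Hypothetical sample, for illustration only.
data = [2, 4, 4, 5, 7, 9]
F = make_ecdf(data)
print(F(1), F(4), F(10))
```

Building one such function per dataset makes it easy to compare a newly received sample against the data already analyzed, value by value.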