Analytics, Data Science, Exploratory Data Analysis, Hadoop

Approach to executing a Machine Learning project, “Halt the Hate”…

Disclaimer: The analysis done in this project touches on a sensitive issue in India, so I will not try to convince anybody to trust my model. A real human society is so complex that “all the things may be interconnected in a different way than in the model.” Imagine you are presented with a dataset of “Hate Crimes” in India and asked how to minimize these crimes by analyzing other factors. This is the problem I am taking on, to solve and analyze with a minimum of resources. Some can say that education and providing jobs to youth in India by […]

Bigdata, Data Science, Exploratory Data Analysis, Machine Learning

ROC curve and performance parameters of a classification model…

When we evaluate a model, we analyze a few parameters to verify its performance. These parameters describe the performance of our model using the confusion matrix. Some of the more frequently used performance parameters are Accuracy, Precision, Recall and F1 score. Let me give you an idea of what they are in this article, so that when we talk about our model in the next articles we will not be confused by the terms. So let’s say our model is ready and we want to know how good it is. These terms help the audience of our hypothesis understand how good the predictions are. […]
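As a rough illustration of the four parameters named in this excerpt, the sketch below computes them from the four cells of a binary confusion matrix using their standard definitions. The function name `classification_metrics` and the example counts are assumptions made for illustration only; they are not taken from the article or from any project dataset.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute common metrics from binary confusion-matrix counts.

    tp: true positives, fp: false positives,
    fn: false negatives, tn: true negatives.
    Any ratio with a zero denominator is reported as 0.0.
    """
    total = tp + fp + fn + tn
    # Accuracy: share of all predictions that were correct.
    accuracy = (tp + tn) / total if total else 0.0
    # Precision: share of predicted positives that were truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 score: harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these made-up counts, precision is higher than recall: most predicted positives are correct, but a third of the actual positives are missed. The F1 score sits between the two.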

Data Science, Exploratory Data Analysis, Machine Learning

Understanding distribution functions…

This article helps you understand distribution functions and their use in Exploratory Data Analysis in Data Science. In the next article, I’ll take you through some practical uses of the terms defined here on my sample project. Exploratory Data Analysis is a combination of many small tasks, such as data cleansing, data munging and creating visualizations, to understand the value in data. In studying the distribution of data, we actually try to extract value out of it. Also, distribution is important when the data is ready for analysis and we have received another set of sample data then we do […]