Big Data

PowerShell script wrappers using the Microsoft Azure AzCopy.exe tool

Use case: We are building a data lake in Azure using Azure container, ADF, Azure DWH, Databricks, and many other Azure services. After ingesting a wide variety of data sources through APIs, on-premises databases, flat files, and reporting servers, we learned that clients also need to push files to Azure Blob storage. …


Azure Arc – redefine hybrid cloud…

Azure delivered 59% revenue growth in the latest quarter, a stronger result than was expected from Microsoft's other products. Microsoft is introducing various new cloud services and making acquisitions, giving it an edge over its rivals Amazon and Google. https://www.zdnet.com/article/azure-synapse-analytics-combines-data-warehouse-lake-and-pipelines/ https://www.cnbc.com/2019/11/04/microsofts-azure-arc-lets-customers-use-its-tools-on-other-clouds.html “Azure Arc enables customers to have a central, unified, and self-service approach to manage their Windows and …


Approach to execute Machine Learning project, “Halt the Hate”…

Disclaimer: The analysis done in this project touches on a sensitive issue in India, so I will never try to convince anybody to trust my model. A real human society is so complex that “all the things may be interconnected in a different way than in the model.” Imagine you are presented with a dataset of “Hate Crimes” …


Bayesian-posterior imagination and applications…

Before going into Bayes and posterior probability, let us first understand a few terms we are going to use. Conditional Probability: Conditional Probability and Independence: A conditional probability is the probability of one event given that another event has occurred. In the “die-toss” example, the probability of event A, three dots showing, is P(A) = 1/6 on a single …


Tips and Tricks for Apache Spark RDD API, DataFrame API - Part 1

I am planning to share my knowledge of the Apache Spark RDD and DataFrame APIs, along with some tips and tricks. Combining everything into one piece would make for a very lengthy article, so I am dividing it into three separate articles, and this is the first in the series. Spark RDD API …


In-depth Kafka Message queue principles of high-reliability

At present, many open-source distributed processing systems such as Cloudera, Apache Storm, Spark, and others support integration with Kafka. Kafka is increasingly favored by many internet companies, which use it as one of their core messaging engines. The reliability of Kafka messages can be imagined as a commercial-grade messaging middleware …
