Demo Delta Lake on big data workloads…

First, what’s the difference between Delta Lake and Change Data Capture (CDC)? CDC is simply the log of changes on a relational table, whereas Delta Lake provides more native administrative capabilities to a data lake implementation (schemas, transactions, cataloging). Delta Lake is an open-source storage layer that brings ACID transactions to Apache Spark™ and… Continue reading Demo Delta Lake on big data workloads…
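The ACID guarantee mentioned above comes from Delta Lake's append-only transaction log (the `_delta_log` directory of numbered JSON commit files). Here is a minimal pure-Python sketch of that idea — a toy illustration of the mechanism, not the real Delta Lake API; the class and action names are invented for this example:

```python
import json
import os
import tempfile

class ToyDeltaTable:
    """Toy illustration of Delta Lake's _delta_log idea: every commit
    is a numbered JSON file of add/remove actions, and the current
    table state is obtained by replaying all commits in order."""

    def __init__(self, path):
        self.log_dir = os.path.join(path, "_delta_log")
        os.makedirs(self.log_dir, exist_ok=True)

    def commit(self, actions):
        # Next version = number of existing commit files.
        version = len([n for n in os.listdir(self.log_dir) if n.endswith(".json")])
        final = os.path.join(self.log_dir, f"{version:020d}.json")
        # Write to a temp file, then rename: the commit becomes visible
        # atomically, which is the "A" in ACID here.
        fd, tmp = tempfile.mkstemp(dir=self.log_dir)
        with os.fdopen(fd, "w") as f:
            json.dump(actions, f)
        os.rename(tmp, final)
        return version

    def snapshot(self):
        # Replay the log to compute the set of live data files.
        files = set()
        for name in sorted(os.listdir(self.log_dir)):
            if not name.endswith(".json"):
                continue
            with open(os.path.join(self.log_dir, name)) as f:
                for action in json.load(f):
                    if action["op"] == "add":
                        files.add(action["file"])
                    elif action["op"] == "remove":
                        files.discard(action["file"])
        return files

table = ToyDeltaTable(tempfile.mkdtemp())
table.commit([{"op": "add", "file": "part-0001.parquet"}])
table.commit([{"op": "remove", "file": "part-0001.parquet"},
              {"op": "add", "file": "part-0002.parquet"}])
print(table.snapshot())  # {'part-0002.parquet'}
```

Readers only ever see fully committed versions of the log, which is how a plain file store gains transactional reads and writes.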

Part 2: Operators teach Kubernetes how to simplify stateful applications…

I hope you have enjoyed my first article (link below) on the Operator extension and the Kubernetes introduction. Now let’s level up with another article in this series, where I’ll showcase a Kafka operator and the Operator’s capabilities on Kubernetes for achieving stateful behaviour. In case you missed the first article, here is the link. Now let’s jump in… Continue reading Part 2: Operators teach Kubernetes how to simplify stateful applications…

Operators teach Kubernetes how to simplify stateful applications…

This is the first in a series of articles showcasing how an Operator can leverage Kubernetes to run a stateful application such as a Kafka cluster. An Operator is a way to package, run, and maintain a Kubernetes application. An Operator builds on Kubernetes to automate the entire lifecycle of the software… Continue reading Operators teach Kubernetes how to simplify stateful applications…
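The heart of any Operator is a reconcile loop: compare the desired state declared in a custom resource against the actual cluster state, and emit actions to close the gap. A minimal sketch of one reconcile pass in plain Python — the dict shapes and action names here are invented for illustration, not the Kubernetes or Strimzi API:

```python
def reconcile(desired, actual):
    """One pass of an Operator's reconcile loop: return the actions
    needed to move the actual state toward the desired state.
    `desired` mimics a custom resource spec (e.g. a KafkaCluster CRD);
    `actual` mimics what is currently observed in the cluster."""
    actions = []
    want, have = desired["replicas"], actual["replicas"]
    if have < want:
        actions.append(("scale_up", want - have))
    elif have > want:
        actions.append(("scale_down", have - want))
    if desired["version"] != actual["version"]:
        # Stateful apps like Kafka need careful broker-by-broker rolling
        # updates -- exactly the operational knowledge an Operator encodes.
        actions.append(("rolling_upgrade", desired["version"]))
    return actions

spec = {"replicas": 3, "version": "3.7.0"}    # declared in the custom resource
status = {"replicas": 1, "version": "3.6.1"}  # observed in the cluster
print(reconcile(spec, status))
# [('scale_up', 2), ('rolling_upgrade', '3.7.0')]
```

A real Operator runs this loop continuously (triggered by watch events), so the cluster converges on the declared state even after failures.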

LAMP stack in Cloud: Building a Scalable, Secure and Highly Available architecture using AWS

1. Requirement Overview The acronym LAMP (Linux, Apache, MySQL, PHP) refers to an open-source stack used to serve dynamic and static content. A small startup organization uses the LAMP software stack. The dynamic nature of demand and projected future growth in traffic drive the need for a massively scalable solution to enable… Continue reading LAMP stack in Cloud: Building a Scalable, Secure and Highly Available architecture using AWS

Reference architecture of a big data solution in GCP and Azure…

This article showcases a reference architecture approach for the financial sector, where stream and batch processing is a common part of its solutions alongside other designs. Requirement analysis is the first step in defining the implementation of any use case, so before moving to the reference architecture we first need to understand… Continue reading Reference architecture of a big data solution in GCP and Azure…

Error resolution of Zalando Research Flair NLP package installation on CentOS 7, “Failed building wheel for regex…”

I was working on an NLP tool for evaluation purposes and ran into an issue while creating the environment. The authors had set everything up on Ubuntu, so they might not face this issue, but I was replicating on CentOS 7 and hit an error. Hope this helps someone. The project is based on PyTorch 0.4+… Continue reading Error resolution of Zalando Research Flair NLP package installation on CentOS 7, “Failed building wheel for regex…”
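A common cause of “Failed building wheel for regex” on a bare CentOS 7 box is a missing C toolchain and Python development headers (which Ubuntu dev machines often already have). The article has its own resolution; as a hedged diagnostic sketch — the function name and the `yum` hints are assumptions for this example — the missing pieces can be checked from Python before retrying `pip install`:

```python
import os
import shutil
import sysconfig

def missing_build_deps():
    """Report which pieces needed to compile C-extension wheels
    (such as `regex`) appear to be absent. On CentOS 7 these
    typically come from `yum install gcc python3-devel`
    (assumption: that was the culprit here)."""
    missing = []
    # pip needs a C compiler to build the extension module.
    if shutil.which("gcc") is None and shutil.which("cc") is None:
        missing.append("C compiler (yum install gcc)")
    # It also needs Python.h from the -devel package.
    include_dir = sysconfig.get_paths()["include"]
    if not os.path.exists(os.path.join(include_dir, "Python.h")):
        missing.append("Python headers (yum install python3-devel)")
    return missing

print(missing_build_deps())  # empty list when the toolchain is complete
```

If the list is non-empty, install the reported packages and rerun the failing `pip install`.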

How to create an Apache Beam data pipeline and deploy it using Cloud Dataflow in Java

Cloud Dataflow is a fully managed Google service for executing data processing pipelines built with Apache Beam. What does fully managed mean? Like BigQuery, Cloud Dataflow dynamically provisions the optimal quantity and type of resources (i.e. CPU or memory instances) based on the volume and specific resource requirements of your job. Cloud Dataflow is a serverless… Continue reading How to create an Apache Beam data pipeline and deploy it using Cloud Dataflow in Java
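Beam’s programming model is a pipeline of transforms over immutable collections, which a runner such as Cloud Dataflow then executes and scales. Before reaching for the Java SDK, the shape of that model can be sketched in a few lines of plain Python — a toy mimic of PCollection/ParDo for the classic word count, not the actual Beam API:

```python
class ToyPipeline:
    """Toy mimic of Apache Beam's model: a pipeline is a chain of
    transforms applied to an immutable collection. Real Beam builds a
    graph and defers execution to a runner (e.g. Cloud Dataflow);
    this sketch simply runs eagerly."""

    def __init__(self, elements):
        self.elements = list(elements)

    def par_do(self, fn):
        # Like Beam's ParDo: fn may emit zero or more outputs per element.
        out = []
        for element in self.elements:
            out.extend(fn(element))
        return ToyPipeline(out)

    def group_and_count(self):
        # Stand-in for Count.perElement() in the classic word count.
        counts = {}
        for element in self.elements:
            counts[element] = counts.get(element, 0) + 1
        return counts

lines = ["to be or not to be"]
counts = ToyPipeline(lines).par_do(lambda line: line.split()).group_and_count()
print(counts)  # {'to': 2, 'be': 2, 'or': 1, 'not': 1}
```

In the real Java SDK the same shape becomes `Pipeline.create()` plus chained `apply(...)` calls, and switching from local execution to Cloud Dataflow is a matter of selecting the runner, not rewriting the transforms.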