Big Data

LAMP stack in Cloud: Building a Scalable, Secure and Highly Available architecture using AWS

1. Requirement Overview The acronym LAMP (Linux, Apache, MySQL, PHP) refers to an open-source software stack used to serve dynamic and static web content. A small startup organization runs its software on the LAMP stack. The dynamic nature of demand and projected future growth in traffic drive the need for a massively scalable solution to enable …
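To ground the stack description, here is a hypothetical provisioning sketch for a single EC2 instance (it assumes an Amazon Linux 2 AMI; the package names and PHP extras version are assumptions that may differ on other images):

```shell
# Enable a PHP topic and install Apache (httpd), PHP, and MariaDB
# (MySQL-compatible); names assume Amazon Linux 2 repositories.
sudo amazon-linux-extras enable php7.4
sudo yum install -y httpd php php-mysqlnd mariadb-server
sudo systemctl enable --now httpd mariadb
```

In the scalable architecture the article describes, the web tier would typically sit behind a load balancer in an Auto Scaling group, with the database moved to a managed service such as Amazon RDS.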

Reference architecture of bigdata solution in GCP and Azure…

This article showcases a reference architecture approach for the financial sector, where stream and batch processing are a common part of its solutions alongside other designs. Requirement analysis is the first step in defining the implementation of any use case, so before moving to the reference architecture we first need to understand …

Error resolution of Zalando Research Flair NLP package installation on CentOS 7, “Failed building wheel for regex…”

I was working on an NLP tool for evaluation purposes and ran into an issue while creating the environment. The authors had set everything up on Ubuntu, so they might not face this issue, but I was replicating it on CentOS 7 and hit an error. Hope this will help someone. The project is based on PyTorch 0.4+ …
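A commonly reported resolution for this class of wheel-build failure is a missing C toolchain and Python headers, sketched below (hedged: the post's actual fix is truncated above, and the package names assume the stock python3 on CentOS 7):

```shell
# "Failed building wheel for regex" on CentOS 7 usually means the
# C compiler and Python headers are absent (Ubuntu dev setups often
# already have them, which is why the authors may not have seen this).
sudo yum install -y gcc python3-devel
pip3 install regex
```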

How to create an Apache Beam data pipeline and deploy it using Cloud Dataflow in Java

Cloud Dataflow is a fully managed Google service for executing data processing pipelines built with Apache Beam. What does fully managed mean? Like BigQuery, Cloud Dataflow dynamically provisions the optimal quantity and type of resources (i.e., CPU or memory instances) based on the volume and specific resource requirements of your job. Cloud Dataflow is a serverless …
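The Beam programming model behind such a pipeline can be illustrated even without the SDK installed; the plain-Python sketch below mirrors the stages a minimal Beam pipeline would chain (Create → FlatMap → Count.PerElement). The function name `run_wordcount` is my own illustrative stand-in, not Beam API:

```python
from collections import Counter

# Conceptual sketch of a Beam-style pipeline in plain Python: each
# stage consumes the previous stage's output, mirroring the
# Create -> FlatMap -> Count.PerElement chain of the Beam model.
def run_wordcount(lines):
    words = [w for line in lines for w in line.split()]  # FlatMap(str.split)
    return dict(Counter(words))                          # Count.PerElement()

counts = run_wordcount(["big data", "data flow"])
print(counts)  # {'big': 1, 'data': 2, 'flow': 1}
```

In an actual Beam pipeline these stages would be PTransforms applied with the `|` operator inside a `Pipeline`, and Dataflow would provision the workers that execute them.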

Google Dataflow Python ValueError: Unable to get the Filesystem for path gs://myprojetc/digport/ports.csv.gz

I am using Google Cloud to create an event from Cloud Storage to BigQuery using Apache Beam’s Python library. I was executing an ETL in “DirectRunner” mode and found no issue, but later, when I moved everything to Dataflow for execution, I hit an error. The command below was used to upload the file, and I …
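For readers hitting the same ValueError, a frequently cited cause (an assumption on my part, since the post's resolution is truncated above) is installing plain `apache-beam` without the GCP extras that register the `gs://` filesystem:

```shell
# Commonly reported fix: the Beam SDK needs its GCP extras to resolve
# gs:// paths; DirectRunner working locally can mask this gap.
pip install "apache-beam[gcp]"
```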

Python: Stream data ingestion into a database in real time using Dataflow.

In my previous articles, we solved real-time data ingestion problems using various tools such as Apache Kafka, Storm, Flink and Spark, and I showed in detail how to create such pipelines for real-time processing. In this blog, we will simulate a similar problem using Apache Beam and Dataflow with Python. Let’s say …
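As a toy illustration of the idea, the sketch below simulates streamed messages being parsed and appended to an in-memory "table". The message payloads and field names are hypothetical stand-ins for a Pub/Sub stream and a database table:

```python
import json

# Hypothetical stand-ins: `messages` simulates a Pub/Sub stream and
# `table` simulates the target database table.
messages = [
    b'{"sensor": "s1", "temp": 21.5}',
    b'{"sensor": "s2", "temp": 19.0}',
]

def parse(message: bytes) -> dict:
    """Decode one streamed message into a row dict."""
    return json.loads(message.decode("utf-8"))

# "Write" each row as it arrives; in a real pipeline this step would
# be a Beam transform executing on Dataflow workers.
table = [parse(m) for m in messages]
print(len(table))  # 2
```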

Sample Dataflow Pipeline featuring Cloud Pub/Sub, Dataflow, and BigQuery…

Streaming data in Google Cloud Platform is typically published to Cloud Pub/Sub, a serverless real-time messaging service. Cloud Pub/Sub provides reliable delivery and can scale to more than a million messages per second. It stores copies of messages in multiple zones to provide “at least once” guaranteed delivery to subscribers, and there can be many …
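The publish/subscribe flow described above can be exercised from the command line; a hedged sketch using the gcloud CLI follows (the topic and subscription names are placeholders):

```shell
# Create a topic, attach a subscription, and publish one message;
# "my-topic" and "my-sub" are placeholder names.
gcloud pubsub topics create my-topic
gcloud pubsub subscriptions create my-sub --topic=my-topic
gcloud pubsub topics publish my-topic --message="hello"
```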

Solved: Protocol tcp Port Exclusion issues when running Hadoop on Windows Docker

If you’re looking for a simple and painless Hadoop deployment, Docker is the right tool for you. We mostly use Docker Community Edition (CE) (https://docs.docker.com/docker-for-windows/install/) on Microsoft Windows; under system requirements it clearly says “Hyper-V and Containers Windows features must be enabled” to run Docker on Windows. In case you are using Docker Engine – Enterprise (EE), you …
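As context for the port-exclusion problem, Windows reserves TCP port ranges for Hyper-V/WinNAT, and Docker port mappings that land inside an excluded range fail to bind. The commands below, run in an elevated command prompt, inspect and reset those reservations (a commonly reported workaround, not a guaranteed fix):

```shell
:: List the TCP port ranges Windows has reserved.
netsh interface ipv4 show excludedportrange protocol=tcp

:: Restart the winnat service so the reservations are rebuilt,
:: then retry the Docker port mapping.
net stop winnat
net start winnat
```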
