Analytics, Apache Spark, Hadoop, Kafka, Python, Spark

Consume JSON Messages From Kafka Using Kafka-Python’s Deserializer

You are probably here because you want to take Python and Apache Kafka for a ride. kafka-python is one of the most popular Python client libraries for Kafka. For documentation on this library, visit https://kafka-python.readthedocs.io/en/master/. kafka-python is designed to function much like the official Java client. It is best used with newer brokers (0.9+), but is backwards-compatible with older versions (to 0.8.0). Some features will only be enabled on newer brokers. So instead of showing you a simple example that runs a Kafka producer and consumer separately, I'll show the JSON serializer and deserializer. Preparing the Environment: Let's start by installing the Python package using […]
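
As a rough idea of what a JSON serializer and deserializer pair can look like with kafka-python, here is a minimal sketch. It is not taken from the full post: the helper names `serialize_json`, `deserialize_json` and `build_clients`, the topic `demo-topic`, and the broker address `localhost:9092` are all assumptions made for illustration. The `value_serializer` and `value_deserializer` arguments are the kafka-python hooks the helpers plug into.

```python
import json


def serialize_json(value):
    """Turn a Python object into UTF-8 encoded JSON bytes."""
    return json.dumps(value).encode("utf-8")


def deserialize_json(raw):
    """Turn UTF-8 encoded JSON bytes back into a Python object."""
    return json.loads(raw.decode("utf-8"))


def build_clients(topic="demo-topic", servers="localhost:9092"):
    """Create a producer and consumer wired to the JSON helpers.

    The topic name and broker address are placeholders (assumptions).
    Calling this needs kafka-python installed and a reachable broker.
    """
    # Imported here so the helpers above work without kafka-python installed.
    from kafka import KafkaConsumer, KafkaProducer

    producer = KafkaProducer(
        bootstrap_servers=servers,
        value_serializer=serialize_json,
    )
    consumer = KafkaConsumer(
        topic,
        bootstrap_servers=servers,
        value_deserializer=deserialize_json,
        auto_offset_reset="earliest",
    )
    return producer, consumer


if __name__ == "__main__":
    # Round trip without a broker: what the producer sends, the consumer reads.
    message = {"id": 1, "name": "spark"}
    payload = serialize_json(message)
    print(deserialize_json(payload) == message)
```

With a broker running, `producer.send("demo-topic", {"id": 1})` would pass the dictionary through `serialize_json`, and each record yielded by iterating over `consumer` would expose the decoded object as `record.value`.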

Apache Spark, Spark

Introduction to Spark

Introduction to Apache Spark: As a unified stack and computational engine, Spark is responsible for scheduling, distributing, and monitoring applications consisting of many computational tasks across many worker machines. Over time, big data experts around the world have built specialized systems on top of Hadoop to solve particular problems such as graph processing, efficient iterative algorithms, and real-time query engines. As you may know, components such as Impala, Mahout, Tez, and GraphLab grew up around Hadoop for different purposes. What is Apache Spark? Apache Spark is a general-purpose engine that combines the specialties of all […]