Hadoop

Fundamentals of Apache Spark…

You can view my other articles on Spark RDD at the links below: Apache Spark RDD API using Pyspark… Tips and Tricks for Apache Spark RDD API, Dataframe API. How did Spark become so efficient at data processing compared to MapReduce? It comes with an advanced Directed Acyclic Graph (DAG) data processing engine: for every Spark job, a DAG of tasks is created for the engine to execute. In mathematical terms, a DAG consists of a set of vertices and the directed edges connecting them. The tasks are executed according to the DAG layout. In […]
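To make the DAG idea concrete, here is a minimal sketch in plain Python. It is not Spark's scheduler. The task names form a hypothetical word-count job, and the standard-library `graphlib` module (Python 3.9+) stands in for the engine. The sketch only shows how a set of vertices and directed edges determines the order in which tasks can run.

```python
from graphlib import TopologicalSorter

# Hypothetical word-count job (names are illustrative, not Spark API calls).
# Each key is a task (vertex); each value is the set of tasks it depends on
# (the directed edges pointing into that vertex).
dag = {
    "read_lines": set(),
    "split_words": {"read_lines"},
    "pair_with_one": {"split_words"},
    "sum_by_word": {"pair_with_one"},
}

def plan(dag):
    """Return the tasks in an order that respects every dependency edge."""
    order = []
    for task in TopologicalSorter(dag).static_order():
        order.append(task)  # a real engine would execute the task here
    return order

execution_order = plan(dag)
print(execution_order)
```

Because every task here depends only on the one before it, there is exactly one valid order. With branching dependencies, several orders would be valid, and independent tasks could run in parallel.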

Apache Spark, Spark

Introduction to Spark

Introduction to Apache Spark: Spark, as a unified stack and computational engine, is responsible for scheduling, distributing, and monitoring applications consisting of many computational tasks across many worker machines. Over time, big data experts around the world have built specialized systems on top of Hadoop to solve particular problems such as graph processing, efficient iterative algorithms, and real-time query engines. As you may know, components such as Impala, Mahout, Tez, and GraphLab were derived from Hadoop for different purposes. What is Apache Spark? Apache Spark is the generalized engine which combines the specialties of all […]