Analytics, Apache Spark, Best Practices, Bigdata, Framework

Tips and Tricks for Apache Spark RDD API and DataFrame API, Part 1

I am planning to share my knowledge of the Apache Spark RDD and DataFrame APIs, along with some tips and tricks. Combining everything into one post would make a very lengthy article, so I am dividing it into three separate articles, and this is the first in the series. The series covers: the Spark RDD API; the DataFrame API; and tips and tricks for the RDD API and DataFrame API. Let us start with the basics of the RDD API. A Resilient Distributed Dataset (RDD) is, essentially, Spark's representation of a set of data, spread across multiple machines, with APIs that let you act on it. RDD could […]
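To make that definition a little more concrete, here is a toy, single-process Python model of the idea. The data is sliced into partitions (standing in for machines), transformations such as `map` and `filter` are only recorded as lineage, and actions such as `collect` and `reduce` replay that lineage partition by partition. `ToyRDD` is a hypothetical class, not PySpark; its method names simply mirror PySpark's `SparkContext.parallelize` and the RDD methods of the same names.

```python
from functools import reduce as fold
from typing import Any, Callable, Iterable, List, Tuple


class ToyRDD:
    """Hypothetical single-process stand-in for an RDD (not PySpark)."""

    def __init__(self, partitions: List[list], lineage: Tuple = ()):
        self._partitions = partitions  # each inner list plays the role of one machine's slice
        self._lineage = lineage        # recorded transformations; nothing runs until an action

    @classmethod
    def parallelize(cls, data: Iterable, num_slices: int = 2) -> "ToyRDD":
        items = list(data)
        n = len(items)
        # Contiguous slices, exactly num_slices of them (some may be empty for short data).
        parts = [items[i * n // num_slices:(i + 1) * n // num_slices] for i in range(num_slices)]
        return cls(parts)

    # Transformations: return a new ToyRDD with a longer lineage; the original is unchanged.
    def map(self, f: Callable[[Any], Any]) -> "ToyRDD":
        return ToyRDD(self._partitions, self._lineage + (("map", f),))

    def filter(self, pred: Callable[[Any], bool]) -> "ToyRDD":
        return ToyRDD(self._partitions, self._lineage + (("filter", pred),))

    def _compute(self, part: list) -> list:
        # Replay the lineage on one partition; replaying lineage is also how an RDD
        # can rebuild a partition that was lost.
        out = part
        for kind, fn in self._lineage:
            out = [fn(x) for x in out] if kind == "map" else [x for x in out if fn(x)]
        return out

    # Actions: these actually run the recorded work.
    def collect(self) -> list:
        return [x for part in self._partitions for x in self._compute(part)]

    def reduce(self, f: Callable[[Any, Any], Any]) -> Any:
        partials = []
        for part in self._partitions:
            values = self._compute(part)
            if values:
                partials.append(fold(f, values))  # per-partition result
        return fold(f, partials)                  # combine the partial results

    def getNumPartitions(self) -> int:
        return len(self._partitions)


rdd = ToyRDD.parallelize(range(1, 11), num_slices=4)
even_squares = rdd.map(lambda x: x * x).filter(lambda x: x % 2 == 0)
print(even_squares.getNumPartitions())          # 4
print(even_squares.collect())                   # [4, 16, 36, 64, 100]
print(even_squares.reduce(lambda a, b: a + b))  # 220
```

In real PySpark the equivalent would start from `sc.parallelize(range(1, 11), 4)`. The toy only tries to capture the "lazy until an action, evaluated per partition" behaviour; scheduling, shuffles and fault recovery across real machines are left out.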

Analytics, Bigdata, Framework, Hadoop

Sumo Logic: Log Management Tool

This is my first hands-on encounter with “Sumo Logic”. If you want a quick introduction to “Sumo Logic” without going through the detailed documentation, this post should help. Sumo Logic is designed to help you manage and analyze your log files. It started out as an attempt at a SaaS version of Splunk and has gone its own way as it matured; as a result of those beginnings, it is one of the most feature-rich and enterprise-focused SaaS log management tools. Installation: Sumo Logic follows a SaaS model, which means you’ll be setting up communication out to the Sumo Logic […]
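The outbound direction is the key point of the SaaS model: your side opens HTTPS connections to Sumo Logic rather than Sumo Logic reaching into your network. As one illustration, Sumo Logic offers hosted collectors with HTTP sources that accept log lines via POST. The sketch below only builds such a request with Python's standard library and does not send it. The URL is a placeholder (Sumo Logic generates a unique URL per source), `build_log_request` is a hypothetical helper, and the `X-Sumo-Category` metadata header is an assumption to verify against Sumo Logic's HTTP source documentation.

```python
import urllib.request
from typing import List

# Placeholder only: Sumo Logic generates a unique URL for every HTTP source you create.
HTTP_SOURCE_URL = "https://example.invalid/your-http-source-url"


def build_log_request(lines: List[str], category: str,
                      url: str = HTTP_SOURCE_URL) -> urllib.request.Request:
    """Build (but do not send) an outbound POST carrying newline-separated log lines."""
    body = "\n".join(lines).encode("utf-8")
    req = urllib.request.Request(url, data=body, method="POST")
    # Assumed metadata header; urllib stores header names via str.capitalize().
    req.add_header("X-Sumo-Category", category)
    return req


req = build_log_request(["app started", "user login ok"], category="app/web")
print(req.get_method(), req.full_url)
```

To actually ship the lines, pass the request to `urllib.request.urlopen(req)` once the placeholder URL has been replaced with your source's real URL.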

Analytics, Bigdata, Framework, Hadoop, RHadoop

Install and smoke-test R and RHadoop on Hortonworks Data Platform (HDP25-CentOS7)

Before going into the installation steps, I’d like to give a short introduction to RHadoop. What is RHadoop? RHadoop is an open source project that combines R and Hadoop. It contains four packages that connect R to different Hadoop projects, plus one package that adapts some functions to fit the MapReduce framework. rhdfs: connects R with Hadoop’s HDFS. rhbase: connects R with Hadoop’s HBase. rmr2: connects R with Hadoop’s MapReduce 2. ravro: connects R with Hadoop’s Avro. plyrmr: provides a familiar plyr-like interface on top of MapReduce. You can find the official RHadoop GitHub repository at https://github.com/RevolutionAnalytics/RHadoop. Requirements: First of all, I have installed HDP 2.5 […]
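rmr2 lets R code express jobs in Hadoop's MapReduce model: a map function emits key-value pairs, the framework groups them by key, and a reduce function runs once per key. As a rough illustration of that model only (this is not rmr2's R API and it does not touch Hadoop), here is a toy in-memory Python version running the classic word count. `map_reduce` is a hypothetical helper name.

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, List, Tuple

KV = Tuple[Hashable, object]


def map_reduce(records: Iterable,
               mapper: Callable[[object], Iterable[KV]],
               reducer: Callable[[Hashable, List[object]], object]) -> Dict[Hashable, object]:
    """In-memory toy of the map -> shuffle -> reduce flow (no Hadoop involved)."""
    groups: Dict[Hashable, List[object]] = defaultdict(list)
    for record in records:                 # map phase: each record emits zero or more pairs
        for key, value in mapper(record):
            groups[key].append(value)      # shuffle: gather all values for the same key
    # reduce phase: one reducer call per key with all of that key's values
    return {key: reducer(key, values) for key, values in groups.items()}


# Word count, the usual first MapReduce job.
lines = ["r and hadoop", "hadoop and hdfs"]
counts = map_reduce(
    lines,
    mapper=lambda line: [(word, 1) for word in line.split()],
    reducer=lambda word, ones: sum(ones),
)
print(counts)  # {'r': 1, 'and': 2, 'hadoop': 2, 'hdfs': 1}
```

On a real cluster the map and reduce calls run in parallel on different nodes and the shuffle moves data over the network; the toy keeps only the data flow.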

Framework, Python

Almost Everything in Python!!!

A curated list of Python frameworks, libraries, software and resources, inspired by awesome-php. Awesome Python covers: Environment Management, Package Management, Package Repositories, Distribution, Build Tools, Interactive Interpreter, Files, Date and Time, Text Processing, Specific Formats Processing, Natural Language Processing, Documentation, Configuration, Command-line Tools, Downloader, Imagery, OCR, Audio, Video, Geolocation, HTTP, Database, Database Drivers, ORM, Web Frameworks, Serverless Frameworks, Permissions, CMS, E-commerce, RESTful API, Serialization, Authentication, Template Engine, Queue, Search, News Feed, Asset Management, Caching, Email, Internationalization, URL Manipulation, HTML Manipulation, Web Crawling, Web Content Extracting, Forms, Data Validation, Anti-spam, Tagging, Admin Panels, Static Site Generator, Processes, Concurrency and Parallelism, Networking, WebSocket […]