Struggling to conquer Apache Spark?

Learning is hard enough as it is, but when you bring in distributed computing frameworks and sophisticated programming languages, things don't get any easier. While self-study can certainly help, without a good guide things are always more difficult than they should be. That's why I created Spark Tutorials: to make it easier to learn and use Apache Spark. Spark Tutorials is here to provide simple, easy-to-follow tutorials to help you get up and running quickly. You'll learn the foundational abstractions in Apache Spark, from RDDs to DataFrames and MLlib. Start off with some of the articles below.

Spark MLlib - Predict Store Sales with ML Pipelines

In this tutorial we'll build a full-stack machine learning project, going all the way from data manipulation to feature creation and, finally, serving predictions.

Visit Article »

Spark Will Not Start with Error java.lang.OutOfMemoryError: PermGen space

This article will walk you through how to resolve the java.lang.OutOfMemoryError: PermGen space exception that can occur when you're trying to start Spark.

Visit Article »

Analyzing Flight Data: A Gentle Introduction to GraphX in Spark

Graphs are a natural way of representing relationships in data, and Apache Spark provides a simple way of creating and manipulating them. This tutorial will walk you through the basics of GraphX in Apache Spark using Scala. You'll analyze flight data from 2008 and run algorithms like PageRank to better understand all the flights that took place!

Visit Article »