Struggling to conquer Apache Spark?

Learning is hard enough as it is, and distributed computing frameworks written in sophisticated programming languages don't make it any easier. Self-study certainly helps, but without a good guide, things are always more difficult than they should be. That's why I created Spark Tutorials: to make it easier to learn and use Apache Spark.

SparkTutorials.net is here to provide simple, easy-to-follow tutorials to help you get up and running quickly. You'll learn the foundational abstractions in Apache Spark, from RDDs to DataFrames and MLlib. Start off with some of the articles below.

The Simplest Explanation of and Approaches to Optimizing Spark Shuffles

This post dives into some of the details of the Spark shuffle and what it means for you when you use Apache Spark for data analysis on a cluster.

Visit Article »

Spark Will Not Start with Spark Error-java.net.BindException: Address already in use

This article walks you through resolving the somewhat common "java.net.BindException: Address already in use" exception that can occur when you're trying to start Spark.

Visit Article »

Building Spark for your Cluster to Support Hive SQL and YARN

This article will walk you through how to build Apache Spark to support the Hive SQL execution engine as well as YARN. After that, it should be ready to run on your Hadoop cluster.

Visit Article »