Course Review: Getting Started with Spark 2

This is a review of Janani Ravi’s Pluralsight course entitled Getting Started with Spark 2.

I think this course is a good length and includes a lot of quality content. Janani presents everything in Python, but I was able to follow along and do similar examples in Scala. The entire course runs just over 2 hours, which I think is fine for a “getting started” type of course.

This course contains a great overview of Spark 2 and how it differs from Spark 1, including the expansion of DataFrames, the combination of Spark contexts into a single SparkSession, and the Tungsten and Catalyst engines. Those two engines, in particular, were fascinating topics. I wish Janani had had a chance to dig into them further, though that probably would not have fit within the scope of a single introductory course.

The use of Jupyter Notebooks for demos was a smart idea. Spark doesn’t have a “natural” UI and Jupyter lets you interact smoothly with the Python kernel and Spark cluster. I’m more used to Apache Zeppelin for Spark notebooks, so it was nice to see Jupyter in use here.

The one thing I regret about the course is the overuse of SnagIt-style boxes and circles. Highlighting something with a rectangle makes sense when there’s a lot on the screen and you need to direct the viewer to a particular element. Here, though, we would regularly see Janani type something in and then immediately see a rectangle or circle around it, which was more distracting than illuminating.

Despite that nitpick, if you are in the market for some introductory content on Spark and don’t mind working in Python, this is a great course to take.