DEV Community

James McPherson


How I started learning Apache Spark

I've realised over the years that the best way for me to start learning a new language, toolkit or technology is to dive right in and start trying to solve problems with it.

This was most definitely true for Apache Spark, which I had to learn recently in order to prepare for a #DataScience interview.

I wrote a utility to Extract information from my 6+ years of PV Inverter data, Transform it and Load it (#ETL) into #DataFrames, which I query for record dates, minimum and maximum output, and daily average output. In keeping with my standard practice, I've put that code on GitHub and written a blog post about the process. See more (much more!) at https://www.jmcpdotcom.com/blog/posts/2019-10-11-apache-spark-init/
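To give a flavour of what that kind of ETL step can look like, here is a minimal PySpark sketch. It is not the code from my repository: the `timestamp,watts` line layout, the column names and the helper names `parse_reading` and `daily_summary` are all assumptions made up for illustration.

```python
from datetime import datetime


def parse_reading(line):
    """Transform one raw text line into a (timestamp, watts) tuple.

    The "YYYY-MM-DD HH:MM:SS,watts" layout is an assumption made for this
    sketch, not the real inverter export format.
    """
    stamp, watts = line.strip().split(",")
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S"), float(watts)


def daily_summary(spark, readings):
    """Load parsed readings into a DataFrame and aggregate them per day."""
    # Imported lazily so parse_reading() stays usable without Spark installed.
    from pyspark.sql import functions as F

    # Hypothetical column names: "ts" for the timestamp, "watts" for output.
    df = spark.createDataFrame(readings, ["ts", "watts"])
    return (
        df.groupBy(F.to_date("ts").alias("day"))
        .agg(
            F.min("watts").alias("min_watts"),
            F.max("watts").alias("max_watts"),
            F.avg("watts").alias("avg_watts"),
        )
        .orderBy("day")
    )
```

With a session from `SparkSession.builder.getOrCreate()`, calling `daily_summary(spark, [parse_reading(line) for line in lines])` should give one row per day with the minimum, maximum and average output. Sorting that result by `max_watts` in descending order is one way to find a record date.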

Apache #Spark, #ETL, #Python
