Kevin Risden

Posted on Mar 26, 2018 • Originally published at risdenk.github.io on Mar 24, 2018

Apache Livy - Simplified Apache Spark Integration

#bigdata #apache #livy #spark

Overview

Apache Livy provides a REST interface for interacting with Apache Spark. Before Livy, using Spark typically meant running spark-submit from the command line or relying on tools that ran spark-submit on the user's behalf. This was not feasible in many situations and made it hard to secure access to Spark.
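To make "REST interface" concrete, here is a minimal sketch of what a job submission to Livy's `POST /batches` endpoint can look like, using only Python's standard library. It assumes a Livy server on its default port 8998 on localhost, and the jar path is hypothetical; the file has to be reachable by the cluster. The request is only built here, so the sketch runs without a server.

```python
import json
import urllib.request

# Assumption: a Livy server listening on its default port 8998 on localhost.
LIVY_URL = "http://localhost:8998"


def build_batch_request(livy_url, file, class_name=None, args=None):
    """Build (but do not send) a POST /batches request that submits a Spark job."""
    payload = {"file": file}  # application jar or Python file, required by Livy
    if class_name is not None:
        payload["className"] = class_name  # main class, like spark-submit --class
    if args:
        payload["args"] = list(args)  # arguments passed to the application
    return urllib.request.Request(
        url=f"{livy_url}/batches",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_batch_request(
    LIVY_URL,
    file="hdfs:///user/example/spark-examples.jar",  # hypothetical path
    class_name="org.apache.spark.examples.SparkPi",
    args=["100"],
)

# Sending it requires a running Livy server:
# with urllib.request.urlopen(req) as resp:
#     batch = json.load(resp)  # includes the batch "id" and its "state"
```

Compared to spark-submit, the client only needs HTTP access to Livy, not a local Spark installation or cluster configuration.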

Apache Livy History

Cloudera originally built Livy to solve these problems by providing an interface through which Spark jobs can be submitted and monitored easily. Hortonworks later decided to support and improve Livy as well. Livy was submitted to the Apache Software Foundation and is currently in the incubator process. Many other companies and tools have started using Apache Livy as an integration point for interacting with Apache Spark. Outlined below is an example of what Apache Livy enables.
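Monitoring works over the same REST interface. The sketch below polls Livy's `GET /batches/{batchId}/state` endpoint until a batch finishes. Treating `success`, `dead` and `killed` as the terminal states is an assumption made for this example, and the polling loop accepts any state-fetching function so it can be exercised without a server.

```python
import json
import time
import urllib.request

# Assumption: batch states after which Livy will not change the state again.
TERMINAL_STATES = {"success", "dead", "killed"}


def fetch_batch_state(livy_url, batch_id):
    """GET /batches/{batchId}/state and return its "state" field."""
    with urllib.request.urlopen(f"{livy_url}/batches/{batch_id}/state") as resp:
        return json.load(resp)["state"]


def wait_for_batch(get_state, poll_seconds=5.0, timeout_seconds=600.0):
    """Call get_state() until a terminal state is reached or the timeout expires."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"batch still {state!r} after {timeout_seconds}s")
        time.sleep(poll_seconds)


# Against a real Livy server (batch id 0 is hypothetical):
# final_state = wait_for_batch(lambda: fetch_batch_state("http://localhost:8998", 0))
```

Passing a callable instead of hard-wiring the HTTP call keeps the retry logic separate from transport details such as authentication.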

[Figure: Apache Livy Architecture]

Integration with Apache Livy

As diagrammed above, Apache Livy integrates with many different tools so that users can use Apache Spark quickly and securely. Microsoft Azure HDInsight supports Apache Livy for connecting to Spark clusters. Jupyter Notebook, an open-source, web-based notebook, can use Livy through sparkmagic to interact with Spark. Another web-based notebook, Apache Zeppelin, integrates natively with Livy. Anaconda, which supports both Jupyter and Apache Zeppelin, works with Livy as well. Recently, Apache NiFi added support for submitting Spark jobs via Livy. Finally, Apache Knox can provide LDAP authentication in front of Apache Livy.
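Notebook integrations typically use Livy's interactive sessions rather than batches: a session is created once, and each cell is sent as a statement. The sketch below builds the two underlying requests, `POST /sessions` and `POST /sessions/{sessionId}/statements`. It is a rough illustration of the kind of calls tools like sparkmagic make, not their exact traffic; the `livy_post` helper, the localhost URL and session id 0 are assumptions.

```python
import json
import urllib.request


def livy_post(livy_url, path, payload):
    """Hypothetical helper: build (but do not send) a JSON POST to the Livy REST API."""
    return urllib.request.Request(
        url=f"{livy_url}{path}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


livy_url = "http://localhost:8998"  # assumption: Livy on its default port

# 1. Start an interactive PySpark session (done once, e.g. when a notebook starts).
create_session = livy_post(livy_url, "/sessions", {"kind": "pyspark"})

# 2. Run a snippet of code in that session. Session id 0 is hypothetical; a real
#    client reads it from the "id" field of the create-session response.
session_id = 0
run_statement = livy_post(
    livy_url,
    f"/sessions/{session_id}/statements",
    {"code": "sc.parallelize(range(100)).count()"},
)

# The statement's result is then polled with
# GET /sessions/{sessionId}/statements/{statementId}.
```

Because the session stays alive between statements, variables defined in one notebook cell remain available in the next, which is what makes the notebook experience interactive.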

Thanks to Apache Livy, all of the integrations above make it possible to use Apache Spark without running spark-submit. Building on top of Apache Livy provides an abstraction that frees users from worrying about where the Spark job will run.

What is next?

Over the past year, I have been working with my team and multiple analytics teams to simplify getting started with and using Apache Spark. Apache Livy provides the capabilities necessary to do this without compromising on ease of use or security. Since much of the documentation for Apache Spark revolves around spark-submit, I have been looking into converting those examples to work with Apache Livy.
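As an illustration of what such a conversion involves, here is a hypothetical helper that maps a spark-submit argument list onto a Livy `/batches` payload. It is a sketch, not a complete translator: only a few common flags in the space-separated `--flag value` form are handled, and `--master` and `--deploy-mode` are dropped on the assumption that Livy's own server configuration decides where jobs run.

```python
# Hypothetical mapping from spark-submit flags to Livy /batches fields.
STRING_FLAGS = {
    "--class": "className",
    "--name": "name",
    "--driver-memory": "driverMemory",
    "--executor-memory": "executorMemory",
    "--queue": "queue",
}
INT_FLAGS = {
    "--driver-cores": "driverCores",
    "--executor-cores": "executorCores",
    "--num-executors": "numExecutors",
}
LIST_FLAGS = {"--jars": "jars", "--py-files": "pyFiles", "--files": "files"}
IGNORED_FLAGS = {"--master", "--deploy-mode"}  # assumed to be set by Livy's config


def spark_submit_to_livy(argv):
    """Convert spark-submit arguments (without the command itself) to a Livy payload."""
    payload = {}
    i = 0
    while i < len(argv) and argv[i].startswith("--"):
        if i + 1 >= len(argv):
            raise ValueError(f"missing value for {argv[i]}")
        flag, value = argv[i], argv[i + 1]
        if flag in STRING_FLAGS:
            payload[STRING_FLAGS[flag]] = value
        elif flag in INT_FLAGS:
            payload[INT_FLAGS[flag]] = int(value)
        elif flag in LIST_FLAGS:
            payload[LIST_FLAGS[flag]] = value.split(",")
        elif flag == "--conf":
            key, _, conf_value = value.partition("=")  # "key=value" pairs
            payload.setdefault("conf", {})[key] = conf_value
        elif flag not in IGNORED_FLAGS:
            raise ValueError(f"unsupported spark-submit flag: {flag}")
        i += 2
    if i >= len(argv):
        raise ValueError("missing application jar or Python file")
    payload["file"] = argv[i]  # first positional argument is the application
    if argv[i + 1:]:
        payload["args"] = argv[i + 1:]  # remaining arguments go to the application
    return payload


# A SparkPi-style spark-submit invocation (the jar path is hypothetical):
example = spark_submit_to_livy([
    "--class", "org.apache.spark.examples.SparkPi",
    "--master", "yarn",
    "--executor-memory", "2G",
    "--num-executors", "4",
    "hdfs:///user/example/spark-examples.jar",
    "1000",
])
```

Raising on unknown flags is deliberate: silently dropping an option such as `--packages` could produce a job that behaves differently from the original spark-submit example.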
