Three things from today - 8/30

#devjournal #spark #kubernetes

8/30/2019

Happy Friday!

I had a bit too much 🍷 yesterday so I'm a bit foggy; however, that's not stopping me from learning today!

1. Spark S3 tweaking

Spark appears to use ParquetOutputCommitter, a subclass of Hadoop's FileOutputCommitter, to write Parquet files to S3.

Digging into an issue we've had writing from Spark to S3, we came across a fix described here, which involves setting a config value in Hadoop.

Testing it, it appears to work.

I made a PR to set this going forward from the Hadoop / Hive side of things:

 hadoop_conf.set('spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version', '2')

This can also be set in Spark with the following property:

 'spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version': 2
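Here is a rough sketch of how the two spellings of the property relate. As far as I understand it, Spark copies every `spark.hadoop.*` property into the Hadoop configuration with the prefix stripped, so if `hadoop_conf` is a real Hadoop Configuration object the key would normally be written without the prefix. The helper names (`to_hadoop_conf`, `to_submit_args`) and the app name below are made up for illustration and are not part of Spark; the code only imitates the mapping in plain Python.

```python
# Illustration only: imitates how Spark maps "spark.hadoop.*" properties
# onto Hadoop configuration keys. These helpers are hypothetical, not Spark APIs.
SPARK_HADOOP_PREFIX = "spark.hadoop."


def to_hadoop_conf(spark_conf):
    """Return the Hadoop-side keys implied by a dict of Spark properties."""
    return {
        key[len(SPARK_HADOOP_PREFIX):]: str(value)
        for key, value in spark_conf.items()
        if key.startswith(SPARK_HADOOP_PREFIX)
    }


def to_submit_args(spark_conf):
    """Render the properties as spark-submit style --conf arguments."""
    args = []
    for key, value in sorted(spark_conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


spark_conf = {
    "spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version": 2,
    "spark.app.name": "s3-writer",  # hypothetical app name
}

print(to_hadoop_conf(spark_conf))
print(to_submit_args(spark_conf))
```

The first print shows the key as Hadoop would see it; the second shows the same settings in the form you would pass on a `spark-submit` command line.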

2. Helm 3 beta 2 is released

The buzz around the office is that Helm 3 beta 2 has been released. Helm 3 is an important release for Helm, as it removes the dependency on Tiller.
It is so important that it seems to have warranted a 7-part blog series on their website.

Also, Microsoft has a great article describing why Helm 3 is important.

3. What is Apache Zeppelin? How is it different than Jupyter?

Apache Zeppelin and Jupyter are both interactive notebooks that you can use to do data science things like performing calculations, plotting graphs, etc.

Jupyter notebooks run Python in the background by default, through the IPython kernel, though kernels exist for other languages. Apache Zeppelin runs on the JVM under the hood.

As for features I enjoy, Jupyter is an offshoot of IPython, which I enjoy quite a bit for doing Python work.
Apache Zeppelin seems to be a little more robust for non-Python languages, and their demo is pretty sweet. Being able to use their Angular graph UI is pretty swell.

Check out these graphs!

4. Bonus thing!!

I have been promising myself I'm going to learn Prometheus for too long. It's time to dig into the awesome-prometheus list...

I also created a new GKE cluster on my own Google Cloud account for testing. Compared to Azure and AWS, it's Kubernetes easy mode. More to come with monitoring and my new test cluster next week :)

Happy Labor Day - see ya Tuesday!
