DEV Community: Shad Amez

Heapland 0.2.0 released

Shad Amez — Mon, 25 Apr 2022 11:16:53 +0000

You can now connect to multiple Kafka clusters, view brokers and topics, and much more with the latest release of Heapland.

Heapland is an open source project that brings a single interface to different data services. This release adds support for managing multiple Kafka clusters and the ability to view brokers, topics and messages.

Add a Kafka Cluster

Simply click on the add Kafka connection option to set up the service.

Browse brokers, topics and messages

Once connected to the cluster, you can view the brokers, topics and messages.

To view the messages and partitions of a topic, click on the topic link.

You can also view the partitions and configurations of a topic, as well as the consumer groups.

Check out the GitHub repository to learn more.

Introducing Heapland - Universal interface for data services

Shad Amez — Sat, 16 Apr 2022 08:24:51 +0000

The Problem

As backend developers, we have all faced situations where the data we're working with is stored all over the place: VMs, databases, file systems, object storage, messaging infrastructure and logs.

The Solution

To clean up this mess, I created and open sourced Heapland to provide a unified interface to browse file systems, query databases and watch message streams.

The first release (v0.1.0) gives you the following features:

Browse, upload and delete files in Amazon S3
Browse tables, save and execute queries against popular databases: MySQL, Postgres and MariaDB

Head over to the GitHub repository to learn more.

3 Reasons why we need an Open Source Data Infrastructure Platform

Shad Amez — Mon, 07 Mar 2022 09:58:10 +0000

TL;DR: Faster setup, commoditisation and a better developer experience for data infrastructure are the need of the hour, and open sourcing Gigahex is a first step towards this.

Being in the Big Data industry for more than a decade has made me realize that managing open source distributed systems is a painful experience that brings sleepless nights. Cloud vendors like AWS, GCP and Azure have come to the rescue by offering managed services for an extra platform fee, generally paid per hour per compute instance. This seems reasonable, and large organizations with deep pockets may keep up with the cloud bills, but many SMBs and research institutes may not have such funding to support their work.

I want to highlight the three main reasons why it's time to build the Data Infrastructure Platform in the open.

Launch Data Infrastructure under 60 seconds

We live in a world of supercomputers and Google, where we get answers to the most fascinating questions at the click of a button. But when it comes to setting up a development or testing environment for data engineers, it takes hours or even days, after multiple Slack messages, email threads and escalations.

Why can’t we get things up and running under 60 seconds?

Pay based on criticality of data application

Open source software is free, but deploying and managing it is extremely costly and time consuming. Cloud vendors provide managed services for most of the popular data services, such as Databricks, AWS EMR, GCP Dataproc, Azure Analytics and a few others.

Why is there no established open source alternative that provides an end-to-end solution for setting up data infrastructure and analytics engines?
Such an alternative would let businesses choose the right data platform based on the speed and SLA they need from these services.

Stay sane in the world of multiple browser tabs

Data Engineers have been constantly mastering the skill of Cmd+Tab / Win+Tab in order to find the window that explains why a job failed, an executor was lost, a session was terminated or an OOM error was raised. Is it an application issue or an infrastructure issue?
As data applications are tightly coupled to the infrastructure, each data engineer also needs to be good at Data Ops. This brings them into a world of total chaos, jumping from tab to tab, mail to Slack, Slack to Zoom, until they finally demand that Friday come earlier :)

So why can’t we have an open source data platform to marry the data infrastructure to the data applications?

The new Gang in the Open source street

Gigahex is making its debut in the world of open source to solve the above issues. The first release enables developers to launch single node Apache Spark, Kafka and Hadoop clusters on their local machine.

Give it a try and let us know your feedback.

Why are we building a DevOps platform for Big Data?

Shad Amez — Fri, 10 Jul 2020 05:53:12 +0000

Statutory warning: Staring at a screen for long hours to identify bugs is not good for the eyes. It's better to build software to find bugs.

Typical dev story

If, like me, you have spent hours looking for bugs in log statements, or searching for a smart reason to explain the failure of a long running job like Spark in production, then you must read on.

We live in a world where things can go wrong at unexpected times, and that's acceptable, but what is not acceptable is not knowing the reason behind it. Giving reasons like "the job failed because it ran out of memory" is not enough. And hence, adding more disk, more RAM or more CPU is not always the right answer. Getting the right answer should not be difficult, as the application consuming the memory is not a black box; it is just leveraging another open source tool.

But guess what, quite often it is difficult, in spite of the source code being open. A lot of the time we are in fire-fighting mode and unable to get answers within a few minutes, answers which could have helped critical business operations and saved the lovely evening for something special. And when we do find the root cause and fix the bug, it's like party time. Time to relax and chill and have some pizza or a Biryani (a highly seasoned rice dish).

Hey! Hold on for a second. Why can't we just track the job's progress the way we track the status of our biryani order? It should be straightforward.

Time to build a Dev-Ops platform on steroids

So we, where we = myself and my co-founder + life partner, decided to use my programming and her UI designing chops to build a one-stop Dev-Ops platform for Big Data with great aesthetics. But there are already so many deployment, monitoring and logging services out there. So why not just combine these pieces to get going?

Well, I am not really a big fan of having to manage too many services for doing one thing. Apart from that, building intelligence into these segmented services brings its own set of challenges. In the end, the team spends considerable time maintaining each of these services independently. Why not just use one platform, or let the platform take care of making these independent services work together? This platform is what we are building, so that you focus on development and we manage the dependent services like CI/CD, secrets manager, configuration store, performance monitoring, log management and Big Data clusters.

Are you still there?

Yes? Great! Patience is the key.

Being responsible

So were these reasons enough to push me to go from Spark developer to full stack co-founder? Obviously not.

I would like to take responsibility for every penny spent on these massive clusters running analytics jobs. This was the most important reason to bootstrap this project: so that each developer can know how many resources their job is using and eliminate the wastage altogether. We are both hell bent on eliminating the wastage of clusters and saving costs for all enterprises.

If you can't measure, you can't manage - Marissa Mayer

So measurement is the key, which drives the motivation behind the Gigahex platform.

If you can't manage, someone might lose their job - Me

Fast setup - under 60 seconds

Integrating with other tools has been quite time consuming, if not a nightmare. One of the benchmarks I have stuck to is setting up everything from scratch in under 60 seconds. No more downloading binaries and installing agents on your cluster for basic logs and metrics. With just one binary, in one place, with one command and one dashboard, you should be able to find answers to hidden questions.

Being billioncare not billionaire

We aspire to become billioncare: people who genuinely care about saving the billions of minutes spent running massive clusters, worth billions of dollars, for no special reason.

Let's talk

This platform would be incomplete without your valuable suggestions and ideas. We would love to hear more about the challenges you are facing while developing and running Big Data applications in production. Just shoot an email to [shad][at][gigahex.com] to spark off the discussion.

spark-submit command builder with live preview

Shad Amez — Sun, 12 Jan 2020 03:53:28 +0000

As a Spark developer, you might need to add numerous configuration parameters to run your Apache Spark application with optimal settings. If you look at the number of configuration options available in the spark-submit command, you will definitely appreciate the kind of optimisations you can do.

Gigahex offers a simple tool just for building this spark-submit command. Gigahex is an upcoming platform for monitoring and receiving alerts for Spark based applications.

Here's the video tutorial.
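
If you prefer code to video, here is a minimal Python sketch of what a command builder like this does. It is not the Gigahex tool itself: the function name build_spark_submit, the jar name, the class name and the settings are all made up for illustration. The flags it emits (--master, --deploy-mode, --class, --conf, --executor-memory, --num-executors) are standard spark-submit options.

```python
import shlex


def build_spark_submit(app, main_class=None, master="local[*]",
                       deploy_mode=None, options=None, conf=None,
                       app_args=None):
    """Assemble a spark-submit command as a list of arguments.

    Hypothetical helper for illustration only.
    options:  long flags without the leading dashes, e.g. {"executor-memory": "4g"}
    conf:     Spark properties, each emitted as --conf key=value
    app_args: arguments passed through to the application itself
    """
    cmd = ["spark-submit", "--master", master]
    if deploy_mode:
        cmd += ["--deploy-mode", deploy_mode]
    if main_class:
        cmd += ["--class", main_class]
    for flag, value in (options or {}).items():
        cmd += [f"--{flag}", str(value)]
    # Sort the properties so the generated command is stable across runs.
    for key, value in sorted((conf or {}).items()):
        cmd += ["--conf", f"{key}={value}"]
    # The application jar (or script) comes after all spark-submit options,
    # followed by the application's own arguments.
    cmd.append(app)
    cmd += [str(arg) for arg in (app_args or [])]
    return cmd


# Example values below are placeholders, not recommended settings.
command = build_spark_submit(
    app="my-app.jar",
    main_class="com.example.MyApp",
    master="yarn",
    deploy_mode="cluster",
    options={"executor-memory": "4g", "num-executors": 10},
    conf={"spark.sql.shuffle.partitions": 200},
    app_args=["2020-01-12"],
)

# shlex.join quotes any argument that needs it, so the line can be pasted into a shell.
print(shlex.join(command))
```

Keeping the command as a list until the last moment means each value is quoted correctly when printed, and the same list can be handed to subprocess.run without going through a shell.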