<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shad Amez</title>
    <description>The latest articles on DEV Community by Shad Amez (@shadamez).</description>
    <link>https://dev.to/shadamez</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F290417%2F1b5beba1-b672-426c-8232-b80860e4f88d.jpg</url>
      <title>DEV Community: Shad Amez</title>
      <link>https://dev.to/shadamez</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/shadamez"/>
    <language>en</language>
    <item>
      <title>Heapland 0.2.0 released</title>
      <dc:creator>Shad Amez</dc:creator>
      <pubDate>Mon, 25 Apr 2022 11:16:53 +0000</pubDate>
      <link>https://dev.to/shadamez/heapland-020-released-1e87</link>
      <guid>https://dev.to/shadamez/heapland-020-released-1e87</guid>
      <description>&lt;p&gt;You can now connect to multiple Kafka clusters, view brokers and topics, and much more with the latest release of Heapland.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/heapland/heapland"&gt;Heapland&lt;/a&gt; is an open source project, that brings a single interface for different data services. This release of Heapland, adds support for managing multiple Kafka clusters, and ability to view brokers, topics and messages.&lt;/p&gt;

&lt;h2&gt;
  
  
  Add a Kafka Cluster
&lt;/h2&gt;

&lt;p&gt;Simply click on the add Kafka connection option to set up the service, as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VOEtnXiL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/khrok37phk0r6kumwxnj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VOEtnXiL--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/khrok37phk0r6kumwxnj.png" alt="Image description" width="880" height="504"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Browse brokers, topics and messages
&lt;/h2&gt;

&lt;p&gt;Once connected to the cluster, you can view the brokers, topics and messages.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--fp3eNwuN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2etb8la5lck55p4pkhzb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--fp3eNwuN--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2etb8la5lck55p4pkhzb.png" alt="Image description" width="880" height="390"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To view the messages and partitions of a topic, click on the topic link.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sTWDGpo0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/y4hhsouiiaea2nrl5i95.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sTWDGpo0--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/y4hhsouiiaea2nrl5i95.png" alt="Image description" width="880" height="358"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You can also view the partitions and configuration of a topic, as well as the consumer groups.&lt;/p&gt;

&lt;p&gt;Check out the &lt;a href="https://github.com/heapland/heapland"&gt;GitHub&lt;/a&gt; repository to learn more.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>kafka</category>
      <category>database</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Introducing Heapland - Universal interface for data services</title>
      <dc:creator>Shad Amez</dc:creator>
      <pubDate>Sat, 16 Apr 2022 08:24:51 +0000</pubDate>
      <link>https://dev.to/shadamez/introducing-heapland-universal-interface-for-data-services-47bj</link>
      <guid>https://dev.to/shadamez/introducing-heapland-universal-interface-for-data-services-47bj</guid>
      <description>&lt;h2&gt;
  
  
  The Problem
&lt;/h2&gt;

&lt;p&gt;As backend developers, we have all faced situations where the data we're working with is scattered all over the place: VMs, databases, file systems, object storage, messaging infrastructure and logs.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Solution
&lt;/h2&gt;

&lt;p&gt;To clean up this mess, I created and open sourced &lt;a href="https://github.com/heapland/heapland"&gt;Heapland&lt;/a&gt; to provide a unified interface to browse file systems, query databases and watch message streams.&lt;/p&gt;

&lt;p&gt;The first release (v0.1.0) gives you the following features:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Browse, upload and delete files in Amazon S3.&lt;/li&gt;
&lt;li&gt;Browse tables, and save and execute queries against popular databases: MySQL, Postgres and MariaDB.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Head over to the &lt;a href="https://github.com/heapland/heapland"&gt;GitHub&lt;/a&gt; repository to learn more.&lt;/p&gt;

</description>
      <category>opensource</category>
      <category>devops</category>
      <category>cloud</category>
      <category>productivity</category>
    </item>
    <item>
      <title>3 Reasons why we need an Open Source Data Infrastructure Platform</title>
      <dc:creator>Shad Amez</dc:creator>
      <pubDate>Mon, 07 Mar 2022 09:58:10 +0000</pubDate>
      <link>https://dev.to/shadamez/3-reasons-why-we-need-an-open-source-data-infrastructure-platform-3khp</link>
      <guid>https://dev.to/shadamez/3-reasons-why-we-need-an-open-source-data-infrastructure-platform-3khp</guid>
      <description>&lt;blockquote&gt;
&lt;p&gt;TL;DR Speeding up the setup of data infrastructure, commoditising it and enhancing its developer experience are the need of the hour, and open sourcing &lt;a href="https://gigahex.com"&gt;Gigahex&lt;/a&gt; is a first step towards this.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Being in the Big Data industry for more than a decade has made me realize that managing open source distributed systems is a painful experience that costs you sleepless nights. Cloud vendors like AWS, GCP and Azure have come to the rescue by offering managed services for an extra platform fee, generally paid per hour per compute instance. This seems reasonable, and large organizations with deep pockets may keep up with their cloud bills, but many SMBs and research institutes may not have the funding to support their work this way.&lt;/p&gt;

&lt;p&gt;I want to highlight the three main reasons why it's time to build the Data Infrastructure Platform in the &lt;strong&gt;open&lt;/strong&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Launch Data Infrastructure in under 60 seconds
&lt;/h2&gt;

&lt;p&gt;We live in a world of supercomputers and Google, where we get answers to the most fascinating questions at the click of a button. But when it comes to setting up a development or testing environment for data engineers, it takes hours or even days, after exchanging multiple Slack messages, email threads and escalations.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Why can’t we get things up and running in under 60 seconds?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Pay based on criticality of data application
&lt;/h2&gt;

&lt;p&gt;Open source software is free, but deploying and managing it is extremely costly and time consuming. Cloud vendors provide managed offerings for most of the popular data services, such as Databricks, AWS EMR, GCP Dataproc, Azure Analytics and a few others.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Why is there not an established open source alternative that provides an end-to-end solution for setting up data infrastructure and analytics engines?&lt;br&gt;
This would give businesses the freedom to choose the right data platform, based on the speed and SLA they need from these services.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Stay sane in the world of multiple browser tabs
&lt;/h2&gt;

&lt;p&gt;Data Engineers have been constantly mastering the skill of Cmd+Tab / Win+Tab in order to find the right window, the one that explains why a job failed, an executor was lost, a session terminated or an OOM error occurred. Is it an application issue or an infrastructure issue?&lt;br&gt;
As data applications are tightly coupled to the infrastructure, each data engineer also needs to be good at Data Ops. This drags them into a world of total chaos, forcing them to jump from tab to tab, mail to Slack and Slack to Zoom, until they finally wish Friday would come earlier :)&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;So why can’t we have an open source data platform to marry the data infrastructure to the data applications?&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  The new gang on the open source street
&lt;/h2&gt;

&lt;p&gt;Gigahex is making its debut in the world of open source to solve the above issues. The first release enables developers to launch single-node Apache Spark, Kafka and Hadoop clusters on their local machine.&lt;/p&gt;

&lt;p&gt;Give it a &lt;a href="https://gigahex.com"&gt;try&lt;/a&gt; and let us know your &lt;a href="https://github.com/GigahexHQ/gigahex/discussions/categories/ideas"&gt;feedback&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>devops</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Why are we building DevOps platform for Big Data?</title>
      <dc:creator>Shad Amez</dc:creator>
      <pubDate>Fri, 10 Jul 2020 05:53:12 +0000</pubDate>
      <link>https://dev.to/shadamez/why-are-we-building-devops-platform-for-big-data-4ke8</link>
      <guid>https://dev.to/shadamez/why-are-we-building-devops-platform-for-big-data-4ke8</guid>
      <description>&lt;p&gt;&lt;strong&gt;Statutory warning:&lt;/strong&gt; Staring at a screen for long hours to identify bugs is not good for your eyes. It's better to build software to find bugs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Typical dev story
&lt;/h2&gt;

&lt;p&gt;If, like me, you have spent hours looking for bugs in log statements or hunting for a smart reason to explain the failure of a long-running job, such as a Spark job in production, then you must read on.&lt;/p&gt;

&lt;p&gt;We live in a world where things can go wrong at unexpected times, and that's acceptable, but what is not acceptable is not knowing the reason behind it. Giving reasons like "the job failed because it ran out of memory" is not enough. Hence, adding more disk, more RAM or more CPU is not always the right answer. Getting the right answer should not be difficult, as the application consuming the memory is not a black box; it is just leveraging another open source tool.&lt;/p&gt;

&lt;p&gt;But guess what, quite often it is difficult, in spite of the source code being open. A lot of the time, we are in fire-fighting mode and unable to get answers within a few minutes, answers which could have helped critical business operations and saved the lovely evening for something special. And when we do find the root cause and fix the bug, it's party time. Time to relax and chill and have some pizza or a Biryani (a highly seasoned rice dish).&lt;/p&gt;

&lt;p&gt;Hey! Hold on for a second. Why can't we just track the job's progress the way we track the status of our biryani order? It should be straightforward.&lt;/p&gt;

&lt;h2&gt;
  
  
  Time to build a &lt;a href="https://gigahex.com"&gt;Dev-Ops platform&lt;/a&gt; on steroids
&lt;/h2&gt;

&lt;p&gt;So we (where we = &lt;a href="https://www.linkedin.com/in/shadamez/"&gt;myself&lt;/a&gt; and my &lt;a href="https://www.linkedin.com/in/ashayesta/"&gt;co-founder + life partner&lt;/a&gt;) decided to use my programming and her UI design chops to build a one-stop Dev-Ops platform for Big Data with great aesthetics. But there are already so many deployment, monitoring and logging services out there. So why not just combine these pieces and get going?&lt;/p&gt;

&lt;p&gt;Well, I am not really a big fan of having to manage too many services to do one thing. Apart from that, building intelligence into these segmented services brings its own set of challenges. In the end, the team spends considerable time maintaining each of these services independently. Why not just use one platform, and let the platform take care of making these independent services work together? This platform is what we are building, so that you focus on development and we manage the dependent services like CI/CD, secrets manager, configuration store, performance monitoring, log management and Big Data clusters.&lt;/p&gt;

&lt;p&gt;Are you still there?&lt;/p&gt;

&lt;p&gt;Yes? Great! Patience is the key.&lt;/p&gt;

&lt;h2&gt;
  
  
  Being responsible
&lt;/h2&gt;

&lt;p&gt;So were these reasons enough to push me from being a Spark developer to becoming a full stack co-founder? Obviously not.&lt;/p&gt;

&lt;p&gt;I would like to take responsibility for every penny spent on these massive clusters running analytics jobs. This was the most important reason to bootstrap this project, so that each developer can know how many resources their job is using and eliminate the wastage altogether. We are both hell-bent on eliminating cluster wastage and saving costs for enterprises.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you can't measure, you can't manage - Marissa Mayer&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;So measurement is the key, which drives the motivation behind the Gigahex platform.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;If you can't manage, someone might lose their job - Me&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Fast setup - under 60 seconds
&lt;/h2&gt;

&lt;p&gt;Integrating with other tools has been quite time consuming, if not a nightmare. One of the benchmarks that I have stuck to is setting everything up from scratch in under 60 seconds. No more downloading binaries and installing agents on your cluster for basic logs and metrics. With just one binary, in one place, with one command and one dashboard, you should be able to find answers to hidden questions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Being billioncare not billionaire
&lt;/h2&gt;

&lt;p&gt;We aspire to become billioncare: people who genuinely care about saving a billion minutes spent running massive clusters worth a billion dollars for no special reason.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let's talk
&lt;/h2&gt;

&lt;p&gt;This platform would be incomplete without your valuable suggestions and ideas. We would love to hear more about the challenges you are facing while developing and running Big Data applications in production. Just shoot an email to [shad][at][gigahex.com] to spark off the discussion.&lt;/p&gt;

</description>
      <category>spark</category>
      <category>devops</category>
      <category>bigdata</category>
      <category>motivation</category>
    </item>
    <item>
      <title>spark-submit command builder with live preview</title>
      <dc:creator>Shad Amez</dc:creator>
      <pubDate>Sun, 12 Jan 2020 03:53:28 +0000</pubDate>
      <link>https://dev.to/shadamez/spark-submit-command-builder-with-live-preview-31ic</link>
      <guid>https://dev.to/shadamez/spark-submit-command-builder-with-live-preview-31ic</guid>
      <description>&lt;p&gt;As a Spark developer, you might need to add numerous configuration parameters to run your Apache Spark application with optimal settings. If you look at the number of &lt;a href="https://spark.apache.org/docs/latest/configuration.html"&gt;configuration&lt;/a&gt; options available in the spark-submit command, you will appreciate the kinds of optimisations you can make.&lt;/p&gt;

&lt;p&gt;There is a simple &lt;a href="https://gigahex.com/tools/spark-submit"&gt;tool&lt;/a&gt; at &lt;a href="https://gigahex.com"&gt;Gigahex&lt;/a&gt; just for building this spark-submit command. Gigahex is an upcoming platform for monitoring Spark-based applications and receiving alerts for them.&lt;/p&gt;

&lt;p&gt;Here's the video tutorial.&lt;/p&gt;

&lt;p&gt;&lt;iframe width="710" height="399" src="https://www.youtube.com/embed/lYixQA3sgyY"&gt;
&lt;/iframe&gt;
&lt;/p&gt;

</description>
      <category>spark</category>
      <category>bigdata</category>
      <category>productivity</category>
      <category>scala</category>
    </item>
  </channel>
</rss>
