<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Michael Schuster</title>
    <description>The latest articles on DEV Community by Michael Schuster (@schustmi).</description>
    <link>https://dev.to/schustmi</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F754714%2F33f851e9-4dc4-4dca-b2a7-27fb55c54f89.png</url>
      <title>DEV Community: Michael Schuster</title>
      <link>https://dev.to/schustmi</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/schustmi"/>
    <language>en</language>
    <item>
      <title>How we track our todo comments using GitHub Actions</title>
      <dc:creator>Michael Schuster</dc:creator>
      <pubDate>Wed, 01 Dec 2021 12:37:43 +0000</pubDate>
      <link>https://dev.to/schustmi/how-we-track-our-todo-comments-using-github-actions-2bei</link>
      <guid>https://dev.to/schustmi/how-we-track-our-todo-comments-using-github-actions-2bei</guid>
      <description>&lt;p&gt;If you're a software developer, you're probably familiar with the following scenario: You're working on a new feature or trying to fix a bug, and while reading through some code existing code you notice that there's a nicer way to write it, or maybe a potential edge case isn't handled.&lt;br&gt;
But where to go from here? Write a todo comment and let your future self handle it of course!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zoNlYZl8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/aqveaauub88s9ddp7uch.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zoNlYZl8--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/aqveaauub88s9ddp7uch.jpg" alt="Problems for future me" width="702" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;While this might not be the optimal solution, I still regularly use todo comments if the fix is too complicated to implement right away as I find it can get quite distracting to repeatedly switch to my browser and create an issue with a meaningful description.&lt;/p&gt;
&lt;h2&gt;
  
  
  How to keep todo comments in sync with Jira issues
&lt;/h2&gt;

&lt;p&gt;This, however, brings a problem with it: these todos are separated from our Jira board, so we did not take them into account when planning our sprints. &lt;br&gt;
Keeping the comments in code in sync with our Jira issues manually would require a considerable amount of effort. We would have to periodically go over the entire codebase and create issues for new todos as well as delete issues and todos if their counterpart was removed.&lt;br&gt;
Instead, we looked at multiple GitHub integrations in the Jira marketplace but couldn't find an existing solution with similar features, so we decided to implement a GitHub Action that helps us track todos automatically.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--T5m7ixGd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vnmn5jv2ggx3r0xv0s2m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--T5m7ixGd--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/vnmn5jv2ggx3r0xv0s2m.png" alt="GitHub Action" width="880" height="287"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  GitHub Actions to the rescue
&lt;/h2&gt;

&lt;p&gt;Each time something is pushed to the main branch, a GitHub workflow is triggered which simply calls a Python script to do the heavy lifting. &lt;br&gt;
The script itself uses the following regular expression to find todo comments in our Python files:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;pattern&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;r&lt;/span&gt;&lt;span class="s"&gt;"(^[ \t]*#) TODO ?\[(LOWEST|LOW|MEDIUM|HIGH|HIGHEST|[A-Z]*?-[0-9]*?)\]:(.*$\n(\1 {2}.*$\n)*)"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
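&lt;p&gt;To see the pattern in action, here's a minimal, self-contained sketch (the sample source and the printed fields are illustrative, not taken from the actual script):&lt;/p&gt;

```python
import re

# The TODO-detection regex from the post, verbatim. It matches a comment
# line with a capital TODO, a priority or issue key in square brackets,
# a colon, and any continuation lines indented by two extra spaces.
PATTERN = re.compile(
    r"(^[ \t]*#) TODO ?\[(LOWEST|LOW|MEDIUM|HIGH|HIGHEST|[A-Z]*?-[0-9]*?)\]:"
    r"(.*$\n(\1 {2}.*$\n)*)",
    flags=re.MULTILINE,
)

sample = (
    "x = 1\n"
    "# TODO [HIGH]: Do something very important here\n"
    "#  This continuation line belongs to the same todo\n"
    "y = 2\n"
    "# TODO [ENG-123]: Already linked to a Jira issue\n"
)

for match in PATTERN.finditer(sample):
    # group(2) is either a priority or an already-assigned issue key,
    # group(3) is the todo body including continuation lines.
    print(match.group(2), "->", match.group(3).splitlines()[0].strip())
```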



&lt;p&gt;Don't worry, I won't bore you with the details of how this expression works, but it essentially means that our todo comments have to conform to a certain syntax (a comment starting with a capital TODO followed by a priority in square brackets and a colon) in order for the script to detect them.&lt;br&gt;
Once all syntactically correct todos are found, they are processed as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;&lt;strong&gt;Create issues for new todos:&lt;/strong&gt; Each time new code gets merged into the main branch of our repository, our script detects all new todos and creates Jira issues with the specified priority and description. The created issues include a GitHub link to the actual comment for more context and are tagged with a separate label so we can quickly find them later. Additionally, we modify the comments to include a reference to the created issue, which not only avoids creating duplicate issues but also comes in quite handy if you come across a comment and want to check, for example, whether someone is already working on it.&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# before
# TODO [HIGH]: Do something very important here
&lt;/span&gt;
&lt;span class="c1"&gt;# after
# TODO [ENG-123]: Do something very important here
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;




&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Delete todos for closed issues:&lt;/strong&gt; Our codebase is evolving quite quickly at the moment, and we close some obsolete issues from time to time. To automatically keep the todo comments and issues in sync, the script also deletes todo comments when the corresponding issue was closed.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Tag issues when a todo is deleted:&lt;/strong&gt; Now there is just one case left to handle: what if a todo comment gets deleted and the corresponding issue is still open? We decided to handle this with caution and not close the issue automatically to guard against accidentally deleted comments. Instead, our script adds a separate label to these "orphan" issues so we can easily discuss whether they should actually be closed during our planning meetings. &lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
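&lt;p&gt;The three cases above can be pictured as one reconciliation function. This is a simplified, hypothetical sketch of the decision logic only; the real script also talks to the Jira API and rewrites the source files:&lt;/p&gt;

```python
def reconcile(todos_in_code, open_issue_keys, closed_issue_keys):
    """Decide what to do for each todo comment and each tracked issue.

    todos_in_code: dict mapping an issue key (or None for a new, not yet
                   linked todo) to the todo's description.
    Returns a list of (action, payload) tuples; a real implementation
    would perform these actions instead of just returning them.
    """
    actions = []
    for issue_key, description in todos_in_code.items():
        if issue_key is None:
            # Case 1: new todo without an issue reference -> create an issue
            actions.append(("create_issue", description))
        elif issue_key in closed_issue_keys:
            # Case 2: the linked issue was closed -> delete the todo comment
            actions.append(("delete_todo", issue_key))
    for issue_key in open_issue_keys:
        if issue_key not in todos_in_code:
            # Case 3: todo deleted but issue still open -> label as orphan
            actions.append(("label_orphan", issue_key))
    return actions

actions = reconcile(
    {None: "Handle empty inputs", "ENG-7": "Old todo"},
    open_issue_keys={"ENG-9"},
    closed_issue_keys={"ENG-7"},
)
```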

&lt;p&gt;If you're interested in more details or want something similar in your own projects, check out the &lt;a href="https://github.com/zenml-io/zenml/blob/f5e7f688e102db80d87a6d4ba4513fcff84a242d/scripts/update_todos.py"&gt;script&lt;/a&gt; and the accompanying &lt;a href="https://github.com/zenml-io/zenml/blob/f5e7f688e102db80d87a6d4ba4513fcff84a242d/.github/workflows/update_todos.yml"&gt;GitHub workflow&lt;/a&gt;.  &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Michael Schuster is a Machine Learning Engineer at ZenML.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>python</category>
      <category>tooling</category>
      <category>github</category>
      <category>showdev</category>
    </item>
    <item>
      <title>Introducing the revamped ZenML 0.5.x</title>
      <dc:creator>Michael Schuster</dc:creator>
      <pubDate>Wed, 17 Nov 2021 14:03:32 +0000</pubDate>
      <link>https://dev.to/schustmi/introducing-the-revamped-zenml-05x-22ka</link>
      <guid>https://dev.to/schustmi/introducing-the-revamped-zenml-05x-22ka</guid>
      <description>&lt;p&gt;We've been hard at work for the last few months to finalize the 0.5.0 release and we're super excited to finally share some details regarding this all-new ZenML version with you!&lt;/p&gt;

&lt;p&gt;We'll go over the main new features in this blog post but if you're looking for a detailed list make sure to take a look at our &lt;a href="https://github.com/zenml-io/zenml/blob/main/RELEASE_NOTES.md"&gt;release notes&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Completely reworked API
&lt;/h2&gt;

&lt;p&gt;If you're familiar with previous versions of ZenML, you'll be in for a huge surprise. &lt;br&gt;
No more tedious subclassing for every step in your machine learning pipeline: the new ZenML functional API allows you to simply decorate your existing functions in order to run them in a ZenML pipeline.&lt;br&gt;
As long as the inputs and outputs of your functions are part of the continuously expanding set of supported datatypes, ZenML automatically takes care of serializing and deserializing your step outputs.&lt;br&gt;
And if a datatype is currently not supported, ZenML enables you to easily create a custom &lt;a href="https://docs.zenml.io/framework-design#using-materializers-to-abstract-away-serialization-and-deserialization-logic"&gt;materializer&lt;/a&gt; to run your code anyway.&lt;/p&gt;

&lt;p&gt;Let's take a look at a simple step that normalizes images for training to see how the new API looks in practice:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="s"&gt;"""Normalize images so the values are between 0 and 1."""&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;images&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mf"&gt;255.0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Notice the &lt;code&gt;@step&lt;/code&gt; above the normalization function? That's all that was needed to transform this into a ZenML step that can be used in all your pipelines.&lt;br&gt;
Now all that's left to do is create a pipeline that uses this step and run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;step&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_data&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="p"&gt;...&lt;/span&gt;

&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;load_and_normalize_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;load_data_step&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;normalize_step&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="c1"&gt;# Connect the inputs and outputs of our pipeline steps
&lt;/span&gt;    &lt;span class="n"&gt;images&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;load_data_step&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;normalize_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;images&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Create and run our pipeline
&lt;/span&gt;&lt;span class="n"&gt;load_and_normalize_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;load_data&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="n"&gt;normalize&lt;/span&gt;&lt;span class="p"&gt;()).&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Our &lt;a href="https://docs.zenml.io/quickstart-guide"&gt;quickstart&lt;/a&gt; and &lt;a href="https://docs.zenml.io/guides/low-level-api"&gt;low-level guide&lt;/a&gt; are the perfect place if you want to learn more about our new API.&lt;/p&gt;

&lt;h2&gt;
  
  
  Stacks
&lt;/h2&gt;

&lt;p&gt;Stacks are one of ZenML's new &lt;a href="https://docs.zenml.io/core-concepts"&gt;core concepts&lt;/a&gt;. A stack consists of three components that define where to store data and run ZenML pipelines:  &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;A metadata store: Stores metadata like pipeline names and parameters used to execute steps of a pipeline.&lt;/li&gt;
&lt;li&gt;An artifact store: Stores output data of all steps executed as part of a pipeline.&lt;/li&gt;
&lt;li&gt;An orchestrator: Executes a pipeline locally or in a cloud environment.&lt;/li&gt;
&lt;/ul&gt;
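&lt;p&gt;Conceptually, a stack is just a bundle of these three choices. The following dataclass is a purely illustrative sketch (not the actual ZenML class, and the component values are made up):&lt;/p&gt;

```python
from dataclasses import dataclass

@dataclass
class Stack:
    """Illustrative only: a stack bundles the three component choices."""
    metadata_store: str  # e.g. a local SQLite file or a cloud metadata service
    artifact_store: str  # e.g. a local directory or a GCS bucket
    orchestrator: str    # e.g. "local" or "airflow"

# A local development stack and a cloud production stack side by side:
local_stack = Stack("sqlite:///metadata.db", "/tmp/artifacts", "local")
production_stack = Stack("cloudsql", "gs://my-bucket/artifacts", "airflow")
```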

&lt;p&gt;The diagrams below show two example stacks and their components:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s---kDFQoL3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/j0g6l5ielshh7iamb38t.png" alt="Development and production stack" width="880" height="498"&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;em&gt;Figure 1: Example stacks for local development (left) and production using Apache Airflow and GCP (right)&lt;/em&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;While the development stack uses your local machine to execute pipelines and store data, the production stack runs pipelines using Apache Airflow and stores their resulting data in GCP.&lt;br&gt;
In future versions of ZenML we will integrate many popular tools for each of these components so you can easily create stacks that match your requirements.&lt;/p&gt;

&lt;p&gt;After setting up multiple stacks for development and production, it is as easy as calling&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;  zenml stack set production_stack
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;to switch from executing pipelines locally to running them in the cloud!&lt;br&gt;
Check out our &lt;a href="https://docs.zenml.io/guides/low-level-api"&gt;low-level guide&lt;/a&gt; to learn more about the remaining core concepts or skip straight to &lt;a href="https://docs.zenml.io/guides/low-level-api/chapter-7"&gt;chapter 7&lt;/a&gt; to see the magic of stacks in action. &lt;/p&gt;
&lt;h2&gt;
  
  
  New post-execution workflow
&lt;/h2&gt;

&lt;p&gt;Inspecting and comparing pipelines after they have been executed is an essential part of working with machine learning pipelines.&lt;br&gt;
That is why we've added a completely new &lt;a href="https://docs.zenml.io/guides/post-execution-workflow"&gt;post-execution workflow&lt;/a&gt; that allows you to easily &lt;strong&gt;query metadata&lt;/strong&gt; like the parameters used to execute a step and &lt;strong&gt;read artifact data&lt;/strong&gt; like the evaluation accuracy of your model.&lt;br&gt;
This is how it works:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Get a pipeline from our ZenML repository
&lt;/span&gt;&lt;span class="n"&gt;pipeline&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;Repository&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="n"&gt;get_pipeline&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;pipeline_name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"my_pipeline"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Get the latest run of our pipeline
&lt;/span&gt;&lt;span class="n"&gt;pipeline_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pipeline&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;runs&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# Get a specific step of the pipeline run
&lt;/span&gt;&lt;span class="n"&gt;evaluation_step&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pipeline_run&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get_step&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s"&gt;"evaluation_step"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Use the step parameters or outputs
&lt;/span&gt;&lt;span class="n"&gt;class_weights&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;evaluation_step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;parameters&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="s"&gt;"class_weights"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;evaluation_accuracy&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;evaluation_step&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;read&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In future versions, this will be the basis on which we will build visualizations that allow you to easily compare different runs of a pipeline, catch data drift and so much more!&lt;/p&gt;

&lt;h2&gt;
  
  
  Type hints
&lt;/h2&gt;

&lt;p&gt;Starting with version 0.5.1, ZenML now has type hints for the entire codebase! &lt;br&gt;
Apart from helping us make the codebase more robust, type hints in combination with unit tests allow us to implement new features and integrations quickly and confidently.&lt;br&gt;
Type hints also &lt;strong&gt;increase code comprehensibility&lt;/strong&gt; and &lt;strong&gt;improve autocompletion&lt;/strong&gt; in many places so working with ZenML is now even easier and quicker!&lt;/p&gt;
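&lt;p&gt;As a generic illustration of why this matters (a hypothetical helper, not code from the ZenML codebase), a fully annotated function lets tools like mypy flag type mistakes before anything runs:&lt;/p&gt;

```python
from typing import List

def split_batches(items: List[int], batch_size: int) -> List[List[int]]:
    """Split items into consecutive batches of at most batch_size."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# A type checker rejects this call before runtime, and an IDE can
# autocomplete the parameters from the annotations:
# split_batches("not a list", batch_size=2)  # mypy: incompatible type
batches = split_batches([1, 2, 3, 4, 5], batch_size=2)
```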

&lt;h2&gt;
  
  
  What lies ahead
&lt;/h2&gt;

&lt;p&gt;It has been a huge undertaking to rework the entire ZenML API but we're super happy with how it turned out (join our &lt;a href="https://zenml.io/slack-invite/"&gt;Slack&lt;/a&gt; to let us know if you agree or have some suggestions on how to improve it)!&lt;/p&gt;

&lt;p&gt;A few features from previous versions of ZenML are, however, still missing, but now that we have a solid foundation to work on, it should be a quick process to reintegrate them. So keep your eyes open for future releases and make sure to &lt;a href="https://github.com/zenml-io/zenml/discussions/categories/roadmap"&gt;vote&lt;/a&gt; on your favorite feature of our &lt;a href="https://zenml.io/roadmap"&gt;roadmap&lt;/a&gt; to make sure it gets implemented as soon as possible.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Michael Schuster is a Machine Learning Engineer at ZenML.&lt;/em&gt;&lt;/p&gt;

</description>
      <category>mlops</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>pipelines</category>
    </item>
  </channel>
</rss>
