<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Jakub Krajniak</title>
    <description>The latest articles on DEV Community by Jakub Krajniak (@jkrajniak).</description>
    <link>https://dev.to/jkrajniak</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F324411%2Fb0f642ac-004d-4c4d-b965-f197245aeef9.jpg</url>
      <title>DEV Community: Jakub Krajniak</title>
      <link>https://dev.to/jkrajniak</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jkrajniak"/>
    <language>en</language>
    <item>
      <title>Convert Google Slides to PDF</title>
      <dc:creator>Jakub Krajniak</dc:creator>
      <pubDate>Thu, 19 Sep 2024 19:35:28 +0000</pubDate>
      <link>https://dev.to/jkrajniak/convert-google-slides-to-pdf-2jkd</link>
      <guid>https://dev.to/jkrajniak/convert-google-slides-to-pdf-2jkd</guid>
      <description>&lt;p&gt;How to convert Google Slides to PDF? Well, just open Google Slides, then click in menu &lt;strong&gt;File -&amp;gt; Downdload -&amp;gt; PDF Document&lt;/strong&gt;. That's all ;)&lt;/p&gt;




&lt;p&gt;This sounds easy, but what if you cannot open the slides in Google Slides? What if you only have the embedded interface, the one that is created when you click &lt;strong&gt;File -&amp;gt; Share -&amp;gt; Publish&lt;/strong&gt; to web? In fact, there is no easy solution. You cannot print; it just results in a blank, white page. There is also no download option from the interface.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpl91lv1obvndcenl1q0l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpl91lv1obvndcenl1q0l.png" alt="Example of the interface of embeded slides" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this short article, I will show you how to solve this problem by using a simple Python script.&lt;/p&gt;

&lt;h2&gt;
  
  
  Ingredients
&lt;/h2&gt;

&lt;p&gt;If we cannot download the PDF from the slides or print it, we can still take screenshots ;) The only thing is that we don't want to do this manually. Instead, we'll just use the tool that is used to perform tests - &lt;a href="https://selenium-python.readthedocs.io/" rel="noopener noreferrer"&gt;Selenium&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Selenium is a tool that allows you to automate your web application tests. It uses webdrivers, which are headless versions of web browsers. The interactions with the web, like clicking, inputting text, etc., are coded using e.g., Python or other supported languages. Here, I will use Python.&lt;/p&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;The main function is &lt;code&gt;convert_slides_to_png&lt;/code&gt;, which takes the URL as input and stores the screenshots in the specified directory.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;webdriver&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.chrome.service&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Service&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.chrome.options&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Options&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.common.by&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;By&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.common.keys&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Keys&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.support.ui&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;WebDriverWait&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;selenium.webdriver.support&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;expected_conditions&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;EC&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;argparse&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ArgumentParser&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;convert_slides_to_png&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_prefix&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;service&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Service&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;/usr/local/bin/chromedriver&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Replace with the actual path
&lt;/span&gt;    &lt;span class="n"&gt;options&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;Options&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--disable-search-engine-choice-screen&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;--headless=new&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Run in headless mode (no visible browser window)
&lt;/span&gt;    &lt;span class="n"&gt;driver&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;webdriver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Chrome&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;service&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;options&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_window_size&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1492&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1119&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Replace with your desired URL
&lt;/span&gt;
    &lt;span class="nc"&gt;WebDriverWait&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;until&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EC&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;presence_of_element_located&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="n"&gt;By&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TAG_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;

    &lt;span class="c1"&gt;# Find the element to hide (replace with your element locator)
&lt;/span&gt;    &lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;execute_script&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
        var elements = document.querySelectorAll(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.punch-viewer-navbar&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;);
        for (var i = 0; i &amp;lt; elements.length; i++) {
            elements[i].style.display = &lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;none&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;;
        }
    &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;get_page_from_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;?&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;page_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;.&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;replace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;p&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;page_id&lt;/span&gt;


    &lt;span class="n"&gt;body_element&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;find_element&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;By&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;TAG_NAME&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;body_element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Keys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;END&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;new_url&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;current_url&lt;/span&gt;
    &lt;span class="n"&gt;end_page_num&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;get_page_from_url&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;new_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="n"&gt;body_element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Keys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;HOME&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;end_page_num&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;file_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;file_prefix&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_page_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.png&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;
        &lt;span class="n"&gt;driver&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;save_screenshot&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Page &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; of &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;end_page_num&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; saved to &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;file_path&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;body_element&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_keys&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Keys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ARROW_RIGHT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Done&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, we actually don't click on the prev/next buttons; instead, we simulate pressing keys (HOME, ARROW_RIGHT, END) to interact with the Google Slides interface.&lt;/p&gt;

&lt;p&gt;Let's just add a simple CLI, and we are ready to get the screenshots.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ArgumentParser&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;description&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Convert Google Slides to PNG&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;--url-file&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;File with urls of the Google Slides&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;add_argument&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;--output-dir&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;type&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nb"&gt;str&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;default&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;slides&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;help&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Output directory&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="n"&gt;args&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;parser&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;parse_args&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;url_file&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;urls&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;readlines&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Processing &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; urls&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;urls&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Processing &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_prefix&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;strip&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;split&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;,&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="nf"&gt;convert_slides_to_png&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;file_prefix&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;args&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;output_dir&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The list of slides is provided as a CSV file with columns: &lt;code&gt;url&lt;/code&gt; and &lt;code&gt;file_prefix&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;https://docs.google.com/presentation/d/1rTqxnE6_IrTVF8xjYz_ejdrFWbgdjCJgVLD34PVgifw/embed;presentation1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To run the script, you need to&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;python3 cli.py &lt;span class="nt"&gt;--url-file&lt;/span&gt; urls.txt &lt;span class="nt"&gt;--output-dir&lt;/span&gt; slides
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will read URLs from &lt;code&gt;urls.txt&lt;/code&gt; and output slides to &lt;code&gt;slides/&lt;/code&gt;, where each slide page will be prefixed by presentation1.&lt;/p&gt;




&lt;p&gt;Happy scrapping!&lt;/p&gt;




&lt;p&gt;If you liked the post, then you can &lt;a href="https://www.buymeacoffee.com/jkrajniak" rel="noopener noreferrer"&gt;buy me a coffee&lt;/a&gt;. Thanks in advance.&lt;/p&gt;

</description>
      <category>googleslides</category>
      <category>pdf</category>
      <category>converter</category>
    </item>
    <item>
      <title>FOSDEM 2024: What You Need to Know</title>
      <dc:creator>Jakub Krajniak</dc:creator>
      <pubDate>Tue, 06 Feb 2024 16:22:23 +0000</pubDate>
      <link>https://dev.to/jkrajniak/fosdem-2024-what-you-need-to-know-2o1a</link>
      <guid>https://dev.to/jkrajniak/fosdem-2024-what-you-need-to-know-2o1a</guid>
      <description>&lt;p&gt;Just back from the bustling &lt;a href="https://fosdem.org/2024/" rel="noopener noreferrer"&gt;FOSDEM 2024&lt;/a&gt; in Brussels, here's some practical information to help you navigate this annual open-source conference effectively.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Be Prepared for Crowds&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;FOSDEM is free to attend, so expect large crowds. This means trams might be full, talks might have limited seating, and there might be queues for food and coffee. Consider it part of the open-source community's inclusivity.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Planning is Essential&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;With 61 concurrent tracks, planning your schedule in advance is crucial. While attending multiple tracks might seem appealing, focusing on a few will allow you to learn more efficiently and avoid rushing between buildings. Remember, the ULB campus is large, so factor in travel time.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tips for Attendees&lt;/strong&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Plan effectively: Use the online schedule to create a focused schedule within 2-3 tracks you're interested in.&lt;/li&gt;
&lt;li&gt;Stay energized: Start your day with a substantial breakfast to keep you going through sessions and potential waiting times.&lt;/li&gt;
&lt;li&gt;Avoid coffee queues: Bring a reusable mug and your preferred coffee to skip the inevitable waiting line for coffee. Saving time means more learning and networking opportunities.&lt;/li&gt;
&lt;li&gt;Choose convenient accommodation: Consider staying near ULB, ideally with good public transport options or within walking distance. This minimizes travel time and maximizes conference time.&lt;/li&gt;
&lt;li&gt;Wear a mask: Due to the crowds, bringing and wearing a mask in common areas and talks is recommended, especially if you're concerned about illness. I've caught immediately a cold, nothing nice.&lt;/li&gt;
&lt;/ul&gt;




&lt;p&gt;So, was attending FOSDEM 2024 worth it? &lt;/p&gt;

&lt;p&gt;It depends.&lt;/p&gt;

&lt;p&gt;Sure, all the lectures are streamed online with live chat, so physically being there isn't strictly necessary. But, as with many online experiences, something gets lost. Being in the room forces you to focus, take notes by hand, and even participate. No laptop distractions, just pen and paper engagement.&lt;/p&gt;

&lt;p&gt;Then there's the social aspect. Connecting with fellow developers, project contributors, and potential collaborators is a huge part of the event, and that's something you miss online.&lt;/p&gt;

&lt;p&gt;Ultimately, the decision comes down to your priorities. If convenience and affordability are key, online attendance is great. But if you crave energy, engagement, and social connections, the in-person experience can be truly valuable.&lt;/p&gt;







&lt;p&gt;If you liked the post, then you can &lt;a href="https://www.buymeacoffee.com/jkrajniak" rel="noopener noreferrer"&gt;buy me a coffee&lt;/a&gt;. Thanks in advance.&lt;/p&gt;

</description>
      <category>fosdem</category>
      <category>fosdem2024</category>
      <category>conference</category>
    </item>
    <item>
      <title>How to debug your LAMMPS simulation using CLion</title>
      <dc:creator>Jakub Krajniak</dc:creator>
      <pubDate>Sat, 06 Mar 2021 18:36:51 +0000</pubDate>
      <link>https://dev.to/jkrajniak/how-to-debug-your-lammps-simulation-using-clion-mmo</link>
      <guid>https://dev.to/jkrajniak/how-to-debug-your-lammps-simulation-using-clion-mmo</guid>
      <description>&lt;p&gt;From time to time I need to debug &lt;a href="https://lammps.sandia.gov/doc/Intro.html" rel="noopener noreferrer"&gt;LAMMPS&lt;/a&gt; code to find out why a simulation does not work as expected. The easiest way is to jump into the runtime using a debugger. Although &lt;code&gt;gdb&lt;/code&gt; does the job, it is not the most intuitive and easy to use the tool. &lt;/p&gt;

&lt;p&gt;In my daily work, whenever I can, I use JetBrains toolbox, for C++ programming - &lt;a href="https://www.jetbrains.com/clion/" rel="noopener noreferrer"&gt;CLion&lt;/a&gt;. It definitely simplifies work, allowing as well to easy debug the code.&lt;/p&gt;

&lt;p&gt;LAMMPS uses &lt;a href="https://cmake.org/" rel="noopener noreferrer"&gt;CMake&lt;/a&gt; to control the build process. Fortunately for us, CLion integrates with CMake projects very well.&lt;/p&gt;

&lt;p&gt;So let's start.&lt;/p&gt;

&lt;h1&gt;
  
  
  Setup project
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;Open CLion, if you end up in the Welcome Window then click on &lt;strong&gt;Open&lt;/strong&gt;
&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fimqvkrh3brbc1mzscqmz.png" alt="welcome-screen" width="" height=""&gt;
and select the root of LAMMPS directory.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk5pajt8kpf1wiop3baro.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk5pajt8kpf1wiop3baro.png" alt="select-project" width="" height=""&gt;&lt;/a&gt;&lt;br&gt;
If you end up in the main window with an already opened project, you can go to &lt;strong&gt;Open ➡️ File ➡️ Open...&lt;/strong&gt; and select the root of LAMMPS directory.&lt;/p&gt;

&lt;p&gt;2. To enable CMake project, open &lt;code&gt;cmake/CMakeLists.txt&lt;/code&gt;. You should see in the editor window on the top link to &lt;strong&gt;Load CMake project&lt;/strong&gt;, click on it.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flckvky2ub3iglnt0w5sl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flckvky2ub3iglnt0w5sl.png" alt="Alt Text" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That is it, now you can compile your LAMMPS code directly from CLion, run and debug.&lt;/p&gt;
&lt;h1&gt;
  
  
  Enable/disable different compilation options
&lt;/h1&gt;

&lt;p&gt;LAMMPS contains multiple packages, which enables various of capabilities. This modular design allows compiling the final binary executables with only these functions that are needed to simulate your system.&lt;br&gt;
You can enable/disable features and settings by &lt;em&gt;-D&lt;/em&gt; argument. The other, more convenient option, is to use &lt;a href="https://lammps.sandia.gov/doc/Build_package.html#cmake-presets" rel="noopener noreferrer"&gt;CMake presets&lt;/a&gt;, which contains predefined variables.&lt;/p&gt;

&lt;p&gt;The preset can be enabled from command line:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;build&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;build
&lt;span class="nv"&gt;$ &lt;/span&gt;cmake &lt;span class="nt"&gt;-C&lt;/span&gt; ../cmake/presets/minimal.cmake ../cmake
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and to enable multiple one at the same time&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;cmake &lt;span class="nt"&gt;-C&lt;/span&gt; ../cmake/presets/nolib.cmake &lt;span class="nt"&gt;-C&lt;/span&gt; ../cmake/presets/gcc.cmake &lt;span class="nt"&gt;-C&lt;/span&gt; ../cmake/presets/most.cmake  ../cmake
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can attach the same settings in the Clion project configuration by putting CMake command line option in &lt;em&gt;CMake options&lt;/em&gt; field.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltulrogl7ejdl7opx38u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fltulrogl7ejdl7opx38u.png" alt="Alt Text" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Run your simulation script
&lt;/h1&gt;

&lt;p&gt;Now, when we set up our project, we can run our simulation script. &lt;/p&gt;

&lt;p&gt;The easiest way is to make a copy of an existing configuration and include the simulation script.&lt;br&gt;
Go to &lt;strong&gt;Run ➡️ Edit Configurations...&lt;/strong&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhf3o18yfhxwzb9laktz4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhf3o18yfhxwzb9laktz4.png" alt="Alt Text" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the list of targets, search for &lt;strong&gt;lmp&lt;/strong&gt; and click on &lt;strong&gt;Copy&lt;/strong&gt; icon, change the name of the target.&lt;br&gt;
Next, set the &lt;strong&gt;Working directory&lt;/strong&gt; to the path where your simulation scripts are located. In the &lt;strong&gt;Program arguments&lt;/strong&gt; you can specify which simulation script to run, e.g. &lt;code&gt;-in in.lmp&lt;/code&gt; if your simulation script is in &lt;code&gt;in.lmp&lt;/code&gt; file. &lt;/p&gt;

&lt;h1&gt;
  
  
  Run debugger
&lt;/h1&gt;

&lt;p&gt;Running debugger is pretty simple. In the main window select the target to run, and click on the Debug icon.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjliwg2dhesx5k14a8xvt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjliwg2dhesx5k14a8xvt.png" alt="Alt Text" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;I have guided you step by step on how to set up debugging for LAMMPS package. However, you can debug your &lt;a href="https://manual.gromacs.org/documentation/" rel="noopener noreferrer"&gt;GROMACS&lt;/a&gt; or &lt;a href="https://github.com/espressopp/espressopp" rel="noopener noreferrer"&gt;ESPResSo++&lt;/a&gt; simulations.&lt;/p&gt;




&lt;p&gt;Happy debugging!&lt;/p&gt;




&lt;p&gt;If you liked the post, then you can &lt;a href="https://www.buymeacoffee.com/jkrajniak" rel="noopener noreferrer"&gt;buy me a coffee&lt;/a&gt;. Thanks in advance.&lt;/p&gt;

</description>
      <category>lammps</category>
      <category>debugging</category>
      <category>clion</category>
      <category>cpp</category>
    </item>
    <item>
      <title>Watch your stream</title>
      <dc:creator>Jakub Krajniak</dc:creator>
      <pubDate>Thu, 10 Sep 2020 15:55:54 +0000</pubDate>
      <link>https://dev.to/jkrajniak/watch-your-stream-1d7p</link>
      <guid>https://dev.to/jkrajniak/watch-your-stream-1d7p</guid>
      <description>&lt;p&gt;In the &lt;a href="https://medium.com/nordcloud-engineering/keep-track-on-your-cloud-computations-67dd8f172479" rel="noopener noreferrer"&gt;last story&lt;/a&gt;, I have discussed how to observe distributed cloud computation. Here, I will focus on a bit similar topic. Mainly, how to observe your stream of data flowing into your system, precisely how to decide that at a certain moment there won't be any more data flowing, and we are ready to do the post-processing.&lt;/p&gt;

&lt;p&gt;This perhaps sounds in contradiction to the common approach in dealing with streams, where one could expect to process elements one-by-one. But let us consider the following situation:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;An initial event invokes a set of producers to push data into a stream;&lt;/li&gt;
&lt;li&gt;Producers simultaneously push data to the stream. Let then assume that the consumer stores the records in some permanent storage like S3;&lt;/li&gt;
&lt;li&gt;Only after all expected data are in permanent storage, we can run the next transformation, e.g. some aggregations;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A very simple approach would be to track the execution of the producers (e.g. using the method which I showed in my &lt;a href="https://medium.com/nordcloud-engineering/keep-track-on-your-cloud-computations-67dd8f172479" rel="noopener noreferrer"&gt;previous story&lt;/a&gt;). The third step will be executed only if all of the producers finish computations. &lt;/p&gt;

&lt;p&gt;The major drawback of this approach is not taking into account any delays related to passing records through the stream and storing them in permanent storage. This could, in the end, leads to computing aggregations on a partially completed data set.&lt;/p&gt;




&lt;p&gt;A better idea could be to use an observer to track the number of records that are stored permanently in the storage.&lt;br&gt;
Let us consider the following architecture, using a common component from AWS: Kinesis Firehose, S3, Lambda, DynamoDB.&lt;br&gt;
The overall diagram is shown below&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F3d1mi6ea8uj3rqwhvrhu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F3d1mi6ea8uj3rqwhvrhu.png" alt="Architecture Diagram" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We have a bunch of lambdas which produce some data and push them to Kinesis Firehose. This is then stored in S3 bucket.&lt;br&gt;
A &lt;em&gt;monitor&lt;/em&gt; lambda is triggered whenever Firehose creates a new object in S3 bucket. We count such events and update the counter in DynamoDB table. The primary key of the record is the observed bucket name.&lt;/p&gt;

&lt;p&gt;A prototype handler for such event can be defined as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;

&lt;span class="n"&gt;dynamodb&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;boto3&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resource&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;dynamodb&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;observerTableName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;observerTableName&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;dynamodb&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;observerTableName&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;


&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;monitor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Records&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[]):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;eventName&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;''&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ObjectCreated:Put&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;bucketName&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;s3&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bucket&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;name&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;update_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;bucketName&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
                &lt;span class="n"&gt;UpdateExpression&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;ADD num_records :val&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;ExpressionAttributeValues&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;:val&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A second part of the diagram is the &lt;em&gt;observer&lt;/em&gt;.&lt;br&gt;
This lambda uses a SQS queue to run recurrently and reads the content of the &lt;em&gt;storage counter&lt;/em&gt; DynamoDB table.&lt;br&gt;
The algorithm behind the observer is quite simple and can be described by the following diagram&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fzz438edmeaesnjti99v4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fzz438edmeaesnjti99v4.png" alt="Window observer" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It can be implemented as follows&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;observer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;context&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;event&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Records&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
        &lt;span class="n"&gt;payload&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;record&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;body&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="n"&gt;message&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;loads&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;payload&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;repeated&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;=&lt;/span&gt; &lt;span class="n"&gt;MAX_NUM_REPEAT&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# call external service, ready to handle the data in storage.
&lt;/span&gt;            &lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt; finished - calling external service&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="k"&gt;continue&lt;/span&gt;

        &lt;span class="n"&gt;res&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;table&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_item&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Key&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;id&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;bucket&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Item&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;item&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;res&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Item&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;num_records&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;item&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;num_records&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;num_records&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;last_num_records&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]:&lt;/span&gt;
                &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;repeated&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
            &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;repeated&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;  &lt;span class="c1"&gt;# Reset the repeat counter.
&lt;/span&gt;            &lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;last_num_records&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;num_records&lt;/span&gt;
            &lt;span class="n"&gt;sqs&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;send_message&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;QueueUrl&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;environ&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;selfSQSURL&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;MessageBody&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;dumps&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;message&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;DelaySeconds&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;with the SQS message in the following structure&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"repeated"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"last_num_records"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"bucket"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"bucket name"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Two parameters have to be adjusted. The first is the length of the observation window (declared in the above code as &lt;code&gt;MAX_NUM_REPEAT&lt;/code&gt;). The second parameter is the delay between reads from the DynamoDB table, here set to &lt;code&gt;30&lt;/code&gt; seconds.&lt;/p&gt;

&lt;p&gt;Let me comment on these two parameters.&lt;br&gt;
If the producer is a slow process, and we sample too fast (with short delay time), then we can falsely consider the process to be finished.&lt;br&gt;
On the other hand, if the &lt;em&gt;producer&lt;/em&gt; is a fast process, we can unnecessarily wait &lt;code&gt;MAX_NUM_REPEAT * delay&lt;/code&gt; seconds before the &lt;em&gt;observer&lt;/em&gt; sends a notification that the data are ready.&lt;/p&gt;

&lt;p&gt;You can use different optimization strategies for the delay time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;divide initial delay time by the &lt;code&gt;repeated&lt;/code&gt; counter&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;delay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;repeated&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="n"&gt;repeated&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;




&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;using exponential function&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;delay&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;ceil&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;30&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;numpy&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;exp&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;repeated&lt;/span&gt;&lt;span class="p"&gt;)).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;




&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;It depends on the nature of the production process which parameters are appropriate.&lt;/p&gt;




&lt;p&gt;The method I have presented here can be useful in data processing, where we need to run some post-processing tasks after we have a dataset ready in the storage (like S3, Elasticsearch, etc.).&lt;br&gt;
It needs some tuning to be applicable, and one has to consider as well, how to deal with any false-positive cases.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;On the cover image, the Dijle river near Leuven, Belgium&lt;/em&gt;&lt;/p&gt;




&lt;p&gt;If you liked the post, then you can &lt;a href="https://www.buymeacoffee.com/jkrajniak" rel="noopener noreferrer"&gt;buy me a coffee&lt;/a&gt;. Thanks in advance.&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>distributedsystems</category>
      <category>aws</category>
      <category>cloud</category>
    </item>
    <item>
      <title>Efficient message broadcasting with SNS filtering</title>
      <dc:creator>Jakub Krajniak</dc:creator>
      <pubDate>Thu, 20 Aug 2020 07:34:05 +0000</pubDate>
      <link>https://dev.to/jkrajniak/efficient-message-broadcasting-with-sns-filtering-1c14</link>
      <guid>https://dev.to/jkrajniak/efficient-message-broadcasting-with-sns-filtering-1c14</guid>
      <description>&lt;p&gt;Broadcasting messages from one source to multiple destinations (or consumers) is a commonly seen pattern in distributed computing solutions.&lt;br&gt;
A very simple case is when we would like to trigger multiple workers at the same time to run a parallel job. The trigger can be executed, e.g., as a &lt;em&gt;CloudWatch cron-like event&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjzsppa4r2z706nmsmd57.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fjzsppa4r2z706nmsmd57.png" alt="Simple broadcaster" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here, the same message is delivered to multiple consumers. In terms of a specific implementation, using AWS cloud components, the above diagram can be transformed into the following architecture diagram:&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fwb84eyt9va3r1801i7uy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fwb84eyt9va3r1801i7uy.png" alt="Architecture diagram for broadcaster" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;



&lt;p&gt;Now, let say that we would like that pattern an event bus, on which messages to different consumers are published into the same communication channel, as depicted on the figure below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsgt1t72yhooedusx96cg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsgt1t72yhooedusx96cg.png" alt="Bus system" width="" height=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The communication channel could be, e.g. SNS topic, and the consumers, as in the previous example, can be an AWS Lambdas. Multiple producers (marked with different colours) push messages to the communication bus and the consumers read the messages from it.&lt;/p&gt;

&lt;p&gt;This solution has one major drawback, the same message will be delivered to multiple consumers, even if the consumer cannot handle that message. So for example, an &lt;em&gt;orange&lt;/em&gt; consumer will receive message marked with &lt;em&gt;green&lt;/em&gt;, or &lt;em&gt;red&lt;/em&gt;. In principle, nothing bad could happen if the consumers can filter out such messages. However, this will involve additional computational time and cost.&lt;/p&gt;

&lt;p&gt;We could think about several solutions to this kind of problem. The filtering, as said before, can be delegated to the consumers. There could be multiple communication channels, one per every message type, so the consumers can be linked directly with the appropriate channel. But on the cost of complicated architecture.&lt;/p&gt;



&lt;p&gt;Instead of making the architecture complex, or move the filtering to the consumer code, we could use &lt;a href="https://docs.aws.amazon.com/sns/latest/dg/sns-message-filtering.html" rel="noopener noreferrer"&gt;message filtering&lt;/a&gt; available in SNS service. By this, the consumers can get only a subset of the messages that are pushed to the topic.&lt;/p&gt;

&lt;p&gt;The filtering is done on the attributes and values of the attributes assigned to the published message. A very basic example, using a &lt;a href="https://serverless.com" rel="noopener noreferrer"&gt;serverless framework&lt;/a&gt;, is shown below&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;functions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;hndl1&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;handler.fun1&lt;/span&gt;
    &lt;span class="na"&gt;events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
     &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;sns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
          &lt;span class="na"&gt;arn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;NotifyTopic&lt;/span&gt;
          &lt;span class="na"&gt;topicName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;notifyTopic&lt;/span&gt;
          &lt;span class="na"&gt;filterPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;messageType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;hndl1&lt;/span&gt;
            &lt;span class="na"&gt;messageContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;running&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the lambda hndl1 will be triggered by an incoming message on SNS topic only if the message will have two attributes: &lt;code&gt;messageType&lt;/code&gt;, &lt;code&gt;messageContext&lt;/code&gt; of values &lt;code&gt;hndl1&lt;/code&gt;, &lt;code&gt;running&lt;/code&gt; respectively.&lt;/p&gt;

&lt;p&gt;Currently, SNS policy filtering can handle only attributes of types: &lt;code&gt;String&lt;/code&gt;, &lt;code&gt;String.Array&lt;/code&gt; and &lt;code&gt;Number&lt;/code&gt;.&lt;br&gt;
The filter values can be either matched exactly or treated as prefixes. In the previous example, the values were matched exactly. In the following example, we use keywords &lt;code&gt;prefix&lt;/code&gt; to indicate that the value should be treated as a string prefix. Moreover, the keyword &lt;code&gt;anything-but&lt;/code&gt; allows defining the values the attribute should not have (blacklist).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;functions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;  
  &lt;span class="na"&gt;hndl2&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;handler.fun2&lt;/span&gt;
    &lt;span class="na"&gt;events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;sns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
          &lt;span class="na"&gt;arn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;NotifyTopic&lt;/span&gt;
          &lt;span class="na"&gt;topicName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;notifyTopic&lt;/span&gt;
          &lt;span class="na"&gt;filterPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;messageType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;hndl2&lt;/span&gt;
            &lt;span class="na"&gt;messageContext&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;incontext&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;prefix&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;type_&lt;/span&gt;
            &lt;span class="na"&gt;messageStage&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;anything-but&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;restart&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is worth to mention that &lt;code&gt;prefix&lt;/code&gt; and constant values can be joined together. In the above example, &lt;code&gt;messageContext&lt;/code&gt; matches either with &lt;code&gt;incontext&lt;/code&gt; value or with &lt;code&gt;type_1&lt;/code&gt;, &lt;code&gt;type_2&lt;/code&gt;, &lt;code&gt;type_b&lt;/code&gt;, &lt;code&gt;type_abc&lt;/code&gt; … etc. &lt;br&gt;
The same &lt;strong&gt;does not hold&lt;/strong&gt; for &lt;code&gt;anything-but&lt;/code&gt; keyword, which cannot be mixed with exact values.&lt;/p&gt;



&lt;p&gt;Apart from working with strings, filters can also operate on numerical values. This is done by using keyword numeric, as in the example below&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;functions&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;hndl3&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;handler&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;handler.fun3&lt;/span&gt;
    &lt;span class="na"&gt;events&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;sns&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
          &lt;span class="na"&gt;arn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="kt"&gt;!Ref&lt;/span&gt; &lt;span class="s"&gt;NotifyTopic&lt;/span&gt;
          &lt;span class="na"&gt;topicName&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;notifyTopic&lt;/span&gt;
          &lt;span class="na"&gt;filterPolicy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="na"&gt;messageType&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;hndl3&lt;/span&gt;
            &lt;span class="na"&gt;cost&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
              &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;numeric&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;"&lt;/span&gt;
                  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the message with attributes messageType equal to hndl3 and cost greater than zero. The numeric filter accepts the following operators: &lt;code&gt;=&lt;/code&gt;, &lt;code&gt;&amp;gt;&lt;/code&gt;, &lt;code&gt;&amp;gt;=&lt;/code&gt;,&lt;code&gt;&amp;lt;&lt;/code&gt;, and &lt;code&gt;&amp;lt;=&lt;/code&gt;. In addition, you can define a range of values. In the example below, the value of cost attribute has to be in the range (0, 200) to be accepted.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;cost&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;numeric&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; 
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;gt;"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;0&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;&amp;lt;"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;200&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can even mix numeric and string attribute values, as in the example below&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;cost&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;unknown&lt;/span&gt;
  &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;numeric&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;="&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;100&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;where the &lt;code&gt;cost&lt;/code&gt; attribute has to be either &lt;code&gt;100&lt;/code&gt; or &lt;code&gt;unknown&lt;/code&gt;.&lt;/p&gt;




&lt;p&gt;SNS filtering can help you distribute the messages to right consumers, reducing your computational cost on a little price of writing filters. &lt;br&gt;
It is quite robust tool and can be used for all kind of receiver the SNS subscription can handle.&lt;/p&gt;




&lt;p&gt;If you liked the post, then you can &lt;a href="https://www.buymeacoffee.com/jkrajniak" rel="noopener noreferrer"&gt;buy me a coffee&lt;/a&gt;. Thanks in advance.&lt;/p&gt;

</description>
      <category>serverless</category>
      <category>distributedsystems</category>
      <category>aws</category>
      <category>cloud</category>
    </item>
  </channel>
</rss>
