<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Matt</title>
    <description>The latest articles on DEV Community by Matt (@matteo).</description>
    <link>https://dev.to/matteo</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F340661%2Fe9886d05-7e0f-46c7-80e5-bb5fcbcea090.png</url>
      <title>DEV Community: Matt</title>
      <link>https://dev.to/matteo</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/matteo"/>
    <language>en</language>
    <item>
      <title>Writing a chainable CLI For Data Processing</title>
      <dc:creator>Matt</dc:creator>
      <pubDate>Mon, 28 Dec 2020 19:27:42 +0000</pubDate>
      <link>https://dev.to/matteo/writing-a-chainable-cli-for-data-processing-31m3</link>
      <guid>https://dev.to/matteo/writing-a-chainable-cli-for-data-processing-31m3</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--T334zriS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/nu0ipu4ly1h13k4txi1m.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--T334zriS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_66%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/i/nu0ipu4ly1h13k4txi1m.gif" alt="Pandas CLI"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Some complex datasets or exploratory analyses call for a Jupyter notebook. Others only need simple processing, such as filtering rows or creating a pivot table.&lt;/p&gt;

&lt;p&gt;For those, you probably end up opening the file in a spreadsheet. Spreadsheets are convenient, but they are difficult to automate and do not offer as many features as pandas.&lt;/p&gt;

&lt;p&gt;Find out &lt;a href="https://medium.com/swlh/start-using-pandas-from-the-command-line-5dcae6b2ccca"&gt;in this tutorial&lt;/a&gt; how to write a command-line interface to wrap some pandas features and automate simple tasks.&lt;/p&gt;

</description>
      <category>python</category>
      <category>productivity</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Serverless Machine Learning Engineering Project On AWS</title>
      <dc:creator>Matt</dc:creator>
      <pubDate>Sat, 26 Dec 2020 22:51:47 +0000</pubDate>
      <link>https://dev.to/matteo/serverless-machine-learning-engineering-project-on-aws-3ipj</link>
      <guid>https://dev.to/matteo/serverless-machine-learning-engineering-project-on-aws-3ipj</guid>
      <description>&lt;p&gt;Many tutorials on deploying a model to production integrate the serialized model directly into the API. The disadvantage of this approach is that it couples the API to the model. An alternative is to delegate the prediction load to workers using a queue. The diagram below shows the solution architecture on AWS.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6fb6jc3dbnwzda7fulkw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F6fb6jc3dbnwzda7fulkw.png" alt="Solution architecture on AWS"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The machine learning model is stored in an S3 bucket. Workers, which are Lambda functions, load it when the client puts a message containing prediction data in the SQS queue through the API Gateway/Lambda REST endpoint.&lt;/p&gt;

&lt;p&gt;When a worker has finished the prediction job, it writes the result to a DynamoDB table so that it can be retrieved later. Finally, the client requests the prediction result through an API endpoint that reads the DynamoDB table.&lt;/p&gt;

&lt;p&gt;The loading and prediction work is delegated to a worker instead of being integrated into the REST API, because a model can take a long time to load and to predict. The SQS queue and the DynamoDB table let us handle both steps asynchronously.&lt;/p&gt;
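&lt;p&gt;The flow above can be sketched locally without any AWS service. This is a minimal illustration, not the code from the article: a &lt;code&gt;queue.Queue&lt;/code&gt; stands in for the SQS queue, a dict stands in for the DynamoDB table, and &lt;code&gt;load_model&lt;/code&gt;, &lt;code&gt;submit&lt;/code&gt;, &lt;code&gt;worker&lt;/code&gt; and &lt;code&gt;get_result&lt;/code&gt; are hypothetical names chosen for the example.&lt;/p&gt;

```python
import queue
import threading
import uuid

# In-memory stand-ins (assumptions): a Queue plays the SQS queue and a dict
# plays the DynamoDB table. Real code would call the AWS services instead.
jobs = queue.Queue()
results = {}


def load_model():
    """Hypothetical model loader; the real one would read the file from S3."""
    return lambda features: sum(features)


def submit(features):
    """API side: enqueue the prediction request and return a job id at once."""
    job_id = str(uuid.uuid4())
    jobs.put((job_id, features))
    return job_id


def worker():
    """Worker side: load the model, then process jobs until a None sentinel."""
    model = load_model()
    while True:
        job = jobs.get()
        if job is None:
            break
        job_id, features = job
        results[job_id] = model(features)


def get_result(job_id):
    """API side: read the result table; None means the job is not done yet."""
    return results.get(job_id)


thread = threading.Thread(target=worker)
thread.start()
job_id = submit([1, 2, 3])
jobs.put(None)  # tell the worker to stop once the queue is drained
thread.join()
print(get_result(job_id))
```

&lt;p&gt;The point of the design is that &lt;code&gt;submit&lt;/code&gt; returns immediately with a job id, and the client polls &lt;code&gt;get_result&lt;/code&gt; later, so the API never waits for the model to load or predict.&lt;/p&gt;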

&lt;blockquote&gt;
&lt;p&gt;Complete article &lt;a href="https://medium.com/swlh/how-to-deploy-your-scikit-learn-model-to-aws-44aabb0efcb4" rel="noopener noreferrer"&gt;here&lt;/a&gt; including code.&lt;/p&gt;
&lt;/blockquote&gt;

</description>
      <category>python</category>
      <category>aws</category>
      <category>machinelearning</category>
      <category>datascience</category>
    </item>
    <item>
      <title>Asynchronous HTTP requests in Python</title>
      <dc:creator>Matt</dc:creator>
      <pubDate>Tue, 25 Feb 2020 09:49:22 +0000</pubDate>
      <link>https://dev.to/matteo/async-request-with-python-1hpo</link>
      <guid>https://dev.to/matteo/async-request-with-python-1hpo</guid>
      <description>&lt;p&gt;In Python, you can make HTTP requests to an API using the requests module or the urllib3 module it is built on.&lt;/p&gt;

&lt;p&gt;However, requests and urllib3 are synchronous: only one HTTP call can be made at a time in a single thread. When you have to make many HTTP calls, synchronous code performs badly. To avoid this, you can use multi-threading or the asyncio module, available since Python 3.4.&lt;/p&gt;
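&lt;p&gt;For reference, here is a minimal sketch of the multi-threading option using the standard library's &lt;code&gt;ThreadPoolExecutor&lt;/code&gt;. It is an illustration only: &lt;code&gt;fake_fetch&lt;/code&gt; is a hypothetical stand-in that sleeps instead of calling a real API, and &lt;code&gt;fetch_all_threaded&lt;/code&gt; and the pool size of 10 are assumptions.&lt;/p&gt;

```python
from concurrent.futures import ThreadPoolExecutor
import time


def fake_fetch(city):
    """Hypothetical stand-in for a blocking HTTP call (no network involved)."""
    time.sleep(0.05)  # simulates waiting on the network
    return {"nom": city}


def fetch_all_threaded(cities, max_workers=10):
    """Run the blocking calls in a thread pool; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fake_fetch, cities))


cities = ["Paris", "Lyon", "Lille"]
print(fetch_all_threaded(cities))
```

&lt;p&gt;Because each thread spends most of its time waiting, the calls overlap even though every individual call is still synchronous.&lt;/p&gt;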
&lt;h2&gt;Test case&lt;/h2&gt;

&lt;p&gt;To show the time difference between sync and async code, I wrote a script that reads a file of 500 city names and makes an HTTP call to an API for each one, to retrieve information such as location and population.&lt;/p&gt;
&lt;h3&gt;Sync code performance&lt;/h3&gt;

&lt;p&gt;Here is the synchronous version, using the requests module:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
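&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;


&lt;span class="c1"&gt;# timeit is the author's timing decorator; its definition is not shown in this article&lt;/span&gt;
&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;timeit&lt;/span&gt;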
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cities&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;responses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
    &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;requests&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"https://geo.api.gouv.fr/communes?nom=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;amp;fields=nom,region&amp;amp;format=json&amp;amp;geometry=centr"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;responses&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;resp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;responses&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;Finished 'fetch_all' in 38.7053 secs&lt;/code&gt;&lt;/p&gt;
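&lt;p&gt;The &lt;code&gt;timeit&lt;/code&gt; decorator is not defined in this article. A minimal sketch that would produce output in the same format could look like the following. This is a hypothetical reconstruction, not necessarily the author's code, and &lt;code&gt;slow_add&lt;/code&gt; is only there to demonstrate it.&lt;/p&gt;

```python
import functools
import time


def timeit(func):
    """Print how long each call to func takes (hypothetical reconstruction)."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        print(f"Finished {func.__name__!r} in {elapsed:.4f} secs")
        return result
    return wrapper


@timeit
def slow_add(a, b):
    """Demo function: sleeps briefly so there is something to measure."""
    time.sleep(0.1)
    return a + b


slow_add(1, 2)
```

&lt;p&gt;The wrapper returns the original result unchanged, so decorating a function only adds the timing line to the output.&lt;/p&gt;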

&lt;h3&gt;Async code performance&lt;/h3&gt;

&lt;p&gt;I used the aiohttp module for the asynchronous version, because the requests module does not support asyncio.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;aiohttp&lt;/span&gt;


&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="s"&gt;"""Execute an HTTP call asynchronously.
    Args:
        session: context for making the HTTP call
        url: URL to call
    Returns:
        resp: a dict-like object containing the HTTP response
    """&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;resp&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;response&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;json&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;resp&lt;/span&gt;

&lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;fetch_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cities&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="s"&gt;"""Gather many HTTP calls made asynchronously.
    Args:
        cities: a list of strings
    Returns:
        responses: a list of dict-like objects containing the HTTP responses
    """&lt;/span&gt;
    &lt;span class="k"&gt;async&lt;/span&gt; &lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;aiohttp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ClientSession&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;tasks&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;city&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;cities&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;fetch&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="s"&gt;"https://geo.api.gouv.fr/communes?nom=&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;city&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;&amp;amp;fields=nom,region&amp;amp;format=json&amp;amp;geometry=centr"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;responses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;gather&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;tasks&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;return_exceptions&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;responses&lt;/span&gt;


&lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;timeit&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cities&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="n"&gt;responses&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;asyncio&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;fetch_all&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;cities&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;responses&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;Finished 'run' in 3.0706 secs&lt;/code&gt;&lt;/p&gt;
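&lt;p&gt;One detail worth noting: with &lt;code&gt;return_exceptions=True&lt;/code&gt;, &lt;code&gt;asyncio.gather&lt;/code&gt; returns exception objects in the result list instead of raising them, so failed calls sit alongside successful responses. A minimal sketch of separating the two could look like this. &lt;code&gt;fake_fetch&lt;/code&gt; and &lt;code&gt;fetch_all_checked&lt;/code&gt; are hypothetical names, and no network call is made.&lt;/p&gt;

```python
import asyncio


async def fake_fetch(city):
    """Hypothetical stand-in for the fetch coroutine (no network involved)."""
    await asyncio.sleep(0)
    if not city:
        raise ValueError("empty city name")
    return {"nom": city}


async def fetch_all_checked(cities):
    """Gather all calls, then split successful results from exceptions."""
    outcomes = await asyncio.gather(
        *(fake_fetch(city) for city in cities), return_exceptions=True
    )
    ok, failed = [], []
    for city, outcome in zip(cities, outcomes):
        if isinstance(outcome, Exception):
            failed.append((city, outcome))
        else:
            ok.append(outcome)
    return ok, failed


ok, failed = asyncio.run(fetch_all_checked(["Paris", "", "Lyon"]))
print(ok)
print(len(failed))
```

&lt;p&gt;Keeping the city next to each exception makes it possible to retry only the calls that failed.&lt;/p&gt;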

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;In this test the async version is more than 12 times faster than the sync version (3.07 s versus 38.71 s). If your code performs many I/O calls, you should consider concurrency to improve performance. The asynchronous version does require more code, however.&lt;/p&gt;

&lt;p&gt;If you want to see the threading version that works with the requests module, and how to implement automatic retries and caching for your API calls, check out &lt;a href="https://medium.com/me/stats/post/70a555fecc97"&gt;this tutorial&lt;/a&gt;.&lt;/p&gt;

</description>
      <category>python</category>
      <category>beginners</category>
      <category>datascience</category>
    </item>
  </channel>
</rss>
