<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Victoria Ubaldo</title>
    <description>The latest articles on DEV Community by Victoria Ubaldo (@vikyale).</description>
    <link>https://dev.to/vikyale</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F196429%2Fb6329c17-ca37-433f-850e-8d15e37a0f4a.png</url>
      <title>DEV Community: Victoria Ubaldo</title>
      <link>https://dev.to/vikyale</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/vikyale"/>
    <language>en</language>
    <item>
      <title>5 software engineering practices for Data Science</title>
      <dc:creator>Victoria Ubaldo</dc:creator>
      <pubDate>Fri, 22 Jan 2021 04:38:17 +0000</pubDate>
      <link>https://dev.to/vikyale/5-software-engineering-practices-for-data-science-4578</link>
      <guid>https://dev.to/vikyale/5-software-engineering-practices-for-data-science-4578</guid>
      <description>&lt;p&gt;Data science projects involve a variety of roles and functions. Thanks to my software engineering experience, I have been able to apply its practices and activities to data science projects. Depending on the project and the time available, here are five basic software engineering concepts that every analyst or developer should learn or review. Here we go!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/LmNwrBhejkK9EFP504/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img width="300" src="https://i.giphy.com/media/LmNwrBhejkK9EFP504/giphy.gif" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Documentation
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/pOZhmE42D1WrCWATLK/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img width="400" src="https://i.giphy.com/media/pOZhmE42D1WrCWATLK/giphy.gif" height="400"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Documentation makes the parts of our code clear and explains the purpose of each component. In data science projects we work with many files in Python, R, SQL, or Scala that are often passed from analyst to analyst. Good documentation saves the time each person needs to understand the code and improves productivity.&lt;/p&gt;

&lt;p&gt;The types of documentation are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Line level:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Lets read the dataset and check the content
titanic_data = pd.read_csv('drive/My Drive/Datasets/titanic-data.csv')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;#Having the mean of ages in the dataset
titanic_data["Age"].mean()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Function or module level:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def getExchangeRates(amount, exchange_rate): 
        """
        Parameters
        ----------
        amount : float
            a quantity of money
        exchange_rate : float
             rate at which one currency will be exchanged for another
        return : float
            amount with the exchange rate
        """

    return (amount*exchange_rate)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;Project level:
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;project_ml_bank/
│
├── project/  # Project source code
├── docs/
├── datasets/
├──────── train/
├──────── test/
├── README
├── HOW_TO_CONTRIBUTE
├── CODE_OF_CONDUCT
├── model.py
├── test.py
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
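
&lt;p&gt;If you start projects often, a small script can create this layout for you. The sketch below is only an illustration, assuming Python 3 and the standard &lt;code&gt;pathlib&lt;/code&gt; module; the helper name &lt;code&gt;create_project_skeleton&lt;/code&gt; is hypothetical, and the folder and file names simply mirror the tree above.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from pathlib import Path

# Names mirror the example tree above; adapt them to your own project (assumption)
FOLDERS = ["project", "docs", "datasets/train", "datasets/test"]
FILES = ["README", "HOW_TO_CONTRIBUTE", "CODE_OF_CONDUCT", "model.py", "test.py"]


def create_project_skeleton(root):
    """Create the folders and empty files of a project, keeping any that already exist."""
    root = Path(root)
    for folder in FOLDERS:
        # parents=True also creates the root and "datasets/" before "datasets/train/"
        (root / folder).mkdir(parents=True, exist_ok=True)
    for name in FILES:
        # touch() creates an empty file, or only updates the timestamp if it exists
        (root / name).touch()
    return root
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;create_project_skeleton("project_ml_bank")&lt;/code&gt; creates the tree in the current directory.&lt;/p&gt;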



&lt;h3&gt;
  
  
  Version control
&lt;/h3&gt;

&lt;p&gt;As mentioned before, you will work with a lot of code. How do you keep it organized, whether you work alone or in a team? Ideally, with version control: install Git and create a repository for your code on a GitHub, GitLab, or Bitbucket account. A few commands in the terminal (or cmd on Windows) are enough for these tasks.&lt;br&gt;
Version control is especially valuable in data science because we iterate frequently until our prediction model reaches the right metrics, and we may need to go back and test previous versions while improving precision with other features or variables.&lt;/p&gt;

&lt;h3&gt;
  
  
  Testing
&lt;/h3&gt;

&lt;p&gt;When working with code, it is important to detect faults early and to have confidence in the results. In data science, errors are not always easy to detect. What can testing help us avoid in data science projects? Here are some examples:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Incorrect encoding: the code does not detect UTF-8 encoding problems in the data (typically in dates, emails, or coordinates).&lt;/li&gt;
&lt;li&gt;Inappropriate results: the code does not clean the data correctly.&lt;/li&gt;
&lt;li&gt;Unexpected results: the model is heavily biased when evaluated on real data.&lt;/li&gt;
&lt;/ul&gt;
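
&lt;p&gt;The first two problems can often be caught with small checks before the data reaches a model. Below is a minimal sketch using only the Python standard library; the function names and the sample rows are hypothetical. The first function reports lines of a raw file that are not valid UTF-8, and the second reports required fields that are still empty after cleaning.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def find_invalid_utf8_lines(raw_bytes):
    """Return the 1-based numbers of the lines that cannot be decoded as UTF-8."""
    invalid = []
    for number, line in enumerate(raw_bytes.splitlines(), start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError:
            invalid.append(number)
    return invalid


def find_missing_values(rows, required_columns):
    """Return (row number, column) pairs whose value is missing or blank."""
    problems = []
    for number, row in enumerate(rows, start=1):
        for column in required_columns:
            value = row.get(column)
            if value is None or str(value).strip() == "":
                problems.append((number, column))
    return problems


# Hypothetical sample: line 3 was saved as Latin-1, so its "é" is invalid UTF-8
raw = "name,email\nAna,ana@example.com\n".encode("utf-8")
raw += "José,jose@example.com\n".encode("latin-1")
print(find_invalid_utf8_lines(raw))  # [3]

# Hypothetical rows after cleaning: row 2 still has a blank age and no email
cleaned = [{"age": "22", "email": "ana@example.com"}, {"age": " ", "email": None}]
print(find_missing_values(cleaned, ["age", "email"]))  # [(2, 'age'), (2, 'email')]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;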

&lt;h3&gt;
  
  
  How is it solved with testing?
&lt;/h3&gt;

&lt;p&gt;To detect these errors, we must review the quality and accuracy of our analysis, not only the quality of the code.&lt;br&gt;
The most commonly used techniques are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Test-driven development (TDD): a development process in which you write the tests for each task before writing the code that implements it.&lt;/li&gt;
&lt;li&gt;Unit test: a test that covers a single unit of code, such as a function, independently of the rest of the program.&lt;/li&gt;
&lt;/ul&gt;
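
&lt;p&gt;As an illustration, here is a minimal sketch of a unit test written with Python's built-in &lt;code&gt;unittest&lt;/code&gt; module. The cleaning function &lt;code&gt;normalize_email&lt;/code&gt; is hypothetical; following TDD, the test class would be written first and would fail until the function is implemented.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import unittest


def normalize_email(email):
    """Trim surrounding spaces and lowercase an email so duplicates can be matched."""
    return email.strip().lower()


class TestNormalizeEmail(unittest.TestCase):
    """Each test checks one behavior of the unit, independently of the rest of the program."""

    def test_removes_surrounding_spaces(self):
        self.assertEqual(normalize_email("  ana@example.com "), "ana@example.com")

    def test_lowercases_address(self):
        self.assertEqual(normalize_email("Ana@Example.COM"), "ana@example.com")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Saved in a file such as &lt;code&gt;test_cleaning.py&lt;/code&gt; (a hypothetical name), the tests run with &lt;code&gt;python -m unittest test_cleaning&lt;/code&gt;.&lt;/p&gt;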

&lt;h3&gt;
  
  
  Logging
&lt;/h3&gt;

&lt;p&gt;Logs help us review the events that occur while our program runs. For example, if you need to train a model on a very large dataset, you might leave it running overnight and, the next morning, review only the log to see what happened: whether it finished successfully or failed with errors. Logging is the process of recording messages that describe the events occurring while the software is running.&lt;/p&gt;

&lt;p&gt;The most common log levels are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DEBUG: use this level to trace each step or event that happens in the program.&lt;/li&gt;
&lt;li&gt;INFO: records informational messages confirming that the program works as expected; regularly scheduled operations are a typical example.&lt;/li&gt;
&lt;li&gt;ERROR: records the errors that have occurred.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Some tips for writing good log messages: be professional and clear, be concise, use normal capitalization (not all uppercase), include any useful information, and choose the appropriate logging level.&lt;/p&gt;
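
&lt;p&gt;Here is a minimal sketch with Python's standard &lt;code&gt;logging&lt;/code&gt; module for the overnight scenario described above. The logger name, the log file name and the batch-processing function are hypothetical; the idea is that step-by-step detail goes to DEBUG, progress to INFO and failures to ERROR, all written to a file you can read the next morning.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import logging

# Hypothetical logger for a long-running job
logger = logging.getLogger("overnight_training")


def configure_logging(log_file="training.log", level=logging.INFO):
    """Write log records to a file; records below the given level are ignored."""
    handler = logging.FileHandler(log_file)
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    )
    logger.addHandler(handler)
    logger.setLevel(level)
    return handler


def process_batches(batches):
    """Sum each batch, logging progress and skipping batches that fail."""
    processed = 0
    for number, batch in enumerate(batches, start=1):
        logger.debug("Starting batch %d with %d rows", number, len(batch))
        try:
            total = sum(batch)
        except TypeError:
            logger.error("Batch %d contains non-numeric values, skipping it", number)
            continue
        processed += 1
        logger.info("Batch %d finished, sum of values: %s", number, total)
    logger.info("Processed %d of %d batches", processed, len(batches))
    return processed
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Calling &lt;code&gt;configure_logging()&lt;/code&gt; once before &lt;code&gt;process_batches&lt;/code&gt; sends the INFO and ERROR messages to &lt;code&gt;training.log&lt;/code&gt;; passing &lt;code&gt;level=logging.DEBUG&lt;/code&gt; also records each step.&lt;/p&gt;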

&lt;h3&gt;
  
  
  Code review
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/cklf6DC5GH1Ob95u2o/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img width="480" src="https://i.giphy.com/media/cklf6DC5GH1Ob95u2o/giphy.gif" height="360"&gt;&lt;/a&gt;&lt;br&gt;
Code reviews help a team promote best programming practices and prepare code for production. They also help enforce standards, keep code readable, and share knowledge across the whole team.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusions
&lt;/h3&gt;

&lt;p&gt;These practices are basic, but they apply to many tasks in data science projects and have helped me bring order and quality to my work. The software perspective is essential in data science: with your software engineering experience, you can contribute and grow in this exciting world of data, reducing errors and improving your own productivity and your team's.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://i.giphy.com/media/VeNDat4n4Kre76oS1g/giphy.gif" class="article-body-image-wrapper"&gt;&lt;img width="480" src="https://i.giphy.com/media/VeNDat4n4Kre76oS1g/giphy.gif" height="192"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I invite you to follow me:&lt;/p&gt;

&lt;p&gt;Linkedin: &lt;a href="https://www.linkedin.com/in/victoriaubaldo/" rel="noopener noreferrer"&gt;https://www.linkedin.com/in/victoriaubaldo/&lt;/a&gt;&lt;br&gt;
Twitter: &lt;a href="https://twitter.com/VikyAle" rel="noopener noreferrer"&gt;https://twitter.com/VikyAle&lt;/a&gt;&lt;/p&gt;

</description>
      <category>datascience</category>
      <category>programming</category>
      <category>productivity</category>
      <category>computerscience</category>
    </item>
    <item>
      <title>4 Great initiatives to find open source projects</title>
      <dc:creator>Victoria Ubaldo</dc:creator>
      <pubDate>Fri, 23 Aug 2019 21:27:37 +0000</pubDate>
      <link>https://dev.to/vikyale/4-great-initiatives-to-find-open-source-projects-2cf1</link>
      <guid>https://dev.to/vikyale/4-great-initiatives-to-find-open-source-projects-2cf1</guid>
      <description>&lt;p&gt;Hello! Recently I wanted to find more projects to practice Python and JavaScript, so here are some resources for finding great projects to collaborate on and improve our tech skills :)&lt;/p&gt;

&lt;p&gt;CodeTriage (&lt;a href="https://www.codetriage.com" rel="noopener noreferrer"&gt;https://www.codetriage.com&lt;/a&gt;)&lt;br&gt;
You can pick up one issue a day from your favorite repositories, which helps you understand and learn more and stay involved with the code you rely on.&lt;/p&gt;

&lt;p&gt;issuehub.io (&lt;a href="http://issuehub.io" rel="noopener noreferrer"&gt;http://issuehub.io&lt;/a&gt;)&lt;br&gt;
You can search by issue labels to find the right project to help.&lt;/p&gt;

&lt;p&gt;Pull Request Roulette (&lt;a href="http://www.pullrequestroulette.com" rel="noopener noreferrer"&gt;http://www.pullrequestroulette.com&lt;/a&gt;) &lt;br&gt;
Find open source pull requests that need a reviewer before merging.&lt;/p&gt;

&lt;p&gt;Contrib (&lt;a href="https://gauger.io/contrib" rel="noopener noreferrer"&gt;https://gauger.io/contrib&lt;/a&gt;)&lt;br&gt;
Browse open source projects with issues for beginners.&lt;/p&gt;

&lt;p&gt;Good luck :) &lt;/p&gt;

</description>
      <category>opensource</category>
      <category>productivity</category>
      <category>beginners</category>
      <category>github</category>
    </item>
    <item>
      <title>Python Data Science Toolbox, my review</title>
      <dc:creator>Victoria Ubaldo</dc:creator>
      <pubDate>Wed, 17 Jul 2019 20:23:02 +0000</pubDate>
      <link>https://dev.to/vikyale/python-data-science-toolbox-my-review-1bbl</link>
      <guid>https://dev.to/vikyale/python-data-science-toolbox-my-review-1bbl</guid>
      <description>&lt;p&gt;Hello everyone! I'm currently learning more about how to use Python to analyze data, so I took some DataCamp courses. One of the latest was "Python Data Science Toolbox - Part 1", which teaches the art of writing functions and key concepts such as scoping and error handling in Python.&lt;/p&gt;

&lt;p&gt;It contains three parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Writing your own functions.&lt;/li&gt;
&lt;li&gt;Default arguments and scope.&lt;/li&gt;
&lt;li&gt;Lambda functions and error handling.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The course is a good fit if you know the basics of Python and want to improve your skills to become a data scientist or data engineer.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Writing your own functions: we often hear about pandas, NumPy and Matplotlib, the libraries most commonly used in data analytics. But as a developer, you also need to write your own functions to solve problems in your data.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Default arguments and scope: this part is important for building custom functions that need multiple parameters and return multiple values.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Lambda functions and error handling: "lambda" is a buzzword; this part teaches step by step how to build one and how to handle errors in your own functions, with plenty of examples.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
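
&lt;p&gt;To make the three parts more concrete, here is a small sketch in plain Python; it is not code from the course, and the function names and the sample passenger records are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def summarize_column(values, precision=2):
    """Return the rounded mean, the minimum and the maximum of a list of numbers.

    precision is a default argument, so callers can leave it out.
    Returning a tuple hands several values back at once.
    """
    if not values:
        raise ValueError("values must contain at least one number")
    mean = sum(values) / len(values)
    return round(mean, precision), min(values), max(values)


def make_counter():
    """Scope example: the inner function updates a variable of its enclosing function."""
    count = 0

    def increment():
        nonlocal count  # without nonlocal, count += 1 raises UnboundLocalError
        count += 1
        return count

    return increment


def safe_summary(values):
    """Error handling: report the problem instead of stopping the whole analysis."""
    try:
        return summarize_column(values)
    except ValueError as error:
        print(f"Skipping column: {error}")
        return None


# Hypothetical records; a lambda works well as a short, one-off sort key
passengers = [{"name": "Ana", "age": 34}, {"name": "Luis", "age": 22}]
youngest_first = sorted(passengers, key=lambda passenger: passenger["age"])
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;summarize_column([20, 30, 40])&lt;/code&gt; returns &lt;code&gt;(30.0, 20, 40)&lt;/code&gt;, while &lt;code&gt;safe_summary([])&lt;/code&gt; prints a message and returns &lt;code&gt;None&lt;/code&gt; instead of raising an exception.&lt;/p&gt;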

&lt;p&gt;Each part helps you write fantastic functions to analyze DataFrames, giving you superpowers to practice on big real-world projects.&lt;/p&gt;

&lt;p&gt;The course link is here: &lt;br&gt;
&lt;a href="https://www.datacamp.com/courses/python-data-science-toolbox-part-1" rel="noopener noreferrer"&gt;https://www.datacamp.com/courses/python-data-science-toolbox-part-1&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Thanks for reading :)&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
    </item>
  </channel>
</rss>
