<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Roberto Preste</title>
    <description>The latest articles on DEV Community by Roberto Preste (@robertopreste).</description>
    <link>https://dev.to/robertopreste</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F177337%2Ffe183f08-055f-4b00-a53d-4cca2f6818ae.jpeg</url>
      <title>DEV Community: Roberto Preste</title>
      <link>https://dev.to/robertopreste</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/robertopreste"/>
    <language>en</language>
    <item>
      <title>Adding users to the sudo group</title>
      <dc:creator>Roberto Preste</dc:creator>
      <pubDate>Tue, 28 May 2019 16:54:43 +0000</pubDate>
      <link>https://dev.to/robertopreste/adding-users-to-the-sudo-group-2914</link>
      <guid>https://dev.to/robertopreste/adding-users-to-the-sudo-group-2914</guid>
      <description>&lt;p&gt;One of the most important things to do after setting up a new Linux server (or after taking over an existing one) is to create a new user, possibly with &lt;em&gt;sudo powers&lt;/em&gt;. &lt;a href="https://en.wikipedia.org/wiki/Sudo"&gt;Sudo&lt;/a&gt; is a special Linux command that allows users to perform administrator tasks even if they are not system admins.&lt;/p&gt;

&lt;p&gt;The main reason for having a sudo user (or &lt;em&gt;sudoer&lt;/em&gt;) is that logging in as root is usually undesirable, since a single mistyped command can damage the whole system, yet we may still need to perform administrator tasks as a non-root user. Moreover, adding one or more users to the sudo group avoids the need to share the root credentials, because a sudo command asks for the user’s own password, not root’s.&lt;/p&gt;

&lt;p&gt;The rules that decide which users and groups may use sudo, together with their restrictions and permissions, live in the &lt;code&gt;/etc/sudoers&lt;/code&gt; configuration file. Explaining this file, and sudo usage in general, is quite an extensive topic, so we will only cover the case where we want to create a new user (or already have one) and add it to the sudoers.&lt;/p&gt;

&lt;h2&gt;Creating a new user&lt;/h2&gt;

&lt;p&gt;&lt;em&gt;If you already have a fully functioning non-root user and you just want to give it sudo privileges, you can skip to the next section.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;First of all, we may want to create a new user, which we will later add to the sudoers. To do this, we can run the following command in the terminal, as root:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;adduser &amp;lt;username&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;A new user called &lt;code&gt;&amp;lt;username&amp;gt;&lt;/code&gt; will be created, together with its own home folder, usually located in &lt;code&gt;/home/&amp;lt;username&amp;gt;/&lt;/code&gt;. The command will ask for a password for the new user, which we will need to type twice; the password is not echoed to the screen for security reasons.&lt;/p&gt;

&lt;p&gt;The command will also prompt us for some basic information about the new user, such as name, telephone number, etc. It is possible to leave these fields blank, though it is recommended to fill in at least the name field.&lt;/p&gt;

&lt;h2&gt;Adding a user to the sudo group&lt;/h2&gt;

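&lt;p&gt;It is possible to add a user to the sudo group without editing the &lt;code&gt;/etc/sudoers&lt;/code&gt; file. On Debian-based systems such as Ubuntu, where the &lt;code&gt;sudo&lt;/code&gt; group is granted sudo rights by default, this can be accomplished with the following command:&lt;br&gt;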
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;usermod &lt;span class="nt"&gt;-aG&lt;/span&gt; &lt;span class="nb"&gt;sudo&lt;/span&gt; &amp;lt;username&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



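&lt;p&gt;This command adds the user &lt;code&gt;&amp;lt;username&amp;gt;&lt;/code&gt; to the sudo group: &lt;code&gt;-G&lt;/code&gt; names the supplementary group, and &lt;code&gt;-a&lt;/code&gt; appends it to the groups the user already belongs to instead of replacing them.&lt;/p&gt;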

&lt;p&gt;From the next login on, the &lt;code&gt;&amp;lt;username&amp;gt;&lt;/code&gt; user will be able to run commands with administrator privileges just by prepending &lt;code&gt;sudo&lt;/code&gt; to them and providing their own password.&lt;/p&gt;
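
&lt;p&gt;To double-check the result, the group membership can also be inspected from Python. The following is only a minimal sketch, assuming a Unix system and the standard-library &lt;code&gt;pwd&lt;/code&gt; and &lt;code&gt;grp&lt;/code&gt; modules; the helper name &lt;code&gt;user_in_group&lt;/code&gt; is hypothetical, and the group is assumed to be called &lt;code&gt;sudo&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import grp
import pwd


def user_in_group(username, group="sudo"):
    """Return True if username belongs to group (primary or supplementary)."""
    try:
        user = pwd.getpwnam(username)
        group_entry = grp.getgrnam(group)
    except KeyError:
        # Unknown user or unknown group: report "not a member".
        return False
    # gr_mem lists supplementary members; the primary group is stored as a GID.
    return username in group_entry.gr_mem or user.pw_gid == group_entry.gr_gid


# "root" is only an example; replace it with the user to check.
print(user_in_group("root"))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The group database reflects the change right away, while sessions that are already open typically pick up the new group only at the next login.&lt;/p&gt;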

</description>
      <category>shell</category>
      <category>bash</category>
      <category>informatics</category>
      <category>linux</category>
    </item>
    <item>
      <title>Phred quality score</title>
      <dc:creator>Roberto Preste</dc:creator>
      <pubDate>Tue, 28 May 2019 16:06:39 +0000</pubDate>
      <link>https://dev.to/robertopreste/phred-quality-score-15f8</link>
      <guid>https://dev.to/robertopreste/phred-quality-score-15f8</guid>
      <description>&lt;p&gt;Next Generation Sequencing techniques have brought new insights into -omics data analysis, mostly thanks to their reliability in detecting biological variants. This reliability is usually measured using a value called &lt;a href="https://en.wikipedia.org/wiki/Phred_quality_score"&gt;Phred quality score&lt;/a&gt; (or Q score).&lt;/p&gt;

&lt;p&gt;The Phred score of a base is an integer value that represents the estimated probability of an error in base calling. Mathematically, a Q score is logarithmically related to the base-calling error probability P, and can be calculated using the following formula:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Q = -10 log&lt;sub&gt;10&lt;/sub&gt; P&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In the real world, a quality score of 20 means that there is a 1 in 100 chance that the base is incorrect; a quality score of 40 means that the chance that the base is called incorrectly is 1 in 10000.&lt;/p&gt;

&lt;p&gt;Because of the negative sign, the Phred score is inversely related to the error probability and therefore directly related to the base call accuracy: a higher Q score means a more reliable base call. Here is a useful table which shows this simple relationship:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Phred Quality Score&lt;/th&gt;
&lt;th&gt;Probability of incorrect base call&lt;/th&gt;
&lt;th&gt;Base call accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;1 in 10&lt;/td&gt;
&lt;td&gt;90%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;1 in 100&lt;/td&gt;
&lt;td&gt;99%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;1 in 1000&lt;/td&gt;
&lt;td&gt;99.9%&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;1 in 10000&lt;/td&gt;
&lt;td&gt;99.99%&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
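
&lt;p&gt;The rows of this table follow directly from the formula above. As a minimal sketch (the function names &lt;code&gt;error_prob_to_phred&lt;/code&gt; and &lt;code&gt;phred_to_error_prob&lt;/code&gt; are made up for this illustration), the conversion in both directions could be written as:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import math


def error_prob_to_phred(p):
    """Q = -10 * log10(P), for an error probability P in (0, 1]."""
    return -10 * math.log10(p)


def phred_to_error_prob(q):
    """Invert the formula: P = 10 ** (-Q / 10)."""
    return 10 ** (-q / 10)


# Print score, error probability and accuracy for the scores in the table.
for q in (10, 20, 30, 40):
    p = phred_to_error_prob(q)
    print(q, p, f"{(1 - p) * 100:.2f}%")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Since Phred scores are reported as integers, the result of &lt;code&gt;error_prob_to_phred&lt;/code&gt; would usually be rounded.&lt;/p&gt;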

&lt;p&gt;In &lt;a href="https://dev.to/robertopreste/counting-sequences-in-fasta-fastq-files-6oo"&gt;fastq&lt;/a&gt; files, Phred quality scores are usually represented using &lt;a href="https://en.wikipedia.org/wiki/ASCII"&gt;ASCII characters&lt;/a&gt;, such that the quality score of each base can be specified using a single character: the character whose ASCII code equals the Q score plus a fixed offset. While older Illumina data used an offset of 64 (ASCII_BASE 64), the offset of 33 (ASCII_BASE 33) is now the standard for NGS data:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Q Score&lt;/th&gt;
&lt;th&gt;ASCII char&lt;/th&gt;
&lt;th&gt;Q Score&lt;/th&gt;
&lt;th&gt;ASCII char&lt;/th&gt;
&lt;th&gt;Q Score&lt;/th&gt;
&lt;th&gt;ASCII char&lt;/th&gt;
&lt;th&gt;Q Score&lt;/th&gt;
&lt;th&gt;ASCII char&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;!&lt;/td&gt;
&lt;td&gt;11&lt;/td&gt;
&lt;td&gt;,&lt;/td&gt;
&lt;td&gt;22&lt;/td&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;32&lt;/td&gt;
&lt;td&gt;A&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;"&lt;/td&gt;
&lt;td&gt;12&lt;/td&gt;
&lt;td&gt;-&lt;/td&gt;
&lt;td&gt;23&lt;/td&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;33&lt;/td&gt;
&lt;td&gt;B&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;#&lt;/td&gt;
&lt;td&gt;13&lt;/td&gt;
&lt;td&gt;.&lt;/td&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;34&lt;/td&gt;
&lt;td&gt;C&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;$&lt;/td&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;/&lt;/td&gt;
&lt;td&gt;25&lt;/td&gt;
&lt;td&gt;:&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;D&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;%&lt;/td&gt;
&lt;td&gt;15&lt;/td&gt;
&lt;td&gt;0&lt;/td&gt;
&lt;td&gt;26&lt;/td&gt;
&lt;td&gt;;&lt;/td&gt;
&lt;td&gt;36&lt;/td&gt;
&lt;td&gt;E&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;&amp;amp;&lt;/td&gt;
&lt;td&gt;16&lt;/td&gt;
&lt;td&gt;1&lt;/td&gt;
&lt;td&gt;27&lt;/td&gt;
&lt;td&gt;&amp;lt;&lt;/td&gt;
&lt;td&gt;37&lt;/td&gt;
&lt;td&gt;F&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;'&lt;/td&gt;
&lt;td&gt;17&lt;/td&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;28&lt;/td&gt;
&lt;td&gt;=&lt;/td&gt;
&lt;td&gt;38&lt;/td&gt;
&lt;td&gt;G&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;(&lt;/td&gt;
&lt;td&gt;18&lt;/td&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;29&lt;/td&gt;
&lt;td&gt;&amp;gt;&lt;/td&gt;
&lt;td&gt;39&lt;/td&gt;
&lt;td&gt;H&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;)&lt;/td&gt;
&lt;td&gt;19&lt;/td&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;30&lt;/td&gt;
&lt;td&gt;?&lt;/td&gt;
&lt;td&gt;40&lt;/td&gt;
&lt;td&gt;I&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;*&lt;/td&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;31&lt;/td&gt;
&lt;td&gt;@&lt;/td&gt;
&lt;td&gt;41&lt;/td&gt;
&lt;td&gt;J&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;10&lt;/td&gt;
&lt;td&gt;+&lt;/td&gt;
&lt;td&gt;21&lt;/td&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;td&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Even though there are lots of Python, &lt;a href="https://biopython.org"&gt;Biopython&lt;/a&gt; and stand-alone tools for dealing with Phred quality scores, a simple command to convert an ASCII character to its corresponding quality score is the following (from the terminal):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s1"&gt;'print(ord("&amp;lt;ASCII&amp;gt;")-33)'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Or, when working in a Python 3 session:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;ord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"&amp;lt;ASCII&amp;gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;33&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In both cases, just replace &lt;code&gt;&amp;lt;ASCII&amp;gt;&lt;/code&gt; with the actual ASCII character and that will do the trick (quotes and the backslash, which are special inside a string literal, must be escaped).&lt;/p&gt;
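
&lt;p&gt;The same idea extends to a whole quality line. The sketch below is only an illustration: the helper name &lt;code&gt;decode_quality&lt;/code&gt; and its &lt;code&gt;offset&lt;/code&gt; parameter are assumptions, with 33 as the default and 64 available for older Illumina data.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def decode_quality(quality, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores."""
    return [ord(char) - offset for char in quality]


print(decode_quality("!+5?I"))  # [0, 10, 20, 30, 40]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;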

</description>
      <category>python</category>
      <category>analysis</category>
      <category>bioinformatics</category>
    </item>
    <item>
      <title>Counting sequences in Fasta/Fastq files</title>
      <dc:creator>Roberto Preste</dc:creator>
      <pubDate>Mon, 27 May 2019 18:05:57 +0000</pubDate>
      <link>https://dev.to/robertopreste/counting-sequences-in-fasta-fastq-files-6oo</link>
      <guid>https://dev.to/robertopreste/counting-sequences-in-fasta-fastq-files-6oo</guid>
      <description>&lt;p&gt;A well-established bioinformatician usually has a handful of appropriate informatics tools to manipulate and analyse genomic data, for example counting sequences in a file.&lt;/p&gt;

&lt;p&gt;Nonetheless, in some cases it may be useful to rely on standard Unix commands, for example when your trusty laptop is not available or you’re working on someone else’s machine.  &lt;/p&gt;




&lt;h2&gt;FASTA files&lt;/h2&gt;

&lt;p&gt;A &lt;a href="https://en.wikipedia.org/wiki/FASTA_format"&gt;.fasta file&lt;/a&gt; is a simple plain text file in which every sequence is represented by a header line, beginning with &amp;gt; and containing the sequence identifier and details, followed by a number of lines containing the actual sequence:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;gt;SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL

&amp;gt;SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So if you want to count the number of sequences contained in a .fasta file, you can easily do so with the &lt;code&gt;grep&lt;/code&gt; command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="s2"&gt;"&amp;gt;"&lt;/span&gt; file.fasta | &lt;span class="nb"&gt;wc&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This command selects the lines that contain a &lt;code&gt;&amp;gt;&lt;/code&gt; character and then counts them. More specifically, &lt;code&gt;grep&lt;/code&gt; prints every line matching the pattern, which in a well-formed .fasta file means the header lines only, and its output is piped to the &lt;code&gt;wc&lt;/code&gt; (word count) command, which thanks to the &lt;code&gt;-l&lt;/code&gt; option counts lines instead of words. To match strictly the lines that start with &lt;code&gt;&amp;gt;&lt;/code&gt;, the pattern can be anchored as &lt;code&gt;"^&amp;gt;"&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Alternatively, &lt;code&gt;grep&lt;/code&gt; can do the counting by itself:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;grep&lt;/span&gt; &lt;span class="nt"&gt;-c&lt;/span&gt; &lt;span class="s2"&gt;"&amp;gt;"&lt;/span&gt; file.fasta
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;-c&lt;/code&gt; option will instruct the command to count the matching lines, instead of just printing them to the screen, without the need for &lt;code&gt;wc -l&lt;/code&gt; as seen above.&lt;/p&gt;
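
&lt;p&gt;For comparison, the same count can be sketched in Python. This is just an illustration under the assumption that header lines start with the marker character; the function name &lt;code&gt;count_fasta_sequences&lt;/code&gt; is hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def count_fasta_sequences(lines):
    """Count header lines, i.e. lines whose first character is '>'."""
    return sum(1 for line in lines if line.startswith(">"))


example = [
    ">SEQUENCE_1",
    "MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG",
    ">SEQUENCE_2",
    "SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI",
]
print(count_fasta_sequences(example))  # 2

# With a real file (the file name is a placeholder):
# with open("file.fasta") as handle:
#     print(count_fasta_sequences(handle))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;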

&lt;h2&gt;FASTQ files&lt;/h2&gt;

&lt;p&gt;It’s not uncommon to work with &lt;a href="https://en.wikipedia.org/wiki/FASTQ_format"&gt;.fastq files&lt;/a&gt; too, which are similar to .fasta files but also report the quality of each base. In this case the &lt;code&gt;&amp;gt;&lt;/code&gt; character, used to mark the beginning of a sequence in .fasta files, is replaced by &lt;code&gt;@&lt;/code&gt;; however, searching for its occurrences as shown above may be misleading, because the &lt;code&gt;@&lt;/code&gt; character is also a valid &lt;a href="https://dev.to/robertopreste/phred-quality-score-15f8"&gt;quality score&lt;/a&gt; symbol and can therefore appear at the start of a quality line.&lt;/p&gt;

&lt;p&gt;There is a trick for counting sequences in a .fastq file, though, and it relies on the usual layout of this kind of file. Each sequence is represented by four lines: the first is the sequence identifier, the second is the actual sequence, the third is a separator that usually contains only a &lt;code&gt;+&lt;/code&gt; placeholder, and the last contains the sequence quality scores:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;@SEQ_ID1
GATTTGGGGTTCAAAGCAGTATCGATCAAATAGTAAATCCATTTGTTCAACTCACAGTTT
+
!''*((((***+))%%%++)(%%%%).1***-+*''))**55CCF&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;&amp;gt;CCCCCCC65
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This means that counting the number of sequences is easier than expected: as long as every record spans exactly four lines, it only requires dividing the number of lines in the file by four. This can be done in Bourne-compatible shells using these commands:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;LINES&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;file.fastq | &lt;span class="nb"&gt;wc&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;
&lt;span class="nv"&gt;READS&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sb"&gt;`&lt;/span&gt;&lt;span class="nb"&gt;expr&lt;/span&gt; &lt;span class="nv"&gt;$LINES&lt;/span&gt; / 4&lt;span class="sb"&gt;`&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nv"&gt;$READS&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;On modern shells, such as Bash, this can be done with a simple one-liner:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;expr&lt;/span&gt; &lt;span class="si"&gt;$(&lt;/span&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;file.fastq | &lt;span class="nb"&gt;wc&lt;/span&gt; &lt;span class="nt"&gt;-l&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt; / 4
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
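
&lt;p&gt;Dividing by four silently assumes that every record really spans four lines. As a hedged sketch, a Python version could check that assumption while counting; the function name &lt;code&gt;count_fastq_records&lt;/code&gt; and its error messages are made up for this example.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def count_fastq_records(lines):
    """Count four-line FASTQ records, checking the '@' and '+' marker lines."""
    rows = [line.rstrip("\n") for line in lines]
    if len(rows) % 4 != 0:
        raise ValueError("line count is not a multiple of four")
    for start in range(0, len(rows), 4):
        # Line 1 of each record is the identifier, line 3 the separator.
        if not rows[start].startswith("@") or not rows[start + 2].startswith("+"):
            raise ValueError(f"malformed record at line {start + 1}")
    return len(rows) // 4


example = [
    "@SEQ_ID1",
    "GATTTGGGG",
    "+",
    "!''*((((*",
]
print(count_fastq_records(example))  # 1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Passing an open file handle instead of a list works the same way, since the function only iterates over lines.&lt;/p&gt;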






&lt;p&gt;With these simple tricks, you can easily find the number of sequences in your .fasta or .fastq files, right from your Unix shell.  &lt;/p&gt;

</description>
      <category>bioinformatics</category>
      <category>shell</category>
      <category>linux</category>
      <category>bash</category>
    </item>
  </channel>
</rss>
