<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gustavo Tavares</title>
    <description>The latest articles on DEV Community by Gustavo Tavares (@xguhx).</description>
    <link>https://dev.to/xguhx</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F701057%2F84ab1081-6031-4ec9-a811-34588f4a34c8.png</url>
      <title>DEV Community: Gustavo Tavares</title>
      <link>https://dev.to/xguhx</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/xguhx"/>
    <language>en</language>
    <item>
      <title>About Makefiles – Part I</title>
      <dc:creator>Gustavo Tavares</dc:creator>
      <pubDate>Fri, 22 Apr 2022 17:16:50 +0000</pubDate>
      <link>https://dev.to/xguhx/about-makefiles-part-i-4oo7</link>
      <guid>https://dev.to/xguhx/about-makefiles-part-i-4oo7</guid>
      <description>&lt;p&gt;Hello,&lt;/p&gt;

&lt;p&gt;Today I will briefly write about &lt;code&gt;Makefiles&lt;/code&gt; and &lt;code&gt;make&lt;/code&gt;, an efficient automation tool for compiling many programs at once.&lt;/p&gt;

&lt;p&gt;I am writing this because, while doing my SPO600 project, I encountered many &lt;code&gt;Makefiles&lt;/code&gt; with different functions and I was curious about how they work.&lt;/p&gt;

&lt;p&gt;The first thing I noticed is that not all &lt;code&gt;Makefiles&lt;/code&gt; compile things; most of them just hold flags and configuration.&lt;br&gt;
A common use of &lt;code&gt;Makefiles&lt;/code&gt; is to set variables to be included in other &lt;code&gt;Makefiles&lt;/code&gt;, like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6V3GY23Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8q277d2bbzg37elh098s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6V3GY23Q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8q277d2bbzg37elh098s.png" alt="A1" width="880" height="577"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But first, let's understand how they work from the beginning:&lt;/p&gt;
&lt;h2&gt;
  
  
  What is the purpose of a &lt;code&gt;Makefile&lt;/code&gt;?
&lt;/h2&gt;

&lt;p&gt;A &lt;code&gt;Makefile&lt;/code&gt; is a set of instructions that will be read when the &lt;code&gt;make&lt;/code&gt; command is called.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;make&lt;/code&gt; then reads the &lt;code&gt;Makefile&lt;/code&gt;, follows its instructions, and compiles and links the program. &lt;/p&gt;

&lt;p&gt;A &lt;code&gt;Makefile&lt;/code&gt; usually defines several targets that you can ask &lt;code&gt;make&lt;/code&gt; to build, such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;  &lt;code&gt;make&lt;/code&gt;: builds the first target in the &lt;code&gt;Makefile&lt;/code&gt; (the default goal)
&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;make clean&lt;/code&gt;: deletes the binaries and other files created by building the program&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;make install&lt;/code&gt;: copies the built executables and their auxiliary files into the directories where they should be installed (creating a tar package is usually the job of targets such as &lt;code&gt;make dist&lt;/code&gt;)&lt;/li&gt;
&lt;li&gt;  &lt;code&gt;make distclean&lt;/code&gt; and &lt;code&gt;make realclean&lt;/code&gt;: delete the same files as &lt;code&gt;make clean&lt;/code&gt; plus more, such as TAGS files, generated makefiles and &lt;code&gt;config.status&lt;/code&gt; files; &lt;code&gt;make realclean&lt;/code&gt; may also remove Info files generated from &lt;code&gt;.texinfo&lt;/code&gt; sources
&lt;/li&gt;
&lt;/ul&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The makefile can also tell make how to run miscellaneous commands when explicitly asked (for example, to remove certain files as a clean-up operation).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;


&lt;p&gt;Source: &lt;a href="https://www.gnu.org/software/make/manual/html_node/Introduction.html"&gt;GNU&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  How does &lt;code&gt;make&lt;/code&gt; work?
&lt;/h2&gt;

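&lt;p&gt;The &lt;code&gt;make&lt;/code&gt; command looks in the current directory for a &lt;code&gt;Makefile&lt;/code&gt; (GNU make tries &lt;code&gt;GNUmakefile&lt;/code&gt;, &lt;code&gt;makefile&lt;/code&gt; and &lt;code&gt;Makefile&lt;/code&gt;, in that order) and then starts processing the first rule. A rule is an instruction with a target, its prerequisites and a recipe.&lt;br&gt;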
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;A target is usually the name of a file that is generated by a program; examples of targets are executable or object files. A target can also be the name of an action to carry out, such as ‘clean’ (see Phony Targets).
A prerequisite is a file that is used as input to create the target. A target often depends on several files.
A recipe is an action that make carries out. A recipe may have more than one command, either on the same line or each on its own line. Please note: you need to put a tab character at the beginning of every recipe line! This is an obscurity that catches the unwary. If you prefer to prefix your recipes with a character other than tab, you can set the .RECIPEPREFIX variable to an alternate character (see Special Variables).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Source: &lt;a href="https://dev.toRule%20Introduction%20(GNU%20make)"&gt;GNU&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Looks easy, right?&lt;br&gt;
Let's take a look at this next screenshot, which may change our minds:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--eZe8mVVY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pmrxpa8zvm2hjcjejhme.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--eZe8mVVY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/pmrxpa8zvm2hjcjejhme.png" alt="A2" width="880" height="488"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we can see, it is really hard to understand what’s going on because there are a lot of variables being used.&lt;/p&gt;
&lt;h2&gt;
  
  
  Can we use Variables to make our life easier?
&lt;/h2&gt;

&lt;p&gt;Yes, we can. To avoid duplication and reduce redundancy, we can define a variable.&lt;br&gt;
In the screenshot above, the second line declares a variable and assigns a value to it.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;SKIPHEADERS = compat/s32pthreads.h
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This way, every time we want to use it, we just write &lt;code&gt;$(SKIPHEADERS)&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;For example: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--nA3kJQq---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tn2e3pq8ie7pow93hkaa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--nA3kJQq---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/tn2e3pq8ie7pow93hkaa.png" alt="A3" width="880" height="74"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;TESTOOLS&lt;/code&gt; is declared and then immediately used in &lt;code&gt;HOSTPROGS&lt;/code&gt;.&lt;/p&gt;
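&lt;p&gt;Variables defined with &lt;code&gt;=&lt;/code&gt; are expanded when they are used, and a value can itself reference other variables. That is how a value like &lt;code&gt;TESTOOLS&lt;/code&gt; can end up inside &lt;code&gt;HOSTPROGS&lt;/code&gt;. The C sketch below is a simplified model of that substitution, not GNU make's code. It has several limitations:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It only understands the &lt;code&gt;$(NAME)&lt;/code&gt; form.&lt;/li&gt;
&lt;li&gt;It does not handle nested parentheses or functions such as &lt;code&gt;$(patsubst ...)&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;It does not detect a variable that references itself.&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The &lt;code&gt;mk_var&lt;/code&gt; type and the &lt;code&gt;lookup&lt;/code&gt; and &lt;code&gt;expand&lt;/code&gt; functions are hypothetical.&lt;/p&gt;

```c
#include "assert.h"
#include "string.h"

/* Hypothetical variable table entry, as in NAME = value */
struct mk_var {
    const char *name;
    const char *value;
};

/* Finds NAME (pointer plus length); undefined variables expand to "". */
static const char *lookup(const struct mk_var *vars, int nvars,
                          const char *name, size_t len)
{
    int i;
    for (i = 0; nvars > i; i++) {
        if (strlen(vars[i].name) == len)
            if (strncmp(vars[i].name, name, len) == 0)
                return vars[i].value;
    }
    return "";
}

/*
 * Appends the expansion of `text` to `out` (capacity `cap`, current
 * length *pos). Values are expanded again, so a variable may reference
 * other variables. Output that does not fit is silently truncated.
 */
static void expand(const char *text, const struct mk_var *vars, int nvars,
                   char *out, size_t cap, size_t *pos)
{
    assert(cap > 0);
    while (*text != '\0') {
        const char *close = NULL;
        if (text[0] == '$')
            if (text[1] == '(')
                close = strchr(text + 2, ')');
        if (close != NULL) {
            /* Expand the referenced variable's value recursively. */
            expand(lookup(vars, nvars, text + 2, (size_t)(close - (text + 2))),
                   vars, nvars, out, cap, pos);
            text = close + 1;
        } else {
            if (cap > *pos + 1)
                out[(*pos)++] = *text; /* copy ordinary characters */
            text++;
        }
    }
    out[*pos] = '\0';
}
```

&lt;p&gt;Make also supports &lt;code&gt;:=&lt;/code&gt; (simply expanded) variables, which are expanded once, when they are defined, instead of at every use.&lt;/p&gt;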




&lt;p&gt;I will continue explaining &lt;code&gt;Makefiles&lt;/code&gt; and &lt;code&gt;make&lt;/code&gt; in the next post!&lt;br&gt;
See you there, and thank you for reading!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>SPO 600 – Project Step 3 – Analysis</title>
      <dc:creator>Gustavo Tavares</dc:creator>
      <pubDate>Wed, 20 Apr 2022 23:05:56 +0000</pubDate>
      <link>https://dev.to/xguhx/spo-600-project-step-3-analysis-29db</link>
      <guid>https://dev.to/xguhx/spo-600-project-step-3-analysis-29db</guid>
      <description>&lt;h2&gt;
  
  
  Hello!
&lt;/h2&gt;

&lt;p&gt;It's time for my final blog post about SPO600:&lt;br&gt;
the analysis of what I did in Step 2.&lt;br&gt;
As a quick recap, in Step 2 I was able to apply auto-vectorization to the FFmpeg package. &lt;/p&gt;

&lt;p&gt;So let's start our analysis.&lt;/p&gt;
&lt;h2&gt;
  
  
  A snippet of the disassembly (using &lt;code&gt;objdump -d&lt;/code&gt;)
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuylcua9gt841wit32luw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuylcua9gt841wit32luw.png" alt="snippet"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Because I used auto-vectorization, many places in the code were compiled to sve2 vector code. &lt;br&gt;
Let's take a look at the first &lt;code&gt;whilelo&lt;/code&gt; in the screenshot and see if we can understand what's going on, using the arm64 documentation:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;“WHILELO
While incrementing unsigned scalar lower than scalar

Generate a predicate that starting from the lowest numbered element is true while the incrementing value of the first, unsigned scalar operand is lower than the second scalar operand and false thereafter up to the highest numbered element.

The full width of the scalar operands is significant for the purposes of comparison, and the full width first operand is incremented by one for each destination predicate element, irrespective of the predicate result element size. The first general-purpose source register is not itself updated.”
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From: &lt;a href="https://developer.arm.com/documentation/ddi0596/2020-12/SVE-Instructions/WHILELO--While-incrementing-unsigned-scalar-lower-than-scalar-" rel="noopener noreferrer"&gt;https://developer.arm.com/documentation/ddi0596/2020-12/SVE-Instructions/WHILELO--While-incrementing-unsigned-scalar-lower-than-scalar-&lt;/a&gt;&lt;/p&gt;

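&lt;p&gt;&lt;code&gt;whilelo&lt;/code&gt; is not a loop by itself. It builds the predicate that decides which elements the vectorized loop processes. Here it writes the scalable predicate register p0, using the zero register wzr as the starting value and register w1 as the limit.&lt;br&gt;
The next instruction, mov, moves #0x0 into register x0.&lt;br&gt;
Another mov moves #0 into vector register z1.&lt;/p&gt;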
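&lt;p&gt;To see what &lt;code&gt;whilelo&lt;/code&gt; computes, here is a scalar C model of it, for illustration only. Lane i of the predicate is active while start + i is lower than the limit (an unsigned comparison). After the first inactive lane, every higher lane is inactive too. The lane count &lt;code&gt;VL_LANES&lt;/code&gt; = 4 is an assumption: four 64-bit lanes, that is, a 256-bit vector. Real SVE implementations may use vector lengths between 128 and 2048 bits. The function name &lt;code&gt;whilelo_model&lt;/code&gt; is hypothetical.&lt;/p&gt;

```c
#include "assert.h"

/* Assumed number of 64-bit lanes per vector (a 256-bit SVE vector). */
#define VL_LANES 4

/*
 * Scalar model of WHILELO: lanes are active while start + lane is lower
 * than limit, and inactive from the first failing lane up to the highest.
 * Returns the number of active lanes.
 */
static int whilelo_model(unsigned long start, unsigned long limit, int pred[VL_LANES])
{
    int i, active = 0, running = 1;
    assert(pred != 0);
    for (i = 0; VL_LANES > i; i++) {
        if (start + (unsigned long)i >= limit)
            running = 0; /* false from here up to the highest lane */
        pred[i] = running;
        active += running;
    }
    return active;
}
```

&lt;p&gt;In the snippet the start is wzr (zero) and the limit is w1. If w1 is at least &lt;code&gt;VL_LANES&lt;/code&gt;, every lane is active; otherwise only the first w1 lanes are. This is how the last, partial block of a loop can be handled without a separate scalar tail.&lt;/p&gt;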

&lt;p&gt;The next instruction is LD1D (vector plus immediate):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;“Gather load doublewords to vector (immediate index)
Gather load of doublewords to active elements of a vector register from memory addresses generated by a vector base plus immediate index. The index is a multiple of 8 in the range 0 to 248. Inactive elements will not cause a read from Device memory or signal faults, and are set to zero in the destination vector.
“
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From: &lt;a href="https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/LD1D--vector-plus-immediate---Gather-load-doublewords-to-vector--immediate-index--" rel="noopener noreferrer"&gt;https://developer.arm.com/documentation/ddi0596/2021-12/SVE-Instructions/LD1D--vector-plus-immediate---Gather-load-doublewords-to-vector--immediate-index--&lt;/a&gt; &lt;br&gt;
This one is more complex. It loads doublewords (64-bit elements) into a vector register, but only for the elements that are active in the predicate, preparing the data for the next instruction.&lt;br&gt;
Here the doublewords are loaded into z0.&lt;/p&gt;

&lt;p&gt;The next one is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;“ST1D (scalar plus immediate)
Contiguous store doublewords from vector (immediate index)

Contiguous store of doublewords from elements of a vector register to the memory address generated by a 64-bit scalar base and immediate index in the range -8 to 7 which is multiplied by the vector's in-memory size, irrespective of predication, and added to the base address. Inactive elements are not written to memory.”
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From: &lt;a href="https://developer.arm.com/documentation/ddi0596/2020-12/SVE-Instructions/ST1D--scalar-plus-immediate---Contiguous-store-doublewords-from-vector--immediate-index--" rel="noopener noreferrer"&gt;https://developer.arm.com/documentation/ddi0596/2020-12/SVE-Instructions/ST1D--scalar-plus-immediate---Contiguous-store-doublewords-from-vector--immediate-index--&lt;/a&gt;&lt;/p&gt;

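&lt;p&gt;This one takes what the previous instruction loaded into z0 and stores its active elements (the ones enabled by the predicate) to memory. The address is a 64-bit scalar base register plus an immediate index scaled by the vector size. Inactive elements are not written.&lt;/p&gt;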

&lt;p&gt;The next one:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;INCB, INCD, INCH, INCW (scalar)
Increment scalar by multiple of predicate constraint element count
Determines the number of active elements implied by the named predicate constraint, multiplies that by an immediate in the range 1 to 16 inclusive, and then uses the result to increment the scalar destination.
The named predicate constraint limits the number of active elements in a single predicate to:
* A fixed number (VL1 to VL256)
* The largest power of two (POW2)
* The largest multiple of three or four (MUL3 or MUL4)
* All available, implicitly a multiple of two (ALL).
Unspecified or out of range constraint encodings generate an empty predicate or zero element count rather than Undefined Instruction exception.
It has encodings from 4 classes: Byte , Doubleword , Halfword and Word

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From: &lt;a href="https://developer.arm.com/documentation/ddi0596/2020-12/SVE-Instructions/INCB--INCD--INCH--INCW--scalar---Increment-scalar-by-multiple-of-predicate-constraint-element-count-" rel="noopener noreferrer"&gt;https://developer.arm.com/documentation/ddi0596/2020-12/SVE-Instructions/INCB--INCD--INCH--INCW--scalar---Increment-scalar-by-multiple-of-predicate-constraint-element-count-&lt;/a&gt;&lt;/p&gt;

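&lt;p&gt;The last instruction of the loop increments x0 by a multiple of the predicate-constraint element count, typically the number of elements in one vector. That way the next &lt;code&gt;whilelo&lt;/code&gt; covers the next block of elements.&lt;/p&gt;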
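&lt;p&gt;Put together, these instructions typically form the usual SVE loop pattern:&lt;/p&gt;
&lt;ol&gt;
&lt;li&gt;Build a predicate with &lt;code&gt;whilelo&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Load the active elements.&lt;/li&gt;
&lt;li&gt;Store them.&lt;/li&gt;
&lt;li&gt;Advance the index with an &lt;code&gt;inc&lt;/code&gt; instruction, and repeat while the predicate still has active lanes.&lt;/li&gt;
&lt;/ol&gt;
&lt;p&gt;The C sketch below models that pattern for a simple copy of 64-bit values. It rests on assumptions: a fixed &lt;code&gt;VL_LANES&lt;/code&gt; of 4, a plain copy as the loop body, and the hypothetical function name &lt;code&gt;sve_style_copy&lt;/code&gt;. It is not the actual FFmpeg code behind the screenshot.&lt;/p&gt;

```c
#include "assert.h"
#include "stdint.h"

#define VL_LANES 4 /* assumed 64-bit lanes per vector (256-bit SVE) */

/*
 * Scalar model of a predicated SVE copy loop:
 *   whilelo p0, x0, n  -> lane active while x0 + lane is lower than n
 *   ld1d    z0, p0/z   -> load active lanes, inactive lanes become 0
 *   st1d    z0, p0     -> store only the active lanes
 *   incd    x0         -> x0 += number of 64-bit lanes in a vector
 * Returns the number of vector iterations executed.
 */
static int sve_style_copy(uint64_t *dst, const uint64_t *src, uint64_t n)
{
    uint64_t x0 = 0;
    int iterations = 0;
    assert(dst != 0);
    assert(src != 0);
    for (;;) {
        int pred[VL_LANES];
        uint64_t z0[VL_LANES];
        int lane, any = 0;

        /* whilelo: predicate for this block of elements */
        for (lane = 0; VL_LANES > lane; lane++) {
            pred[lane] = n > x0 + (uint64_t)lane;
            any += pred[lane];
        }
        if (any == 0)
            break; /* no active lanes left: the loop is finished */

        /* ld1d with zeroing predication */
        for (lane = 0; VL_LANES > lane; lane++)
            z0[lane] = pred[lane] ? src[x0 + (uint64_t)lane] : 0;

        /* st1d: inactive lanes are not written to memory */
        for (lane = 0; VL_LANES > lane; lane++)
            if (pred[lane])
                dst[x0 + (uint64_t)lane] = z0[lane];

        x0 += VL_LANES; /* incd */
        iterations++;
    }
    return iterations;
}
```

&lt;p&gt;There is no separate scalar loop for the leftover elements: the last iteration simply runs with only some lanes active. This is one of the key differences from fixed-width neon code.&lt;/p&gt;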




&lt;p&gt;Looking at the &lt;code&gt;objdump&lt;/code&gt; output, I can see that many pieces of the code were vectorized for sve2. These instructions operate on whole vector registers and process several elements per instruction. They should therefore execute faster than the previous implementation without vectors, although that would have to be confirmed by measuring on real sve2 hardware.&lt;/p&gt;

&lt;p&gt;As I showed in Step 2, I had already run some tests to check that it worked. I processed a video with the newly compiled sve2 FFmpeg, and the output was correct. &lt;/p&gt;

&lt;p&gt;I have two directories: FFmpeg, with the sve2 implementation, and FFmpeg0, without it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7mtg2bza7lunjffq335v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7mtg2bza7lunjffq335v.png" alt="directories"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I ran the same test on both, using the same input file, “Flame.avi”.&lt;br&gt;
Here are some screenshots.&lt;br&gt;
On FFmpeg0:&lt;br&gt;
 &lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0fgzwoqpkkdtu4ua1sv7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0fgzwoqpkkdtu4ua1sv7.png" alt="input"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I used this command to convert my input file.&lt;br&gt;
Here is the result of the command:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fio23480qqcotu7i61qbi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fio23480qqcotu7i61qbi.png" alt="result"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;And here are two screenshots, of Flame.avi and output.avi, played with &lt;code&gt;vlc&lt;/code&gt; from the terminal:&lt;br&gt;
Flame.avi:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fve9ronog18shyricwr6o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fve9ronog18shyricwr6o.png" alt="f1"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Output.avi:&lt;br&gt;
 &lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F59085bqvpsv7jnt09796.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F59085bqvpsv7jnt09796.png" alt="o1"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;p&gt;On FFmpeg (with sve2):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyf3hhzpa987wwzy8bm5o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyf3hhzpa987wwzy8bm5o.png" alt="sve2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Flame.avi:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6tp4w40adgt18lf26dv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv6tp4w40adgt18lf26dv.png" alt="f2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Output.avi (made with sve2 instructions):&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj3zpezat49dpgqlp5w0d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj3zpezat49dpgqlp5w0d.png" alt="o2"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Both output.avi files played exactly the same way.&lt;/p&gt;




&lt;h2&gt;
  
  
  Other Analysis:
&lt;/h2&gt;

&lt;p&gt;Looking ahead to when armv9 hardware is available:&lt;br&gt;
auto-vectorization will soon be enabled at gcc's -O2 level, which means it will be considered a safe optimization. &lt;/p&gt;

&lt;p&gt;The FFmpeg developers will have to change the &lt;code&gt;configure&lt;/code&gt; script so that it no longer disables vectorization. From then on, every FFmpeg build compiled with that flag for an sve2-capable target will contain sve2 instructions.&lt;/p&gt;

&lt;p&gt;A check for whether the hardware is armv9 or armv8 will also be needed. On armv9, use the configuration with the sve2 optimizations; on armv8, keep using only the sve optimizations, as it does right now.&lt;/p&gt;
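&lt;p&gt;One way to express that check at build time is with the feature macros that GCC defines for the architecture being compiled for:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;&lt;code&gt;__ARM_FEATURE_SVE2&lt;/code&gt; when sve2 is enabled, for example with &lt;code&gt;-march=armv8-a+sve2&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;__ARM_FEATURE_SVE&lt;/code&gt; for sve&lt;/li&gt;
&lt;li&gt;&lt;code&gt;__ARM_NEON&lt;/code&gt; for neon&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;The C sketch below only illustrates the idea, and &lt;code&gt;build_simd_level&lt;/code&gt;, &lt;code&gt;simd_rank&lt;/code&gt; and &lt;code&gt;built_with_at_least&lt;/code&gt; are hypothetical names. These macros describe the compilation target, not the CPU the program later runs on. A binary built for sve2 still needs sve2 hardware, or an emulator such as &lt;code&gt;qemu-aarch64&lt;/code&gt;, to run.&lt;/p&gt;

```c
#include "assert.h"
#include "string.h"

/* Reports the most capable Arm SIMD extension enabled for this build. */
static const char *build_simd_level(void)
{
#if defined(__ARM_FEATURE_SVE2)
    return "sve2";    /* e.g. built with -march=armv8-a+sve2 */
#elif defined(__ARM_FEATURE_SVE)
    return "sve";
#elif defined(__ARM_NEON)
    return "neon";
#else
    return "generic"; /* not an Arm SIMD build (e.g. x86) */
#endif
}

/* Ranks the levels so they can be compared; unknown names rank lowest. */
static int simd_rank(const char *level)
{
    static const char *const order[] = {"generic", "neon", "sve", "sve2"};
    int i;
    assert(level != 0);
    for (i = 3; i >= 0; i--)
        if (strcmp(level, order[i]) == 0)
            return i;
    return 0;
}

/* Nonzero if this build was compiled with at least the given extension. */
static int built_with_at_least(const char *level)
{
    return simd_rank(build_simd_level()) >= simd_rank(level);
}
```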

&lt;p&gt;This is my analysis for Step 3 of our project.&lt;br&gt;
I hope you enjoyed reading this.&lt;/p&gt;

&lt;p&gt;Thank you&lt;/p&gt;

</description>
    </item>
    <item>
      <title>SPO600 Project – Step 2 – SVE2 Implementation</title>
      <dc:creator>Gustavo Tavares</dc:creator>
      <pubDate>Sun, 10 Apr 2022 22:15:16 +0000</pubDate>
      <link>https://dev.to/xguhx/spo600-project-step-2-sve2-implementation-3nae</link>
      <guid>https://dev.to/xguhx/spo600-project-step-2-sve2-implementation-3nae</guid>
      <description>&lt;h2&gt;
  
  
  Hey There
&lt;/h2&gt;

&lt;p&gt;It's time for Step 2 of our SPO600 project.&lt;br&gt;
Before we start, let's quickly review what we need to do in this project:&lt;/p&gt;

&lt;p&gt;Step 1: Research a library-level package to be a candidate for an sve2 implementation.&lt;/p&gt;

&lt;p&gt;Step 2: Implement sve2 in the chosen package.&lt;/p&gt;

&lt;p&gt;Step 3: Upstream your changes or prepare them for future implementation.&lt;/p&gt;




&lt;h2&gt;
  
  
  Continuing with FFmpeg
&lt;/h2&gt;

&lt;p&gt;After choosing this package, I had to follow some steps to make sure it could receive sve2 through auto-vectorization.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Check whether there was a previous sve implementation (there was, as mentioned in Step 1).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Check whether the compiler could apply auto-vectorization to this package.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Find the correct Makefile to change in order to apply auto-vectorization to all files in the package.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  My Approach
&lt;/h2&gt;

&lt;p&gt;After taking a look at the .S and .c files with neon optimizations in them:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--y5kygei4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/csbg45700ynza89vpi3y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--y5kygei4--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/csbg45700ynza89vpi3y.png" alt="S and c" width="747" height="1350"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I realized that this package was able to receive the auto vectorization from the compiler, so I decided to give it a try.&lt;/p&gt;

&lt;p&gt;Then I started looking for a Makefile, but to my surprise there were quite a few:&lt;br&gt;
 &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--GTI9FyiX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8p2rqk8w6x5tdli8h8xg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--GTI9FyiX--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8p2rqk8w6x5tdli8h8xg.png" alt="makefile" width="811" height="1255"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Where should I start searching? Many of these files have configuration that I could not even understand properly.&lt;/p&gt;

&lt;p&gt;So I decided to start from the beginning: &lt;code&gt;./Makefile&lt;/code&gt;&lt;br&gt;
It looked like this: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--W4ll0zBm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ecvb51lrdyd7hlj197r7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--W4ll0zBm--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ecvb51lrdyd7hlj197r7.png" alt="makefile" width="880" height="913"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I kept going and going, but there was no sign of the compiler or its optimization flags.&lt;/p&gt;

&lt;p&gt;But there was something there that caught my attention:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--v4iR3elb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/81djcxi20h67vsh2nq07.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--v4iR3elb--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/81djcxi20h67vsh2nq07.png" alt="include" width="409" height="81"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At the very top of the file there was an include that could help me, so I went there to see if I could find the &lt;code&gt;gcc&lt;/code&gt; flags.&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;config.mak&lt;/code&gt; file is generated by the &lt;code&gt;./configure&lt;/code&gt; script, which enables the neon optimizations, among others:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--OIsWTPZK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/iywv0dn9vlbtu6ork6gx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--OIsWTPZK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/iywv0dn9vlbtu6ork6gx.png" alt="script" width="783" height="795"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;But when I checked the &lt;code&gt;config.mak&lt;/code&gt; file looking for the &lt;code&gt;gcc&lt;/code&gt; optimizations, I found that it was disabling vectorization:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--0YEwEh13--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jf07dowpw65r9a4zhtky.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--0YEwEh13--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jf07dowpw65r9a4zhtky.png" alt="disabled" width="880" height="84"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So I decided to change it to enable vectorization:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--VVsQbpjs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7p3on9nqfrkwdtt7baf4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--VVsQbpjs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7p3on9nqfrkwdtt7baf4.png" alt="enabled" width="880" height="167"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then I ran the &lt;code&gt;make&lt;/code&gt; command.&lt;/p&gt;

&lt;p&gt;Once it was built, it was time to try it:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--9iw6haUq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/g8481b2t5elie8lf60hj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--9iw6haUq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/g8481b2t5elie8lf60hj.png" alt="core dumped" width="880" height="559"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To my surprise, the first run ended with a "core dumped" error, which this time was a very welcome error message.&lt;/p&gt;

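&lt;p&gt;It suggested that the program had been built with instructions the current system could not execute. That is what we would expect if sve2 code had been generated and the CPU does not support sve2.&lt;/p&gt;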

&lt;p&gt;So I ran it under the &lt;code&gt;qemu-aarch64&lt;/code&gt; emulator, which can emulate those instructions, and the program worked fine!&lt;/p&gt;

&lt;p&gt;I tested it a few times with a sample file to see if it worked, and here is the result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6VTz4uoc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/m1w2dpicu6xg7cywa8he.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6VTz4uoc--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/m1w2dpicu6xg7cywa8he.png" alt="testing" width="880" height="454"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It converted my sample.avi to output.avi at 24 frames per second, as I requested.&lt;/p&gt;

&lt;p&gt;Next, it was time to check whether there really were sve2 instructions inside the binary.&lt;/p&gt;

&lt;p&gt;So I used &lt;code&gt;objdump -d&lt;/code&gt; and found that sve2 code really was in there.&lt;/p&gt;

&lt;p&gt;Here are some examples:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--AdCaxMpW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/z6f02udx3pn6p2u9t76r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AdCaxMpW--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/z6f02udx3pn6p2u9t76r.png" alt="whilelo1" width="880" height="307"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--EB1kI6hw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7raieniv1dkulx8sd2ia.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EB1kI6hw--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7raieniv1dkulx8sd2ia.png" alt="whilelo2" width="880" height="359"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--IMunWQ8R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jxysdr7k5kjcb3p6zax9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--IMunWQ8R--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jxysdr7k5kjcb3p6zax9.png" alt="whilelo3" width="880" height="475"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see &lt;code&gt;z&lt;/code&gt; and &lt;code&gt;p&lt;/code&gt; registers being used together with the &lt;code&gt;whilelo&lt;/code&gt; instruction.&lt;/p&gt;
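&lt;p&gt;Scrolling through screenshots of &lt;code&gt;objdump&lt;/code&gt; output gets tedious for a binary as large as FFmpeg. A quick summary is to count how often an sve-only mnemonic such as &lt;code&gt;whilelo&lt;/code&gt; appears in the disassembly text. From the shell, piping &lt;code&gt;objdump -d&lt;/code&gt; into &lt;code&gt;grep -c whilelo&lt;/code&gt; gives a similar, line-based count. The C sketch below does the counting on a text buffer and only counts whole-word matches. &lt;code&gt;count_mnemonic&lt;/code&gt; and &lt;code&gt;is_sep&lt;/code&gt; are hypothetical helpers.&lt;/p&gt;

```c
#include "assert.h"
#include "string.h"

/* Characters that can surround a mnemonic in objdump -d output. */
static int is_sep(char c)
{
    return c == '\0' || c == ' ' || c == '\t' || c == '\n';
}

/* Counts whole-word occurrences of `mnemonic` in disassembly text. */
static int count_mnemonic(const char *text, const char *mnemonic)
{
    size_t len = strlen(mnemonic);
    const char *p = text;
    int count = 0;
    assert(len > 0);
    while ((p = strstr(p, mnemonic)) != NULL) {
        int starts_word = (p == text) || is_sep(p[-1]);
        if (starts_word)
            if (is_sep(p[len]))
                count++;
        p += len; /* continue searching after this match */
    }
    return count;
}
```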




&lt;h2&gt;
  
  
  To sum up
&lt;/h2&gt;

&lt;p&gt;Step 2 was an adventure. At first, I thought that going for auto-vectorization would be an easy task. I expected a Makefile to be waiting for me, where I would just change the compiler arguments. In the end there were dozens of Makefiles, each with different configurations, and it took a lot of reading and research to make it work.&lt;/p&gt;

&lt;p&gt;I learned that a configure script has to run to generate the configuration, and that the file I was looking for was not even a &lt;code&gt;Makefile&lt;/code&gt;: it was a .mak file. &lt;/p&gt;

&lt;p&gt;I intend to write a little more about Makefiles, as they seem to be a powerful tool, far more complicated and deeper than I had imagined.&lt;/p&gt;

&lt;p&gt;Thank you for reading!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>SPO600 – Final Project - Step 1</title>
      <dc:creator>Gustavo Tavares</dc:creator>
      <pubDate>Mon, 28 Mar 2022 20:38:10 +0000</pubDate>
      <link>https://dev.to/xguhx/spo600-final-project-step-1-9i</link>
      <guid>https://dev.to/xguhx/spo600-final-project-step-1-9i</guid>
      <description>&lt;h2&gt;
  
  
  Hello!
&lt;/h2&gt;

&lt;p&gt;We are getting close to the end, and we are finally starting our final project (we will be working in the open!).&lt;/p&gt;




&lt;h2&gt;
  
  
  Step 1
&lt;/h2&gt;

&lt;p&gt;For the first step of the project, we were supposed to choose some packages that would benefit from SVE2 instructions.&lt;/p&gt;

&lt;p&gt;The ideal package is one that processes large amounts of data, so that SVE2 can be used to its full capability to improve performance.&lt;/p&gt;

&lt;p&gt;After some research, I found two candidates that could benefit from SVE2:&lt;br&gt;
&lt;a href="https://gitlab.freedesktop.org/gstreamer/gstreamer"&gt;Gstreamer1&lt;/a&gt; and &lt;a href="http://ffmpeg.org/"&gt;FFmpeg&lt;/a&gt;.&lt;/p&gt;


&lt;h2&gt;
  
  
  Gstreamer1
&lt;/h2&gt;

&lt;p&gt;According to them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; GStreamer is a streaming media framework, based on graphs of filters which operate on media data. 

Applications using this library can do anything from real-time sound processing to playing videos, and just about anything else media-related.  

Its plugin-based architecture means that new data types or processing capabilities can be added simply by installing new plugins.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Gstreamer1 uses inline assembly for many functions, as we can see &lt;a href="https://gitlab.freedesktop.org/gstreamer/gstreamer/-/blob/main/subprojects/gst-plugins-base/gst-libs/gst/audio/audio-resampler-neon.h"&gt;here&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="kr"&gt;inline&lt;/span&gt; &lt;span class="kt"&gt;void&lt;/span&gt;
&lt;span class="nf"&gt;inner_product_gint16_full_1_neon&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;gint16&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;gint16&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;gint16&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gint&lt;/span&gt; &lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="k"&gt;const&lt;/span&gt; &lt;span class="n"&gt;gint16&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;icoeff&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;gint&lt;/span&gt; &lt;span class="n"&gt;bstride&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;uint32_t&lt;/span&gt; &lt;span class="n"&gt;remainder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;len&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="n"&gt;len&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;len&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;remainder&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="n"&gt;asm&lt;/span&gt; &lt;span class="k"&gt;volatile&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"      vmov.s32 q0, #0&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"      cmp %[len], #0&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"      beq 2f&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"      vmov.s32 q1, #0&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"1:"&lt;/span&gt;
                  &lt;span class="s"&gt;"      vld1.16 {d16, d17, d18, d19}, [%[b]]!&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"      vld1.16 {d20, d21, d22, d23}, [%[a]]!&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"      subs %[len], %[len], #16&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"      vmlal.s16 q0, d16, d20&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"      vmlal.s16 q1, d17, d21&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"      vmlal.s16 q0, d18, d22&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"      vmlal.s16 q1, d19, d23&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"      bne 1b&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"      vadd.s32 q0, q0, q1&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"2:"&lt;/span&gt;
                  &lt;span class="s"&gt;"      cmp %[remainder], #0&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"      beq 4f&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"3:"&lt;/span&gt;
                  &lt;span class="s"&gt;"      vld1.16 {d16}, [%[b]]!&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"      vld1.16 {d20}, [%[a]]!&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"      subs %[remainder], %[remainder], #4&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"      vmlal.s16 q0, d16, d20&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"      bgt 3b&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"4:"&lt;/span&gt;
                  &lt;span class="s"&gt;"      vadd.s32 d0, d0, d1&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"      vpadd.s32 d0, d0, d0&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"      vqrshrn.s32 d0, q0, #15&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="s"&gt;"      vst1.16 d0[0], [%[o]]&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;
                  &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="s"&gt;"+r"&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="s"&gt;"+r"&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="s"&gt;"+r"&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;len&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;remainder&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="s"&gt;"+r"&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;remainder&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                  &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="s"&gt;"r"&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                  &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="s"&gt;"cc"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"q0"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"q1"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s"&gt;"d16"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"d17"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"d18"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"d19"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="s"&gt;"d20"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"d21"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"d22"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"d23"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;


&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
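
&lt;p&gt;To see what this routine actually computes, here is a plain C sketch written from a reading of the assembly above. It is an interpretation, not GStreamer's own fallback code. The routine performs three steps: a widening 16-bit by 16-bit multiply-accumulate into 32 bits (&lt;code&gt;vmlal.s16&lt;/code&gt;), a horizontal sum (&lt;code&gt;vadd&lt;/code&gt;/&lt;code&gt;vpadd&lt;/code&gt;), and a saturating, rounding right shift by 15 back to 16 bits (&lt;code&gt;vqrshrn.s32 #15&lt;/code&gt;). The name &lt;code&gt;inner_product_s16_ref&lt;/code&gt; is hypothetical. The &lt;code&gt;icoeff&lt;/code&gt; and &lt;code&gt;bstride&lt;/code&gt; parameters are left out because the assembly never uses them.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;stdint.h&amp;gt;

/* Scalar sketch of inner_product_gint16_full_1_neon (hypothetical name). */
static int16_t inner_product_s16_ref(const int16_t *a, const int16_t *b, int len)
{
    /* NEON accumulates in 32-bit lanes; a 64-bit accumulator avoids signed
     * overflow (undefined behaviour in C) for long inputs. */
    int64_t acc = 0;
    for (int i = 0; i &amp;lt; len; i++)
        acc += (int32_t)a[i] * (int32_t)b[i];  /* widening multiply-accumulate, like vmlal.s16 */

    /* Rounding shift right by 15, like vqrshrn.s32 #15 (assumes an arithmetic
     * right shift for negative values, as GCC and Clang implement it). */
    acc = (acc + (1 &amp;lt;&amp;lt; 14)) &amp;gt;&amp;gt; 15;

    /* Saturate to the int16 range, the "q" (saturating) part of vqrshrn. */
    if (acc &amp;gt; INT16_MAX)
        acc = INT16_MAX;
    if (acc &amp;lt; INT16_MIN)
        acc = INT16_MIN;
    return (int16_t)acc;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The assembly handles leftover elements four at a time, but this loop accepts any &lt;code&gt;len&lt;/code&gt;. Because it accumulates in 64 bits, it only matches the NEON result while the intermediate 32-bit sums do not overflow.&lt;/p&gt;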






&lt;h2&gt;
  
  
  FFmpeg
&lt;/h2&gt;

&lt;p&gt;According to them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;FFmpeg is a complete and free Internet live audio and video broadcasting solution for Linux/Unix. It also includes a digital  VCR. It can encode in real time in many formats including MPEG1 audio and video, MPEG4, h263, ac3, asf, avi, real, mjpeg, and flash.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The second option has many files that use &lt;code&gt;neon&lt;/code&gt; and are compiled with gcc.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--5aEXqEy7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9kt7tfitea4nm3tw86b7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--5aEXqEy7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/9kt7tfitea4nm3tw86b7.png" alt="FFmpeg" width="520" height="498"&gt;&lt;/a&gt;&lt;/p&gt;




&lt;h2&gt;
  
  
  Planning my approach
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Auto-vectorization:&lt;/strong&gt; &lt;br&gt;
My plan for implementing SVE2 in this project is to use auto-vectorization. This means I will change the &lt;code&gt;Makefile&lt;/code&gt; and add options that make the compiler apply the optimizations for me.&lt;br&gt;
Here is an example of a Makefile from gstreamer1: &lt;br&gt;
 &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kU5UwvpB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sti4u1fzbagt8s9tdby8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kU5UwvpB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/sti4u1fzbagt8s9tdby8.png" alt="Makefile" width="880" height="222"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As we can see, they are using &lt;code&gt;-O0&lt;/code&gt;, which means no optimization at all.&lt;/p&gt;

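&lt;p&gt;So I will add the options &lt;code&gt;-O3 -march=armv8-a+sve2&lt;/code&gt; and test whether performance improves. &lt;code&gt;-O3&lt;/code&gt; enables GCC's auto-vectorizer, and &lt;code&gt;-march=armv8-a+sve2&lt;/code&gt; allows it to emit SVE2 instructions.&lt;/p&gt;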
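
&lt;p&gt;A small, self-contained loop is a convenient way to check that these flags have an effect before touching a whole project. The sketch below uses a hypothetical function, &lt;code&gt;mix_average&lt;/code&gt;, that is not taken from GStreamer or FFmpeg. It is the kind of simple, independent-iteration loop the auto-vectorizer targets. Whether a given loop is actually vectorized depends on the compiler version and its cost model, so the disassembly is worth checking.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;#include &amp;lt;stddef.h&amp;gt;
#include &amp;lt;stdint.h&amp;gt;

/* Hypothetical kernel: mix two 16-bit audio buffers by averaging them.
 * restrict promises the buffers do not overlap, so the compiler does not
 * need runtime alias checks before vectorizing.
 * Possible build and check on AArch64 (assumes a GCC with SVE2 support):
 *   gcc -O3 -march=armv8-a+sve2 -c mix.c
 *   objdump -d mix.o        (look for whilelo and z/p registers) */
void mix_average(int16_t *restrict dst, const int16_t *restrict a,
                 const int16_t *restrict b, size_t n)
{
    for (size_t i = 0; i &amp;lt; n; i++)
        /* widen to 32 bits so the sum cannot overflow; the average always fits in int16 */
        dst[i] = (int16_t)(((int32_t)a[i] + b[i]) / 2);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, averaging &lt;code&gt;{0, 2, -4}&lt;/code&gt; with &lt;code&gt;{2, 2, 1}&lt;/code&gt; gives &lt;code&gt;{1, 2, -1}&lt;/code&gt;, because C integer division truncates toward zero.&lt;/p&gt;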


&lt;h2&gt;
  
  
  About Makefiles
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;Makefiles&lt;/code&gt; can be complicated. At first I thought this would be an easy approach, but there are many &lt;code&gt;Makefiles&lt;/code&gt; in a project and they are linked to each other.&lt;br&gt;
Take a look at this example from FFmpeg:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="n"&gt;MAIN_MAKEFILE&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;
&lt;span class="n"&gt;include&lt;/span&gt; &lt;span class="n"&gt;ffbuild&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mak&lt;/span&gt;

&lt;span class="n"&gt;vpath&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;c&lt;/span&gt;    &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SRC_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vpath&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cpp&lt;/span&gt;  &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SRC_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vpath&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;    &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SRC_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vpath&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;inc&lt;/span&gt;  &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SRC_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vpath&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt;    &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SRC_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vpath&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;S&lt;/span&gt;    &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SRC_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vpath&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;asm&lt;/span&gt;  &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SRC_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vpath&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;rc&lt;/span&gt;   &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SRC_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vpath&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;    &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SRC_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vpath&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;texi&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SRC_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vpath&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;cu&lt;/span&gt;   &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SRC_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vpath&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ptx&lt;/span&gt;  &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SRC_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vpath&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metal&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SRC_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;vpath&lt;/span&gt; &lt;span class="o"&gt;%/&lt;/span&gt;&lt;span class="n"&gt;fate_config&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;sh&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;template&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SRC_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;TESTTOOLS&lt;/span&gt;   &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;audiogen&lt;/span&gt; &lt;span class="n"&gt;videogen&lt;/span&gt; &lt;span class="n"&gt;rotozoom&lt;/span&gt; &lt;span class="n"&gt;tiny_psnr&lt;/span&gt; &lt;span class="n"&gt;tiny_ssim&lt;/span&gt; &lt;span class="n"&gt;base64&lt;/span&gt; &lt;span class="n"&gt;audiomatch&lt;/span&gt;
&lt;span class="n"&gt;HOSTPROGS&lt;/span&gt;  &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TESTTOOLS&lt;/span&gt;&lt;span class="o"&gt;:%=&lt;/span&gt;&lt;span class="n"&gt;tests&lt;/span&gt;&lt;span class="o"&gt;/%&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;print_options&lt;/span&gt;

&lt;span class="cp"&gt;# $(FFLIBS-yes) needs to be in linking order
&lt;/span&gt;&lt;span class="n"&gt;FFLIBS&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CONFIG_AVDEVICE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;avdevice&lt;/span&gt;
&lt;span class="n"&gt;FFLIBS&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CONFIG_AVFILTER&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;avfilter&lt;/span&gt;
&lt;span class="n"&gt;FFLIBS&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CONFIG_AVFORMAT&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;avformat&lt;/span&gt;
&lt;span class="n"&gt;FFLIBS&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CONFIG_AVCODEC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;avcodec&lt;/span&gt;
&lt;span class="n"&gt;FFLIBS&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CONFIG_POSTPROC&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;   &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;postproc&lt;/span&gt;
&lt;span class="n"&gt;FFLIBS&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CONFIG_SWRESAMPLE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;swresample&lt;/span&gt;
&lt;span class="n"&gt;FFLIBS&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;CONFIG_SWSCALE&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;    &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;swscale&lt;/span&gt;

&lt;span class="n"&gt;FFLIBS&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="n"&gt;avutil&lt;/span&gt;

&lt;span class="n"&gt;DATA_FILES&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;wildcard&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SRC_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;presets&lt;/span&gt;&lt;span class="o"&gt;/*&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ffpreset&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SRC_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;ffprobe&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;xsd&lt;/span&gt;

&lt;span class="n"&gt;SKIPHEADERS&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;compat&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;w32pthreads&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;

&lt;span class="cp"&gt;# first so "all" becomes default target
&lt;/span&gt;&lt;span class="n"&gt;all&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;all&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;yes&lt;/span&gt;

&lt;span class="n"&gt;include&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SRC_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;Makefile&lt;/span&gt;
&lt;span class="n"&gt;include&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;SRC_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;ffbuild&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;common&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mak&lt;/span&gt;

&lt;span class="n"&gt;FF_EXTRALIBS&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FFEXTRALIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;FF_DEP_LIBS&lt;/span&gt;  &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;DEP_LIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;FF_STATIC_DEP_LIBS&lt;/span&gt; &lt;span class="o"&gt;:=&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;STATIC_DEP_LIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;TOOLS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EXESUF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt;
        &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LD&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LDFLAGS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LDEXEFLAGS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LD_O&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="o"&gt;^&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EXTRALIBS&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;F&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EXTRALIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ELIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;target_dec_&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;_fuzzer&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EXESUF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;target_dec_&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;_fuzzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FF_DEP_LIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;target_dec_&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;_fuzzer&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EXESUF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;target_dec_&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;_fuzzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FF_DEP_LIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LD&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LDFLAGS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LDEXEFLAGS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LD_O&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="o"&gt;^&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ELIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FF_EXTRALIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LIBFUZZER_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;target_bsf_&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;_fuzzer&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EXESUF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;target_bsf_&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;_fuzzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FF_DEP_LIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LD&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LDFLAGS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LDEXEFLAGS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LD_O&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="o"&gt;^&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ELIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FF_EXTRALIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LIBFUZZER_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;target_dem_&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;_fuzzer&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EXESUF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;target_dem_&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;_fuzzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FF_DEP_LIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LD&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LDFLAGS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LDEXEFLAGS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LD_O&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="o"&gt;^&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ELIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FF_EXTRALIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LIBFUZZER_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;target_dem_fuzzer&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EXESUF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;target_dem_fuzzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FF_DEP_LIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LD&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LDFLAGS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LDEXEFLAGS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LD_O&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="o"&gt;^&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ELIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FF_EXTRALIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LIBFUZZER_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;target_io_dem_fuzzer&lt;/span&gt;&lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;EXESUF&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;&lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tools&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;target_io_dem_fuzzer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;o&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FF_DEP_LIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LD&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LDFLAGS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LDEXEFLAGS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LD_O&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="o"&gt;^&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ELIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;FF_EXTRALIBS&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="err"&gt;$&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;LIBFUZZER_PATH&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="err"&gt;…&lt;/span&gt; 
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;it&lt;/span&gt; &lt;span class="n"&gt;keeps&lt;/span&gt; &lt;span class="n"&gt;going&lt;/span&gt; &lt;span class="n"&gt;and&lt;/span&gt; &lt;span class="n"&gt;going&lt;/span&gt;&lt;span class="err"&gt;…&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Looks like some kind of Martian language to me.&lt;/p&gt;




&lt;h2&gt;
  
  
  Finally
&lt;/h2&gt;

&lt;p&gt;After some research, I decided to go with &lt;code&gt;ffmpeg&lt;/code&gt;.&lt;br&gt;
The reason is that gstreamer1 does not use &lt;code&gt;make&lt;/code&gt; and &lt;code&gt;Makefiles&lt;/code&gt; to build. It uses &lt;code&gt;meson&lt;/code&gt; and &lt;code&gt;ninja&lt;/code&gt;, which would make life difficult for me, because I have no experience with those tools.&lt;/p&gt;

&lt;p&gt;To change how ffmpeg is built, I will have to edit the &lt;code&gt;config.mak&lt;/code&gt; file inside the &lt;code&gt;ffbuild&lt;/code&gt; directory. That file holds the configuration variables that the main &lt;code&gt;Makefile&lt;/code&gt; includes when it builds the project.&lt;/p&gt;

&lt;p&gt;That's it for now!&lt;br&gt;
Thank you for reading!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>SPO600 Lab06 - sve2</title>
      <dc:creator>Gustavo Tavares</dc:creator>
      <pubDate>Mon, 21 Mar 2022 02:14:44 +0000</pubDate>
      <link>https://dev.to/xguhx/spo600-lab06-sve2-31ke</link>
      <guid>https://dev.to/xguhx/spo600-lab06-sve2-31ke</guid>
      <description>&lt;h2&gt;
  
  
  Hi there,
&lt;/h2&gt;

&lt;p&gt;This lab builds on the previous one. This time we need to write a new volume function that uses &lt;a href="https://developer.arm.com/documentation/102340/0001/Introducing-SVE2?lang=en"&gt;SVE2&lt;/a&gt; instructions.&lt;/p&gt;

&lt;p&gt;Even though it looks simple, inline assembly and intrinsics code are really hard to understand.&lt;/p&gt;

&lt;p&gt;For this lab, I decided to change one of the previous functions to use &lt;code&gt;sve2&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  My Approach
&lt;/h3&gt;

&lt;p&gt;The function I decided to change was vol4 (remember that vol4.c was written by the professor, not by me):&lt;/p&gt;

&lt;p&gt;The first thing I had to change was adding the SVE header file:&lt;br&gt;
 &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--rrT_08Dp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bj3fhkjt0femq1mf2xyg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--rrT_08Dp--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/bj3fhkjt0femq1mf2xyg.png" alt="header" width="880" height="291"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After that, I replaced the &lt;code&gt;v&lt;/code&gt; registers with the &lt;code&gt;z&lt;/code&gt; registers, as described in the SVE2 documentation:&lt;br&gt;
 &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--3xwYl2LQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/amch3mvh3geazdghi0uk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--3xwYl2LQ--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/amch3mvh3geazdghi0uk.png" alt="registers" width="880" height="641"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Image from: &lt;a href="https://developer.arm.com/documentation/102340/0001/SVE2-architecture-fundamentals?lang=en"&gt;https://developer.arm.com/documentation/102340/0001/SVE2-architecture-fundamentals?lang=en&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kEOymMKe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/s9044br8f74ejh27g3in.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kEOymMKe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/s9044br8f74ejh27g3in.png" alt="code" width="880" height="655"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The change was small. However, the code would not compile until SVE2 was enabled in the &lt;code&gt;Makefile&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--XZAwrxfY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/z9z5wds3d92twjb6udth.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--XZAwrxfY--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/z9z5wds3d92twjb6udth.png" alt="Makefile" width="880" height="111"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To run it, we need to use &lt;code&gt;qemu-aarch64&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--sU6zkE1h--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1260kal1aiibrny5q4az.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--sU6zkE1h--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1260kal1aiibrny5q4az.png" alt="qemu" width="880" height="63"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Without it, we get an illegal instruction error and a core dump. This suggests that the machine's processor does not implement SVE2 itself:&lt;br&gt;
 &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CpimMJyH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/d53613owfnkzc6o192ij.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CpimMJyH--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/d53613owfnkzc6o192ij.png" alt="coredump" width="684" height="75"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the &lt;code&gt;objdump -d&lt;/code&gt; output, we can see that &lt;code&gt;sqrdmulh&lt;/code&gt; now uses the &lt;code&gt;z&lt;/code&gt; registers:&lt;br&gt;
 &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--tcbm557t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/32y4akpo99zoc7dkmekf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tcbm557t--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/32y4akpo99zoc7dkmekf.png" alt="objdump" width="880" height="201"&gt;&lt;/a&gt;&lt;/p&gt;
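
&lt;p&gt;Here is a small C model of what &lt;code&gt;sqrdmulh&lt;/code&gt; (signed saturating rounding doubling multiply, returning the high half) does to one 16-bit sample lane. It is my own sketch based on Arm's description of the instruction, not code from the lab, and &lt;code&gt;sqrdmulh16&lt;/code&gt; is a hypothetical name. It also assumes the volume factor is passed as a Q15 fixed-point number, so 0.5 becomes 16384.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* Scalar model of one 16-bit lane of SQRDMULH (hypothetical helper, not the lab code). */
short sqrdmulh16(short a, short b)
{
    /* doubling multiply, plus a rounding constant of 2^15 */
    long long product = 2LL * a * b + 32768;

    /* keep the high 16 bits (assumes the usual arithmetic right shift for negative values) */
    long long high = product >> 16;

    /* saturate: only -32768 * -32768 can exceed the int16 range */
    if (high > 32767)
        high = 32767;

    return (short)high;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;sqrdmulh16(1000, 16384)&lt;/code&gt; returns 500, which is half volume. The SVE2 version applies the same operation to every lane of a &lt;code&gt;z&lt;/code&gt; register at once. The number of lanes depends on the hardware's vector length.&lt;/p&gt;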

&lt;p&gt;Thank you for reading!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>SPO600 Lab 05 - Algorithm Selection</title>
      <dc:creator>Gustavo Tavares</dc:creator>
      <pubDate>Wed, 09 Mar 2022 02:50:49 +0000</pubDate>
      <link>https://dev.to/xguhx/spo600-lab-05-algorithm-selection-2bj7</link>
      <guid>https://dev.to/xguhx/spo600-lab-05-algorithm-selection-2bj7</guid>
      <description>&lt;h2&gt;
  
  
  Hello
&lt;/h2&gt;

&lt;p&gt;In this lab we are supposed to run different versions of the same program and analyze the performance difference.&lt;/p&gt;

&lt;p&gt;The program scales the volume of a set of sound samples. The number of samples is chosen later, but it must be large enough for each run to take 20 seconds or more.&lt;/p&gt;

&lt;p&gt;Each version of the program uses a slightly different algorithm.&lt;br&gt;
Here they are, from our &lt;a href="https://wiki.cdot.senecacollege.ca/wiki/SPO600_Algorithm_Selection_Lab"&gt;lab page&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1.  vol0.c is the basic or naive algorithm. This approach multiplies each sound sample by the volume scaling factor, casting from signed 16-bit integer to floating point and back again. Casting between integer and floating point can be expensive operations.

2.  vol1.c does the math using fixed-point calculations. This avoids the overhead of casting between integer and floating point and back again.


3.  vol2.c pre-calculates all 65536 different results, and then looks up the answer for each input value.

4.  vol3.c is a dummy program - it doesn't scale the volume at all. It can be used to determine some of the overhead of the rest of the processing (besides scaling the volume) done by the other programs.


5.  vol4.c uses Single Instruction, Multiple Data (SIMD) instructions accessed through inline assembly (assembly language code inserted into a C program). This program is specific to the AArch64 architecture and will not build for x86_64.

6.  vol5.c uses SIMD instructions accessed through Compiler Intrinsics. This program is also specific to AArch64.

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
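
&lt;p&gt;To make the differences between the first three approaches concrete, here is a rough C sketch of each one. These are my own simplified versions with hypothetical names (&lt;code&gt;scale_float&lt;/code&gt;, &lt;code&gt;scale_fixed&lt;/code&gt;, &lt;code&gt;build_table&lt;/code&gt;, &lt;code&gt;scale_lookup&lt;/code&gt;), not the lab's vol0.c, vol1.c or vol2.c. The 8-bit fixed-point format used in &lt;code&gt;scale_fixed&lt;/code&gt; is an assumption.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* vol0-style: convert to floating point, multiply, convert back */
short scale_float(short sample, float factor)
{
    return (short)(sample * factor);
}

/* vol1-style: fixed point with 8 fractional bits (factor * 256) */
short scale_fixed(short sample, float factor)
{
    int factor_fixed = (int)(factor * 256);   /* e.g. 0.75 -> 192 */
    /* right-shifting a negative int is implementation-defined in C;
       gcc and clang use an arithmetic shift */
    return (short)((sample * factor_fixed) >> 8);
}

/* vol2-style: precompute all 65536 possible results once */
static short table[65536];

void build_table(float factor)
{
    for (int i = 0; i != 65536; i++)
        table[i] = scale_float((short)(i - 32768), factor);
}

short scale_lookup(short sample)
{
    return table[sample + 32768];   /* map -32768..32767 to 0..65535 */
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With a factor of 0.3, &lt;code&gt;scale_fixed(1000, 0.3f)&lt;/code&gt; gives 296, while &lt;code&gt;scale_float(1000, 0.3f)&lt;/code&gt; gives 300. The difference comes from truncation: 0.3 × 256 = 76.8, which becomes 76. The lookup table trades 128 KiB of memory (65536 entries of 2 bytes) for skipping the multiplication on every sample.&lt;/p&gt;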



&lt;p&gt;In this lab we will use the AArch64 and x86_64 machines provided by the professor.&lt;/p&gt;

&lt;h2&gt;
  
  
  AArch64
&lt;/h2&gt;

&lt;p&gt;First, as the professor instructed, we need to measure the overhead using a dummy program. &lt;br&gt;
This tells us how much time is spent on everything except the volume scaling, so we can subtract it from the other algorithms' run times.&lt;/p&gt;

&lt;p&gt;So let's set the number of samples so that our program runs for over 20 seconds:&lt;br&gt;
&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--Hu0EunKy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xd0ua0mv9a8t7j9vcipn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Hu0EunKy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xd0ua0mv9a8t7j9vcipn.png" alt="header" width="880" height="317"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then we compile our programs using the &lt;code&gt;make&lt;/code&gt; command:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KDpIzxBs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dvbi21p39u0x17to4kh5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KDpIzxBs--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dvbi21p39u0x17to4kh5.png" alt="make" width="880" height="177"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now let's run vol3 (the dummy program) five times and see how long it takes on average.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--6sz8A099--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/safxjv402tst14rt6c7f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6sz8A099--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/safxjv402tst14rt6c7f.png" alt="aarch64vol3" width="778" height="956"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Let's discard the highest and lowest values and average the rest: &lt;code&gt;26.946 seconds&lt;/code&gt; &lt;br&gt;
This is our overhead, which we will subtract from the other programs' times.&lt;/p&gt;
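
&lt;p&gt;Here is the same calculation as a small C sketch, using hypothetical helper names. The individual run times come from the screenshots. The sketch drops the single highest and lowest run, averages the rest, and then subtracts the dummy overhead from each program's average.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* Average of the runs after dropping the single highest and lowest value (needs n >= 3). */
double trimmed_mean(const double *runs, int n)
{
    double sum = 0.0, lowest = runs[0], highest = runs[0];
    for (int i = 0; i != n; i++) {
        sum += runs[i];
        if (lowest > runs[i])
            lowest = runs[i];
        if (runs[i] > highest)
            highest = runs[i];
    }
    return (sum - lowest - highest) / (n - 2);
}

/* Scaling time = average of a vol program minus the dummy (vol3) overhead. */
double scaling_time(double program_avg, double dummy_avg)
{
    return program_avg - dummy_avg;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;For example, &lt;code&gt;scaling_time(27.684, 26.946)&lt;/code&gt; gives about 0.738 seconds, which matches the vol0 result below.&lt;/p&gt;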

&lt;h3&gt;
  
  
  Now let's do the same with the others:
&lt;/h3&gt;

&lt;h3&gt;
  
  
  vol0:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--qjNwUXS---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ehbhld879jnxjx0ljm71.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--qjNwUXS---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ehbhld879jnxjx0ljm71.png" alt="vol0" width="789" height="970"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Again, let's discard the highest and lowest values, average the rest, and then subtract the dummy overhead.&lt;br&gt;
Our average is &lt;code&gt;27.684 seconds&lt;/code&gt;.&lt;br&gt;
&lt;code&gt;27.684&lt;/code&gt; - &lt;code&gt;26.946 (overhead)&lt;/code&gt; = &lt;code&gt;0.738s&lt;/code&gt;.&lt;br&gt;
So vol0 spent &lt;code&gt;0.738s&lt;/code&gt; on scaling.&lt;/p&gt;

&lt;h3&gt;
  
  
  vol1:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--j9zrSFZC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/al9kpunbfi7ww1332uxm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--j9zrSFZC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/al9kpunbfi7ww1332uxm.png" alt="vol1" width="755" height="964"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Repeating the process, we get an average of &lt;code&gt;27.501s&lt;/code&gt;.&lt;br&gt;
Subtracting the overhead, we have &lt;code&gt;0.555s&lt;/code&gt; of scaling time.&lt;/p&gt;

&lt;h3&gt;
  
  
  vol2:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--zdmOOeon--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ezt9o33mfc9eic7zhuzq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--zdmOOeon--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ezt9o33mfc9eic7zhuzq.png" alt="vol2" width="750" height="956"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Average: &lt;code&gt;33.748s&lt;/code&gt;&lt;br&gt;
Subtracting the overhead, we have &lt;code&gt;6.802s&lt;/code&gt; of scaling time.&lt;/p&gt;

&lt;h3&gt;
  
  
  vol4:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--HB8pWIoK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cxchdv9mpj6d7kn39tuy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--HB8pWIoK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/cxchdv9mpj6d7kn39tuy.png" alt="vol4" width="775" height="994"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Average: &lt;code&gt;26.763s&lt;/code&gt;.&lt;br&gt;
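Subtracting the overhead, we have &lt;code&gt;-0.183s&lt;/code&gt; of scaling time. A negative value means vol4 finished slightly faster than the dummy program. In other words, its scaling cost is smaller than the run-to-run variation we can measure.&lt;/p&gt;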

&lt;h3&gt;
  
  
  vol5:
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TH7NukHK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2udquejihlo3gfg2bdxt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TH7NukHK--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2udquejihlo3gfg2bdxt.png" alt="vol5" width="794" height="964"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Average: &lt;code&gt;26.729s&lt;/code&gt;.&lt;br&gt;
Subtracting the overhead, we have &lt;code&gt;-0.217s&lt;/code&gt; of scaling time. Again, this is within the measurement noise.&lt;/p&gt;




&lt;p&gt;I also tried changing the &lt;code&gt;Makefile&lt;/code&gt; to make the programs run faster, using &lt;code&gt;-Ofast&lt;/code&gt; and enabling vectorization:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--KwDvNbb9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/my3o3qd324y44l6o0z4w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--KwDvNbb9--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/my3o3qd324y44l6o0z4w.png" alt="Ofast" width="880" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This gave an average difference of about 3 seconds compared with &lt;code&gt;-O2&lt;/code&gt;.&lt;/p&gt;




&lt;p&gt;Now let's move on to the x86_64 machine.&lt;/p&gt;

&lt;h2&gt;
  
  
  x86_64
&lt;/h2&gt;

&lt;p&gt;The dummy (vol3) overhead value is &lt;code&gt;23.650s&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Vol0 is the naïve implementation:&lt;br&gt;
The average was &lt;code&gt;24.158s&lt;/code&gt;.&lt;br&gt;
Minus the overhead, the scaling time is &lt;code&gt;0.508s&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Vol1 is the fixed-point implementation:&lt;br&gt;
The average was &lt;code&gt;23.736s&lt;/code&gt;.&lt;br&gt;
Minus the overhead, the scaling time is &lt;code&gt;0.086s&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Vol2 is the precalculated implementation:&lt;br&gt;
The average was &lt;code&gt;24.676s&lt;/code&gt;.&lt;br&gt;
Minus the overhead, the scaling time is &lt;code&gt;1.026s&lt;/code&gt;.&lt;/p&gt;




&lt;h2&gt;
  
  
  Finally
&lt;/h2&gt;

&lt;p&gt;Let's answer the questions in the C code:&lt;br&gt;
 &lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--jhV3ox3q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t48dxiiz461eapdlc7x0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--jhV3ox3q--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/t48dxiiz461eapdlc7x0.png" alt="q1" width="880" height="101"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The dummy program lets us measure the overhead: the work that every other vol program also does, apart from the scaling itself. We need to know how long that takes so that we can isolate the time spent on scaling.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JGMoO-0p--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8u3lj328gd205psqtkfp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JGMoO-0p--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8u3lj328gd205psqtkfp.png" alt="q2" width="880" height="221"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Those parts are needed because if the results of the calculations are never used, the compiler can detect that and optimize the calculations away. In the end, the program would do nothing at all.&lt;/p&gt;
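
&lt;p&gt;Here is a minimal sketch of the idea, using a hypothetical function rather than the lab's code. Every scaled sample is added into a running sum that the caller prints. Because the results have an observable effect, the optimizer cannot discard the loop as dead code.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* Scale n samples and return the sum of the results (hypothetical helper).
   The caller is expected to print the returned value. */
int scale_and_sum(const short *in, short *out, int n, float factor)
{
    int sum = 0;
    for (int i = 0; i != n; i++) {
        out[i] = (short)(in[i] * factor);
        sum += out[i];          /* every result contributes to sum */
    }
    return sum;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;If neither &lt;code&gt;out&lt;/code&gt; nor the returned sum were ever read, the compiler would be free to remove the scaling work entirely. The timing would then measure nothing.&lt;/p&gt;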

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--AziFPfQM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zb6x5rh6g7eb25ecpwsr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AziFPfQM--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/zb6x5rh6g7eb25ecpwsr.png" alt="q3" width="880" height="119"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;That’s a good question. My guess is that it is there to make sure the number is positive, but to be honest, I’m not sure. ☹&lt;/p&gt;

&lt;p&gt;I noticed that different algorithms produce slightly different outputs. Running the same program multiple times, however, always produces the same result, even though the run times vary. I would say that the differences between the algorithms' outputs are too small to perceive in the audio.&lt;/p&gt;

&lt;p&gt;Thank you for reading!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Spo600 Lab 4</title>
      <dc:creator>Gustavo Tavares</dc:creator>
      <pubDate>Mon, 07 Mar 2022 00:42:25 +0000</pubDate>
      <link>https://dev.to/xguhx/spo600-lab-4-4nm6</link>
      <guid>https://dev.to/xguhx/spo600-lab-4-4nm6</guid>
      <description>&lt;h2&gt;
  
  
  Lab 04 is here
&lt;/h2&gt;

&lt;p&gt;In this lab we were supposed to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Log in to the Portugal and Israel servers.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Unpack a tarball, which can be a new experience for a Windows user like me.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Explore the files inside the spo600 folder and use the &lt;code&gt;makefiles&lt;/code&gt; and &lt;code&gt;make&lt;/code&gt; command to build the code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Use &lt;code&gt;objdump -d&lt;/code&gt; to inspect what was inside the binary file.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Modify the Hello World! programs in AArch64 and x86_64 assembly to make them loop.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Extend the loop to 30 iterations (which took me hours).&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I started with AArch64, as described in the lab. The professor's lecture gave us the bread and butter for this lab, and without it, this would probably have been impossible for me.&lt;/p&gt;




&lt;h2&gt;
  
  
  AArch64
&lt;/h2&gt;

&lt;p&gt;Here is my final AArch64 code looping from 00 to 29:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--iePZqwDq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/raaz7c08vg93l0onokcf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iePZqwDq--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/raaz7c08vg93l0onokcf.png" alt="result first loop" width="677" height="928"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;AArch64 final code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.text
.globl _start
min = 0                          /* starting value for the loop index; note that this is a symbol (constant), not a variable */
max = 30
                        /* loop exits when the index hits this number (loop condition is i&amp;lt;max) */


_start:

        mov     x15, 10 // for division
        mov     x19, min // value of the counter
        add     x18, x19, '0' //counter
        adr     x17, msg+6
        adr     x20, msg+7

loop:
        cmp x19, x15

        b.lt inside //start from smallest decimals

        udiv x12, x19, x15 // divide x19/x15 and store in x12 // udiv r0,r1,r2     // unsigned - divide r1 by r2, places quotient into r0 - remainder is not calculated (use msub)
        add x14, x12, '0'
        strb w14,[x17]

inside:

        msub x10,x15,x12,x19 //msub r0,r1,r2,r3  // load r0 with r3-(r1*r2) (useful for calculating remainders)
        add x10, x10, '0'
        strb w10,[x20]

        //Printing msg

        mov     x0, 1           /* file descriptor: 1 is stdout */
        adr     x1, msg         /* message location (memory address) */
        mov     x2, len         /* message length (bytes) */


        // call syscall

        mov     x8, 64          /* write is syscall #64 */
        svc     0               /* invoke syscall */


        add x19, x19, 1 //counter +1
        cmp x19, max
        b.ne loop


        //return

        mov     x0, 0           /* status -&amp;gt; 0 */
        mov     x8, 93          /* exit is syscall #93 */
        svc     0               /* invoke syscall */

.data
msg:    .ascii      "Loop: #\n"
len=    . - msg

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As the professor mentioned in the lab requirements, we had to use &lt;code&gt;udiv&lt;/code&gt; and calculate the remainder in order to print two digits.&lt;br&gt;
Problems I encountered:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Assembly is hard, and all these registers can get messy. So I started taking notes on what each register is used for, which makes it easier to keep track of how many registers I am using and why.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The &lt;a href="https://wiki.cdot.senecacollege.ca/wiki/AArch64_Register_and_Instruction_Quick_Start"&gt;cheat sheet&lt;/a&gt; is essential. Even with it, though, it is sometimes confusing to understand how things work.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Infinite loops and wrong characters were my most common results when trying to make it work. My code often went past the digit &lt;code&gt;9&lt;/code&gt; to the next ASCII character, &lt;code&gt;:&lt;/code&gt;, and sometimes the loop never ended.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
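
&lt;p&gt;The digit logic in the AArch64 loop can be written in C like this. It is a hypothetical helper for illustration, with comments pointing at the matching instructions. It assumes the counter stays between 0 and 99, so two digits are enough.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* Hypothetical C version of the digit logic in the AArch64 loop.
   Fills line with "Loop: NN\n" (9 bytes, no terminating NUL) for 0..99. */
void format_loop_line(char line[9], int i)
{
    const char *prefix = "Loop: ";
    int quotient  = i / 10;                /* udiv x12, x19, x15 */
    int remainder = i - 10 * quotient;     /* msub x10, x15, x12, x19 */

    for (int k = 0; k != 6; k++)
        line[k] = prefix[k];
    line[6] = (char)('0' + quotient);      /* add x14, x12, '0' then strb into msg+6 */
    line[7] = (char)('0' + remainder);     /* add x10, x10, '0' then strb into msg+7 */
    line[8] = '\n';
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Adding &lt;code&gt;'0'&lt;/code&gt; only works for values from 0 to 9, because &lt;code&gt;'0' + 10&lt;/code&gt; is &lt;code&gt;:&lt;/code&gt;, the character mentioned above. That is why the counter has to be split into a quotient and a remainder first.&lt;/p&gt;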


&lt;h2&gt;
  
  
  x86_64
&lt;/h2&gt;

&lt;p&gt;The x86_64 version proved harder than the AArch64 one.&lt;br&gt;
As the professor said, division takes more steps on x86_64, which makes the code longer and harder to get right.&lt;br&gt;
I could not make it work: the code compiles and runs, but no output is printed.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.text
.globl  _start

min = 0                         /* starting value for the loop index; note that this is a symbol (constant), not a variable */
max = 30                       /* loop exits when the index hits this number (loop condition is i&amp;lt;max) */
_start:

        mov     $min,%r15           /* loop index */
        mov     $10, %r14
    mov     $10, %r13
        mov     %r15,%r13       /* move 15 to 13 */
        mov     %r15,%r14       /* move 15 to 14 */

        mov     $10,%r8
        movq    $len,%rdx                       /* message length */


loop:

        mov     $0, %r11
            mov     %r9, %r12

            div     %r8    

        add     $'0',%12

        movb    %r14b,msg+6      /* Put r14 into msg+6 first #*/


inner: 

        add     $'0', %11
        movb    %r13b,msg+7 



             movq    $msg,%rsi                       /* message location */
            movq    $1,%rdi                         /* file descriptor stdout */
        movq    $1,%rax                         /* syscall sys_write */
        syscall


        inc     %r15                /* increment index */
        cmp     $max,%r15           /* see if we're done */
        jne     loop                /* loop if we're not */



        movq    $0,%rdi                         /* exit status */
        movq    $60,%rax                        /* syscall sys_exit */
        syscall

.section .data

msg:    .ascii      "Loop: ##\n"
        len = . - msg

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
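
&lt;p&gt;One likely source of trouble is that x86_64 &lt;code&gt;div&lt;/code&gt; behaves differently from AArch64 &lt;code&gt;udiv&lt;/code&gt;. With a 64-bit operand, it divides the 128-bit value in &lt;code&gt;%rdx:%rax&lt;/code&gt; by the operand. It then puts the quotient in &lt;code&gt;%rax&lt;/code&gt; and the remainder in &lt;code&gt;%rdx&lt;/code&gt;. It raises a divide error if the operand is zero or if the quotient does not fit in 64 bits. The C sketch below models that behavior. The names are hypothetical, and it uses &lt;code&gt;unsigned __int128&lt;/code&gt;, a GCC/Clang extension.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;/* Hypothetical model of the x86_64 "div r64" instruction (unsigned divide).
   unsigned __int128 is a GCC/Clang extension. */
typedef struct {
    unsigned long long rax;   /* quotient after div */
    unsigned long long rdx;   /* remainder after div */
    int fault;                /* 1 where the CPU would raise a divide error (#DE) */
} div_result;

div_result x86_div64(unsigned long long rdx, unsigned long long rax,
                     unsigned long long divisor)
{
    div_result r = {0, 0, 0};

    /* the quotient only fits in 64 bits when RDX is smaller than the divisor */
    if (divisor == 0 || rdx >= divisor) {
        r.fault = 1;
        return r;
    }

    /* the dividend is the 128-bit value RDX:RAX, not just RAX */
    unsigned __int128 two_to_64 = (unsigned __int128)0xFFFFFFFFFFFFFFFFULL + 1;
    unsigned __int128 dividend = (unsigned __int128)rdx * two_to_64 + rax;

    r.rax = (unsigned long long)(dividend / divisor);
    r.rdx = (unsigned long long)(dividend % divisor);
    return r;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In the attempt above, &lt;code&gt;%rdx&lt;/code&gt; still holds &lt;code&gt;len&lt;/code&gt; (9) when &lt;code&gt;div&lt;/code&gt; runs, and the loop index in &lt;code&gt;%r15&lt;/code&gt; is never copied into &lt;code&gt;%rax&lt;/code&gt;. As a result, the division cannot produce the tens digit: &lt;code&gt;x86_div64(9, 29, 10)&lt;/code&gt; returns a huge quotient instead of 2. &lt;code&gt;div&lt;/code&gt; also overwrites &lt;code&gt;%rdx&lt;/code&gt;, which the &lt;code&gt;write&lt;/code&gt; syscall then uses as the message length. The next things to try would probably be:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;zeroing &lt;code&gt;%rdx&lt;/code&gt; and copying the index into &lt;code&gt;%rax&lt;/code&gt; before each &lt;code&gt;div&lt;/code&gt;;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;reloading &lt;code&gt;%rdx&lt;/code&gt; with &lt;code&gt;len&lt;/code&gt; before the syscall.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;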






&lt;h2&gt;
  
  
  Finally
&lt;/h2&gt;

&lt;p&gt;It is hard. Learning assembly language is hard, and learning two different architectures at the same time makes it even harder.&lt;br&gt;
It's clear that I prefer AArch64: I find it easier, and its registers are easier to remember.&lt;br&gt;
I'm pretty sure I'm close to making the x86_64 version of the loop work. I just need to understand and work with the registers a little more.&lt;br&gt;
I will certainly go back to it.&lt;/p&gt;

&lt;p&gt;Thank you for reading.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>SPO600 - W5</title>
      <dc:creator>Gustavo Tavares</dc:creator>
      <pubDate>Tue, 01 Mar 2022 22:04:18 +0000</pubDate>
      <link>https://dev.to/xguhx/spo600-w5-20p1</link>
      <guid>https://dev.to/xguhx/spo600-w5-20p1</guid>
      <description>&lt;h2&gt;
  
  
  Hello there,
&lt;/h2&gt;

&lt;p&gt;This time we will talk about what we learned in week 5 of the SPO600 classes: Make, Makefiles, and two different computer architectures (x86_64 and AArch64).&lt;/p&gt;

&lt;h2&gt;
  
  
  Make and Makefiles
&lt;/h2&gt;

&lt;p&gt;In week 5, we talked about Make and Makefiles, which are both used when building software.&lt;br&gt;
Let’s take a look at both and see how they work:&lt;/p&gt;
&lt;h3&gt;
  
  
  Make
&lt;/h3&gt;

&lt;p&gt;Make is a &lt;code&gt;specialized scripting language used to build software&lt;/code&gt;, according to our &lt;a href="https://wiki.cdot.senecacollege.ca/wiki/Make_and_Makefiles"&gt;WIKI&lt;/a&gt;. &lt;br&gt;
The interesting thing about &lt;code&gt;make&lt;/code&gt; is that its rules are not executed in the linear, top-to-bottom order we are used to. Instead, each rule declares which inputs (prerequisites) produce which output (target). &lt;code&gt;make&lt;/code&gt; then works out the order for us, rebuilding only the targets that are missing or older than their inputs.&lt;/p&gt;

&lt;p&gt;When we run the &lt;code&gt;make&lt;/code&gt; command, it reads the &lt;code&gt;makefile&lt;/code&gt; and runs the commands it describes. But what is a makefile?&lt;/p&gt;
&lt;h3&gt;
  
  
  Makefile
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;makefile&lt;/code&gt; is a script: a set of rules, each naming a target, the files it depends on, and the commands that create it, plus variables those commands can use.&lt;br&gt;
A &lt;code&gt;makefile&lt;/code&gt; looks like this (example from our &lt;a href="https://wiki.cdot.senecacollege.ca/wiki/Make_and_Makefiles"&gt;WIKI&lt;/a&gt;):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;CC=cc
CFLAGS=-O3

all:         half double

half:        half.o sauce.o
             ${CC} ${CFLAGS} -o half half.o sauce.o

double:      double.o sauce.o
             ${CC} ${CFLAGS} -o double double.o sauce.o

half.o:      half.c number.h
             ${CC} ${CFLAGS} -c half.c

double.o:    double.c number.h
             ${CC} ${CFLAGS} -c double.c

sauce.o:     sauce.c
             ${CC} ${CFLAGS} -c sauce.c

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
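&lt;p&gt;To make the role of the variables concrete, here is a toy sketch in C of what happens to a recipe line such as &lt;code&gt;${CC} ${CFLAGS} -c half.c&lt;/code&gt;: every &lt;code&gt;${NAME}&lt;/code&gt; reference is replaced by the variable's value before the command runs. This is not how GNU make is implemented. The function &lt;code&gt;expand_vars&lt;/code&gt; is hypothetical and only knows the two variables defined at the top of the example makefile.&lt;/p&gt;

```c
#include <stddef.h>
#include <string.h>

/* The two variables defined at the top of the example makefile. */
struct var { const char *name; const char *value; };
static const struct var vars[] = { { "CC", "cc" }, { "CFLAGS", "-O3" } };

/* Return the value of the variable whose name is the first len characters
   of name; an unknown variable expands to the empty string, as in make. */
static const char *lookup(const char *name, size_t len) {
    for (size_t i = 0; i < sizeof vars / sizeof vars[0]; i++)
        if (strlen(vars[i].name) == len && strncmp(vars[i].name, name, len) == 0)
            return vars[i].value;
    return "";
}

/* Hypothetical helper: copy recipe into out, replacing each ${NAME} with
   its value. Returns 0 on success, or -1 if out is too small. */
static int expand_vars(const char *recipe, char *out, size_t outsz) {
    size_t n = 0;
    const char *p = recipe;
    if (outsz == 0)
        return -1;
    while (*p != '\0') {
        const char *piece;
        size_t len;
        const char *close = (p[0] == '$' && p[1] == '{') ? strchr(p + 2, '}') : NULL;
        if (close != NULL) {                 /* a ${NAME} reference */
            piece = lookup(p + 2, (size_t)(close - (p + 2)));
            len = strlen(piece);
            p = close + 1;
        } else {                             /* an ordinary character */
            piece = p;
            len = 1;
            p++;
        }
        if (n + len + 1 > outsz)
            return -1;
        memcpy(out + n, piece, len);
        n += len;
    }
    out[n] = '\0';
    return 0;
}
```

&lt;p&gt;For example, expanding &lt;code&gt;${CC} ${CFLAGS} -c half.c&lt;/code&gt; gives &lt;code&gt;cc -O3 -c half.c&lt;/code&gt;. Changing &lt;code&gt;CFLAGS&lt;/code&gt; in one place changes every command that uses it, which is the main reason to use variables in a makefile.&lt;/p&gt;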



&lt;p&gt;So when &lt;code&gt;make&lt;/code&gt; is executed, it uses this &lt;code&gt;makefile&lt;/code&gt; as a guide and runs 5 commands (3 compilations and 2 links):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;$ make
cc         -O3 -c half.c
cc         -O3 -c sauce.c
cc         -O3 -o half half.o sauce.o
cc         -O3 -c double.c
cc         -O3 -o double double.o sauce.o
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
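&lt;p&gt;The order of the commands above comes from the dependencies, not from the order the rules are written in. A simplified way to picture it is a depth-first walk: before building a target, &lt;code&gt;make&lt;/code&gt; first builds each of its prerequisites, and it never builds the same target twice. The C sketch below models only that part, assuming a clean directory where nothing has been built yet; real make also compares file timestamps to skip targets that are up to date, which this sketch ignores. The rule table mirrors the example makefile, and &lt;code&gt;visit&lt;/code&gt; is a hypothetical name.&lt;/p&gt;

```c
#include <stddef.h>
#include <string.h>

#define MAX_DEPS 4
#define MAX_TARGETS 8

/* One rule from the example makefile: a target and its prerequisites. */
struct rule {
    const char *target;
    const char *deps[MAX_DEPS];   /* NULL-terminated list */
    int has_recipe;               /* "all" has no command of its own */
};

static const struct rule rules[] = {
    { "all",      { "half", "double", NULL },       0 },
    { "half",     { "half.o", "sauce.o", NULL },    1 },
    { "double",   { "double.o", "sauce.o", NULL },  1 },
    { "half.o",   { "half.c", "number.h", NULL },   1 },
    { "double.o", { "double.c", "number.h", NULL }, 1 },
    { "sauce.o",  { "sauce.c", NULL },              1 },
};
#define NRULES (sizeof rules / sizeof rules[0])

/* Find the rule for name; NULL means it is a plain source file. */
static const struct rule *find_rule(const char *name) {
    for (size_t i = 0; i < NRULES; i++)
        if (strcmp(rules[i].target, name) == 0)
            return &rules[i];
    return NULL;
}

/* Depth-first walk: build every prerequisite before the target itself and
   never build a target twice. Built targets are appended to order[]. */
static void visit(const char *name, const char *order[], int *count) {
    const struct rule *r = find_rule(name);
    if (r == NULL)
        return;                                   /* source file, e.g. half.c */
    for (int i = 0; i < *count; i++)
        if (strcmp(order[i], name) == 0)
            return;                               /* already built */
    for (int i = 0; r->deps[i] != NULL; i++)
        visit(r->deps[i], order, count);
    if (r->has_recipe)
        order[(*count)++] = name;
}
```

&lt;p&gt;Starting the walk at &lt;code&gt;all&lt;/code&gt; records half.o, sauce.o, half, double.o and double, the same order as the five commands in the output above. The sketch has no cycle detection, so rules that depend on each other in a loop would make it recurse forever.&lt;/p&gt;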



&lt;p&gt;I like to imagine that &lt;code&gt;make&lt;/code&gt; is the cook and the &lt;code&gt;makefile&lt;/code&gt; is the recipe for a piece of software.&lt;/p&gt;




&lt;h2&gt;
  
  
  x86_64 vs AArch64
&lt;/h2&gt;

&lt;p&gt;Let’s talk about the register differences between x86_64 and AArch64:&lt;/p&gt;

&lt;h3&gt;
  
  
  x86_64
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;x86 - The Intel/AMD architecture which debuted with the Intel 8086 processor (16-bit), gained desktop and server dominance as the 386/486/x86 32-bit architecture, and was extended by AMD to the 64-bit x86_64 architecture. Intel and AMD vigorously compete with x86_64 CPUs, which continue as the preeminent server architecture and most popular desktop architecture.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;source: &lt;a href="https://wiki.cdot.senecacollege.ca/wiki/Computer_Architecture"&gt;WIKI&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The general-purpose registers:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;• rax - register a extended
• rbx - register b extended
• rcx - register c extended
• rdx - register d extended
• rbp - register base pointer (start of stack)
• rsp - register stack pointer (current location in stack, growing downwards)
• rsi - register source index (source for data copies)
• rdi - register destination index (destination for data copies)
• r8 - register 8
• r9 - register 9
• r10 - register 10
• r11 - register 11
• r12 - register 12
• r13 - register 13
• r14 - register 14
• r15 - register 15

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
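&lt;p&gt;The note that the stack grows downwards is easy to see in a model. The C sketch below is a simplification for illustration: &lt;code&gt;rsp&lt;/code&gt; holds a byte offset into a small array instead of a real address, the names are hypothetical, and there is no overflow checking. As on x86_64, &lt;code&gt;push&lt;/code&gt; first decreases &lt;code&gt;rsp&lt;/code&gt; by 8 and then stores the 64-bit value, and &lt;code&gt;pop&lt;/code&gt; reads the value and then increases &lt;code&gt;rsp&lt;/code&gt; by 8.&lt;/p&gt;

```c
#include <stdint.h>
#include <string.h>

#define STACK_BYTES 512

/* Toy machine: stack memory plus an rsp that holds a byte offset into it
   (a real rsp holds a memory address). */
struct machine {
    uint8_t  mem[STACK_BYTES];
    uint64_t rsp;
};

/* An empty stack: rsp points just past the highest byte. */
static void init_stack(struct machine *m) {
    m->rsp = STACK_BYTES;
}

/* push: the stack grows downwards, so rsp goes down by 8 first,
   then the 64-bit value is stored at the new rsp. */
static void push(struct machine *m, uint64_t value) {
    m->rsp -= 8;
    memcpy(&m->mem[m->rsp], &value, sizeof value);
}

/* pop: read the value at rsp, then move rsp back up by 8. */
static uint64_t pop(struct machine *m) {
    uint64_t value;
    memcpy(&value, &m->mem[m->rsp], sizeof value);
    m->rsp += 8;
    return value;
}
```

&lt;p&gt;After two pushes, &lt;code&gt;rsp&lt;/code&gt; has moved 16 bytes down, and the values come back out in reverse order.&lt;/p&gt;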



&lt;h3&gt;
  
  
  AArch64
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;ARM - An architecture which started with the Acorn computer company, became the dominant mobile and embedded architecture in its 32-bit incarnations, and was extended to 64-bit in version 8 (ARMv8) with the AArch64 mode. 64-bit ARM processors are dominant in smartphone applications and starting to be compete in server and high-performance computing systems.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;source: &lt;a href="https://wiki.cdot.senecacollege.ca/wiki/Computer_Architecture"&gt;WIKI&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;• r0 through r30 – general registers
• x0 through x30 - for 64-bit-wide access (same registers)
• w0 through w30 - for 32-bit-wide access (same registers - upper 32 bits are either cleared on load or sign-extended (set to the value of the most significant bit of the loaded value)).

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
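&lt;p&gt;Since the x and w names refer to the same registers, it may help to model one register as a 64-bit integer. In this C sketch (the function names are hypothetical), the w view is the low 32 bits of x, a 32-bit write clears the upper 32 bits, and a sign-extending load (such as &lt;code&gt;ldrsw&lt;/code&gt;) copies bit 31 into the upper 32 bits.&lt;/p&gt;

```c
#include <stdint.h>

/* One AArch64 general register, modelled by its 64-bit x value. */

/* Reading w gives the low 32 bits of the same register. */
static uint32_t read_w(uint64_t x) {
    return (uint32_t)x;
}

/* Writing w (for example, a 32-bit load) clears the upper 32 bits of x. */
static uint64_t write_w(uint32_t w) {
    return (uint64_t)w;
}

/* A sign-extending load copies bit 31 into the upper 32 bits of x. */
static uint64_t load_sign_extended(uint32_t w) {
    return (w & 0x80000000u) ? (0xFFFFFFFF00000000ull | w) : (uint64_t)w;
}
```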



</description>
    </item>
    <item>
      <title>SPO600 - Week 4</title>
      <dc:creator>Gustavo Tavares</dc:creator>
      <pubDate>Fri, 18 Feb 2022 19:43:20 +0000</pubDate>
      <link>https://dev.to/xguhx/spo600-week-4-ihj</link>
      <guid>https://dev.to/xguhx/spo600-week-4-ihj</guid>
      <description>&lt;h2&gt;
  
  
  Hey there,
&lt;/h2&gt;

&lt;p&gt;In week 4 we looked at how the compiler optimizes our code. It was made clear to us that the code we write will actually be rewritten by the compiler so that it is optimized.&lt;/p&gt;

&lt;p&gt;Most of the time, the goal of these optimizations is to make the code produce the same result with better performance. To do that, the compiler transforms our code in certain ways, which we will look at now.&lt;/p&gt;

&lt;p&gt;It is important to say that we, as programmers, usually do not need to do these optimizations ourselves: the compiler applies them automatically when optimization is enabled (for example with &lt;code&gt;-O2&lt;/code&gt;).&lt;/p&gt;

&lt;h2&gt;
  
  
  About Code Rewriting Optimizations
&lt;/h2&gt;

&lt;p&gt;The compiler applies a series of code-rewriting optimizations, such as &lt;code&gt;Strength Reduction&lt;/code&gt;, &lt;code&gt;Hoisting&lt;/code&gt;, &lt;code&gt;Hoisting II - Loop-Invariant Expression&lt;/code&gt;, &lt;code&gt;Pre-calculation of Constants&lt;/code&gt; and others. More can be seen on our wiki page &lt;a href="https://wiki.cdot.senecacollege.ca/wiki/Compiler_Optimizations#GCC_Optimization_Options"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Strength Reduction
&lt;/h3&gt;

&lt;p&gt;We all know that some operations are more expensive than others; for example, a multiplication is more expensive than an addition.&lt;br&gt;
Strength reduction replaces a more expensive operation with a cheaper one that achieves the same result.&lt;/p&gt;

&lt;p&gt;Take a look at the example taken from &lt;a href="https://wiki.cdot.senecacollege.ca/wiki/Compiler_Optimizations#GCC_Optimization_Options"&gt;https://wiki.cdot.senecacollege.ca/wiki/Compiler_Optimizations#GCC_Optimization_Options&lt;/a&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;//BEFORE&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%d&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;//AFTER&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="o"&gt;+=&lt;/span&gt;&lt;span class="mi"&gt;6&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%d"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
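&lt;p&gt;A quick way to convince yourself that the two loops are equivalent is to run both and compare the results. In this sketch the values are written to arrays instead of being printed, so they can be compared directly; the function names are made up for the example.&lt;/p&gt;

```c
/* BEFORE: one multiplication per iteration. */
static void times6_mul(int out[10]) {
    for (int x = 0; x < 10; x++)
        out[x] = x * 6;
}

/* AFTER: the multiplication is replaced by adding 6 each time. */
static void times6_add(int out[10]) {
    int i = 0;
    for (int x = 0; x < 60; x += 6)
        out[i++] = x;
}
```

&lt;p&gt;Both arrays end up holding 0, 6, 12, up to 54. Compiling the BEFORE version with optimization enabled and reading the generated assembly (for example with &lt;code&gt;gcc -O2 -S&lt;/code&gt;) is one way to check whether your compiler applied this transformation; the result can vary between compilers and targets.&lt;/p&gt;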



&lt;h3&gt;
  
  
  Hoisting
&lt;/h3&gt;

&lt;p&gt;Hoisting involves moving an operation out of a loop.&lt;br&gt;
If an operation inside the loop produces the same result on every iteration, just as it would outside the loop, the compiler can move it outside so that it runs only once.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="c1"&gt;//For example&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;  &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;

&lt;span class="n"&gt;var1&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; 
&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;//Will become: &lt;/span&gt;

&lt;span class="n"&gt;var1&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;

&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="o"&gt;+&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
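&lt;p&gt;The same check can be done for hoisting. This C sketch (hypothetical names) runs both versions of the example and returns the final values of &lt;code&gt;var1&lt;/code&gt; and &lt;code&gt;b&lt;/code&gt;. Moving &lt;code&gt;var1=0&lt;/code&gt; out is only safe here because the loop always runs at least once; if it could run zero times, the AFTER version would set &lt;code&gt;var1&lt;/code&gt; when the BEFORE version would not.&lt;/p&gt;

```c
/* Final values of the two variables from the hoisting example. */
struct result { int var1; int b; };

/* BEFORE: var1 is set to the same value on every iteration. */
static struct result before(int b) {
    int var1 = -1;              /* some earlier value */
    for (int i = 0; i < 5; i++) {
        var1 = 0;               /* loop-invariant assignment */
        b = b + 5;
    }
    return (struct result){ var1, b };
}

/* AFTER: the invariant assignment is hoisted and runs only once. */
static struct result after(int b) {
    int var1;
    var1 = 0;                   /* moved out of the loop */
    for (int i = 0; i < 5; i++)
        b = b + 5;
    return (struct result){ var1, b };
}
```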



&lt;h3&gt;
  
  
  Hoisting II - Loop-Invariant Expression
&lt;/h3&gt;

&lt;p&gt;Hoisting II - Loop-Invariant Expression takes an expensive calculation inside a loop whose value does not change between iterations, moves it outside so it is computed just once, and uses the stored result in its former place.&lt;/p&gt;

&lt;p&gt;For example,&lt;/p&gt;

&lt;p&gt;Before:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;  &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="n"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;//After:&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;  &lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="mi"&gt;5&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="o"&gt;++&lt;/span&gt;&lt;span class="p"&gt;){&lt;/span&gt;
&lt;span class="n"&gt;foo&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
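&lt;p&gt;With literal constants like &lt;code&gt;5*5&lt;/code&gt;, the compiler can simply fold the product anyway, so the sketch below uses two parameters, &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;y&lt;/code&gt;, that do not change inside the loop. That is the case where hoisting the expression really saves work. Since the original &lt;code&gt;foo&lt;/code&gt; is not shown, &lt;code&gt;foo&lt;/code&gt; here is a hypothetical stand-in that adds its argument to a running sum.&lt;/p&gt;

```c
/* Hypothetical stand-in for foo(): adds value to a running sum. */
static void foo(long *sum, long value) {
    *sum += value;
}

/* BEFORE: x * y is recomputed on every iteration, although neither
   x nor y changes inside the loop. */
static long before(long x, long y) {
    long sum = 0;
    for (int i = 0; i < 5; i++)
        foo(&sum, x * y);
    return sum;
}

/* AFTER: the loop-invariant product is computed once, before the loop. */
static long after(long x, long y) {
    long sum = 0;
    long b = x * y;
    for (int i = 0; i < 5; i++)
        foo(&sum, b);
    return sum;
}
```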



&lt;h3&gt;
  
  
  Pre-calculation of Constants
&lt;/h3&gt;

&lt;p&gt;Pre-calculation of Constants means taking a constant expression whose value never changes and replacing it with its result at compile time.&lt;/p&gt;

&lt;p&gt;See the example from &lt;a href="https://wiki.cdot.senecacollege.ca/wiki/Compiler_Optimizations#GCC_Optimization_Options"&gt;https://wiki.cdot.senecacollege.ca/wiki/Compiler_Optimizations#GCC_Optimization_Options&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Before:&lt;br&gt;
&lt;code&gt;ff = (212.0-32.0)/100.0;   /* factor for Celsius-Fahrenheit conversion (1.8) */&lt;/code&gt;&lt;br&gt;
&lt;code&gt;conv = c * ff + 32;&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;After:&lt;br&gt;
&lt;code&gt;conv = c * 1.800 + 32;&lt;/code&gt;&lt;/p&gt;
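&lt;p&gt;One detail matters when trying this in C: the conversion factor has to be computed in floating point. With &lt;code&gt;int&lt;/code&gt; operands, &lt;code&gt;(212-32)/100&lt;/code&gt; uses integer division and evaluates to 1, not 1.8. This sketch (hypothetical names) shows both. In the floating-point version the whole expression is a constant, so the compiler can replace it with 1.8 at compile time.&lt;/p&gt;

```c
/* Integer operands: (212 - 32) / 100 is integer division, so the
   factor is truncated from 1.8 to 1. */
static int int_factor(void) {
    return (212 - 32) / 100;
}

/* Floating-point operands: ff is a constant expression the compiler can
   pre-calculate as 1.8, so no division is needed at run time. */
static double celsius_to_fahrenheit(double c) {
    const double ff = (212.0 - 32.0) / 100.0;  /* factor for Celsius-Fahrenheit conversion */
    return c * ff + 32.0;
}
```

&lt;p&gt;With the floating-point factor, 100 degrees Celsius converts to 212 and 0 converts to 32, as expected.&lt;/p&gt;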

&lt;p&gt;There are many more that you can read about on our &lt;a href="https://wiki.cdot.senecacollege.ca/wiki/Compiler_Optimizations#GCC_Optimization_Options"&gt;wiki&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I learned
&lt;/h2&gt;

&lt;p&gt;The code that I write as a programmer and the machine code it is turned into are actually different. The compiler takes everything I have written and optimizes it. &lt;/p&gt;

&lt;p&gt;The compiler is smarter than me, that’s not hard to see, but it's good to know that it will take care of these kinds of optimizations for me.&lt;/p&gt;

</description>
      <category>beginners</category>
    </item>
    <item>
      <title>SPO600 - Lab03</title>
      <dc:creator>Gustavo Tavares</dc:creator>
      <pubDate>Mon, 07 Feb 2022 19:20:36 +0000</pubDate>
      <link>https://dev.to/xguhx/spo600-lab03-cc4</link>
      <guid>https://dev.to/xguhx/spo600-lab03-cc4</guid>
      <description>&lt;p&gt;For lab 03, we were supposed to create a basic game or calculator using 6502 assembly language.&lt;/p&gt;

&lt;h2&gt;
  
  
  My idea
&lt;/h2&gt;

&lt;p&gt;As the professor warned us, assembly language is hard, and we should keep things as simple as we can.&lt;br&gt;
My idea was to create a simple calculator that would divide any number from 0 to 9 by 2 using LSR (Logical Shift Right).&lt;/p&gt;
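&lt;p&gt;Here is a small C model of what LSR does to an 8-bit value, to show why it works as a division by 2. It is a sketch for illustration (the function name is made up): every bit moves one place to the right, bit 7 becomes 0, and the bit that falls off bit 0 goes into the carry flag. For odd numbers the result is rounded down, and the remainder ends up in the carry.&lt;/p&gt;

```c
#include <stdint.h>

/* LSR on an 8-bit value: bits move one place right, bit 7 becomes 0,
   and the bit shifted out of bit 0 goes into the carry flag. */
static uint8_t lsr(uint8_t value, int *carry) {
    *carry = value & 1;
    return (uint8_t)(value >> 1);
}
```

&lt;p&gt;For example, 9 becomes 4 with the carry set, and 8 becomes 4 with the carry clear.&lt;/p&gt;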
&lt;h2&gt;
  
  
  The way it works
&lt;/h2&gt;

&lt;p&gt;First I display a message to the user:&lt;br&gt;
&lt;code&gt;Type a number from 0 to 9:&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;I did it using &lt;code&gt;dcb&lt;/code&gt; and &lt;code&gt;CHROUT&lt;/code&gt;, this way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LDY #$00

nextchar:  ;DISPLAY INITIAL MESSAGE

    LDA msg,Y
    BEQ getnumber
    JSR CHROUT
    INY
    BNE nextchar    

msg:
dcb "T","y","p","e",32,"a",32,"n","u","m","b","e","r",32,"f","r","o","m",32,"0",32,"t","o",32,"9",":",0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then I get the input from the user using &lt;code&gt;CHRIN&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;getnumber:   ; ACCEPT ONLY NUMBERS

    LDY #$00
    JSR CHRIN

    CMP #$00
    BEQ getnumber

    CMP #$30
    BCC getnumber   ;Ignore anything below '0' ($30)

    CMP #$3A
    BCS getnumber   ;Ignore anything above '9' ($39)

    TAX ;Copy the character from A into the X register

    JSR CHROUT

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code accepts only the digits 0 to 9; if the user types anything else, it is ignored.&lt;/p&gt;
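&lt;p&gt;The range check depends on how &lt;code&gt;CMP&lt;/code&gt; sets the flags. &lt;code&gt;CMP&lt;/code&gt; computes A minus the operand without storing the result: Z is set if they are equal, C is set if A is greater than or equal to the operand (as unsigned values), and N is bit 7 of the difference. The C model below (hypothetical names) shows that &lt;code&gt;CMP #$39&lt;/code&gt; followed by &lt;code&gt;BPL&lt;/code&gt; also rejects the digit 9, because '9' minus $39 is 0, which has bit 7 clear. Testing the carry instead, with &lt;code&gt;CMP #$30&lt;/code&gt; / &lt;code&gt;BCC&lt;/code&gt; and &lt;code&gt;CMP #$3A&lt;/code&gt; / &lt;code&gt;BCS&lt;/code&gt;, accepts exactly the characters 0 to 9.&lt;/p&gt;

```c
#include <stdint.h>

/* Flags set by CMP: it computes A - operand without storing the result. */
struct flags { int n, z, c; };

static struct flags cmp(uint8_t a, uint8_t operand) {
    uint8_t diff = (uint8_t)(a - operand);
    struct flags f;
    f.n = (diff & 0x80) != 0;   /* N: bit 7 of the difference */
    f.z = diff == 0;            /* Z: the values are equal */
    f.c = a >= operand;         /* C: A >= operand, unsigned */
    return f;
}

/* CMP #$39 / BPL getnumber: branches (rejects) whenever N is clear. */
static int bpl_check_rejects(uint8_t a) {
    return !cmp(a, 0x39).n;
}

/* CMP #$30 / BCC and CMP #$3A / BCS: accept only '0' ($30) to '9' ($39). */
static int carry_check_accepts(uint8_t a) {
    if (!cmp(a, 0x30).c) return 0;   /* BCC taken: below '0' */
    if (cmp(a, 0x3A).c) return 0;    /* BCS taken: above '9' */
    return 1;
}
```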

&lt;p&gt;After that I display the following message:&lt;br&gt;
&lt;code&gt;Your number divided by 2 is:&lt;/code&gt;&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dividemsg: ;DISPLAY DIVIDE MESSAGE

    LDA msg2,y
    BEQ result
    JSR CHROUT
    INY
    BNE dividemsg   

msg2:
dcb $0d,"Y","o","u","r",32,"n","u","m","b","e","r",32,"d","i","v","i","d","e","d",32,"b","y",32,"2",32,"i","s",":",0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After the message is displayed, the result should be output by the code at the &lt;code&gt;result&lt;/code&gt; label:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;result:  ; GET THE RESULT - LSR DIVIDE BY 2

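    LDY #$00
    CPX #$30 ;Compare X with the character '0'
    BEQ zero ;Display the "cannot divide 0" message

    TXA      ;Copy the digit character from X to A
    SEC
    SBC #$30 ;Convert the character '0'-'9' to the value 0-9
    LSR      ;Divide A by 2 (LSR works on A or memory, not on X)
    CLC
    ADC #$30 ;Convert the result back to a digit character
    JSR CHROUT ;Display the result
    JMP done ;Skip the zero message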

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
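&lt;p&gt;It is important to say that I could not make this part work properly: the result was not displayed to the user. The likely causes are that &lt;code&gt;LDX A&lt;/code&gt;, &lt;code&gt;LSR X&lt;/code&gt; and &lt;code&gt;LDA X&lt;/code&gt; are not valid 6502 instructions (&lt;code&gt;TAX&lt;/code&gt; and &lt;code&gt;TXA&lt;/code&gt; copy between A and X, and &lt;code&gt;LSR&lt;/code&gt; only works on A or memory), that X holds the character code of the digit rather than its value (so &lt;code&gt;CPX #$00&lt;/code&gt; never matches '0', which is $30), and that without a &lt;code&gt;JMP&lt;/code&gt; after the output, execution falls through into the zero message.&lt;/p&gt;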
&lt;/blockquote&gt;
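&lt;p&gt;The key point is that &lt;code&gt;CHRIN&lt;/code&gt; returns a character code, not a number: the digit 8 arrives as $38. This C sketch (hypothetical names) mirrors the steps a working version needs: subtract $30 to get the value, shift right to divide by 2, and add $30 again so that &lt;code&gt;CHROUT&lt;/code&gt; prints a digit. Shifting the character code directly gives $1C for 8, which is not a printable digit.&lt;/p&gt;

```c
#include <stdint.h>

/* Mirrors the three steps:
   SEC / SBC #$30  turns the character '0'-'9' into the value 0-9,
   LSR             divides by 2 (the remainder goes to the carry),
   CLC / ADC #$30  turns the result back into a digit character. */
static uint8_t half_digit(uint8_t digit_char) {
    uint8_t value = (uint8_t)(digit_char - 0x30);
    value = (uint8_t)(value >> 1);
    return (uint8_t)(value + 0x30);
}

/* Shifting the character code itself: '8' ($38) becomes $1C,
   a control character rather than a digit. */
static uint8_t half_char_code(uint8_t digit_char) {
    return (uint8_t)(digit_char >> 1);
}
```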

&lt;p&gt;And if the user enters 0, a message should be displayed instead of a result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;zero:   ;MESSAGE IN CASE DIVIDING BY 0

    LDA msg3,y
    BEQ done
    JSR CHROUT
    INY
    BNE zero


done: BRK
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The entire code can be found here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;
define      SCINIT      $ff81 ; initialize/clear screen
define      CHRIN       $ffcf ; input character from keyboard
define      CHROUT      $ffd2 ; output character to screen
define      SCREEN      $ffed ; get screen size
define      PLOT        $fff0 ; get/set cursor coordinates


    LDY #$00

nextchar:  ;DISPLAY INITIAL MESSAGE

    LDA msg,Y
    BEQ getnumber
    JSR CHROUT
    INY
    BNE nextchar    

getnumber:   ; ACCEPT ONLY NUMBERS

    LDY #$00
    JSR CHRIN

    CMP #$00
    BEQ getnumber

    CMP #$30
    BCC getnumber   ;Ignore anything below '0' ($30)

    CMP #$3A
    BCS getnumber   ;Ignore anything above '9' ($39)

    TAX ;Copy the character from A into the X register

    JSR CHROUT



LDY #$00

dividemsg: ;DISPLAY DIVIDE MESSAGE

    LDA msg2,y
    BEQ result
    JSR CHROUT
    INY
    BNE dividemsg   


result:  ; GET THE RESULT - LSR DIVIDE BY 2

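    LDY #$00
    CPX #$30 ;Compare X with the character '0'
    BEQ zero ;Display the "cannot divide 0" message

    TXA      ;Copy the digit character from X to A
    SEC
    SBC #$30 ;Convert the character '0'-'9' to the value 0-9
    LSR      ;Divide A by 2 (LSR works on A or memory, not on X)
    CLC
    ADC #$30 ;Convert the result back to a digit character
    JSR CHROUT ;Display the result
    JMP done ;Skip the zero message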



zero:   ;MESSAGE IN CASE DIVIDING BY 0

    LDA msg3,y
    BEQ done
    JSR CHROUT
    INY
    BNE zero


done: BRK

msg:
dcb "T","y","p","e",32,"a",32,"n","u","m","b","e","r",32,"f","r","o","m",32,"0",32,"t","o",32,"9",":",0


msg2:
dcb $0d,"Y","o","u","r",32,"n","u","m","b","e","r",32,"d","i","v","i","d","e","d",32,"b","y",32,"2",32,"i","s",":",0

msg3:
dcb 32,"C","a","n","n","o","t",32,"d","i","v","i","d","e",32,"0","!",0

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
    </item>
    <item>
      <title>SPO600 - W3</title>
      <dc:creator>Gustavo Tavares</dc:creator>
      <pubDate>Mon, 07 Feb 2022 17:18:59 +0000</pubDate>
      <link>https://dev.to/xguhx/spo600-w3-46bf</link>
      <guid>https://dev.to/xguhx/spo600-w3-46bf</guid>
      <description>&lt;p&gt;In week 3 was time to learn about: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Math&lt;/li&gt;
&lt;li&gt;Characters and strings!&lt;/li&gt;
&lt;li&gt;Branches, Jumps and Procedures&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  About Math
&lt;/h3&gt;

&lt;blockquote&gt;
&lt;p&gt;6502 Assembly language can perform calculations in binary and decimal mode.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;In binary mode, operations are performed on 8-bit values.&lt;br&gt;
In decimal mode, each byte is treated as 2 decimal digits.&lt;/p&gt;
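&lt;p&gt;To illustrate decimal mode: in BCD (binary-coded decimal) each half of a byte holds one decimal digit, so the byte $45 stands for the number 45 rather than 69. The 6502 switches into decimal mode with &lt;code&gt;SED&lt;/code&gt; and back to binary with &lt;code&gt;CLD&lt;/code&gt;. This C sketch (hypothetical helper names) converts between a number from 0 to 99 and its BCD byte.&lt;/p&gt;

```c
#include <stdint.h>

/* Pack a number from 0 to 99 into one BCD byte: tens digit in the high
   nibble, units digit in the low nibble. */
static uint8_t to_bcd(unsigned n) {
    return (uint8_t)(((n / 10) << 4) | (n % 10));
}

/* Unpack a BCD byte back into its value, e.g. $45 gives 45. */
static unsigned from_bcd(uint8_t b) {
    return (b >> 4) * 10u + (b & 0x0Fu);
}
```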

&lt;p&gt;The operations are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;ADC (Add with Carry)&lt;br&gt;
This will perform the following:  &lt;code&gt;the value in the accumulator + the specified byte + the carry flag&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;SBC (Subtract with Carry)&lt;br&gt;
This will perform the following:  &lt;code&gt;the value in the accumulator - the specified byte - (not Carry)&lt;/code&gt; &lt;/p&gt;&lt;/li&gt;
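&lt;li&gt;&lt;p&gt;LSR (Logical Shift Right)&lt;br&gt;
This shifts every bit one position to the right, which divides an unsigned value by 2 (the bit shifted out goes into the carry flag).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;ASL (Arithmetic Shift Left)&lt;br&gt;
This shifts every bit one position to the left, which multiplies the value by 2 (the bit shifted out goes into the carry flag).&lt;/p&gt;&lt;/li&gt;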
&lt;/ul&gt;
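&lt;p&gt;The carry rules above can be written out in C. This is a sketch of binary mode only (decimal mode and the overflow flag are not modelled), with made-up function names: &lt;code&gt;adc&lt;/code&gt; sets the carry when the sum does not fit in 8 bits, &lt;code&gt;sbc&lt;/code&gt; subtracts one extra when the carry is clear and leaves the carry set when no borrow was needed, and &lt;code&gt;asl&lt;/code&gt; moves bit 7 into the carry. This is why 6502 code usually runs &lt;code&gt;CLC&lt;/code&gt; before &lt;code&gt;ADC&lt;/code&gt; and &lt;code&gt;SEC&lt;/code&gt; before &lt;code&gt;SBC&lt;/code&gt;.&lt;/p&gt;

```c
#include <stdint.h>

/* Binary-mode ADC: A + M + C. Carry is set when the sum needs a 9th bit. */
static uint8_t adc(uint8_t a, uint8_t m, int *carry) {
    unsigned sum = (unsigned)a + m + (unsigned)*carry;
    *carry = sum > 0xFF;
    return (uint8_t)sum;
}

/* Binary-mode SBC: A - M - (1 - C). Carry means "no borrow": it ends up
   set when the result did not go below zero. */
static uint8_t sbc(uint8_t a, uint8_t m, int *carry) {
    int diff = (int)a - (int)m - (1 - *carry);
    *carry = diff >= 0;
    return (uint8_t)diff;       /* wraps around modulo 256 */
}

/* ASL: shift left by one (multiply by 2); bit 7 goes into the carry. */
static uint8_t asl(uint8_t a, int *carry) {
    *carry = (a & 0x80) != 0;
    return (uint8_t)(a << 1);
}
```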
&lt;h3&gt;
  
  
  About Characters and Strings
&lt;/h3&gt;

&lt;p&gt;In 6502 assembly, we can use &lt;code&gt;dcb&lt;/code&gt; (define constant byte) to store a string. Keep in mind that for a new line (&lt;code&gt;enter&lt;/code&gt;) you need to use &lt;code&gt;$0d&lt;/code&gt;, and &lt;code&gt;32&lt;/code&gt; for a space, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;msg:
dcb "T","y","p","e",32,"a",32,"n","u","m","b","e","r",32,"f","r","o","m",32,"0",32,"t","o",32,"9",":",0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;0&lt;/code&gt; at the end marks the end of the string, so the printing loop knows when to stop.&lt;/p&gt;

&lt;p&gt;With this, we can use &lt;code&gt;JSR CHROUT&lt;/code&gt; in a loop to output each character to the screen.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;LDY #$00
nextchar:  ;DISPLAY INITIAL MESSAGE

    LDA msg,Y
    BEQ getnumber
    JSR CHROUT
    INY
    BNE nextchar    
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
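&lt;p&gt;For comparison, here is the same loop written in C as a sketch (the function name, and the &lt;code&gt;out&lt;/code&gt; buffer standing in for the screen, are hypothetical). Each line is commented with the 6502 instruction it corresponds to. Because Y is an 8-bit register, the loop can print at most 256 characters before Y wraps around to 0 and &lt;code&gt;BNE&lt;/code&gt; stops branching.&lt;/p&gt;

```c
#include <stddef.h>

/* Copy a 0-terminated message into out, like the 6502 loop above.
   out must have room for up to 257 bytes (256 characters plus '\0'). */
static size_t print_msg(const unsigned char *msg, char *out) {
    unsigned char y = 0;              /* LDY #$00 */
    size_t n = 0;
    for (;;) {
        unsigned char a = msg[y];     /* LDA msg,Y */
        if (a == 0)                   /* BEQ: the 0 marks the end */
            break;
        out[n++] = (char)a;           /* JSR CHROUT */
        y++;                          /* INY (wraps from 255 to 0) */
        if (y == 0)                   /* BNE falls through when Y is 0 */
            break;
    }
    out[n] = '\0';
    return n;
}
```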






&lt;p&gt;For input, we can use &lt;code&gt;CHRIN&lt;/code&gt; to read a character from the keyboard.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;getnumber:   ; ACCEPT ONLY NUMBERS

    LDY #$00
    JSR CHRIN

    CMP #$00
    BEQ getnumber

    CMP #$30
    BCC getnumber   ;Ignore anything below '0' ($30)

    CMP #$3A
    BCS getnumber   ;Ignore anything above '9' ($39)

    JSR CHROUT
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This code will accept only a number from 0 to 9 from the user.&lt;/p&gt;

&lt;h3&gt;
  
  
  Branches, Jumps and Procedures
&lt;/h3&gt;

&lt;p&gt;Branches, Jumps and Procedures are how we control the flow of our program.&lt;br&gt;
Branches are used for conditional jumps.&lt;br&gt;
For example:&lt;br&gt;
We can use CMP #$00 (compare with 0) and then BNE branchName (branch if not equal) or BEQ branchName (branch if equal).&lt;/p&gt;

&lt;p&gt;Jumps change the flow of the program unconditionally.&lt;br&gt;
For example:&lt;br&gt;
JMP done will take you to the done label.&lt;/p&gt;

&lt;p&gt;Procedures transfer control to a subroutine.&lt;/p&gt;

&lt;p&gt;For example, if you want to execute a subroutine you can write:&lt;br&gt;
JSR myfunction&lt;/p&gt;

&lt;p&gt;myfunction:     …&lt;br&gt;
        RTS&lt;/p&gt;

&lt;p&gt;Note that your subroutine needs to end with RTS (Return from Subroutine), so that execution returns to the instruction after the JSR.&lt;/p&gt;
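&lt;p&gt;How does RTS know where to return? JSR saves the return address on the stack, which on the 6502 lives in page $01 and grows downwards. This C sketch is a toy model with hypothetical names: JSR pushes the address of its own last byte (the instruction is 3 bytes long, so that is the current address plus 2), high byte first, and RTS pulls that address back and adds 1, landing on the instruction right after the JSR.&lt;/p&gt;

```c
#include <stdint.h>

/* Toy CPU state: just enough to show how JSR and RTS use the stack. */
struct cpu {
    uint8_t  mem[0x10000];
    uint8_t  sp;    /* stack pointer: offset into page $01 */
    uint16_t pc;    /* address of the current instruction */
};

static void push(struct cpu *c, uint8_t v) { c->mem[0x0100 + c->sp] = v; c->sp--; }
static uint8_t pull(struct cpu *c)         { c->sp++; return c->mem[0x0100 + c->sp]; }

/* JSR: push the address of the JSR's last byte (pc + 2), high byte
   first, then continue at the subroutine. */
static void jsr(struct cpu *c, uint16_t target) {
    uint16_t ret = (uint16_t)(c->pc + 2);
    push(c, (uint8_t)(ret >> 8));
    push(c, (uint8_t)(ret & 0xFF));
    c->pc = target;
}

/* RTS: pull the saved address (low byte first) and add 1. */
static void rts(struct cpu *c) {
    uint16_t lo = pull(c);
    uint16_t hi = pull(c);
    c->pc = (uint16_t)(((hi << 8) | lo) + 1);
}
```

&lt;p&gt;With a JSR at $0600 and the stack pointer starting at $FF, the bytes $06 and $02 are pushed, and RTS resumes at $0603.&lt;/p&gt;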

&lt;p&gt;The system has some predefined subroutines:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;define      SCINIT      $ff81 ; initialize/clear screen
define      CHRIN       $ffcf ; input character from keyboard
define      CHROUT      $ffd2 ; output character to screen
define      SCREEN      $ffed ; get screen size
define      PLOT        $fff0 ; get/set cursor coordinates
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  This was it for W3
&lt;/h3&gt;

&lt;p&gt;Thank you so much for reading!&lt;/p&gt;

</description>
    </item>
    <item>
      <title>SPO600 - LAB 02</title>
      <dc:creator>Gustavo Tavares</dc:creator>
      <pubDate>Mon, 31 Jan 2022 22:41:16 +0000</pubDate>
      <link>https://dev.to/xguhx/spo600-lab-02-ei5</link>
      <guid>https://dev.to/xguhx/spo600-lab-02-ei5</guid>
      <description>&lt;p&gt;Lab 02 - 6502!&lt;/p&gt;

&lt;p&gt;The goal of this lab is to calculate the performance of a 6502 program.&lt;br&gt;
This is the program, and it can be run &lt;a href="//6502%20assembler/simulator%20(cdot.systems)"&gt;HERE&lt;/a&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;    lda #$00    ; set a pointer at $40 to point to $0200
    sta $40
    lda #$02
    sta $41

    lda #$07    ; colour number

    ldy #$00    ; set index to 0

loop:   sta ($40),y ; set pixel at the address (pointer)+Y

    iny     ; increment index
    bne loop    ; continue until done the page

    inc $41     ; increment the page
    ldx $41     ; get the current page number
    cpx #$06    ; compare with 6
    bne loop    ; continue until done all pages

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In order to calculate how many seconds the program will take to run, we need to know how many cycles each instruction takes and how many times each instruction is executed.&lt;/p&gt;

&lt;p&gt;To do that, we have to take a look at &lt;a href="https://www.masswerk.at/6502/6502_instruction_set.html"&gt;this&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;This website provides the cycle count for each instruction, so we are able to measure the total time that the program will take at 1MHz.&lt;/p&gt;

&lt;h2&gt;
  
  
  Let’s Calculate Performance
&lt;/h2&gt;

&lt;p&gt;Let’s start with our first instruction:&lt;br&gt;
&lt;code&gt;lda #$00&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--kd5-Co5v--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1sdfgc2vy0ronedo1z9h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kd5-Co5v--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/1sdfgc2vy0ronedo1z9h.png" alt="Image description" width="880" height="532"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LDA is our first instruction, so now let’s take a look at the website to see its cycle count:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--G-VEFC1e--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ftwbtrsskb2ipede7qvq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--G-VEFC1e--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ftwbtrsskb2ipede7qvq.png" alt="Image description" width="880" height="526"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here we can see that LDA takes 2 cycles, because we are using immediate addressing.&lt;/p&gt;

&lt;p&gt;To keep track, let’s make a chart with all the values:&lt;/p&gt;

&lt;p&gt;My chart lists, for each instruction, its number of cycles and how many times it runs, plus the alternative number of cycles (for example, a branch that is not taken) and how many times that happens, and then sums up the total cycles.&lt;/p&gt;

&lt;p&gt;The total number of cycles is then multiplied by the time each cycle takes (1 microsecond at 1MHz), giving us the time needed to run the program.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--n2ZnOgfo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8ul375i0v2vxv10wwlgk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--n2ZnOgfo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8ul375i0v2vxv10wwlgk.png" alt="Image description" width="395" height="228"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I tried to optimize the program to make it run faster, but I am finding it difficult to work with the pages. My goal was to make each pass of the loop turn one pixel on each page yellow, which might be faster, but I could not make it work, so I could not calculate its speed.&lt;/p&gt;

&lt;h3&gt;
  
  
  Experimenting:
&lt;/h3&gt;

&lt;p&gt;The first program will give us this screen: &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--CFZ4CLaS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6h27jk2ts1euuf8lpfep.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--CFZ4CLaS--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/6h27jk2ts1euuf8lpfep.png" alt="Image description" width="305" height="559"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When experimenting with the &lt;code&gt;tya&lt;/code&gt; instruction, we get this result:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--RtJjcemG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jn71wnrcn15t83crbotp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--RtJjcemG--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/jn71wnrcn15t83crbotp.png" alt="Image description" width="327" height="598"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can see now that the screen has stripes of different colors. &lt;code&gt;tya&lt;/code&gt; means &lt;code&gt;transfer Index Y to accumulator&lt;/code&gt;, so every time the loop runs, the accumulator (which holds the color) is set to the current value of Y, which goes up by one for each pixel.&lt;/p&gt;

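&lt;p&gt;But when trying to use &lt;code&gt;lsa&lt;/code&gt; after &lt;code&gt;tya&lt;/code&gt;, I keep getting this error (note that &lt;code&gt;lsa&lt;/code&gt; is not a 6502 instruction; the logical shift right instruction is &lt;code&gt;lsr&lt;/code&gt;, which may explain the error):&lt;/p&gt;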

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--TDTueijy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5bf7xksnwn5kbijkwk3r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--TDTueijy--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/5bf7xksnwn5kbijkwk3r.png" alt="Image description" width="423" height="202"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  My thoughts on this Lab
&lt;/h2&gt;

&lt;p&gt;It is really cool to understand how to calculate the cycles and the time it takes for a 6502 program to run; this way we are able to run tests to make it faster.&lt;br&gt;
But I feel like working with memory addresses can sometimes be complicated and confusing. There are many ways to write them, and I need more time to understand them better.&lt;br&gt;
I could not optimize the program the way I wanted to, which frustrated me, but I am determined to come back to this lab in the future and try again.&lt;/p&gt;

&lt;p&gt;Thank you for reading!&lt;/p&gt;

</description>
    </item>
  </channel>
</rss>
