<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Akira</title>
    <description>The latest articles on DEV Community by Akira (@_akira19).</description>
    <link>https://dev.to/_akira19</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F120249%2F1ccd20c7-a9fe-4aa0-9eac-0025f7b93d0d.jpg</url>
      <title>DEV Community: Akira</title>
      <link>https://dev.to/_akira19</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/_akira19"/>
    <language>en</language>
    <item>
      <title>Creating a Toy Solidity compiler and running it in a Toy EVM</title>
      <dc:creator>Akira</dc:creator>
      <pubDate>Wed, 20 Nov 2024 11:08:40 +0000</pubDate>
      <link>https://dev.to/_akira19/creating-a-toy-solidity-compiler-and-running-it-in-a-toy-evm-28jo</link>
      <guid>https://dev.to/_akira19/creating-a-toy-solidity-compiler-and-running-it-in-a-toy-evm-28jo</guid>
      <description>&lt;p&gt;I wanted to try making my own compiler and runtime environment, so I gave it a shot. It doesn’t cover all the syntax or opcodes yet, but I’m satisfied as long as it can run basic functionality. I’ll make some improvements if I feel like it.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/akira-19/toythereum" rel="noopener noreferrer"&gt;https://github.com/akira-19/toythereum&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Demo
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;contract TestContract {
    uint256 a = 1 + 2 * 3 + 4;

    function increment() returns (uint256) {
        a = a + 1;
        return a;
    }

    function compare(uint256 b) returns (bool) {
        return a &amp;lt; b;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;When you compile the code, you can get the following bytecode.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;7f00000000000000000000000000000000000000000000000000000000000000807f0000000000000000000000000000000000000000000000000000000000000040527f00000000000000000000000000000000000000000000000000000000000000047f00000000000000000000000000000000000000000000000000000000000000037f00000000000000000000000000000000000000000000000000000000000000027f00000000000000000000000000000000000000000000000000000000000000017f000000000000000000000000000000000000000000000000000000000000000b7f0000000000000000000000000000000000000000000000000000000000000000557f000000000000000000000000000000000000000000000000000000000000032f807f00000000000000000000000000000000000000000000000000000000000001515f395ff37f00000000000000000000000000000000000000000000000000000000000000807f0000000000000000000000000000000000000000000000000000000000000040527f00000000000000000000000000000000000000000000000000000000000000e07f0000000000000000000000000000000000000000000000000000000000000000351c7f00000000000000000000000000000000000000000000000000000000d09de08a147f0000000000000000000000000000000000000000000000000000000000000196577f00000000000000000000000000000000000000000000000000000000000000e07f0000000000000000000000000000000000000000000000000000000000000000351c7f00000000000000000000000000000000000000000000000000000000c4b8c257147f0000000000000000000000000000000000000000000000000000000000000283577f00000000000000000000000000000000000000000000000000000000000000007f0000000000000000000000000000000000000000000000000000000000000000fd5b7f00000000000000000000000000000000000000000000000000000000000000017f000000000000000000000000000000000000000000000000000000000000000054017f0000000000000000000000000000000000000000000000000000000000000000557f0000000000000000000000000000000000000000000000000000000000000000547f0000000000000000000000000000000000000000000000000000000000000080527f00000000000000000000000000000000000000000000000000000000000000207f0000000000000000000000000000000000000000000000000000000000000080f35b7f0000000000000000000000000000000000000000000000000000000000000004357f000000000000000000000000000000000000000000000000000000000000000054107f0000000000000000000000000000000000000000000000000000000000000080527f00000000000000000000000000000000000000000000000000000000000000207f0000000000000000000000000000000000000000000000000000000000000080f3
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;demo is here. (I don't know why, but I can embed gif file on the article)&lt;br&gt;
&lt;a href="https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8l5vmjmwwkp2iq85s7gd.gif" rel="noopener noreferrer"&gt;https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8l5vmjmwwkp2iq85s7gd.gif&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In the demo above, the following actions are performed:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Run the EVM.
&lt;/li&gt;
&lt;li&gt;Deploy the compiled bytecode (returns the contract code).
&lt;/li&gt;
&lt;li&gt;Execute &lt;code&gt;compare&lt;/code&gt; with argument &lt;code&gt;b&lt;/code&gt; as 12 (the initial value of &lt;code&gt;a&lt;/code&gt; is 11, so it returns &lt;code&gt;true&lt;/code&gt;, which is 1).
&lt;/li&gt;
&lt;li&gt;Execute &lt;code&gt;increment&lt;/code&gt; (returns 12 in hexadecimal).
&lt;/li&gt;
&lt;li&gt;Execute &lt;code&gt;compare&lt;/code&gt; again with argument &lt;code&gt;b&lt;/code&gt; as 12 (since 12 == 12, it returns &lt;code&gt;false&lt;/code&gt;, which is 0).
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;BTW, if you make a mistake in the return type as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;contract TestContract {
    uint256 a = 1 + 2 * 3 + 4;

    function compare(uint256 b) returns (uint256) {
        return a &amp;lt; b; // Compilation error occurs here because the return type is mismatched
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The error message looks like:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9o7zamstxocc0m7tqj1k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9o7zamstxocc0m7tqj1k.png" alt="error message" width="800" height="45"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  EVM
&lt;/h2&gt;

&lt;p&gt;The EVM is a stack-based virtual machine.&lt;/p&gt;

&lt;h3&gt;
  
  
  Memory
&lt;/h3&gt;

&lt;h3&gt;
  
  
  Pre-allocated Memory
&lt;/h3&gt;

&lt;p&gt;While inspecting compiled bytecode, I noticed that &lt;code&gt;mstore&lt;/code&gt; often takes &lt;code&gt;0x40&lt;/code&gt;, so I looked into it.&lt;br&gt;&lt;br&gt;
Depending on the version, this might differ, but here is the reference I checked:&lt;br&gt;&lt;br&gt;
&lt;a href="https://docs.soliditylang.org/en/v0.8.27/internals/layout_in_memory.html" rel="noopener noreferrer"&gt;https://docs.soliditylang.org/en/v0.8.27/internals/layout_in_memory.html&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;The first 128 bytes of memory have specific purposes:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;0x00 - 0x3f (64 bytes):&lt;/strong&gt; Scratch space for hashing methods.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;0x40 - 0x5f (32 bytes):&lt;/strong&gt; Current memory size (referred to as the free memory pointer).&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;0x60 - 0x7f (32 bytes):&lt;/strong&gt; Zero slot.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The scratch space is reserved for temporary use in hashing methods. If the allocated space is insufficient, it will use free memory.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The free memory pointer specifies the next available memory address. This means that the first &lt;code&gt;mstore&lt;/code&gt; operation must store data at &lt;code&gt;0x80&lt;/code&gt; since the first 128 bytes (up to &lt;code&gt;0x7f&lt;/code&gt;) are already reserved. The typical first instruction in a contract is to &lt;code&gt;mstore&lt;/code&gt; &lt;code&gt;0x80&lt;/code&gt; at address &lt;code&gt;0x40&lt;/code&gt;.  &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;The zero slot is used as the default value for dynamic arrays. It remains filled with &lt;code&gt;0x00&lt;/code&gt; and can be directly copied for initialization purposes.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;
&lt;h4&gt;
  
  
  Memory Allocation
&lt;/h4&gt;

&lt;p&gt;Memory is allocated in 32-byte chunks.&lt;br&gt;&lt;br&gt;
Since the first 128 bytes are pre-allocated, the first writable memory address is &lt;code&gt;0x80&lt;/code&gt;.&lt;br&gt;&lt;br&gt;
Memory expands dynamically when needed. For example, if a 30-byte string is replaced with a 40-byte string, memory will automatically expand from &lt;code&gt;0x80-0x9f&lt;/code&gt; to &lt;code&gt;0x80-0xbf&lt;/code&gt;.  &lt;/p&gt;

&lt;p&gt;However, memory expansion incurs additional gas costs, so it should be managed carefully. Also, memory does not get freed automatically, even when it is no longer in use.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://learnevm.com/chapters/evm/memory#memory-expansion" rel="noopener noreferrer"&gt;https://learnevm.com/chapters/evm/memory#memory-expansion&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;If the next memory space is already occupied, new memory is allocated at the end of the current memory.  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://medium.com/@PaulElisha1/layout-of-storage-and-memory-management-in-low-level-solidity-feb41c3de140" rel="noopener noreferrer"&gt;https://medium.com/@PaulElisha1/layout-of-storage-and-memory-management-in-low-level-solidity-feb41c3de140&lt;/a&gt;  &lt;/p&gt;
&lt;h3&gt;
  
  
  Storage
&lt;/h3&gt;
&lt;h4&gt;
  
  
  Slots
&lt;/h4&gt;

&lt;p&gt;Unlike memory, storage is pre-allocated with &lt;code&gt;2^256-1&lt;/code&gt; slots, each 32 bytes in size.  &lt;/p&gt;

&lt;p&gt;For fixed-size data types of 32 bytes or less, data is stored sequentially. For dynamically sized types, the following rules apply:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The slot stores the length of the data.&lt;/li&gt;
&lt;li&gt;The actual data is stored in the slot calculated by &lt;code&gt;keccak256(slot)&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, if slot &lt;code&gt;3&lt;/code&gt; is used:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Slot &lt;code&gt;(3)&lt;/code&gt; stores the data length.&lt;/li&gt;
&lt;li&gt;The actual data is stored at &lt;code&gt;keccak256(3)&lt;/code&gt;, e.g., &lt;code&gt;55b2c6...&lt;/code&gt;. If the data exceeds 32 bytes, subsequent slots after &lt;code&gt;55b2c6...&lt;/code&gt; are used. Arrays follow a similar pattern.
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My implementation works only for strings of 32 bytes or less at the moment.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.alchemy.com/docs/smart-contract-storage-layout" rel="noopener noreferrer"&gt;https://docs.alchemy.com/docs/smart-contract-storage-layout&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;For multidimensional arrays or mappings, slot numbers and key values are factored into the calculations. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://docs.soliditylang.org/en/latest/internals/layout_in_storage.html" rel="noopener noreferrer"&gt;https://docs.soliditylang.org/en/latest/internals/layout_in_storage.html&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;Currently, the implementation maps slots and values using Rust's &lt;code&gt;HashMap&lt;/code&gt;, but in practice, a Merkle Patricia Trie is used.&lt;/p&gt;
&lt;h3&gt;
  
  
  Function Definitions
&lt;/h3&gt;

&lt;p&gt;Functions are identified using a function selector, derived by taking the first 4 bytes of the &lt;code&gt;keccak256&lt;/code&gt; hash of the function signature.&lt;br&gt;&lt;br&gt;
&lt;a href="https://docs.soliditylang.org/en/latest/abi-spec.html" rel="noopener noreferrer"&gt;https://docs.soliditylang.org/en/latest/abi-spec.html&lt;/a&gt;  &lt;/p&gt;

&lt;p&gt;For example, for a function like &lt;code&gt;function mint(uint256)&lt;/code&gt;,&lt;br&gt;&lt;br&gt;
&lt;code&gt;keccak256(mint(uint256)) = 0xa0712d680358d64f694230b7cc0b277c73686bdf768385d55cd7c547d0ffd30e&lt;/code&gt;.&lt;br&gt;&lt;br&gt;
The function selector is &lt;code&gt;0xa0712d68&lt;/code&gt;.  &lt;/p&gt;

&lt;p&gt;When a transaction is sent, the first 4 bytes of the &lt;code&gt;calldata&lt;/code&gt; contain the function selector. The contract compares this selector to the stored selector and jumps to the corresponding function if they match.  &lt;/p&gt;

&lt;p&gt;If you compile a simple contract in Remix and extract the opcodes for the function call, you will see something like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;- CALLDATALOAD
- PUSH1 0xe0
- SHR
- DUP1
- PUSH4 0x6057361d
- EQ
- PUSH1 0x2a
- JUMPI
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Explanation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;CALLDATALOAD&lt;/code&gt;, &lt;code&gt;PUSH1 0xe0&lt;/code&gt;, and &lt;code&gt;SHR&lt;/code&gt;: Extract the first 4 bytes of the calldata (the function selector).&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;PUSH4 0x6057361d&lt;/code&gt;, &lt;code&gt;EQ&lt;/code&gt;: Compare the extracted function selector with the stored selector.&lt;/li&gt;
&lt;li&gt;If they match, &lt;code&gt;PUSH1 0x2a&lt;/code&gt; and &lt;code&gt;JUMPI&lt;/code&gt; jump to the function (0x2a is the program counter in the bytecode).&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Calldata
&lt;/h3&gt;

&lt;p&gt;The first 4 bytes of the calldata represent the function selector. The arguments are stored in 32-byte chunks following the selector.&lt;br&gt;&lt;br&gt;
For simplicity, this assumes all arguments fit within 32 bytes, so the number of arguments multiplied by 32 bytes follows the function selector.&lt;/p&gt;

&lt;h2&gt;
  
  
  Solidity Compiler
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Syntax
&lt;/h3&gt;

&lt;p&gt;Unfortunately, there are no &lt;code&gt;public&lt;/code&gt; or &lt;code&gt;private&lt;/code&gt; keywords in my implementation at the moment.&lt;br&gt;&lt;br&gt;
In fact, there are no &lt;code&gt;if&lt;/code&gt; statements or &lt;code&gt;for&lt;/code&gt; loops either. Additionally, functions cannot call other functions within them.&lt;/p&gt;

&lt;h3&gt;
  
  
  Types
&lt;/h3&gt;

&lt;p&gt;Not all types are supported at the moment. Currently, only &lt;code&gt;string&lt;/code&gt;, &lt;code&gt;uint256&lt;/code&gt;, and &lt;code&gt;bool&lt;/code&gt; are available.&lt;br&gt;&lt;br&gt;
While &lt;code&gt;string&lt;/code&gt; is defined, it only works if it is 256 bits or smaller.&lt;/p&gt;

&lt;h3&gt;
  
  
  Constant Folding
&lt;/h3&gt;

&lt;p&gt;Constant folding is a process where constant expressions are precomputed at compile time instead of being evaluated at runtime on the EVM.&lt;br&gt;&lt;br&gt;
For example, in the statement &lt;code&gt;uint256 a = 1 + 2;&lt;/code&gt;, instead of executing &lt;code&gt;1 + 2&lt;/code&gt; on the EVM, the compiler calculates it during compilation and outputs &lt;code&gt;uint256 a = 3;&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;This is straightforward for simple cases, but it gets tricky when the expression includes unknown values, such as function arguments.&lt;br&gt;&lt;br&gt;
For instance, in &lt;code&gt;uint256 a = 1 + arg + 2;&lt;/code&gt; (where &lt;code&gt;arg&lt;/code&gt; is a function argument), the ideal approach would be to compute &lt;code&gt;uint256 a = 3 + arg;&lt;/code&gt; at compile time and execute only &lt;code&gt;3 + arg&lt;/code&gt; on the EVM. However, due to complexity, if an unknown value is encountered, the entire expression is left for the EVM to calculate.&lt;/p&gt;

&lt;h3&gt;
  
  
  Deployment Code
&lt;/h3&gt;

&lt;p&gt;In the EVM, there are two types of bytecode: deployment bytecode and deployed contract bytecode.&lt;br&gt;&lt;br&gt;
The deployed contract bytecode is included within the deployment bytecode and is eventually deployed using the &lt;code&gt;codecopy&lt;/code&gt; opcode (technically, it is deployed at the point of &lt;code&gt;RETURN&lt;/code&gt;).  &lt;/p&gt;

&lt;p&gt;When compiled with &lt;code&gt;solc&lt;/code&gt;, the bytecode contains additional code before and after the contract code. In &lt;code&gt;toythereum&lt;/code&gt;, the deployment process follows these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Temporarily store all contract code without compiling functions until the entire contract code is scanned.
&lt;/li&gt;
&lt;li&gt;Compile non-function code (e.g., initializing storage variables).
&lt;/li&gt;
&lt;li&gt;Add deployment-specific instructions such as &lt;code&gt;codecopy&lt;/code&gt; (use placeholders for contract code length as it is unknown at this stage).
&lt;/li&gt;
&lt;li&gt;Compile the functions (this becomes the contract code).
&lt;/li&gt;
&lt;li&gt;Calculate the length of the compiled functions and insert the length into the placeholders in the deployment instructions.
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As a result, the final bytecode consists of two parts:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The first part handles initialization (e.g., storage setup) and deployment.&lt;/li&gt;
&lt;li&gt;The second part contains the actual contract bytecode.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  References
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/akatsuki105/ToyEVM" rel="noopener noreferrer"&gt;https://github.com/akatsuki105/ToyEVM&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/msakuta/ruscal" rel="noopener noreferrer"&gt;https://github.com/msakuta/ruscal&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>solidity</category>
      <category>evm</category>
      <category>ethereum</category>
      <category>rust</category>
    </item>
    <item>
      <title>Why is a Hash Function Irreversible?</title>
      <dc:creator>Akira</dc:creator>
      <pubDate>Sat, 16 Nov 2024 04:22:24 +0000</pubDate>
      <link>https://dev.to/_akira19/why-is-a-hash-function-irreversible-453c</link>
      <guid>https://dev.to/_akira19/why-is-a-hash-function-irreversible-453c</guid>
      <description>&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Hash functions are commonly used for verifying data integrity and cryptographic purposes. In this article, I will implement the widely-used SHA-256 hash function and examine why it is irreversible (i.e., why the original value cannot be restored).&lt;/p&gt;

&lt;p&gt;The conclusion is that the irreversibility of hash functions is achieved through information loss.&lt;/p&gt;

&lt;p&gt;The Rust implementation of SHA-256 for this article is available here:&lt;br&gt;&lt;br&gt;
&lt;a href="https://github.com/akira-19/algorithms_rust/tree/main/sha-256" rel="noopener noreferrer"&gt;https://github.com/akira-19/algorithms_rust/tree/main/sha-256&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  The Flow of SHA-256
&lt;/h2&gt;

&lt;p&gt;The irreversibility of SHA-256 can be understood by analyzing its flow. Let’s hash the string &lt;code&gt;"msg"&lt;/code&gt; as an example.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6re769narwd9yo7h24zf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6re769narwd9yo7h24zf.png" alt="flow-1,2" width="800" height="625"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;First, convert the string &lt;code&gt;msg&lt;/code&gt; into its character codes (hexadecimal representation).&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Next, format the message into a 64-byte block. Add &lt;code&gt;0x80&lt;/code&gt; just after the original message, and append the binary length of the original message to the last 8 bytes of the block. Since &lt;code&gt;msg&lt;/code&gt; is 3 bytes, its binary length is 24 bits, represented as &lt;code&gt;0x18&lt;/code&gt; in hexadecimal. Pad the gap between &lt;code&gt;0x80&lt;/code&gt; and the message length with zeros.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fquabjk5ebxlhl2l0v5e4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fquabjk5ebxlhl2l0v5e4.png" alt="flow3,4" width="800" height="441"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Split the 64-byte block into 4-byte segments. For example, the first segment &lt;code&gt;6d 73 67 80&lt;/code&gt; becomes &lt;code&gt;6d736780&lt;/code&gt; (in decimal: 1836279680).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;
&lt;p&gt;Calculate the value &lt;code&gt;W&lt;/code&gt;, which includes irreversible processing. A part of sample Rust code snippet is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;pre class="highlight rust"&gt;&lt;code&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;calc_w&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;u32&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;u32&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="k"&gt;let&lt;/span&gt; &lt;span class="k"&gt;mut&lt;/span&gt; &lt;span class="n"&gt;w&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nd"&gt;vec!&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;

   &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;16&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
       &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;

   &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="k"&gt;in&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
       &lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;lower_sigma_1&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
           &lt;span class="nf"&gt;.wrapping_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
           &lt;span class="nf"&gt;.wrapping_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;lower_sigma_0&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;15&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
           &lt;span class="nf"&gt;.wrapping_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;16&lt;/span&gt;&lt;span class="p"&gt;]);&lt;/span&gt;
   &lt;span class="p"&gt;}&lt;/span&gt;

   &lt;span class="n"&gt;w&lt;/span&gt;
 &lt;span class="p"&gt;}&lt;/span&gt;

 &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;lower_sigma_1&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;u32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;u32&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="nf"&gt;rotr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;17&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;^&lt;/span&gt; &lt;span class="nf"&gt;rotr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;19&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;^&lt;/span&gt; &lt;span class="nf"&gt;shr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
 &lt;span class="p"&gt;}&lt;/span&gt;

 &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;rotr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;u32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;u32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;u32&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;|&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;32&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
 &lt;span class="p"&gt;}&lt;/span&gt;

 &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="nf"&gt;shr&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;u32&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;u32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;u32&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
   &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;
 &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;



&lt;p&gt;The &lt;code&gt;calc_w&lt;/code&gt; function calls the &lt;code&gt;lower_sigma_1&lt;/code&gt; function, which in turn calls &lt;code&gt;rotr&lt;/code&gt;. Let’s visualize how this works.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxv5v3c6qdkm0cc3n9o3r.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxv5v3c6qdkm0cc3n9o3r.png" alt="flow5" width="800" height="238"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;As an example, let &lt;code&gt;x = 31&lt;/code&gt; and &lt;code&gt;n = 3&lt;/code&gt;. In binary, 31 is &lt;code&gt;11111&lt;/code&gt;. Since &lt;code&gt;x&lt;/code&gt; and &lt;code&gt;n&lt;/code&gt; are unsigned 32-bit integers, &lt;code&gt;x&lt;/code&gt; is represented as &lt;code&gt;000...000011111&lt;/code&gt; (27 leading zeros).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3rcjyoms514or73zc6xs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3rcjyoms514or73zc6xs.png" alt="flow6,7,8" width="800" height="673"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;/li&gt;

&lt;li&gt;&lt;p&gt;The operation &lt;code&gt;x &amp;gt;&amp;gt; n&lt;/code&gt; shifts &lt;code&gt;x&lt;/code&gt; to the right by &lt;code&gt;n&lt;/code&gt; bits, adding &lt;code&gt;000&lt;/code&gt; to the left and removing the last &lt;code&gt;111&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Similarly, &lt;code&gt;x &amp;lt;&amp;lt; (32 - n)&lt;/code&gt; shifts &lt;code&gt;x&lt;/code&gt; to the left by &lt;code&gt;32 - 3 = 29&lt;/code&gt; bits. The first three bits become &lt;code&gt;111&lt;/code&gt;, followed by zeros.&lt;/p&gt;&lt;/li&gt;

&lt;li&gt;&lt;p&gt;Finally, the bitwise OR of steps 6 and 7 results in a value like &lt;code&gt;111000...0011&lt;/code&gt;.&lt;/p&gt;&lt;/li&gt;

&lt;/ol&gt;

&lt;p&gt;The information loss occurs in the &lt;code&gt;shr&lt;/code&gt; operation. In step 6, the bits shifted out to the right cannot be recovered.&lt;/p&gt;

&lt;p&gt;Additionally, in &lt;code&gt;lower_sigma_1&lt;/code&gt;, the operation &lt;code&gt;rotr(x, 17) ^ rotr(x, 19)&lt;/code&gt; results in information loss. To simplify, this process involves XORing &lt;code&gt;10000&lt;/code&gt; shifted right by 1 bit with &lt;code&gt;10000&lt;/code&gt; shifted right by 2 bits, yielding &lt;code&gt;01000 ^ 00100 = 00000&lt;/code&gt;. This result could also originate from an original value of &lt;code&gt;00000&lt;/code&gt;, making the original value unrecoverable.&lt;/p&gt;

&lt;p&gt;By repeating such operations, it becomes impossible to restore the original value.&lt;/p&gt;

&lt;h2&gt;
  
  
  Referrence
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf" rel="noopener noreferrer"&gt;https://nvlpubs.nist.gov/nistpubs/FIPS/NIST.FIPS.180-4.pdf&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>hash</category>
      <category>rust</category>
    </item>
    <item>
      <title>Creating Regular Expression</title>
      <dc:creator>Akira</dc:creator>
      <pubDate>Wed, 13 Nov 2024 13:00:47 +0000</pubDate>
      <link>https://dev.to/_akira19/creating-regular-expression-33ep</link>
      <guid>https://dev.to/_akira19/creating-regular-expression-33ep</guid>
      <description>&lt;p&gt;I created a toy regular expression engine to understand how regular expression works.&lt;/p&gt;

&lt;h2&gt;
  
  
  Process Flow
&lt;/h2&gt;

&lt;p&gt;A regular expression represents a pattern for strings, such as /ab*(a|c)+/. However, this expression can be difficult to handle directly in a program. Therefore, the first step is to convert it into a format that is easier to work with programmatically.&lt;/p&gt;

&lt;p&gt;This format is typically represented as an AST (Abstract Syntax Tree). An AST is a data structure that represents the structure of the regular expression in a tree format. By creating an AST, it becomes easier to analyze the regular expression and process it for pattern matching.&lt;/p&gt;

&lt;p&gt;Next, we perform code generation based on the AST. Code generation is the process of creating instructions that a computer can execute. By analyzing the AST, we can generate the necessary commands and processing steps to create code that can actually handle the regular expression.&lt;/p&gt;

&lt;p&gt;Finally, we evaluate the given string to see if it matches the regular expression using the generated code. The goal is to execute the generated code to determine whether the given string matches the regular expression pattern.&lt;/p&gt;

&lt;p&gt;This approach allows effective handling of regular expressions, enabling pattern matching and searching within strings.&lt;/p&gt;

&lt;h2&gt;
  
  
  Parsing to AST
&lt;/h2&gt;

&lt;p&gt;First, we parse the regular expression into an AST.&lt;br&gt;
Consider a regular expression such as &lt;code&gt;ab*(c|d)&lt;/code&gt; as an example.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;AST for ab*(c|d)

          Concat
            |
     ---------------------------------
     |              |                 |
Literal('a')  Repeat0(Literal('b'))   or
                                      |
                                 -----------
                                |          |
                           Literal('c')  Literal('d')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Representing this tree structure as structs makes it easier to handle in a program.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="k"&gt;enum&lt;/span&gt; &lt;span class="n"&gt;RegexAst&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="nf"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;char&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nf"&gt;Concat&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Vec&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;RegexAst&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nf"&gt;Or&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Box&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;RegexAst&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;Box&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;RegexAst&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="nf"&gt;Repeat0&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;Box&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;RegexAst&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="c1"&gt;// Repeat 0 or more times&lt;/span&gt;
    &lt;span class="o"&gt;...&lt;/span&gt;&lt;span class="n"&gt;other&lt;/span&gt; &lt;span class="n"&gt;symbols&lt;/span&gt; &lt;span class="n"&gt;like&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;and&lt;/span&gt; &lt;span class="o"&gt;?&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With such a struct definition, the AST for &lt;code&gt;ab*(c|d)&lt;/code&gt; can be represented as follows:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight rust"&gt;&lt;code&gt;&lt;span class="nf"&gt;Concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;
  &lt;span class="nf"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sc"&gt;'a'&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; 
  &lt;span class="nf"&gt;Repeat0&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sc"&gt;'b'&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; 
  &lt;span class="nf"&gt;Or&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nf"&gt;Concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nf"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sc"&gt;'c'&lt;/span&gt;&lt;span class="p"&gt;)]),&lt;/span&gt; 
    &lt;span class="nf"&gt;Concat&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="nf"&gt;Literal&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sc"&gt;'d'&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
  &lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Code Generation
&lt;/h2&gt;

&lt;p&gt;In code generation, we analyze the AST and generate code to process the regular expression. Specifically, we define a set of instructions to handle the regular expression, and based on this, we generate the code.&lt;/p&gt;

&lt;p&gt;First, we define a register machine as the data structure to handle regular expressions. A register machine consists of a finite set of registers and a program counter (which indicates the current instruction to execute). Using this register machine, we execute the regular expression processing.&lt;/p&gt;

&lt;p&gt;We define the commands to use. This register machine has four commands:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;char&lt;/strong&gt;: Matches a specific character. If the current character matches the specified character, it moves to the next command.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;match&lt;/strong&gt;: Checks for a match with the regular expression. If the specified pattern matches the current string, it moves to the next command.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;jump&lt;/strong&gt;: Jumps to a specified location, changing the execution point of the command.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;split&lt;/strong&gt;: Branches to two different commands, each with its own target command. This can allow processing of two branches.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;By combining these commands, we can represent the processing of regular expressions. By analyzing the AST and generating commands for each node, we also generate control structures like loops and conditional branches as needed.&lt;/p&gt;

&lt;p&gt;With just these four commands, we can evaluate a regular expression.&lt;br&gt;
The sample regular expression (&lt;code&gt;ab*(c|d)&lt;/code&gt;) can be represented with only these four commands, as shown below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;0001: Char(a)
0002: Split 0003, 0005
0003: Char(b)
0004: Jump(0002)
0005: Split 0006, 0008
0006: Char(c)
0007: Jump(0009)
0008: Char(d)
0009: Match
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The numbers on the left are the PC (Program Counter).&lt;/p&gt;

&lt;h2&gt;
  
  
  Evaluation
&lt;/h2&gt;

&lt;p&gt;This may be a bit challenging to understand, so let’s walk through the generated code step by step.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;First, the Char(a) command is executed, checking if the current character is 'a'. If it doesn’t match, the regular expression does not match. If it matches, we increment the PC by one and jump to 0002.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;If it matches, the Split 0003, 0005 command is executed, branching the program counter into two locations, 0003 and 0005. This represents the &lt;code&gt;b*&lt;/code&gt; part of the expression, which can repeat 0 or more times. If 0005 is chosen, 0003 is skipped, so the 'b' character is not evaluated. However, if it jumps to 0003, it evaluates 'b'. If it matches, the PC is incremented to 0004, which instructs to jump back to 0002. This branches again into 0003 and 0005, and if 0003 is selected, 'b' is re-evaluated, allowing 'b' to match consecutively. Jumping to 0005 exits this branch, resulting in the &lt;code&gt;b*&lt;/code&gt; evaluation for zero or more repetitions of 'b'.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Next, it branches into 0006 and 0008. This allows for a match of either 'c' or 'd'. At location 0006, the Char(c) command is executed, checking if the current character is 'c'. If it matches, it moves the PC to 0007, which jumps to 0009, ending the process. At location 0008, the Char(d) command is executed, checking if the current character is 'd'. Here too, the PC advances, and ultimately, the Match command confirms a match with the regular expression and the program ends.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Thus, this sequence of commands efficiently processes the pattern /ab*(c|d)/. Each command reflects the structure of the regular expression, with appropriate branching and repetition achieved through program counter control.&lt;/p&gt;

&lt;p&gt;Evaluation is conducted using a loop.&lt;/p&gt;

&lt;p&gt;For instance, to evaluate a string like &lt;code&gt;char = xabd&lt;/code&gt;:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;code&gt;char[0] == x&lt;/code&gt;, so process at 0001 ends, moving to the next check.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;char[1] == a&lt;/code&gt;, so 0001 passes, and the PC increments.&lt;/li&gt;
&lt;li&gt;At 0002, branches to 0003 and 0005 are evaluated.&lt;/li&gt;
&lt;li&gt;At 0003, 'b' matches, so we change the evaluated character to 'd' and increment the PC to 0004.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This process continues, evaluating one character at a time.&lt;/p&gt;

</description>
      <category>regularexpression</category>
      <category>rust</category>
    </item>
    <item>
      <title>How to disconnect WebSocket from Chrome Developer tool</title>
      <dc:creator>Akira</dc:creator>
      <pubDate>Wed, 13 Nov 2024 12:21:52 +0000</pubDate>
      <link>https://dev.to/_akira19/how-to-disconnect-websocket-from-chrome-developer-tool-4mfc</link>
      <guid>https://dev.to/_akira19/how-to-disconnect-websocket-from-chrome-developer-tool-4mfc</guid>
      <description>&lt;p&gt;Sometimes, you may need to test how your application behaves when a WebSocket connection is disconnected. Here’s a step-by-step guide to disconnecting a WebSocket via the Chrome Console.&lt;/p&gt;

&lt;p&gt;First, enter &lt;code&gt;queryObjects(WebSocket)&lt;/code&gt; in the Chrome Console to retrieve the WebSocket object.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcykbe5gqvh7bu57igwb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcykbe5gqvh7bu57igwb.png" alt="websocket object" width="185" height="113"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, right-click on the WebSocket object, select &lt;code&gt;Store as global variable&lt;/code&gt; from the modal that appears.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F465vw1ppwb4eirn32qb3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F465vw1ppwb4eirn32qb3.png" alt="right-click modal" width="191" height="104"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This action will store the WebSocket object in a temporary variable, typically named &lt;code&gt;temp1&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2wpszcwoorzzv14ij4g5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2wpszcwoorzzv14ij4g5.png" alt="variable which stores the websocket object" width="159" height="42"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, to close the WebSocket connection, use the command &lt;code&gt;temp1.close()&lt;/code&gt;. This will disconnect the WebSocket, allowing you to observe your application’s response to the disconnection.&lt;/p&gt;

</description>
      <category>websocket</category>
      <category>chrome</category>
    </item>
  </channel>
</rss>
