<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Hiroshi Watanabe</title>
    <description>The latest articles on DEV Community by Hiroshi Watanabe (@kaityo256).</description>
    <link>https://dev.to/kaityo256</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F803951%2Fcab9b727-d6b9-4d08-b136-3a6e396ab91c.png</url>
      <title>DEV Community: Hiroshi Watanabe</title>
      <link>https://dev.to/kaityo256</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kaityo256"/>
    <language>en</language>
    <item>
      <title>A Hands-On Introduction to Restricted Boltzmann Machines with a Minimal NumPy Implementation</title>
      <dc:creator>Hiroshi Watanabe</dc:creator>
      <pubDate>Wed, 11 Feb 2026 13:51:36 +0000</pubDate>
      <link>https://dev.to/kaityo256/a-hands-on-introduction-to-restricted-boltzmann-machines-with-a-minimal-numpy-implementation-1inp</link>
      <guid>https://dev.to/kaityo256/a-hands-on-introduction-to-restricted-boltzmann-machines-with-a-minimal-numpy-implementation-1inp</guid>
      <description>&lt;p&gt;We have developed and released a Python library that makes it easy to experiment with a Restricted Boltzmann Machine (RBM), a classic machine learning model. The main developer of this library is Mr. Kobayashi, and the project is available under the MIT License.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/watanabe-appi/simple_rbm" rel="noopener noreferrer"&gt;https://github.com/watanabe-appi/simple_rbm&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The library can be used easily both in a local Python environment and in Google Colab. If CuPy is available, it also supports GPU acceleration via GPGPU.&lt;/p&gt;

&lt;p&gt;In the following sections, I will introduce the basics of RBMs and provide a step-by-step guide to using this library.&lt;/p&gt;

&lt;h2&gt;
  
  
  What Is a Restricted Boltzmann Machine?
&lt;/h2&gt;

&lt;p&gt;Originally, Hinton and Sejnowski proposed the &lt;strong&gt;Boltzmann Machine&lt;/strong&gt; as an associative memory network. This model corresponds to a physical system in which spins are arranged on a network, and system states appear according to a defined energy function and the Boltzmann distribution. By properly learning the network weights, the model can memorize and represent various types of data.&lt;/p&gt;

&lt;p&gt;Although the Boltzmann Machine is theoretically fascinating, its training cost is extremely high, making it impractical for many real-world applications.&lt;/p&gt;

&lt;p&gt;To address this issue, the &lt;strong&gt;Restricted Boltzmann Machine (RBM)&lt;/strong&gt; was introduced. In this model, units are divided into two groups: a visible layer and a hidden layer. By prohibiting connections between units within the same group, the model becomes much more efficient to train.&lt;/p&gt;

&lt;p&gt;While RBMs often do not match the performance of similarly sized deep neural networks, they remain theoretically interesting. Their strong connections to statistical physics make them an attractive subject for research.&lt;/p&gt;

&lt;p&gt;An RBM can be viewed as a network in which spins are placed on nodes. Each node has a bias parameter that controls how likely the corresponding spin is to point “up,” and each edge has a weight parameter that determines whether the two connected spins tend to align in the same direction.&lt;/p&gt;

&lt;p&gt;Training an RBM means optimizing these bias and weight parameters so that the model exhibits the desired behavior.&lt;/p&gt;
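
&lt;p&gt;To make the roles of the biases and weights concrete, here is a minimal sketch of the standard RBM energy function, assuming binary units that take the values 0 or 1. The function name &lt;code&gt;rbm_energy&lt;/code&gt; and the tiny parameter values are hypothetical and are not taken from the library; lower energy corresponds to a more probable configuration.&lt;/p&gt;

```python
import numpy as np


def rbm_energy(v, h, a, b, W):
    """Energy of one joint configuration (v, h) of a binary RBM.

    v: visible states, shape (n_visible,)
    h: hidden states, shape (n_hidden,)
    a: visible biases, b: hidden biases
    W: weights, shape (n_visible, n_hidden)
    """
    return -(a @ v) - (b @ h) - (v @ W @ h)


# Hypothetical toy model: 3 visible units, 2 hidden units, no biases.
a = np.zeros(3)
b = np.zeros(2)
W = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.0, 0.0]])

v = np.array([1.0, 1.0, 0.0])
h_aligned = np.array([1.0, 1.0])  # hidden units agree with positive weights
h_off = np.array([0.0, 0.0])      # hidden units switched off

print(rbm_energy(v, h_aligned, a, b, W))
print(rbm_energy(v, h_off, a, b, W))
```

&lt;p&gt;With positive weights, the configuration in which the connected units are both "on" has the lower energy, so the Boltzmann distribution assigns it the higher probability.&lt;/p&gt;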

&lt;p&gt;An RBM can be trained in advance to memorize given data, such as images. Once trained, it can reconstruct an input image from its internal representation.&lt;/p&gt;

&lt;p&gt;This is somewhat similar to how humans recall information: imagine you are shown a handwritten character, then it is hidden, and you are asked to write down the character that was on the paper. You may not reproduce the exact same strokes, but you can reproduce the same character. In a similar way, a trained RBM reconstructs inputs based on what it has learned.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rnviqhbcvw8j7ovvmrk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9rnviqhbcvw8j7ovvmrk.png" alt=" " width="800" height="425"&gt;&lt;/a&gt;&lt;br&gt;
In this situation, when we see an image of the digit “9,” we compress the visual information and recognize it as the abstract concept “the number 9.” From that abstract representation, we can then reconstruct the character “9” again.&lt;/p&gt;

&lt;p&gt;An RBM can perform a similar process. In the following sections, we will use the MNIST handwritten digit dataset as an example to train an RBM and demonstrate image reconstruction.&lt;/p&gt;
&lt;h2&gt;
  
  
  Using the RBM Library in Google Colab
&lt;/h2&gt;

&lt;p&gt;Although the library can also be used in a local environment, using Google Colab is the easiest way to get started. Let’s try it there. First, open a new notebook in Google Colab.&lt;/p&gt;
&lt;h3&gt;
  
  
  Installing the Library
&lt;/h3&gt;

&lt;p&gt;In the first cell, install the RBM library using &lt;code&gt;pip&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;git+https://github.com/watanabe-appi/simple_rbm.git&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Importing the Required Libraries
&lt;/h3&gt;

&lt;p&gt;Next, import all the necessary libraries:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tensorflow&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;PIL&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;Image&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;simple_rbm&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;RBM&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;IPython&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because the library was installed in the previous step, you can now import the &lt;code&gt;RBM&lt;/code&gt; class from the &lt;code&gt;simple_rbm&lt;/code&gt; package.&lt;/p&gt;

&lt;h3&gt;
  
  
  Initializing the RBM
&lt;/h3&gt;

&lt;p&gt;Next, initialize the RBM. The required parameters are the number of visible units and the number of hidden units. Since MNIST images are 28×28 pixels, we set the number of visible units to 28 × 28 = 784. The number of hidden units is a free parameter; choosing it smaller than the visible layer forces the model to compress the input. Here, let’s use 64 units.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;rbm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RBM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;visible_num&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hidden_num&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Preparing the Data
&lt;/h3&gt;

&lt;p&gt;Now, prepare the MNIST dataset. Because an RBM is an unsupervised learning model, we only use the image data and ignore the labels.&lt;/p&gt;

&lt;p&gt;We also normalize the pixel values to the range 0.0 to 1.0.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_test&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;keras&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datasets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;mnist&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_data&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;x_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;
&lt;span class="n"&gt;x_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_test&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="mi"&gt;255&lt;/span&gt;
&lt;span class="n"&gt;x_train&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;x_test&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x_test&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;astype&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;float32&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, &lt;code&gt;x_train&lt;/code&gt; is the training dataset and &lt;code&gt;x_test&lt;/code&gt; is the test dataset.&lt;/p&gt;

&lt;h3&gt;
  
  
  Training the RBM
&lt;/h3&gt;

&lt;p&gt;To train the RBM, simply pass the dataset and call &lt;code&gt;fit&lt;/code&gt;, as in many similar frameworks. You can also specify the number of epochs and the batch size.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;rbm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;fit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_train&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will see output similar to the following:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Computation will proceed on the CPU.
Epoch [1/10], KL Divergence: 0.3689
Epoch [2/10], KL Divergence: 0.2504
Epoch [3/10], KL Divergence: 0.2144
Epoch [4/10], KL Divergence: 0.1982
Epoch [5/10], KL Divergence: 0.1875
Epoch [6/10], KL Divergence: 0.1797
Epoch [7/10], KL Divergence: 0.1736
Epoch [8/10], KL Divergence: 0.1685
Epoch [9/10], KL Divergence: 0.1645
Epoch [10/10], KL Divergence: 0.1612
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since GPU acceleration was not requested, the first line of the output reports that the computation runs on the CPU.&lt;/p&gt;

&lt;p&gt;As the cost function, we use the Kullback–Leibler (KL) divergence. Training is performed using the Contrastive Divergence (CD) algorithm. The input image is first encoded into the hidden layer, then reconstructed back to the visible layer. The weights are updated so that the reconstructed image becomes closer to the original input.&lt;/p&gt;
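
&lt;p&gt;The encode, reconstruct, and update cycle described above can be sketched in plain NumPy as a single CD-1 step. This is the textbook update for binary 0/1 units, not the code used inside &lt;code&gt;simple_rbm&lt;/code&gt;; the names &lt;code&gt;cd1_step&lt;/code&gt; and &lt;code&gt;lr&lt;/code&gt;, the toy data, and the use of mean squared error (instead of the KL divergence) for monitoring are all assumptions made for illustration.&lt;/p&gt;

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_step(v0, W, a, b, rng, lr=0.1):
    """One CD-1 update on a mini-batch v0 of shape (batch, n_visible).

    W, a, and b are updated in place.
    """
    # Positive phase: hidden probabilities and a binary sample given the data.
    ph0 = sigmoid(v0 @ W + b)
    h0 = rng.binomial(1, ph0).astype(v0.dtype)
    # Negative phase: reconstruct the visible layer, then re-infer the hidden layer.
    pv1 = sigmoid(h0 @ W.T + a)
    ph1 = sigmoid(pv1 @ W + b)
    # Move the parameters toward the data statistics and away from the
    # statistics of the reconstruction.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    a += lr * (v0 - pv1).mean(axis=0)
    b += lr * (ph0 - ph1).mean(axis=0)
    # Mean squared reconstruction error, used here only for monitoring.
    return np.mean((v0 - pv1) ** 2)


rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
a = np.zeros(n_visible)
b = np.zeros(n_hidden)

# Toy data set: two binary patterns, each repeated ten times.
data = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 10, dtype=np.float64)

errors = [cd1_step(data, W, a, b, rng) for _ in range(500)]
print(errors[0], np.mean(errors[-10:]))
```

&lt;p&gt;On this toy data the reconstruction error should fall as training proceeds, mirroring the decreasing values in the log above.&lt;/p&gt;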

&lt;h3&gt;
  
  
  A Helper Function for Image Reconstruction
&lt;/h3&gt;

&lt;p&gt;Next, we let the RBM reconstruct input images. To visualize the results, we define a helper function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;show_restored_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="n"&gt;fig&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;axes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;subplots&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;figsize&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
  &lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;axis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;off&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;set_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Input Image&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;imshow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;input&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;cmap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gray&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;axis&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;off&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;set_title&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;Restored Image&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;axes&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;imshow&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;output&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="n"&gt;cmap&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;gray&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;plt&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;show&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function simply takes the original image (&lt;code&gt;input&lt;/code&gt;) and the RBM-reconstructed image (&lt;code&gt;output&lt;/code&gt;) and displays them side by side using Matplotlib.&lt;/p&gt;

&lt;h3&gt;
  
  
  Reconstructing Images
&lt;/h3&gt;

&lt;p&gt;Now, let’s reconstruct some images.&lt;/p&gt;

&lt;p&gt;We use &lt;code&gt;x_test&lt;/code&gt;, the portion of the dataset that was not used for training. Passing this data to &lt;code&gt;rbm.reconstruct&lt;/code&gt; returns reconstructed images from the trained RBM.&lt;/p&gt;

&lt;p&gt;The reconstruction is computed through the following procedure:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Fix the visible layer units to the input data and sample the hidden layer units.&lt;/li&gt;
&lt;li&gt;Fix the sampled hidden layer units and compute the expected values of the visible layer units.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The &lt;code&gt;simple_rbm&lt;/code&gt; library uses a Bernoulli–Bernoulli model with Ising spins in both the visible and hidden layers. However, &lt;code&gt;rbm.reconstruct&lt;/code&gt; returns the expected values of the visible units rather than sampled spins, so the output consists of real-valued numbers.&lt;/p&gt;
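
&lt;p&gt;As a rough illustration of the two-step procedure, the following sketch reimplements it in NumPy for binary 0/1 units. The function &lt;code&gt;reconstruct_sketch&lt;/code&gt;, the untrained placeholder weights, and the random stand-in image are hypothetical; the actual implementation in &lt;code&gt;simple_rbm&lt;/code&gt; may differ in its details.&lt;/p&gt;

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def reconstruct_sketch(v, W, a, b, rng):
    """Two-step reconstruction for a batch v of shape (batch, n_visible)."""
    # Step 1: clamp the visible units and sample binary hidden units.
    p_h = sigmoid(v @ W + b)
    h = rng.binomial(1, p_h).astype(v.dtype)
    # Step 2: clamp the sampled hidden units and return the expected
    # values of the visible units (probabilities, not binary samples).
    return sigmoid(h @ W.T + a)


rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((784, 64))  # untrained placeholder weights
a = np.zeros(784)
b = np.zeros(64)
v = rng.random((1, 784))                   # stand-in for one flattened image

out = reconstruct_sketch(v, W, a, b, rng)
print(out.shape, out.dtype)
```

&lt;p&gt;Because the last step returns probabilities, every output value lies strictly between 0 and 1, which is why the reconstructed images appear in shades of gray.&lt;/p&gt;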

&lt;p&gt;The following code feeds the first 10 test images into the RBM and displays their reconstructions:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
  &lt;span class="nf"&gt;show_restored_image&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;rbm&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;reconstruct&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x_test&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;reshape&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;))[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output looks like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrgto22m901k50b6242e.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcrgto22m901k50b6242e.png" alt=" " width="722" height="1092"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Although the reconstructed images are not identical to the inputs, we can clearly recognize that they represent the same digits.&lt;/p&gt;

&lt;p&gt;In this example, the RBM compresses the original 28×28 = 784 visible units into a 64-bit hidden representation, and then expands it back to 784 visible units. This demonstrates how the RBM performs information compression and reconstruction.&lt;/p&gt;

&lt;h2&gt;
  
  
  Other Usage Options
&lt;/h2&gt;

&lt;h3&gt;
  
  
  Using GPGPU
&lt;/h3&gt;

&lt;p&gt;If you want to enable GPGPU acceleration, specify &lt;code&gt;use_GPU=True&lt;/code&gt; in the constructor:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;rbm&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;RBM&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;visible_num&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;28&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;28&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;hidden_num&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;64&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;use_GPU&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If &lt;code&gt;CuPy&lt;/code&gt; is available in your environment, the GPU is then used automatically when you call &lt;code&gt;rbm.fit&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# GPU usage has been enabled. Computation will proceed on the GPU.
Epoch [1/10], KL Divergence: 0.3716
Epoch [2/10], KL Divergence: 0.2513
Epoch [3/10], KL Divergence: 0.2144
Epoch [4/10], KL Divergence: 0.1968
Epoch [5/10], KL Divergence: 0.1857
Epoch [6/10], KL Divergence: 0.1780
Epoch [7/10], KL Divergence: 0.1723
Epoch [8/10], KL Divergence: 0.1677
Epoch [9/10], KL Divergence: 0.1639
Epoch [10/10], KL Divergence: 0.1607
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In Google Colab, simply selecting a GPU runtime enables the accelerated version of the RBM. The GitHub repository also provides a sample Google Colab notebook demonstrating GPU usage.&lt;/p&gt;
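
&lt;p&gt;For reference, a common way to make GPU support optional is to fall back to NumPy when CuPy cannot be used, since the two libraries share most of their array API. The sketch below shows that general pattern; it is an assumption about how such a switch can be written, not a copy of what &lt;code&gt;simple_rbm&lt;/code&gt; does internally, and the alias &lt;code&gt;xp&lt;/code&gt; is only a convention.&lt;/p&gt;

```python
import numpy as np

try:
    import cupy as cp

    cp.zeros(1)  # probe: raises if CuPy is installed but no GPU is usable
    xp = cp
except Exception:
    # CuPy is missing or unusable, so run the same code on the CPU.
    xp = np

# The same array code runs on either back end.
x = xp.arange(6, dtype=xp.float32).reshape(2, 3)
y = x @ x.T
print(xp.__name__, y.shape)
```

&lt;p&gt;Code written against &lt;code&gt;xp&lt;/code&gt; runs unchanged on the CPU or the GPU, which is what makes a single switch such as &lt;code&gt;use_GPU&lt;/code&gt; practical.&lt;/p&gt;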

&lt;h3&gt;
  
  
  Using the Library in Your Own Project
&lt;/h3&gt;

&lt;p&gt;If you want to use the RBM library in your own code, creating a virtual environment is recommended:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;python3 &lt;span class="nt"&gt;-m&lt;/span&gt; venv .venv
&lt;span class="nb"&gt;source&lt;/span&gt; .venv/bin/activate
python3 &lt;span class="nt"&gt;-m&lt;/span&gt; pip &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--upgrade&lt;/span&gt; pip
python3 &lt;span class="nt"&gt;-m&lt;/span&gt; pip &lt;span class="nb"&gt;install &lt;/span&gt;git+https://github.com/watanabe-appi/simple_rbm.git
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After this setup, the RBM library will be ready to use in your project.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this article, we introduced how to use the RBM library developed in our laboratory. RBMs have a simple structure, which makes them both analytically tractable and intellectually interesting. We hope this library will contribute to further research and experimentation with RBMs.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>deeplearning</category>
      <category>opensource</category>
    </item>
    <item>
      <title>Comparison between ChatGPT-3.5 and 4</title>
      <dc:creator>Hiroshi Watanabe</dc:creator>
      <pubDate>Sat, 22 Apr 2023 14:05:36 +0000</pubDate>
      <link>https://dev.to/kaityo256/comparison-between-chatgpt-35-and-4-1k34</link>
      <guid>https://dev.to/kaityo256/comparison-between-chatgpt-35-and-4-1k34</guid>
      <description>&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;I asked ChatGPT-3.5 and ChatGPT-4 the same questions. ChatGPT-4 is qualitatively smarter than ChatGPT-3.5.&lt;/p&gt;

&lt;h2&gt;
  
  
  Poor Snail
&lt;/h2&gt;

&lt;p&gt;A snail has fallen into a well. The well is 30 cm high, and the snail climbs 3 cm per day but falls 2 cm during the night. How many days does it take the snail to climb up the well?&lt;/p&gt;

&lt;h3&gt;
  
  
  ChatGPT-3.5
&lt;/h3&gt;

&lt;p&gt;At day time, the snail climbs 3 cm, but at night it slips down 2 cm. So, each day, the snail moves up 1 cm in height.&lt;/p&gt;

&lt;p&gt;To climb a total of 30 cm, the snail would need 30 days if there were no nights to slip down. However, the snail slips 2 cm at night, so it needs an additional 2 cm to reach the top each day.&lt;/p&gt;

&lt;p&gt;Thus, the snail actually moves 1+2=3 cm every day. Therefore, it would take the snail 30/3 = 10 days to climb up the well.&lt;/p&gt;

&lt;h3&gt;
  
  
  ChatGPT-4
&lt;/h3&gt;

&lt;p&gt;Let's analyze the snail's progress. Each day the snail climbs 3 cm, but at night, it falls back 2 cm. So overall, the snail makes a net progress of 1 cm per day (3 cm - 2 cm).&lt;/p&gt;

&lt;p&gt;On the day the snail reaches the top of the well, it will climb up 3 cm and won't fall back, as it will be out of the well already. To calculate the number of days it takes for the snail to reach the top, we should consider the snail's progress up to 27 cm (30 cm - 3 cm). &lt;/p&gt;

&lt;p&gt;Since the snail makes a net progress of 1 cm per day, it will take 27 days to reach 27 cm (27 cm / 1 cm per day). On the 28th day, the snail will climb the remaining 3 cm and finally reach the top of the well.&lt;/p&gt;

&lt;p&gt;So, it will take the snail 28 days to climb up the 30 cm well.&lt;/p&gt;
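
&lt;p&gt;The puzzle is small enough to check with a direct simulation. The following sketch (the function name &lt;code&gt;days_to_escape&lt;/code&gt; is mine) assumes that the snail starts at the bottom, climbs during the day, and slips at night only if it has not yet reached the top; it also assumes that the daily climb exceeds the nightly slip, since otherwise the loop would not terminate.&lt;/p&gt;

```python
def days_to_escape(depth_cm, climb_cm, slip_cm):
    """Simulate the snail day by day and return the day on which it gets out."""
    height = 0
    day = 0
    while True:
        day += 1
        height += climb_cm        # daytime climb
        if height >= depth_cm:    # out of the well: no slip tonight
            return day
        height -= slip_cm         # nighttime slip


print(days_to_escape(30, 3, 2))  # 28
```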

&lt;h2&gt;
  
  
  Heavy Smoker
&lt;/h2&gt;

&lt;p&gt;A man can make one new cigarette from the butts of five cigarettes. What is the maximum number of cigarettes this man can smoke when he has 50 new cigarettes?&lt;/p&gt;

&lt;h3&gt;
  
  
  ChatGPT-3.5
&lt;/h3&gt;

&lt;p&gt;If the man can make one new cigarette from the butts of five cigarettes, it means he needs five cigarette butts to make a new cigarette.&lt;/p&gt;

&lt;p&gt;Therefore, if he has 50 new cigarettes, he must have started with 250 cigarette butts (50 new cigarettes x 5 butts per cigarette = 250 butts).&lt;/p&gt;

&lt;p&gt;Now, if he smokes all 50 new cigarettes, he will end up with 50 cigarette butts. He can then use those 50 butts to make 10 new cigarettes (50 butts / 5 butts per cigarette = 10 cigarettes).&lt;/p&gt;

&lt;p&gt;If he smokes those 10 new cigarettes, he will have 10 new butts, which he can use to make 2 new cigarettes (10 butts / 5 butts per cigarette = 2 cigarettes).&lt;/p&gt;

&lt;p&gt;If he smokes those 2 new cigarettes, he will have 2 new butts, which he can use to make 1 new cigarette (2 butts / 5 butts per cigarette = 0.4 cigarettes, but we round down to 1).&lt;/p&gt;

&lt;p&gt;So, in total, he can smoke 50 + 10 + 2 + 1 = 63 cigarettes.&lt;/p&gt;

&lt;h3&gt;
  
  
  ChatGPT-4
&lt;/h3&gt;

&lt;p&gt;Let's break down this problem step by step:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The man starts with 50 new cigarettes.&lt;/li&gt;
&lt;li&gt;After smoking the 50 cigarettes, he has 50 butts.&lt;/li&gt;
&lt;li&gt;He can use these 50 butts to create 50/5 = 10 new cigarettes.&lt;/li&gt;
&lt;li&gt;After smoking the 10 new cigarettes, he has 10 more butts.&lt;/li&gt;
&lt;li&gt;He can use these 10 butts to create 10/5 = 2 new cigarettes.&lt;/li&gt;
&lt;li&gt;After smoking the 2 new cigarettes, he has 2 more butts.&lt;/li&gt;
&lt;li&gt;At this point, he has a total of 50 (original) + 10 + 2 = 62 smoked cigarettes.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;However, he still has some butts left. He has the 2 butts from the last two cigarettes he smoked, plus the 2 butts that he couldn't use when making the last 2 new cigarettes (a total of 4 butts). He only needs one more butt to make another new cigarette.&lt;/p&gt;

&lt;p&gt;Since the man can't make another new cigarette with the butts he has, the maximum number of cigarettes he can smoke is 62.&lt;/p&gt;
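
&lt;p&gt;This puzzle can also be checked mechanically. The sketch below (the function name &lt;code&gt;max_cigarettes&lt;/code&gt; is mine) assumes that every smoked cigarette leaves exactly one butt and that leftover butts are carried over to the next round.&lt;/p&gt;

```python
def max_cigarettes(new_cigarettes, butts_per_cigarette=5):
    """Return (cigarettes smoked, butts left over) when butts are recycled."""
    smoked = 0
    cigarettes = new_cigarettes
    butts = 0
    while cigarettes:
        smoked += cigarettes
        butts += cigarettes  # every smoked cigarette leaves one butt
        # Roll as many new cigarettes as possible; keep the remaining butts.
        cigarettes, butts = divmod(butts, butts_per_cigarette)
    return smoked, butts


print(max_cigarettes(50))  # (62, 2)
```

&lt;p&gt;The simulation gives 62 cigarettes, with 2 butts left over at the end.&lt;/p&gt;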

&lt;h2&gt;
  
  
  Four Chairs
&lt;/h2&gt;

&lt;p&gt;There is a desk with seats on the east, west, north, and south sides; four people, A, B, C and D, are sitting in one of the chairs. From the following conversation, infer which seat C is sitting in.&lt;/p&gt;

&lt;p&gt;A "I have B in front of me"&lt;br&gt;
D "I am sitting on the north seat"&lt;/p&gt;

&lt;h3&gt;
  
  
  ChatGPT-3.5
&lt;/h3&gt;

&lt;p&gt;From the given conversation, we can infer that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Person A is sitting facing towards the south seat.&lt;/li&gt;
&lt;li&gt;Person B is sitting on the south seat, facing towards person A.&lt;/li&gt;
&lt;li&gt;Person D is sitting on the north seat.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Since person A has person B in front of them, and person B is sitting on the south seat facing towards person A, it means that person A is sitting on the north seat facing towards the south. Therefore, person C must be sitting on either the east or the west seat.&lt;/p&gt;

&lt;p&gt;We do not have any information about the positions of A, B, or D with respect to person C. However, we can infer from person D's statement that they are not sitting on the east or the west seat, as those are not the north seat. Therefore, person C must be sitting on either the east or the west seat.&lt;/p&gt;

&lt;p&gt;So, based on the given conversation, we cannot determine which seat C is sitting in.&lt;/p&gt;

&lt;h3&gt;
  
  
  ChatGPT-4
&lt;/h3&gt;

&lt;p&gt;From the conversation, we know that:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;A has B in front of them.&lt;/li&gt;
&lt;li&gt;D is sitting on the north seat.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Since A has B in front of them, A and B must be sitting in either the north-south or east-west orientation. Since D is sitting on the north seat, A and B cannot be sitting in the north-south orientation. Therefore, A and B are sitting in the east-west orientation.&lt;/p&gt;

&lt;p&gt;Since D is sitting on the north seat, C must be sitting on the south seat.&lt;/p&gt;
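
&lt;p&gt;As an independent check of the two answers above, the puzzle is small enough to enumerate by brute force. The following Ruby sketch is not part of either ChatGPT response. It assumes that "in front of me" means the seat directly across the desk, and the names &lt;code&gt;SEATS&lt;/code&gt;, &lt;code&gt;OPPOSITE&lt;/code&gt; and &lt;code&gt;candidate_seats_for_c&lt;/code&gt; are made up for illustration.&lt;/p&gt;

```ruby
# Brute-force check of the "Four Chairs" puzzle (illustrative sketch).
# Assumption: "in front of me" means "in the seat directly opposite mine".
SEATS = %i[north east south west].freeze
OPPOSITE = { north: :south, south: :north, east: :west, west: :east }.freeze

def candidate_seats_for_c
  %w[A B C D].permutation.filter_map do |people|
    seat_of = people.zip(SEATS).to_h               # e.g. {"A" => :north, ...}
    next unless seat_of["D"] == :north             # D: "I am sitting on the north seat"
    next unless OPPOSITE[seat_of["A"]] == seat_of["B"]  # A: "I have B in front of me"
    seat_of["C"]
  end.uniq
end

p candidate_seats_for_c
```

&lt;p&gt;Under that assumption, the only seat that survives both statements is the south one.&lt;/p&gt;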

</description>
      <category>chatgpt</category>
    </item>
    <item>
      <title>How many asterisks can be put?</title>
      <dc:creator>Hiroshi Watanabe</dc:creator>
      <pubDate>Fri, 29 Jul 2022 13:31:20 +0000</pubDate>
      <link>https://dev.to/kaityo256/how-many-asterisks-can-be-put-o71</link>
      <guid>https://dev.to/kaityo256/how-many-asterisks-can-be-put-o71</guid>
      <description>&lt;p&gt;In C and C++, the asterisk (&lt;code&gt;*&lt;/code&gt;) is used to declare or dereference a pointer variable.&lt;/p&gt;

&lt;p&gt;By the way, did you know that you can prefix a function call with as many asterisks as you like?&lt;/p&gt;

&lt;p&gt;Here is a simple, valid C program.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
  &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"Hello world!&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is a valid C program too.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="s"&gt;"Hello world!&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is also a valid one.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;******&lt;/span&gt;&lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="s"&gt;"Hello world!&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This program with 100 asterisks is also a perfectly valid C program.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
  &lt;span class="p"&gt;(&lt;/span&gt;
   &lt;span class="o"&gt;**********&lt;/span&gt;
   &lt;span class="o"&gt;**********&lt;/span&gt;
   &lt;span class="o"&gt;**********&lt;/span&gt;
   &lt;span class="o"&gt;**********&lt;/span&gt;
   &lt;span class="o"&gt;**********&lt;/span&gt;
   &lt;span class="o"&gt;**********&lt;/span&gt;
   &lt;span class="o"&gt;**********&lt;/span&gt;
   &lt;span class="o"&gt;**********&lt;/span&gt;
   &lt;span class="o"&gt;**********&lt;/span&gt;
   &lt;span class="o"&gt;**********&lt;/span&gt;
   &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="s"&gt;"Hello world!&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Seeing this, you may wonder how many asterisks you can add. Let's find out.&lt;/p&gt;

&lt;p&gt;The following Ruby script writes a C source file in which the specified number of asterisks is prepended to &lt;code&gt;printf&lt;/code&gt;, then compiles it with Clang.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;
  &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"test.c"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;puts&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOS&lt;/span&gt;&lt;span class="sh"&gt;
#include &amp;lt;stdio.h&amp;gt;
int main(){
(&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;printf)("Hello World!&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="sh"&gt;n");
}
&lt;/span&gt;&lt;span class="no"&gt;EOS&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;system&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"clang test.c"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;to_i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can use it like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ruby check.rb 10
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It generates the following source code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight c"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include&lt;/span&gt; &lt;span class="cpf"&gt;&amp;lt;stdio.h&amp;gt;&lt;/span&gt;&lt;span class="cp"&gt;
&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;(){&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;**********&lt;/span&gt;&lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="s"&gt;"Hello World!&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's try it with a thousand asterisks.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ruby check.rb 1000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No problem. Next, try ten thousand.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ruby check.rb 10000
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed &lt;span class="nb"&gt;source&lt;/span&gt;, and associated run script.
Stack dump:
&lt;span class="o"&gt;(&lt;/span&gt;snip&lt;span class="o"&gt;)&lt;/span&gt;
clang-12: error: unable to execute &lt;span class="nb"&gt;command&lt;/span&gt;: Segmentation fault &lt;span class="o"&gt;(&lt;/span&gt;core dumped&lt;span class="o"&gt;)&lt;/span&gt;
clang-12: error: clang frontend &lt;span class="nb"&gt;command &lt;/span&gt;failed due to signal &lt;span class="o"&gt;(&lt;/span&gt;use &lt;span class="nt"&gt;-v&lt;/span&gt; to see invocation&lt;span class="o"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Clang died with SIGSEGV.&lt;/p&gt;

&lt;p&gt;Clang was ok with 1000 asterisks and died with 10,000 asterisks. So there must be a limit somewhere in there. Let's check it out with a binary search.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;
  &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"test.c"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;puts&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOS&lt;/span&gt;&lt;span class="sh"&gt;
#include &amp;lt;stdio.h&amp;gt;
int main(){
(&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;printf)("Hello World!&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="sh"&gt;n");
}
&lt;/span&gt;&lt;span class="no"&gt;EOS&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="nb"&gt;system&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"clang test.c 2&amp;gt; /dev/null"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
    &lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; OK"&lt;/span&gt;
    &lt;span class="kp"&gt;false&lt;/span&gt;
  &lt;span class="k"&gt;else&lt;/span&gt;
    &lt;span class="nb"&gt;puts&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s2"&gt; NG"&lt;/span&gt;
    &lt;span class="kp"&gt;true&lt;/span&gt;
  &lt;span class="k"&gt;end&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1000&lt;/span&gt;&lt;span class="o"&gt;..&lt;/span&gt;&lt;span class="mi"&gt;10000&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;bsearch&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the execution result.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;ruby search.rb
5500 NG
3250 NG
2125 NG
1562 OK
1844 NG
1703 NG
1633 OK
1668 NG
1651 OK
1660 NG
1656 OK
1658 NG
1657 OK
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we have new knowledge: with Clang, we can put up to 1657 asterisks (the exact limit depends on the environment).&lt;/p&gt;
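
&lt;p&gt;The search script relies on the find-minimum mode of &lt;code&gt;Range#bsearch&lt;/code&gt;, which is why &lt;code&gt;check&lt;/code&gt; returns &lt;code&gt;true&lt;/code&gt; on failure and &lt;code&gt;false&lt;/code&gt; on success. The sketch below illustrates that behaviour without running a compiler. &lt;code&gt;fails_above?&lt;/code&gt; and &lt;code&gt;hypothetical_limit&lt;/code&gt; are stand-ins invented for this example, and the value 1657 is simply taken from the result above.&lt;/p&gt;

```ruby
# Stand-in for check(n): pretends that compilation crashes above a given limit.
# Like check in the search script, it returns true when compilation FAILS.
def fails_above?(limit, n)
  n > limit
end

# Taken from the result above; a real limit depends on the environment.
hypothetical_limit = 1657

# Find-minimum mode: the block must be false for small n and true for large n.
# bsearch then returns the smallest n for which the block is true.
first_failure = (1000..10000).bsearch { |n| fails_above?(hypothetical_limit, n) }

puts "first failing count:   #{first_failure}"
puts "largest working count: #{first_failure - 1}"
```

&lt;p&gt;In other words, the value returned by &lt;code&gt;bsearch&lt;/code&gt; is the first count that fails, and the largest count that still compiles is one less than that.&lt;/p&gt;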

&lt;p&gt;By the way, let's try it with GCC. Here is a check script.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight ruby"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
  &lt;span class="n"&gt;s&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s2"&gt;"*"&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;
  &lt;span class="nb"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"test.c"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="s2"&gt;"w"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;
  &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;puts&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;&amp;lt;&lt;/span&gt;&lt;span class="no"&gt;EOS&lt;/span&gt;&lt;span class="sh"&gt;
#include &amp;lt;stdio.h&amp;gt;
int main(){
(&lt;/span&gt;&lt;span class="si"&gt;#{&lt;/span&gt;&lt;span class="n"&gt;s&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;printf)("Hello World!&lt;/span&gt;&lt;span class="se"&gt;\\&lt;/span&gt;&lt;span class="sh"&gt;n");
}
&lt;/span&gt;&lt;span class="no"&gt;EOS&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;
  &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;system&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s2"&gt;"gcc test.c"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;end&lt;/span&gt;

&lt;span class="n"&gt;check&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="no"&gt;ARGV&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;].&lt;/span&gt;&lt;span class="nf"&gt;to_i&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's try from 1000.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ruby check_gcc.rb 1000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;No problem. Next, 10,000.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ruby check_gcc.rb 10000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Well, how about 100,000?&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ruby check_gcc.rb 100000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Seriously? Then, 1,000,000.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;ruby check_gcc.rb 1000000
gcc: internal compiler error: Segmentation fault signal terminated program cc1
Please submit a full bug report,
with preprocessed &lt;span class="nb"&gt;source &lt;/span&gt;&lt;span class="k"&gt;if &lt;/span&gt;appropriate.
See &amp;lt;http://bugzilla.redhat.com/bugzilla&amp;gt; &lt;span class="k"&gt;for &lt;/span&gt;instructions.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Oops, GCC died with a million asterisks.&lt;/p&gt;

&lt;p&gt;So, if you want to use ten thousand asterisks, use GCC.&lt;/p&gt;
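
&lt;p&gt;The manual probing used here (1,000, then 10,000, then 100,000, and so on) can be combined with a binary search to pin down the exact limit. The following is a minimal sketch of that idea, not a script from the experiment above. &lt;code&gt;first_failure&lt;/code&gt;, &lt;code&gt;factor&lt;/code&gt; and &lt;code&gt;cap&lt;/code&gt; are hypothetical names, and the block is assumed to return &lt;code&gt;true&lt;/code&gt; when compilation fails and to fail for every count above the limit.&lt;/p&gt;

```ruby
# Returns the smallest n at or above start for which the block is true, or nil
# if the bound grows to cap or beyond without the block ever reporting a failure.
# Assumption: the block is false for small n and true for all larger n.
def first_failure(start, factor: 10, cap: 10_000_000)
  return start if yield(start)
  ok = start           # largest count known to work
  ng = start * factor  # candidate upper bound
  # Grow the upper bound until the block reports a failure.
  until yield(ng)
    return nil if ng >= cap
    ok = ng
    ng *= factor
  end
  # Bisect between the last success and the first known failure.
  ((ok + 1)..ng).bsearch { |n| yield(n) }
end

# Stand-in predicate with a made-up limit, so the sketch runs without a compiler.
made_up_limit = 345_678
result = first_failure(1000) { |n| n > made_up_limit }
puts "first failing count: #{result}"
```

&lt;p&gt;To use it with a real compiler, the block could call the &lt;code&gt;check&lt;/code&gt; function from the script above and negate its result, for example &lt;code&gt;first_failure(1000) { |n| !check(n) }&lt;/code&gt;. Each probe then runs the compiler once.&lt;/p&gt;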

</description>
      <category>c</category>
      <category>clang</category>
      <category>gcc</category>
    </item>
    <item>
      <title>Docker Environment for ARM SVE</title>
      <dc:creator>Hiroshi Watanabe</dc:creator>
      <pubDate>Tue, 01 Feb 2022 13:47:24 +0000</pubDate>
      <link>https://dev.to/kaityo256/docker-environment-for-arm-sve-1bm5</link>
      <guid>https://dev.to/kaityo256/docker-environment-for-arm-sve-1bm5</guid>
      <description>&lt;h1&gt;
  
  
  Docker Environment for ARM SVE
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://www.top500.org/"&gt;TOP500&lt;/a&gt; is a list of the 500 most powerful computer systems. At the time of writing, the number one computer on this list is a Japanese system called &lt;a href="https://www.fujitsu.com/global/about/innovation/fugaku/"&gt;Fugaku&lt;/a&gt;. Fugaku's processor is the A64FX, which belongs to the ARM family. In the field of high performance computing, SIMD with a long vector length is in demand, and the A64FX adopts ARM SVE (Scalable Vector Extension). Unlike other SIMD instruction sets such as AVX2, ARM SVE is scalable, which means that the vector length is not fixed at compile time.&lt;/p&gt;

&lt;p&gt;While the idea of ARM SVE is interesting, few people can touch an actual machine that implements ARM SVE. So I prepared a Docker environment where you can experience programming with ARM SVE on QEMU.&lt;/p&gt;

&lt;p&gt;Here is the tutorial for ARM SVE on Docker.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/kaityo256/xbyak_aarch64_handson"&gt;kaityo256/xbyak_aarch64_handson&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  How to obtain a Docker image
&lt;/h2&gt;

&lt;p&gt;Just run the following.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;-it&lt;/span&gt; kaityo256/xbyak_aarch64_handson
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then you will see something like this.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="o"&gt;[&lt;/span&gt;user@2cd82e1ea4e3 ~]&lt;span class="err"&gt;$&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Alternatively, you can build and run the Docker environment from the Dockerfile as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone https://github.com/kaityo256/xbyak_aarch64_handson.git
&lt;span class="nb"&gt;cd &lt;/span&gt;xbyak_aarch64_handson
&lt;span class="nb"&gt;cd &lt;/span&gt;docker
make
make run
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  ARM SVE
&lt;/h2&gt;

&lt;p&gt;You can use the ARM SVE instructions in two different ways. The first is to use intrinsic functions. ARM provides intrinsic functions for C, which are called the Arm C Language Extensions (ACLEs). The other way is to write the ARM SVE instructions directly. However, writing assembly by hand is hard, so I recommend using a JIT assembler called Xbyak, developed by &lt;a href="https://github.com/herumi/"&gt;MITSUNARI Shigeo&lt;/a&gt;. Xbyak was initially developed for &lt;a href="https://github.com/herumi/xbyak"&gt;x86&lt;/a&gt;, and a version for &lt;a href="https://github.com/fujitsu/xbyak_aarch64"&gt;AArch64&lt;/a&gt; has also been released.&lt;/p&gt;

&lt;p&gt;In the following, I will introduce only a simple sample of each. Please see the &lt;a href="https://github.com/kaityo256/xbyak_aarch64_handson"&gt;tutorial&lt;/a&gt; for details.&lt;/p&gt;

&lt;h3&gt;
  
  
  ACLEs
&lt;/h3&gt;

&lt;p&gt;The sample code for the intrinsic functions is in the directory &lt;code&gt;~/xbyak_aarch64_handson/sample/intrinsic&lt;/code&gt; in the Docker image.&lt;/p&gt;

&lt;p&gt;The sample code can be built as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;01_sve_length/
make
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then you can run the executable using QEMU.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;qemu-aarch64 ./a.out
SVE is available. The length is 512 bits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can specify the vector length in the QEMU options.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;qemu-aarch64 &lt;span class="nt"&gt;-cpu&lt;/span&gt; max,sve128&lt;span class="o"&gt;=&lt;/span&gt;on ./a.out
SVE is available. The length is 128 bits

&lt;span class="nv"&gt;$ &lt;/span&gt;qemu-aarch64 &lt;span class="nt"&gt;-cpu&lt;/span&gt; max,sve256&lt;span class="o"&gt;=&lt;/span&gt;on ./a.out
SVE is available. The length is 256 bits
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here is the source code (&lt;code&gt;sve_length.cpp&lt;/code&gt;).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include &amp;lt;cstdio&amp;gt;
#ifdef __ARM_FEATURE_SVE
#include &amp;lt;arm_sve.h&amp;gt;
#endif
&lt;/span&gt;
&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="cp"&gt;#ifdef __ARM_FEATURE_SVE
&lt;/span&gt;  &lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;svcntb&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="mi"&gt;8&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="cp"&gt;#endif
&lt;/span&gt;  &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"SVE is available. The length is %d bits&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"SVE is unavailable.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Whether ARM SVE can be used is determined by whether &lt;code&gt;__ARM_FEATURE_SVE&lt;/code&gt; is defined. If &lt;code&gt;__ARM_FEATURE_SVE&lt;/code&gt; is defined, you can use the intrinsic functions for SVE by including &lt;code&gt;arm_sve.h&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The vector length can be obtained with &lt;code&gt;svcntb()&lt;/code&gt;, which returns the vector length in bytes. The corresponding instruction is &lt;code&gt;cntb&lt;/code&gt;. The name of an ACLE SVE function consists of the prefix &lt;code&gt;sv&lt;/code&gt; followed by the corresponding instruction in lower case.&lt;/p&gt;
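
&lt;p&gt;To make the relation between the vector length and the element counts concrete, here is a small arithmetic sketch. It is written in Ruby only so that it runs without an ARM toolchain, and it does not call any SVE instruction. &lt;code&gt;sve_lane_counts&lt;/code&gt; is a hypothetical helper. The hash keys are named after the ACLE functions &lt;code&gt;svcntb&lt;/code&gt;, &lt;code&gt;svcnth&lt;/code&gt;, &lt;code&gt;svcntw&lt;/code&gt; and &lt;code&gt;svcntd&lt;/code&gt;, which return the number of 8-, 16-, 32- and 64-bit elements in a vector.&lt;/p&gt;

```ruby
# Number of elements of each width that fit in one SVE vector of the given
# length (illustrative arithmetic only; no SVE instruction is executed).
def sve_lane_counts(vector_bits)
  bytes = vector_bits / 8
  {
    svcntb: bytes,      # 8-bit elements (the vector length in bytes)
    svcnth: bytes / 2,  # 16-bit elements
    svcntw: bytes / 4,  # 32-bit elements
    svcntd: bytes / 8   # 64-bit elements
  }
end

# The vector lengths selected with the QEMU options above.
[128, 256, 512].each do |bits|
  puts "#{bits} bits: #{sve_lane_counts(bits)}"
end
```

&lt;p&gt;For the 512-bit default shown earlier, this gives 64 bytes per vector, which is the value that &lt;code&gt;svcntb() * 8&lt;/code&gt; converts back to 512 bits.&lt;/p&gt;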

&lt;p&gt;In order to enable SVE, you need to compile with the &lt;code&gt;-march=armv8-a+sve&lt;/code&gt; option. Without the option, &lt;code&gt;__ARM_FEATURE_SVE&lt;/code&gt; will not be defined.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;aarch64-linux-gnu-g++ &lt;span class="nt"&gt;-static&lt;/span&gt; sve_length.cpp
&lt;span class="nv"&gt;$ &lt;/span&gt;qemu-aarch64 ./a.out
SVE is unavailable.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Xbyak
&lt;/h3&gt;

&lt;p&gt;The sample code is in the directory &lt;code&gt;~/xbyak_aarch64_handson/sample/xbyak&lt;/code&gt; in the Docker image. First, let's compile and run the test code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;xbyak_aarch64_handson
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;sample
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;xbyak
&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="nb"&gt;cd &lt;/span&gt;01_test
&lt;span class="nv"&gt;$ &lt;/span&gt;make
aarch64-linux-gnu-g++ &lt;span class="nt"&gt;-static&lt;/span&gt; test.cpp &lt;span class="nt"&gt;-L&lt;/span&gt;/home/user/xbyak_aarch64_handson/xbyak_aarch64/lib &lt;span class="nt"&gt;-lxbyak_aarch64&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;./a.out
1
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that even though &lt;code&gt;a.out&lt;/code&gt; is a binary for ARM, you can run it directly, without typing &lt;code&gt;qemu-aarch64&lt;/code&gt;, as shown above. Even though QEMU is not specified explicitly, &lt;code&gt;a.out&lt;/code&gt; is still executed through QEMU.&lt;/p&gt;

&lt;p&gt;Here is the source code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="cp"&gt;#include &amp;lt;cstdio&amp;gt;
#include &amp;lt;xbyak_aarch64/xbyak_aarch64.h&amp;gt;
&lt;/span&gt;
&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;Code&lt;/span&gt; &lt;span class="o"&gt;:&lt;/span&gt; &lt;span class="n"&gt;Xbyak_aarch64&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;CodeGenerator&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;Code&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;mov&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;w0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="n"&gt;ret&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;main&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
  &lt;span class="n"&gt;Code&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
  &lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;getCode&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="p"&gt;)()&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ready&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
  &lt;span class="n"&gt;printf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%d&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the &lt;code&gt;mov(w0, 1)&lt;/code&gt; part is where the return value of the function is assigned. Let's change the return value of the function to another value, say 42. Replace the code with &lt;code&gt;mov(w0, 42)&lt;/code&gt;, and compile and run it again.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;make
aarch64-linux-gnu-g++ &lt;span class="nt"&gt;-static&lt;/span&gt; test.cpp &lt;span class="nt"&gt;-L&lt;/span&gt;/home/user/xbyak_aarch64_handson/xbyak_aarch64/lib &lt;span class="nt"&gt;-lxbyak_aarch64&lt;/span&gt;
&lt;span class="nv"&gt;$ &lt;/span&gt;./a.out
42
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will see &lt;code&gt;42&lt;/code&gt; as the result.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;You can try ARM SVE instructions on QEMU for AArch64. However, preparing that environment is rather tedious, so I prepared a Docker image for it.&lt;/p&gt;

&lt;p&gt;For more information about ACLEs and Xbyak_aarch64, please refer to the official documentation.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://developer.arm.com/architectures/system-architectures/software-standards/acle"&gt;ACLEs (ARM Official Document)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/fujitsu/xbyak_aarch64"&gt;Xbyak_aarch64 ; JIT assembler for AArch64 CPUs by C++ (GitHub Repository)&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;My English in &lt;a href="https://github.com/kaityo256/xbyak_aarch64_handson"&gt;the tutorial&lt;/a&gt; can be poor, so I appreciate pull requests for improvements.&lt;/p&gt;

&lt;p&gt;Enjoy ARM SVE!&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>archlinux</category>
      <category>docker</category>
    </item>
  </channel>
</rss>
