DEV Community: Hiroshi Watanabe

A Hands-On Introduction to Restricted Boltzmann Machines with a Minimal NumPy Implementation

Hiroshi Watanabe — Wed, 11 Feb 2026 13:51:36 +0000

We have developed and released a Python library that makes it easy to experiment with a Restricted Boltzmann Machine (RBM), a classic machine learning model. The main developer of this library is Mr. Kobayashi, and the project is available under the MIT License.

https://github.com/watanabe-appi/simple_rbm

The library can be used easily both in a local Python environment and in Google Colab. If CuPy is available, it can also run on a GPU.

In the following sections, I will introduce the basics of RBMs and provide a step-by-step guide to using this library.

What Is a Restricted Boltzmann Machine?

Originally, Hinton and Sejnowski proposed the Boltzmann Machine as an associative memory network. This model corresponds to a physical system in which spins are arranged on a network, and system states appear according to a defined energy function and the Boltzmann distribution. By properly learning the network weights, the model can memorize and represent various types of data.

Although the Boltzmann Machine is theoretically fascinating, its training cost is extremely high, making it impractical for many real-world applications.

To address this issue, the Restricted Boltzmann Machine (RBM) was introduced. In this model, units are divided into two groups: a visible layer and a hidden layer. By prohibiting connections between units within the same group, the model becomes much more efficient to train.

While RBMs often do not match the performance of similarly sized deep neural networks, they remain theoretically interesting. Their strong connections to statistical physics make them an attractive subject for research.

An RBM can be viewed as a network in which spins are placed on nodes. Each node has a bias parameter that controls how likely the corresponding spin is to point “up,” and each edge has a weight parameter that determines whether the two connected spins tend to align in the same direction.

Training an RBM means optimizing these bias and weight parameters so that the model exhibits the desired behavior.

An RBM can memorize given data in advance—for example, images. Once trained, it can reconstruct an input image from its internal representation.

This is somewhat similar to how humans recall information: imagine you are shown a handwritten character, then it is hidden, and you are asked to write down the character that was on the paper. You may not reproduce the exact same strokes, but you can reproduce the same character. In a similar way, a trained RBM reconstructs inputs based on what it has learned.

In this situation, when we see an image of the digit “9,” we compress the visual information and recognize it as the abstract concept “the number 9.” From that abstract representation, we can then reconstruct the character “9” again.

An RBM can perform a similar process. In the following sections, we will use the MNIST handwritten digit dataset as an example to train an RBM and demonstrate image reconstruction.

Using the RBM Library in Google Colab

Although the library can also be used in a local environment, using Google Colab is the easiest way to get started. Let’s try it there. First, open a new notebook in Google Colab.

Installing the Library

In the first cell, install the RBM library using pip:

!pip install "git+https://github.com/watanabe-appi/simple_rbm.git"

Importing the Required Libraries

Next, import all the necessary libraries:

import tensorflow as tf
from PIL import Image
from simple_rbm import RBM
import numpy as np
import IPython
import matplotlib.pyplot as plt

Because the library was installed in the previous step, you can now import RBM from the simple_rbm package.

Initializing the RBM

Next, initialize the RBM. The required parameters are the number of visible units and the number of hidden units. Since MNIST images are 28×28 pixels, we set the number of visible units accordingly. The hidden layer can be any size smaller than the visible layer; for example, let’s use 64 units.

rbm = RBM(visible_num=28 * 28, hidden_num=64)

Preparing the Data

Now, prepare the MNIST dataset. Because an RBM is an unsupervised learning model, we only use the image data and ignore the labels.

We also normalize the pixel values to the range 0.0 to 1.0.

(x_train, _), (x_test, _) = tf.keras.datasets.mnist.load_data()
x_train = np.array(x_train) / 255
x_test = np.array(x_test) / 255
x_train = x_train.reshape(-1, 28 * 28).astype(np.float32)
x_test = x_test.reshape(-1, 28 * 28).astype(np.float32)

Here, x_train is the training dataset and x_test is the test dataset.

Training the RBM

To train the RBM, simply pass the dataset and call fit, as in many similar frameworks. You can also specify the number of epochs and the batch size.

rbm.fit(x_train, epochs=10, batch_size=1000)

You will see output similar to the following:

# Computation will proceed on the CPU.
Epoch [1/10], KL Divergence: 0.3689
Epoch [2/10], KL Divergence: 0.2504
Epoch [3/10], KL Divergence: 0.2144
Epoch [4/10], KL Divergence: 0.1982
Epoch [5/10], KL Divergence: 0.1875
Epoch [6/10], KL Divergence: 0.1797
Epoch [7/10], KL Divergence: 0.1736
Epoch [8/10], KL Divergence: 0.1685
Epoch [9/10], KL Divergence: 0.1645
Epoch [10/10], KL Divergence: 0.1612

Since GPU acceleration was not specified, a message indicates that computation will proceed on the CPU.

As the cost function, we use the Kullback–Leibler (KL) divergence. Training is performed using the Contrastive Divergence (CD) algorithm. The input image is first encoded into the hidden layer, then reconstructed back to the visible layer. The weights are updated so that the reconstructed image becomes closer to the original input.
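To make the encode, reconstruct, and update cycle concrete, here is a minimal NumPy sketch of one CD-1 step. It is not the code inside simple_rbm: the function cd1_step, its parameters, the 0/1 unit convention, and the toy data are all assumptions made for illustration, and the value it returns is a plain mean squared reconstruction error rather than the KL divergence reported by the library.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def cd1_step(v0, W, b_vis, b_hid, lr, rng):
    """One CD-1 update on a mini-batch v0 of shape (batch, n_visible).

    Hypothetical helper: updates W, b_vis and b_hid in place.
    """
    # Positive phase: hidden probabilities given the data, then a binary sample.
    ph0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(ph0.shape) < ph0).astype(v0.dtype)
    # Negative phase: reconstruct the visible layer, then re-infer the hidden layer.
    pv1 = sigmoid(h0 @ W.T + b_vis)
    ph1 = sigmoid(pv1 @ W + b_hid)
    # Update: statistics under the data minus statistics under the reconstruction.
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b_vis += lr * (v0 - pv1).mean(axis=0)
    b_hid += lr * (ph0 - ph1).mean(axis=0)
    # Mean squared reconstruction error, used here only for monitoring.
    return float(np.mean((v0 - pv1) ** 2))


rng = np.random.default_rng(0)
n_visible, n_hidden = 6, 3
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_vis = np.zeros(n_visible)
b_hid = np.zeros(n_hidden)

# Toy data (an assumption for this sketch): two binary patterns, repeated.
data = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 10, dtype=np.float64)

errors = [cd1_step(data, W, b_vis, b_hid, lr=0.5, rng=rng) for _ in range(500)]
print(f"first error: {errors[0]:.3f}, last error: {errors[-1]:.3f}")
```

The update pushes the weights toward correlations seen in the data and away from those seen in the reconstruction, which is why the reconstruction gradually approaches the input.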

A Helper Function for Image Reconstruction

Next, we let the RBM reconstruct input images. To visualize the results, we define a helper function:

def show_restored_image(input, output):
  fig, axes = plt.subplots(1, 2, figsize=(4, 2))
  axes[0].axis('off')
  axes[0].set_title('Input Image')
  axes[0].imshow(input.reshape((28,28)), cmap='gray')
  axes[1].axis('off')
  axes[1].set_title('Restored Image')
  axes[1].imshow(output.reshape((28,28)), cmap='gray')
  plt.show()

This function simply takes the original image (input) and the RBM-reconstructed image (output) and displays them side by side using Matplotlib.

Reconstructing Images

Now, let’s reconstruct some images.

We use x_test, the portion of the dataset that was not used for training. Passing this data to rbm.reconstruct returns reconstructed images from the trained RBM.

The reconstruction is computed through the following procedure:

Fix the visible layer units to the input data and sample the hidden layer units.
Fix the sampled hidden layer units and compute the expected values of the visible layer units.

The simple_rbm library uses a Bernoulli–Bernoulli model with Ising spins in both the visible and hidden layers. However, rbm.reconstruct returns the expected values of the visible units, so the output consists of real-valued numbers.
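The two steps above can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the library's implementation: reconstruct_sketch is a hypothetical function, the units are taken to be 0/1 valued, and the weights are random placeholders rather than trained parameters.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def reconstruct_sketch(v, W, b_vis, b_hid, rng):
    """Hypothetical two-step reconstruction for a batch v of shape (batch, n_visible)."""
    # Step 1: clamp the visible units to the input and sample binary hidden units.
    p_hidden = sigmoid(v @ W + b_hid)
    h = (rng.random(p_hidden.shape) < p_hidden).astype(v.dtype)
    # Step 2: clamp the sampled hidden units and return the visible expectations.
    return sigmoid(h @ W.T + b_vis)


rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((784, 64))  # placeholder weights, not trained
b_vis = np.zeros(784)
b_hid = np.zeros(64)
v = rng.random((1, 784))                   # stand-in for one normalized image

out = reconstruct_sketch(v, W, b_vis, b_hid, rng)
print(out.shape, float(out.min()), float(out.max()))
```

Because the last step returns probabilities instead of sampled spins, every output value lies strictly between 0 and 1, which matches the real-valued output described above.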

The following code feeds the first 10 test images into the RBM and displays their reconstructions:

for i in range(10):
  show_restored_image(x_test[i], rbm.reconstruct(x_test[i].reshape(1, 28 * 28))[0])

The output looks like this:

Although the reconstructed images are not identical to the inputs, we can clearly recognize that they represent the same digits.

In this example, the RBM compresses the original 28×28 = 784-bit visible representation into a 64-bit hidden representation, and then reconstructs it back to 784 units. This demonstrates how the RBM performs information compression and reconstruction.

Other Usage Options

Using GPGPU

If you want to enable GPGPU acceleration, specify use_GPU=True in the constructor:

rbm = RBM(visible_num=28 * 28, hidden_num=64, use_GPU=True)

If CuPy is available in your environment, GPU acceleration is used automatically when you call rbm.fit.

# GPU usage has been enabled. Computation will proceed on the GPU.
Epoch [1/10], KL Divergence: 0.3716
Epoch [2/10], KL Divergence: 0.2513
Epoch [3/10], KL Divergence: 0.2144
Epoch [4/10], KL Divergence: 0.1968
Epoch [5/10], KL Divergence: 0.1857
Epoch [6/10], KL Divergence: 0.1780
Epoch [7/10], KL Divergence: 0.1723
Epoch [8/10], KL Divergence: 0.1677
Epoch [9/10], KL Divergence: 0.1639
Epoch [10/10], KL Divergence: 0.1607

In Google Colab, simply selecting a GPU runtime enables the accelerated version of the RBM. The GitHub repository also provides a sample Google Colab notebook demonstrating GPU usage.
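A common way to write array code that runs on either the CPU or the GPU is to choose the array module once and use it everywhere. The sketch below shows that pattern; it is not taken from simple_rbm, and select_backend is a hypothetical helper. It falls back to NumPy whenever CuPy cannot be imported or no CUDA device is visible.

```python
import numpy as np


def select_backend(use_gpu):
    """Return (array_module, label). Hypothetical helper for illustration."""
    if use_gpu:
        try:
            import cupy as cp  # optional dependency

            if cp.cuda.runtime.getDeviceCount() > 0:
                return cp, "GPU"
        except Exception:
            # CuPy is missing or no usable CUDA device was found.
            pass
    return np, "CPU"


xp, label = select_backend(use_gpu=True)
print(f"Computation will proceed on the {label}.")

# The same code works with either module because CuPy mirrors the NumPy API.
a = xp.ones((2, 3))
print(float(a.sum()))
```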

Using the Library in Your Own Project

If you want to use the RBM library in your own code, creating a virtual environment is recommended:

python3 -m venv .venv
source .venv/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install "git+https://github.com/watanabe-appi/simple_rbm.git"

After this setup, the RBM library will be ready to use in your project.

Conclusion

In this article, we introduced how to use the RBM library developed in our laboratory. RBMs have a simple structure, which makes them both analytically tractable and intellectually interesting. We hope this library will contribute to further research and experimentation with RBMs.

Comparison between ChatGPT-3.5 and 4

Hiroshi Watanabe — Sat, 22 Apr 2023 14:05:36 +0000

Summary

I asked ChatGPT-3.5 and ChatGPT-4 the same questions. ChatGPT-4 is qualitatively smarter than ChatGPT-3.5.

Poor Snail

A snail has fallen into a well. The well is 30 cm high, and the snail climbs 3 cm per day but falls 2 cm during the night. How many days does it take the snail to climb up the well?

ChatGPT-3.5

At day time, the snail climbs 3 cm, but at night it slips down 2 cm. So, each day, the snail moves up 1 cm in height.

To climb a total of 30 cm, the snail would need 30 days if there were no nights to slip down. However, the snail slips 2 cm at night, so it needs an additional 2 cm to reach the top each day.

Thus, the snail actually moves 1+2=3 cm every day. Therefore, it would take the snail 30/3 = 10 days to climb up the well.

ChatGPT-4

Let's analyze the snail's progress. Each day the snail climbs 3 cm, but at night, it falls back 2 cm. So overall, the snail makes a net progress of 1 cm per day (3 cm - 2 cm).

On the day the snail reaches the top of the well, it will climb up 3 cm and won't fall back, as it will be out of the well already. To calculate the number of days it takes for the snail to reach the top, we should consider the snail's progress up to 27 cm (30 cm - 3 cm).

Since the snail makes a net progress of 1 cm per day, it will take 27 days to reach 27 cm (27 cm / 1 cm per day). On the 28th day, the snail will climb the remaining 3 cm and finally reach the top of the well.

So, it will take the snail 28 days to climb up the 30 cm well.
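The answer can be checked with a short simulation. The following Python sketch is only a verification aid; the function days_to_escape and its parameter names are made up for this example.

```python
def days_to_escape(depth_cm, climb_cm, slip_cm):
    """Simulate the snail day by day and return the day it gets out."""
    if climb_cm < depth_cm and climb_cm <= slip_cm:
        raise ValueError("the snail never gets out")
    height = 0
    day = 0
    while True:
        day += 1
        height += climb_cm        # daytime climb
        if height >= depth_cm:    # out of the well, so no slip tonight
            return day
        height -= slip_cm         # nighttime slip


print(days_to_escape(30, 3, 2))  # 28
```

The key point is that the check for reaching the top happens before the nightly slip, which is exactly the step ChatGPT-3.5 missed.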

Heavy Smoker

A man can make one new cigarette from the butts of five cigarettes. What is the maximum number of cigarettes this man can smoke when he has 50 new cigarettes?

ChatGPT-3.5

If the man can make one new cigarette from the butts of five cigarettes, it means he needs five cigarette butts to make a new cigarette.

Therefore, if he has 50 new cigarettes, he must have started with 250 cigarette butts (50 new cigarettes x 5 butts per cigarette = 250 butts).

Now, if he smokes all 50 new cigarettes, he will end up with 50 cigarette butts. He can then use those 50 butts to make 10 new cigarettes (50 butts / 5 butts per cigarette = 10 cigarettes).

If he smokes those 10 new cigarettes, he will have 10 new butts, which he can use to make 2 new cigarettes (10 butts / 5 butts per cigarette = 2 cigarettes).

If he smokes those 2 new cigarettes, he will have 2 new butts, which he can use to make 1 new cigarette (2 butts / 5 butts per cigarette = 0.4 cigarettes, but we round down to 1).

So, in total, he can smoke 50 + 10 + 2 + 1 = 63 cigarettes.

ChatGPT-4

Let's break down this problem step by step:

The man starts with 50 new cigarettes.
After smoking the 50 cigarettes, he has 50 butts.
He can use these 50 butts to create 50/5 = 10 new cigarettes.
After smoking the 10 new cigarettes, he has 10 more butts.
He can use these 10 butts to create 10/5 = 2 new cigarettes.
After smoking the 2 new cigarettes, he has 2 more butts.
At this point, he has a total of 50 (original) + 10 + 2 = 62 smoked cigarettes.

However, he still has some butts left. He has the 2 butts from the last two cigarettes he smoked, plus the 2 butts that he couldn't use when making the last 2 new cigarettes (a total of 4 butts). He only needs one more butt to make another new cigarette.

Since the man can't make another new cigarette with the butts he has, the maximum number of cigarettes he can smoke is 62.
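The count can be verified by simulating the process directly. The Python sketch below is a check written for this article, with max_smokes as a hypothetical helper name; it returns both the number of cigarettes smoked and the number of butts left over.

```python
def max_smokes(new_cigarettes, butts_per_cigarette=5):
    """Return (cigarettes smoked, butts left over)."""
    if butts_per_cigarette < 2:
        raise ValueError("butts_per_cigarette must be at least 2")
    smoked = 0
    butts = 0
    cigarettes = new_cigarettes
    while cigarettes > 0:
        smoked += cigarettes      # smoke everything on hand
        butts += cigarettes       # each cigarette leaves one butt
        # Roll as many new cigarettes as possible and keep the remainder.
        cigarettes, butts = divmod(butts, butts_per_cigarette)
    return smoked, butts


print(max_smokes(50))  # (62, 2)
```

The total of 62 agrees with ChatGPT-4's answer, and the simulation ends with 2 unused butts.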

Four Chairs

There is a desk with seats on the east, west, north, and south sides; four people, A, B, C and D, are sitting in one of the chairs. From the following conversation, infer which seat C is sitting in.

A "I have B in front of me"
D "I am sitting on the north seat"

ChatGPT-3.5

From the given conversation, we can infer that:

Person A is sitting facing towards the south seat.
Person B is sitting on the south seat, facing towards person A.
Person D is sitting on the north seat.

Since person A has person B in front of them, and person B is sitting on the south seat facing towards person A, it means that person A is sitting on the north seat facing towards the south. Therefore, person C must be sitting on either the east or the west seat.

We do not have any information about the positions of A, B, or D with respect to person C. However, we can infer from person D's statement that they are not sitting on the east or the west seat, as those are not the north seat. Therefore, person C must be sitting on either the east or the west seat.

So, based on the given conversation, we cannot determine which seat C is sitting in.

ChatGPT-4

From the conversation, we know that:

A has B in front of them.
D is sitting on the north seat.

Since A has B in front of them, A and B must be sitting in either the north-south or east-west orientation. Since D is sitting on the north seat, A and B cannot be sitting in the north-south orientation. Therefore, A and B are sitting in the east-west orientation.

Since D is sitting on the north seat, C must be sitting on the south seat.
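This puzzle is small enough to check by brute force. The Python sketch below tries every seating arrangement and keeps those consistent with both statements, reading "in front of me" as "directly across the desk", which is an assumption about the puzzle's wording. The function name possible_seats_for_c is hypothetical.

```python
from itertools import permutations

OPPOSITE = {"north": "south", "south": "north", "east": "west", "west": "east"}


def possible_seats_for_c():
    """Return every seat C can occupy without contradicting A or D."""
    seats_for_c = set()
    for seats in permutations(OPPOSITE):
        seat = dict(zip("ABCD", seats))
        if seat["D"] != "north":             # D: "I am sitting on the north seat"
            continue
        if OPPOSITE[seat["A"]] != seat["B"]:  # A: "I have B in front of me"
            continue
        seats_for_c.add(seat["C"])
    return seats_for_c


print(possible_seats_for_c())  # {'south'}
```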

How many asterisks can be put?

Hiroshi Watanabe — Fri, 29 Jul 2022 13:31:20 +0000

In C and C++, the asterisk (*) is used to declare or dereference a pointer variable.

By the way, did you know that you can prefix a function call with as many asterisks as you like?

Here is a simple, valid C program.

#include <stdio.h>

int main(){
  printf("Hello world!\n");
}

Here is a valid C program too.

#include <stdio.h>

int main(){
  (*printf)("Hello world!\n");
}

Here is also a valid one.

#include <stdio.h>

int main(){
  (******printf)("Hello world!\n");
}

This program with 100 asterisks is also a perfectly valid C program.

#include <stdio.h>

int main(){
  (
   **********
   **********
   **********
   **********
   **********
   **********
   **********
   **********
   **********
   **********
   printf)("Hello world!\n");
}

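This works because a function name such as printf is implicitly converted to a pointer to the function, and dereferencing that pointer yields the function again, which is converted back to a pointer. Each extra asterisk just repeats the cycle. Having seen this, you may be wondering how many asterisks you can add. Let's find out.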

This is a Ruby script that prepends the specified number of asterisks to the printf function.

def check(n)
  s = "*" * n
  File.open("test.c", "w") do |f|
    f.puts <<EOS
#include <stdio.h>
int main(){
(#{s}printf)("Hello World!\\n");
}
EOS
  end
  return system("clang test.c")
end

check(ARGV[0].to_i)

You can use it like this.

ruby check.rb 10

It generates the following source code.

#include <stdio.h>
int main(){
(**********printf)("Hello World!\n");
}

Let's try it with a thousand asterisks.

ruby check.rb 1000

No problem. Then try with ten thousand.

ruby check.rb 10000
PLEASE submit a bug report to https://bugs.llvm.org/ and include the crash backtrace, preprocessed source, and associated run script.
Stack dump:
(snip)
clang-12: error: unable to execute command: Segmentation fault (core dumped)
clang-12: error: clang frontend command failed due to signal (use -v to see invocation)

Clang died with SIGSEGV.

Clang was ok with 1000 asterisks and died with 10,000 asterisks. So there must be a limit somewhere in there. Let's check it out with a binary search.

def check(n)
  s = "*" * n
  File.open("test.c", "w") do |f|
    f.puts <<EOS
#include <stdio.h>
int main(){
(#{s}printf)("Hello World!\\n");
}
EOS
  end
  if system("clang test.c 2> /dev/null")
    puts "#{n} OK"
    false
  else
    puts "#{n} NG"
    true
  end
end

(1000..10000).bsearch do |n|
  check(n)
end

Here is the execution result.

$ ruby search.rb
5500 NG
3250 NG
2125 NG
1562 OK
1844 NG
1703 NG
1633 OK
1668 NG
1651 OK
1660 NG
1656 OK
1658 NG
1657 OK

Now, we have new knowledge. We can put up to 1657 asterisks (environment dependent).
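For readers who prefer Python, the same search can be sketched as below. This is a rough equivalent and not a translation of the Ruby script: make_source, compiler_fails, and first_failure are names invented for this example, and the demonstration at the end uses a stand-in predicate that mirrors the result reported above instead of invoking a compiler.

```python
import os
import subprocess
import tempfile


def make_source(n):
    """Return a C program that calls printf through n asterisks."""
    return (
        "#include <stdio.h>\n"
        "int main(){\n"
        f"({'*' * n}printf)(\"Hello World!\\n\");\n"
        "}\n"
    )


def compiler_fails(n, compiler="clang"):
    """Return True if the compiler fails on n asterisks (requires the compiler)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "test.c")
        with open(path, "w") as f:
            f.write(make_source(n))
        result = subprocess.run(
            [compiler, path, "-o", os.path.join(tmp, "a.out")],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode != 0


def first_failure(lo, hi, fails):
    """Smallest n in [lo, hi] for which fails(n) is true, or None.

    Assumes fails is monotone: once it is true, it stays true for larger n.
    """
    if not fails(hi):
        return None
    while lo < hi:
        mid = (lo + hi) // 2
        if fails(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


# Stand-in predicate mirroring the result above; pass compiler_fails to
# probe a real compiler instead.
print(first_failure(1000, 10000, lambda n: n > 1657))  # 1658
```

The function returns the first failing count, so the largest count that still compiles is one less than the returned value.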

By the way, let's try it with GCC. Here is a check script.

def check(n)
  s = "*" * n
  File.open("test.c", "w") do |f|
    f.puts <<EOS
#include <stdio.h>
int main(){
(#{s}printf)("Hello World!\\n");
}
EOS
  end
  return system("gcc test.c")
end

check(ARGV[0].to_i)

Let's try from 1000.

ruby check_gcc.rb 1000

No problem. Next, 10,000.

ruby check_gcc.rb 10000

Well, how about 100,000?

ruby check_gcc.rb 100000

Seriously? Then, 1,000,000.

$ ruby check_gcc.rb 1000000
gcc: internal compiler error: Segmentation fault signal terminated program cc1
Please submit a full bug report,
with preprocessed source if appropriate.
See <http://bugzilla.redhat.com/bugzilla> for instructions.

Oops, GCC died with a million asterisks.

So, if you want to use ten thousand asterisks, use GCC.

Docker Environment for ARM SVE

Hiroshi Watanabe — Tue, 01 Feb 2022 13:47:24 +0000

TOP500 is a list of the 500 most powerful computer systems. At the time of writing, the number one computer on this list is a Japanese supercomputer called Fugaku. Fugaku's processor is the A64FX, a member of the ARM family. High performance computing benefits from SIMD units with long vectors, and the A64FX adopts ARM SVE (Scalable Vector Extension). Unlike other SIMD instruction sets such as AVX2, ARM SVE is scalable, which means that the vector length is not fixed at compile time.

While the idea of ARM SVE is interesting, few people have access to an actual machine that implements it. So I prepared a Docker environment where you can try programming with ARM SVE on QEMU.

Here is the tutorial for ARM SVE on Docker.

kaityo256/xbyak_aarch64_handson

How to obtain a Docker image

Just run the following.

docker run -it kaityo256/xbyak_aarch64_handson

Then you will see something like this.

[user@2cd82e1ea4e3 ~]$

Alternatively, you can build and run the Docker environment from the Dockerfile as follows.

git clone https://github.com/kaityo256/xbyak_aarch64_handson.git
cd xbyak_aarch64_handson
cd docker
make
make run

ARM SVE

You can use the ARM SVE instructions in two different ways. The first is to use intrinsic functions. Arm provides intrinsic functions for C and C++ as part of the Arm C Language Extensions (ACLEs). The other way is to use the ARM SVE instructions directly. However, writing assembly by hand is hard, so I recommend using a JIT assembler called Xbyak. Xbyak is a JIT assembler developed by MITSUNARI Shigeo. It was initially developed for x86, and a version for AArch64 was released later.

In the following, I will introduce only a simple sample of each. Please see the tutorial for details.

ACLEs

The sample codes for the intrinsic functions are in the directory ~/xbyak_aarch64_handson/sample/intrinsic in the Docker image.

The sample code can be built as follows.

cd 01_sve_length/
make

Then you can run the executable using QEMU.

$ qemu-aarch64 ./a.out
SVE is available. The length is 512 bits

You can specify the vector length in the QEMU options.

$ qemu-aarch64 -cpu max,sve128=on ./a.out
SVE is available. The length is 128 bits

$ qemu-aarch64 -cpu max,sve256=on ./a.out
SVE is available. The length is 256 bits

Here is the source code (sve_length.cpp).

#include <cstdio>
#ifdef __ARM_FEATURE_SVE
#include <arm_sve.h>
#endif

int main() {
  int n = 0;
#ifdef __ARM_FEATURE_SVE
  n = svcntb() * 8;
#endif
  if (n) {
    printf("SVE is available. The length is %d bits\n", n);
  } else {
    printf("SVE is unavailable.\n");
  }
}

You can determine whether ARM SVE is available by checking whether __ARM_FEATURE_SVE is defined. If __ARM_FEATURE_SVE is defined, you can use the intrinsic functions for SVE by including arm_sve.h.

The vector length can be obtained with svcntb(), which returns the vector length in bytes. The corresponding instruction is cntb. The name of an ACLE SVE function consists of the prefix sv followed by the corresponding instruction in lower case.

In order to enable SVE, you need to compile with the -march=armv8-a+sve option. Without the option, __ARM_FEATURE_SVE will not be defined.

$ aarch64-linux-gnu-g++ -static sve_length.cpp
$ qemu-aarch64 ./a.out
SVE is unavailable.

Xbyak

The sample codes are in the directory ~/xbyak_aarch64_handson/sample/xbyak in the Docker image. First, let's compile and run the test code.

$ cd xbyak_aarch64_handson
$ cd sample
$ cd xbyak
$ cd 01_test
$ make
aarch64-linux-gnu-g++ -static test.cpp -L/home/user/xbyak_aarch64_handson/xbyak_aarch64/lib -lxbyak_aarch64
$ ./a.out
1

Note that even though a.out is an ARM binary, you can run it directly like this, without typing qemu-aarch64. Even when you don't invoke QEMU explicitly, a.out is still executed through QEMU.

Here is the source code.

#include <cstdio>
#include <xbyak_aarch64/xbyak_aarch64.h>

struct Code : Xbyak_aarch64::CodeGenerator {
  Code() {
    mov(w0, 1);
    ret();
  }
};

int main() {
  Code c;
  auto f = c.getCode<int (*)()>();
  c.ready();
  printf("%d\n", f());
}

Here, the mov(w0, 1) part is where the return value of the function is assigned. Let's change the return value of the function to another value, say 42. Replace the code with mov(w0, 42), and compile and run it again.

$ make
aarch64-linux-gnu-g++ -static test.cpp -L/home/user/xbyak_aarch64_handson/xbyak_aarch64/lib -lxbyak_aarch64
$ ./a.out
42

You will see 42 as the result.

Summary

You can try ARM SVE instructions on QEMU for AArch64. However, preparing that environment is rather tedious, so I prepared a Docker image for it.

For more information about ACLEs and Xbyak_aarch64, please refer to the official documentation.

My English in the tutorial can be poor, so I appreciate pull requests for improvements.

Enjoy ARM SVE!