<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Keisuke Sato</title>
    <description>The latest articles on DEV Community by Keisuke Sato (@ksk0629).</description>
    <link>https://dev.to/ksk0629</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F803240%2F6fec78ff-4aa0-485e-b007-91d41f0650f2.jpg</url>
      <title>DEV Community: Keisuke Sato</title>
      <link>https://dev.to/ksk0629</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ksk0629"/>
    <language>en</language>
    <item>
      <title>Qiskit Aer vs MQT DDSIM</title>
      <dc:creator>Keisuke Sato</dc:creator>
      <pubDate>Tue, 08 Apr 2025 02:04:09 +0000</pubDate>
      <link>https://dev.to/ksk0629/qiskit-aer-vs-mqt-ddsim-3k50</link>
      <guid>https://dev.to/ksk0629/qiskit-aer-vs-mqt-ddsim-3k50</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Hi there, this article is about &lt;code&gt;qiskit_aer&lt;/code&gt; and &lt;code&gt;mqt.ddsim&lt;/code&gt;, two Python libraries for simulating quantum circuits on one's laptop. &lt;code&gt;qiskit_aer&lt;/code&gt; is known as a high-performance quantum computing simulator with realistic noise models [&lt;a href="https://qiskit.github.io/qiskit-aer/" rel="noopener noreferrer"&gt;Qiskit Aer documentation&lt;/a&gt;]. Meanwhile, &lt;code&gt;mqt.ddsim&lt;/code&gt; is a decision-diagram-based simulator whose efficiency was reported to outperform several existing simulators in the proposal paper, &lt;a href="https://ieeexplore.ieee.org/document/8355954" rel="noopener noreferrer"&gt;Advanced Simulation of Quantum Computations&lt;/a&gt; (&lt;a href="https://arxiv.org/abs/1707.00865" rel="noopener noreferrer"&gt;the preprint&lt;/a&gt; is also available).&lt;/p&gt;

&lt;p&gt;By the way, as you already know or will find easily enough, both simulators have several modes. In this article, I only focus on their Quantum Assembly Language (QASM) simulators.&lt;/p&gt;

&lt;p&gt;I was wondering which one was faster. Thus, I performed a quick experiment and noted down the result here.&lt;/p&gt;

&lt;p&gt;Some parts of this topic are still a little beyond me. Please feel free to comment if you find any mistakes or room for improvement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Experimental Result
&lt;/h2&gt;

&lt;p&gt;This section is for readers who would like to see the result first. Life is busy, and the less time things take, the more time we have for other things, especially for a random article like this one.&lt;/p&gt;

&lt;p&gt;In my limited experimental settings, &lt;code&gt;mqt.ddsim&lt;/code&gt; significantly outperformed &lt;code&gt;qiskit_aer&lt;/code&gt;. The following graphs show you how much faster &lt;code&gt;mqt.ddsim&lt;/code&gt; was.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphnrvnzppveoz0yafwev.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphnrvnzppveoz0yafwev.png" alt="execution time vs number of qubits" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Left) The average success probabilities. Qiskit Aer does not work with 31 qubits on my laptop, whose specs are noted later. (Right) The average execution times as the number of qubits increases. From 25 qubits, the execution times of Qiskit Aer increase significantly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjow9mo5y4tbjdscesu9h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjow9mo5y4tbjdscesu9h.png" alt="execution time vs number of adders" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Left) The average success probabilities. (Right) The average execution times as the number of adders increases. The times for Qiskit Aer grow linearly.&lt;/p&gt;

&lt;p&gt;The specs of my laptop are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model: MacBook Air&lt;/li&gt;
&lt;li&gt;Chip: M1&lt;/li&gt;
&lt;li&gt;Memory: 16 GB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I conducted the first experiment, which shows the time vs the number of qubits, with a quantum circuit containing only one adder based on the quantum Fourier transform. For the second experiment, which shows the time vs the number of adders, I fixed the number of qubits at 21, since 21 is the largest number of qubits with which &lt;code&gt;qiskit_aer&lt;/code&gt; works relatively fast.&lt;/p&gt;

&lt;p&gt;Also, I employed an adder for this experiment because adders are among the fundamental building blocks of quantum arithmetic. One can construct other arithmetic operations, such as multiplication and exponentiation, from them (e.g., [&lt;a href="https://www.rintonpress.com/journals/doi/QIC3.2-8.html" rel="noopener noreferrer"&gt;Circuit for Shor's algorithm using 2n+3 qubits&lt;/a&gt; (&lt;a href="https://arxiv.org/abs/quant-ph/0205095" rel="noopener noreferrer"&gt;the preprint&lt;/a&gt;)]).&lt;/p&gt;

&lt;p&gt;It is worth noting that &lt;code&gt;mqt.ddsim&lt;/code&gt; is known to take significant time in the absolute worst case, as reported in &lt;a href="https://ieeexplore.ieee.org/document/8355954" rel="noopener noreferrer"&gt;the original paper&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quantum Computing
&lt;/h2&gt;

&lt;p&gt;Hereafter, I’ll write a short introduction to quantum computing and show the result again with a little more in-depth explanation.&lt;/p&gt;

&lt;p&gt;First, we need to know what quantum computing is, and the answer is computing with a quantum computer. That’s pretty easy, isn’t it? Then one probably wonders what a quantum computer is. It is a computer based on quantum mechanics. The difference between classical and quantum computers is what they are based on: whilst quantum computers work in a quantum manner, classical computers, the computers we use every day, work based on classical physics. Unfortunately, my knowledge is limited here, so I cannot precisely explain the difference between them, but the important thing is that quantum mechanics has strange yet powerful phenomena that promise quantum computing will be faster than classical computing for certain fields and problems. One should be aware that not everything works better on quantum computers; but some things do work better on them, and there the improvements are significant.&lt;/p&gt;

&lt;p&gt;However, our actual quantum computers are extremely limited compared to the theoretical results. Current classical computers have a large number of bits, the fundamental units that store all data, as well as error correction, which is why we can comfortably use our laptops every day. Current quantum computers have a small number of qubits, the quantum counterpart of bits, and little or, as far as I know, no error correction. That is why it is quite difficult to conduct physical experiments involving a large number of qubits in this era, called the Noisy Intermediate-Scale Quantum (NISQ) era. We are all waiting for the era of Fault-Tolerant Quantum Computing (FTQC), which will allow us to use more qubits with error correction techniques.&lt;/p&gt;

&lt;p&gt;Although we are in the NISQ era, there are some things we can do with our laptops. One of them is simulation. Simulating quantum computers, more precisely quantum circuits, is one way to see how proposed quantum algorithms work. But, as you might expect, simulation is not efficient at all. As one qubit is represented by a two-dimensional vector, taking $n$ qubits into account requires a $2^n$-dimensional vector, which blows up very quickly. To make simulation more efficient, several types of simulators have been proposed.&lt;/p&gt;
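
&lt;p&gt;To get a feel for how quickly this blows up, here is a small back-of-the-envelope calculation in plain Python, assuming each complex amplitude takes 16 bytes, as in a typical double-precision statevector simulator:&lt;/p&gt;

```python
# Size of a dense statevector for n qubits: 2**n complex amplitudes.
# Assuming 16 bytes per amplitude (two 64-bit floats), as in typical
# double-precision statevector simulators.
BYTES_PER_AMPLITUDE = 16

def statevector_bytes(num_qubits: int) -> int:
    """Memory needed to store a dense statevector of num_qubits qubits."""
    return (2 ** num_qubits) * BYTES_PER_AMPLITUDE

for n in (10, 20, 30, 31):
    print(f"{n} qubits: {statevector_bytes(n) / 2 ** 30:.6f} GiB")
```

&lt;p&gt;At 31 qubits a dense statevector already needs 32 GiB, which is consistent with a 16 GB laptop struggling around that size.&lt;/p&gt;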

&lt;h2&gt;
  
  
  Qiskit
&lt;/h2&gt;

&lt;p&gt;We saw what quantum computing is and why simulators are needed in the previous section. In this section, we’ll see how we can actually write a program that works with quantum computers.&lt;/p&gt;

&lt;p&gt;There are several ways to program quantum computers. One of them is &lt;a href="https://docs.quantum.ibm.com/guides" rel="noopener noreferrer"&gt;Qiskit&lt;/a&gt;, an open-source Python SDK for working with quantum computers. As we are in the NISQ era, it allows us to simulate programs on our laptops as well as to run them on IBM’s actual quantum computers. The latter is super cool, but actual quantum computers are not the main topic here. There are several ways to simulate programs on your laptop, and Qiskit itself provides some of them. There is also &lt;a href="https://qiskit.github.io/qiskit-aer/" rel="noopener noreferrer"&gt;Qiskit Aer&lt;/a&gt;, which is part of the Qiskit project but is now installed separately from Qiskit itself. Qiskit Aer is provided as a set of high-performance quantum computing simulators with realistic noise models. As another option, I recently found &lt;a href="https://mqt.readthedocs.io/projects/ddsim/en/latest/" rel="noopener noreferrer"&gt;MQT DDSIM&lt;/a&gt;, a decision-diagram-based simulator that was experimentally shown to outperform other existing simulators in &lt;a href="https://ieeexplore.ieee.org/document/8355954" rel="noopener noreferrer"&gt;the original paper&lt;/a&gt;. It is noteworthy that its theoretical worst case is significantly worse than that of some other existing, so-called array-based methods; but within the authors’ experiments, their proposal was the best.&lt;/p&gt;

&lt;p&gt;So, I was wondering which one is better, Qiskit Aer or MQT DDSIM. I have to admit that I could not find out what Qiskit Aer is based on, so technically it could be one of the methods the DDSIM authors compared against. But it has also been a while since the paper came out, and Qiskit Aer is constantly maintained, so I believed it was worth conducting some experiments.&lt;/p&gt;

&lt;h1&gt;
  
  
  Results
&lt;/h1&gt;

&lt;p&gt;I conducted the following experiments with my laptop whose specs are&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model: MacBook Air&lt;/li&gt;
&lt;li&gt;Chip: M1&lt;/li&gt;
&lt;li&gt;Memory: 16 GB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As I mentioned in one of the previous sections, there are several types of simulators in &lt;code&gt;qiskit_aer&lt;/code&gt; and &lt;code&gt;mqt.ddsim&lt;/code&gt;. However, I compared only &lt;code&gt;qiskit_aer.AerSimulator&lt;/code&gt; and &lt;code&gt;ddsim.qasmsimulator.QasmSimulatorBackend&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Execution time vs the number of qubits
&lt;/h2&gt;

&lt;p&gt;In this experiment, I investigated the relation between the execution time and the number of qubits. For that, I employed &lt;a href="https://docs.quantum.ibm.com/api/qiskit/qiskit.circuit.library.DraperQFTAdder" rel="noopener noreferrer"&gt;&lt;code&gt;qiskit.circuit.library.DraperQFTAdder&lt;/code&gt;&lt;/a&gt;. Quantum adders are among the fundamental building blocks of other arithmetic circuits, and arithmetic is a fundamental sub-routine of some quantum algorithms, so I believe it is not the worst choice. &lt;a href="https://docs.quantum.ibm.com/api/qiskit/qiskit.circuit.library.DraperQFTAdder" rel="noopener noreferrer"&gt;&lt;code&gt;qiskit.circuit.library.DraperQFTAdder&lt;/code&gt;&lt;/a&gt; takes the number of state qubits, which is the number of bits in each input register, as an argument. I observed the execution time whilst varying the number of state qubits. Also, I set the &lt;code&gt;kind&lt;/code&gt; argument to &lt;code&gt;"half"&lt;/code&gt;. This setting adds one more qubit to the circuit to prevent overflow, hence the number of qubits in the circuit is $2n+1$, where $n$ is the number of state qubits. The experiment for each number of qubits was performed 10 times, and the graphs below show the averages.&lt;/p&gt;

&lt;p&gt;The result of this experiment is as follows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphnrvnzppveoz0yafwev.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphnrvnzppveoz0yafwev.png" alt="execution time vs number of qubits" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The left graph shows that both &lt;code&gt;AerSimulator&lt;/code&gt; and &lt;code&gt;QasmSimulatorBackend&lt;/code&gt; correctly added numbers for every number of qubits I attempted. However, because of the limited memory of my laptop, &lt;code&gt;AerSimulator&lt;/code&gt; was not able to simulate the circuit with 31 qubits, whilst &lt;code&gt;QasmSimulatorBackend&lt;/code&gt; was. The right graph shows that &lt;code&gt;QasmSimulatorBackend&lt;/code&gt; consistently simulated the adder very fast, whilst the execution times of &lt;code&gt;AerSimulator&lt;/code&gt; shot up. As &lt;code&gt;AerSimulator&lt;/code&gt; did not work with 31 qubits, its last blue point represents the execution time, around 250 secs, with $29 (= 2 * 14 + 1)$ qubits.&lt;/p&gt;

&lt;p&gt;A single adder is not a large circuit in terms of the number of gates, yet &lt;code&gt;AerSimulator&lt;/code&gt; did not manage to handle it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Execution time vs the number of adders
&lt;/h2&gt;

&lt;p&gt;In this experiment, I investigated the relation between the execution time and the number of adders with a fixed number of qubits, 21. I chose 21 based on the previous result: it is relatively large, yet the execution times of the two simulators were still quite close, at least on the graph. As the adder, &lt;a href="https://docs.quantum.ibm.com/api/qiskit/qiskit.circuit.library.DraperQFTAdder" rel="noopener noreferrer"&gt;&lt;code&gt;qiskit.circuit.library.DraperQFTAdder&lt;/code&gt;&lt;/a&gt; was again employed. This time, I applied the adder multiple times and measured how much time the two simulators took. The experiment for each number of adders was performed 10 times, and the graphs below show the averages.&lt;/p&gt;

&lt;p&gt;The result is as follows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4hf0g1zwi03j117spezq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4hf0g1zwi03j117spezq.png" alt="execution time vs number of adders" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Again, the left graph shows that both simulators simulated correctly, even when multiple adders were applied. The right graph shows that the execution time of &lt;code&gt;AerSimulator&lt;/code&gt; grew linearly as the number of adders increased, whilst &lt;code&gt;QasmSimulatorBackend&lt;/code&gt; took (at least on the graph) almost constant time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional experiments on DDSIM
&lt;/h2&gt;

&lt;p&gt;The graphs above show that &lt;code&gt;QasmSimulatorBackend&lt;/code&gt; works in almost constant time. I wondered whether the experimental settings allowed the simulator to work in exactly constant time in theory. Thus, I conducted two additional experiments, one for each setting: the same experiments on &lt;code&gt;QasmSimulatorBackend&lt;/code&gt;, but with larger cases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1xag75loabkxrdx9mh9w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1xag75loabkxrdx9mh9w.png" alt="additional execution time vs number of qubits" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fplm6pmr3v7c60fzdxez2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fplm6pmr3v7c60fzdxez2.png" alt="additional execution time vs number of adders" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As the graphs show, both settings eventually become difficult to run on a personal laptop. Specifically, the experiment with the 101-qubit setting took more than a day without finishing, so I had to cancel it and compromise at 85 qubits.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion and a little bit more
&lt;/h1&gt;

&lt;p&gt;I conducted simple experiments with adders suggested by Draper on my personal MacBook Air (M1 chip, 16 GB memory) to see the speed difference between the QASM simulators of &lt;code&gt;qiskit_aer&lt;/code&gt; and &lt;code&gt;mqt.ddsim&lt;/code&gt;. As a result, &lt;code&gt;mqt.ddsim&lt;/code&gt; simulated the circuits much faster than &lt;code&gt;qiskit_aer&lt;/code&gt;. Moreover, I performed additional experiments with only &lt;code&gt;mqt.ddsim&lt;/code&gt;. The additional experiments were to&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Verify that the settings were not overly favourable for &lt;code&gt;mqt.ddsim&lt;/code&gt;, since in the previous experiments &lt;code&gt;mqt.ddsim&lt;/code&gt; seemed to simulate every circuit in almost constant time, and I suspected the settings might allow it to work in constant time.&lt;/li&gt;
&lt;li&gt;See its limits if the settings did not allow it to work in constant time.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As I have written several times, all the experiments are quite limited in terms of the circuits I employed. Hence, one cannot conclude from them that &lt;code&gt;mqt.ddsim&lt;/code&gt; is absolutely better than &lt;code&gt;qiskit_aer&lt;/code&gt;. However, the results show that there are at least cases where &lt;code&gt;mqt.ddsim&lt;/code&gt; is significantly faster.&lt;/p&gt;

&lt;p&gt;For future investigation, it would be nice to perform experiments with richer circuits, for instance more complex arithmetic as well as multiple rotation gates. Rotation gates are often used in quantum machine learning, especially in Variational Quantum Classifiers (VQCs). Most of those models were designed to work in the NISQ era, so it is relatively easy to run experiments with them; yet, because of the amount of data involved, accessing actual quantum computers is not easy. Therefore, simulating those quantum machine learning algorithms is still one of the main ways to measure their performance.&lt;/p&gt;

&lt;p&gt;All the experiments were done in a Jupyter notebook, which is available at &lt;a href="https://github.com/ksk0629/backend_comparison" rel="noopener noreferrer"&gt;https://github.com/ksk0629/backend_comparison&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;By the way, I came across MQT DDSIM when I was reading &lt;a href="https://arxiv.org/abs/2503.23941" rel="noopener noreferrer"&gt;Choco-Q: Commute Hamiltonian-based QAOA for Constrained Binary Optimization&lt;/a&gt; and checking &lt;a href="https://github.com/JanusQ/Choco-Q" rel="noopener noreferrer"&gt;their implementation&lt;/a&gt;. It is a completely different topic, yet the paper was really interesting.&lt;/p&gt;

</description>
      <category>python</category>
      <category>quantumcomputer</category>
      <category>qiskit</category>
    </item>
    <item>
      <title>Quantum-Classical Machine Learning: Quanvolutional Neural Network</title>
      <dc:creator>Keisuke Sato</dc:creator>
      <pubDate>Thu, 26 Sep 2024 22:42:43 +0000</pubDate>
      <link>https://dev.to/ksk0629/quantum-classical-machine-learning-quanvolutional-neural-network-bcl</link>
      <guid>https://dev.to/ksk0629/quantum-classical-machine-learning-quanvolutional-neural-network-bcl</guid>
      <description>&lt;p&gt;&lt;strong&gt;This post is about the paper &lt;a href="https://arxiv.org/abs/1904.04767" rel="noopener noreferrer"&gt;&lt;em&gt;Quanvolutional Neural Networks: Powering Image Recognition with Quantum Circuits&lt;/em&gt;&lt;/a&gt; and my implementation based on it. I have written this post based on my understanding. I would greatly appreciate your comments and any suggestions for modifying this post or my implementation in the GitHub repository. If you are interested, please refer to the paper yourself.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Quantum machine learning (QML) has rapidly grown in response to advancements in quantum computing research. Research in QML explores various directions, one of which involves rewriting classical methods or architectures in a quantum manner.&lt;/p&gt;

&lt;p&gt;The Quanvolutional Neural Network (QNN) is analogous to the classical Convolutional Neural Network (CNN). As the name suggests, the most distinctive feature of a CNN is the convolutional layer. In the case of the QNN, it is a hybrid quantum-classical machine learning model where the convolutional layer is replaced by a quanvolutional layer. The authors proposed the concept of the quanvolutional layer, which has exactly the same parameters as the convolutional layer. In theory, any convolutional layer in any CNN can be replaced with a quanvolutional layer.&lt;/p&gt;

&lt;p&gt;The primary contribution of the original paper is the concept of the Quanvolutional Neural Network, rather than its practical realisation. Nevertheless, I will also discuss the implementation proposed in the paper.&lt;/p&gt;

&lt;p&gt;For those who love coding, I have shared my GitHub repository containing the QNN implementation here: &lt;a href="https://github.com/ksk0629/quanvolutional_neural_network" rel="noopener noreferrer"&gt;https://github.com/ksk0629/quanvolutional_neural_network&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Convolutional Layer
&lt;/h2&gt;

&lt;p&gt;The convolutional layer allows us to exploit spatial locality and translational invariance in data. Let us have a look at how a convolutional layer works. Consider the following data, which has a 4x4 shape:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnbjbjhugri5t87fvmolb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnbjbjhugri5t87fvmolb.png" alt="original data" width="410" height="422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The convolutional layer uses a sliding window of a certain size. Here, we will use the simplest sliding window, which is 3x3:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmha4sesulyi9hwy6uc5a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmha4sesulyi9hwy6uc5a.png" alt="sliding window" width="310" height="314"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The layer applies the sliding window to each local region, i.e., it exploits spatial locality. When the sliding window is applied to the first section, as shown below, we calculate a new value: 1x1 + 1x0 + 1x0 + 1x0 + 1x1 + 1x1 + 1x0 + 1x0 + 1x1 = 4.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv60cu4pt0ans2060wgqk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv60cu4pt0ans2060wgqk.png" alt="applying the sliding window to the first section" width="412" height="422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, imagine that each element of the original data is shifted one pixel to the right. There are various ways to fill the left edge, but here we simply use 0:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1o4vgaqarcwswnocwd45.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1o4vgaqarcwswnocwd45.png" alt="translated original data" width="410" height="424"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you may notice, the second section is identical to the first section of the original data:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fweiqone64nzds9ei7gbn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fweiqone64nzds9ei7gbn.png" alt="applying the sliding window to the second section of the translated original data" width="410" height="424"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This demonstrates how the layer exploits translational invariance in data. The convolutional layer uses many convolutional filters, which are sliding windows. Each filter is different, and these filters are applied to the input data independently. Suppose the layer has 50 filters. The output of the layer would then consist of 50 new data sets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quanvolutional Layer
&lt;/h2&gt;

&lt;p&gt;The QNN features the &lt;em&gt;quanvolutional layer&lt;/em&gt;, which is analogous to the convolutional layer. The quanvolutional layer consists of many quantum circuits, referred to as &lt;em&gt;quanvolutional filters&lt;/em&gt;, which are analogous to sliding windows. If the window size is 3x3, the number of qubits will be 3x3 = 9. In the earlier example of the convolutional filter, we simply multiplied each entry in the window by the corresponding entry in the data and summed the results. The quanvolutional filter operates in a similar way: each entry of the window corresponds to a qubit in the quantum circuit. However, instead of performing multiplication and addition, the quanvolutional filter computes a new value based on the quantum gates applied to the circuit.&lt;/p&gt;

&lt;p&gt;To process classical data—i.e., the data used by our current computers—we need to encode the data onto qubits. There are several encoding methods, and it is up to the user to choose which one to employ. After the data is encoded, the computation phase occurs, followed by the measurement of each qubit. Since quanvolutional layers and convolutional layers are interchangeable, the result of each quanvolutional filter computation must be a scalar. Therefore, the measurement outcome must be decoded, and again, there are several decoding methods available.&lt;/p&gt;

&lt;p&gt;The following circuit represents an example of a quantum filter:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76086uerobfvmelzizl9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76086uerobfvmelzizl9.png" alt="example of quantum filter" width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  One Realisation of the QNN
&lt;/h2&gt;

&lt;p&gt;I believe the most important contribution of the original paper is the introduction of the QNN concept. This concept provides a perfect analogy to CNN. However, the original paper also presents one realisation of the QNN. As it is meant to test whether there is a quantum advantage compared to CNN, the implementation is quite simple; the quantum part of the realisation is not even trainable, relying instead on the power of randomly initialised filters.&lt;/p&gt;

&lt;p&gt;The QNN is constructed with the following layers in order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Quanvolutional Layer&lt;/li&gt;
&lt;li&gt;Convolutional Layer&lt;/li&gt;
&lt;li&gt;Pooling Layer&lt;/li&gt;
&lt;li&gt;Convolutional Layer&lt;/li&gt;
&lt;li&gt;Pooling Layer&lt;/li&gt;
&lt;li&gt;Fully Connected Layer&lt;/li&gt;
&lt;li&gt;Dropout Layer&lt;/li&gt;
&lt;li&gt;Fully Connected Layer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The next question regarding this realisation is how the quanvolutional layer is constructed, or more specifically, how the quanvolutional filter is constructed, as the quanvolutional layer is simply a layer containing multiple quanvolutional filters. Here is how it works: first, the kernel size is fixed at 3x3 = 9, meaning each filter has 9 qubits.&lt;/p&gt;

&lt;h3&gt;
  
  
  Encoding method
&lt;/h3&gt;

&lt;p&gt;Each qubit encodes one input scalar. If the input scalar is greater than 0, the qubit is encoded into the quantum state |1&amp;gt;; otherwise, it is encoded into |0&amp;gt;.&lt;/p&gt;
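
&lt;p&gt;A minimal sketch of this threshold encoding in plain Python, where the returned 0 or 1 indicates whether the corresponding qubit is prepared in |0&amp;gt; or |1&amp;gt; (e.g., by applying an X gate to the qubits marked 1):&lt;/p&gt;

```python
def encode_threshold(window_values):
    """Map each input scalar to a basis state: 1 if the value is
    greater than 0 (prepare |1>), else 0 (leave the qubit in |0>)."""
    return [1 if value > 0 else 0 for value in window_values]

# A flattened 3x3 patch of MNIST-like pixel values (made up here):
patch = [0.0, 0.2, 0.9, 0.0, 0.0, 0.5, 0.0, 0.7, 0.0]
print(encode_threshold(patch))  # [0, 1, 1, 0, 0, 1, 0, 1, 0]
```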

&lt;p&gt;Note that the MNIST dataset is the target in the original paper. If you wish to use a different dataset, you might want to adjust the encoding method accordingly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Construct the quantum circuit
&lt;/h3&gt;

&lt;p&gt;The quantum circuit is constructed in the following four stages:&lt;/p&gt;

&lt;h4&gt;
  
  
  Assign connection probabilities
&lt;/h4&gt;

&lt;p&gt;Assign a "connection probability" (ranging from 0 to 1) to each pair of qubits.&lt;/p&gt;

&lt;p&gt;To account for the connections between one pixel and others, two-qubit gates should be applied to each pair of qubits. The connection probability is essential in determining whether a particular two-qubit gate will be applied to the pair later.&lt;/p&gt;

&lt;h4&gt;
  
  
  Select two-qubit gates
&lt;/h4&gt;

&lt;p&gt;Based on the connection probability, one of the two-qubit gates—either the controlled-NOT, swap, square root swap or controlled-U gate—is selected for each pair of qubits. Note that the selected gate is not applied at this stage.&lt;/p&gt;

&lt;h4&gt;
  
  
  Select one-qubit gates
&lt;/h4&gt;

&lt;p&gt;Select a random number of one-qubit gates, between 0 and 2n^2 = 2x3x3 = 18, from the gate set {X, Y, Z, U, P, T, H}. X, Y and Z are rotation gates around each axis, and I interpret U as a generic single-qubit rotation gate. These rotation gates have angle parameters, which are also chosen randomly. P, T and H represent the phase, T and Hadamard gates respectively. Again, the selected gates are not applied at this stage.&lt;/p&gt;
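&lt;p&gt;This selection stage can be sketched as follows (gate names only; how the angle parameters are attached is my assumption):&lt;/p&gt;

```python
import random

random.seed(1)  # only to make this sketch reproducible
num_qubits = 9  # 3x3 kernel, so n^2 = 9
# Draw how many one-qubit gates to use, between 0 and 2 * n^2 = 18 inclusive.
num_one_qubit_gates = random.randint(0, 2 * num_qubits)
gate_set = ["X", "Y", "Z", "U", "P", "T", "H"]
selected_gates = [random.choice(gate_set) for _ in range(num_one_qubit_gates)]
# Rotation-style gates (X, Y, Z, U, P) would additionally receive a random
# angle in the range 0 to 2*pi at this point.
print(num_one_qubit_gates, selected_gates)
```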

&lt;h4&gt;
  
  
  Apply the selected gates in random order
&lt;/h4&gt;

&lt;p&gt;Shuffle the order of the selected gates and apply them in the shuffled order to the quantum circuit.&lt;/p&gt;
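&lt;p&gt;In code, this final construction stage amounts to a shuffle followed by a loop (the gate records below are hypothetical):&lt;/p&gt;

```python
import random

# Hypothetical records of the gates selected in the earlier stages:
# (gate name, qubits it acts on).
selected_gates = [("CX", (0, 1)), ("H", (4,)), ("T", (2,)), ("SWAP", (3, 5))]
random.shuffle(selected_gates)  # the application order is itself random
for gate_name, qubits in selected_gates:
    # A real implementation would append the gate to the quantum circuit here.
    print(gate_name, qubits)
```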

&lt;h3&gt;
  
  
  Decoding method
&lt;/h3&gt;

&lt;p&gt;Measure each qubit and count the number of qubits that are measured in the |1&amp;gt; state.&lt;/p&gt;
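&lt;p&gt;With multiple shots, the measurement produces a histogram of bitstrings, so the counts have to be reduced to a single number. One plausible reading, which may differ from the repository's &lt;code&gt;one_sum_decoder.py&lt;/code&gt;, is to count the 1s in the most frequent outcome:&lt;/p&gt;

```python
def one_sum_decode(counts):
    # counts maps a measured bitstring to how often it occurred.
    # Take the most frequent bitstring and count the qubits measured as 1.
    most_frequent = max(counts, key=counts.get)
    return most_frequent.count("1")

print(one_sum_decode({"101000101": 612, "001000101": 412}))  # → 4
```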

&lt;h2&gt;
  
  
  Result of the comparison
&lt;/h2&gt;

&lt;p&gt;As mentioned, the QNN and CNN are compared in the original paper. More precisely, both networks, along with a corresponding random model, are compared.&lt;/p&gt;

&lt;p&gt;The authors reported that the QNN outperforms the CNN in terms of accuracy. However, the random model and the QNN are indistinguishable in terms of accuracy.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Implementation
&lt;/h2&gt;

&lt;p&gt;It is important to understand what the original paper presents. However, one of my main interests also lies in implementing the QNN. Therefore, I implemented both the QNN and CNN to compare their accuracies. Below, I will share parts of my implementation. I would greatly appreciate any feedback or comments related to my programming, either in the comments section below or via an issue on the GitHub repository.&lt;/p&gt;

&lt;p&gt;I created three classes for the QNN, based on what I believe to be its core components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;QuanvFilter&lt;/code&gt; (quanvolutional filter)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;QuanvLayer&lt;/code&gt; (quanvolutional layer)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;QuanvNN&lt;/code&gt; (quanvolutional neural network)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We will walk through the essential parts of each. Please note that I have omitted all docstrings, despite having written them, for the sake of brevity.&lt;/p&gt;

&lt;h3&gt;
  
  
  QuanvFilter
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;QuanvNN&lt;/code&gt; utilises member functions of &lt;code&gt;QuanvLayer&lt;/code&gt;, and &lt;code&gt;QuanvLayer&lt;/code&gt; relies on member functions of &lt;code&gt;QuanvFilter&lt;/code&gt;. Therefore, we will first explore &lt;code&gt;QuanvFilter&lt;/code&gt;, which is defined in &lt;a href="https://github.com/ksk0629/quanvolutional_neural_network/blob/main/src/quanv_filter.py" rel="noopener noreferrer"&gt;quanv_filter.py&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Constructor
&lt;/h4&gt;

&lt;p&gt;The constructor takes only one argument, the kernel size, which is equivalent to the window size in the "Convolutional Layer" section. As mentioned, the number of qubits is determined by this argument, as shown in the following piece of code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kernel_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="c1"&gt;# Initialise the look-up table.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lookup_table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="c1"&gt;# Set the simulator.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;simulator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;qiskit_aer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AerSimulator&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Get the number of qubits.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_qubits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kernel_size&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;kernel_size&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;self.lookup_table&lt;/code&gt; is used to reduce execution time, but it is not essential to understanding this class, so we will ignore it for now.&lt;/p&gt;

&lt;p&gt;In the &lt;code&gt;__init__&lt;/code&gt; function, we need to prepare the quanvolutional filter. A quantum circuit representing the quanvolutional filter must be built first, as each filter is a quantum circuit.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="c1"&gt;# Step 0: Build a quantum circuit as a filter.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__build_initial_circuit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;self.__build_initial_circuit&lt;/code&gt; is simply a function that builds a new quantum circuit and stores it as a member variable, &lt;code&gt;self.circuit&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;After building the new plain circuit, the initialisation follows the procedure mentioned in the "One Realisation of the QNN" section. Note that encoding and decoding are not done when the circuit is initialised.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="c1"&gt;# Step 1: assign a connection probability between each qubit.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;connection_probabilities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__set_connection_probabilities&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the connection probabilities are stored in the &lt;code&gt;dict&lt;/code&gt; variable &lt;code&gt;self.connection_probabilities&lt;/code&gt;. The keys are tuples representing pairs of qubit positions, and the values are the probabilities. Here is a simple example: consider a circuit with 3 qubits. After running &lt;code&gt;self.__set_connection_probabilities()&lt;/code&gt;, &lt;code&gt;self.connection_probabilities&lt;/code&gt; could look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;{(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, two-qubit gates are randomly selected.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="c1"&gt;# Step 2: Select a two-qubit gate according to the connection probabilities.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;selected_gates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__set_two_qubit_gate_set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__select_two_qubit_gates&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;self.__set_two_qubit_gate_set()&lt;/code&gt; stores the set of two-qubit gates described in the paper in a member variable. After setting the two-qubit gate set, the gates are selected according to &lt;code&gt;self.connection_probabilities&lt;/code&gt; in the &lt;code&gt;self.__select_two_qubit_gates()&lt;/code&gt; function. I chose a threshold of 0.5 to determine whether a two-qubit gate is selected.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;### In __select_two_qubit_gates function ###
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;qubit_pair&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;connection_probability&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;connection_probabilities&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;connection_probability&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# Skip the pair.
&lt;/span&gt;                &lt;span class="k"&gt;continue&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the connection probability is greater than 0.5, a two-qubit gate is selected. If the selected gate requires parameters, they are chosen uniformly from the range 0 to 2π (roughly 6.28), for instance &lt;code&gt;four_params = np.random.rand(4) * (2 * np.pi)&lt;/code&gt;. Since each key in &lt;code&gt;self.connection_probabilities&lt;/code&gt; is always in ascending order, such as &lt;code&gt;(0, 1)&lt;/code&gt;, &lt;code&gt;(0, 2)&lt;/code&gt; and &lt;code&gt;(1, 3)&lt;/code&gt;, the control and target qubits should be shuffled.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;### In __select_two_qubit_gates function ###
&lt;/span&gt;            &lt;span class="c1"&gt;# Shuffle the pair of qubits to randomly decide on the target and controlled qubits.
&lt;/span&gt;            &lt;span class="n"&gt;shuffled_qubit_pair&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;qubit_pair&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# key is tuple. Need to cast to list.
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;connection_probability&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;shuffled_qubit_pair&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;qubit_pair&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="n"&gt;shuffled_qubit_pair&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;qubit_pair&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since the connection probability here is guaranteed to be greater than 0.5, this code effectively swaps the control and target qubits at random.&lt;/p&gt;

&lt;p&gt;Similar to selecting two-qubit gates, one-qubit gates are selected afterwards.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="c1"&gt;# Step 3: Select one-qubit gates.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__set_one_qubit_gate_set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_one_qubit_gates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;low&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;high&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_qubits&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__select_one_qubit_gates&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;self.num_one_qubit_gates&lt;/code&gt; is an &lt;code&gt;int&lt;/code&gt; member variable representing the number of one-qubit gates. The target qubit for each selected one-qubit gate is randomly chosen.&lt;/p&gt;

&lt;p&gt;By this stage, the selected one- and two-qubit gates are stored in &lt;code&gt;self.selected_gates&lt;/code&gt;. The next stage involves applying these gates to the circuit in a random order.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="c1"&gt;# Step 4: Apply the randomly selected gates in a random order.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__apply_selected_gates&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Although every step of creating the quanvolutional filter has now been implemented, the filter is still an ordinary quantum circuit, so we need to explicitly add the measurement part as the final step.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="c1"&gt;# Step 5: Add measurements to all the qubits.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;circuit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;measure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quantum_register&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;classical_register&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Running part
&lt;/h4&gt;

&lt;p&gt;To use the quanvolutional filter, the class must include a function to process the input data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;encoding_method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;callable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;decoding_method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;callable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Encode the data to the corresponding quantum state.
&lt;/span&gt;        &lt;span class="n"&gt;encoded_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;encoding_method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Make the circuit having the loading part.
&lt;/span&gt;        &lt;span class="n"&gt;ready_circuit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoded_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Run the circuit.
&lt;/span&gt;        &lt;span class="n"&gt;transpiled_circuit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;qiskit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transpile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ready_circuit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;simulator&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;simulator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transpiled_circuit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_counts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transpiled_circuit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Decode the result data.
&lt;/span&gt;        &lt;span class="n"&gt;decoded_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;decoding_method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;decoded_data&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The function &lt;code&gt;self.load_data&lt;/code&gt; returns a new &lt;code&gt;qiskit.QuantumCircuit&lt;/code&gt; instance that contains &lt;code&gt;self.circuit&lt;/code&gt; and initialises the qubits according to the data encoded by the given encoding method. Once this complete circuit is prepared, it is executed, and the result is decoded using the given decoding method.&lt;/p&gt;

&lt;p&gt;For now, I have implemented only one encoding and decoding method, as proposed in the paper. The details can be found in &lt;a href="https://github.com/ksk0629/quanvolutional_neural_network/blob/main/src/encoders/z_basis_encoder.py" rel="noopener noreferrer"&gt;z_basis_encoder.py&lt;/a&gt; and &lt;a href="https://github.com/ksk0629/quanvolutional_neural_network/blob/main/src/decoders/one_sum_decoder.py" rel="noopener noreferrer"&gt;one_sum_decoder.py&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Look-up table
&lt;/h4&gt;

&lt;p&gt;At the end of the previous section, the core of the quanvolutional filter was implemented. However, the time required to use this filter is enormous and impractical, at least on my machine, especially for a large dataset like MNIST. To address this, I implemented a look-up table, which is also used in the original paper. It is quite simple: we create a look-up table mapping every possible input to its corresponding output. Once the table is generated for all input data, there is no need to run the circuit anymore. This technique can be applied because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The encoding method is simple, and the number of input patterns is finite.&lt;/li&gt;
&lt;li&gt;The filter is not trainable in this implementation.
&lt;/li&gt;
&lt;/ol&gt;
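&lt;p&gt;The first point is easy to check: with the binary threshold encoding, a 3x3 filter can only ever see 2^9 = 512 distinct inputs, so enumerating all of them is cheap:&lt;/p&gt;

```python
import itertools

kernel_size = (3, 3)
num_qubits = kernel_size[0] * kernel_size[1]
# Every possible binary window the filter can receive.
input_patterns = list(itertools.product((0, 1), repeat=num_qubits))
print(len(input_patterns))  # → 512
```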

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_lookup_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;encoding_method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;callable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;decoding_method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;callable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;input_patterns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lookup_table&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;vectorised_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vectorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;signature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(n),(),(),()-&amp;gt;()&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;output_patterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;vectorised_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_patterns&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;encoding_method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;decoding_method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lookup_table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_patterns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_patterns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What happens here is simply running the circuit against the given inputs using the given encoding and decoding methods.&lt;/p&gt;

&lt;h3&gt;
  
  
  QuanvLayer
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;QuanvLayer&lt;/code&gt; class is defined in &lt;a href="https://github.com/ksk0629/quanvolutional_neural_network/blob/main/src/quanv_layer.py" rel="noopener noreferrer"&gt;quanv_layer.py&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Constructor
&lt;/h4&gt;

&lt;p&gt;The constructor has four arguments. &lt;code&gt;kernel_size&lt;/code&gt; specifies the size of the kernels for each quanvolutional filter. &lt;code&gt;num_filters&lt;/code&gt; defines the number of quanvolutional filters that the class contains. &lt;code&gt;padding_mode&lt;/code&gt; determines how to pad the image so that the output size of each quanvolutional filter matches the original input size. I used the &lt;code&gt;torch.nn.functional.pad&lt;/code&gt; function to pad the image data, so &lt;code&gt;padding_mode&lt;/code&gt; corresponds to the mode argument of that function (see &lt;a href="https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html" rel="noopener noreferrer"&gt;torch.nn.functional.pad&lt;/a&gt;). &lt;code&gt;is_lookup_mode&lt;/code&gt; indicates whether the quanvolutional filters use the look-up tables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;     &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;kernel_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;num_filters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;encoder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BaseEncoder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;decoder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BaseDecoder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;padding_mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;constant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;is_lookup_mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Store the arguments to class variables.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kernel_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kernel_size&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_filters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;num_filters&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;encoder&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;decoder&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;padding_mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;padding_mode&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_lookup_mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;is_lookup_mode&lt;/span&gt;

        &lt;span class="c1"&gt;# Define constant.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__BATCH_DATA_DIM&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;

        &lt;span class="c1"&gt;# Create the quanvolutional filters.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_filters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="nc"&gt;QuanvFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kernel_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_filters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What happens here is essentially the creation of instances of &lt;code&gt;QuanvFilter&lt;/code&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Running part
&lt;/h4&gt;

&lt;p&gt;After preparing the quanvolutional filters, the class needs a method that processes the input data as a layer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tensor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tensor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Check the dataset shape.
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;batch_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndim&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__BATCH_DATA_DIM&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
                The dimension of the batch_data must be &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__BATCH_DATA_DIM&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;,
                which is [batch size, channel, height, width].
            &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Set the appropriate function according to the mode.
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_lookup_mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Get all possible input patterns.
&lt;/span&gt;            &lt;span class="n"&gt;possible_inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_all_input_patterns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;num_qubits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kernel_size&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kernel_size&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="c1"&gt;# Set each look-up table.
&lt;/span&gt;            &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="n"&gt;quanv_filter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_lookup_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;encoding_method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;decoding_method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;input_patterns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;possible_inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;quanv_filter&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_filters&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_single_channel_with_lookup_tables&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_single_channel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;all_outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="nf"&gt;_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;batch_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;leave&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dataset&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# for-loop for batched data
&lt;/span&gt;                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;channel&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;  &lt;span class="c1"&gt;# for-loop for each channel of each data
&lt;/span&gt;            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;all_outputs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function is quite simple. After checking the data shape and whether the look-up tables are being used, it applies each filter to the given data. The return value, &lt;code&gt;all_outputs&lt;/code&gt;, is constructed from the processed image data. The actual processing, in either &lt;code&gt;self.run_single_channel_with_lookup_tables&lt;/code&gt; or &lt;code&gt;self.run_single_channel&lt;/code&gt;, follows essentially the same procedure as a classical convolutional layer, except that each window of the input is processed by a quanvolutional filter instead of a classical convolutional kernel. Therefore, we will not delve into those functions in this article.&lt;/p&gt;
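&lt;p&gt;To make the sliding-window procedure concrete, here is a minimal NumPy sketch of a single-channel pass. It is only an illustration: the repository code works on &lt;code&gt;torch.Tensor&lt;/code&gt;s, and &lt;code&gt;apply_filter&lt;/code&gt; is a hypothetical stand-in for one quanvolutional filter.&lt;/p&gt;

```python
import numpy as np

def run_single_channel_sketch(data, kernel_size, apply_filter):
    """Slide a (kh, kw) window over a 2D array and apply a filter to each patch.

    `apply_filter` is a stand-in for one quanvolutional filter: it maps a
    flattened patch to a single scalar, exactly as a classical kernel would.
    """
    kh, kw = kernel_size
    h, w = data.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = data[i:i + kh, j:j + kw].reshape(-1)
            out[i, j] = apply_filter(patch)
    return out

# With a mean filter, a 4x4 input and a 2x2 kernel give a 3x3 output.
result = run_single_channel_sketch(np.arange(16.0).reshape(4, 4), (2, 2), np.mean)
print(result.shape)  # (3, 3)
```

&lt;p&gt;Replacing &lt;code&gt;np.mean&lt;/code&gt; with a function that encodes the patch, runs the quantum circuit and decodes the counts gives the quanvolutional version of the same loop.&lt;/p&gt;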

&lt;h3&gt;
  
  
  QuanvNN
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;QuanvNN&lt;/code&gt; class is written in &lt;a href="https://github.com/ksk0629/quanvolutional_neural_network/blob/main/src/quanv_nn.py" rel="noopener noreferrer"&gt;quanv_nn.py&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
Constructor
&lt;/h4&gt;

&lt;p&gt;The constructor of the class takes eight arguments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;in_dim&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;quanv_kernel_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;quanv_num_filters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;quanv_encoder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BaseEncoder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;quanv_decoder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BaseDecoder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;quanv_padding_mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;constant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;is_lookup_mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;in_dim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;in_dim&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_classes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;num_classes&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_kernel_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;quanv_kernel_size&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_num_filters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;quanv_num_filters&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_encoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;quanv_encoder&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_decoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;quanv_decoder&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_padding_mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;quanv_padding_mode&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_lookup_mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;is_lookup_mode&lt;/span&gt;

        &lt;span class="c1"&gt;# Create and store the instance of the QuanvLayer class as a member variable.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_layer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;QuanvLayer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;kernel_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;quanv_kernel_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;num_filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;quanv_num_filters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;encoder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_encoder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;decoder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_decoder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;padding_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;quanv_padding_mode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;is_lookup_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;is_lookup_mode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Create and store the instance of the ClassicalCNN class as a member variable.
&lt;/span&gt;        &lt;span class="n"&gt;new_in_dim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quanv_num_filters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;in_dim&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;in_dim&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;classical_cnn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ClassicalCNN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;in_dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;new_in_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;in_dim&lt;/code&gt; and &lt;code&gt;num_classes&lt;/code&gt; are used to construct the CNN part, which is implemented using PyTorch and is not our primary focus here. The &lt;code&gt;quanv_kernel_size&lt;/code&gt;, &lt;code&gt;quanv_num_filters&lt;/code&gt;, &lt;code&gt;quanv_padding_mode&lt;/code&gt;, &lt;code&gt;quanv_encoder&lt;/code&gt;, &lt;code&gt;quanv_decoder&lt;/code&gt; and &lt;code&gt;is_lookup_mode&lt;/code&gt; arguments are used to create an instance of &lt;code&gt;QuanvLayer&lt;/code&gt;.&lt;/p&gt;
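&lt;p&gt;The only subtle step in the constructor is the computation of &lt;code&gt;new_in_dim&lt;/code&gt;: the quanvolutional layer emits one output channel per filter, so only the channel entry of &lt;code&gt;in_dim&lt;/code&gt; changes. A tiny sketch with illustrative values:&lt;/p&gt;

```python
# The quanvolutional layer emits one output channel per filter, so only the
# channel entry of in_dim changes (the values below are illustrative).
in_dim = (1, 28, 28)  # (channel, height, width), e.g. an MNIST image
quanv_num_filters = 5
new_in_dim = (quanv_num_filters, in_dim[1], in_dim[2])
print(new_in_dim)  # (5, 28, 28)
```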

&lt;h4&gt;
  
  
  Running part
&lt;/h4&gt;

&lt;p&gt;What the class essentially needs is a way to forward the input data. To achieve this, I implemented &lt;code&gt;__call__&lt;/code&gt; and &lt;code&gt;classify&lt;/code&gt;, where &lt;code&gt;__call__&lt;/code&gt; is a special method that allows an instance to be used like a function. &lt;code&gt;__call__&lt;/code&gt; outputs the raw output data from the QNN, whilst &lt;code&gt;classify&lt;/code&gt; returns a scalar value representing the class label to which the input data most likely belongs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__call__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tensor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tensor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;quanvoluted_x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_layer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;classical_cnn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quanvoluted_x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tensor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tensor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;quanvoluted_x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_layer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;classical_cnn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quanvoluted_x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;self.quanv_layer&lt;/code&gt; is an instance of the &lt;code&gt;QuanvLayer&lt;/code&gt; class. Both functions contain &lt;code&gt;self.quanv_layer.run&lt;/code&gt;, which was introduced in an earlier section.&lt;/p&gt;

&lt;p&gt;The key takeaway is that these functions allow us to classify or simply obtain an output from the input data.&lt;/p&gt;
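&lt;p&gt;Assuming, as is typical for a classifier, that &lt;code&gt;classify&lt;/code&gt; reduces the raw per-class outputs of &lt;code&gt;__call__&lt;/code&gt; to the index of the largest value, the relationship between the two methods can be sketched as follows (the numbers are made up for illustration):&lt;/p&gt;

```python
import numpy as np

# Hypothetical raw per-class outputs, as __call__ would return for a batch of
# two inputs and three classes (the numbers are made up).
raw_outputs = np.array([[0.1, 0.7, 0.2],
                        [0.6, 0.3, 0.1]])

# classify then reduces each row to the most likely class label.
labels = raw_outputs.argmax(axis=1)
print(labels)  # [1 0]
```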

&lt;h3&gt;
  
  
  Other sources and scripts
&lt;/h3&gt;

&lt;p&gt;I have also written some other Python scripts to train the model using MNIST data. We will not go through each source in detail, but if you want to train the model, all you need to do is run &lt;a href="https://github.com/ksk0629/quanvolutional_neural_network/blob/main/scripts/train_model_with_mnist.py" rel="noopener noreferrer"&gt;train_model_with_mnist.py&lt;/a&gt;. The script requires one argument, which is the path to the configuration file. Examples of configuration files can be found in the &lt;a href="https://github.com/ksk0629/quanvolutional_neural_network/tree/main/configs" rel="noopener noreferrer"&gt;configs&lt;/a&gt; directory. Simply run &lt;code&gt;python scripts/train_model_with_mnist.py -c [config_path]&lt;/code&gt; from the root directory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;We have had a brief look at what the quanvolutional neural network (QNN) is. Essentially, it is a convolutional neural network with a quanvolutional layer, where the quanvolutional layer serves as a drop-in alternative to the classical convolutional layer.&lt;/p&gt;

&lt;p&gt;The introduced implementation of the QNN is readily applicable to the MNIST dataset. However, in my experience, the accuracy of the QNN model has not proven to be superior to that of the corresponding CNN.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fig8nsjykiqdz5z1198kx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fig8nsjykiqdz5z1198kx.png" alt="Accuracy of QNN vs CNN" width="676" height="701"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The original paper does not provide exhaustive details about the experiment, and there is some flexibility in the training algorithms and settings. I suspect that the reason my results differ from those in the original paper may lie in the differences between our configurations.&lt;/p&gt;

&lt;p&gt;Once again, feel free to leave any comments in the section below or raise an issue in the GitHub repository. I welcome your feedback and the opportunity to discuss the QNN or the Python code.&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>quantumcomputer</category>
    </item>
    <item>
      <title>Quantum-Classical Machine learning: QuClassi</title>
      <dc:creator>Keisuke Sato</dc:creator>
      <pubDate>Thu, 23 Feb 2023 00:42:28 +0000</pubDate>
      <link>https://dev.to/ksk0629/quantum-classical-machine-learning-quclassi-297h</link>
      <guid>https://dev.to/ksk0629/quantum-classical-machine-learning-quclassi-297h</guid>
      <description>&lt;p&gt;&lt;strong&gt;This post is about the paper "&lt;a href="https://arxiv.org/abs/2103.11307" rel="noopener noreferrer"&gt;QuClassi: A Hybrid Deep Neural Network Architecture based on Quantum State Fidelity&lt;/a&gt;" and was written according to my understanding and thoughts. Please see the paper by yourself if you are interested.&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Quantum computers are computers that operate according to quantum mechanics. They can potentially solve some problems faster than classical computers, and many scientists have been working in this field. Here is a short list of quantum algorithms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shor's algorithm&lt;/li&gt;
&lt;li&gt;Grover's algorithm&lt;/li&gt;
&lt;li&gt;The quantum phase estimation algorithm&lt;/li&gt;
&lt;li&gt;The Harrow-Hassidim-Lloyd (HHL) algorithm&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are well-known algorithms, so there is plenty of information about them you can find via Google or ChatGPT!&lt;/p&gt;

&lt;p&gt;Many people believe that quantum computers could change our lives in the future, and the field of machine learning is no exception: many scientists are working on quantum machine learning. In this post I will introduce one of the quantum machine learning papers, QuClassi: A Hybrid Deep Neural Network Architecture based on Quantum State Fidelity. The first version was submitted to quant-ph on 21 March 2021, and the latest version on 31 March 2022. It is clearly not today's latest paper, but some of the authors submitted a new one to quant-ph on 11 October 2022: QuCNN : A Quantum Convolutional Neural Network with Entanglement Based Backpropagation, whose proposed quantum CNN architecture appears to share parts with QuClassi. I believe understanding QuClassi helps in understanding QuCNN, about which I might write an introductory post in the future.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I assume the reader has a foundation in quantum computing, such as bra-ket notation and basic quantum gates, as well as in qiskit, the Python library for quantum computing. If you are not sure what these are, I recommend acquiring that knowledge first. The foundations are not difficult, and you should pick them up easily if you have a grounding in linear algebra.&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Concept of QuClassi
&lt;/h1&gt;

&lt;p&gt;QuClassi is a quantum neural network for both binary and multi-class classification. It is a hybrid quantum-classical design, which means it uses a classical computer for some preparation and a quantum computer for the calculation. The key points of QuClassi are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;has three different quantum layers&lt;/li&gt;
&lt;li&gt;has a quantum state fidelity based cost function&lt;/li&gt;
&lt;li&gt;encodes two-dimensional data into one qubit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Technically, the last point may have already been suggested in another paper, but the method relates to the quantum layers, so we need to know it. The quantum part of QuClassi is, of course, implemented as a quantum circuit. The circuit can be broken down into two parts: a data-loading part and a classifier-generating part. QuClassi outputs the likelihood of each category for the input data by calculating the fidelity between a loaded state and a classifier state; the input data is classified into the category whose corresponding fidelity is maximal. The classifier states therefore have to be trained on the dataset before classifying unknown data.&lt;/p&gt;
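&lt;p&gt;The maximum-fidelity decision rule can be sketched in a few lines of NumPy. The classifier states below are hypothetical single-qubit states chosen purely for illustration; in QuClassi they are trained states, and the fidelity is estimated on a quantum circuit rather than computed classically.&lt;/p&gt;

```python
import numpy as np

def fidelity(a, b):
    """Squared inner product of two pure (normalised) state vectors."""
    return abs(np.vdot(a, b)) ** 2

# One hypothetical classifier state per category (single-qubit states here).
classifier_states = {
    "class_0": np.array([1.0, 0.0]),  # |0>
    "class_1": np.array([0.0, 1.0]),  # |1>
}
# A loaded state close to |0>.
loaded = np.array([np.cos(0.2), np.sin(0.2)])

# The predicted category is the one with maximal fidelity to the loaded state.
prediction = max(classifier_states, key=lambda c: fidelity(loaded, classifier_states[c]))
print(prediction)  # class_0
```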

&lt;h2&gt;
  
  
  Encoding Method
&lt;/h2&gt;

&lt;p&gt;Input data is encoded into a loaded state. But how? I would like to avoid going through mathematical equations in this post, but here the equations convey the method better than sentences alone.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fco5uzlnspzvyzjuu9q7m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fco5uzlnspzvyzjuu9q7m.png" alt=" " width="684" height="316"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, the rotation angles of the gates applied to the loaded state are obtained from the elements of the input data via the transform above. If the dimensionality of the input data is odd, you can extend the data by zero-padding.&lt;/p&gt;
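&lt;p&gt;The mechanics of the encoding (pairing up features, two per qubit, and zero-padding odd-dimensional data) can be sketched as below. Note that the angle formula used here is a hypothetical arcsin-based placeholder assuming features normalised to [0, 1]; the exact transform is the one shown in the image above.&lt;/p&gt;

```python
import numpy as np

def encode_angles(x):
    """Pair up input features, two per qubit, and map them to rotation angles.

    The angle formula below (2 * arcsin(sqrt(x)), assuming features already
    normalised to [0, 1]) is only a hypothetical placeholder; the exact
    transform is the one shown in the image above.
    """
    x = np.asarray(x, dtype=float)
    if x.size % 2 == 1:  # odd dimensionality: extend the data by zero-padding
        x = np.append(x, 0.0)
    pairs = x.reshape(-1, 2)  # each pair of features is encoded into one qubit
    return 2.0 * np.arcsin(np.sqrt(pairs))

angles = encode_angles([0.25, 1.0, 0.5])  # 3 features -> padded to 4 -> 2 qubits
print(angles.shape)  # (2, 2)
```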

&lt;h2&gt;
  
  
  Quantum Fidelity Based Cost Function
&lt;/h2&gt;

&lt;p&gt;Quantum fidelity is a measure of similarity between two quantum states; for the pure states used here, it is simply the squared inner product of the two states. In the QuClassi architecture, the fidelity is obtained with the well-known SWAP test, which requires measuring only a single qubit. Measuring many qubits can introduce large errors on current hardware, so measuring just one helps keep the error small.&lt;/p&gt;
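&lt;p&gt;As a sanity check of this relationship, here is a small NumPy simulation of the SWAP test on two single-qubit pure states. It verifies the standard identity that the probability of measuring the ancilla in the zero state is (1 + fidelity) / 2, i.e. the fidelity equals 2P(0) - 1. On real hardware the probability would be estimated from repeated shots; this sketch computes it exactly from the statevector.&lt;/p&gt;

```python
import numpy as np

def swap_test_p0(a, b):
    """Simulate the SWAP test on two single-qubit pure states.

    Register order: ancilla (x) a (x) b. Returns the probability of measuring
    the ancilla in the zero state, which equals (1 + fidelity(a, b)) / 2.
    """
    state = np.kron(np.array([1.0, 0.0]), np.kron(a, b))  # ancilla starts in |0>
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    H_anc = np.kron(H, np.eye(4))                         # Hadamard on the ancilla
    cswap = np.eye(8)                                     # controlled-SWAP on (a, b)
    cswap[[5, 6]] = cswap[[6, 5]]                         # swap only when ancilla is 1
    state = H_anc @ cswap @ H_anc @ state
    return float(np.sum(np.abs(state[:4]) ** 2))          # ancilla = 0 amplitudes

a = np.array([np.cos(0.3), np.sin(0.3)])
b = np.array([1.0, 0.0])
p0 = swap_test_p0(a, b)
fid = abs(np.vdot(a, b)) ** 2
print(round(2 * p0 - 1, 6), round(fid, 6))  # both equal the fidelity
```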

&lt;h2&gt;
  
  
  Quantum Layers
&lt;/h2&gt;

&lt;p&gt;The circuit can contain three different kinds of layers, and the actual structure is decided by the user. The following image is just one example, which contains all three layer types.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8hi5s9x5q8qkzn2stwk1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8hi5s9x5q8qkzn2stwk1.png" alt=" " width="800" height="212"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The input data here is four-dimensional. The encoding method requires the number of qubits to be half the input dimension, which is two in this case. The classifier state must be the same size as the loaded state, and to obtain the quantum fidelity between the two, the circuit needs one more qubit. That is why the circuit has 5 (2 + 2 + 1) qubits. The red and blue rectangles enclose the classifier-generating and data-loading parts, respectively. (Apologies that the terms in the image differ from those in this post: the "trained_qubit"s correspond to the classifier state and the "loaded_qubit"s to the loaded state.) QuClassi updates the rotation parameters of the gates applied to the classifier state during training. I have labelled each layer in the image: the single qubit unitary layer, the dual qubit unitary layer and the controlled qubit unitary layer. Users combine arbitrary numbers of all or some of them, in arbitrary order, to create a good structure for their dataset. Unfortunately, the paper does not argue carefully about the layer structure; the authors apparently prefer the architecture with only the single qubit unitary layer, but the best choice could depend on the dataset, and an investigation of the layers would be useful.&lt;/p&gt;

&lt;p&gt;We have gone through some key points of the QuClassi architecture so far but, as you may have noticed, one circuit can hold only one classifier state. One classifier state corresponds to one category, so n circuits must be prepared for n categories. Do not worry, though: we do not need to run all the circuits at once, and their layer sequences (not their parameters) must be identical. This means the minimum number of qubits QuClassi needs, even during training, is the same as for running a single circuit.&lt;/p&gt;

&lt;h1&gt;
  
  
  On Learning
&lt;/h1&gt;

&lt;p&gt;We will not go through the algorithm in detail, but sketch it roughly without mathematical equations. Training a circuit aims to obtain parameters that represent the classifier state well, so the rotation angles are repeatedly updated based on the quantum fidelity between the classifier state and the loaded state.&lt;/p&gt;
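&lt;p&gt;To make that loop concrete, here is a hedged, purely classical sketch: it represents the classifier state as two rotation angles, computes the fidelity by state-vector arithmetic (QuClassi itself estimates it with a quantum circuit), and nudges the angles by finite-difference gradient ascent:&lt;/p&gt;

```python
import cmath
import math

def state(theta, phi):
    """Single-qubit state cos(theta/2)|0> + e^{i*phi} sin(theta/2)|1>."""
    return [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]

def fidelity(a, b):
    """|<a|b>|^2 for two pure states given as amplitude lists."""
    inner = sum(x.conjugate() * y for x, y in zip(a, b))
    return abs(inner) ** 2

def train_step(params, target, lr=0.1, eps=1e-4):
    """One finite-difference gradient-ascent step on the fidelity.

    Illustrative only: QuClassi estimates the fidelity on hardware or a
    simulator rather than by direct state-vector arithmetic.
    """
    new = list(params)
    for i in range(len(params)):
        up = list(params); up[i] += eps
        dn = list(params); dn[i] -= eps
        grad = (fidelity(state(*up), target) - fidelity(state(*dn), target)) / (2 * eps)
        new[i] = params[i] + lr * grad
    return new

target = state(1.0, 0.5)   # the "loaded" state we want to match
params = [0.0, 0.0]        # initial classifier angles
for _ in range(200):
    params = train_step(params, target)
print(fidelity(state(*params), target))  # approaches 1
```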

&lt;h1&gt;
  
  
  On Reported Result
&lt;/h1&gt;

&lt;p&gt;On the MNIST and Iris datasets, the authors reported that QuClassi achieved state-of-the-art performance, and that it also beats its classical counterparts in terms of the number of parameters. They reported results on an actual quantum computer, IBM-Q, as well, which means we can freely try to train and classify data with QuClassi on real hardware! Note that, although that is a brilliant opportunity, it would take a long time even when the combination of layers and the dataset are simple.&lt;/p&gt;

&lt;h1&gt;
  
  
  My thoughts
&lt;/h1&gt;

&lt;p&gt;Basically, QuClassi is an architecture that classifies data into the appropriate category by similarity, namely quantum fidelity. Quantum fidelity between pure states is the squared inner product of two unit vectors, since every pure quantum state is expressed as a complex unit vector. It therefore sounds plausible that the best performance is achieved by an architecture with only the single-qubit unitary layer, with parameters obtained by averaging all the encoded data belonging to the same category, because such parameters create a state located at the centre of the data. In that sense, the initial parameters should probably be chosen near those averaged parameters; that would help to reduce the number of epochs.&lt;/p&gt;
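&lt;p&gt;The averaging idea can be sketched in a few lines (a hypothetical heuristic, assuming the encoded data of one category are available as unit vectors):&lt;/p&gt;

```python
def average_state(encoded_samples):
    """Average a list of unit vectors and renormalise the result.

    Hypothetical initialisation heuristic discussed above: the averaged,
    renormalised vector sits at the "centre" of one category's encoded data.
    """
    dim = len(encoded_samples[0])
    mean = [sum(s[i] for s in encoded_samples) / len(encoded_samples) for i in range(dim)]
    norm = sum(abs(v) ** 2 for v in mean) ** 0.5
    return [v / norm for v in mean]

samples = [[1.0, 0.0], [0.8, 0.6]]  # two unit vectors of one category
print(average_state(samples))
```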

&lt;p&gt;I am curious how much the different layers affect performance on other datasets. The paper does not contain many comparisons between QuClassi variants with different layers, so training and evaluating on other datasets might help us understand the effect of each layer.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;We roughly went through the QuClassi architecture. I have already tried QuClassi in simulation and reproduced similar results on the MNIST and Iris datasets, as well as on the breast cancer and wine datasets. Training the parameters takes a massive amount of time as the amount of data increases. The increase itself is natural, but the training time is far longer than for classical counterparts because of the limits of current quantum devices and simulators. I am looking forward to the day we can access quantum computers without any such concern.&lt;/p&gt;

&lt;p&gt;You can also run QuClassi on your own computer as a simulation with Python. To the best of my knowledge, the following implementations are available.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/Samuelstein1224/QuClassiExample" rel="noopener noreferrer"&gt;Samuelstein1224/QuClassiExample&lt;/a&gt; by the author's&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/arkmohba/ARK_study_Quclassi" rel="noopener noreferrer"&gt;arkmohba/ARK_study_Quclassi&lt;/a&gt; by Japanese company&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ksk0629/quclassi" rel="noopener noreferrer"&gt;ksk0629/quclassi&lt;/a&gt; by forking from and modifying the Japanese company's one&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I might write a post about my repository above one day.&lt;/p&gt;

&lt;p&gt;I would really appreciate it if you pointed out any misunderstanding of mine, asked questions, or commented below.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>systemdesign</category>
      <category>discuss</category>
      <category>blockchain</category>
    </item>
    <item>
      <title>Using MLflow on google colaboratory with github to build cosy environment: on VS code</title>
      <dc:creator>Keisuke Sato</dc:creator>
      <pubDate>Sat, 19 Mar 2022 03:38:15 +0000</pubDate>
      <link>https://dev.to/ksk0629/using-mlflow-on-google-colaboratory-with-github-to-build-cosy-environment-building-on-vs-code-1gd</link>
      <guid>https://dev.to/ksk0629/using-mlflow-on-google-colaboratory-with-github-to-build-cosy-environment-building-on-vs-code-1gd</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;In some previous articles, I built a cosy environment to perform machine learning experiments with&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;google colaboratory (to perform experiments)&lt;/li&gt;
&lt;li&gt;github (to manage source codes and information of each experiment)&lt;/li&gt;
&lt;li&gt;ngrok (to connect to mlflow window)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It was not bad, but the environment was a little less cosy than it could be, for two reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The text editor on google colab is not cosy.&lt;/li&gt;
&lt;li&gt;It is troublesome to run several cells every time I commit and push.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We can use a terminal if we buy a google colab pro account [&lt;a href="https://twitter.com/googlecolab/status/1336698772760379392?lang=en" rel="noopener noreferrer"&gt;official post on twitter&lt;/a&gt;], but apart from those troubles I do not need a pro account for now. Then, what should I do? One of the answers is vs code with google colab.&lt;/p&gt;

&lt;p&gt;Note that I usually use windows 10, so the following discussion is for windows.&lt;/p&gt;

&lt;h1&gt;
  
  
  Process
&lt;/h1&gt;

&lt;p&gt;I'll suppose that vs code is installed on the computer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Downloading cloudflare
&lt;/h2&gt;

&lt;p&gt;1.Download the executable from [&lt;a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-apps/install-and-setup/installation/#windows" rel="noopener noreferrer"&gt;cloudflare zero trust&lt;/a&gt;]&lt;/p&gt;

&lt;p&gt;2.Rename the file to &lt;code&gt;cloudflared.exe&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing &lt;code&gt;remote-ssh&lt;/code&gt; extension on vs code
&lt;/h2&gt;

&lt;p&gt;1.Launch vs code&lt;/p&gt;

&lt;p&gt;2.Install &lt;code&gt;remote-ssh&lt;/code&gt; extension&lt;/p&gt;

&lt;p&gt;Pressing &lt;code&gt;Ctrl + Shift + X&lt;/code&gt; opens the extensions tab. Typing &lt;code&gt;remote ssh&lt;/code&gt; in the search box, as shown below, brings up the extension.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbrao1ymw3fehcbml6xrz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbrao1ymw3fehcbml6xrz.png" alt=" " width="305" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There it is! Clicking the green Install button installs it. (There is no Install button on the extension in the picture because I have already installed it.)&lt;/p&gt;

&lt;p&gt;3.Setup ssh config&lt;/p&gt;

&lt;p&gt;There is a &lt;code&gt;config&lt;/code&gt; file in the home directory, namely &lt;code&gt;~/.ssh/config&lt;/code&gt;. The file should look like the following.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;Host&lt;/span&gt; *.&lt;span class="n"&gt;trycloudflare&lt;/span&gt;.&lt;span class="n"&gt;com&lt;/span&gt;
    &lt;span class="n"&gt;HostName&lt;/span&gt; %&lt;span class="n"&gt;h&lt;/span&gt;
    &lt;span class="n"&gt;User&lt;/span&gt; &lt;span class="n"&gt;root&lt;/span&gt;
    &lt;span class="n"&gt;Port&lt;/span&gt; &lt;span class="m"&gt;22&lt;/span&gt;
    &lt;span class="n"&gt;ProxyCommand&lt;/span&gt; &amp;lt;&lt;span class="n"&gt;absolute&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;cloudflare&lt;/span&gt;.&lt;span class="n"&gt;exe&lt;/span&gt;&amp;gt; &lt;span class="n"&gt;access&lt;/span&gt; &lt;span class="n"&gt;ssh&lt;/span&gt; --&lt;span class="n"&gt;hostname&lt;/span&gt; %&lt;span class="n"&gt;h&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In my case, I put &lt;code&gt;cloudflared.exe&lt;/code&gt; directly under the C drive, so my ProxyCommand line is &lt;code&gt;ProxyCommand C:\\cloudflared.exe access ssh --hostname %h&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Preparing config data
&lt;/h2&gt;

&lt;p&gt;1.Create a &lt;code&gt;config&lt;/code&gt; directory on your google drive&lt;/p&gt;

&lt;p&gt;2.Create &lt;code&gt;general_config.yaml&lt;/code&gt; and upload it to the &lt;code&gt;config&lt;/code&gt; directory&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;general_config.yaml&lt;/code&gt; must have the following information.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;github&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;username&lt;/span&gt;
  &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;email&lt;/span&gt;
  &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;access token&lt;/span&gt;
&lt;span class="na"&gt;cloudflare&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;password_you_decided&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;token&lt;/code&gt; in the &lt;code&gt;github&lt;/code&gt; block can be obtained by following [&lt;a href="https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token" rel="noopener noreferrer"&gt;Creating a personal access token&lt;/a&gt;].&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating a new google colab notebook
&lt;/h2&gt;

&lt;p&gt;1.Create a new google colab notebook to access google colab from vs code&lt;/p&gt;

&lt;p&gt;2.Run the following codes&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Prepare environment
&lt;/span&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;colab_ssh&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;upgrade&lt;/span&gt;

&lt;span class="c1"&gt;# Import necessary modules
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;colab_ssh&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;launch_ssh_cloudflared&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.colab&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;drive&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;yaml&lt;/span&gt;

&lt;span class="c1"&gt;# Mount my google drive
&lt;/span&gt;&lt;span class="n"&gt;drive_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/content/gdrive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;drive&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drive_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load general config
&lt;/span&gt;&lt;span class="n"&gt;config_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drive_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MyDrive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;general_config.yaml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;yml&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;yaml&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;safe_load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;yml&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Set git config
&lt;/span&gt;&lt;span class="n"&gt;config_github&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;github&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;config_github&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;config_github&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

&lt;span class="c1"&gt;# Create symbolic link
&lt;/span&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ln&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;sfn&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;gdrive&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;MyDrive&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;workspace&lt;/span&gt;

&lt;span class="c1"&gt;# Launch ssh cloudflare
&lt;/span&gt;&lt;span class="nf"&gt;launch_ssh_cloudflared&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cloudflare&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output of the cell contains the following information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnkiqf22z55wvc7a2ajvb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnkiqf22z55wvc7a2ajvb.png" alt=" " width="800" height="125"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3.Copy the url in &lt;code&gt;VSCode Remote SSH&lt;/code&gt; block&lt;/p&gt;

&lt;h2&gt;
  
  
  Accessing google colab from vs code
&lt;/h2&gt;

&lt;p&gt;1.Open command palette on vs code by pressing &lt;code&gt;Ctrl + Shift + P&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;2.Input &lt;code&gt;Remote-SSH: Connect to Host...&lt;/code&gt; in the box and press enter&lt;/p&gt;

&lt;p&gt;3.Input the copied url in the box and press enter&lt;/p&gt;

&lt;p&gt;4.Input &lt;code&gt;Continue&lt;/code&gt; in the box and press enter&lt;/p&gt;

&lt;p&gt;5.Input the password written in &lt;code&gt;general_config.yaml&lt;/code&gt; in the box and press enter&lt;/p&gt;

&lt;p&gt;There we go! vs code is now connected to google colab. Press &lt;code&gt;Ctrl + Shift + @&lt;/code&gt; to open a terminal, and then input &lt;code&gt;python -V&lt;/code&gt; or &lt;code&gt;pip list&lt;/code&gt;. We should see the same output as when we run the same commands in a google colab notebook.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Now we can edit any file in vs code and also commit and push from the vs code terminal. It is quite cosy for me. Furthermore, we can use mlflow, which I introduced briefly in the previous article: just run &lt;code&gt;mlflow ui&lt;/code&gt; in the terminal and open the printed url in any browser.&lt;/p&gt;

&lt;p&gt;I would appreciate it if someone shared tips in the discussion box below.&lt;/p&gt;

</description>
      <category>vscode</category>
      <category>googlecloud</category>
      <category>python</category>
    </item>
    <item>
      <title>My own chatbot by fine-tuning GPT-2</title>
      <dc:creator>Keisuke Sato</dc:creator>
      <pubDate>Sat, 19 Feb 2022 14:29:10 +0000</pubDate>
      <link>https://dev.to/ksk0629/my-own-chatbot-by-fine-tuning-gpt-2-m0n</link>
      <guid>https://dev.to/ksk0629/my-own-chatbot-by-fine-tuning-gpt-2-m0n</guid>
      <description>&lt;p&gt;(Updated at 20, February, 2022)&lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;In this post, I will fine-tune GPT-2, specifically one of rinna's models, which are Japanese GPT-2 models. I am Japanese and most of my chat histories are in Japanese, so I will fine-tune a "Japanese" GPT-2.&lt;/p&gt;

&lt;p&gt;GPT-2 stands for Generative Pre-trained Transformer 2 and, as the name suggests, it generates sentences. We can build a chatbot by fine-tuning a pre-trained model with a small amount of training data.&lt;/p&gt;

&lt;p&gt;I will not go through GPT-2 in detail. I highly recommend the article &lt;a href="https://dev.to/oursky/how-to-build-an-ai-text-generator-text-generation-with-a-gpt-2-model-4346"&gt;How to Build an AI Text Generator: Text Generation with a GPT-2 Model&lt;/a&gt; on dev.to to understand what GPT-2 is and what a language model is.&lt;/p&gt;

&lt;p&gt;git repository: &lt;a href="https://github.com/ksk0629/chatbot_with_gpt2" rel="noopener noreferrer"&gt;chatbot_with_gpt2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I am grateful to the authors of the following two articles.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://qiita.com/Yokohide/items/e74254f334e1335cd502" rel="noopener noreferrer"&gt;GPT-2で友達を再現して対話してみた&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/oursky/how-to-build-an-ai-text-generator-text-generation-with-a-gpt-2-model-4346"&gt;How to Build an AI Text Generator: Text Generation with a GPT-2 Model&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks to the first author, I could build my chatbot model; the sources in my git repository are almost entirely based on his code, which I just reorganised. Thanks to the second author, I could learn about GPT-2.&lt;/p&gt;

&lt;h1&gt;
  
  
  What is rinna
&lt;/h1&gt;

&lt;p&gt;rinna is a family of conversational pre-trained models provided by rinna Co., Ltd.; as of 19 February 2022, five pre-trained models are available on hugging face [&lt;a href="https://huggingface.co/rinna" rel="noopener noreferrer"&gt;rinna Co., Ltd.&lt;/a&gt;]. rinna is somewhat famous in Japan because the company released the rinna AI on LINE, one of the most popular messaging apps in Japan. Her persona is a junior high school girl, and we can chat with her on LINE.&lt;/p&gt;

&lt;p&gt;I am not sure when the models were published on hugging face, but anyway, they are available now. I will fine-tune &lt;code&gt;rinna/japanese-gpt2-small&lt;/code&gt;, whose number of parameters is small. By the way, I wanted to use &lt;code&gt;rinna/japanese-gpt-1b&lt;/code&gt;, which has around one billion parameters, but I couldn't because of the memory capacity of google colab.&lt;/p&gt;

&lt;h1&gt;
  
  
  Process
&lt;/h1&gt;

&lt;p&gt;I will suppose you have google and git accounts and can use google colab.&lt;/p&gt;

&lt;p&gt;Furthermore, I will use a chat history from LINE. If you have no account on the app, that is okay: all you have to do is prepare a chat history and adapt the data, although I know those are the hardest and most bothersome steps. If you have an account, the following process should work. Note that if your LINE display language is Japanese, you should change it to English until you have exported the chat history, because the following steps assume the display language (not the message language) is English.&lt;/p&gt;
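&lt;p&gt;For readers preparing the data by hand, the pairing step can be sketched like this. It is a simplified, hypothetical parser: the tab-separated "time, name, message" line format is an assumption about the English-language LINE export, and this is not the repository's actual code.&lt;/p&gt;

```python
import re

def extract_pairs(lines, input_username, output_username):
    """Turn a LINE-style chat export into (input, output) message pairs.

    Assumes each message line looks like "21:03\tName\tMessage"; adjust the
    pattern for your own export.  Simplified sketch, not the repo's code.
    """
    pattern = re.compile(r"^\d{1,2}:\d{2}\t([^\t]+)\t(.+)$")
    pairs, last_input = [], None
    for line in lines:
        m = pattern.match(line)
        if not m:
            continue  # skip date headers and system lines
        name, message = m.groups()
        if name == input_username:
            last_input = message
        elif name == output_username and last_input is not None:
            pairs.append((last_input, message))
            last_input = None
    return pairs

chat = ["21:03\tAlice\tHey", "21:04\tBob\tHi there", "21:05\tAlice\tHow are you?"]
print(extract_pairs(chat, "Alice", "Bob"))
```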

&lt;h2&gt;
  
  
  Prepare the environment
&lt;/h2&gt;

&lt;p&gt;At the end of this process, your google drive is constructed as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MyDrive ---- chatbot_with_gpt2.ipynb
           |
           |- config
           |    |- general_config.yaml
           |
           |- data
                |- chat_history.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;1: Clone &lt;a href="https://github.com/ksk0629/chatbot_with_gpt2" rel="noopener noreferrer"&gt;chatbot_with_gpt2&lt;/a&gt; repository on your local machine.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is accomplished by running the following command in git bash.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/ksk0629/chatbot_with_gpt2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;2: Upload &lt;code&gt;chatbot_with_gpt2/chatbot_with_gpt2.ipynb&lt;/code&gt; to the google drive.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;3: Make a directory named &lt;code&gt;config&lt;/code&gt; on your google drive and create &lt;code&gt;general_config.yaml&lt;/code&gt; in the config folder.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;general_config.yaml&lt;/code&gt; is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;github&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your_github_username&lt;/span&gt;
  &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your_email&lt;/span&gt;
  &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your_access_token&lt;/span&gt;
&lt;span class="na"&gt;ngrok&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;anything&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;ngrok&lt;/code&gt; block is not actually used, but it must be present to avoid an error in a later step.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;4: Get a chat history from LINE.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can get the history by following the official announcement [&lt;a href="https://help.line.me/line/android/?contentId=20007388#:~:text=1.,want%20to%20send%20the%20file." rel="noopener noreferrer"&gt;Help centre - Chat history&lt;/a&gt;].&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;5: Make a directory named &lt;code&gt;data&lt;/code&gt; on your google drive and move the chat history to the directory.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prepare training data and build the model
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;1: Open &lt;code&gt;chatbot_with_gpt2.ipynb&lt;/code&gt; on google colaboratory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;2: Run the cells in Preparation block.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Running those cells prepares the environment for generating the training data and building the model.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3: Change &lt;code&gt;chatbot_with_gpt2/pre_processor_config.yaml&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The initial yaml file is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;line&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;initial&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;input_username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_username"&lt;/span&gt;
    &lt;span class="na"&gt;output_username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_username"&lt;/span&gt;
    &lt;span class="na"&gt;target_year_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[2016,2017,2018,2019,2020,2021,2022]"&lt;/span&gt;
  &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;input_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/content/gdrive/MyDrive/data/chat_history.txt"&lt;/span&gt;
    &lt;span class="na"&gt;output_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat_history_cleaned.pk"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You have to change at least the initial block. The meaning of each entry is as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;input_username: a username of messages that you want to input into the model&lt;/li&gt;
&lt;li&gt;output_username: a username of messages that you want the model to output&lt;/li&gt;
&lt;li&gt;target_year_list: years that you want to use to train the model&lt;/li&gt;
&lt;li&gt;input_path: path to the raw chat history&lt;/li&gt;
&lt;li&gt;output_path: path to the cleaned data that is obtained by the following process&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that if you do not change output_path, your training data will not be available after the notebook is closed. It is, of course, available while the notebook is running.&lt;/p&gt;
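&lt;p&gt;For example, pointing output_path at a directory on your google drive (the &lt;code&gt;data&lt;/code&gt; directory here is just one possible choice) keeps the cleaned data across sessions:&lt;/p&gt;

```yaml
  path:
    input_path: "/content/gdrive/MyDrive/data/chat_history.txt"
    output_path: "/content/gdrive/MyDrive/data/chat_history_cleaned.pk"
```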

&lt;ul&gt;
&lt;li&gt;4: Run the cell in Preprocessing data block.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The data is cleaned in the cell.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;5: Change &lt;code&gt;chatbot_with_gpt2/model_config.yaml&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The initial yaml file is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;general&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;basemodel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rinna/japanese-gpt2-xsmall"&lt;/span&gt;
&lt;span class="na"&gt;dataset&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;input_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat_history_cleaned.pk"&lt;/span&gt;
  &lt;span class="na"&gt;output_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt2_train_data.txt"&lt;/span&gt;
&lt;span class="na"&gt;train&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;epochs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
  &lt;span class="na"&gt;save_steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10000&lt;/span&gt;
  &lt;span class="na"&gt;save_total_limit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;per_device_train_batch_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;per_device_eval_batch_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;output_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model/default"&lt;/span&gt;
  &lt;span class="na"&gt;use_fast_tokenizer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;False&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You have to change input_path in the dataset block to the path of the cleaned data, which is specified in &lt;code&gt;pre_processor_config.yaml&lt;/code&gt;. You can change basemodel to rinna/japanese-gpt2-small, but the others (medium and 1b) would not work because of a lack of GPU memory, as I mentioned in the What is rinna section.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;6: Run the cells in Training data preparation and Building model block.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is all! After running these cells, all you have to do is wait for a while. You will see your model files in the directory specified in &lt;code&gt;model_config.yaml&lt;/code&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Let's talk to the model
&lt;/h1&gt;

&lt;p&gt;Again, all you have to do is run the single cell in the Talking with the model block. The source code then runs and you can talk with the model, as in the following.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsais8av6ghwium6oz0pa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsais8av6ghwium6oz0pa.png" alt=" " width="685" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;I fine-tuned GPT-2 with my chat history on LINE. It certainly worked, but there are the following problems, as you can see in the Let's talk to the model section.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;There is an unnecessary line, &lt;code&gt;Setting 'pad_token_id' to 'eos_token_id':2 for open-end generation.&lt;/code&gt;, in each conversation.&lt;/li&gt;
&lt;li&gt;There are some tokens, like &lt;code&gt;&amp;lt;br:&lt;/code&gt;, &lt;code&gt;[&amp;lt;unk&amp;gt;hoto]&amp;lt;br///&lt;/code&gt;, and &lt;code&gt;&amp;lt;br/ゥ&amp;gt;&lt;/code&gt;, that disturb sentence coherence.&lt;/li&gt;
&lt;li&gt;The model did not reply well.&lt;/li&gt;
&lt;/ul&gt;
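&lt;p&gt;The second problem could be mitigated with a post-processing step. The following is a minimal sketch (not part of my notebook) that strips only the specific malformed fragments observed above before showing a reply; the regular expression is an assumption tailored to those examples.&lt;/p&gt;

```python
import re

# Hypothetical clean-up pass: remove the malformed fragments observed in the
# replies (broken <br...> variants and bracketed <unk> blobs).
ARTIFACTS = re.compile(r"\[<unk>[^\]]*\]|<unk>|<br[/:>ゥ]*")

def clean_reply(text: str) -> str:
    return ARTIFACTS.sub("", text).strip()

print(clean_reply("帰ったんか<br:おつかれさま!"))  # -> 帰ったんかおつかれさま!
```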

&lt;p&gt;The first response&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;帰ったんか
おつかれさま!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;looks quite good: "おっす" means "Hey", and the response means "You are home. You must be exhausted", something like that. But the others look wrong. To improve the model, I could clean the training data further, and I need to understand GPT-2 and the source code better.&lt;/p&gt;

&lt;p&gt;If you have any suggestions, comments, or questions about this article, please comment below. I'd appreciate it.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>Toward understanding DNN (deep neural network) well: iris dataset</title>
      <dc:creator>Keisuke Sato</dc:creator>
      <pubDate>Sun, 13 Feb 2022 09:21:11 +0000</pubDate>
      <link>https://dev.to/ksk0629/toward-understanding-dnn-deep-neural-network-well-iris-dataset-5179</link>
      <guid>https://dev.to/ksk0629/toward-understanding-dnn-deep-neural-network-well-iris-dataset-5179</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;This is the second article in my "toward understanding DNN (deep neural network) well" series. I will explore the effect of the number of layers and the number of units again, this time with the iris dataset.&lt;/p&gt;

&lt;p&gt;github repository: &lt;a href="https://github.com/ksk0629/comparison_of_dnn" rel="noopener noreferrer"&gt;comparison_of_dnn&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that this is not a "guide"; it is a memo from a beginner for beginners. If you have any comments, suggestions, questions, etc. whilst reading this article, please let me know in the comments below.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;a href="https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html" rel="noopener noreferrer"&gt;Iris dataset&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;It is a very famous dataset, and most people would not need an explanation of it. But I will look at it briefly, because I am a beginner.&lt;/p&gt;

&lt;p&gt;We can load this dataset with the &lt;code&gt;sklearn.datasets.load_iris()&lt;/code&gt; function. It is a multi-class classification dataset containing 150 samples, each with the following four features.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sepal length (cm)&lt;/li&gt;
&lt;li&gt;sepal width (cm)&lt;/li&gt;
&lt;li&gt;petal length (cm)&lt;/li&gt;
&lt;li&gt;petal width (cm)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The number of classes is three, and the dataset contains the same number of samples for each class. As most of us know, it has no missing values, but since this is a tutorial-like article, I will check for missing values anyway.&lt;/p&gt;

&lt;p&gt;Input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sklearn&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datasets&lt;/span&gt;

&lt;span class="n"&gt;iris_dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sklearn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datasets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_iris&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;as_frame&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frame&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;iris_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;class 'pandas.core.frame.DataFrame'&amp;gt;
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
 4   target             150 non-null    int64  
dtypes: float64(4), int64(1)
memory usage: 6.0 KB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cool. There are no missing values. Next, I check the basic statistics.&lt;/p&gt;

&lt;p&gt;Input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;iris_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;      sepal length (cm) sepal width (cm) petal length (cm) /
mean           5.843333         3.057333          3.758000 /
std            0.828066         0.435866          1.765298 /
min            4.300000         2.000000          1.000000 /
25%            5.100000         2.800000          1.600000 /
50%            5.800000         3.000000          4.350000 /
75%            6.400000         3.300000          5.100000 /
max            7.900000         4.400000          6.900000 /

      petal width (cm)    target
              1.199333  1.000000
              0.762238  0.819232
              0.100000  0.000000
              0.300000  0.000000
              1.300000  1.000000
              1.800000  2.000000
              2.500000  2.000000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Of course, I am interested in analysing the data, but I do not have the skills to do so yet. I will analyse it someday.&lt;/p&gt;

&lt;h1&gt;
  
  
  Comparison
&lt;/h1&gt;

&lt;p&gt;For the sake of simplicity, I suppose the following conditions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All conditions of the model are fixed except for the number of layers and the number of units in each layer.&lt;/li&gt;
&lt;li&gt;No data preprocessing is performed.&lt;/li&gt;
&lt;li&gt;The seed is fixed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most of these conditions can be changed or removed; all you have to do is edit &lt;code&gt;config_iris.yaml&lt;/code&gt;. The yaml file has the following lines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;mlflow&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;experiment_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;iris&lt;/span&gt;
  &lt;span class="na"&gt;run_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;dataset&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;eval_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.25&lt;/span&gt;
  &lt;span class="na"&gt;test_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.25&lt;/span&gt;
  &lt;span class="na"&gt;train_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.75&lt;/span&gt;
  &lt;span class="na"&gt;shuffle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;True&lt;/span&gt;
&lt;span class="na"&gt;dnn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;n_layers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;n_units_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;activation_function_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;relu&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;relu&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;softmax&lt;/span&gt;
  &lt;span class="na"&gt;seed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;57&lt;/span&gt;
&lt;span class="na"&gt;dnn_train&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;epochs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
  &lt;span class="na"&gt;batch_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
  &lt;span class="na"&gt;patience&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
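&lt;p&gt;The &lt;code&gt;dnn&lt;/code&gt; block maps directly onto a stack of dense layers. As a rough sketch (this is not the repository's actual code), the per-layer settings could be validated and paired up like this:&lt;/p&gt;

```python
# Sketch of how the "dnn" block of config_iris.yaml could be validated and
# turned into (units, activation) specs, one per layer.
def parse_dnn_config(dnn: dict) -> list:
    n_layers = dnn["n_layers"]
    units = dnn["n_units_list"]
    activations = dnn["activation_function_list"]
    if not (n_layers == len(units) == len(activations)):
        raise ValueError("n_layers must match both per-layer lists")
    return list(zip(units, activations))

config = {"n_layers": 3, "n_units_list": [8, 4, 3],
          "activation_function_list": ["relu", "relu", "softmax"]}
print(parse_dnn_config(config))  # -> [(8, 'relu'), (4, 'relu'), (3, 'softmax')]
```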



&lt;p&gt;The following changes build a model with five layers (four dense layers plus the output layer): the four dense layers have 8 units each with the relu activation function, and the output layer has 3 units with softmax.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;dnn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;n_layers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
  &lt;span class="na"&gt;n_units_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;activation_function_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;relu&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;relu&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;relu&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;relu&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;softmax&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that some of the model's settings are hard-coded; you have to edit the code to change them. For example, the loss function is cross-entropy, computed by the &lt;code&gt;keras.losses.SparseCategoricalCrossentropy()&lt;/code&gt; function, and it is specified in &lt;code&gt;iris_dnn.py&lt;/code&gt;: &lt;br&gt;
&lt;a href="https://github.com/ksk0629/comparison_of_dnn/blob/8498a7d15ed6a4447f13f9f277e214f4821f46a1/src/iris_dnn.py#L28-L30" rel="noopener noreferrer"&gt;https://github.com/ksk0629/comparison_of_dnn/blob/8498a7d15ed6a4447f13f9f277e214f4821f46a1/src/iris_dnn.py#L28-L30&lt;/a&gt;&lt;/p&gt;
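&lt;p&gt;For reference, the loss itself is simple to state: for one sample, sparse categorical cross-entropy is the negative log of the probability the softmax output assigns to the true class index. A hand-computed sketch (the probabilities below are made up):&lt;/p&gt;

```python
import math

# Sparse categorical cross-entropy for a single sample: -log(p[target]),
# where p is the softmax output and target is the true class index.
def sparse_categorical_crossentropy(probs, target):
    return -math.log(probs[target])

# A hypothetical softmax output whose true class is 2.
print(round(sparse_categorical_crossentropy([0.1, 0.2, 0.7], 2), 4))  # -> 0.3567
```

&lt;p&gt;A model stuck at the uniform output [1/3, 1/3, 1/3] gives -log(1/3) ≈ 1.099, which is exactly the loss the eight- and nine-layer models below are stuck at.&lt;/p&gt;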
&lt;h2&gt;
  
  
  Result
&lt;/h2&gt;

&lt;p&gt;First, I summarise all results. The losses and accuracy are as follows.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#layers&lt;/th&gt;
&lt;th&gt;#parameters&lt;/th&gt;
&lt;th&gt;training loss&lt;/th&gt;
&lt;th&gt;evaluation loss&lt;/th&gt;
&lt;th&gt;test loss&lt;/th&gt;
&lt;th&gt;test accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;0.166&lt;/td&gt;
&lt;td&gt;0.136&lt;/td&gt;
&lt;td&gt;0.157&lt;/td&gt;
&lt;td&gt;0.947&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;67&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.086&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.022&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.039&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.974&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;131&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.086&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.033&lt;/td&gt;
&lt;td&gt;0.043&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;259&lt;/td&gt;
&lt;td&gt;0.09&lt;/td&gt;
&lt;td&gt;0.024&lt;/td&gt;
&lt;td&gt;0.047&lt;/td&gt;
&lt;td&gt;0.974&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;263&lt;/td&gt;
&lt;td&gt;0.104&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.018&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.069&lt;/td&gt;
&lt;td&gt;0.974&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;260&lt;/td&gt;
&lt;td&gt;0.123&lt;/td&gt;
&lt;td&gt;0.05&lt;/td&gt;
&lt;td&gt;0.115&lt;/td&gt;
&lt;td&gt;0.947&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;261&lt;/td&gt;
&lt;td&gt;0.089&lt;/td&gt;
&lt;td&gt;0.089&lt;/td&gt;
&lt;td&gt;0.075&lt;/td&gt;
&lt;td&gt;0.974&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;255&lt;/td&gt;
&lt;td&gt;0.138&lt;/td&gt;
&lt;td&gt;0.043&lt;/td&gt;
&lt;td&gt;0.119&lt;/td&gt;
&lt;td&gt;0.947&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;263&lt;/td&gt;
&lt;td&gt;0.091&lt;/td&gt;
&lt;td&gt;0.023&lt;/td&gt;
&lt;td&gt;0.047&lt;/td&gt;
&lt;td&gt;0.974&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;261&lt;/td&gt;
&lt;td&gt;1.099&lt;/td&gt;
&lt;td&gt;1.099&lt;/td&gt;
&lt;td&gt;1.099&lt;/td&gt;
&lt;td&gt;0.316&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;259&lt;/td&gt;
&lt;td&gt;1.099&lt;/td&gt;
&lt;td&gt;1.099&lt;/td&gt;
&lt;td&gt;1.099&lt;/td&gt;
&lt;td&gt;0.316&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The test set contains 38 samples: 12 belonging to class 0, 13 to class 1, and 13 to class 2.&lt;/p&gt;

&lt;p&gt;I performed 11 experiments to explore the following two things.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;effect of the number of parameters&lt;/li&gt;
&lt;li&gt;effect of the number of layers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first to fourth experiments address the former, and the fourth to eleventh address the latter.&lt;/p&gt;

&lt;p&gt;The results show the following.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model with two layers and 67 parameters is the best in terms of test loss.&lt;/li&gt;
&lt;li&gt;The model with two layers and 131 parameters is the best in terms of test accuracy.&lt;/li&gt;
&lt;li&gt;The models with 8 and 9 layers are the worst.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is a bit surprising to me, because I expected the best model to have more layers and parameters than the models above. It is possibly due to the distribution of the test data, which might be too small to evaluate the performance reliably. But at least under the above conditions, the two models with two layers are the best ones, which possibly means the other models overfitted.&lt;/p&gt;

&lt;p&gt;As mentioned later, the vanishing gradient problem occurred in the eight- and nine-layer experiments. That is, eight layers are too many to learn well, at least with the iris data under the above conditions.&lt;/p&gt;

&lt;p&gt;Except for the models that suffered from the vanishing gradient problem and the best one in terms of test accuracy, every model classified 36 or 37 of the test samples correctly. Interestingly, one of the misclassified samples is the same across these models. This possibly implies the distribution of the test data is not great, i.e., there is a gap between the training data and the test data.&lt;/p&gt;

&lt;p&gt;Furthermore, most of the models correctly classified most of the data, which suggests a DNN is effective on the iris data even with a very simple structure.&lt;/p&gt;
&lt;h3&gt;
  
  
  two layers with 35 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 4)                 20        

 dense_1 (Dense)             (None, 3)                 15        

=================================================================
Total params: 35
Trainable params: 35
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.166&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.136&lt;/li&gt;
&lt;li&gt;test loss: 0.157&lt;/li&gt;
&lt;li&gt;test accuracy: 0.947&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model correctly classified 36 of the 38 test samples. It looks great and it actually works great. At least for the iris data, a DNN is a very powerful tool even when the model has a very simple structure.&lt;/p&gt;
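&lt;p&gt;As a sanity check, the Param # column of these summaries follows from the dense-layer formula (number of inputs + 1) × number of units, where the +1 accounts for the bias:&lt;/p&gt;

```python
# Parameter count of a dense layer: one weight per input per unit plus one
# bias per unit, i.e. (n_inputs + 1) * n_units.
def dense_params(n_inputs, n_units):
    return (n_inputs + 1) * n_units

# The model above: 4 input features -> 4 units -> 3 units.
print(dense_params(4, 4) + dense_params(4, 3))  # -> 35
```

&lt;p&gt;The same formula reproduces the other totals, e.g. the 8-unit model gives 40 + 27 = 67.&lt;/p&gt;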

&lt;h3&gt;
  
  
  two layers with 67 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 8)                 40        

 dense_1 (Dense)             (None, 3)                 27        

=================================================================
Total params: 67
Trainable params: 67
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.086&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.022&lt;/li&gt;
&lt;li&gt;test loss: 0.039&lt;/li&gt;
&lt;li&gt;test accuracy: 0.974 &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model correctly classified 37 test data.&lt;/p&gt;

&lt;h3&gt;
  
  
  two layers with 131 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 16)                80        

 dense_1 (Dense)             (None, 3)                 51        

=================================================================
Total params: 131
Trainable params: 131
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.086&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.033&lt;/li&gt;
&lt;li&gt;test loss: 0.043&lt;/li&gt;
&lt;li&gt;test accuracy: 1.0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model correctly classified all test data.&lt;/p&gt;

&lt;h3&gt;
  
  
  two layers with 259 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 32)                160       

 dense_1 (Dense)             (None, 3)                 99        

=================================================================
Total params: 259
Trainable params: 259
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.09&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.024&lt;/li&gt;
&lt;li&gt;test loss: 0.047&lt;/li&gt;
&lt;li&gt;test accuracy: 0.974&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model correctly classified 37 test data.&lt;/p&gt;

&lt;h3&gt;
  
  
  three layers with 263 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 16)                80        

 dense_1 (Dense)             (None, 9)                 153       

 dense_2 (Dense)             (None, 3)                 30        

=================================================================
Total params: 263
Trainable params: 263
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.104&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.018&lt;/li&gt;
&lt;li&gt;test loss: 0.069&lt;/li&gt;
&lt;li&gt;test accuracy: 0.974&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model correctly classified 37 data too.&lt;/p&gt;

&lt;h3&gt;
  
  
  four layers with 260 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 14)                70        

 dense_1 (Dense)             (None, 9)                 135       

 dense_2 (Dense)             (None, 4)                 40        

 dense_3 (Dense)             (None, 3)                 15        

=================================================================
Total params: 260
Trainable params: 260
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.123&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.05&lt;/li&gt;
&lt;li&gt;test loss: 0.115&lt;/li&gt;
&lt;li&gt;test accuracy: 0.947&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model correctly classified 36 data.&lt;/p&gt;

&lt;h3&gt;
  
  
  five layers with 261 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 12)                60        

 dense_1 (Dense)             (None, 8)                 104       

 dense_2 (Dense)             (None, 6)                 54        

 dense_3 (Dense)             (None, 4)                 28        

 dense_4 (Dense)             (None, 3)                 15        

=================================================================
Total params: 261
Trainable params: 261
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.089&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.089&lt;/li&gt;
&lt;li&gt;test loss: 0.075&lt;/li&gt;
&lt;li&gt;test accuracy: 0.974&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model correctly classified 37 data.&lt;/p&gt;

&lt;h3&gt;
  
  
  six layers with 255 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 10)                50        

 dense_1 (Dense)             (None, 8)                 88        

 dense_2 (Dense)             (None, 6)                 54        

 dense_3 (Dense)             (None, 4)                 28        

 dense_4 (Dense)             (None, 4)                 20        

 dense_5 (Dense)             (None, 3)                 15        

=================================================================
Total params: 255
Trainable params: 255
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.138&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.043&lt;/li&gt;
&lt;li&gt;test loss: 0.119&lt;/li&gt;
&lt;li&gt;test accuracy: 0.947&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model correctly classified 36 data.&lt;/p&gt;

&lt;h3&gt;
  
  
  seven layers with 263 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 10)                50        

 dense_1 (Dense)             (None, 6)                 66        

 dense_2 (Dense)             (None, 6)                 42        

 dense_3 (Dense)             (None, 6)                 42        

 dense_4 (Dense)             (None, 4)                 28        

 dense_5 (Dense)             (None, 4)                 20        

 dense_6 (Dense)             (None, 3)                 15        

=================================================================
Total params: 263
Trainable params: 263
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.091&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.023&lt;/li&gt;
&lt;li&gt;test loss: 0.047&lt;/li&gt;
&lt;li&gt;test accuracy: 0.974&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model correctly classified 37 data.&lt;/p&gt;

&lt;h3&gt;
  
  
  eight layers with 261 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 8)                 40        

 dense_1 (Dense)             (None, 6)                 54        

 dense_2 (Dense)             (None, 6)                 42        

 dense_3 (Dense)             (None, 6)                 42        

 dense_4 (Dense)             (None, 4)                 28        

 dense_5 (Dense)             (None, 4)                 20        

 dense_6 (Dense)             (None, 4)                 20        

 dense_7 (Dense)             (None, 3)                 15        

=================================================================
Total params: 261
Trainable params: 261
Non-trainable params: 0
________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 1.099&lt;/li&gt;
&lt;li&gt;evaluation loss: 1.099&lt;/li&gt;
&lt;li&gt;test loss: 1.099&lt;/li&gt;
&lt;li&gt;test accuracy: 0.316&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The vanishing gradient problem occurred whilst learning. In fact, the training loss stopped decreasing almost immediately:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq1xhoeti2hi001dvl53q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq1xhoeti2hi001dvl53q.png" alt=" " width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It implies that eight layers are too many to train, at least with the iris data.&lt;/p&gt;
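&lt;p&gt;A toy illustration of why depth causes this (this is not the training code, just a back-of-the-envelope model): the backpropagated gradient is roughly a product of per-layer factors, and if each factor is below one on average, the gradient shrinks geometrically with the number of layers.&lt;/p&gt;

```python
import random

# Toy model of backpropagation: each layer multiplies the gradient by
# roughly |w| * f'(z). With smallish weights and a bounded activation
# derivative (capped at 0.25 here), the product decays with depth.
def toy_gradient_scale(n_layers, w_std=0.5, seed=57):
    rng = random.Random(seed)
    scale = 1.0
    for _ in range(n_layers):
        scale *= abs(rng.gauss(0.0, w_std)) * 0.25
    return scale

print(toy_gradient_scale(2))  # modest shrinkage after two layers
print(toy_gradient_scale(8))  # far smaller: the gradient has effectively vanished
```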
&lt;h3&gt;
  
  
  nine layers with 259 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 8)                 40        

 dense_1 (Dense)             (None, 6)                 54        

 dense_2 (Dense)             (None, 6)                 42        

 dense_3 (Dense)             (None, 4)                 28        

 dense_4 (Dense)             (None, 4)                 20        

 dense_5 (Dense)             (None, 4)                 20        

 dense_6 (Dense)             (None, 4)                 20        

 dense_7 (Dense)             (None, 4)                 20        

 dense_8 (Dense)             (None, 3)                 15        

=================================================================
Total params: 259
Trainable params: 259
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 1.099&lt;/li&gt;
&lt;li&gt;evaluation loss: 1.099&lt;/li&gt;
&lt;li&gt;test loss: 1.099&lt;/li&gt;
&lt;li&gt;test accuracy: 0.316&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The vanishing gradient problem occurred again. I had already observed it in the eight-layer experiment; this experiment just confirmed that it was indeed due to the number of layers, and the problem did occur again.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;I explored the effect of the number of layers and the number of parameters with the iris dataset. As a result, I found that the two-layer models are the best in terms of test loss and test accuracy, though this might be due to the small test size. The eight- and nine-layer models learnt nothing: the vanishing gradient problem occurred. This implies that eight or more layers are too many to train on this task.&lt;/p&gt;

&lt;p&gt;As mentioned in the result section, most of the models misclassified the same data point, which is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sepal length (cm)    6.3
sepal width (cm)     2.5
petal length (cm)    4.9
petal width (cm)     1.5
target               1.0
Name: 72, dtype: float64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This suggests that it is important to check whether or not such a data point is an outlier.&lt;/p&gt;
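&lt;p&gt;One quick way to check is the interquartile-range rule: flag a value as an outlier when it falls outside [Q1 - 1.5 IQR, Q3 + 1.5 IQR]. A minimal, hypothetical sketch (the helper name and the 1.5 factor are my choices, not anything from the repository):&lt;/p&gt;

```python
# IQR-based outlier check: a value is flagged when it lies outside
# [Q1 - factor*IQR, Q3 + factor*IQR]. Hypothetical helper, not repository code.
import statistics

def is_iqr_outlier(value, column, factor=1.5):
    q1, _, q3 = statistics.quantiles(column, n=4)  # quartiles of the column
    iqr = q3 - q1
    return value < q1 - factor * iqr or value > q3 + factor * iqr

# e.g. check one feature value against the rest of its column
print(is_iqr_outlier(100, [1, 2, 3, 4, 5, 6, 7, 100]))  # True
```

Applied to, say, the petal length values of the class-1 samples, this would indicate whether sample 72's petal length of 4.9 cm is unusual for its class.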

&lt;p&gt;All of the experiments were performed under seed 57. It would be interesting to change the seed and repeat the same experiments. Note that the seed also affects how the iris data is split into training, evaluation, and test sets. To keep the same test data, the &lt;code&gt;load_splitted_dataset_with_eval()&lt;/code&gt; function in &lt;code&gt;custom_dataset.py&lt;/code&gt; would need to be changed:&lt;br&gt;
&lt;a href="https://github.com/ksk0629/comparison_of_dnn/blob/8498a7d15ed6a4447f13f9f277e214f4821f46a1/src/custom_dataset.py#L75-L110" rel="noopener noreferrer"&gt;https://github.com/ksk0629/comparison_of_dnn/blob/8498a7d15ed6a4447f13f9f277e214f4821f46a1/src/custom_dataset.py#L75-L110&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>Toward understanding DNN (deep neural network) well: California housing dataset</title>
      <dc:creator>Keisuke Sato</dc:creator>
      <pubDate>Sun, 06 Feb 2022 12:57:31 +0000</pubDate>
      <link>https://dev.to/ksk0629/toward-understanding-dnn-deep-neural-network-well-california-housing-dataset-3jp3</link>
      <guid>https://dev.to/ksk0629/toward-understanding-dnn-deep-neural-network-well-california-housing-dataset-3jp3</guid>
      <description>&lt;p&gt;(Updated on 12, February 2022)&lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;This article is about machine learning, written for beginners like me. I am never sure how to decide the number of layers and the number of units in each layer when I build a model. In this article, I will explore their effect using the California housing dataset.&lt;/p&gt;

&lt;p&gt;All of the following code is in my git repository. You can reproduce each experiment by cloning the repository and running the notebook.&lt;/p&gt;

&lt;p&gt;github repository: &lt;a href="https://github.com/ksk0629/comparison_of_dnn.git" rel="noopener noreferrer"&gt;comparison_of_dnn&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that this is not a "guide"; it is a memo from one beginner to other beginners. If you have any comments, suggestions, questions, etc. while reading this article, please let me know in the comments below.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;a href="https://inria.github.io/scikit-learn-mooc/python_scripts/datasets_california_housing.html" rel="noopener noreferrer"&gt;California housing dataset&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;The California housing dataset is for regression. It has eight features and one target value. We can get the dataset using the &lt;code&gt;sklearn.datasets.fetch_california_housing()&lt;/code&gt; function. The eight features are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MedInc: median income in block group&lt;/li&gt;
&lt;li&gt;HouseAge: median house age in block group&lt;/li&gt;
&lt;li&gt;AveRooms: average number of rooms per household&lt;/li&gt;
&lt;li&gt;AveBedrms: average number of bedrooms per household&lt;/li&gt;
&lt;li&gt;Population: block group population&lt;/li&gt;
&lt;li&gt;AveOccup: average number of household members&lt;/li&gt;
&lt;li&gt;Latitude: block group latitude&lt;/li&gt;
&lt;li&gt;Longitude: block group longitude&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The one target value is as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MedHouseVal: median house value for California districts,
expressed in hundreds of thousands of dollars&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As I said, this is a regression task, so I will build a model whose inputs are those features and whose output is the target value.&lt;/p&gt;

&lt;p&gt;I will not analyze this dataset carefully, just briefly using &lt;code&gt;pandas.DataFrame&lt;/code&gt; methods.&lt;/p&gt;

&lt;p&gt;Let's see the information to check if there are missing values.&lt;br&gt;
Input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./src&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;src.utils&lt;/span&gt;

&lt;span class="c1"&gt;# Load dataset
&lt;/span&gt;&lt;span class="n"&gt;callifornia_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_california_housing&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;callifornia_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;class 'pandas.core.frame.DataFrame'&amp;gt;
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   MedInc       20640 non-null  float64
 1   HouseAge     20640 non-null  float64
 2   AveRooms     20640 non-null  float64
 3   AveBedrms    20640 non-null  float64
 4   Population   20640 non-null  float64
 5   AveOccup     20640 non-null  float64
 6   Latitude     20640 non-null  float64
 7   Longitude    20640 non-null  float64
 8   MedHouseVal  20640 non-null  float64
dtypes: float64(9)
memory usage: 1.4 MB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are no missing values. Then, let's see some statistics.&lt;br&gt;
Input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;callifornia_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;         MedInc   HouseAge    AveRooms  AveBedrms    Population \
mean   3.870671  28.639486    5.429000   1.096675   1425.476744
std    1.899822  12.585558    2.474173   0.473911   1132.462122   
min    0.499900   1.000000    0.846154   0.333333      3.000000
25%    2.563400  18.000000    4.440716   1.006079    787.000000
50%    3.534800  29.000000    5.229129   1.048780   1166.000000
75%    4.743250  37.000000    6.052381   1.099526   1725.000000
max   15.000100  52.000000  141.909091  34.066667  35682.000000

        AveOccup   Latitude   Longitude  MedHouseVal
mean    3.070655  35.631861 -119.569704     2.068558
std    10.386050   2.135952    2.003532     1.153956
min     0.692308  32.540000 -124.350000     0.149990
25%     2.429741  33.930000 -121.800000     1.196000
50%     2.818116  34.260000 -118.490000     1.797000
75%     3.282261  37.710000 -118.010000     2.647250
max  1243.333333  41.950000 -114.310000     5.000010
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Comparison
&lt;/h1&gt;

&lt;p&gt;For the sake of simplicity, I suppose the following conditions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All model settings are fixed except for the number of layers and the number of units in each layer.&lt;/li&gt;
&lt;li&gt;No data preprocessing is performed.&lt;/li&gt;
&lt;li&gt;The seed is fixed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the following discussion is under those conditions, but if you want to change or remove them, it is easy to do: in most cases, all you have to do is edit &lt;code&gt;config_california.yaml&lt;/code&gt;. It has the following contents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mlflow:
  experiment_name: california
  run_name: default
dataset:
  eval_size: 0.25
  test_size: 0.25
  train_size: 0.75
  shuffle: True
dnn:
  n_layers: 3
  n_units_list:
    - 8
    - 4
    - 1
  activation_function_list:
    - relu
    - relu
    - linear
  seed: 57
dnn_train:
  epochs: 30
  batch_size: 4
  patience: 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To run an experiment under a different fixed seed, change &lt;code&gt;57&lt;/code&gt; in the seed entry to another integer; to run without fixing the seed, change it to &lt;code&gt;None&lt;/code&gt;.&lt;/p&gt;
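&lt;p&gt;The three lists in the &lt;code&gt;dnn&lt;/code&gt; block have to stay consistent with each other. Below is a small sketch of the sanity checks one might run before building the model (the dict just mirrors the YAML shown above; the repository reads the actual file, and the helper is my own, not repository code):&lt;/p&gt;

```python
# Mirror of the dnn block of config_california.yaml as a plain dict
# (hypothetical; the repository loads the actual YAML file instead).
config = {
    "n_layers": 3,
    "n_units_list": [8, 4, 1],
    "activation_function_list": ["relu", "relu", "linear"],
    "seed": 57,
}

def validate_dnn_config(cfg):
    """Basic consistency checks for the dnn block."""
    assert cfg["n_layers"] == len(cfg["n_units_list"])
    assert len(cfg["n_units_list"]) == len(cfg["activation_function_list"])
    assert cfg["n_units_list"][-1] == 1  # regression: one output unit
    assert cfg["activation_function_list"][-1] == "linear"
    return True

print(validate_dnn_config(config))  # True
```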

&lt;h2&gt;
  
  
  result
&lt;/h2&gt;

&lt;p&gt;The loss summary is as follows.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#layers&lt;/th&gt;
&lt;th&gt;training loss&lt;/th&gt;
&lt;th&gt;evaluation loss&lt;/th&gt;
&lt;th&gt;test loss&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;three (tiny #units)&lt;/td&gt;
&lt;td&gt;0.616&lt;/td&gt;
&lt;td&gt;0.565&lt;/td&gt;
&lt;td&gt;0.596&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;four&lt;/td&gt;
&lt;td&gt;0.54&lt;/td&gt;
&lt;td&gt;0.506&lt;/td&gt;
&lt;td&gt;0.53&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;five&lt;/td&gt;
&lt;td&gt;0.543&lt;/td&gt;
&lt;td&gt;1.126&lt;/td&gt;
&lt;td&gt;1.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;six&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.515&lt;/td&gt;
&lt;td&gt;0.49&lt;/td&gt;
&lt;td&gt;0.512&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;seven&lt;/td&gt;
&lt;td&gt;1.31&lt;/td&gt;
&lt;td&gt;1.335&lt;/td&gt;
&lt;td&gt;1.377&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;three (many #units)&lt;/td&gt;
&lt;td&gt;0.537&lt;/td&gt;
&lt;td&gt;0.515&lt;/td&gt;
&lt;td&gt;0.555&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The best model in terms of the test loss (in fact, of all the losses) is the six-layer one. The test losses of the five- and seven-layer models are close, but their training results are not close at all: in training the seven-layer model, the vanishing gradient problem occurred, probably because of its depth.&lt;/p&gt;

&lt;p&gt;Below, I plot the predicted values together with the true target values. The plots show that the fourth model (the six-layer one) is also the best in terms of the individual predictions.&lt;/p&gt;

&lt;p&gt;The first to fifth models have different numbers of units. The last model, which has three layers but roughly as many parameters as the fourth model (3,313 versus 3,361), was built to compare the effect of the number of layers against that of the number of units. As a result, the fourth model is better, which implies that the depth is more important than the number of units, at least for the California dataset. Notably, in the California dataset the difference between the maximum true target value and the mean is greater than the difference between the minimum and the mean. The depth of layers is probably effective for tolerance to such outliers.&lt;/p&gt;

&lt;p&gt;Further, comparing the first model and the last model shows the effect of the number of units: more units clearly help the model fit better.&lt;/p&gt;

&lt;p&gt;See the following sections for more information.&lt;/p&gt;

&lt;h2&gt;
  
  
  tiny three layers (two hidden layers plus one output layer)
&lt;/h2&gt;

&lt;p&gt;This structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_3 (Dense)             (None, 8)                 72        

 dense_4 (Dense)             (None, 4)                 36        

 dense_5 (Dense)             (None, 1)                 5         

=================================================================
Total params: 113
Trainable params: 113
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final losses are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.616&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.565&lt;/li&gt;
&lt;li&gt;test loss: 0.596&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The prediction results are as follows.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F636snwnabv19sxxdsv3a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F636snwnabv19sxxdsv3a.png" alt=" " width="755" height="520"&gt;&lt;/a&gt; &lt;br&gt;
The green line represents the predicted target values and the red line the true target values.&lt;/p&gt;

&lt;p&gt;The invisible lower-limit line is there, and some predicted values are greater than the maximum true value. This probably implies that the model is underfitting the data.&lt;/p&gt;
&lt;h2&gt;
  
  
  four layers (three hidden layers plus one output layer)
&lt;/h2&gt;

&lt;p&gt;This structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_10 (Dense)            (None, 16)                144       

 dense_11 (Dense)            (None, 8)                 136       

 dense_12 (Dense)            (None, 4)                 36        

 dense_13 (Dense)            (None, 1)                 5         

=================================================================
Total params: 321
Trainable params: 321
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final losses are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.54&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.506&lt;/li&gt;
&lt;li&gt;test loss: 0.53&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The prediction results are as follows.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qx4njjx76mp9iu9q4rg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qx4njjx76mp9iu9q4rg.png" alt=" " width="755" height="517"&gt;&lt;/a&gt;&lt;br&gt;
The green line represents the predicted target values and the red line the true target values. Note that only the first 500 values are drawn for visibility.&lt;/p&gt;

&lt;p&gt;The invisible line is still there. Fewer predicted values exceed the maximum true value than with the three-layer model, but some still do.&lt;/p&gt;
&lt;h2&gt;
  
  
  five layers (four hidden layers plus one output layer)
&lt;/h2&gt;

&lt;p&gt;This structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_19 (Dense)            (None, 32)                288       

 dense_20 (Dense)            (None, 16)                528       

 dense_21 (Dense)            (None, 8)                 136       

 dense_22 (Dense)            (None, 4)                 36        

 dense_23 (Dense)            (None, 1)                 5         

=================================================================
Total params: 993
Trainable params: 993
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final losses are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.543&lt;/li&gt;
&lt;li&gt;evaluation loss: 1.126&lt;/li&gt;
&lt;li&gt;test loss: 1.2&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The prediction results are as follows.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Founn3ryreszkarjcrolh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Founn3ryreszkarjcrolh.png" alt=" " width="755" height="517"&gt;&lt;/a&gt; &lt;br&gt;
The green line represents the predicted target values and the red line the true target values.&lt;/p&gt;

&lt;p&gt;The invisible line is still there. There are fewer high predicted values than before, while some of the predicted values are smaller than before. The model might be overfitting.&lt;/p&gt;
&lt;h2&gt;
  
  
  six layers (five hidden layers plus one output layer)
&lt;/h2&gt;

&lt;p&gt;This structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_30 (Dense)            (None, 64)                576       

 dense_31 (Dense)            (None, 32)                2080      

 dense_32 (Dense)            (None, 16)                528       

 dense_33 (Dense)            (None, 8)                 136       

 dense_34 (Dense)            (None, 4)                 36        

 dense_35 (Dense)            (None, 1)                 5         

=================================================================
Total params: 3,361
Trainable params: 3,361
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final losses are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.515&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.49&lt;/li&gt;
&lt;li&gt;test loss: 0.512&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The prediction results are as follows.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvjocd02xnca0ix4m0uh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvjocd02xnca0ix4m0uh.png" alt=" " width="755" height="517"&gt;&lt;/a&gt;&lt;br&gt;
The green line represents the predicted target values and the red line the true target values.&lt;/p&gt;

&lt;p&gt;The invisible horizontal line has collapsed a bit: it is still there, but it is certainly less flat than before. Also, there are fewer high predicted values than before.&lt;/p&gt;
&lt;h2&gt;
  
  
  seven layers (six hidden layers plus one output layer)
&lt;/h2&gt;

&lt;p&gt;This structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_43 (Dense)            (None, 128)               1152      

 dense_44 (Dense)            (None, 64)                8256      

 dense_45 (Dense)            (None, 32)                2080      

 dense_46 (Dense)            (None, 16)                528       

 dense_47 (Dense)            (None, 8)                 136       

 dense_48 (Dense)            (None, 4)                 36        

 dense_49 (Dense)            (None, 1)                 5         

=================================================================
Total params: 12,193
Trainable params: 12,193
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final losses are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 1.31&lt;/li&gt;
&lt;li&gt;evaluation loss: 1.335&lt;/li&gt;
&lt;li&gt;test loss: 1.377&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The prediction results are as follows.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnmp1mzdnelkl6wrt8ytz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnmp1mzdnelkl6wrt8ytz.png" alt=" " width="755" height="517"&gt;&lt;/a&gt; &lt;br&gt;
The green line represents the predicted target values and the red line the true target values.&lt;/p&gt;

&lt;p&gt;The model output a constant value for all inputs; the vanishing gradient problem probably occurred. Indeed, the training loss converged almost immediately:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzcp1r1fivrecf38avdpb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzcp1r1fivrecf38avdpb.png" alt=" " width="800" height="221"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  three layers with almost 3,300 parameters (two hidden layers plus one output layer)
&lt;/h2&gt;

&lt;p&gt;This structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_67 (Dense)            (None, 72)                648       

 dense_68 (Dense)            (None, 36)                2628      

 dense_69 (Dense)            (None, 1)                 37        

=================================================================
Total params: 3,313
Trainable params: 3,313
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final losses are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.537&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.515&lt;/li&gt;
&lt;li&gt;test loss: 0.555&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The prediction results are as follows.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdxfd531be8f0e9he31ff.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdxfd531be8f0e9he31ff.png" alt=" " width="755" height="517"&gt;&lt;/a&gt; &lt;br&gt;
As with the six-layer model, the invisible line is less flat than in the other models. On the other hand, the number of high predicted values is clearly greater than in the six-layer model.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;I explored the effect of the number of layers and the number of units with the California dataset. As a result, I found that more layers and more units help the model fit better, but too many layers cause the vanishing gradient problem.&lt;/p&gt;

&lt;p&gt;It was valuable to write the code and perform the experiments myself, but unfortunately, I am still not sure how to decide these numbers. I will explore the effect with other datasets.&lt;/p&gt;

&lt;p&gt;Again, I would appreciate any comments, suggestions, questions, etc.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>Using MLflow on google colaboratory with github to build cosy environment: building</title>
      <dc:creator>Keisuke Sato</dc:creator>
      <pubDate>Fri, 28 Jan 2022 11:43:54 +0000</pubDate>
      <link>https://dev.to/ksk0629/using-mlflow-on-google-colaboratory-with-github-to-build-cosy-environment-building-jb5</link>
      <guid>https://dev.to/ksk0629/using-mlflow-on-google-colaboratory-with-github-to-build-cosy-environment-building-jb5</guid>
      <description>&lt;p&gt;(Updated on 19, March 2022)&lt;br&gt;
(Updated on 6, February 2022)&lt;br&gt;
(Updated on 30, January 2022)&lt;/p&gt;
&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;I built my first cosy environment. The following is how I built it.&lt;/p&gt;

&lt;p&gt;github repository: &lt;a href="https://github.com/ksk0629/template_with_mlflow" rel="noopener noreferrer"&gt;template_with_mlflow&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Preparation
&lt;/h1&gt;

&lt;p&gt;From here on, I assume you have Google, ngrok, and GitHub accounts. If you haven't, please create them before reading the following.&lt;/p&gt;

&lt;p&gt;You have to upload a YAML file, &lt;code&gt;general_config.yaml&lt;/code&gt;, containing your GitHub and ngrok information, as in the following image.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4d5e4sa1ydmx403u1e6n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4d5e4sa1ydmx403u1e6n.png" alt=" " width="289" height="157"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Its contents look like the following.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;github&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your_username&lt;/span&gt;
  &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your_email@gmail.com&lt;/span&gt;
  &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your_personal_access_token&lt;/span&gt;
&lt;span class="na"&gt;ngrok&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ngrok_authentication_token&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you don't have a personal access token yet, create one by following [&lt;a href="https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token" rel="noopener noreferrer"&gt;Creating a personal access token&lt;/a&gt;]. You can find your ngrok authentication token on your ngrok top page:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiaz5jsegdj1ae8iie2zx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiaz5jsegdj1ae8iie2zx.png" alt=" " width="789" height="175"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Process
&lt;/h1&gt;

&lt;p&gt;I'll show how I built my cosy environment.&lt;/p&gt;

&lt;p&gt;1: Create a new &lt;a href="https://colab.research.google.com/?utm_source=scs-index" rel="noopener noreferrer"&gt;google colaboratory notebook&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fof921z7oi9odxpvfpvuo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fof921z7oi9odxpvfpvuo.png" alt=" " width="800" height="552"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2: Install and import mlflow and pyngrok, which are used to visualize your model information, by running the following code in the Google Colaboratory notebook.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;mlflow&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;pyngrok&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyngrok&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ngrok&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;yaml&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;3: Set your information by running the following code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Mount my google drive
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.colab&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;drive&lt;/span&gt;
&lt;span class="n"&gt;drive_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/content/gdrive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;drive&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drive_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load the general config
&lt;/span&gt;&lt;span class="n"&gt;config_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drive_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MyDrive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;general_config.yaml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;yml&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;yaml&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;safe_load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;yml&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;config_github&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;github&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;config_ngrok&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ngrok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Set git configs
&lt;/span&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;config_github&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;config_github&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

&lt;span class="c1"&gt;# Clone the repository
&lt;/span&gt;&lt;span class="n"&gt;repository_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;template_with_mlflow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;git_repository&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://github.com/ksk0629/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;repository_name&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.git&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;repository_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/content/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;repository_name&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;clone&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;git_repository&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Change the current directory to the cloned directory
&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;cd&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;repository_name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Checkout branch
&lt;/span&gt;&lt;span class="n"&gt;branch_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;checkout&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;branch_name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Pull
&lt;/span&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;pull&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can replace &lt;code&gt;"template_with_mlflow"&lt;/code&gt; with the name of the repository you want to clone.&lt;/p&gt;
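
&lt;p&gt;The exact contents of &lt;code&gt;general_config.yaml&lt;/code&gt; aren't shown above, but judging from the keys the code reads, a minimal sketch would look like the following (all values are placeholders you'd replace with your own):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;github:
  email: you@example.com
  username: your-github-username
  token: your-github-personal-access-token
ngrok:
  token: your-ngrok-authtoken
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
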

&lt;p&gt;4: Train your model with a script that contains MLflow logging code, like the following.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;experiment_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mnist with cnn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;run_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;first run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;validation_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;
&lt;span class="n"&gt;epochs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
&lt;span class="n"&gt;batch_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2048&lt;/span&gt;
&lt;span class="n"&gt;n_features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;784&lt;/span&gt;
&lt;span class="n"&gt;n_hidden&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="n"&gt;learning_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;
&lt;span class="n"&gt;seed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;57&lt;/span&gt;

&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;mlflow_example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{experiment_name}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{run_name}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;validation_size&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n_hidden&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n_features&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;learning_rate&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;experiment_name = "mnist with cnn"
run_name = "second run"
validation_size = 0.2
epochs = 1000
batch_size = 2048
n_features = 784
n_hidden = 300
learning_rate = 0.01
seed = 57

!python ./src/mlflow_example.py "{experiment_name}" "{run_name}" {seed} {validation_size} {n_hidden} {n_features} {epochs} {batch_size} {learning_rate}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can train whatever models you want.&lt;/p&gt;
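
&lt;p&gt;The body of &lt;code&gt;mlflow_example.py&lt;/code&gt; isn't shown here, but the invocation above fixes its command-line contract. A minimal sketch of the argument parsing such a script might use (the parameter names are my assumptions based on the notebook cell; the real script would then pass these values to MLflow):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import argparse

# Positional arguments, in the same order as the notebook invocation:
# experiment_name run_name seed validation_size n_hidden n_features epochs batch_size learning_rate
parser = argparse.ArgumentParser(description="Train a model and log it with MLflow.")
parser.add_argument("experiment_name")
parser.add_argument("run_name")
parser.add_argument("seed", type=int)
parser.add_argument("validation_size", type=float)
parser.add_argument("n_hidden", type=int)
parser.add_argument("n_features", type=int)
parser.add_argument("epochs", type=int)
parser.add_argument("batch_size", type=int)
parser.add_argument("learning_rate", type=float)
args = parser.parse_args()

# A real script would then do something like:
#   mlflow.set_experiment(args.experiment_name)
#   with mlflow.start_run(run_name=args.run_name):
#       mlflow.log_param("n_hidden", args.n_hidden)
#       ...train the model, then mlflow.log_metric("val_accuracy", ...)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
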

&lt;p&gt;5: Run the MLflow UI and view your models' information through ngrok.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Run mlflow
&lt;/span&gt;&lt;span class="nf"&gt;get_ipython&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;system_raw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mlflow ui --port 5000 &amp;amp;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# run tracking UI in the background
&lt;/span&gt;
&lt;span class="c1"&gt;# Terminate open tunnels if exist
&lt;/span&gt;&lt;span class="n"&gt;ngrok&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;kill&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Setting the authtoken of ngrok
&lt;/span&gt;&lt;span class="n"&gt;ngrok&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_auth_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config_ngrok&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Open an HTTPs tunnel on port 5000 for http://localhost:5000
&lt;/span&gt;&lt;span class="n"&gt;ngrok_tunnel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ngrok&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;proto&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bind_tls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MLflow Tracking UI:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ngrok_tunnel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;public_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should get a public URL in the output cell:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MLflow Tracking UI: https://cexx-xx-xxx-xxx-xx.ngrok.io
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see your models' information on a page like the following.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9lbcyozvsohf6kenxvo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9lbcyozvsohf6kenxvo.png" alt=" " width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;6: Commit and push your changes to the remote repository.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;add_objects&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;repository_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mlruns&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;add&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;add_objects&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;commit_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Add new mlruns&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;commit&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{commit_msg}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;config_github&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;token&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;@github.com/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;config_github&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;repository_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.git&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;remote&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="n"&gt;origin&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;push&lt;/span&gt; &lt;span class="n"&gt;origin&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;branch_name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Of course, you can choose which files to commit and change the commit message to whatever you like.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Now you've got your cosy environment! By the way, I said&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I have to add the commit number information to the MLflow information after pushing new source code to a remote repository.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;but, as it turns out, MLflow is smarter than that: I didn't do anything, yet the git commit hash was already recorded!&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4p2f2e9ctqcrbw0fzdbd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4p2f2e9ctqcrbw0fzdbd.png" alt=" " width="800" height="94"&gt;&lt;/a&gt;&lt;/p&gt;
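
&lt;p&gt;If you want to read that commit hash programmatically rather than from the UI, MLflow stores it as a run tag. A sketch, assuming the &lt;code&gt;mlflow&lt;/code&gt; package is installed and &lt;code&gt;run_id&lt;/code&gt; is the ID of a finished run:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import mlflow

# MLflow records the commit of the repository the run was launched from
# under the reserved tag "mlflow.source.git.commit".
run = mlflow.get_run(run_id)  # run_id is hypothetical here
commit = run.data.tags.get("mlflow.source.git.commit")
print(commit)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
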

</description>
      <category>mlflow</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>mlops</category>
    </item>
    <item>
      <title>Using MLflow on google colaboratory with github to build cosy environment: design</title>
      <dc:creator>Keisuke Sato</dc:creator>
      <pubDate>Tue, 25 Jan 2022 12:11:21 +0000</pubDate>
      <link>https://dev.to/ksk0629/using-mlflow-on-google-colaboratory-with-gitlab-to-build-cosy-environment-1-design-2217</link>
      <guid>https://dev.to/ksk0629/using-mlflow-on-google-colaboratory-with-gitlab-to-build-cosy-environment-1-design-2217</guid>
      <description>&lt;p&gt;(Updated on 19, March 2022)&lt;br&gt;
(Updated on 6, February 2022)&lt;br&gt;
(Updated on 28, January 2022)&lt;/p&gt;

&lt;h1&gt;
  
  
  What do I want to do?
&lt;/h1&gt;

&lt;p&gt;It's troublesome to manage all the settings (e.g., epochs, optimizer, etc.) for building a model, and I've been interested in Kaggle these days. Before I join any competitions, I'll build a cosy environment.&lt;/p&gt;

&lt;h1&gt;
  
  
  Pieces of my cosy environment
&lt;/h1&gt;

&lt;p&gt;I'll use the following library and four services.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MLflow Tracking&lt;/li&gt;
&lt;li&gt;Google Colaboratory&lt;/li&gt;
&lt;li&gt;Google Drive&lt;/li&gt;
&lt;li&gt;GitHub&lt;/li&gt;
&lt;li&gt;ngrok&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MLflow and Google Drive are for managing the settings used to build models, Google Colaboratory is for building the models, GitHub is for managing the source code, and the MLflow results are viewed through ngrok.&lt;/p&gt;

&lt;h1&gt;
  
  
  Concept
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq7q88ntoj8cf0q8yxa82.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq7q88ntoj8cf0q8yxa82.png" alt=" " width="800" height="232"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Apparently, MLflow Projects can connect to a remote Git repository and manage the source code, but at first, I just directly clone a repository and push new changes from the Google Colaboratory notebook. Because of that, I have to add the commit number information to the MLflow information after pushing new source code to a remote repository.&lt;/p&gt;

</description>
      <category>mlflow</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>mlops</category>
    </item>
  </channel>
</rss>
