<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Keisuke Sato</title>
    <description>The latest articles on DEV Community by Keisuke Sato (@ksk0629).</description>
    <link>https://dev.to/ksk0629</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F803240%2F6fec78ff-4aa0-485e-b007-91d41f0650f2.jpg</url>
      <title>DEV Community: Keisuke Sato</title>
      <link>https://dev.to/ksk0629</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ksk0629"/>
    <language>en</language>
    <item>
      <title>Qiskit Aer vs MQT DDSIM</title>
      <dc:creator>Keisuke Sato</dc:creator>
      <pubDate>Tue, 08 Apr 2025 02:04:09 +0000</pubDate>
      <link>https://dev.to/ksk0629/qiskit-aer-vs-mqt-ddsim-3k50</link>
      <guid>https://dev.to/ksk0629/qiskit-aer-vs-mqt-ddsim-3k50</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Hi there, this article is about &lt;code&gt;qiskit_aer&lt;/code&gt; and &lt;code&gt;mqt.ddsim&lt;/code&gt;, two Python libraries for simulating quantum circuits on one's laptop. &lt;code&gt;qiskit_aer&lt;/code&gt; is known as a high-performance quantum computing simulator with realistic noise models [&lt;a href="https://qiskit.github.io/qiskit-aer/" rel="noopener noreferrer"&gt;Qiskit Aer documentation&lt;/a&gt;]. Meanwhile, &lt;code&gt;mqt.ddsim&lt;/code&gt; is a decision-diagram-based simulator whose efficiency was reported to outperform several existing simulators in the proposal paper, &lt;a href="https://ieeexplore.ieee.org/document/8355954" rel="noopener noreferrer"&gt;Advanced Simulation of Quantum Computations&lt;/a&gt; (&lt;a href="https://arxiv.org/abs/1707.00865" rel="noopener noreferrer"&gt;the preprint&lt;/a&gt; is also available).&lt;/p&gt;

&lt;p&gt;By the way, as you already know or will find easily enough, both simulators have several modes. In this article, I only focus on their Quantum Assembly Language (QASM) simulators.&lt;/p&gt;

&lt;p&gt;I was wondering which one was faster. Thus, I performed a quick experiment and noted down the result here.&lt;/p&gt;

&lt;p&gt;Some parts of this topic are still a little beyond me. Please feel free to comment if you find any mistakes or room for improvement.&lt;/p&gt;

&lt;h2&gt;
  
  
  Experimental Result
&lt;/h2&gt;

&lt;p&gt;This section is for readers who would like to see the result first. Life is busy, and the less time things take, the more time we have for other things, especially for a random article like this one.&lt;/p&gt;

&lt;p&gt;In my limited experimental settings, &lt;code&gt;mqt.ddsim&lt;/code&gt; significantly outperformed &lt;code&gt;qiskit_aer&lt;/code&gt;. The following graphs show you how much faster &lt;code&gt;mqt.ddsim&lt;/code&gt; was.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphnrvnzppveoz0yafwev.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphnrvnzppveoz0yafwev.png" alt="execution time vs number of qubits" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Left) The average success probabilities. Qiskit Aer does not work with 31 qubits on my laptop, whose specs are noted later. (Right) The average execution times as the number of qubits increases. From 25 qubits, the execution times of Qiskit Aer increase significantly.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjow9mo5y4tbjdscesu9h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjow9mo5y4tbjdscesu9h.png" alt="execution time vs number of adders" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;(Left) The average success probabilities. (Right) The average execution times as the number of adders increases. The times for Qiskit Aer grow linearly.&lt;/p&gt;

&lt;p&gt;The specs of my laptop are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model: MacBook Air&lt;/li&gt;
&lt;li&gt;Chip: M1&lt;/li&gt;
&lt;li&gt;Memory: 16 GB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I conducted the first experiment, which shows the time vs the number of qubits, with a quantum circuit containing only one adder based on the quantum Fourier transform. For the second experiment, which shows the time vs the number of adders, I fixed the number of qubits at 21, since 21 is the largest number of qubits with which &lt;code&gt;qiskit_aer&lt;/code&gt; works relatively fast.&lt;/p&gt;

&lt;p&gt;Also, I employed an adder for this experiment because adders are among the fundamental building blocks of quantum arithmetic. One can construct other arithmetic operations, such as multiplication and exponentiation, from them (e.g., [&lt;a href="https://www.rintonpress.com/journals/doi/QIC3.2-8.html" rel="noopener noreferrer"&gt;Circuit for Shor's algorithm using 2n+3 qubits&lt;/a&gt; (&lt;a href="https://arxiv.org/abs/quant-ph/0205095" rel="noopener noreferrer"&gt;the preprint&lt;/a&gt;)]).&lt;/p&gt;

&lt;p&gt;It is worth noting that &lt;code&gt;mqt.ddsim&lt;/code&gt; is known to take significant time in the absolute worst case, as reported in &lt;a href="https://ieeexplore.ieee.org/document/8355954" rel="noopener noreferrer"&gt;the original paper&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quantum Computing
&lt;/h2&gt;

&lt;p&gt;Hereafter, I’ll write a short introduction to quantum computing and show the result again with a little more in-depth explanation.&lt;/p&gt;

&lt;p&gt;First, we need to know what quantum computing is, and the answer is computing with a quantum computer. That’s pretty easy, isn’t it? Then one probably wonders what a quantum computer is. It is a computer based on quantum mechanics. The difference between classical and quantum computers is what they are based on: whilst quantum computers work in a quantum manner, classical computers, the computers we use every day, work based on classical physics. Unfortunately, my knowledge is limited here, so I cannot precisely explain the difference between them, but the important thing is that quantum mechanics has strange yet powerful phenomena that promise quantum computing will be faster than classical computing for certain fields and problems. One should be aware that not everything works better on quantum computers; but some things do work better on them, and there the improvements are significant.&lt;/p&gt;

&lt;p&gt;However, our actual quantum computers are extremely limited compared to the theoretical results. Current classical computers have a large number of bits, the fundamental units that store all data, as well as error correction, which is why we can comfortably use our laptops every day. Current quantum computers have a small number of qubits, the quantum counterpart of bits, and little or, as far as I know, no error correction. That is why it is quite difficult to conduct physical experiments involving a large number of qubits in this era, called the Noisy Intermediate-Scale Quantum (NISQ) era. We are all waiting for the era of Fault-Tolerant Quantum Computing (FTQC), which will allow us to use more qubits with error correction techniques.&lt;/p&gt;

&lt;p&gt;Although we are in the NISQ era, there are some things we can do with our laptops. One of them is simulation. Simulating quantum computers, more precisely quantum circuits, is one way to see how proposed quantum algorithms work. But, as you might expect, simulation is not efficient at all. As one qubit is represented by a two-dimensional vector, taking $n$ qubits into account requires a $2^n$-dimensional vector, which blows up very quickly. To make simulation more efficient, several types of simulators have been proposed.&lt;/p&gt;
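
&lt;p&gt;To get a feel for how quickly this blows up, here is a small back-of-the-envelope calculation in plain Python, assuming each complex amplitude takes 16 bytes, as in a typical double-precision statevector simulator:&lt;/p&gt;

```python
# Size of a dense statevector for n qubits: 2**n complex amplitudes.
# Assuming 16 bytes per amplitude (two 64-bit floats), as in typical
# double-precision statevector simulators.
BYTES_PER_AMPLITUDE = 16

def statevector_bytes(num_qubits: int) -> int:
    """Memory needed to store a dense statevector of num_qubits qubits."""
    return (2 ** num_qubits) * BYTES_PER_AMPLITUDE

for n in (10, 20, 30, 31):
    print(f"{n} qubits: {statevector_bytes(n) / 2 ** 30:.6f} GiB")
```

&lt;p&gt;At 31 qubits a dense statevector already needs 32 GiB, which is consistent with a 16 GB laptop struggling around that size.&lt;/p&gt;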

&lt;h2&gt;
  
  
  Qiskit
&lt;/h2&gt;

&lt;p&gt;We saw what quantum computing is and why simulators are needed in the previous section. In this section, we’ll see how we can actually write a program that works with quantum computers.&lt;/p&gt;

&lt;p&gt;There are several ways to program quantum computers. One of them is &lt;a href="https://docs.quantum.ibm.com/guides" rel="noopener noreferrer"&gt;Qiskit&lt;/a&gt;, an open-source Python SDK for working with quantum computers. As we are in the NISQ era, it allows us to simulate programs on our laptops as well as to run them on IBM’s actual quantum computers. The latter is super cool, but actual quantum computers are not the main topic here. There are several ways to simulate programs on your laptop, and Qiskit itself provides some of them. There is also &lt;a href="https://qiskit.github.io/qiskit-aer/" rel="noopener noreferrer"&gt;Qiskit Aer&lt;/a&gt;, which is part of the Qiskit project but is now installed separately from Qiskit itself. Qiskit Aer is provided as a set of high-performance quantum computing simulators with realistic noise models. As another option, I recently found &lt;a href="https://mqt.readthedocs.io/projects/ddsim/en/latest/" rel="noopener noreferrer"&gt;MQT DDSIM&lt;/a&gt;, a decision-diagram-based simulator that was experimentally shown to outperform other existing simulators in &lt;a href="https://ieeexplore.ieee.org/document/8355954" rel="noopener noreferrer"&gt;the original paper&lt;/a&gt;. It is noteworthy that its theoretical worst case is significantly worse than that of some other existing, so-called array-based methods; but within the authors’ experiments, their proposal was the best.&lt;/p&gt;

&lt;p&gt;So, I was wondering which one is better, Qiskit Aer or MQT DDSIM. I have to admit that I could not find out what Qiskit Aer is based on, so technically it could be one of the methods the DDSIM authors compared against. But it has also been a while since the paper came out, and Qiskit Aer is constantly maintained, so I believed it was worth conducting some experiments.&lt;/p&gt;

&lt;h1&gt;
  
  
  Results
&lt;/h1&gt;

&lt;p&gt;I conducted the following experiments with my laptop whose specs are&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Model: MacBook Air&lt;/li&gt;
&lt;li&gt;Chip: M1&lt;/li&gt;
&lt;li&gt;Memory: 16 GB&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As I mentioned in one of the previous sections, there are several types of simulators in &lt;code&gt;qiskit_aer&lt;/code&gt; and &lt;code&gt;mqt.ddsim&lt;/code&gt;. However, I compared only &lt;code&gt;qiskit_aer.AerSimulator&lt;/code&gt; and &lt;code&gt;ddsim.qasmsimulator.QasmSimulatorBackend&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Execution time vs the number of qubits
&lt;/h2&gt;

&lt;p&gt;In this experiment, I investigated the relation between the execution time and the number of qubits. For that, I employed &lt;a href="https://docs.quantum.ibm.com/api/qiskit/qiskit.circuit.library.DraperQFTAdder" rel="noopener noreferrer"&gt;&lt;code&gt;qiskit.circuit.library.DraperQFTAdder&lt;/code&gt;&lt;/a&gt;. Quantum adders are among the fundamental building blocks of other arithmetic circuits, and arithmetic is a fundamental sub-routine of some quantum algorithms, so I believe it is not the worst choice. &lt;a href="https://docs.quantum.ibm.com/api/qiskit/qiskit.circuit.library.DraperQFTAdder" rel="noopener noreferrer"&gt;&lt;code&gt;qiskit.circuit.library.DraperQFTAdder&lt;/code&gt;&lt;/a&gt; takes the number of state qubits, which is the number of bits in each input register, as an argument. I observed the execution time whilst varying the number of state qubits. Also, I set the &lt;code&gt;kind&lt;/code&gt; argument to &lt;code&gt;"half"&lt;/code&gt;. This setting adds one more qubit to the circuit to prevent overflow, hence the number of qubits in the circuit is $2n+1$, where $n$ is the number of state qubits. The experiment for each number of qubits was performed 10 times, and the graphs below show the averages.&lt;/p&gt;

&lt;p&gt;The result of this experiment is as follows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphnrvnzppveoz0yafwev.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fphnrvnzppveoz0yafwev.png" alt="execution time vs number of qubits" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The left graph shows that both &lt;code&gt;AerSimulator&lt;/code&gt; and &lt;code&gt;QasmSimulatorBackend&lt;/code&gt; correctly added numbers for every number of qubits I attempted. However, because of the limited memory of my laptop, &lt;code&gt;AerSimulator&lt;/code&gt; was not able to simulate the circuit with 31 qubits, whilst &lt;code&gt;QasmSimulatorBackend&lt;/code&gt; was. The right graph shows that &lt;code&gt;QasmSimulatorBackend&lt;/code&gt; consistently simulated the adder very fast, whilst the execution times of &lt;code&gt;AerSimulator&lt;/code&gt; shot up. As &lt;code&gt;AerSimulator&lt;/code&gt; did not work with 31 qubits, its last blue point represents the execution time, around 250 secs, with $29 (= 2 * 14 + 1)$ qubits.&lt;/p&gt;

&lt;p&gt;A single adder is not a large circuit in terms of the number of gates, yet &lt;code&gt;AerSimulator&lt;/code&gt; did not manage to handle it.&lt;/p&gt;

&lt;h2&gt;
  
  
  Execution time vs the number of adders
&lt;/h2&gt;

&lt;p&gt;In this experiment, I investigated the relation between the execution time and the number of adders with a fixed number of qubits, 21. I chose 21 based on the previous result: it is relatively large, yet the execution times of the two simulators were still quite close, at least on the graph. As the adder, &lt;a href="https://docs.quantum.ibm.com/api/qiskit/qiskit.circuit.library.DraperQFTAdder" rel="noopener noreferrer"&gt;&lt;code&gt;qiskit.circuit.library.DraperQFTAdder&lt;/code&gt;&lt;/a&gt; was again employed. This time, I applied the adder multiple times and measured how much time the two simulators took. The experiment for each number of adders was performed 10 times, and the graphs below show the averages.&lt;/p&gt;

&lt;p&gt;The result is as follows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4hf0g1zwi03j117spezq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4hf0g1zwi03j117spezq.png" alt="execution time vs number of adders" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Again, the left graph shows that both simulators simulated correctly, even when multiple adders were applied. The right graph shows that the execution time of &lt;code&gt;AerSimulator&lt;/code&gt; grew linearly as the number of adders increased, whilst &lt;code&gt;QasmSimulatorBackend&lt;/code&gt; took (at least on the graph) almost constant time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Additional experiments on DDSIM
&lt;/h2&gt;

&lt;p&gt;The graphs above show that &lt;code&gt;QasmSimulatorBackend&lt;/code&gt; works in almost constant time. I wondered whether the experimental settings allowed the simulator to work in exactly constant time in theory. Thus, I conducted two additional experiments, one for each setting: the same experiments on &lt;code&gt;QasmSimulatorBackend&lt;/code&gt;, but with larger cases.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1xag75loabkxrdx9mh9w.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1xag75loabkxrdx9mh9w.png" alt="additional execution time vs number of qubits" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fplm6pmr3v7c60fzdxez2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fplm6pmr3v7c60fzdxez2.png" alt="additional execution time vs number of adders" width="800" height="320"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As the graphs show, both settings eventually become difficult to run on a personal laptop. Specifically, the experiment with the 101-qubit setting took more than a day without finishing, so I had to cancel it and compromise at 85 qubits.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion and a little bit more
&lt;/h1&gt;

&lt;p&gt;I conducted simple experiments with adders suggested by Draper on my personal MacBook Air (M1 chip, 16 GB memory) to see the speed difference between the QASM simulators of &lt;code&gt;qiskit_aer&lt;/code&gt; and &lt;code&gt;mqt.ddsim&lt;/code&gt;. As a result, &lt;code&gt;mqt.ddsim&lt;/code&gt; simulated the circuits much faster than &lt;code&gt;qiskit_aer&lt;/code&gt;. Moreover, I performed additional experiments with only &lt;code&gt;mqt.ddsim&lt;/code&gt;. The additional experiments were to&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Verify that the settings were not overly favourable for &lt;code&gt;mqt.ddsim&lt;/code&gt;, since in the previous experiments &lt;code&gt;mqt.ddsim&lt;/code&gt; seemed to simulate every circuit in almost constant time, and I suspected the settings might allow it to work in constant time.&lt;/li&gt;
&lt;li&gt;See its limits if the settings did not allow it to work in constant time.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;As I have written several times, all the experiments are quite limited in terms of the circuits I employed. Hence, one cannot conclude from them that &lt;code&gt;mqt.ddsim&lt;/code&gt; is absolutely better than &lt;code&gt;qiskit_aer&lt;/code&gt;. However, the results show that there are at least cases where &lt;code&gt;mqt.ddsim&lt;/code&gt; is significantly faster.&lt;/p&gt;

&lt;p&gt;For future investigation, it would be nice to perform experiments with richer circuits, for instance more complex arithmetic as well as multiple rotation gates. Rotation gates are often used in quantum machine learning, especially in Variational Quantum Classifiers (VQCs). Most of those models were designed to work in the NISQ era, so it is relatively easy to run experiments with them; yet, because of the amount of data involved, accessing actual quantum computers is not easy. Therefore, simulating those quantum machine learning algorithms is still one of the main ways to measure their performance.&lt;/p&gt;

&lt;p&gt;All the experiments were done in a Jupyter notebook, which is available at &lt;a href="https://github.com/ksk0629/backend_comparison" rel="noopener noreferrer"&gt;https://github.com/ksk0629/backend_comparison&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;By the way, I came across MQT DDSIM when I was reading &lt;a href="https://arxiv.org/abs/2503.23941" rel="noopener noreferrer"&gt;Choco-Q: Commute Hamiltonian-based QAOA for Constrained Binary Optimization&lt;/a&gt; and checking &lt;a href="https://github.com/JanusQ/Choco-Q" rel="noopener noreferrer"&gt;their implementation&lt;/a&gt;. It is a completely different topic, yet the paper was really interesting.&lt;/p&gt;

</description>
      <category>python</category>
      <category>quantumcomputer</category>
      <category>qiskit</category>
    </item>
    <item>
      <title>Quantum-Classical Machine Learning: Quanvolutional Neural Network</title>
      <dc:creator>Keisuke Sato</dc:creator>
      <pubDate>Thu, 26 Sep 2024 22:42:43 +0000</pubDate>
      <link>https://dev.to/ksk0629/quantum-classical-machine-learning-quanvolutional-neural-network-bcl</link>
      <guid>https://dev.to/ksk0629/quantum-classical-machine-learning-quanvolutional-neural-network-bcl</guid>
      <description>&lt;p&gt;&lt;strong&gt;This post is about the paper &lt;a href="https://arxiv.org/abs/1904.04767" rel="noopener noreferrer"&gt;&lt;em&gt;Quanvolutional Neural Networks: Powering Image Recognition with Quantum Circuits&lt;/em&gt;&lt;/a&gt; and my implementation based on it. I have written this post based on my understanding. I would greatly appreciate your comments and any suggestions for modifying this post or my implementation in the GitHub repository. If you are interested, please refer to the paper yourself.&lt;/strong&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Introduction
&lt;/h2&gt;

&lt;p&gt;Quantum machine learning (QML) has rapidly grown in response to advancements in quantum computing research. Research in QML explores various directions, one of which involves rewriting classical methods or architectures in a quantum manner.&lt;/p&gt;

&lt;p&gt;The Quanvolutional Neural Network (QNN) is analogous to the classical Convolutional Neural Network (CNN). As the name suggests, the most distinctive feature of a CNN is the convolutional layer. In the case of the QNN, it is a hybrid quantum-classical machine learning model where the convolutional layer is replaced by a quanvolutional layer. The authors proposed the concept of the quanvolutional layer, which has exactly the same parameters as the convolutional layer. In theory, any convolutional layer in any CNN can be replaced with a quanvolutional layer.&lt;/p&gt;

&lt;p&gt;The primary contribution of the original paper is the concept of the Quanvolutional Neural Network, rather than its practical realisation. Nevertheless, I will also discuss the implementation proposed in the paper.&lt;/p&gt;

&lt;p&gt;For those who love coding, I have shared my GitHub repository containing the QNN implementation here: &lt;a href="https://github.com/ksk0629/quanvolutional_neural_network" rel="noopener noreferrer"&gt;https://github.com/ksk0629/quanvolutional_neural_network&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Convolutional Layer
&lt;/h2&gt;

&lt;p&gt;The convolutional layer allows us to exploit spatial locality and translational invariance in data. Let us have a look at how a convolutional layer works. Consider the following data, which has a 4x4 shape:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnbjbjhugri5t87fvmolb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnbjbjhugri5t87fvmolb.png" alt="original data" width="410" height="422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The convolutional layer uses a sliding window of a certain size. Here, we will use the simplest sliding window, which is 3x3:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmha4sesulyi9hwy6uc5a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmha4sesulyi9hwy6uc5a.png" alt="sliding window" width="310" height="314"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The layer applies the sliding window to each local region, i.e., it exploits spatial locality. When the sliding window is applied to the first section, as shown below, we calculate a new value: 1x1 + 1x0 + 1x0 + 1x0 + 1x1 + 1x1 + 1x0 + 1x0 + 1x1 = 4.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv60cu4pt0ans2060wgqk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv60cu4pt0ans2060wgqk.png" alt="applying the sliding window to the first section" width="412" height="422"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now, imagine that each element of the original data is shifted one pixel to the right. There are various ways to fill the left edge, but here we simply use 0:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1o4vgaqarcwswnocwd45.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1o4vgaqarcwswnocwd45.png" alt="translated original data" width="410" height="424"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As you may notice, the second section is identical to the first section of the original data:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fweiqone64nzds9ei7gbn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fweiqone64nzds9ei7gbn.png" alt="applying the sliding window to the second section of the translated original data" width="410" height="424"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This demonstrates how the layer exploits translational invariance in data. The convolutional layer uses many convolutional filters, which are sliding windows. Each filter is different, and these filters are applied to the input data independently. Suppose the layer has 50 filters. The output of the layer would then consist of 50 new data sets.&lt;/p&gt;

&lt;h2&gt;
  
  
  Quanvolutional Layer
&lt;/h2&gt;

&lt;p&gt;The QNN features the &lt;em&gt;quanvolutional layer&lt;/em&gt;, which is analogous to the convolutional layer. The quanvolutional layer consists of many quantum circuits, referred to as &lt;em&gt;quanvolutional filters&lt;/em&gt;, which are analogous to sliding windows. If the window size is 3x3, the number of qubits will be 3x3 = 9. In the earlier example of the convolutional filter, we simply multiplied each entry in the window by the corresponding entry in the data and summed the results. The quanvolutional filter operates in a similar way: each entry of the window corresponds to a qubit in the quantum circuit. However, instead of performing multiplication and addition, the quanvolutional filter computes a new value based on the quantum gates applied to the circuit.&lt;/p&gt;

&lt;p&gt;To process classical data—i.e., the data used by our current computers—we need to encode the data onto qubits. There are several encoding methods, and it is up to the user to choose which one to employ. After the data is encoded, the computation phase occurs, followed by the measurement of each qubit. Since quanvolutional layers and convolutional layers are interchangeable, the result of each quanvolutional filter computation must be a scalar. Therefore, the measurement outcome must be decoded, and again, there are several decoding methods available.&lt;/p&gt;

&lt;p&gt;The following circuit represents an example of a quantum filter:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76086uerobfvmelzizl9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F76086uerobfvmelzizl9.png" alt="example of quantum filter" width="800" height="467"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  One Realisation of the QNN
&lt;/h2&gt;

&lt;p&gt;I believe the most important contribution of the original paper is the introduction of the QNN concept. This concept provides a perfect analogy to CNN. However, the original paper also presents one realisation of the QNN. As it is meant to test whether there is a quantum advantage compared to CNN, the implementation is quite simple; the quantum part of the realisation is not even trainable, relying instead on the power of randomly initialised filters.&lt;/p&gt;

&lt;p&gt;The QNN is constructed with the following layers in order:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Quanvolutional Layer&lt;/li&gt;
&lt;li&gt;Convolutional Layer&lt;/li&gt;
&lt;li&gt;Pooling Layer&lt;/li&gt;
&lt;li&gt;Convolutional Layer&lt;/li&gt;
&lt;li&gt;Pooling Layer&lt;/li&gt;
&lt;li&gt;Fully Connected Layer&lt;/li&gt;
&lt;li&gt;Dropout Layer&lt;/li&gt;
&lt;li&gt;Fully Connected Layer&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The next question regarding this realisation is how the quanvolutional layer is constructed, or more specifically, how the quanvolutional filter is constructed, as the quanvolutional layer is simply a layer containing multiple quanvolutional filters. Here is how it works: first, the kernel size is fixed at 3x3 = 9, meaning each filter has 9 qubits.&lt;/p&gt;

&lt;h3&gt;
  
  
  Encoding method
&lt;/h3&gt;

&lt;p&gt;Each qubit encodes one input scalar. If the input scalar is greater than 0, the qubit is encoded into the quantum state |1&amp;gt;; otherwise, it is encoded into |0&amp;gt;.&lt;/p&gt;
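
&lt;p&gt;A minimal sketch of this threshold encoding in plain Python, where the returned 0 or 1 indicates whether the corresponding qubit is prepared in |0&amp;gt; or |1&amp;gt; (e.g., by applying an X gate to the qubits marked 1):&lt;/p&gt;

```python
def encode_threshold(window_values):
    """Map each input scalar to a basis state: 1 if the value is
    greater than 0 (prepare |1>), else 0 (leave the qubit in |0>)."""
    return [1 if value > 0 else 0 for value in window_values]

# A flattened 3x3 patch of MNIST-like pixel values (made up here):
patch = [0.0, 0.2, 0.9, 0.0, 0.0, 0.5, 0.0, 0.7, 0.0]
print(encode_threshold(patch))  # [0, 1, 1, 0, 0, 1, 0, 1, 0]
```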

&lt;p&gt;Note that the MNIST dataset is the target in the original paper. If you wish to use a different dataset, you might want to adjust the encoding method accordingly.&lt;/p&gt;

&lt;h3&gt;
  
  
  Construct the quantum circuit
&lt;/h3&gt;

&lt;p&gt;The quantum circuit is constructed in the following four stages:&lt;/p&gt;

&lt;h4&gt;
  
  
  Assign connection probabilities
&lt;/h4&gt;

&lt;p&gt;Assign a "connection probability" (ranging from 0 to 1) to each pair of qubits.&lt;/p&gt;

&lt;p&gt;To account for the connections between one pixel and others, two-qubit gates should be applied to each pair of qubits. The connection probability is essential in determining whether a particular two-qubit gate will be applied to the pair later.&lt;/p&gt;

&lt;h4&gt;
  
  
  Select two-qubit gates
&lt;/h4&gt;

&lt;p&gt;Based on the connection probability, one of the two-qubit gates—either the controlled-NOT, swap, square root swap or controlled-U gate—is selected for each pair of qubits. Note that the selected gate is not applied at this stage.&lt;/p&gt;

&lt;h4&gt;
  
  
  Select one-qubit gates
&lt;/h4&gt;

&lt;p&gt;Select a random number of one-qubit gates, between 0 and 2n^2 = 2x3x3 = 18, from the gate set {X, Y, Z, U, P, T, H}. X, Y and Z are rotation gates around each axis, and I interpret U as a generic single-qubit rotation gate. These rotation gates have angle parameters, which are also chosen randomly. P, T and H represent the phase, T and Hadamard gates respectively. Again, the selected gates are not applied at this stage.&lt;/p&gt;
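&lt;p&gt;This selection stage can be sketched as follows (gate names only; how the angle parameters are attached is my assumption):&lt;/p&gt;

```python
import random

random.seed(1)  # only to make this sketch reproducible
num_qubits = 9  # 3x3 kernel, so n^2 = 9
# Draw how many one-qubit gates to use, between 0 and 2 * n^2 = 18 inclusive.
num_one_qubit_gates = random.randint(0, 2 * num_qubits)
gate_set = ["X", "Y", "Z", "U", "P", "T", "H"]
selected_gates = [random.choice(gate_set) for _ in range(num_one_qubit_gates)]
# Rotation-style gates (X, Y, Z, U, P) would additionally receive a random
# angle in the range 0 to 2*pi at this point.
print(num_one_qubit_gates, selected_gates)
```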

&lt;h4&gt;
  
  
  Apply the selected gates in random order
&lt;/h4&gt;

&lt;p&gt;Shuffle the order of the selected gates and apply them in the shuffled order to the quantum circuit.&lt;/p&gt;
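&lt;p&gt;In code, this final construction stage amounts to a shuffle followed by a loop (the gate records below are hypothetical):&lt;/p&gt;

```python
import random

# Hypothetical records of the gates selected in the earlier stages:
# (gate name, qubits it acts on).
selected_gates = [("CX", (0, 1)), ("H", (4,)), ("T", (2,)), ("SWAP", (3, 5))]
random.shuffle(selected_gates)  # the application order is itself random
for gate_name, qubits in selected_gates:
    # A real implementation would append the gate to the quantum circuit here.
    print(gate_name, qubits)
```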

&lt;h3&gt;
  
  
  Decoding method
&lt;/h3&gt;

&lt;p&gt;Measure each qubit and count the number of qubits that are measured in the |1&amp;gt; state.&lt;/p&gt;
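&lt;p&gt;With multiple shots, the measurement produces a histogram of bitstrings, so the counts have to be reduced to a single number. One plausible reading, which may differ from the repository's &lt;code&gt;one_sum_decoder.py&lt;/code&gt;, is to count the 1s in the most frequent outcome:&lt;/p&gt;

```python
def one_sum_decode(counts):
    # counts maps a measured bitstring to how often it occurred.
    # Take the most frequent bitstring and count the qubits measured as 1.
    most_frequent = max(counts, key=counts.get)
    return most_frequent.count("1")

print(one_sum_decode({"101000101": 612, "001000101": 412}))  # → 4
```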

&lt;h2&gt;
  
  
  Result of the comparison
&lt;/h2&gt;

&lt;p&gt;As mentioned, the QNN and CNN are compared in the original paper. More precisely, both networks, along with a corresponding random model, are compared.&lt;/p&gt;

&lt;p&gt;The authors reported that the QNN outperforms the CNN in terms of accuracy. However, the random model and the QNN are indistinguishable in terms of accuracy.&lt;/p&gt;

&lt;h2&gt;
  
  
  My Implementation
&lt;/h2&gt;

&lt;p&gt;It is important to understand what the original paper presents. However, one of my main interests also lies in implementing the QNN. Therefore, I implemented both the QNN and CNN to compare their accuracies. Below, I will share parts of my implementation. I would greatly appreciate any feedback or comments related to my programming, either in the comments section below or via an issue on the GitHub repository.&lt;/p&gt;

&lt;p&gt;I created three classes for the QNN, based on what I believe to be its core components:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;QuanvFilter&lt;/code&gt; (quanvolutional filter)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;QuanvLayer&lt;/code&gt; (quanvolutional layer)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;QuanvNN&lt;/code&gt; (quanvolutional neural network)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We will walk through the essential parts of each. Please note that I have omitted all docstrings, despite having written them, for the sake of brevity.&lt;/p&gt;

&lt;h3&gt;
  
  
  QuanvFilter
&lt;/h3&gt;

&lt;p&gt;&lt;code&gt;QuanvNN&lt;/code&gt; utilises member functions of &lt;code&gt;QuanvLayer&lt;/code&gt;, and &lt;code&gt;QuanvLayer&lt;/code&gt; relies on member functions of &lt;code&gt;QuanvFilter&lt;/code&gt;. Therefore, we will first explore &lt;code&gt;QuanvFilter&lt;/code&gt;, which is defined in &lt;a href="https://github.com/ksk0629/quanvolutional_neural_network/blob/main/src/quanv_filter.py" rel="noopener noreferrer"&gt;quanv_filter.py&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Constructor
&lt;/h4&gt;

&lt;p&gt;The constructor takes only one argument, the kernel size, which is equivalent to the window size in the "Convolutional Layer" section. As mentioned, the number of qubits is determined by this argument, as shown in the following piece of code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;kernel_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]):&lt;/span&gt;
        &lt;span class="c1"&gt;# Initialise the look-up table.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lookup_table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;
        &lt;span class="c1"&gt;# Set the simulator.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;simulator&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;qiskit_aer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;AerSimulator&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

        &lt;span class="c1"&gt;# Get the number of qubits.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_qubits&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kernel_size&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;kernel_size&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;self.lookup_table&lt;/code&gt; is used to reduce execution time, but it is not essential to understanding this class, so we will ignore it for now.&lt;/p&gt;

&lt;p&gt;In the &lt;code&gt;__init__&lt;/code&gt; function, we need to prepare the quanvolutional filter. A quantum circuit representing the quanvolutional filter must be built first, as each filter is a quantum circuit.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="c1"&gt;# Step 0: Build a quantum circuit as a filter.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__build_initial_circuit&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;self.__build_initial_circuit&lt;/code&gt; is simply a function that builds a new quantum circuit and stores it as a member variable, &lt;code&gt;self.circuit&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;After building the new plain circuit, the initialisation follows the procedure mentioned in the "One Realisation of the QNN" section. Note that encoding and decoding are not done when the circuit is initialised.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="c1"&gt;# Step 1: assign a connection probability between each qubit.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;connection_probabilities&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__set_connection_probabilities&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here, the connection probabilities are stored in the &lt;code&gt;dict&lt;/code&gt; variable &lt;code&gt;self.connection_probabilities&lt;/code&gt;. The keys are tuples representing pairs of qubit positions, and the values are the probabilities. Here is a simple example: consider a circuit with 3 qubits. After running &lt;code&gt;self.__set_connection_probabilities()&lt;/code&gt;, &lt;code&gt;self.connection_probabilities&lt;/code&gt; could look like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="p"&gt;{(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="mf"&gt;0.1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="mf"&gt;0.8&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Next, two-qubit gates are randomly selected.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="c1"&gt;# Step 2: Select a two-qubit gate according to the connection probabilities.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;selected_gates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[]&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__set_two_qubit_gate_set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__select_two_qubit_gates&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;self.__set_two_qubit_gate_set()&lt;/code&gt; stores the set of two-qubit gates described in the paper in a member variable. After setting the two-qubit gate set, the gates are selected according to &lt;code&gt;self.connection_probabilities&lt;/code&gt; in the &lt;code&gt;self.__select_two_qubit_gates()&lt;/code&gt; function. I chose a threshold of 0.5 to determine whether a two-qubit gate is selected.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;### In __select_two_qubit_gates function ###
&lt;/span&gt;        &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;qubit_pair&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;connection_probability&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;connection_probabilities&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;items&lt;/span&gt;&lt;span class="p"&gt;():&lt;/span&gt;
            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;connection_probability&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="c1"&gt;# Skip the pair.
&lt;/span&gt;                &lt;span class="k"&gt;continue&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the connection probability is greater than 0.5, a two-qubit gate is selected. If the selected gate requires parameters, they are chosen uniformly from the range 0 to 2π (roughly 6.28), for instance &lt;code&gt;four_params = np.random.rand(4) * (2 * np.pi)&lt;/code&gt;. Since each key in &lt;code&gt;self.connection_probabilities&lt;/code&gt; is always in ascending order, such as &lt;code&gt;(0, 1)&lt;/code&gt;, &lt;code&gt;(0, 2)&lt;/code&gt; and &lt;code&gt;(1, 3)&lt;/code&gt;, the control and target qubits should be shuffled.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;### In __select_two_qubit_gates function ###
&lt;/span&gt;            &lt;span class="c1"&gt;# Shuffle the pair of qubits to randomly decide on the target and controlled qubits.
&lt;/span&gt;            &lt;span class="n"&gt;shuffled_qubit_pair&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;qubit_pair&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;  &lt;span class="c1"&gt;# key is tuple. Need to cast to list.
&lt;/span&gt;            &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;connection_probability&lt;/span&gt; &lt;span class="o"&gt;&amp;lt;=&lt;/span&gt; &lt;span class="mf"&gt;0.75&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
                &lt;span class="n"&gt;shuffled_qubit_pair&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;qubit_pair&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
                &lt;span class="n"&gt;shuffled_qubit_pair&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;qubit_pair&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since the connection probability here is guaranteed to be greater than 0.5, this code effectively swaps the control and target qubits at random.&lt;/p&gt;

&lt;p&gt;Similar to selecting two-qubit gates, one-qubit gates are selected afterwards.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="c1"&gt;# Step 3: Select one-qubit gates.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__set_one_qubit_gate_set&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_one_qubit_gates&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;random&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;randint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;low&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;high&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_qubits&lt;/span&gt;&lt;span class="o"&gt;**&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__select_one_qubit_gates&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;self.num_one_qubit_gates&lt;/code&gt; is an &lt;code&gt;int&lt;/code&gt; member variable representing the number of one-qubit gates. The target qubit for each selected one-qubit gate is randomly chosen.&lt;/p&gt;

&lt;p&gt;By this stage, the selected one- and two-qubit gates are stored in &lt;code&gt;self.selected_gates&lt;/code&gt;. The next stage involves applying these gates to the circuit in a random order.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="c1"&gt;# Step 4: Apply the randomly selected gates in a random order.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;__apply_selected_gates&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Although every step of creating the quanvolutional filter has now been implemented, the filter is still an ordinary quantum circuit, so we need to explicitly add the measurement part as the final step.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;        &lt;span class="c1"&gt;# Step 5: Add measurements to all the qubits.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;circuit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;measure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quantum_register&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;classical_register&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h4&gt;
  
  
  Running part
&lt;/h4&gt;

&lt;p&gt;To use the quanvolutional filter, the class must include a function to process the input data.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndarray&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;encoding_method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;callable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;decoding_method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;callable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Encode the data to the corresponding quantum state.
&lt;/span&gt;        &lt;span class="n"&gt;encoded_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;encoding_method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Make the circuit having the loading part.
&lt;/span&gt;        &lt;span class="n"&gt;ready_circuit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_data&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;encoded_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Run the circuit.
&lt;/span&gt;        &lt;span class="n"&gt;transpiled_circuit&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;qiskit&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;transpile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;ready_circuit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;simulator&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="n"&gt;result&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;simulator&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transpiled_circuit&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="nf"&gt;result&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
        &lt;span class="n"&gt;counts&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;result&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_counts&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;transpiled_circuit&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Decode the result data.
&lt;/span&gt;        &lt;span class="n"&gt;decoded_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;decoding_method&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counts&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;decoded_data&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The function &lt;code&gt;self.load_data&lt;/code&gt; returns a new &lt;code&gt;qiskit.QuantumCircuit&lt;/code&gt; instance that contains &lt;code&gt;self.circuit&lt;/code&gt; and initialises the qubits according to the data encoded by the given encoding method. Once this complete circuit is prepared, it is executed, and the result is decoded using the given decoding method.&lt;/p&gt;

&lt;p&gt;For now, I have implemented only one encoding and decoding method, as proposed in the paper. The details can be found in &lt;a href="https://github.com/ksk0629/quanvolutional_neural_network/blob/main/src/encoders/z_basis_encoder.py" rel="noopener noreferrer"&gt;z_basis_encoder.py&lt;/a&gt; and &lt;a href="https://github.com/ksk0629/quanvolutional_neural_network/blob/main/src/decoders/one_sum_decoder.py" rel="noopener noreferrer"&gt;one_sum_decoder.py&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Look-up table
&lt;/h4&gt;

&lt;p&gt;At the end of the previous section, the core of the quanvolutional filter was implemented. However, the time required to use this filter is enormous and impractical, at least on my machine, especially for a large dataset like MNIST. To address this, I implemented a look-up table, which is also used in the original paper. It is quite simple: we create a look-up table mapping every possible input to its corresponding output. Once the table is generated for all input data, there is no need to run the circuit anymore. This technique can be applied because:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The encoding method is simple, and the number of input patterns is finite.&lt;/li&gt;
&lt;li&gt;The filter is not trainable in this implementation.
&lt;/li&gt;
&lt;/ol&gt;
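&lt;p&gt;The first point is easy to check: with the binary threshold encoding, a 3x3 filter can only ever see 2^9 = 512 distinct inputs, so enumerating all of them is cheap:&lt;/p&gt;

```python
import itertools

kernel_size = (3, 3)
num_qubits = kernel_size[0] * kernel_size[1]
# Every possible binary window the filter can receive.
input_patterns = list(itertools.product((0, 1), repeat=num_qubits))
print(len(input_patterns))  # → 512
```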

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;set_lookup_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;encoding_method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;callable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;decoding_method&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;callable&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;input_patterns&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;float&lt;/span&gt;&lt;span class="p"&gt;]],&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lookup_table&lt;/span&gt; &lt;span class="ow"&gt;is&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;vectorised_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vectorize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;signature&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;(n),(),(),()-&amp;gt;()&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;output_patterns&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;vectorised_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;np&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_patterns&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
                &lt;span class="n"&gt;encoding_method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;decoding_method&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;lookup_table&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;outputs&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;zip&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;input_patterns&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;output_patterns&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What happens here is simply running the circuit against the given inputs using the given encoding and decoding methods.&lt;/p&gt;

&lt;h3&gt;
  
  
  QuanvLayer
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;QuanvLayer&lt;/code&gt; class is defined in &lt;a href="https://github.com/ksk0629/quanvolutional_neural_network/blob/main/src/quanv_layer.py" rel="noopener noreferrer"&gt;quanv_layer.py&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Constructor
&lt;/h4&gt;

&lt;p&gt;The constructor has four arguments. &lt;code&gt;kernel_size&lt;/code&gt; specifies the size of the kernels for each quanvolutional filter. &lt;code&gt;num_filters&lt;/code&gt; defines the number of quanvolutional filters that the class contains. &lt;code&gt;padding_mode&lt;/code&gt; determines how to pad the image so that the output size of each quanvolutional filter matches the original input size. I used the &lt;code&gt;torch.nn.functional.pad&lt;/code&gt; function to pad the image data, so &lt;code&gt;padding_mode&lt;/code&gt; corresponds to the mode argument of that function (see &lt;a href="https://pytorch.org/docs/stable/generated/torch.nn.functional.pad.html" rel="noopener noreferrer"&gt;torch.nn.functional.pad&lt;/a&gt;). &lt;code&gt;is_lookup_mode&lt;/code&gt; indicates whether the quanvolutional filters use the look-up tables.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;     &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;kernel_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;num_filters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;encoder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BaseEncoder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;decoder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BaseDecoder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;padding_mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;constant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;is_lookup_mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Store the arguments to class variables.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kernel_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;kernel_size&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_filters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;num_filters&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;encoder&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;decoder&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;padding_mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;padding_mode&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_lookup_mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;is_lookup_mode&lt;/span&gt;

        &lt;span class="c1"&gt;# Define constant.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__BATCH_DATA_DIM&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;

        &lt;span class="c1"&gt;# Create the quanvolutional filters.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_filters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="nc"&gt;QuanvFilter&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kernel_size&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;_&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_filters&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What happens here is essentially the creation of instances of &lt;code&gt;QuanvFilter&lt;/code&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  Running part
&lt;/h4&gt;

&lt;p&gt;After preparing the quanvolutional filters, the class needs a method that processes the input data as a layer.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batch_data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tensor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tensor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="c1"&gt;# Check the dataset shape.
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;batch_data&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;ndim&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__BATCH_DATA_DIM&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;&lt;span class="s"&gt;
                The dimension of the batch_data must be &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;__BATCH_DATA_DIM&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;,
                which is [batch size, channel, height, width].
            &lt;/span&gt;&lt;span class="sh"&gt;"""&lt;/span&gt;
            &lt;span class="k"&gt;raise&lt;/span&gt; &lt;span class="nc"&gt;ValueError&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;msg&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Set the appropriate function according to the mode.
&lt;/span&gt;        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_lookup_mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="c1"&gt;# Get all possible input patterns.
&lt;/span&gt;            &lt;span class="n"&gt;possible_inputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get_all_input_patterns&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                &lt;span class="n"&gt;num_qubits&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kernel_size&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;kernel_size&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="p"&gt;)&lt;/span&gt;
            &lt;span class="c1"&gt;# Set each look-up table.
&lt;/span&gt;            &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="n"&gt;quanv_filter&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_lookup_table&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;encoding_method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;encode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;decoding_method&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decoder&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;decode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                    &lt;span class="n"&gt;input_patterns&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;possible_inputs&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;quanv_filter&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_filters&lt;/span&gt;
            &lt;span class="p"&gt;]&lt;/span&gt;
            &lt;span class="n"&gt;_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;run_single_channel_with_lookup_tables&lt;/span&gt;
        &lt;span class="k"&gt;else&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
            &lt;span class="n"&gt;_run&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run_single_channel&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="n"&gt;all_outputs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;stack&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="p"&gt;[&lt;/span&gt;
                &lt;span class="nf"&gt;_run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;tqdm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
                    &lt;span class="n"&gt;batch_data&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;leave&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;desc&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Dataset&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
                &lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# for-loop for batched data
&lt;/span&gt;                &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;channel&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;data&lt;/span&gt;  &lt;span class="c1"&gt;# for-loop for each channel of each data
&lt;/span&gt;            &lt;span class="p"&gt;]&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;all_outputs&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This function is quite simple. After checking the data shape and whether the look-up tables are being used, it applies each filter to the given data. The return value, &lt;code&gt;all_outputs&lt;/code&gt;, is constructed from the processed image data. The actual processing, in either &lt;code&gt;self.run_single_channel_with_lookup_tables&lt;/code&gt; or &lt;code&gt;self.run_single_channel&lt;/code&gt;, follows essentially the same procedure as a classical convolutional layer, except that each window of the input is processed by a quanvolutional filter instead of a classical convolutional kernel. Therefore, we will not delve into those functions in this article.&lt;/p&gt;
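&lt;p&gt;To make the sliding-window procedure concrete, here is a minimal NumPy sketch of a single-channel pass. It is only an illustration: the repository code works on &lt;code&gt;torch.Tensor&lt;/code&gt;s, and &lt;code&gt;apply_filter&lt;/code&gt; is a hypothetical stand-in for one quanvolutional filter.&lt;/p&gt;

```python
import numpy as np

def run_single_channel_sketch(data, kernel_size, apply_filter):
    """Slide a (kh, kw) window over a 2D array and apply a filter to each patch.

    `apply_filter` is a stand-in for one quanvolutional filter: it maps a
    flattened patch to a single scalar, exactly as a classical kernel would.
    """
    kh, kw = kernel_size
    h, w = data.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = data[i:i + kh, j:j + kw].reshape(-1)
            out[i, j] = apply_filter(patch)
    return out

# With a mean filter, a 4x4 input and a 2x2 kernel give a 3x3 output.
result = run_single_channel_sketch(np.arange(16.0).reshape(4, 4), (2, 2), np.mean)
print(result.shape)  # (3, 3)
```

&lt;p&gt;Replacing &lt;code&gt;np.mean&lt;/code&gt; with a function that encodes the patch, runs the quantum circuit and decodes the counts gives the quanvolutional version of the same loop.&lt;/p&gt;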

&lt;h3&gt;
  
  
  QuanvNN
&lt;/h3&gt;

&lt;p&gt;The &lt;code&gt;QuanvNN&lt;/code&gt; class is written in &lt;a href="https://github.com/ksk0629/quanvolutional_neural_network/blob/main/src/quanv_nn.py" rel="noopener noreferrer"&gt;quanv_nn.py&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
Constructor
&lt;/h4&gt;

&lt;p&gt;The constructor of the class takes eight arguments.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__init__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;in_dim&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;quanv_kernel_size&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
        &lt;span class="n"&gt;quanv_num_filters&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;quanv_encoder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BaseEncoder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;quanv_decoder&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;BaseDecoder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;quanv_padding_mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;str&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;constant&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="n"&gt;is_lookup_mode&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;in_dim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;in_dim&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;num_classes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;num_classes&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_kernel_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;quanv_kernel_size&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_num_filters&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;quanv_num_filters&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_encoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;quanv_encoder&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_decoder&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;quanv_decoder&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_padding_mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;quanv_padding_mode&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;is_lookup_mode&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;is_lookup_mode&lt;/span&gt;

        &lt;span class="c1"&gt;# Create and store the instance of the QuanvLayer class as a member variable.
&lt;/span&gt;        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_layer&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;QuanvLayer&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
            &lt;span class="n"&gt;kernel_size&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;quanv_kernel_size&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;num_filters&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;quanv_num_filters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;encoder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_encoder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;decoder&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_decoder&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;padding_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;quanv_padding_mode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="n"&gt;is_lookup_mode&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;is_lookup_mode&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="c1"&gt;# Create and store the instance of the ClassicalCNN class as a member variable.
&lt;/span&gt;        &lt;span class="n"&gt;new_in_dim&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quanv_num_filters&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;in_dim&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="n"&gt;in_dim&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
        &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;classical_cnn&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;ClassicalCNN&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;in_dim&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;new_in_dim&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;num_classes&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;in_dim&lt;/code&gt; and &lt;code&gt;num_classes&lt;/code&gt; are used to construct the CNN part, which is implemented using PyTorch and is not our primary focus here. The &lt;code&gt;quanv_kernel_size&lt;/code&gt;, &lt;code&gt;quanv_num_filters&lt;/code&gt;, &lt;code&gt;quanv_padding_mode&lt;/code&gt;, &lt;code&gt;quanv_encoder&lt;/code&gt;, &lt;code&gt;quanv_decoder&lt;/code&gt; and &lt;code&gt;is_lookup_mode&lt;/code&gt; arguments are used to create an instance of &lt;code&gt;QuanvLayer&lt;/code&gt;.&lt;/p&gt;
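&lt;p&gt;The only subtle step in the constructor is the computation of &lt;code&gt;new_in_dim&lt;/code&gt;: the quanvolutional layer emits one output channel per filter, so only the channel entry of &lt;code&gt;in_dim&lt;/code&gt; changes. A tiny sketch with illustrative values:&lt;/p&gt;

```python
# The quanvolutional layer emits one output channel per filter, so only the
# channel entry of in_dim changes (the values below are illustrative).
in_dim = (1, 28, 28)  # (channel, height, width), e.g. an MNIST image
quanv_num_filters = 5
new_in_dim = (quanv_num_filters, in_dim[1], in_dim[2])
print(new_in_dim)  # (5, 28, 28)
```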

&lt;h4&gt;
  
  
  Running part
&lt;/h4&gt;

&lt;p&gt;What the class essentially needs is a way to forward the input data. To achieve this, I implemented &lt;code&gt;__call__&lt;/code&gt; and &lt;code&gt;classify&lt;/code&gt;, where &lt;code&gt;__call__&lt;/code&gt; is a special method that allows an instance to be used like a function. &lt;code&gt;__call__&lt;/code&gt; outputs the raw output data from the QNN, whilst &lt;code&gt;classify&lt;/code&gt; returns a scalar value representing the class label to which the input data most likely belongs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;__call__&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tensor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tensor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;quanvoluted_x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_layer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;classical_cnn&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quanvoluted_x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

    &lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tensor&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;-&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Tensor&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
        &lt;span class="n"&gt;quanvoluted_x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;quanv_layer&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;run&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;batch_data&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;shots&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;self&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;classical_cnn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;classify&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;quanvoluted_x&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;self.quanv_layer&lt;/code&gt; is an instance of the &lt;code&gt;QuanvLayer&lt;/code&gt; class. Both functions contain &lt;code&gt;self.quanv_layer.run&lt;/code&gt;, which was introduced in an earlier section.&lt;/p&gt;

&lt;p&gt;The key takeaway is that these functions allow us to classify or simply obtain an output from the input data.&lt;/p&gt;
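&lt;p&gt;Assuming, as is typical for a classifier, that &lt;code&gt;classify&lt;/code&gt; reduces the raw per-class outputs of &lt;code&gt;__call__&lt;/code&gt; to the index of the largest value, the relationship between the two methods can be sketched as follows (the numbers are made up for illustration):&lt;/p&gt;

```python
import numpy as np

# Hypothetical raw per-class outputs, as __call__ would return for a batch of
# two inputs and three classes (the numbers are made up).
raw_outputs = np.array([[0.1, 0.7, 0.2],
                        [0.6, 0.3, 0.1]])

# classify then reduces each row to the most likely class label.
labels = raw_outputs.argmax(axis=1)
print(labels)  # [1 0]
```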

&lt;h3&gt;
  
  
  Other sources and scripts
&lt;/h3&gt;

&lt;p&gt;I have also written some other Python scripts to train the model using MNIST data. We will not go through each source in detail, but if you want to train the model, all you need to do is run &lt;a href="https://github.com/ksk0629/quanvolutional_neural_network/blob/main/scripts/train_model_with_mnist.py" rel="noopener noreferrer"&gt;train_model_with_mnist.py&lt;/a&gt;. The script requires one argument, which is the path to the configuration file. Examples of configuration files can be found in the &lt;a href="https://github.com/ksk0629/quanvolutional_neural_network/tree/main/configs" rel="noopener noreferrer"&gt;configs&lt;/a&gt; directory. Simply run &lt;code&gt;python scripts/train_model_with_mnist.py -c [config_path]&lt;/code&gt; from the root directory.&lt;/p&gt;

&lt;h2&gt;
  
  
  Summary
&lt;/h2&gt;

&lt;p&gt;We have had a brief look at what the quanvolutional neural network (QNN) is. Essentially, it is a convolutional neural network with a quanvolutional layer, where the quanvolutional layer serves as a drop-in alternative to the classical convolutional layer.&lt;/p&gt;

&lt;p&gt;The introduced implementation of the QNN is readily applicable to the MNIST dataset. However, in my experience, the accuracy of the QNN model has not proven to be superior to that of the corresponding CNN.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fig8nsjykiqdz5z1198kx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fig8nsjykiqdz5z1198kx.png" alt="Accuracy of QNN vs CNN" width="676" height="701"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The original paper does not provide exhaustive details about the experiment, and there is some flexibility in the training algorithms and settings. I suspect that the reason my results differ from those in the original paper may lie in the differences between our configurations.&lt;/p&gt;

&lt;p&gt;Once again, feel free to leave any comments in the section below or raise an issue in the GitHub repository. I welcome your feedback and the opportunity to discuss the QNN or the Python code.&lt;/p&gt;

</description>
      <category>python</category>
      <category>machinelearning</category>
      <category>quantumcomputer</category>
    </item>
    <item>
      <title>Quantum-Classical Machine learning: QuClassi</title>
      <dc:creator>Keisuke Sato</dc:creator>
      <pubDate>Thu, 23 Feb 2023 00:42:28 +0000</pubDate>
      <link>https://dev.to/ksk0629/quantum-classical-machine-learning-quclassi-297h</link>
      <guid>https://dev.to/ksk0629/quantum-classical-machine-learning-quclassi-297h</guid>
      <description>&lt;p&gt;&lt;strong&gt;This post is about the paper "&lt;a href="https://arxiv.org/abs/2103.11307" rel="noopener noreferrer"&gt;QuClassi: A Hybrid Deep Neural Network Architecture based on Quantum State Fidelity&lt;/a&gt;" and was written according to my understanding and thoughts. Please see the paper by yourself if you are interested.&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;Quantum computers are computers that operate according to quantum mechanics. They can potentially solve some problems faster than classical computers, and many scientists have been working in this field. Here is a short list of quantum algorithms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Shor's algorithm&lt;/li&gt;
&lt;li&gt;Grover's algorithm&lt;/li&gt;
&lt;li&gt;The quantum phase estimation algorithm&lt;/li&gt;
&lt;li&gt;The Harrow-Hassidim-Lloyd (HHL) algorithm&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These are well-known algorithms, so there is plenty of information about them you can find via Google or ChatGPT!&lt;/p&gt;

&lt;p&gt;Many people believe that quantum computers could change our lives in the future, and the field of machine learning is no exception: many scientists are working on quantum machine learning. In this post I will introduce one of the quantum machine learning papers, QuClassi: A Hybrid Deep Neural Network Architecture based on Quantum State Fidelity. The first version was submitted to quant-ph on 21 March 2021, and the latest version on 31 March 2022. It is clearly not today's latest paper, but some of the authors submitted a new one to quant-ph on 11 October 2022: QuCNN : A Quantum Convolutional Neural Network with Entanglement Based Backpropagation, whose proposed quantum CNN architecture appears to share parts with QuClassi. I believe understanding QuClassi helps in understanding QuCNN, about which I might write an introductory post in the future.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;I assume the reader has a foundation in quantum computing, such as bra-ket notation and basic quantum gates, as well as in qiskit, the Python library for quantum computing. If you are not sure what these are, I recommend acquiring that knowledge first. The foundations are not difficult, and you should pick them up easily if you have a grounding in linear algebra.&lt;/strong&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Concept of QuClassi
&lt;/h1&gt;

&lt;p&gt;QuClassi is a quantum neural network for both binary and multi-class classification. It is a hybrid quantum-classical design, which means it uses a classical computer for some preparation and a quantum computer for the calculation. The key points of QuClassi are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;has three different quantum layers&lt;/li&gt;
&lt;li&gt;has a quantum state fidelity based cost function&lt;/li&gt;
&lt;li&gt;encodes two-dimensional data into one qubit&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Technically, the last point may have already been suggested in another paper, but the method relates to the quantum layers, so we need to know it. The quantum part of QuClassi is, of course, implemented as a quantum circuit. The circuit can be broken down into two parts: a data-loading part and a classifier-generating part. QuClassi outputs the likelihood of each category for the input data by calculating the fidelity between a loaded state and a classifier state; the input data is classified into the category whose corresponding fidelity is maximal. The classifier states therefore have to be trained on the dataset before classifying unknown data.&lt;/p&gt;
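&lt;p&gt;The maximum-fidelity decision rule can be sketched in a few lines of NumPy. The classifier states below are hypothetical single-qubit states chosen purely for illustration; in QuClassi they are trained states, and the fidelity is estimated on a quantum circuit rather than computed classically.&lt;/p&gt;

```python
import numpy as np

def fidelity(a, b):
    """Squared inner product of two pure (normalised) state vectors."""
    return abs(np.vdot(a, b)) ** 2

# One hypothetical classifier state per category (single-qubit states here).
classifier_states = {
    "class_0": np.array([1.0, 0.0]),  # |0>
    "class_1": np.array([0.0, 1.0]),  # |1>
}
# A loaded state close to |0>.
loaded = np.array([np.cos(0.2), np.sin(0.2)])

# The predicted category is the one with maximal fidelity to the loaded state.
prediction = max(classifier_states, key=lambda c: fidelity(loaded, classifier_states[c]))
print(prediction)  # class_0
```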

&lt;h2&gt;
  
  
  Encoding Method
&lt;/h2&gt;

&lt;p&gt;Input data is encoded into a loaded state. But how? I would like to avoid going through mathematical equations in this post, but here the equations convey the method better than sentences alone.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fco5uzlnspzvyzjuu9q7m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fco5uzlnspzvyzjuu9q7m.png" alt=" " width="684" height="316"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, the rotation angles of the gates applied to the loaded state are obtained from the elements of the input data via the transform above. If the dimensionality of the input data is odd, you can extend the data by zero-padding.&lt;/p&gt;
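&lt;p&gt;The mechanics of the encoding (pairing up features, two per qubit, and zero-padding odd-dimensional data) can be sketched as below. Note that the angle formula used here is a hypothetical arcsin-based placeholder assuming features normalised to [0, 1]; the exact transform is the one shown in the image above.&lt;/p&gt;

```python
import numpy as np

def encode_angles(x):
    """Pair up input features, two per qubit, and map them to rotation angles.

    The angle formula below (2 * arcsin(sqrt(x)), assuming features already
    normalised to [0, 1]) is only a hypothetical placeholder; the exact
    transform is the one shown in the image above.
    """
    x = np.asarray(x, dtype=float)
    if x.size % 2 == 1:  # odd dimensionality: extend the data by zero-padding
        x = np.append(x, 0.0)
    pairs = x.reshape(-1, 2)  # each pair of features is encoded into one qubit
    return 2.0 * np.arcsin(np.sqrt(pairs))

angles = encode_angles([0.25, 1.0, 0.5])  # 3 features -> padded to 4 -> 2 qubits
print(angles.shape)  # (2, 2)
```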

&lt;h2&gt;
  
  
  Quantum Fidelity Based Cost Function
&lt;/h2&gt;

&lt;p&gt;Quantum fidelity is a measure of similarity between two quantum states; for the pure states used here, it is simply the squared inner product of the two states. In the QuClassi architecture, the fidelity is obtained with the well-known SWAP test, which requires measuring only a single qubit. Measuring many qubits can introduce large errors on current hardware, so measuring just one helps keep the error small.&lt;/p&gt;
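&lt;p&gt;As a sanity check of this relationship, here is a small NumPy simulation of the SWAP test on two single-qubit pure states. It verifies the standard identity that the probability of measuring the ancilla in the zero state is (1 + fidelity) / 2, i.e. the fidelity equals 2P(0) - 1. On real hardware the probability would be estimated from repeated shots; this sketch computes it exactly from the statevector.&lt;/p&gt;

```python
import numpy as np

def swap_test_p0(a, b):
    """Simulate the SWAP test on two single-qubit pure states.

    Register order: ancilla (x) a (x) b. Returns the probability of measuring
    the ancilla in the zero state, which equals (1 + fidelity(a, b)) / 2.
    """
    state = np.kron(np.array([1.0, 0.0]), np.kron(a, b))  # ancilla starts in |0>
    H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
    H_anc = np.kron(H, np.eye(4))                         # Hadamard on the ancilla
    cswap = np.eye(8)                                     # controlled-SWAP on (a, b)
    cswap[[5, 6]] = cswap[[6, 5]]                         # swap only when ancilla is 1
    state = H_anc @ cswap @ H_anc @ state
    return float(np.sum(np.abs(state[:4]) ** 2))          # ancilla = 0 amplitudes

a = np.array([np.cos(0.3), np.sin(0.3)])
b = np.array([1.0, 0.0])
p0 = swap_test_p0(a, b)
fid = abs(np.vdot(a, b)) ** 2
print(round(2 * p0 - 1, 6), round(fid, 6))  # both equal the fidelity
```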

&lt;h2&gt;
  
  
  Quantum Layers
&lt;/h2&gt;

&lt;p&gt;The circuit can contain three different kinds of layers, and the actual structure is decided by the user. The following image is just one example, which contains all three layer types.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8hi5s9x5q8qkzn2stwk1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8hi5s9x5q8qkzn2stwk1.png" alt=" " width="800" height="212"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The input data here is four-dimensional. The encoding method requires the number of qubits to be half the input dimension, which is two in this case. The classifier state must be the same size as the loaded state, and to obtain the quantum fidelity between the two, the circuit needs one more qubit. That is why the circuit has 5 (2 + 2 + 1) qubits. The red and blue rectangles enclose the classifier-generating and data-loading parts, respectively. (Apologies that the terms in the image differ from those in this post: the "trained_qubit"s correspond to the classifier state and the "loaded_qubit"s to the loaded state.) QuClassi updates the rotation parameters of the gates applied to the classifier state during training. I have labelled each layer in the image: the single qubit unitary layer, the dual qubit unitary layer and the controlled qubit unitary layer. Users combine arbitrary numbers of all or some of them, in arbitrary order, to create a good structure for their dataset. Unfortunately, the paper does not argue carefully about the layer structure; the authors apparently prefer the architecture with only the single qubit unitary layer, but the best choice could depend on the dataset, and an investigation of the layers would be useful.&lt;/p&gt;

&lt;p&gt;We have gone through some key points of the QuClassi architecture so far but, as you may have noticed, one circuit can hold only one classifier state. One classifier state corresponds to one category, so n circuits must be prepared for n categories. Do not worry, though: we do not need to run all the circuits at once, and their layer sequences (not their parameters) must be identical. This means the minimum number of qubits QuClassi needs, even during training, is the same as for running a single circuit.&lt;/p&gt;

&lt;h1&gt;
  
  
  On Learning
&lt;/h1&gt;

&lt;p&gt;We will not go through the algorithm in detail, but sketch it roughly without mathematical equations. Training a circuit aims to obtain parameters that represent the classifier state well, so the rotation angles are repeatedly updated based on the quantum fidelity between the classifier state and the loaded state.&lt;/p&gt;
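&lt;p&gt;To make that loop concrete, here is a hedged, purely classical sketch: it represents the classifier state as two rotation angles, computes the fidelity by state-vector arithmetic (QuClassi itself estimates it with a quantum circuit), and nudges the angles by finite-difference gradient ascent:&lt;/p&gt;

```python
import cmath
import math

def state(theta, phi):
    """Single-qubit state cos(theta/2)|0> + e^{i*phi} sin(theta/2)|1>."""
    return [math.cos(theta / 2), cmath.exp(1j * phi) * math.sin(theta / 2)]

def fidelity(a, b):
    """|<a|b>|^2 for two pure states given as amplitude lists."""
    inner = sum(x.conjugate() * y for x, y in zip(a, b))
    return abs(inner) ** 2

def train_step(params, target, lr=0.1, eps=1e-4):
    """One finite-difference gradient-ascent step on the fidelity.

    Illustrative only: QuClassi estimates the fidelity on hardware or a
    simulator rather than by direct state-vector arithmetic.
    """
    new = list(params)
    for i in range(len(params)):
        up = list(params); up[i] += eps
        dn = list(params); dn[i] -= eps
        grad = (fidelity(state(*up), target) - fidelity(state(*dn), target)) / (2 * eps)
        new[i] = params[i] + lr * grad
    return new

target = state(1.0, 0.5)   # the "loaded" state we want to match
params = [0.0, 0.0]        # initial classifier angles
for _ in range(200):
    params = train_step(params, target)
print(fidelity(state(*params), target))  # approaches 1
```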

&lt;h1&gt;
  
  
  On Reported Result
&lt;/h1&gt;

&lt;p&gt;On the MNIST and Iris datasets, the authors reported that QuClassi achieved state-of-the-art performance, and that it also beats its classical counterparts in terms of the number of parameters. They reported results on an actual quantum computer, IBM-Q, as well, which means we can freely try to train and classify data with QuClassi on real hardware! Note that, although that is a brilliant opportunity, it would take a long time even when the combination of layers and the dataset are simple.&lt;/p&gt;

&lt;h1&gt;
  
  
  My thoughts
&lt;/h1&gt;

&lt;p&gt;Basically, QuClassi is an architecture that classifies data into the appropriate category by similarity, namely quantum fidelity. Quantum fidelity between pure states is the squared inner product of two unit vectors, since every pure quantum state is expressed as a complex unit vector. It therefore sounds plausible that the best performance is achieved by an architecture with only the single-qubit unitary layer, with parameters obtained by averaging all the encoded data belonging to the same category, because such parameters create a state located at the centre of the data. In that sense, the initial parameters should probably be chosen near those averaged parameters; that would help to reduce the number of epochs.&lt;/p&gt;
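&lt;p&gt;The averaging idea can be sketched in a few lines (a hypothetical heuristic, assuming the encoded data of one category are available as unit vectors):&lt;/p&gt;

```python
def average_state(encoded_samples):
    """Average a list of unit vectors and renormalise the result.

    Hypothetical initialisation heuristic discussed above: the averaged,
    renormalised vector sits at the "centre" of one category's encoded data.
    """
    dim = len(encoded_samples[0])
    mean = [sum(s[i] for s in encoded_samples) / len(encoded_samples) for i in range(dim)]
    norm = sum(abs(v) ** 2 for v in mean) ** 0.5
    return [v / norm for v in mean]

samples = [[1.0, 0.0], [0.8, 0.6]]  # two unit vectors of one category
print(average_state(samples))
```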

&lt;p&gt;I am curious how much the different layers affect performance on other datasets. The paper does not contain many comparisons between QuClassi variants with different layers, so training and evaluating on other datasets might help us understand the effect of each layer.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;We roughly went through the QuClassi architecture. I have already tried QuClassi in simulation and reproduced similar results on the MNIST and Iris datasets, as well as on the breast cancer and wine datasets. Training the parameters takes a massive amount of time as the amount of data increases. The increase itself is natural, but the training time is far longer than for classical counterparts because of the limits of current quantum devices and simulators. I am looking forward to the day we can access quantum computers without any such concern.&lt;/p&gt;

&lt;p&gt;You can also run QuClassi on your own computer as a simulation with Python. To the best of my knowledge, the following implementations are available.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;a href="https://github.com/Samuelstein1224/QuClassiExample" rel="noopener noreferrer"&gt;Samuelstein1224/QuClassiExample&lt;/a&gt; by the author's&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/arkmohba/ARK_study_Quclassi" rel="noopener noreferrer"&gt;arkmohba/ARK_study_Quclassi&lt;/a&gt; by Japanese company&lt;/li&gt;
&lt;li&gt;
&lt;a href="https://github.com/ksk0629/quclassi" rel="noopener noreferrer"&gt;ksk0629/quclassi&lt;/a&gt; by forking from and modifying the Japanese company's one&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I might write a post about my repository above one day.&lt;/p&gt;

&lt;p&gt;I would really appreciate it if you pointed out any misunderstanding of mine, asked questions, or commented below.&lt;/p&gt;

</description>
      <category>llm</category>
      <category>systemdesign</category>
      <category>discuss</category>
      <category>blockchain</category>
    </item>
    <item>
      <title>Using MLflow on google colaboratory with github to build cosy environment: on VS code</title>
      <dc:creator>Keisuke Sato</dc:creator>
      <pubDate>Sat, 19 Mar 2022 03:38:15 +0000</pubDate>
      <link>https://dev.to/ksk0629/using-mlflow-on-google-colaboratory-with-github-to-build-cosy-environment-building-on-vs-code-1gd</link>
      <guid>https://dev.to/ksk0629/using-mlflow-on-google-colaboratory-with-github-to-build-cosy-environment-building-on-vs-code-1gd</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;In some previous articles, I built a cosy environment to perform machine learning experiments with&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;google colaboratory (to perform experiments)&lt;/li&gt;
&lt;li&gt;github (to manage source codes and information of each experiment)&lt;/li&gt;
&lt;li&gt;ngrok (to connect to mlflow window)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It was not bad, but the environment was a little less cosy than it could be, for two reasons:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The text editor on google colab is not cosy.&lt;/li&gt;
&lt;li&gt;It is troublesome to run several cells every time I commit and push.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;We can use a terminal if we buy a google colab pro account [&lt;a href="https://twitter.com/googlecolab/status/1336698772760379392?lang=en" rel="noopener noreferrer"&gt;official post on twitter&lt;/a&gt;], but apart from those troubles I do not need a pro account for now. Then, what should I do? One of the answers is vs code with google colab.&lt;/p&gt;

&lt;p&gt;Note that I usually use windows 10, so the following discussion is for windows.&lt;/p&gt;

&lt;h1&gt;
  
  
  Process
&lt;/h1&gt;

&lt;p&gt;I'll suppose that vs code is installed on the computer.&lt;/p&gt;

&lt;h2&gt;
  
  
  Downloading cloudflare
&lt;/h2&gt;

&lt;p&gt;1.Download the executable from [&lt;a href="https://developers.cloudflare.com/cloudflare-one/connections/connect-apps/install-and-setup/installation/#windows" rel="noopener noreferrer"&gt;cloudflare zero trust&lt;/a&gt;]&lt;/p&gt;

&lt;p&gt;2.Rename the file to &lt;code&gt;cloudflared.exe&lt;/code&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Installing &lt;code&gt;remote-ssh&lt;/code&gt; extension on vs code
&lt;/h2&gt;

&lt;p&gt;1.Launch vs code&lt;/p&gt;

&lt;p&gt;2.Install &lt;code&gt;remote-ssh&lt;/code&gt; extension&lt;/p&gt;

&lt;p&gt;Pressing &lt;code&gt;Ctrl + Shift + X&lt;/code&gt; opens the extensions tab. Typing &lt;code&gt;remote ssh&lt;/code&gt; in the search box, as shown below, brings up the extension.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbrao1ymw3fehcbml6xrz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbrao1ymw3fehcbml6xrz.png" alt=" " width="305" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;There it is! Clicking the green Install button installs it. (There is no Install button on the extension in the picture because I have already installed it.)&lt;/p&gt;

&lt;p&gt;3.Setup ssh config&lt;/p&gt;

&lt;p&gt;There is a &lt;code&gt;config&lt;/code&gt; file in the home directory, namely &lt;code&gt;~/.ssh/config&lt;/code&gt;. The file should look like the following.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight conf"&gt;&lt;code&gt;&lt;span class="n"&gt;Host&lt;/span&gt; *.&lt;span class="n"&gt;trycloudflare&lt;/span&gt;.&lt;span class="n"&gt;com&lt;/span&gt;
    &lt;span class="n"&gt;HostName&lt;/span&gt; %&lt;span class="n"&gt;h&lt;/span&gt;
    &lt;span class="n"&gt;User&lt;/span&gt; &lt;span class="n"&gt;root&lt;/span&gt;
    &lt;span class="n"&gt;Port&lt;/span&gt; &lt;span class="m"&gt;22&lt;/span&gt;
    &lt;span class="n"&gt;ProxyCommand&lt;/span&gt; &amp;lt;&lt;span class="n"&gt;absolute&lt;/span&gt; &lt;span class="n"&gt;path&lt;/span&gt; &lt;span class="n"&gt;to&lt;/span&gt; &lt;span class="n"&gt;cloudflare&lt;/span&gt;.&lt;span class="n"&gt;exe&lt;/span&gt;&amp;gt; &lt;span class="n"&gt;access&lt;/span&gt; &lt;span class="n"&gt;ssh&lt;/span&gt; --&lt;span class="n"&gt;hostname&lt;/span&gt; %&lt;span class="n"&gt;h&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In my case, I put &lt;code&gt;cloudflared.exe&lt;/code&gt; directly under the C drive, so my ProxyCommand line is &lt;code&gt;ProxyCommand C:\\cloudflared.exe access ssh --hostname %h&lt;/code&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Preparing config data
&lt;/h2&gt;

&lt;p&gt;1.Create a &lt;code&gt;config&lt;/code&gt; directory on your google drive&lt;/p&gt;

&lt;p&gt;2.Create &lt;code&gt;general_config.yaml&lt;/code&gt; and upload it to the &lt;code&gt;config&lt;/code&gt; directory&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;general_config.yaml&lt;/code&gt; must have the following information.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;github&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;username&lt;/span&gt;
  &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;email&lt;/span&gt;
  &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;access token&lt;/span&gt;
&lt;span class="na"&gt;cloudflare&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;password&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;password_you_decided&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;token&lt;/code&gt; in the &lt;code&gt;github&lt;/code&gt; block can be obtained by following [&lt;a href="https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token" rel="noopener noreferrer"&gt;Creating a personal access token&lt;/a&gt;].&lt;/p&gt;

&lt;h2&gt;
  
  
  Creating a new google colab notebook
&lt;/h2&gt;

&lt;p&gt;1.Create a new google colab notebook to access google colab from vs code&lt;/p&gt;

&lt;p&gt;2.Run the following codes&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Prepare environment
&lt;/span&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;colab_ssh&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="n"&gt;upgrade&lt;/span&gt;

&lt;span class="c1"&gt;# Import necessary modules
&lt;/span&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;colab_ssh&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;launch_ssh_cloudflared&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.colab&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;drive&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;yaml&lt;/span&gt;

&lt;span class="c1"&gt;# Mount my google drive
&lt;/span&gt;&lt;span class="n"&gt;drive_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/content/gdrive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;drive&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drive_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load general config
&lt;/span&gt;&lt;span class="n"&gt;config_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drive_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MyDrive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;general_config.yaml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;yml&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;yaml&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;safe_load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;yml&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Set git config
&lt;/span&gt;&lt;span class="n"&gt;config_github&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;github&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;config_github&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;config_github&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

&lt;span class="c1"&gt;# Create symbolic link
&lt;/span&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;ln&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;sfn&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;content&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;gdrive&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;MyDrive&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt; &lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;root&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;workspace&lt;/span&gt;

&lt;span class="c1"&gt;# Launch ssh cloudflare
&lt;/span&gt;&lt;span class="nf"&gt;launch_ssh_cloudflared&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cloudflare&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;password&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output of the cell contains the following information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnkiqf22z55wvc7a2ajvb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnkiqf22z55wvc7a2ajvb.png" alt=" " width="800" height="125"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3.Copy the url in &lt;code&gt;VSCode Remote SSH&lt;/code&gt; block&lt;/p&gt;

&lt;h2&gt;
  
  
  Accessing google colab from vs code
&lt;/h2&gt;

&lt;p&gt;1.Open command palette on vs code by pressing &lt;code&gt;Ctrl + Shift + P&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;2.Input &lt;code&gt;Remote-SSH: Connect to Host...&lt;/code&gt; in the box and press enter&lt;/p&gt;

&lt;p&gt;3.Input the copied url in the box and press enter&lt;/p&gt;

&lt;p&gt;4.Input &lt;code&gt;Continue&lt;/code&gt; in the box and press enter&lt;/p&gt;

&lt;p&gt;5.Input the password written in &lt;code&gt;general_config.yaml&lt;/code&gt; in the box and press enter&lt;/p&gt;

&lt;p&gt;There we go! vs code is now connected to google colab. Press &lt;code&gt;Ctrl + Shift + @&lt;/code&gt; to open a terminal, and then input &lt;code&gt;python -V&lt;/code&gt; or &lt;code&gt;pip list&lt;/code&gt;. We should see the same output as when we run the same commands in a google colab notebook.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Now we can edit any file in vs code and also commit and push from the vs code terminal. It is quite cosy for me. Furthermore, we can use mlflow, which I introduced briefly in the previous article: just run &lt;code&gt;mlflow ui&lt;/code&gt; in the terminal and open the printed url in any browser.&lt;/p&gt;

&lt;p&gt;I would appreciate it if someone shared tips in the discussion box below.&lt;/p&gt;

</description>
      <category>vscode</category>
      <category>googlecloud</category>
      <category>python</category>
    </item>
    <item>
      <title>My own chatbot by fine-tuning GPT-2</title>
      <dc:creator>Keisuke Sato</dc:creator>
      <pubDate>Sat, 19 Feb 2022 14:29:10 +0000</pubDate>
      <link>https://dev.to/ksk0629/my-own-chatbot-by-fine-tuning-gpt-2-m0n</link>
      <guid>https://dev.to/ksk0629/my-own-chatbot-by-fine-tuning-gpt-2-m0n</guid>
      <description>&lt;p&gt;(Updated at 20, February, 2022)&lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;In this post, I will fine-tune GPT-2, specifically one of rinna's models, which are Japanese GPT-2 models. I am Japanese and most of my chat histories are in Japanese, so I will fine-tune a "Japanese" GPT-2.&lt;/p&gt;

&lt;p&gt;GPT-2 stands for Generative Pre-trained Transformer 2 and, as the name suggests, it generates sentences. We can build a chatbot by fine-tuning a pre-trained model with a small amount of training data.&lt;/p&gt;

&lt;p&gt;I will not go through GPT-2 in detail. I highly recommend the article &lt;a href="https://dev.to/oursky/how-to-build-an-ai-text-generator-text-generation-with-a-gpt-2-model-4346"&gt;How to Build an AI Text Generator: Text Generation with a GPT-2 Model&lt;/a&gt; on dev.to to understand what GPT-2 is and what a language model is.&lt;/p&gt;

&lt;p&gt;git repository: &lt;a href="https://github.com/ksk0629/chatbot_with_gpt2" rel="noopener noreferrer"&gt;chatbot_with_gpt2&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;I am grateful to the authors of the following two articles.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://qiita.com/Yokohide/items/e74254f334e1335cd502" rel="noopener noreferrer"&gt;GPT-2で友達を再現して対話してみた&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/oursky/how-to-build-an-ai-text-generator-text-generation-with-a-gpt-2-model-4346"&gt;How to Build an AI Text Generator: Text Generation with a GPT-2 Model&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Thanks to the first author, I could build my chatbot model; the sources in my git repository are almost entirely based on his code, which I just reorganised. Thanks to the second author, I could learn about GPT-2.&lt;/p&gt;

&lt;h1&gt;
  
  
  What is rinna
&lt;/h1&gt;

&lt;p&gt;rinna is a family of conversational pre-trained models provided by rinna Co., Ltd.; as of 19 February 2022, five pre-trained models are available on hugging face [&lt;a href="https://huggingface.co/rinna" rel="noopener noreferrer"&gt;rinna Co., Ltd.&lt;/a&gt;]. rinna is somewhat famous in Japan because the company released the rinna AI on LINE, one of the most popular messaging apps in Japan. Her persona is a junior high school girl, and we can chat with her on LINE.&lt;/p&gt;

&lt;p&gt;I am not sure when the models were published on hugging face, but anyway, they are available now. I will fine-tune &lt;code&gt;rinna/japanese-gpt2-small&lt;/code&gt;, whose number of parameters is small. By the way, I wanted to use &lt;code&gt;rinna/japanese-gpt-1b&lt;/code&gt;, which has around one billion parameters, but I couldn't because of the memory capacity of google colab.&lt;/p&gt;

&lt;h1&gt;
  
  
  Process
&lt;/h1&gt;

&lt;p&gt;I will suppose you have google and git accounts and can use google colab.&lt;/p&gt;

&lt;p&gt;Furthermore, I will use a chat history from LINE. If you have no account on the app, that is okay: all you have to do is prepare a chat history and adapt the data, although I know those are the hardest and most bothersome steps. If you have an account, the following process should work. Note that if your LINE display language is Japanese, you should change it to English until you have exported the chat history, because the following steps assume the display language (not the message language) is English.&lt;/p&gt;
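&lt;p&gt;For readers preparing the data by hand, the pairing step can be sketched like this. It is a simplified, hypothetical parser: the tab-separated "time, name, message" line format is an assumption about the English-language LINE export, and this is not the repository's actual code.&lt;/p&gt;

```python
import re

def extract_pairs(lines, input_username, output_username):
    """Turn a LINE-style chat export into (input, output) message pairs.

    Assumes each message line looks like "21:03\tName\tMessage"; adjust the
    pattern for your own export.  Simplified sketch, not the repo's code.
    """
    pattern = re.compile(r"^\d{1,2}:\d{2}\t([^\t]+)\t(.+)$")
    pairs, last_input = [], None
    for line in lines:
        m = pattern.match(line)
        if not m:
            continue  # skip date headers and system lines
        name, message = m.groups()
        if name == input_username:
            last_input = message
        elif name == output_username and last_input is not None:
            pairs.append((last_input, message))
            last_input = None
    return pairs

chat = ["21:03\tAlice\tHey", "21:04\tBob\tHi there", "21:05\tAlice\tHow are you?"]
print(extract_pairs(chat, "Alice", "Bob"))
```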

&lt;h2&gt;
  
  
  Prepare the environment
&lt;/h2&gt;

&lt;p&gt;At the end of this process, your google drive is constructed as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MyDrive ---- chatbot_with_gpt2.ipynb
           |
           |- config
           |    |- general_config.yaml
           |
           |- data
                |- chat_history.txt
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;1: Clone &lt;a href="https://github.com/ksk0629/chatbot_with_gpt2" rel="noopener noreferrer"&gt;chatbot_with_gpt2&lt;/a&gt; repository on your local machine.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is accomplished by running the following command in git bash.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;git clone https://github.com/ksk0629/chatbot_with_gpt2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;&lt;p&gt;2: Upload &lt;code&gt;chatbot_with_gpt2/chatbot_with_gpt2.ipynb&lt;/code&gt; to the google drive.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;3: Make a directory named &lt;code&gt;config&lt;/code&gt; on your google drive and create &lt;code&gt;general_config.yaml&lt;/code&gt; in the config folder.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;code&gt;general_config.yaml&lt;/code&gt; is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;github&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your_github_username&lt;/span&gt;
  &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your_email&lt;/span&gt;
  &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your_access_token&lt;/span&gt;
&lt;span class="na"&gt;ngrok&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;anything&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The &lt;code&gt;ngrok&lt;/code&gt; block is not actually used, but it must be present to avoid an error in a later step.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;4: Get a chat history from LINE.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We can get the history by following the official announcement [&lt;a href="https://help.line.me/line/android/?contentId=20007388#:~:text=1.,want%20to%20send%20the%20file." rel="noopener noreferrer"&gt;Help centre - Chat history&lt;/a&gt;].&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;5: Make a directory named &lt;code&gt;data&lt;/code&gt; on your google drive and move the chat history to the directory.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prepare training data and build the model
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;1: Open &lt;code&gt;chatbot_with_gpt2.ipynb&lt;/code&gt; on google colaboratory.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;2: Run the cells in Preparation block.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Running those cells prepares the environment for generating the training data and building the model.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;3: Change &lt;code&gt;chatbot_with_gpt2/pre_processor_config.yaml&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The initial yaml file is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;line&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;initial&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;input_username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;input_username"&lt;/span&gt;
    &lt;span class="na"&gt;output_username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;output_username"&lt;/span&gt;
    &lt;span class="na"&gt;target_year_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;[2016,2017,2018,2019,2020,2021,2022]"&lt;/span&gt;
  &lt;span class="na"&gt;path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;input_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/content/gdrive/MyDrive/data/chat_history.txt"&lt;/span&gt;
    &lt;span class="na"&gt;output_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat_history_cleaned.pk"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You have to change at least the initial block. The meaning of each entry is as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;input_username: a username of messages that you want to input into the model&lt;/li&gt;
&lt;li&gt;output_username: a username of messages that you want the model to output&lt;/li&gt;
&lt;li&gt;target_year_list: years that you want to use to train the model&lt;/li&gt;
&lt;li&gt;input_path: path to the raw chat history&lt;/li&gt;
&lt;li&gt;output_path: path to the cleaned data that is obtained by the following process&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Note that if you do not change output_path, your training data will not be available after the notebook is closed. It is, of course, available while the notebook is running.&lt;/p&gt;
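&lt;p&gt;For example, pointing output_path at a directory on your google drive (the &lt;code&gt;data&lt;/code&gt; directory here is just one possible choice) keeps the cleaned data across sessions:&lt;/p&gt;

```yaml
  path:
    input_path: "/content/gdrive/MyDrive/data/chat_history.txt"
    output_path: "/content/gdrive/MyDrive/data/chat_history_cleaned.pk"
```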

&lt;ul&gt;
&lt;li&gt;4: Run the cell in Preprocessing data block.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The data is cleaned in the cell.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;5: Change &lt;code&gt;chatbot_with_gpt2/model_config.yaml&lt;/code&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The initial yaml file is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;general&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;basemodel&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;rinna/japanese-gpt2-xsmall"&lt;/span&gt;
&lt;span class="na"&gt;dataset&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;input_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;chat_history_cleaned.pk"&lt;/span&gt;
  &lt;span class="na"&gt;output_path&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;gpt2_train_data.txt"&lt;/span&gt;
&lt;span class="na"&gt;train&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;epochs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10&lt;/span&gt;
  &lt;span class="na"&gt;save_steps&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;10000&lt;/span&gt;
  &lt;span class="na"&gt;save_total_limit&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;per_device_train_batch_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;per_device_eval_batch_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
  &lt;span class="na"&gt;output_dir&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;model/default"&lt;/span&gt;
  &lt;span class="na"&gt;use_fast_tokenizer&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;False&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You have to change input_path in the dataset block to the path of the cleaned data, which is specified in &lt;code&gt;pre_processor_config.yaml&lt;/code&gt;. You can change basemodel to rinna/japanese-gpt2-small, but the others (medium and 1b) would not work because of a lack of GPU memory, as I mentioned in the What is rinna section.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;6: Run the cells in Training data preparation and Building model block.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is all! After running these cells, all you have to do is wait for a while. You will see your model files in the directory specified in &lt;code&gt;model_config.yaml&lt;/code&gt;.&lt;/p&gt;

&lt;h1&gt;
  
  
  Let's talk to the model
&lt;/h1&gt;

&lt;p&gt;Again, all you have to do is run the single cell in the Talking with the model block. The source code then runs and you can talk with the model, as in the following.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsais8av6ghwium6oz0pa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsais8av6ghwium6oz0pa.png" alt=" " width="685" height="402"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;I fine-tuned GPT-2 with my chat history on LINE. It certainly worked, but there are the following problems, as you can see in the Let's talk to the model section.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;There is an unnecessary line, &lt;code&gt;Setting 'pad_token_id' to 'eos_token_id':2 for open-end generation.&lt;/code&gt;, in each conversation.&lt;/li&gt;
&lt;li&gt;There are some tokens, like &lt;code&gt;&amp;lt;br:&lt;/code&gt;, &lt;code&gt;[&amp;lt;unk&amp;gt;hoto]&amp;lt;br///&lt;/code&gt;, and &lt;code&gt;&amp;lt;br/ゥ&amp;gt;&lt;/code&gt;, that disturb sentence coherence.&lt;/li&gt;
&lt;li&gt;The model did not reply well.&lt;/li&gt;
&lt;/ul&gt;
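&lt;p&gt;The second problem could be mitigated with a post-processing step. The following is a minimal sketch (not part of my notebook) that strips only the specific malformed fragments observed above before showing a reply; the regular expression is an assumption tailored to those examples.&lt;/p&gt;

```python
import re

# Hypothetical clean-up pass: remove the malformed fragments observed in the
# replies (broken <br...> variants and bracketed <unk> blobs).
ARTIFACTS = re.compile(r"\[<unk>[^\]]*\]|<unk>|<br[/:>ゥ]*")

def clean_reply(text: str) -> str:
    return ARTIFACTS.sub("", text).strip()

print(clean_reply("帰ったんか<br:おつかれさま!"))  # -> 帰ったんかおつかれさま!
```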

&lt;p&gt;The first response&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;帰ったんか
おつかれさま!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;looks quite good: "おっす" means "Hey", and the response means "You are home. You must be exhausted", something like that. But the others look wrong. To improve the model, I could clean the training data further, and I need to understand GPT-2 and the source code better.&lt;/p&gt;

&lt;p&gt;If you have any suggestions, comments, or questions about this article, please comment below. I'd appreciate it.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>Toward understanding DNN (deep neural network) well: iris dataset</title>
      <dc:creator>Keisuke Sato</dc:creator>
      <pubDate>Sun, 13 Feb 2022 09:21:11 +0000</pubDate>
      <link>https://dev.to/ksk0629/toward-understanding-dnn-deep-neural-network-well-iris-dataset-5179</link>
      <guid>https://dev.to/ksk0629/toward-understanding-dnn-deep-neural-network-well-iris-dataset-5179</guid>
      <description>&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;This is the second article in my "toward understanding DNN (deep neural network) well" series. I will explore the effect of the number of layers and the number of units again, this time with the iris dataset.&lt;/p&gt;

&lt;p&gt;github repository: &lt;a href="https://github.com/ksk0629/comparison_of_dnn" rel="noopener noreferrer"&gt;comparison_of_dnn&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that this is not a "guide"; it is a memo from a beginner for beginners. If you have any comments, suggestions, questions, etc. whilst reading this article, please let me know in the comments below.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;a href="https://scikit-learn.org/stable/auto_examples/datasets/plot_iris_dataset.html" rel="noopener noreferrer"&gt;Iris dataset&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;It is a very famous dataset, and most people would not need an explanation of it. But I will look at it briefly, because I am a beginner.&lt;/p&gt;

&lt;p&gt;We can load this dataset with the &lt;code&gt;sklearn.datasets.load_iris()&lt;/code&gt; function. It is a multi-class classification dataset containing 150 samples, each with the following four features.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sepal length (cm)&lt;/li&gt;
&lt;li&gt;sepal width (cm)&lt;/li&gt;
&lt;li&gt;petal length (cm)&lt;/li&gt;
&lt;li&gt;petal width (cm)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The number of classes is three, and the dataset contains the same number of samples for each class. As most of us know, it has no missing values, but since this is a tutorial-like article, I will check for missing values anyway.&lt;/p&gt;

&lt;p&gt;Input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sklearn&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;sklearn&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;datasets&lt;/span&gt;

&lt;span class="n"&gt;iris_dataset&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;sklearn&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;datasets&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_iris&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;as_frame&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;frame&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;iris_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;class 'pandas.core.frame.DataFrame'&amp;gt;
RangeIndex: 150 entries, 0 to 149
Data columns (total 5 columns):
 #   Column             Non-Null Count  Dtype  
---  ------             --------------  -----  
 0   sepal length (cm)  150 non-null    float64
 1   sepal width (cm)   150 non-null    float64
 2   petal length (cm)  150 non-null    float64
 3   petal width (cm)   150 non-null    float64
 4   target             150 non-null    int64  
dtypes: float64(4), int64(1)
memory usage: 6.0 KB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Cool. There are no missing values. Next, I check the basic statistics.&lt;/p&gt;

&lt;p&gt;Input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;iris_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;      sepal length (cm) sepal width (cm) petal length (cm) /
mean           5.843333         3.057333          3.758000 /
std            0.828066         0.435866          1.765298 /
min            4.300000         2.000000          1.000000 /
25%            5.100000         2.800000          1.600000 /
50%            5.800000         3.000000          4.350000 /
75%            6.400000         3.300000          5.100000 /
max            7.900000         4.400000          6.900000 /

      petal width (cm)    target
              1.199333  1.000000
              0.762238  0.819232
              0.100000  0.000000
              0.300000  0.000000
              1.300000  1.000000
              1.800000  2.000000
              2.500000  2.000000
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Of course, I am interested in analysing the data, but I do not have the skills to do so yet. I will analyse it someday.&lt;/p&gt;

&lt;h1&gt;
  
  
  Comparison
&lt;/h1&gt;

&lt;p&gt;For the sake of simplicity, I suppose the following conditions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All conditions of the model are fixed except for the number of layers and the number of units in each layer.&lt;/li&gt;
&lt;li&gt;No data preprocessing is performed.&lt;/li&gt;
&lt;li&gt;The seed is fixed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Most of these conditions can be changed or removed; all you have to do is edit &lt;code&gt;config_iris.yaml&lt;/code&gt;. The yaml file has the following lines.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;mlflow&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;experiment_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;iris&lt;/span&gt;
  &lt;span class="na"&gt;run_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;default&lt;/span&gt;
&lt;span class="na"&gt;dataset&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;eval_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.25&lt;/span&gt;
  &lt;span class="na"&gt;test_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.25&lt;/span&gt;
  &lt;span class="na"&gt;train_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;0.75&lt;/span&gt;
  &lt;span class="na"&gt;shuffle&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;True&lt;/span&gt;
&lt;span class="na"&gt;dnn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;n_layers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;n_units_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;activation_function_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;relu&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;relu&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;softmax&lt;/span&gt;
  &lt;span class="na"&gt;seed&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;57&lt;/span&gt;
&lt;span class="na"&gt;dnn_train&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;epochs&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;30&lt;/span&gt;
  &lt;span class="na"&gt;batch_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;4&lt;/span&gt;
  &lt;span class="na"&gt;patience&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
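&lt;p&gt;The &lt;code&gt;dnn&lt;/code&gt; block maps directly onto a stack of dense layers. As a rough sketch (this is not the repository's actual code), the per-layer settings could be validated and paired up like this:&lt;/p&gt;

```python
# Sketch of how the "dnn" block of config_iris.yaml could be validated and
# turned into (units, activation) specs, one per layer.
def parse_dnn_config(dnn: dict) -> list:
    n_layers = dnn["n_layers"]
    units = dnn["n_units_list"]
    activations = dnn["activation_function_list"]
    if not (n_layers == len(units) == len(activations)):
        raise ValueError("n_layers must match both per-layer lists")
    return list(zip(units, activations))

config = {"n_layers": 3, "n_units_list": [8, 4, 3],
          "activation_function_list": ["relu", "relu", "softmax"]}
print(parse_dnn_config(config))  # -> [(8, 'relu'), (4, 'relu'), (3, 'softmax')]
```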



&lt;p&gt;The following changes build a model with five layers (four dense layers plus the output layer): the four dense layers have 8 units each with the relu activation function, and the output layer has 3 units with softmax.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;dnn&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;n_layers&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
  &lt;span class="na"&gt;n_units_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;8&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="m"&gt;3&lt;/span&gt;
  &lt;span class="na"&gt;activation_function_list&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;relu&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;relu&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;relu&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;relu&lt;/span&gt;
    &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;softmax&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Note that some of the model's settings are hard-coded; you have to edit the code to change them. For example, the loss function is cross-entropy, computed by the &lt;code&gt;keras.losses.SparseCategoricalCrossentropy()&lt;/code&gt; function, and it is specified in &lt;code&gt;iris_dnn.py&lt;/code&gt;: &lt;br&gt;
&lt;a href="https://github.com/ksk0629/comparison_of_dnn/blob/8498a7d15ed6a4447f13f9f277e214f4821f46a1/src/iris_dnn.py#L28-L30" rel="noopener noreferrer"&gt;https://github.com/ksk0629/comparison_of_dnn/blob/8498a7d15ed6a4447f13f9f277e214f4821f46a1/src/iris_dnn.py#L28-L30&lt;/a&gt;&lt;/p&gt;
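&lt;p&gt;For reference, the loss itself is simple to state: for one sample, sparse categorical cross-entropy is the negative log of the probability the softmax output assigns to the true class index. A hand-computed sketch (the probabilities below are made up):&lt;/p&gt;

```python
import math

# Sparse categorical cross-entropy for a single sample: -log(p[target]),
# where p is the softmax output and target is the true class index.
def sparse_categorical_crossentropy(probs, target):
    return -math.log(probs[target])

# A hypothetical softmax output whose true class is 2.
print(round(sparse_categorical_crossentropy([0.1, 0.2, 0.7], 2), 4))  # -> 0.3567
```

&lt;p&gt;A model stuck at the uniform output [1/3, 1/3, 1/3] gives -log(1/3) ≈ 1.099, which is exactly the loss the eight- and nine-layer models below are stuck at.&lt;/p&gt;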
&lt;h2&gt;
  
  
  Result
&lt;/h2&gt;

&lt;p&gt;First, I summarise all results. The losses and accuracy are as follows.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#layers&lt;/th&gt;
&lt;th&gt;#parameters&lt;/th&gt;
&lt;th&gt;training loss&lt;/th&gt;
&lt;th&gt;evaluation loss&lt;/th&gt;
&lt;th&gt;test loss&lt;/th&gt;
&lt;th&gt;test accuracy&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;35&lt;/td&gt;
&lt;td&gt;0.166&lt;/td&gt;
&lt;td&gt;0.136&lt;/td&gt;
&lt;td&gt;0.157&lt;/td&gt;
&lt;td&gt;0.947&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;67&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.086&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.022&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.039&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.974&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;131&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.086&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.033&lt;/td&gt;
&lt;td&gt;0.043&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;1.0&lt;/strong&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;2&lt;/td&gt;
&lt;td&gt;259&lt;/td&gt;
&lt;td&gt;0.09&lt;/td&gt;
&lt;td&gt;0.024&lt;/td&gt;
&lt;td&gt;0.047&lt;/td&gt;
&lt;td&gt;0.974&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;3&lt;/td&gt;
&lt;td&gt;263&lt;/td&gt;
&lt;td&gt;0.104&lt;/td&gt;
&lt;td&gt;&lt;strong&gt;0.018&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.069&lt;/td&gt;
&lt;td&gt;0.974&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;4&lt;/td&gt;
&lt;td&gt;260&lt;/td&gt;
&lt;td&gt;0.123&lt;/td&gt;
&lt;td&gt;0.05&lt;/td&gt;
&lt;td&gt;0.115&lt;/td&gt;
&lt;td&gt;0.947&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;5&lt;/td&gt;
&lt;td&gt;261&lt;/td&gt;
&lt;td&gt;0.089&lt;/td&gt;
&lt;td&gt;0.089&lt;/td&gt;
&lt;td&gt;0.075&lt;/td&gt;
&lt;td&gt;0.974&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;6&lt;/td&gt;
&lt;td&gt;255&lt;/td&gt;
&lt;td&gt;0.138&lt;/td&gt;
&lt;td&gt;0.043&lt;/td&gt;
&lt;td&gt;0.119&lt;/td&gt;
&lt;td&gt;0.947&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;7&lt;/td&gt;
&lt;td&gt;263&lt;/td&gt;
&lt;td&gt;0.091&lt;/td&gt;
&lt;td&gt;0.023&lt;/td&gt;
&lt;td&gt;0.047&lt;/td&gt;
&lt;td&gt;0.974&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;8&lt;/td&gt;
&lt;td&gt;261&lt;/td&gt;
&lt;td&gt;1.099&lt;/td&gt;
&lt;td&gt;1.099&lt;/td&gt;
&lt;td&gt;1.099&lt;/td&gt;
&lt;td&gt;0.316&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;9&lt;/td&gt;
&lt;td&gt;259&lt;/td&gt;
&lt;td&gt;1.099&lt;/td&gt;
&lt;td&gt;1.099&lt;/td&gt;
&lt;td&gt;1.099&lt;/td&gt;
&lt;td&gt;0.316&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The test set contains 38 samples: 12 belonging to class 0, 13 to class 1, and 13 to class 2.&lt;/p&gt;

&lt;p&gt;I performed 11 experiments to explore the following two things.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;effect of the number of parameters&lt;/li&gt;
&lt;li&gt;effect of the number of layers&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The first to fourth experiments address the former, and the fourth to eleventh address the latter.&lt;/p&gt;

&lt;p&gt;The results show the following.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;The model with two layers and 67 parameters is the best in terms of test loss.&lt;/li&gt;
&lt;li&gt;The model with two layers and 131 parameters is the best in terms of test accuracy.&lt;/li&gt;
&lt;li&gt;The models with 8 and 9 layers are the worst.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;It is a bit surprising to me, because I expected the best model to have more layers and parameters than the models above. It is possibly due to the distribution of the test data, which might be too small to evaluate the performance reliably. But at least under the above conditions, the two models with two layers are the best ones, which possibly means the other models overfitted.&lt;/p&gt;

&lt;p&gt;As mentioned later, the vanishing gradient problem occurred in the eight- and nine-layer experiments. That is, eight layers are too many to learn well, at least with the iris data under the above conditions.&lt;/p&gt;

&lt;p&gt;Except for the models that suffered from the vanishing gradient problem and the best one in terms of test accuracy, every model classified 36 or 37 of the test samples correctly. Interestingly, one of the misclassified samples is the same across these models. This possibly implies the distribution of the test data is not great, i.e., there is a gap between the training data and the test data.&lt;/p&gt;

&lt;p&gt;Furthermore, most of the models correctly classified most of the data, which suggests a DNN is effective on the iris data even with a very simple structure.&lt;/p&gt;
&lt;h3&gt;
  
  
  two layers with 35 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 4)                 20        

 dense_1 (Dense)             (None, 3)                 15        

=================================================================
Total params: 35
Trainable params: 35
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.166&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.136&lt;/li&gt;
&lt;li&gt;test loss: 0.157&lt;/li&gt;
&lt;li&gt;test accuracy: 0.947&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model correctly classified 36 of the 38 test samples. It looks great and it actually works great. At least for the iris data, a DNN is a very powerful tool even when the model has a very simple structure.&lt;/p&gt;
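&lt;p&gt;As a sanity check, the Param # column of these summaries follows from the dense-layer formula (number of inputs + 1) × number of units, where the +1 accounts for the bias:&lt;/p&gt;

```python
# Parameter count of a dense layer: one weight per input per unit plus one
# bias per unit, i.e. (n_inputs + 1) * n_units.
def dense_params(n_inputs, n_units):
    return (n_inputs + 1) * n_units

# The model above: 4 input features -> 4 units -> 3 units.
print(dense_params(4, 4) + dense_params(4, 3))  # -> 35
```

&lt;p&gt;The same formula reproduces the other totals, e.g. the 8-unit model gives 40 + 27 = 67.&lt;/p&gt;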

&lt;h3&gt;
  
  
  two layers with 67 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 8)                 40        

 dense_1 (Dense)             (None, 3)                 27        

=================================================================
Total params: 67
Trainable params: 67
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.086&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.022&lt;/li&gt;
&lt;li&gt;test loss: 0.039&lt;/li&gt;
&lt;li&gt;test accuracy: 0.974 &lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model correctly classified 37 test data.&lt;/p&gt;

&lt;h3&gt;
  
  
  two layers with 131 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 16)                80        

 dense_1 (Dense)             (None, 3)                 51        

=================================================================
Total params: 131
Trainable params: 131
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.086&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.033&lt;/li&gt;
&lt;li&gt;test loss: 0.043&lt;/li&gt;
&lt;li&gt;test accuracy: 1.0&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model correctly classified all test data.&lt;/p&gt;

&lt;h3&gt;
  
  
  two layers with 259 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 32)                160       

 dense_1 (Dense)             (None, 3)                 99        

=================================================================
Total params: 259
Trainable params: 259
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.09&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.024&lt;/li&gt;
&lt;li&gt;test loss: 0.047&lt;/li&gt;
&lt;li&gt;test accuracy: 0.974&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model correctly classified 37 test data.&lt;/p&gt;

&lt;h3&gt;
  
  
  three layers with 263 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt; Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 16)                80        

 dense_1 (Dense)             (None, 9)                 153       

 dense_2 (Dense)             (None, 3)                 30        

=================================================================
Total params: 263
Trainable params: 263
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.104&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.018&lt;/li&gt;
&lt;li&gt;test loss: 0.069&lt;/li&gt;
&lt;li&gt;test accuracy: 0.974&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model correctly classified 37 data too.&lt;/p&gt;

&lt;h3&gt;
  
  
  four layers with 260 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 14)                70        

 dense_1 (Dense)             (None, 9)                 135       

 dense_2 (Dense)             (None, 4)                 40        

 dense_3 (Dense)             (None, 3)                 15        

=================================================================
Total params: 260
Trainable params: 260
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.123&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.05&lt;/li&gt;
&lt;li&gt;test loss: 0.115&lt;/li&gt;
&lt;li&gt;test accuracy: 0.947&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model correctly classified 36 data.&lt;/p&gt;

&lt;h3&gt;
  
  
  five layers with 261 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 12)                60        

 dense_1 (Dense)             (None, 8)                 104       

 dense_2 (Dense)             (None, 6)                 54        

 dense_3 (Dense)             (None, 4)                 28        

 dense_4 (Dense)             (None, 3)                 15        

=================================================================
Total params: 261
Trainable params: 261
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.089&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.089&lt;/li&gt;
&lt;li&gt;test loss: 0.075&lt;/li&gt;
&lt;li&gt;test accuracy: 0.974&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model correctly classified 37 data.&lt;/p&gt;

&lt;h3&gt;
  
  
  six layers with 255 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 10)                50        

 dense_1 (Dense)             (None, 8)                 88        

 dense_2 (Dense)             (None, 6)                 54        

 dense_3 (Dense)             (None, 4)                 28        

 dense_4 (Dense)             (None, 4)                 20        

 dense_5 (Dense)             (None, 3)                 15        

=================================================================
Total params: 255
Trainable params: 255
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.138&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.043&lt;/li&gt;
&lt;li&gt;test loss: 0.119&lt;/li&gt;
&lt;li&gt;test accuracy: 0.947&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model correctly classified 36 data.&lt;/p&gt;

&lt;h3&gt;
  
  
  seven layers with 263 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 10)                50        

 dense_1 (Dense)             (None, 6)                 66        

 dense_2 (Dense)             (None, 6)                 42        

 dense_3 (Dense)             (None, 6)                 42        

 dense_4 (Dense)             (None, 4)                 28        

 dense_5 (Dense)             (None, 4)                 20        

 dense_6 (Dense)             (None, 3)                 15        

=================================================================
Total params: 263
Trainable params: 263
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.091&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.023&lt;/li&gt;
&lt;li&gt;test loss: 0.047&lt;/li&gt;
&lt;li&gt;test accuracy: 0.974&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This model correctly classified 37 data.&lt;/p&gt;

&lt;h3&gt;
  
  
  eight layers with 261 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 8)                 40        

 dense_1 (Dense)             (None, 6)                 54        

 dense_2 (Dense)             (None, 6)                 42        

 dense_3 (Dense)             (None, 6)                 42        

 dense_4 (Dense)             (None, 4)                 28        

 dense_5 (Dense)             (None, 4)                 20        

 dense_6 (Dense)             (None, 4)                 20        

 dense_7 (Dense)             (None, 3)                 15        

=================================================================
Total params: 261
Trainable params: 261
Non-trainable params: 0
________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 1.099&lt;/li&gt;
&lt;li&gt;evaluation loss: 1.099&lt;/li&gt;
&lt;li&gt;test loss: 1.099&lt;/li&gt;
&lt;li&gt;test accuracy: 0.316&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The vanishing gradient problem occurred whilst learning. In fact, the training loss stopped decreasing almost immediately:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq1xhoeti2hi001dvl53q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq1xhoeti2hi001dvl53q.png" alt=" " width="800" height="211"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It implies that eight layers are too many to train, at least with the iris data.&lt;/p&gt;
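&lt;p&gt;A toy illustration of why depth causes this (this is not the training code, just a back-of-the-envelope model): the backpropagated gradient is roughly a product of per-layer factors, and if each factor is below one on average, the gradient shrinks geometrically with the number of layers.&lt;/p&gt;

```python
import random

# Toy model of backpropagation: each layer multiplies the gradient by
# roughly |w| * f'(z). With smallish weights and a bounded activation
# derivative (capped at 0.25 here), the product decays with depth.
def toy_gradient_scale(n_layers, w_std=0.5, seed=57):
    rng = random.Random(seed)
    scale = 1.0
    for _ in range(n_layers):
        scale *= abs(rng.gauss(0.0, w_std)) * 0.25
    return scale

print(toy_gradient_scale(2))  # modest shrinkage after two layers
print(toy_gradient_scale(8))  # far smaller: the gradient has effectively vanished
```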
&lt;h3&gt;
  
  
  nine layers with 259 parameters
&lt;/h3&gt;

&lt;p&gt;The structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 8)                 40        

 dense_1 (Dense)             (None, 6)                 54        

 dense_2 (Dense)             (None, 6)                 42        

 dense_3 (Dense)             (None, 4)                 28        

 dense_4 (Dense)             (None, 4)                 20        

 dense_5 (Dense)             (None, 4)                 20        

 dense_6 (Dense)             (None, 4)                 20        

 dense_7 (Dense)             (None, 4)                 20        

 dense_8 (Dense)             (None, 3)                 15        

=================================================================
Total params: 259
Trainable params: 259
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final indices are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 1.099&lt;/li&gt;
&lt;li&gt;evaluation loss: 1.099&lt;/li&gt;
&lt;li&gt;test loss: 1.099&lt;/li&gt;
&lt;li&gt;test accuracy: 0.316&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The vanishing gradient problem occurred again. I had already observed it in the eight-layer experiment; this experiment just confirmed that it was indeed due to the number of layers, and the problem did occur again.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;I explored the effect of the number of layers and the number of parameters with the iris dataset. As a result, I found that the two-layer models are the best in terms of test loss and test accuracy, though this might be due to the small test size. The eight- and nine-layer models learnt nothing: the vanishing gradient problem occurred. This implies that eight or more layers are too many to train on this task.&lt;/p&gt;

&lt;p&gt;As mentioned in the result section, most of the models misclassified the same data point, which is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;sepal length (cm)    6.3
sepal width (cm)     2.5
petal length (cm)    4.9
petal width (cm)     1.5
target               1.0
Name: 72, dtype: float64
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This suggests that it is important to check whether or not such a data point is an outlier.&lt;/p&gt;
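&lt;p&gt;One quick way to check is the interquartile-range rule: flag a value as an outlier when it falls outside [Q1 - 1.5 IQR, Q3 + 1.5 IQR]. A minimal, hypothetical sketch (the helper name and the 1.5 factor are my choices, not anything from the repository):&lt;/p&gt;

```python
# IQR-based outlier check: a value is flagged when it lies outside
# [Q1 - factor*IQR, Q3 + factor*IQR]. Hypothetical helper, not repository code.
import statistics

def is_iqr_outlier(value, column, factor=1.5):
    q1, _, q3 = statistics.quantiles(column, n=4)  # quartiles of the column
    iqr = q3 - q1
    return value < q1 - factor * iqr or value > q3 + factor * iqr

# e.g. check one feature value against the rest of its column
print(is_iqr_outlier(100, [1, 2, 3, 4, 5, 6, 7, 100]))  # True
```

Applied to, say, the petal length values of the class-1 samples, this would indicate whether sample 72's petal length of 4.9 cm is unusual for its class.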

&lt;p&gt;All of the experiments were performed under seed 57. It would be interesting to change the seed and repeat the same experiments. Note that the seed also affects how the iris data is split into training, evaluation, and test sets. To keep the same test data, the &lt;code&gt;load_splitted_dataset_with_eval()&lt;/code&gt; function in &lt;code&gt;custom_dataset.py&lt;/code&gt; would need to be changed:&lt;br&gt;
&lt;a href="https://github.com/ksk0629/comparison_of_dnn/blob/8498a7d15ed6a4447f13f9f277e214f4821f46a1/src/custom_dataset.py#L75-L110" rel="noopener noreferrer"&gt;https://github.com/ksk0629/comparison_of_dnn/blob/8498a7d15ed6a4447f13f9f277e214f4821f46a1/src/custom_dataset.py#L75-L110&lt;/a&gt;&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>Toward understanding DNN (deep neural network) well: California housing dataset</title>
      <dc:creator>Keisuke Sato</dc:creator>
      <pubDate>Sun, 06 Feb 2022 12:57:31 +0000</pubDate>
      <link>https://dev.to/ksk0629/toward-understanding-dnn-deep-neural-network-well-california-housing-dataset-3jp3</link>
      <guid>https://dev.to/ksk0629/toward-understanding-dnn-deep-neural-network-well-california-housing-dataset-3jp3</guid>
      <description>&lt;p&gt;(Updated on 12, February 2022)&lt;/p&gt;

&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;This article is about machine learning, written for beginners like me. I am never sure how to decide the number of layers and the number of units in each layer when I build a model. In this article, I will explore their effect using the California housing dataset.&lt;/p&gt;

&lt;p&gt;All of the following code is in my git repository. You can reproduce each experiment by cloning the repository and running the notebook.&lt;/p&gt;

&lt;p&gt;github repository: &lt;a href="https://github.com/ksk0629/comparison_of_dnn.git" rel="noopener noreferrer"&gt;comparison_of_dnn&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Note that this is not a "guide"; it is a memo from one beginner to other beginners. If you have any comments, suggestions, questions, etc. while reading this article, please let me know in the comments below.&lt;/p&gt;

&lt;h1&gt;
  
  
  &lt;a href="https://inria.github.io/scikit-learn-mooc/python_scripts/datasets_california_housing.html" rel="noopener noreferrer"&gt;California housing dataset&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;The California housing dataset is for regression. It has eight features and one target value. We can get the dataset using the &lt;code&gt;sklearn.datasets.fetch_california_housing()&lt;/code&gt; function. The eight features are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MedInc: median income in block group&lt;/li&gt;
&lt;li&gt;HouseAge: median house age in block group&lt;/li&gt;
&lt;li&gt;AveRooms: average number of rooms per household&lt;/li&gt;
&lt;li&gt;AveBedrms: average number of bedrooms per household&lt;/li&gt;
&lt;li&gt;Population: block group population&lt;/li&gt;
&lt;li&gt;AveOccup: average number of household members&lt;/li&gt;
&lt;li&gt;Latitude: block group latitude&lt;/li&gt;
&lt;li&gt;Longitude: block group longitude&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The one target value is as follows:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MedHouseVal: median house value for California districts,
expressed in hundreds of thousands of dollars&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;As I said, this is a regression task, so I will build a model whose inputs are those features and whose output is the target value.&lt;/p&gt;

&lt;p&gt;I will not analyze this dataset carefully, just briefly using &lt;code&gt;pandas.DataFrame&lt;/code&gt; methods.&lt;/p&gt;

&lt;p&gt;Let's see the information to check if there are missing values.&lt;br&gt;
Input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;
&lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;append&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;./src&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;src.utils&lt;/span&gt;

&lt;span class="c1"&gt;# Load dataset
&lt;/span&gt;&lt;span class="n"&gt;callifornia_df&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utils&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;load_california_housing&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;span class="n"&gt;callifornia_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;info&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;&amp;lt;class 'pandas.core.frame.DataFrame'&amp;gt;
RangeIndex: 20640 entries, 0 to 20639
Data columns (total 9 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   MedInc       20640 non-null  float64
 1   HouseAge     20640 non-null  float64
 2   AveRooms     20640 non-null  float64
 3   AveBedrms    20640 non-null  float64
 4   Population   20640 non-null  float64
 5   AveOccup     20640 non-null  float64
 6   Latitude     20640 non-null  float64
 7   Longitude    20640 non-null  float64
 8   MedHouseVal  20640 non-null  float64
dtypes: float64(9)
memory usage: 1.4 MB
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There are no missing values. Then, let's see some statistics.&lt;br&gt;
Input:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;callifornia_df&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;describe&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;drop&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;count&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Output:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;         MedInc   HouseAge    AveRooms  AveBedrms    Population \
mean   3.870671  28.639486    5.429000   1.096675   1425.476744
std    1.899822  12.585558    2.474173   0.473911   1132.462122   
min    0.499900   1.000000    0.846154   0.333333      3.000000
25%    2.563400  18.000000    4.440716   1.006079    787.000000
50%    3.534800  29.000000    5.229129   1.048780   1166.000000
75%    4.743250  37.000000    6.052381   1.099526   1725.000000
max   15.000100  52.000000  141.909091  34.066667  35682.000000

        AveOccup   Latitude   Longitude  MedHouseVal
mean    3.070655  35.631861 -119.569704     2.068558
std    10.386050   2.135952    2.003532     1.153956
min     0.692308  32.540000 -124.350000     0.149990
25%     2.429741  33.930000 -121.800000     1.196000
50%     2.818116  34.260000 -118.490000     1.797000
75%     3.282261  37.710000 -118.010000     2.647250
max  1243.333333  41.950000 -114.310000     5.000010
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h1&gt;
  
  
  Comparison
&lt;/h1&gt;

&lt;p&gt;For the sake of simplicity, I suppose the following conditions.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;All model settings are fixed except for the number of layers and the number of units in each layer.&lt;/li&gt;
&lt;li&gt;No data preprocessing is performed.&lt;/li&gt;
&lt;li&gt;The seed is fixed.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;So the following discussion is under those conditions, but if you want to change or remove them, it is easy to do: in most cases, all you have to do is edit &lt;code&gt;config_california.yaml&lt;/code&gt;. It has the following contents.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;mlflow:
  experiment_name: california
  run_name: default
dataset:
  eval_size: 0.25
  test_size: 0.25
  train_size: 0.75
  shuffle: True
dnn:
  n_layers: 3
  n_units_list:
    - 8
    - 4
    - 1
  activation_function_list:
    - relu
    - relu
    - linear
  seed: 57
dnn_train:
  epochs: 30
  batch_size: 4
  patience: 5
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To run an experiment under a different fixed seed, change &lt;code&gt;57&lt;/code&gt; in the seed entry to another integer; to run without fixing the seed, change it to &lt;code&gt;None&lt;/code&gt;.&lt;/p&gt;
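&lt;p&gt;The three lists in the &lt;code&gt;dnn&lt;/code&gt; block have to stay consistent with each other. Below is a small sketch of the sanity checks one might run before building the model (the dict just mirrors the YAML shown above; the repository reads the actual file, and the helper is my own, not repository code):&lt;/p&gt;

```python
# Mirror of the dnn block of config_california.yaml as a plain dict
# (hypothetical; the repository loads the actual YAML file instead).
config = {
    "n_layers": 3,
    "n_units_list": [8, 4, 1],
    "activation_function_list": ["relu", "relu", "linear"],
    "seed": 57,
}

def validate_dnn_config(cfg):
    """Basic consistency checks for the dnn block."""
    assert cfg["n_layers"] == len(cfg["n_units_list"])
    assert len(cfg["n_units_list"]) == len(cfg["activation_function_list"])
    assert cfg["n_units_list"][-1] == 1  # regression: one output unit
    assert cfg["activation_function_list"][-1] == "linear"
    return True

print(validate_dnn_config(config))  # True
```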

&lt;h2&gt;
  
  
  result
&lt;/h2&gt;

&lt;p&gt;The loss summary is as follows.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;#layers&lt;/th&gt;
&lt;th&gt;training loss&lt;/th&gt;
&lt;th&gt;evaluation loss&lt;/th&gt;
&lt;th&gt;test loss&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;three (tiny #units)&lt;/td&gt;
&lt;td&gt;0.616&lt;/td&gt;
&lt;td&gt;0.565&lt;/td&gt;
&lt;td&gt;0.596&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;four&lt;/td&gt;
&lt;td&gt;0.54&lt;/td&gt;
&lt;td&gt;0.506&lt;/td&gt;
&lt;td&gt;0.53&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;five&lt;/td&gt;
&lt;td&gt;0.543&lt;/td&gt;
&lt;td&gt;1.126&lt;/td&gt;
&lt;td&gt;1.2&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;six&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;0.515&lt;/td&gt;
&lt;td&gt;0.49&lt;/td&gt;
&lt;td&gt;0.512&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;seven&lt;/td&gt;
&lt;td&gt;1.31&lt;/td&gt;
&lt;td&gt;1.335&lt;/td&gt;
&lt;td&gt;1.377&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;three (many #units)&lt;/td&gt;
&lt;td&gt;0.537&lt;/td&gt;
&lt;td&gt;0.515&lt;/td&gt;
&lt;td&gt;0.555&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The best model in terms of the test loss (in fact, of all the losses) is the six-layer one. The test losses of the five- and seven-layer models are close, but their training results are not close at all: in training the seven-layer model, the vanishing gradient problem occurred, probably because of its depth.&lt;/p&gt;

&lt;p&gt;Below, I plot the predicted values together with the true target values. The plots show that the fourth model (the six-layer one) is also the best in terms of the individual predictions.&lt;/p&gt;

&lt;p&gt;The first to fifth models have different numbers of units. The last model, which has three layers but roughly as many parameters as the fourth model (3,313 versus 3,361), was built to compare the effect of the number of layers against that of the number of units. As a result, the fourth model is better, which implies that the depth is more important than the number of units, at least for the California dataset. Notably, in the California dataset the difference between the maximum true target value and the mean is greater than the difference between the minimum and the mean. The depth of layers is probably effective for tolerance to such outliers.&lt;/p&gt;

&lt;p&gt;Further, comparing the first model and the last model shows the effect of the number of units: more units clearly help the model fit better.&lt;/p&gt;

&lt;p&gt;See the following sections for more information.&lt;/p&gt;

&lt;h2&gt;
  
  
  tiny three layers (two hidden layers plus one output layer)
&lt;/h2&gt;

&lt;p&gt;This structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_3 (Dense)             (None, 8)                 72        

 dense_4 (Dense)             (None, 4)                 36        

 dense_5 (Dense)             (None, 1)                 5         

=================================================================
Total params: 113
Trainable params: 113
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final losses are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.616&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.565&lt;/li&gt;
&lt;li&gt;test loss: 0.596&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The prediction results are as follows.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F636snwnabv19sxxdsv3a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F636snwnabv19sxxdsv3a.png" alt=" " width="755" height="520"&gt;&lt;/a&gt; &lt;br&gt;
The green line represents the predicted target values and the red line the true target values.&lt;/p&gt;

&lt;p&gt;The invisible lower-limit line is there, and some predicted values are greater than the maximum true value. This probably implies that the model is underfitting the data.&lt;/p&gt;
&lt;h2&gt;
  
  
  four layers (three hidden layers plus one output layer)
&lt;/h2&gt;

&lt;p&gt;This structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_10 (Dense)            (None, 16)                144       

 dense_11 (Dense)            (None, 8)                 136       

 dense_12 (Dense)            (None, 4)                 36        

 dense_13 (Dense)            (None, 1)                 5         

=================================================================
Total params: 321
Trainable params: 321
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final losses are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.54&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.506&lt;/li&gt;
&lt;li&gt;test loss: 0.53&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The prediction results are as follows.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qx4njjx76mp9iu9q4rg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4qx4njjx76mp9iu9q4rg.png" alt=" " width="755" height="517"&gt;&lt;/a&gt;&lt;br&gt;
The green line represents the predicted target values and the red line the true target values. Note that only the first 500 values are drawn for visibility.&lt;/p&gt;

&lt;p&gt;The invisible line is still there. Fewer predicted values exceed the maximum true value than with the three-layer model, but some still do.&lt;/p&gt;
&lt;h2&gt;
  
  
  five layers (four hidden layers plus one output layer)
&lt;/h2&gt;

&lt;p&gt;This structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_19 (Dense)            (None, 32)                288       

 dense_20 (Dense)            (None, 16)                528       

 dense_21 (Dense)            (None, 8)                 136       

 dense_22 (Dense)            (None, 4)                 36        

 dense_23 (Dense)            (None, 1)                 5         

=================================================================
Total params: 993
Trainable params: 993
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final losses are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.543&lt;/li&gt;
&lt;li&gt;evaluation loss: 1.126&lt;/li&gt;
&lt;li&gt;test loss: 1.2&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The prediction results are as follows.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Founn3ryreszkarjcrolh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Founn3ryreszkarjcrolh.png" alt=" " width="755" height="517"&gt;&lt;/a&gt; &lt;br&gt;
The green line represents the predicted target values and the red line the true target values.&lt;/p&gt;

&lt;p&gt;The invisible line is still there. There are fewer high predicted values than before, while some of the predicted values are smaller than before. The model might be overfitting.&lt;/p&gt;
&lt;h2&gt;
  
  
  six layers (five hidden layers plus one output layer)
&lt;/h2&gt;

&lt;p&gt;This structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_30 (Dense)            (None, 64)                576       

 dense_31 (Dense)            (None, 32)                2080      

 dense_32 (Dense)            (None, 16)                528       

 dense_33 (Dense)            (None, 8)                 136       

 dense_34 (Dense)            (None, 4)                 36        

 dense_35 (Dense)            (None, 1)                 5         

=================================================================
Total params: 3,361
Trainable params: 3,361
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final losses are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.515&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.49&lt;/li&gt;
&lt;li&gt;test loss: 0.512&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The prediction results are as follows.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvjocd02xnca0ix4m0uh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffvjocd02xnca0ix4m0uh.png" alt=" " width="755" height="517"&gt;&lt;/a&gt;&lt;br&gt;
The green line represents the predicted target values and the red line the true target values.&lt;/p&gt;

&lt;p&gt;The invisible horizontal line has collapsed a bit: it is still there, but it is certainly less flat than before. Also, there are fewer high predicted values than before.&lt;/p&gt;
&lt;h2&gt;
  
  
  seven layers (six hidden layers plus one output layer)
&lt;/h2&gt;

&lt;p&gt;This structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_43 (Dense)            (None, 128)               1152      

 dense_44 (Dense)            (None, 64)                8256      

 dense_45 (Dense)            (None, 32)                2080      

 dense_46 (Dense)            (None, 16)                528       

 dense_47 (Dense)            (None, 8)                 136       

 dense_48 (Dense)            (None, 4)                 36        

 dense_49 (Dense)            (None, 1)                 5         

=================================================================
Total params: 12,193
Trainable params: 12,193
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final losses are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 1.31&lt;/li&gt;
&lt;li&gt;evaluation loss: 1.335&lt;/li&gt;
&lt;li&gt;test loss: 1.377&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The prediction results are as follows.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnmp1mzdnelkl6wrt8ytz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnmp1mzdnelkl6wrt8ytz.png" alt=" " width="755" height="517"&gt;&lt;/a&gt; &lt;br&gt;
The green line represents the predicted target values and the red line the true target values.&lt;/p&gt;

&lt;p&gt;The model output a constant value for all inputs; the vanishing gradient problem probably occurred. Indeed, the training loss converged almost immediately:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzcp1r1fivrecf38avdpb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzcp1r1fivrecf38avdpb.png" alt=" " width="800" height="221"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  three layers with almost 3,300 parameters (two hidden layers plus one output layer)
&lt;/h2&gt;

&lt;p&gt;This structure is as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_67 (Dense)            (None, 72)                648       

 dense_68 (Dense)            (None, 36)                2628      

 dense_69 (Dense)            (None, 1)                 37        

=================================================================
Total params: 3,313
Trainable params: 3,313
Non-trainable params: 0
_________________________________________________________________
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The final losses are as follows.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;training loss: 0.537&lt;/li&gt;
&lt;li&gt;evaluation loss: 0.515&lt;/li&gt;
&lt;li&gt;test loss: 0.555&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The prediction results are as follows.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdxfd531be8f0e9he31ff.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdxfd531be8f0e9he31ff.png" alt=" " width="755" height="517"&gt;&lt;/a&gt; &lt;br&gt;
As with the six-layer model, the invisible line is less flat than in the other models. On the other hand, the number of high predicted values is clearly greater than in the six-layer model.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;I explored the effect of the number of layers and the number of units with the California dataset. As a result, I found that more layers and more units help the model fit better, but too many layers cause the vanishing gradient problem.&lt;/p&gt;

&lt;p&gt;It was valuable to write the code and perform the experiments myself, but unfortunately, I am still not sure how to decide these numbers. I will explore the effect with other datasets.&lt;/p&gt;

&lt;p&gt;Again, I would appreciate any comments, suggestions, questions, etc.&lt;/p&gt;

</description>
      <category>machinelearning</category>
      <category>python</category>
      <category>deeplearning</category>
    </item>
    <item>
      <title>Using MLflow on google colaboratory with github to build cosy environment: building</title>
      <dc:creator>Keisuke Sato</dc:creator>
      <pubDate>Fri, 28 Jan 2022 11:43:54 +0000</pubDate>
      <link>https://dev.to/ksk0629/using-mlflow-on-google-colaboratory-with-github-to-build-cosy-environment-building-jb5</link>
      <guid>https://dev.to/ksk0629/using-mlflow-on-google-colaboratory-with-github-to-build-cosy-environment-building-jb5</guid>
      <description>&lt;p&gt;(Updated on 19, March 2022)&lt;br&gt;
(Updated on 6, February 2022)&lt;br&gt;
(Updated on 30, January 2022)&lt;/p&gt;
&lt;h1&gt;
  
  
  Introduction
&lt;/h1&gt;

&lt;p&gt;I built my first cosy environment. The following is how I built it.&lt;/p&gt;

&lt;p&gt;github repository: &lt;a href="https://github.com/ksk0629/template_with_mlflow" rel="noopener noreferrer"&gt;template_with_mlflow&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Preparation
&lt;/h1&gt;

&lt;p&gt;From here on, I assume you have Google, ngrok, and GitHub accounts. If you haven't, please create them before reading the following.&lt;/p&gt;

&lt;p&gt;You have to upload a YAML file, &lt;code&gt;general_config.yaml&lt;/code&gt;, containing your GitHub and ngrok information, as in the following image.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4d5e4sa1ydmx403u1e6n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4d5e4sa1ydmx403u1e6n.png" alt=" " width="289" height="157"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Its contents look like the following.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;github&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;username&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your_username&lt;/span&gt;
  &lt;span class="na"&gt;email&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your_email@gmail.com&lt;/span&gt;
  &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;your_personal_access_token&lt;/span&gt;
&lt;span class="na"&gt;ngrok&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;token&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;ngrok_authentication_token&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you don't have a personal access token yet, create one by following [&lt;a href="https://docs.github.com/en/authentication/keeping-your-account-and-data-secure/creating-a-personal-access-token" rel="noopener noreferrer"&gt;Creating a personal access token&lt;/a&gt;]. You can find your ngrok authentication token on your ngrok top page:&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiaz5jsegdj1ae8iie2zx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiaz5jsegdj1ae8iie2zx.png" alt=" " width="789" height="175"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Process
&lt;/h1&gt;

&lt;p&gt;I'll show how I built my cosy environment.&lt;/p&gt;

&lt;p&gt;1: Create a new &lt;a href="https://colab.research.google.com/?utm_source=scs-index" rel="noopener noreferrer"&gt;google colaboratory notebook&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fof921z7oi9odxpvfpvuo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fof921z7oi9odxpvfpvuo.png" alt=" " width="800" height="552"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;2: Install and import mlflow and pyngrok, which are used to visualize your model information, by running the following code in the Google Colaboratory notebook.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;mlflow&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;pip&lt;/span&gt; &lt;span class="n"&gt;install&lt;/span&gt; &lt;span class="n"&gt;pyngrok&lt;/span&gt;

&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;pyngrok&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;ngrok&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;yaml&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;3: Set your information by running the following code.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Mount my google drive
&lt;/span&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;google.colab&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;drive&lt;/span&gt;
&lt;span class="n"&gt;drive_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/content/gdrive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;drive&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;mount&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drive_path&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Load the general config
&lt;/span&gt;&lt;span class="n"&gt;config_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;drive_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MyDrive&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;config&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;general_config.yaml&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;r&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;yml&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
  &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;yaml&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;safe_load&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;yml&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;config_github&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;github&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="n"&gt;config_ngrok&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;ngrok&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# Set git configs
&lt;/span&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;email&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;config_github&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;email&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;config&lt;/span&gt; &lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="k"&gt;global&lt;/span&gt; &lt;span class="n"&gt;user&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;config_github&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

&lt;span class="c1"&gt;# Clone the repository
&lt;/span&gt;&lt;span class="n"&gt;repository_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;template_with_mlflow&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;git_repository&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://github.com/ksk0629/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;repository_name&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;.git&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;repository_path&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;/content/&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;repository_name&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;clone&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;git_repository&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Change the current directory to the cloned directory
&lt;/span&gt;&lt;span class="o"&gt;%&lt;/span&gt;&lt;span class="n"&gt;cd&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;repository_name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Checkout branch
&lt;/span&gt;&lt;span class="n"&gt;branch_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;main&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;checkout&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;branch_name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Pull
&lt;/span&gt;&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;pull&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can replace &lt;code&gt;"template_with_mlflow"&lt;/code&gt; with the name of the repository you want to clone.&lt;/p&gt;
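
&lt;p&gt;The exact contents of &lt;code&gt;general_config.yaml&lt;/code&gt; aren't shown above, but judging from the keys the code reads, a minimal sketch would look like the following (all values are placeholders you'd replace with your own):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;github:
  email: you@example.com
  username: your-github-username
  token: your-github-personal-access-token
ngrok:
  token: your-ngrok-authtoken
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
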

&lt;p&gt;4: Train your model with a script that contains MLflow logging code, like the following.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;experiment_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mnist with cnn&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;run_name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;first run&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="n"&gt;validation_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.2&lt;/span&gt;
&lt;span class="n"&gt;epochs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;1000&lt;/span&gt;
&lt;span class="n"&gt;batch_size&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;2048&lt;/span&gt;
&lt;span class="n"&gt;n_features&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;784&lt;/span&gt;
&lt;span class="n"&gt;n_hidden&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;100&lt;/span&gt;
&lt;span class="n"&gt;learning_rate&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mf"&gt;0.01&lt;/span&gt;
&lt;span class="n"&gt;seed&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;57&lt;/span&gt;

&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;python&lt;/span&gt; &lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;src&lt;/span&gt;&lt;span class="o"&gt;/&lt;/span&gt;&lt;span class="n"&gt;mlflow_example&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;py&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{experiment_name}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{run_name}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;seed&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;validation_size&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n_hidden&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;n_features&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;epochs&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;batch_size&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;learning_rate&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;experiment_name = "mnist with cnn"
run_name = "second run"
validation_size = 0.2
epochs = 1000
batch_size = 2048
n_features = 784
n_hidden = 300
learning_rate = 0.01
seed = 57

!python ./src/mlflow_example.py "{experiment_name}" "{run_name}" {seed} {validation_size} {n_hidden} {n_features} {epochs} {batch_size} {learning_rate}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can train whatever models you want.&lt;/p&gt;
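
&lt;p&gt;The body of &lt;code&gt;mlflow_example.py&lt;/code&gt; isn't shown here, but the invocation above fixes its command-line contract. A minimal sketch of the argument parsing such a script might use (the parameter names are my assumptions based on the notebook cell; the real script would then pass these values to MLflow):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import argparse

# Positional arguments, in the same order as the notebook invocation:
# experiment_name run_name seed validation_size n_hidden n_features epochs batch_size learning_rate
parser = argparse.ArgumentParser(description="Train a model and log it with MLflow.")
parser.add_argument("experiment_name")
parser.add_argument("run_name")
parser.add_argument("seed", type=int)
parser.add_argument("validation_size", type=float)
parser.add_argument("n_hidden", type=int)
parser.add_argument("n_features", type=int)
parser.add_argument("epochs", type=int)
parser.add_argument("batch_size", type=int)
parser.add_argument("learning_rate", type=float)
args = parser.parse_args()

# A real script would then do something like:
#   mlflow.set_experiment(args.experiment_name)
#   with mlflow.start_run(run_name=args.run_name):
#       mlflow.log_param("n_hidden", args.n_hidden)
#       ...train the model, then mlflow.log_metric("val_accuracy", ...)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
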

&lt;p&gt;5: Run the MLflow UI and view your models' information through ngrok.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Run mlflow
&lt;/span&gt;&lt;span class="nf"&gt;get_ipython&lt;/span&gt;&lt;span class="p"&gt;().&lt;/span&gt;&lt;span class="nf"&gt;system_raw&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mlflow ui --port 5000 &amp;amp;&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="c1"&gt;# run tracking UI in the background
&lt;/span&gt;
&lt;span class="c1"&gt;# Terminate open tunnels if exist
&lt;/span&gt;&lt;span class="n"&gt;ngrok&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;kill&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Setting the authtoken of ngrok
&lt;/span&gt;&lt;span class="n"&gt;ngrok&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;set_auth_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;config_ngrok&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;token&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="c1"&gt;# Open an HTTPs tunnel on port 5000 for http://localhost:5000
&lt;/span&gt;&lt;span class="n"&gt;ngrok_tunnel&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;ngrok&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;connect&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;addr&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;5000&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;proto&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;http&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;bind_tls&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;MLflow Tracking UI:&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;ngrok_tunnel&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;public_url&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You should get a public URL in the output cell:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;MLflow Tracking UI: https://cexx-xx-xxx-xxx-xx.ngrok.io
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can see your models' information on a page like the following.&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9lbcyozvsohf6kenxvo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fm9lbcyozvsohf6kenxvo.png" alt=" " width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;6: Commit and push your changes to the remote repository.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;add_objects&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;os&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;path&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;join&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;repository_path&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mlruns&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;*&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;add&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;add_objects&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;commit_msg&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Add new mlruns&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;commit&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;m&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;{commit_msg}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;

&lt;span class="n"&gt;html&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;https://&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;config_github&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;token&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;@github.com/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;config_github&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;username&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;/&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;repository_name&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.git&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;remote&lt;/span&gt; &lt;span class="nb"&gt;set&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="n"&gt;url&lt;/span&gt; &lt;span class="n"&gt;origin&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;html&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="err"&gt;!&lt;/span&gt;&lt;span class="n"&gt;git&lt;/span&gt; &lt;span class="n"&gt;push&lt;/span&gt; &lt;span class="n"&gt;origin&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="n"&gt;branch_name&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Of course, you can choose which files to commit and change the commit message to whatever you like.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Now you've got your cosy environment! By the way, I said&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;I have to add the commit number information to the MLflow information after pushing new source code to a remote repository.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;but, as it turns out, MLflow is smarter than that: I didn't do anything, yet the git commit hash was already recorded!&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4p2f2e9ctqcrbw0fzdbd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4p2f2e9ctqcrbw0fzdbd.png" alt=" " width="800" height="94"&gt;&lt;/a&gt;&lt;/p&gt;
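
&lt;p&gt;If you want to read that commit hash programmatically rather than from the UI, MLflow stores it as a run tag. A sketch, assuming the &lt;code&gt;mlflow&lt;/code&gt; package is installed and &lt;code&gt;run_id&lt;/code&gt; is the ID of a finished run:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import mlflow

# MLflow records the commit of the repository the run was launched from
# under the reserved tag "mlflow.source.git.commit".
run = mlflow.get_run(run_id)  # run_id is hypothetical here
commit = run.data.tags.get("mlflow.source.git.commit")
print(commit)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
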

</description>
      <category>mlflow</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>mlops</category>
    </item>
    <item>
      <title>Using MLflow on google colaboratory with github to build cosy environment: design</title>
      <dc:creator>Keisuke Sato</dc:creator>
      <pubDate>Tue, 25 Jan 2022 12:11:21 +0000</pubDate>
      <link>https://dev.to/ksk0629/using-mlflow-on-google-colaboratory-with-gitlab-to-build-cosy-environment-1-design-2217</link>
      <guid>https://dev.to/ksk0629/using-mlflow-on-google-colaboratory-with-gitlab-to-build-cosy-environment-1-design-2217</guid>
      <description>&lt;p&gt;(Updated on 19, March 2022)&lt;br&gt;
(Updated on 6, February 2022)&lt;br&gt;
(Updated on 28, January 2022)&lt;/p&gt;

&lt;h1&gt;
  
  
  What do I want to do?
&lt;/h1&gt;

&lt;p&gt;It's troublesome to manage all the settings (e.g., epochs, optimizer, etc.) for building a model, and I've been interested in Kaggle these days. Before I join any competitions, I'll build a cosy environment.&lt;/p&gt;

&lt;h1&gt;
  
  
  Pieces of my cosy environment
&lt;/h1&gt;

&lt;p&gt;I'll use the following library and four services.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;MLflow Tracking&lt;/li&gt;
&lt;li&gt;Google Colaboratory&lt;/li&gt;
&lt;li&gt;Google Drive&lt;/li&gt;
&lt;li&gt;GitHub&lt;/li&gt;
&lt;li&gt;ngrok&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;MLflow and Google Drive are for managing the settings used to build models, Google Colaboratory is for building the models, GitHub is for managing the source code, and the MLflow results are viewed through ngrok.&lt;/p&gt;

&lt;h1&gt;
  
  
  Concept
&lt;/h1&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq7q88ntoj8cf0q8yxa82.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq7q88ntoj8cf0q8yxa82.png" alt=" " width="800" height="232"&gt;&lt;/a&gt; &lt;/p&gt;

&lt;p&gt;Apparently, MLflow Projects can connect to a remote Git repository and manage the source code, but at first, I just directly clone a repository and push new changes from the Google Colaboratory notebook. Because of that, I have to add the commit number information to the MLflow information after pushing new source code to a remote repository.&lt;/p&gt;

</description>
      <category>mlflow</category>
      <category>machinelearning</category>
      <category>python</category>
      <category>mlops</category>
    </item>
  </channel>
</rss>
