Keisuke Sato

Quantum-Classical Machine Learning: Quanvolutional Neural Network

This post is about the paper Quanvolutional Neural Networks: Powering Image Recognition with Quantum Circuits and my implementation based on it. I have written this post based on my understanding. I would greatly appreciate your comments and any suggestions for modifying this post or my implementation in the GitHub repository. If you are interested, please refer to the paper yourself.

Introduction

Quantum machine learning (QML) has rapidly grown in response to advancements in quantum computing research. Research in QML explores various directions, one of which involves rewriting classical methods or architectures in a quantum manner.

The Quanvolutional Neural Network (QNN) is analogous to the classical Convolutional Neural Network (CNN). As the name suggests, the most distinctive feature of a CNN is the convolutional layer. The QNN is a hybrid quantum-classical machine learning model in which the convolutional layer is replaced by a quanvolutional layer. The authors proposed the quanvolutional layer as a layer with exactly the same parameters as the convolutional layer, so, in theory, any convolutional layer in any CNN can be replaced with a quanvolutional layer.

The primary contribution of the original paper is the concept of the Quanvolutional Neural Network, rather than its practical realisation. Nevertheless, I will also discuss the implementation proposed in the paper.

For those who love coding, I have shared my GitHub repository containing the QNN implementation here: https://github.com/ksk0629/quanvolutional_neural_network.

Convolutional Layer

The convolutional layer allows us to exploit spatial locality and translational invariance in data. Let us have a look at how a convolutional layer works. Consider the following data, which has a 4x4 shape:

[Figure: original data]

The convolutional layer uses a sliding window of a certain size. Here, we will use the simplest sliding window, which is 3x3:

[Figure: sliding window]

The layer applies the sliding window to each local region, i.e., it exploits spatial locality. When the sliding window is applied to the first section, as shown below, we calculate a new value: 1x1 + 1x0 + 1x0 + 1x0 + 1x1 + 1x1 + 1x0 + 1x0 + 1x1 = 4.

[Figure: applying the sliding window to the first section]

Now, imagine that each element of the original data is shifted one pixel to the right. There are various ways to fill the left edge, but here we simply use 0:

[Figure: translated original data]

As you may notice, the second section of the translated data is identical to the first section of the original data:

[Figure: applying the sliding window to the second section of the translated data]

This demonstrates how the layer exploits translational invariance in data. The convolutional layer uses many convolutional filters, i.e., sliding windows. Each filter is different, and the filters are applied to the input data independently. Suppose the layer has 50 filters; the output of the layer then consists of 50 new arrays, one per filter.
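
To make this concrete, here is a minimal NumPy sketch of the computation. The values are illustrative stand-ins for the figures, chosen so that the first window position reproduces the calculation above.

import numpy as np

# Illustrative 4x4 input whose top-left 3x3 section is all ones,
# and a 3x3 sliding window.
data = np.array([
    [1, 1, 1, 0],
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [0, 1, 0, 1],
])
window = np.array([
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
])

# Slide the window over every 3x3 section and sum the elementwise products.
output = np.zeros((2, 2))
for row in range(2):
    for col in range(2):
        section = data[row:row + 3, col:col + 3]
        output[row, col] = np.sum(section * window)

print(output[0, 0])  # 4.0, matching the worked example above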

Quanvolutional Layer

The QNN features the quanvolutional layer, which is analogous to the convolutional layer. The quanvolutional layer consists of many quantum circuits, referred to as quanvolutional filters, which are analogous to sliding windows. If the window size is 3x3, the number of qubits will be 3x3 = 9. In the earlier example of the convolutional filter, we simply multiplied each entry in the window by the corresponding entry in the data and summed the results. The quanvolutional filter operates in a similar way: each entry of the window corresponds to a qubit in the quantum circuit. However, instead of performing multiplication and addition, the quanvolutional filter computes a new value based on the quantum gates applied to the circuit.

To process classical data—i.e., the data used by our current computers—we need to encode the data onto qubits. There are several encoding methods, and it is up to the user to choose which one to employ. After the data is encoded, the computation phase occurs, followed by the measurement of each qubit. Since quanvolutional layers and convolutional layers are interchangeable, the result of each quanvolutional filter computation must be a scalar. Therefore, the measurement outcome must be decoded, and again, there are several decoding methods available.
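
Putting the three phases together, a quanvolutional filter conceptually behaves like the sketch below. The names encode, run_circuit and decode are placeholders for the user-chosen encoding method, the fixed quantum circuit and the decoding method; this is not the paper's API.

import numpy as np

def quanvolve(window: np.ndarray, encode, run_circuit, decode) -> float:
    # 1. Encode the classical window values onto the qubits.
    state = encode(window.flatten())
    # 2. Apply the circuit's gates and measure every qubit.
    counts = run_circuit(state)
    # 3. Decode the measurement statistics into a single scalar.
    return decode(counts)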

The following circuit represents an example of a quanvolutional filter:

[Figure: example of a quanvolutional filter]

One Realisation of the QNN

I believe the most important contribution of the original paper is the introduction of the QNN concept, which provides a perfect analogy to the CNN. However, the original paper also presents one realisation of the QNN. As it is meant to test whether there is a quantum advantage compared with the CNN, the implementation is quite simple; the quantum part of the realisation is not even trainable, relying instead on the power of randomly initialised filters.

The QNN is constructed with the following layers in order:

  1. Quanvolutional Layer
  2. Convolutional Layer
  3. Pooling Layer
  4. Convolutional Layer
  5. Pooling Layer
  6. Fully Connected Layer
  7. Dropout Layer
  8. Fully Connected Layer
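
For intuition, the classical part of this stack (layers 2 to 8) might look roughly like the following PyTorch sketch. Every channel count, kernel size and hidden width below is my own placeholder choice, not a value from the paper; the input channel count of the first convolution equals the number of quanvolutional filters.

import torch.nn as nn

# Hedged sketch of layers 2-8; all sizes are placeholder choices.
classical_part = nn.Sequential(
    nn.Conv2d(in_channels=25, out_channels=50, kernel_size=3),  # 2. convolutional
    nn.MaxPool2d(kernel_size=2),                                # 3. pooling
    nn.Conv2d(in_channels=50, out_channels=64, kernel_size=3),  # 4. convolutional
    nn.MaxPool2d(kernel_size=2),                                # 5. pooling
    nn.Flatten(),
    nn.LazyLinear(out_features=128),                            # 6. fully connected
    nn.Dropout(p=0.5),                                          # 7. dropout
    nn.Linear(in_features=128, out_features=10),                # 8. fully connected
)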

The next question regarding this realisation is how the quanvolutional layer is constructed, or more specifically, how the quanvolutional filter is constructed, since the quanvolutional layer is simply a layer containing multiple quanvolutional filters. First, the kernel size is fixed at 3x3, meaning each filter has 3x3 = 9 qubits.

Encoding method

Each input scalar is encoded into its corresponding qubit. If the scalar is greater than 0, the qubit is prepared in the state |1>; otherwise, it is left in the state |0>.

Note that the MNIST dataset is the target in the original paper. If you wish to use a different dataset, you might want to adjust the encoding method accordingly.
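
A minimal sketch of this threshold encoding, assuming the kernel window arrives as a flat NumPy array of pixel values:

import numpy as np

def threshold_encode(data: np.ndarray) -> np.ndarray:
    # Values greater than 0 are prepared in |1>; everything else stays in |0>.
    return (data > 0).astype(int)

print(threshold_encode(np.array([0.0, 0.7, 0.2, 0.0])))  # [0 1 1 0]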

Construct the quantum circuit

The quantum circuit is constructed in the following four stages:

Assign connection probabilities

Assign a "connection probability" (ranging from 0 to 1) to each pair of qubits.

To account for the connections between one pixel and others, two-qubit gates should be applied to each pair of qubits. The connection probability is essential in determining whether a particular two-qubit gate will be applied to the pair later.
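
For example, drawing a uniform random probability for every unordered pair of qubits could look like the following sketch; the repository's actual implementation lives in quanv_filter.py.

import itertools
import numpy as np

num_qubits = 9  # a 3x3 kernel
# One "connection probability" per unordered pair of qubits.
connection_probabilities = {
    pair: np.random.rand()
    for pair in itertools.combinations(range(num_qubits), 2)
}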

Select two-qubit gates

Based on the connection probability, one of the two-qubit gates—either the controlled-NOT, swap, square root swap or controlled-U gate—is selected for each pair of qubits. Note that the selected gate is not applied at this stage.

Select one-qubit gates

Select a random number, in the range from 0 to 2n^2 = 2x3x3 = 18, of one-qubit gates from the gate set {X, Y, Z, U, P, T, H}. Here, X, Y and Z are rotation gates around each axis, and I interpret U as a generic single-qubit rotation gate. These rotation gates have angle parameters, which are also chosen randomly. P, T and H represent the phase, T and Hadamard gates respectively. Again, the selected gates are not applied at this stage.

Apply the selected gates in random order

Shuffle the order of the selected gates and apply them in the shuffled order to the quantum circuit.
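
In code, this stage can be as simple as the sketch below. I am assuming, purely for illustration, that each selected gate is stored as a pair of a gate-applying function and its target qubit indices; this is not necessarily the repository's representation.

import random

# selected_gates is assumed to hold (apply_gate, qubit_indices) pairs.
random.shuffle(selected_gates)
for apply_gate, qubits in selected_gates:
    apply_gate(circuit, *qubits)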

Decoding method

Measure each qubit and count the number of qubits that are measured in the |1> state.
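
As a hedged sketch, decoding Qiskit measurement counts into that scalar could look as follows. Averaging over shots is my own choice here; the paper only specifies counting the qubits measured in |1>.

def count_ones_decode(counts: dict[str, int]) -> float:
    # counts maps bitstrings such as "010110111" to their frequencies.
    total_shots = sum(counts.values())
    total_ones = sum(bits.count("1") * freq for bits, freq in counts.items())
    # Average number of qubits measured in |1> per shot.
    return total_ones / total_shots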

Result of the comparison

As mentioned, the QNN and CNN are compared in the original paper. More precisely, both networks, along with a corresponding random model, are compared.

The authors reported that the QNN outperforms the CNN in terms of accuracy. However, the random model and the QNN are indistinguishable in terms of accuracy.

My Implementation

It is important to understand what the original paper presents. However, one of my main interests also lies in implementing the QNN. Therefore, I implemented both the QNN and the CNN to compare their accuracies. Below, I will share parts of my implementation. I would greatly appreciate any feedback or comments related to my programming, either in the comments section below or via an issue in the GitHub repository.

I created three classes for the QNN, based on what I believe to be its core components:

  • QuanvFilter (quanvolutional filter)
  • QuanvLayer (quanvolutional layer)
  • QuanvNN (quanvolutional neural network)

We will walk through the essential parts of each. Please note that I have omitted all docstrings, despite having written them, for the sake of brevity.

QuanvFilter

QuanvNN utilises member functions of QuanvLayer, and QuanvLayer relies on member functions of QuanvFilter. Therefore, we will first explore QuanvFilter, which is defined in quanv_filter.py.

Constructor

This class has only one argument for the constructor, which is the kernel size, equivalent to the window size in the "Convolutional Layer" section. As mentioned, the number of qubits is determined by the argument, as shown in the following piece of code:

    def __init__(self, kernel_size: tuple[int, int]):
        # Initialise the look-up table.
        self.lookup_table = None
        # Set the simulator.
        self.simulator = qiskit_aer.AerSimulator()

        # Get the number of qubits.
        self.num_qubits: int = kernel_size[0] * kernel_size[1]

The self.lookup_table is used to reduce execution time, but it is not essential to understanding this class, so we will ignore it for now.

In the __init__ function, we need to prepare the quanvolutional filter. A quantum circuit representing the quanvolutional filter must be built first, as each filter is a quantum circuit.

        # Step 0: Build a quantum circuit as a filter.
        self.__build_initial_circuit()

self.__build_initial_circuit is simply a function that builds a new quantum circuit and stores it as a member variable, self.circuit.

After building the new plain circuit, the initialisation follows the procedure mentioned in the "One Realisation of the QNN" section. Note that encoding and decoding are not done when the circuit is initialised.

        # Step 1: Assign a connection probability to each pair of qubits.
        self.connection_probabilities = {}
        self.__set_connection_probabilities()

Here, the connection probabilities are stored in the dict variable self.connection_probabilities. The keys are tuples representing pairs of qubit positions, and the values are the probabilities. Here is a simple example: consider a circuit with 3 qubits. After running self.__set_connection_probabilities(), self.connection_probabilities could look like this:

{(0, 1): 0.5, (0, 2): 0.1, (1, 2): 0.8}

Next, two-qubit gates are randomly selected.

        # Step 2: Select a two-qubit gate according to the connection probabilities.
        self.selected_gates = []
        self.__set_two_qubit_gate_set()
        self.__select_two_qubit_gates()

self.__set_two_qubit_gate_set() stores the set of two-qubit gates described in the paper in a member variable. After setting the two-qubit gate set, the gates are selected according to self.connection_probabilities in the self.__select_two_qubit_gates() function. I chose a threshold of 0.5 to determine whether a two-qubit gate is selected.

### In __select_two_qubit_gates function ###
        for qubit_pair, connection_probability in self.connection_probabilities.items():
            if connection_probability <= 0.5:
                # Skip the pair; no two-qubit gate is applied to it.
                continue

If the connection probability is greater than 0.5, a two-qubit gate is selected. If the selected gate requires parameters, they are chosen uniformly from the range between 0 and 2π (roughly 6.28), for instance four_params = np.random.rand(4) * (2 * np.pi). Since each key in self.connection_probabilities is always in ascending order, such as (0, 1), (0, 2) and (1, 3), the control and target qubits should be shuffled.

### In __select_two_qubit_gates function ###
            # Shuffle the pair of qubits to randomly decide on the target and controlled qubits.
            shuffled_qubit_pair = [*qubit_pair]  # key is tuple. Need to cast to list.
            if connection_probability <= 0.75:
                shuffled_qubit_pair[0] = qubit_pair[1]
                shuffled_qubit_pair[1] = qubit_pair[0]

Since the connection probability is guaranteed to be greater than 0.5 at this point, comparing it with 0.75 swaps the control and target qubits at random, with a probability of one half for a uniformly drawn probability.

Similar to selecting two-qubit gates, one-qubit gates are selected afterwards.

        # Step 3: Select one-qubit gates.
        self.__set_one_qubit_gate_set()
        self.num_one_qubit_gates = np.random.randint(
            low=0, high=2 * self.num_qubits**2 - 1
        )
        self.__select_one_qubit_gates()

self.num_one_qubit_gates is an int member variable representing the number of one-qubit gates. The target qubit for each selected one-qubit gate is randomly chosen.

By this stage, the selected one- and two-qubit gates are stored in self.selected_gates. The next stage involves applying these gates to the circuit in a random order.

        # Step 4: Apply the randomly selected gates in a random order.
        self.__apply_selected_gates()

Although every step to create the quanvolutional filter has been implemented, this is still a quantum circuit, and we need to explicitly add the measurement part to the circuit as the final step.

        # Step 5: Add measurements for all the qubits.
        self.circuit.measure(self.quantum_register, self.classical_register)

Running part

To use the quanvolutional filter, the class must include a function to process the input data.

    def run(
        self,
        data: np.ndarray,
        encoding_method: callable,
        decoding_method: callable,
        shots: int,
    ) -> int:
        # Encode the data to the corresponding quantum state.
        encoded_data = encoding_method(data)

        # Make the circuit having the loading part.
        ready_circuit = self.load_data(encoded_data)

        # Run the circuit.
        transpiled_circuit = qiskit.transpile(ready_circuit, self.simulator)
        result = self.simulator.run(transpiled_circuit, shots=shots).result()
        counts = result.get_counts(transpiled_circuit)

        # Decode the result data.
        decoded_data = decoding_method(counts)

        return decoded_data

The function self.load_data returns a new instance of qiskit.QuantumCircuit: a circuit that initialises the qubits according to the data encoded by the given encoding method and then includes self.circuit. Once this complete circuit is prepared, it is executed, and the result is decoded using the given decoding method.
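
I will not reproduce load_data here, but conceptually it does something like the following hedged sketch; the exact implementation is in quanv_filter.py.

import qiskit

def load_data_sketch(self, encoded_data) -> qiskit.QuantumCircuit:
    # Build a fresh circuit on the same registers as the filter.
    loader = qiskit.QuantumCircuit(self.quantum_register, self.classical_register)
    # Flip each qubit whose encoded value is 1 (the threshold encoding).
    for qubit_index, value in enumerate(encoded_data):
        if value == 1:
            loader.x(qubit_index)
    # Append the randomly built filter circuit, including its measurements.
    return loader.compose(self.circuit)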

For now, I have implemented only one encoding and decoding method, as proposed in the paper. The details can be found in z_basis_encoder.py and one_sum_decoder.py.

Look-up table

At the end of the previous section, the core of the quanvolutional filter was implemented. However, the time required to use this filter is enormous and impractical, at least on my machine, especially for a large dataset like MNIST. To address this, I implemented a look-up table, which is also used in the original paper. It is quite simple: we create a look-up table mapping every possible input to its corresponding output. Once the table is generated for all input data, there is no need to run the circuit anymore. This technique can be applied because:

  1. The encoding method is simple, and the number of input patterns is finite.
  2. The filter is not trainable in this implementation.

The look-up table is built by the set_lookup_table function:
    def set_lookup_table(
        self,
        encoding_method: callable,
        decoding_method: callable,
        shots: int,
        input_patterns: list[tuple[int, int] | tuple[float, float]],
    ):
        if self.lookup_table is None:
            vectorised_run = np.vectorize(self.run, signature="(n),(),(),()->()")
            output_patterns = vectorised_run(
                np.array(input_patterns),
                encoding_method,
                decoding_method,
                shots,
            )
            self.lookup_table = {
                inputs: outputs
                for inputs, outputs in zip(input_patterns, output_patterns)
            }

What happens here is simply running the circuit against the given inputs using the given encoding and decoding methods.
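
Once the table exists, processing a window reduces to a dictionary lookup instead of a circuit execution, along the lines of the hypothetical snippet below, assuming each window is flattened into the same tuple format as input_patterns.

# One encoded 3x3 window, flattened into a tuple.
window = (0, 1, 1, 0, 1, 0, 0, 0, 1)
output = quanv_filter.lookup_table[window]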

QuanvLayer

The QuanvLayer class is defined in quanv_layer.py.

Constructor

The constructor has six arguments. kernel_size specifies the size of the kernel used by each quanvolutional filter. num_filters defines the number of quanvolutional filters that the layer contains. encoder and decoder are the encoding and decoding methods each filter uses. padding_mode determines how to pad the image so that the output size of each quanvolutional filter matches the original input size. I used the torch.nn.functional.pad function to pad the image data, so padding_mode corresponds to the mode argument of that function (see torch.nn.functional.pad). is_lookup_mode indicates whether the quanvolutional filters use look-up tables.

    def __init__(
        self,
        kernel_size: tuple[int, int],
        num_filters: int,
        encoder: BaseEncoder,
        decoder: BaseDecoder,
        padding_mode: str | None = "constant",
        is_lookup_mode: bool = True,
    ):
        # Store the arguments to class variables.
        self.kernel_size = kernel_size
        self.num_filters = num_filters
        self.encoder = encoder
        self.decoder = decoder
        self.padding_mode = padding_mode
        self.is_lookup_mode = is_lookup_mode

        # Define constant.
        self.__BATCH_DATA_DIM = 4

        # Create the quanvolutional filters.
        self.quanv_filters = [
            QuanvFilter(self.kernel_size) for _ in range(self.num_filters)
        ]

What happens here is essentially the creation of instances of QuanvFilter.

Running part

After preparing the quanvolutional filters, the class requires a function to process the data as a layer.

    def run(self, batch_data: torch.Tensor, shots: int) -> torch.Tensor:
        # Check the dataset shape.
        if batch_data.ndim != self.__BATCH_DATA_DIM:
            msg = f"""
                The dimension of the batch_data must be {self.__BATCH_DATA_DIM},
                which is [batch size, channel, height, width].
            """
            raise ValueError(msg)

        # Set the appropriate function according to the mode.
        if self.is_lookup_mode:
            # Get all possible input patterns.
            possible_inputs = self.encoder.get_all_input_patterns(
                num_qubits=self.kernel_size[0] * self.kernel_size[1]
            )
            # Set each look-up table.
            [
                quanv_filter.set_lookup_table(
                    encoding_method=self.encoder.encode,
                    decoding_method=self.decoder.decode,
                    shots=shots,
                    input_patterns=possible_inputs,
                )
                for quanv_filter in self.quanv_filters
            ]
            _run = self.run_single_channel_with_lookup_tables
        else:
            _run = lambda data: self.run_single_channel(data=data, shots=shots)

        all_outputs = torch.stack(
            [
                _run(data=channel)
                for data in tqdm(
                    batch_data, leave=True, desc="Dataset"
                )  # for-loop for batched data
                for channel in data  # for-loop for each channel of each data
            ]
        )

        return all_outputs

This function is quite simple. After checking the data shape and, if the look-up tables are being used, preparing them, it applies each filter to the given data. The return value, all_outputs, is constructed from the processed image data. The actual processing happens in either self.run_single_channel_with_lookup_tables or self.run_single_channel; the procedure there is quite similar to that of a classical convolutional layer, except that a quanvolutional filter is applied instead of a classical convolutional filter, so we will not delve into those functions in this article.
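
For completeness, extracting the kernel-sized windows from a single channel can be done with torch.Tensor.unfold, roughly as in this sketch of mine (not the repository code):

import torch

def extract_windows(channel: torch.Tensor, kernel_size: tuple[int, int]) -> torch.Tensor:
    # channel has shape [height, width]; the result contains every window,
    # with shape [num_rows, num_cols, kernel_height, kernel_width].
    return channel.unfold(0, kernel_size[0], 1).unfold(1, kernel_size[1], 1)

windows = extract_windows(torch.rand(28, 28), (3, 3))
print(windows.shape)  # torch.Size([26, 26, 3, 3])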

QuanvNN

The QuanvNN class is written in quanv_nn.py.

Constructor

The class has eight arguments for the constructor.

    def __init__(
        self,
        in_dim: tuple[int, int, int],
        num_classes: int,
        quanv_kernel_size: tuple[int, int],
        quanv_num_filters: int,
        quanv_encoder: BaseEncoder,
        quanv_decoder: BaseDecoder,
        quanv_padding_mode: str | None = "constant",
        is_lookup_mode: bool = True,
    ):
        self.in_dim = in_dim
        self.num_classes = num_classes
        self.quanv_kernel_size = quanv_kernel_size
        self.quanv_num_filters = quanv_num_filters
        self.quanv_encoder = quanv_encoder
        self.quanv_decoder = quanv_decoder
        self.quanv_padding_mode = quanv_padding_mode
        self.is_lookup_mode = is_lookup_mode

        # Create and store the instance of the QuanvLayer class as a member variable.
        self.quanv_layer = QuanvLayer(
            kernel_size=quanv_kernel_size,
            num_filters=quanv_num_filters,
            encoder=self.quanv_encoder,
            decoder=self.quanv_decoder,
            padding_mode=quanv_padding_mode,
            is_lookup_mode=is_lookup_mode,
        )
        # Create and store the instance of the ClassicalCNN class as a member variable.
        new_in_dim = (quanv_num_filters, in_dim[1], in_dim[2])
        self.classical_cnn = ClassicalCNN(in_dim=new_in_dim, num_classes=num_classes)

in_dim and num_classes are used to construct the CNN part, which is implemented using PyTorch and is not our primary focus here. The quanv_kernel_size, quanv_num_filters, quanv_padding_mode, quanv_encoder, quanv_decoder and is_lookup_mode arguments are used to create an instance of QuanvLayer.

Running part

What is essentially needed in the class is a method to forward the input data. To achieve this, I implemented __call__ and classify, where __call__ is a special method that allows you to use the instance like a function. The __call__ function outputs the raw output data from the QNN, whilst the classify function returns a scalar value representing the class label to which the input data most likely belongs.

    def __call__(self, x: torch.Tensor, shots: int) -> torch.Tensor:
        quanvoluted_x = self.quanv_layer.run(batch_data=x, shots=shots)
        return self.classical_cnn(quanvoluted_x)

    def classify(self, x: torch.Tensor, shots: int) -> torch.Tensor:
        quanvoluted_x = self.quanv_layer.run(batch_data=x, shots=shots)
        return self.classical_cnn.classify(quanvoluted_x)

self.quanv_layer is an instance of the QuanvLayer class. Both functions contain self.quanv_layer.run, which was introduced in an earlier section.

The key takeaway is that these functions allow us to classify or simply obtain an output from the input data.
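
As a usage sketch: the argument values below are illustrative, and I am assuming that the encoder and decoder classes defined in z_basis_encoder.py and one_sum_decoder.py are named ZBasisEncoder and OneSumDecoder.

import torch

qnn = QuanvNN(
    in_dim=(1, 28, 28),        # MNIST-like input: 1 channel, 28x28 pixels
    num_classes=10,
    quanv_kernel_size=(3, 3),
    quanv_num_filters=25,      # placeholder choice
    quanv_encoder=ZBasisEncoder(),
    quanv_decoder=OneSumDecoder(),
)

batch = torch.rand(4, 1, 28, 28)  # a dummy batch of four images
labels = qnn.classify(batch, shots=1024)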

Other sources and scripts

I have also written some other Python scripts to train the model using MNIST data. We will not go through each source in detail, but if you want to train the model, all you need to do is run train_model_with_mnist.py. The script requires one argument, which is the path to the configuration file. Examples of configuration files can be found in the configs directory. Simply run python scripts/train_model_with_mnist.py -c [config_path] from the root directory.

Summary

We have had a brief look at what the quanvolutional neural network (QNN) is. Essentially, it is a convolutional neural network with a quanvolutional layer, where the quanvolutional layer is designed as a complete, drop-in alternative to the classical convolutional layer.

The introduced implementation of the QNN is readily applicable to the MNIST dataset. However, in my experience, the accuracy of the QNN model has not proven to be superior to that of the corresponding CNN.

[Figure: accuracy comparison of the QNN and the CNN]

The original paper does not provide exhaustive details about the experiment, and there is some flexibility in the training algorithms and settings. I suspect that the reason my results differ from those in the original paper may lie in the differences between our configurations.

Once again, feel free to leave any comments in the section below or raise an issue in the GitHub repository. I welcome your feedback and the opportunity to discuss the QNN or the Python code.
