If We Break an Image Into Waves, Can We Truly Put It Back Together?

The 2D Fast Fourier Transform (FFT) is a powerful tool in image processing, often used for tasks like denoising. It works by decomposing an image into its fundamental frequency components—essentially, a collection of simple sine waves.

I always understood the decomposition part, but it led me to a question: can we reverse the process? Can we perfectly reconstruct the original image just by adding all those frequency components back together? To answer this, I built an app to see it for myself.
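In principle, the math says yes: the inverse FFT undoes the forward transform exactly, up to floating-point error. A quick sanity check with NumPy (using a random array as a stand-in image) confirms the round trip:

import numpy as np
from numpy.fft import fft2, ifft2

img = np.random.rand(64, 64)           # stand-in for a grayscale image
roundtrip = np.real(ifft2(fft2(img)))  # decompose, then re-synthesize
print(np.allclose(roundtrip, img))     # True (up to floating-point error)

Watching it happen wave by wave, though, is another matter entirely, and that's what the app is for.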


Overview

This application consists of two main parts: an FFT module and a GUI. The former decomposes the original image, and the latter visualizes the entire process.

FFT Part

First, we perform an FFT shift to move the zero-frequency component to the center of the spectrum. The Python function below then takes this shifted data and sorts all the frequency components. It does this by calculating each component's distance from the center, which corresponds to its frequency (from low to high).

Crucially, we also store the original (x, y) coordinates for each component. We'll need these to place everything back in the correct position during the re-synthesis phase.

import numpy as np
from numpy.fft import fft2, fftshift
from typing import Dict, List

# Transform the image, then shift the zero-frequency component to the center
fft_shifted = fftshift(fft2(img))

def get_sorted_freq_components(self, fft_shifted: np.ndarray) -> List[Dict]:
    h, w = fft_shifted.shape
    center_y, center_x = h // 2, w // 2  # row index is y, column index is x

    freq_components = []
    for y in range(h):
        for x in range(w):
            # Distance from the center corresponds to the component's frequency
            distance = np.sqrt((y - center_y)**2 + (x - center_x)**2)
            freq_components.append({
                "distance": distance,
                "value": fft_shifted[y, x],
                "y": y,
                "x": x
            })

    freq_components.sort(key=lambda item: item["distance"])
    return freq_components

The GUI: Visualizing the Reconstruction

The main feature of the GUI is a display that updates in real-time as we reconstruct the image. The idea is simple: in a loop, we add one frequency component at a time (from low to high) and update the image view with the result, creating an animation.
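As a rough sketch of that loop (the function name and the batching by step are mine for illustration, not the app's exact code), each component goes back into an empty spectrum at its stored (x, y) position, and every few additions the partial spectrum is inverted back into an image:

import numpy as np
from numpy.fft import ifft2, ifftshift

def reconstruct_stepwise(freq_components, shape, step=500):
    """Yield partial images, adding `step` frequency components per frame."""
    partial_spectrum = np.zeros(shape, dtype=complex)
    for i, comp in enumerate(freq_components, start=1):
        # Put each component back at its stored (x, y) position
        partial_spectrum[comp["y"], comp["x"]] = comp["value"]
        if i % step == 0 or i == len(freq_components):
            # Undo the shift, invert the FFT, and keep the real part
            yield np.real(ifft2(ifftshift(partial_spectrum)))

Each yielded array is then handed to the display code, which is exactly where the trouble started.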

The First Hurdle: A Black Screen

However, my first attempt didn't work as expected. When I passed the NumPy array from the inverse FFT process directly to PySide6's QImage, all I got was a black screen. The image simply wouldn't display correctly.

def _create_scaled_pixmap(self, img: np.ndarray, frame: QFrame) -> QPixmap:
    h, w = img.shape
    bytes_per_line = w  # one byte per pixel for 8-bit grayscale
    # Note: this implicitly assumes img already holds uint8 data
    q_image = QImage(img, w, h, bytes_per_line, QImage.Format.Format_Grayscale8)
    pixmap = QPixmap.fromImage(q_image.copy())

    return pixmap.scaled(
        frame.size(),
        Qt.AspectRatioMode.KeepAspectRatio,
        Qt.TransformationMode.SmoothTransformation
    )

The "Aha!" Moment: Data Mismatch

After some debugging, I realized the issue was a data type mismatch. QImage with the format Format_Grayscale8 expects a very specific input: a NumPy array of 8-bit unsigned integers (uint8) with values in the 0-255 range.

My array, which was the result of the inverse FFT, was an array of floats with a completely different scale (e.g., from -50.0 to 3000.0). QImage didn't know how to interpret these float values as grayscale pixels, resulting in the black screen.
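If you hit the same wall, printing the array's dtype and range right before it reaches QImage makes the mismatch obvious:

print(img.dtype)             # float64, not uint8
print(img.min(), img.max())  # e.g. -50.0 3000.0 -- far outside 0-255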

The Fix: Normalization is Key

To solve this, I had to add a pre-processing step. Before creating the QImage, the function now checks if the input array is the correct uint8 type. If it's not, it normalizes the array—scaling its values to the proper 0-255 range—and then converts its data type.

This ensures the data is always in a format that QImage can understand and display correctly.

import numpy as np
from PySide6.QtCore import Qt
from PySide6.QtGui import QImage, QPixmap
from PySide6.QtWidgets import QFrame

def _create_scaled_pixmap(self, img: np.ndarray, frame: QFrame) -> QPixmap:
    if img.dtype != np.uint8:
        # Normalize float image to 0-255 range and convert to uint8
        min_val, max_val = np.min(img), np.max(img)
        if min_val == max_val:
            img_norm = np.zeros_like(img)  # flat image: avoid division by zero
        else:
            img_norm = (img - min_val) / (max_val - min_val)
        img = (255 * img_norm).astype(np.uint8)

    # QImage expects the pixel buffer to be contiguous in memory
    if not img.flags['C_CONTIGUOUS']:
        img = np.ascontiguousarray(img)

    h, w = img.shape
    bytes_per_line = w  # one byte per pixel for 8-bit grayscale
    q_image = QImage(img, w, h, bytes_per_line, QImage.Format.Format_Grayscale8)
    pixmap = QPixmap.fromImage(q_image.copy())

    return pixmap.scaled(
        frame.size(),
        Qt.AspectRatioMode.KeepAspectRatio,
        Qt.TransformationMode.SmoothTransformation
    )
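With that guard in place, each frame can go straight from the inverse FFT to the screen. A hypothetical call site (the image_label and image_frame names are illustrative, not the app's exact widget names):

pixmap = self._create_scaled_pixmap(partial_img, self.image_frame)
self.image_label.setPixmap(pixmap)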

Conclusion

And there we have it! By building this simple application, we not only visualized the fascinating process of the Fourier Transform but also learned a valuable lesson in debugging. The biggest takeaway for me was realizing how crucial data types are when passing NumPy arrays to GUI frameworks like PySide6. That "black screen" moment taught me that normalization isn't just a theoretical concept, but a practical necessity.

The Final Result in Action

Here is one more look at our application, successfully reconstructing an image from a sea of frequencies, one wave at a time.

fft_reconstruction.gif

Check Out the Full Code on GitHub!

I've posted the entire source code for this application on my GitHub repository. Feel free to clone it, run it yourself, and experiment with your own images!

➡️ My_GitHub_Repository

If you found this article or the project helpful, please consider leaving a star ⭐️ on the repository. It would make my day!

Thanks for Reading!

What other mathematical concepts do you think would be cool to visualize in an app like this? Let me know your ideas in the comments below!
