The 2D Fast Fourier Transform (FFT) is a powerful tool in image processing, often used for tasks like denoising. It works by decomposing an image into its fundamental frequency components—essentially, a collection of simple sine waves.
I always understood the decomposition part, but it led me to a question: can we reverse the process? Can we perfectly reconstruct the original image just by adding all those frequency components back together? To answer this, I built an app to see it for myself.
In This Article
- Overview
- FFT Part
- The GUI: Visualizing the Reconstruction
- Conclusion
- The Final Result in Action
- Check Out the Full Code on GitHub!
- Thanks for Reading!
Overview
This application consists of two main parts: an FFT module and a GUI. The former decomposes the original image, and the latter visualizes the entire process.
FFT Part
First, we perform an FFT shift to move the zero-frequency component to the center of the image. The Python function below then takes this shifted data and sorts all the frequency components. It does this by calculating each component's distance from the center, which corresponds to its frequency (from low to high).
Crucially, we also store the original (x, y) coordinates for each component. We'll need these to place everything back in the correct position during the re-synthesis phase.
```python
from typing import Dict, List

import numpy as np
from numpy.fft import fft2, fftshift

fft_result = fft2(img)
fft_shifted = fftshift(fft_result)  # move the zero-frequency component to the center

def get_sorted_freq_components(self, fft_shifted: np.ndarray) -> List[Dict]:
    h, w = fft_shifted.shape
    center_y, center_x = h // 2, w // 2
    freq_components = []
    for y in range(h):
        for x in range(w):
            # Distance from the center corresponds to the component's frequency
            distance = np.sqrt((y - center_y)**2 + (x - center_x)**2)
            freq_components.append({
                "distance": distance,
                "value": fft_shifted[y, x],
                "y": y,
                "x": x,
            })
    freq_components.sort(key=lambda item: item["distance"])
    return freq_components
```
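As a quick sanity check on the sorting logic, the very first component returned should be the DC (zero-frequency) term: it sits at the exact center after the shift, and its value is the sum of all pixel values. Here is a minimal standalone sketch of the same logic (the app's version lives on a class, so this reimplements it inline):

```python
import numpy as np
from numpy.fft import fft2, fftshift

img = np.arange(16, dtype=np.float64).reshape(4, 4)
fft_shifted = fftshift(fft2(img))

h, w = fft_shifted.shape
cy, cx = h // 2, w // 2
components = sorted(
    ({"distance": np.hypot(y - cy, x - cx), "value": fft_shifted[y, x], "y": y, "x": x}
     for y in range(h) for x in range(w)),
    key=lambda item: item["distance"],
)

# The lowest-frequency component is the DC term at distance 0,
# whose value equals the sum of all pixels (here, 0 + 1 + ... + 15 = 120).
print(components[0]["distance"])    # 0.0
print(components[0]["value"].real)  # 120.0
```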
The GUI: Visualizing the Reconstruction
The main feature of the GUI is a display that updates in real-time as we reconstruct the image. The idea is simple: in a loop, we add one frequency component at a time (from low to high) and update the image view with the result, creating an animation.
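Stripped of the GUI, that loop can be sketched as follows: start from an all-zero spectrum, copy components back to their stored (x, y) positions in low-to-high frequency order, and run the inverse transform after each addition. The variable names here are illustrative, not taken from the app:

```python
import numpy as np
from numpy.fft import fft2, fftshift, ifft2, ifftshift

img = np.random.default_rng(0).random((8, 8))
fft_shifted = fftshift(fft2(img))

h, w = fft_shifted.shape
cy, cx = h // 2, w // 2
components = sorted(
    ({"distance": np.hypot(y - cy, x - cx), "value": fft_shifted[y, x], "y": y, "x": x}
     for y in range(h) for x in range(w)),
    key=lambda item: item["distance"],
)

partial = np.zeros_like(fft_shifted)
for comp in components:
    partial[comp["y"], comp["x"]] = comp["value"]
    # Each iteration yields the current partial reconstruction;
    # the GUI normalizes this frame and displays it.
    frame = ifft2(ifftshift(partial)).real

# Once every component is restored, the reconstruction matches the original.
print(np.allclose(frame, img))  # True
```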
The First Hurdle: A Black Screen
However, my first attempt didn't work as expected. When I passed the NumPy array from the inverse FFT process directly to PySide6's `QImage`, all I got was a black screen. The image simply wouldn't display correctly.
```python
def _create_scaled_pixmap(self, img: np.ndarray, frame: QFrame) -> QPixmap:
    h, w = img.shape
    bytes_per_line = w
    q_image = QImage(img, w, h, bytes_per_line, QImage.Format.Format_Grayscale8)
    pixmap = QPixmap.fromImage(q_image.copy())
    return pixmap.scaled(
        frame.size(),
        Qt.AspectRatioMode.KeepAspectRatio,
        Qt.TransformationMode.SmoothTransformation,
    )
```
The "Aha!" Moment: Data Mismatch
After some debugging, I realized the issue was a data type mismatch. `QImage` with the format `Format_Grayscale8` expects a very specific input: a NumPy array of 8-bit unsigned integers (`uint8`) with values in the 0-255 range.
My array, which was the result of the inverse FFT, was an array of floats with a completely different scale (e.g., from -50.0 to 3000.0). `QImage` didn't know how to interpret these float values as grayscale pixels, resulting in the black screen.
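The mismatch is easy to reproduce in isolation: even a lossless FFT round-trip hands back `float64` data rather than `uint8`, which is exactly what `Format_Grayscale8` cannot digest. A small sketch:

```python
import numpy as np

rng = np.random.default_rng(42)
img = rng.integers(0, 256, size=(16, 16)).astype(np.uint8)

# Round-trip through the FFT: the values come back as floats, not uint8.
reconstructed = np.fft.ifft2(np.fft.fft2(img)).real

print(img.dtype)            # uint8
print(reconstructed.dtype)  # float64
```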
The Fix: Normalization is Key
To solve this, I had to add a pre-processing step. Before creating the `QImage`, the function now checks whether the input array is already `uint8`. If it isn't, it normalizes the array, scaling its values to the 0-255 range, and then converts the data type. This ensures the data is always in a format that `QImage` can understand and display correctly.
```python
def _create_scaled_pixmap(self, img: np.ndarray, frame: QFrame) -> QPixmap:
    if img.dtype != np.uint8:
        # Normalize float image to the 0-255 range and convert to uint8
        min_val, max_val = np.min(img), np.max(img)
        if min_val == max_val:
            img_norm = np.zeros_like(img)
        else:
            img_norm = (img - min_val) / (max_val - min_val)
        img = (255 * img_norm).astype(np.uint8)
    if not img.flags['C_CONTIGUOUS']:
        img = np.ascontiguousarray(img)
    h, w = img.shape
    bytes_per_line = w
    q_image = QImage(img, w, h, bytes_per_line, QImage.Format.Format_Grayscale8)
    pixmap = QPixmap.fromImage(q_image.copy())
    return pixmap.scaled(
        frame.size(),
        Qt.AspectRatioMode.KeepAspectRatio,
        Qt.TransformationMode.SmoothTransformation,
    )
```
Conclusion
And there we have it! By building this simple application, we not only visualized the fascinating process of the Fourier Transform but also learned a valuable lesson in debugging. The biggest takeaway for me was realizing how crucial data types are when passing NumPy arrays to GUI frameworks like PySide6. That "black screen" moment taught me that normalization isn't just a theoretical concept, but a practical necessity.
The Final Result in Action
Here is one more look at our application, successfully reconstructing an image from a sea of frequencies, one wave at a time.
Check Out the Full Code on GitHub!
I've posted the entire source code for this application on my GitHub repository. Feel free to clone it, run it yourself, and experiment with your own images!
If you found this article or the project helpful, please consider leaving a star ⭐️ on the repository. It would make my day!
Thanks for Reading!
What other mathematical concepts do you think would be cool to visualize in an app like this? Let me know your ideas in the comments below!