After listening to the presentation on adversarial attacks, I feel like I understand AI much better than before. Even though AI looks very powerful and is used in many areas such as self-driving cars, medical image analysis, and computer vision systems, it actually has some important weaknesses. The main issue is that AI does not truly “understand” what it sees like humans do. Instead, it processes everything as numbers, like pixel values in an image, which are then treated as vectors. The model learns patterns from this numerical data, not meaning. Because of this, even a very small change in the input—something humans cannot even notice—can shift the data enough to change the model’s decision completely. This is what we call adversarial perturbation.
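To make this concrete for myself, I wrote a tiny sketch of the idea. Everything here is my own illustration (the toy linear "model", the sizes, and the numbers are made up, not from the presentation): because the model only sees numbers, a perturbation that is tiny per pixel but aligned with the model's weights can push the score across the decision boundary.

```python
import numpy as np

# Hypothetical toy model: a linear classifier over a flattened 8x8 "image".
# All names and numbers are illustrative, not from the presentation.
rng = np.random.default_rng(0)
w = rng.normal(size=64)   # "learned" weight vector
x = rng.normal(size=64)   # input image, flattened to a vector

score = w @ x             # positive -> class A, negative -> class B

# Choose a per-pixel budget just large enough to cross the boundary.
# Each pixel moves by at most eps, yet the decision flips completely.
eps = 1.1 * abs(score) / np.sum(np.abs(w))
x_adv = x - eps * np.sign(w) * np.sign(score)

flipped = np.sign(w @ x_adv) != np.sign(score)   # True: decision changed
```

The trick is that 64 tiny per-pixel changes, all pointing the same way relative to the weights, add up to a large change in the score even though no single pixel changes much.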
A good example is the well-known panda experiment, where adding a tiny amount of carefully chosen noise to an image causes the model to classify a panda as a gibbon, with even higher confidence than the original correct prediction. This clearly shows that the model is not really understanding the object, but just reacting to patterns it has learned. There are also specific techniques for creating these adversarial examples. For example, FGSM (the Fast Gradient Sign Method) uses the gradient of the loss function to find a direction that increases the model's error, and then adds a small change in that direction. PGD (Projected Gradient Descent) is a stronger method because it repeats this process multiple times, making the attack more effective and harder for the model to resist.
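Here is a minimal sketch of FGSM and PGD as I understood them, written against a simple logistic model where the gradient of the loss with respect to the input can be computed by hand. This is my own simplification (the model, weights, and parameters are illustrative), not code from the presentation:

```python
import numpy as np

def grad_loss(w, x, y):
    """Gradient of the cross-entropy loss w.r.t. the INPUT x
    for a logistic model p = sigmoid(w @ x) with label y in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-(w @ x)))
    return (p - y) * w

def fgsm(w, x, y, eps):
    # FGSM: one step of size eps in the sign of the input gradient,
    # i.e. the direction that increases the loss.
    return x + eps * np.sign(grad_loss(w, x, y))

def pgd(w, x, y, eps, alpha, steps):
    # PGD: iterated FGSM with small steps, projecting back into the
    # eps-ball around the original input after every step.
    x_adv = x.copy()
    for _ in range(steps):
        x_adv = x_adv + alpha * np.sign(grad_loss(w, x_adv, y))
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv

# Illustrative usage on a made-up 3-feature input with true label y = 1.
w = np.array([1.0, -2.0, 0.5])
x = np.array([0.2, 0.1, -0.3])
x_fgsm = fgsm(w, x, y=1, eps=0.1)
x_pgd = pgd(w, x, y=1, eps=0.1, alpha=0.03, steps=10)
```

Both attacks stay within the eps budget, but PGD's repeated small steps let it follow the loss surface more closely, which is why it is the stronger attack.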
Another interesting point from the presentation is that these attacks are not limited to digital images but can also happen in the real world. For instance, adversarial patches—special stickers designed to trick AI—can cause the model to focus on the wrong features. Instead of recognizing the real object, the model may pay more attention to the patch, leading to incorrect classification. In some cases, it can even fail to detect a person completely. This becomes a serious problem in real-world systems like CCTV, autonomous vehicles, and medical applications, where even small errors can lead to dangerous consequences.
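What struck me about patches is how local they are: only a small region of the image is modified, yet the whole classification changes. A rough sketch of just the "apply the patch" step (purely illustrative; a real patch attack optimizes the patch pixels, while here random noise stands in for a trained patch):

```python
import numpy as np

def apply_patch(image, patch, top, left):
    """Paste a patch into an image at (top, left).
    Illustrative only: real adversarial patches are optimized so that,
    once pasted, the model attends to the patch instead of the object."""
    out = image.copy()
    h, w = patch.shape[:2]
    out[top:top + h, left:left + w] = patch
    return out

image = np.zeros((32, 32, 3))                        # blank 32x32 RGB image
patch = np.random.default_rng(0).random((8, 8, 3))   # stand-in for a trained patch
attacked = apply_patch(image, patch, top=4, left=4)
```

Even though the patch covers only 64 of 1024 pixels here, a trained patch can dominate the model's attention, which is what makes the sticker attacks described above so practical.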
Although modern models like CNNs and Vision Transformers (ViTs) have improved performance and can learn more complex patterns, they are still vulnerable to these kinds of attacks. The presentation also discussed several defense methods, such as adversarial training, anomaly detection, and data purification using generative models. However, each method has its own limitations, such as requiring high computational resources or reducing accuracy on normal data. This means that there is still no perfect solution to fully protect AI systems from adversarial attacks.
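As I understood it, adversarial training simply mixes adversarial examples into the training loop, which also hints at its cost: every update requires crafting fresh attacks against the current weights. A rough sketch of the idea (my own simplification, reusing a hand-written FGSM step on a logistic model; all data and hyperparameters are made up):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adversarial_train(X, y, eps=0.1, lr=0.1, epochs=100):
    """Train a logistic model on clean AND FGSM-perturbed inputs."""
    rng = np.random.default_rng(0)
    w = rng.normal(scale=0.01, size=X.shape[1])
    for _ in range(epochs):
        # Craft FGSM examples against the *current* weights: this is the
        # extra compute that makes adversarial training expensive.
        p = sigmoid(X @ w)
        X_adv = X + eps * np.sign((p - y)[:, None] * w[None, :])
        for Xb in (X, X_adv):                       # train on both versions
            p = sigmoid(Xb @ w)
            w -= lr * Xb.T @ (p - y) / len(y)       # gradient step on the loss
    return w

# Tiny separable toy dataset (illustrative).
X = np.array([[2.0, 0.0], [-2.0, 0.0], [1.5, 0.5], [-1.5, -0.5]])
y = np.array([1, 0, 1, 0])
w = adversarial_train(X, y)
```

The doubled training set in the inner loop is exactly where the high computational cost mentioned above comes from, and fitting the perturbed copies is also why accuracy on normal data can drop.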
Overall, this presentation changed my perspective on AI quite a lot. Before this, I thought AI was very reliable and almost “intelligent” in the human sense. But now I see that it still has major limitations, especially in terms of understanding and robustness. In my opinion, AI is powerful, but it cannot be fully trusted yet, especially in safety-critical applications. There is still a lot of work needed to make these systems more robust and reliable in the future.
Summarized by
Miss Salinthip Keereerat
Student ID: 6630613037