
Kamolchanok Saengtong


Adversarial Attacks and Defenses in Deep Learning Systems: Threats, Mechanisms, and Countermeasures

Hello y'all, I'm back again in 2026 🔥🔥

Last Wednesday I had the opportunity to join a special talk on Deep Learning Security with Anadi Goyal, a talented research assistant from IIT Guwahati, under the topic:

"Adversarial Attacks and Defenses in Deep Learning Systems: Threats, Mechanisms, and Countermeasures"

In this special talk, he focused mainly on the potential threats and vulnerabilities, and the mechanisms attackers can use to target machine learning models in deep learning systems. At the same time, we learned how to defend against these attacks and explored various countermeasures for handling such threats.

This topic is especially interesting and important in the AI era, where machine learning models are becoming prime targets for attackers to tamper with.

Ok... technically, in this post we will learn to be both the attacker (the mechanisms for attacking an ML model) and the defender (the countermeasures against those threats) at the same time!!!!! 🔥

I swear this topic is really interesting and also beginner friendly, so it's suitable for anyone who's just started learning about ML and deep learning security. As a 3rd-year cybersecurity student, I would like to share what I learned from this special talk with y'all.

So, let's study together from this post!!!

But first of all, let's briefly review the definitions of some necessary terms related to ML and deep learning security before we get to the main point.

AI (Artificial Intelligence) vs ML (Machine Learning) vs DL (Deep Learning) 💀🧠

First, AI (Artificial Intelligence), e.g. ChatGPT, Claude, Gemini, etc.

  • This refers to any method or technique that allows computers to mimic human behavior when making decisions.

ML (Machine Learning)

  • ML is an application of AI, which means it's also a subset of AI. If we talk about the goal of making our machines or computers smart (AI's goal), we also have to talk about the techniques for making them that smart, and those techniques are often machine learning (ML): the ability of machines or computers to learn from data without being fully or explicitly programmed for each specific task or new scenario.

What about deep learning???? Is it the same as ML?
Technically, deep learning is a subset of ML, which means it's also an ML technique that makes machines learn from data. But deep learning goes further: it utilizes neural networks with multiple layers to solve complex problems and extract patterns from data so the machine learns better, e.g. MLPs (Multi-Layer Perceptrons), CNNs (Convolutional Neural Networks), ViTs (Vision Transformers), etc.

Ok, now that we've covered some basic background on the AI world, our next question is... is there any potential threat that attackers could pose to it?
If our goal is to make our machines smart and make accurate predictions for any task by learning patterns from data, without being fully programmed for every new scenario...
what if someone tries to make them stupid by forcing our machine to misclassify or make wrong predictions???
That's a real threat that's likely to happen for sure, right?
Let's talk about a basic example: image classification.
As we know, a model doesn't see a picture the way a human sees it; to the computer, pictures are just pixel numbers, and after being trained multiple times until it learns, it can extract patterns from the data to classify things.

However, this only works with clean data. If someone adds some noise to the image, the machine can get confused and misclassify the image as something else.

For example,
before adding the noise
input: Cat (model predicts) -> Cat
after adding the noise
input: Cat (model predicts) -> Ducky (lmaoo)

This may look simple, but it's often dangerous, because what if someone applies this to a critical system like an automated driving system... what could be worse...

So basically, the attacker's goal is often to confuse the machine into making a wrong decision by creating an input x' that looks identical to the original x to the human eye, but contains some added noise that humans can't see and that leads the machine to misclassify...

That x' is called an "adversarial example".

Now let's talk about the techniques attackers use to create x':

FGSM: Fast Gradient Sign Method
Add noise in the direction of the sign of the gradient of the loss function 💀
(In other words, pixel values are either increased or decreased in a way that maximizes the model's prediction error 🔥).

For technical details, let's see the formula (of course, I copied from my teacher's slide)

x' = x + ε · sign( ∇ₓ L(f(x), y) )

x = clean input (original data)
ε = noise magnitude (how strong the perturbation is)
sign( ∇ₓ L(f(x), y) ) = the sign of the gradient of the loss function

which means for any clean input, you add noise to the data in the direction that maximizes the loss value (error).

Key concept: FGSM is implemented in just 1 step (a single update).
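To make this concrete, here's a minimal FGSM sketch in NumPy (my own illustration, not code from the talk): a toy logistic-regression "model" stands in for f so the gradient ∇ₓL can be written out by hand, and ε is deliberately large so the single step visibly flips the prediction.

```python
import numpy as np

def fgsm(x, y, w, b, eps):
    """One-step FGSM against binary logistic regression.

    Model: f(x) = sigmoid(w.x + b), loss: cross-entropy L(f(x), y).
    The gradient of L w.r.t. x is (f(x) - y) * w, so the attack is
    x' = x + eps * sign((f(x) - y) * w).
    """
    p = 1.0 / (1.0 + np.exp(-(w @ x + b)))  # model prediction f(x)
    grad_x = (p - y) * w                     # dL/dx for cross-entropy loss
    return x + eps * np.sign(grad_x)         # one step in the sign direction

# Toy example: a "cat" input (y = 1) the clean model classifies correctly.
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([0.9, -0.4, 0.3])

x_adv = fgsm(x, y=1, w=w, b=b, eps=1.0)  # large eps for a visible flip
clean_score = w @ x + b                   # positive -> class 1 ("cat")
adv_score = w @ x_adv + b                 # pushed negative -> misclassified
```

In a real image attack, ε would be kept tiny (a few pixel intensity levels) so the perturbation stays invisible to humans while still moving the loss.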

PGD: Projected Gradient Descent (iterative, stronger)
Instead of making one big move like FGSM, PGD takes many small steps in the direction that increases the error, repeating the process until the attack becomes strong enough (the loss value is maximized), while projecting the result back into the allowed ε-neighborhood of the original input.

x(t+1) = Clip_ε { x(t) + α · sign( ∇ₓ L(f(x(t)), y) ) }
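The same toy logistic-regression setup from the FGSM sketch can illustrate PGD (again my own NumPy illustration, not code from the talk): many small α-sized steps, each clipped back into the ε-ball around the clean input.

```python
import numpy as np

def pgd(x, y, w, b, eps, alpha, steps):
    """Iterative PGD attack on binary logistic regression f(x) = sigmoid(w.x + b)."""
    x_adv = x.copy()
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(w @ x_adv + b)))  # f(x_t)
        grad_x = (p - y) * w                         # dL/dx at x_t
        x_adv = x_adv + alpha * np.sign(grad_x)      # small step of size alpha
        x_adv = np.clip(x_adv, x - eps, x + eps)     # project back: Clip_eps
    return x_adv

# Same toy "cat" input as before (hypothetical numbers for illustration).
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([0.9, -0.4, 0.3])

x_adv = pgd(x, y=1, w=w, b=b, eps=1.0, alpha=0.25, steps=10)
adv_score = w @ x_adv + b  # driven negative -> misclassified
```

Note the two knobs: α controls the step size per iteration, while the clip keeps the total perturbation within ε of the original input, which is what the "Projected" part of the name refers to.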

DeepFool
The DeepFool method differs a bit from the others: here you try to find the smallest noise that leads the model to misclassification. That means the adversarial image still looks like a cat to you even though the computer already sees it as a donkey -_- (so it's harder for a human to detect the noise).
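For intuition, here's a sketch of the DeepFool idea in the simplest possible case, a linear binary classifier (my own illustration; the actual algorithm from the DeepFool paper handles multi-class, non-linear models iteratively): the smallest perturbation that flips the decision is the orthogonal projection of x onto the decision boundary w·x + b = 0.

```python
import numpy as np

def deepfool_linear(x, w, b, overshoot=1e-4):
    """Minimal-perturbation attack on a linear binary classifier f(x) = w.x + b.

    The closest point on the decision boundary is x + r with
    r = -(f(x) / ||w||^2) * w; a tiny overshoot pushes just past it.
    """
    f = w @ x + b
    r = -(f / (w @ w)) * w           # smallest perturbation, orthogonal to boundary
    return x + (1 + overshoot) * r   # overshoot to actually cross the boundary

# Same toy input as in the FGSM/PGD sketches (hypothetical numbers).
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([0.9, -0.4, 0.3])   # classified positive ("cat")

x_adv = deepfool_linear(x, w, b)
adv_score = w @ x_adv + b        # just barely negative: minimal flip
```

Compare the perturbation norm here with the FGSM example: DeepFool lands just across the boundary, which is why the adversarial image stays visually indistinguishable.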

Anyway, instead of using these kinds of methods, you can just place a visible patch, like a watermark, somewhere on the picture. If that patch is designed well enough, it can become the dominant region of the image: no matter what image you feed in, the model will misclassify it, since its prediction is manipulated by that patch. This technique is called "Adversarial Patch Attacks".
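Unlike the noise-based attacks, a patch attack overwrites a region of the image outright. A minimal sketch of applying one (hypothetical values; a real attack would optimize the patch contents against the target model):

```python
import numpy as np

def apply_patch(image, patch, top, left):
    """Overwrite a rectangular region of `image` with `patch` (H x W x C arrays)."""
    out = image.copy()
    ph, pw = patch.shape[:2]
    out[top:top + ph, left:left + pw] = patch
    return out

image = np.zeros((32, 32, 3))   # stand-in for a normalized "cat" image
patch = np.ones((8, 8, 3))      # stand-in for an optimized adversarial patch
patched = apply_patch(image, patch, top=4, left=4)
```

The point is that the patch is valid image content (no out-of-range pixels, fully visible), which is what makes it printable and usable in the physical world.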

Now we have learned some adversarial techniques that attackers can use to tamper with machine learning and deep learning models.

Let's talk about how we can defend against this. 😑🔥
(For this section, we will talk about how to defend a ViT (Vision Transformer) from adversarial patch attacks.)

Our main goal is to find the adversarial patch area inside the image, so we can detect it and reduce the effect that the patch could have!

how...

First, a ViT (Vision Transformer) doesn't see the image as a sequence of pixels like traditional deep learning methods such as CNNs. Instead, it sees the image as a sequence of patches, splitting the image into parts where each patch (1 patch) = 1 token:
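This patch-to-token split is easy to sketch in NumPy (a simplified illustration; a real ViT also linearly projects each patch and adds position embeddings before the transformer sees it):

```python
import numpy as np

def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into a sequence of flattened patch tokens."""
    h, w, c = image.shape
    p = patch_size
    assert h % p == 0 and w % p == 0, "image must divide evenly into patches"
    # (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C) -> (num_patches, p*p*C)
    patches = image.reshape(h // p, p, w // p, p, c).swapaxes(1, 2)
    return patches.reshape(-1, p * p * c)

# A 32x32 RGB image with 8x8 patches -> 4*4 = 16 tokens of 8*8*3 = 192 values
tokens = image_to_patches(np.zeros((32, 32, 3)), patch_size=8)
```

This tokenization is exactly why patch-level defenses fit ViTs so naturally: an adversarial patch tends to land in a small, identifiable subset of tokens.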

After that, we can detect the adversarial region using entropy: we find which patches inside the image have a high level of entropy (which signals high randomness). This can be done with the STRAP-ViT Adversarial Defense Framework. Let's see how to use it in a real defense.

This method refers to the paper: "STRAP-ViT: Segregated Tokens with Randomized Transformations for Defense against Adversarial Patches in ViTs"

  • First, we feed the patched input (the image that contains the adversarial patch) into the STRAP-ViT framework.
  • Then we run ViT pre-processing, which splits each part of the photo into patches (tokens).
  • After the image is split into tokens/patches, we detect the malicious patch by measuring entropy (utilizing Jensen-Shannon divergence), where high entropy indicates a high level of randomness.
  • Select the suspicious (malicious) patches.
  • Mitigate the malicious patches by applying a transformation method.
  • Once all tokens/patches are clean, we forward them into the transformer blocks and MLP head so the model truly learns and classifies from the real data and makes its decision.
  • Now the model can classify correctly!!!
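The entropy-screening step in the pipeline above can be sketched like this (my own simplified illustration using plain Shannon entropy over per-patch pixel histograms, not the Jensen-Shannon machinery from the STRAP-ViT paper; the threshold is a made-up value for the toy data):

```python
import numpy as np

def patch_entropy(patch, bins=16):
    """Shannon entropy (in bits) of a patch's pixel-value histogram."""
    hist, _ = np.histogram(patch, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    p = p[p > 0]                       # drop empty bins (0 * log 0 := 0)
    return float(-(p * np.log2(p)).sum())

def flag_suspicious(patches, threshold):
    """Return indices of patches whose entropy exceeds the threshold."""
    return [i for i, pt in enumerate(patches) if patch_entropy(pt) > threshold]

rng = np.random.default_rng(0)
clean = [np.full((8, 8), 0.5) for _ in range(15)]  # smooth, low-entropy patches
noisy = rng.random((8, 8))                          # high-entropy patch-like region
patches = clean + [noisy]

suspects = flag_suspicious(patches, threshold=2.0)  # flags only the noisy patch
```

The intuition: optimized adversarial patches pack a lot of high-frequency structure into a small area, so their pixel statistics look far more random than natural image patches.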

Cat is Cat
No donkey anymore !!!!!!!

Now we have learned all about DL attacks and DL defenses in a real-world application (which was so much fun and easy to understand, right?......)

In this post, I only explained the basic core concepts of DL attacks and defenses. For technical details, I recommend y'all read the paper referenced above for deeper insight!!!!

Now this session has ended... finally...
See y'all again!! Let's see what the next post will be about...
😏
