Adversarial Examples - Non-Targeted Evasive Attacks

OVERVIEW

In this article, I practice non-targeted attacks with Adversarial Examples.
Adversarial Examples are attacks that induce AI misclassification by adding minute changes, called perturbations, to the original data in order to deliberately alter its features.
These perturbations are too small to be noticed by the human eye, which makes it very difficult to distinguish adversarial examples from the original data.
There are two types of Adversarial Examples: non-targeted, which causes misclassification into an arbitrary (incorrect) class, and targeted, which causes misclassification into a specific class chosen by the attacker.
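
To make the idea concrete, the sketch below shows in plain NumPy (not ART's implementation) how a non-targeted perturbation is typically built: each pixel is nudged by a small amount eps in the direction that increases the classifier's loss. The loss_gradient helper is hypothetical and stands in for whatever computes the gradient of the loss with respect to the input.

import numpy as np

def fgsm_nontargeted(x, y_true, loss_gradient, eps=0.1):
    """Conceptual sketch of a non-targeted gradient-sign perturbation.

    x             -- input image with pixel values in [0, 1]
    y_true        -- true label of x
    loss_gradient -- hypothetical callable returning dLoss/dx for (x, y_true)
    eps           -- perturbation budget (maximum change per pixel)
    """
    grad = loss_gradient(x, y_true)      # gradient of the loss w.r.t. the input
    x_adv = x + eps * np.sign(grad)      # step in the direction that increases the loss
    return np.clip(x_adv, 0.0, 1.0)      # keep pixel values in the valid range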

ADVANCE PREPARATION

Here, it is necessary to import some libraries.
In this article, I will build an image classifier model with Keras.

First, install ART. The Adversarial Robustness Toolbox (ART) is a Python library for AI security.
With ART, you can validate attack methods against AI (e.g., Adversarial Examples, Data Poisoning, Model Extraction, Membership Inference, etc.) as well as defense methods against them. In order to protect AI from attacks, it is necessary to understand the attack mechanisms and the appropriate defenses. Therefore, in this article, I will learn techniques to secure AI through ART!

!pip3 install adversarial-robustness-toolbox

#Import Library
import random
import numpy as np
import matplotlib.pyplot as plt

# TensorFlow with Keras.
import tensorflow as tf
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Input, Dense, Flatten, Conv2D
from tensorflow.keras.layers import MaxPooling2D, GlobalAveragePooling2D, Dropout
# ART's KerasClassifier wrapper needs graph mode, so disable eager execution
tf.compat.v1.disable_eager_execution()

# ART
import art
from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import KerasClassifier

DATASET

As the target dataset, I use the CIFAR-10 dataset. CIFAR-10 is an established computer-vision dataset used for object recognition. It is a subset of the 80 Million Tiny Images dataset and consists of 60,000 32x32 color images, each containing one of 10 object classes, with 6,000 images per class. It was collected by Alex Krizhevsky, Vinod Nair, and Geoffrey Hinton.

# Load CIFAR10
(X_train, y_train), (X_test, y_test) = tf.keras.datasets.cifar10.load_data()

# Get CIFAR10 Label
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer', 'dog', 'frog', 'horse', 'ship', 'truck']
num_classes = len(classes)

# Visualize Dataset
show_images = []
for _ in range(5 * 5):
    show_images.append(X_train[random.randint(0, len(X_train) - 1)])

for idx, image in enumerate(show_images):
    plt.subplot(5, 5, idx + 1)
    plt.imshow(image)

# Check Dataset Size
print(X_train.shape, y_train.shape)

[Figure: a 5x5 grid of random CIFAR-10 training images]
(50000, 32, 32, 3) (50000, 1)

PREPROCESSING

Normalize the dataset and convert each label to a one-hot vector.

#Normalization
X_train = X_train.astype('float32') / 255
X_test = X_test.astype('float32') / 255

# Convert label to One-hot-vector
y_train = tf.keras.utils.to_categorical(y_train, num_classes)
y_test = tf.keras.utils.to_categorical(y_test, num_classes)

BUILD TARGET MODEL

Next, I build the target image classifier. In this article, I define a Convolutional Neural Network (CNN).

# Model Definition
inputs = Input(shape=(32, 32, 3))
x = Conv2D(64, (3, 3), padding='SAME', activation='relu')(inputs)
x = Conv2D(64, (3, 3), padding='SAME', activation='relu')(x)
x = Dropout(0.25)(x)
x = MaxPooling2D()(x)

x = Conv2D(128, (3,3), padding='SAME', activation='relu')(x)
x = Conv2D(128, (3,3), padding='SAME', activation='relu')(x)
x = Dropout(0.25)(x)
x = MaxPooling2D()(x)

x = Conv2D(256, (3,3), padding='SAME', activation='relu')(x)
x = Conv2D(256, (3,3), padding='SAME', activation='relu')(x)
x = GlobalAveragePooling2D()(x)

x = Dense(1024, activation='relu')(x)
x = Dropout(0.25)(x)
y = Dense(10, activation='softmax')(x)

model = Model(inputs, y)

# MODEL COMPILE
model.compile(optimizer=tf.keras.optimizers.Adam(),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
model.summary()

TRAIN MODEL

Next, train the image classifier with X_train and y_train. I recommend training on a GPU.

model.fit(X_train, y_train,
          batch_size=512,
          epochs=30,
          validation_data=(X_test, y_test),
          shuffle=True)

EVALUATION ACCURACY

Use the test data X_test to evaluate the inference accuracy of the trained image classifier. If you run the code below, you will find the accuracy is about 80%.

predictions = model.predict(X_test)
accuracy = np.sum(np.argmax(predictions, axis=1) == np.argmax(y_test, axis=1)) / len(y_test)
print('Accuracy on benign test example: {}%'.format(accuracy * 100))

Accuracy on benign test example: 81.11%
Done.

CREATE ADVERSARIAL EXAMPLES

In order to create Adversarial Examples with ART, the target classifier must be wrapped in a wrapper class provided by ART. In this article, I use KerasClassifier because I have already used Keras to build the model.

# Set the minimum and maximum values of the input features
# The features are normalized to the range 0.0 to 1.0, so the minimum is 0.0 and the maximum is 1.0
min_pixel_value = 0.0
max_pixel_value = 1.0

#Wrap model with ART Keras Classifier
classifier = KerasClassifier(model=model, clip_values=(min_pixel_value, max_pixel_value), use_logits=False)

IMPLEMENT FGSM

You can create Adversarial Examples simply by specifying the necessary arguments to FastGradientMethod.
The success rate of the attack (the probability of inducing misclassification) increases with the value of eps specified in the second argument, but the image looks more unnatural (because of the increased noise).
Therefore, there is a trade-off between the success rate of the attack and the visual naturalness of the Adversarial Examples.

I use the generate method of the FGSM instance to generate the Adversarial Examples.

#Create FGSM Instance
attack = FastGradientMethod(estimator=classifier, eps=0.1)

#Create Adversarial Examples
X_test_adv = attack.generate(x=X_test)
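
As a quick sanity check (a minimal sketch, assuming the X_test and X_test_adv arrays from the cells above), you can confirm that the perturbation stays within the eps budget:

# Maximum absolute per-pixel change over the whole test set
# With eps=0.1 this value should be at most about 0.1
perturbation = np.abs(X_test_adv - X_test)
print('Max perturbation (L-infinity): {:.4f}'.format(perturbation.max()))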

IMPLEMENT PREDICTION

all_preds = model.predict(X_test_adv)
accuracy = np.sum(np.argmax(all_preds, axis=1) == np.argmax(y_test, axis=1)) / len(y_test)
print('Accuracy on Adversarial Examples: {}%'.format(accuracy * 100))

Accuracy on Adversarial Examples: 20.349999999999998%
Done.

Compared to the normal test data, the inference accuracy is significantly lower.
From this result, I can infer that the adversarial examples induce misclassification.
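
The trade-off between eps and the attack success rate mentioned earlier can also be checked directly. The following is a minimal sketch (assuming the classifier, model, X_test, and y_test objects from the cells above) that sweeps a few eps values and reports the accuracy on the resulting adversarial examples; larger eps values generally push the accuracy further down, at the cost of more visible noise.

# Sweep a few perturbation budgets and measure the remaining accuracy
for eps in [0.01, 0.05, 0.1, 0.2]:
    attack = FastGradientMethod(estimator=classifier, eps=eps)
    X_adv = attack.generate(x=X_test)
    preds = model.predict(X_adv)
    acc = np.sum(np.argmax(preds, axis=1) == np.argmax(y_test, axis=1)) / len(y_test)
    print('eps={:.2f}: accuracy {:.2f}%'.format(eps, acc * 100))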

Next, I visually check the inference results for a normal sample and an adversarial example.
First, I run inference on the normal data:

# Show normal data
target_index = 2
plt.imshow(X_test[target_index])

#Predict normal data
pred = model.predict(X_test[target_index][np.newaxis, ...])

#Show prediction result
print('True label: "{}"\nPrediction: "{}"'.format(classes[np.argmax(y_test[target_index])], classes[np.argmax(pred)]))

[Figure: the normal test image with its True label and Prediction]

The displayed image is the normal data fed into the image classifier.
The actual label (True label) and the classifier's predicted label (Prediction) should be identical.

I then run inference on the adversarial example.

# Show Adversarial Examples
plt.imshow(X_test_adv[target_index])

# Predict Adversarial Examples
pred_adv = model.predict(X_test_adv[target_index][np.newaxis, ...])

# Show prediction result
print('True label: "{}"\nPrediction: "{}"'.format(classes[np.argmax(y_test[target_index])], classes[np.argmax(pred_adv)]))

[Figure: the adversarial example with its True label and Prediction]

The displayed image is the adversarial example fed into the image classifier.
The image is noisy due to the perturbation, but it still looks like the object indicated by the true label.
However, the actual label (True label) and the classifier's predicted label (Prediction) no longer match.
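
To make the perturbation itself visible, you can also plot the absolute difference between the adversarial and the original image (a minimal sketch, assuming X_test, X_test_adv, and target_index from above):

# Visualize the per-pixel perturbation added by the attack
diff = np.abs(X_test_adv[target_index] - X_test[target_index])
plt.imshow(diff / diff.max())   # rescale to [0, 1] so the noise pattern is easy to see
plt.show()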

CONCLUSION

In this hands-on, entitled "Part 1: Non-Targeted Evasive Attacks," I used ART to create adversarial examples that are misclassified into an arbitrary class.
I found that ART allows us to create adversarial examples with a small amount of code.
