Using AI to Create Art with Neural Style Transfer

Neural Style Transfer is one of the most striking applications of artificial intelligence in a creative context. This project shows how to transfer the style of an artistic painting onto a chosen image, often with impressive results. Many researchers have used and refined the method, typically by adding new terms to the loss.

How does it work?

The goal of the technique is to apply the style of one image, which we will call the "style image", to a target image while preserving the content of the target image.

Style is the texture and visual pattern of an image, for example an artist's brushstrokes or use of colour.
Content is the structure of an image: people, buildings and objects are all examples of content.

Example:
https://miro.medium.com/v2/resize:fit:1400/1*tBCPZpaGYX7gTobCMpAp_Q.gif

Preprocessing and deprocessing images

First, the images have to be formatted for the network, a pre-trained VGG19 convnet. Preprocessing turns each image into an array the network can accept, so the generated result has to be deprocessed afterwards, which includes converting it from BGR back to RGB.
We define two helper functions for this, one for each direction:


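# Imports assumed by the snippets in this post (standalone Keras 2.x)
import numpy as np
from keras import backend as K
from keras.applications import vgg19
from keras.preprocessing.image import load_img, img_to_array

# Preprocess an image so that it is compatible with the VGG19 model
def preprocess_image(image_path):
    # Note: load_img reads target_size as (rows, cols), so resized_width is the row count here
    img = load_img(image_path, target_size=(resized_width, resized_height))
    img = img_to_array(img)
    img = np.expand_dims(img, axis=0)
    img = vgg19.preprocess_input(img)
    return img

# Convert a tensor back into a displayable image
def deprocess_image(x):
    x = x.reshape((resized_width, resized_height, 3))

    # Undo the zero-centering by adding back the mean pixel (BGR order); needed when working with VGG models
    x[:, :, 0] += 103.939
    x[:, :, 1] += 116.779
    x[:, :, 2] += 123.68

    # Convert BGR -> RGB
    x = x[:, :, ::-1]
    x = np.clip(x, 0, 255).astype('uint8')
    return x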
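
To make the round trip concrete, here is a minimal NumPy-only sketch. It assumes that vgg19.preprocess_input does what the deprocessing step implies: swap RGB to BGR and subtract the per-channel mean, with no rescaling. The names caffe_style_preprocess and caffe_style_deprocess are hypothetical stand-ins, not Keras functions, and the sketch rounds before casting so that the round trip is exact.

```python
import numpy as np

# Per-channel means in BGR order, the same constants used in deprocess_image.
BGR_MEAN = np.array([103.939, 116.779, 123.68])

def caffe_style_preprocess(rgb):
    """Hypothetical stand-in for vgg19.preprocess_input: RGB -> BGR, then subtract the mean."""
    bgr = rgb[..., ::-1].astype('float64')
    return bgr - BGR_MEAN

def caffe_style_deprocess(x):
    """Inverse step: add the mean back, BGR -> RGB, round and clip to 8-bit pixel values."""
    rgb = (x + BGR_MEAN)[..., ::-1]
    return np.clip(np.rint(rgb), 0, 255).astype('uint8')

# A tiny random "image" of shape (rows, cols, channels).
rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(4, 5, 3), dtype=np.uint8)

restored = caffe_style_deprocess(caffe_style_preprocess(image))
print(np.array_equal(image, restored))  # True: deprocessing undoes preprocessing
```

The channel flip is its own inverse, which is why the same `[..., ::-1]` slice appears in both directions.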

Content loss

Content loss preserves the content of the main input image in the stylized result. Because the higher layers of a Convolutional Neural Network carry information about the structure of the image, the content loss is computed as the squared L2 distance between the output of a high layer for the input image and the output of the same layer for the generated image.
Content loss is defined as:
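
A sketch of that definition, assuming the usual squared-error form from the style transfer literature. The index names are an assumption: i runs over the feature maps of layer l and j over the positions within each map.

```latex
L_{content}(l) = \frac{1}{2} \sum_{i,j} \left( F^{l}_{ij} - P^{l}_{ij} \right)^{2}
```

The code below omits the constant factor 1/2, which only rescales content_weight.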

In the equation, F is the feature representation of the content image (what the network outputs when we run our input image through it) and P is the feature representation of the generated image, both taken at a specific hidden layer l.
Implementation:

# Content loss keeps the features of the content image in the generated image
def content_loss(layer_features):
    base_image_features = layer_features[0, :, :, :]
    combination_features = layer_features[2, :, :, :]
    return K.sum(K.square(combination_features - base_image_features))
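
The same computation can be checked by hand on small arrays. This is a NumPy sketch, not the Keras code above; content_loss_np and the toy feature maps are hypothetical and only stand in for a real VGG19 layer output.

```python
import numpy as np

def content_loss_np(base_features, combination_features):
    """Sum of squared differences between two feature maps of identical shape."""
    return float(np.sum(np.square(combination_features - base_features)))

# Hypothetical 2x2 feature maps with 3 channels.
base = np.ones((2, 2, 3))
same = base.copy()
shifted = base + 0.5  # every activation differs by 0.5

print(content_loss_np(base, same))     # 0.0: identical features give zero loss
print(content_loss_np(base, shifted))  # 3.0: 12 values, each contributing 0.5 ** 2
```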

Style loss

Style loss is less straightforward to understand than content loss. The goal is to preserve the style of an image (for example visual patterns such as brushstrokes) in the newly generated image. In the previous case we compared the raw outputs of intermediate layers. Here we compare the Gram matrices of specific layers for the style reference image and for the generated image. The Gram matrix is defined as the inner product between the vectorized feature maps of a given layer, so it captures the correlations between the features of that layer. Computing the loss over several layers preserves features that are correlated in similar ways at different depths of the network in both the style image and the generated image.
The style loss for a single layer is computed as:
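
A sketch of the single-layer formula, written to match the normalisation used in style_loss_per_layer further down. The symbol F for the vectorized feature maps and the index names are assumptions.

```latex
G^{l}_{ij} = \sum_{k} F^{l}_{ik} F^{l}_{jk},
\qquad
E_{l} = \frac{1}{4 N^{2} M^{2}} \sum_{i,j} \left( G^{l}_{ij} - A^{l}_{ij} \right)^{2}
```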

In the equation, A is the Gram matrix of the style image and G is the Gram matrix of the generated image, both taken at a given layer. N is the number of channels and M is the spatial size (width times height), matching channels and size in the code below.
The style loss is first computed for each layer separately and then combined across all the layers chosen to model the style.
Implementation:

def gram_matrix(x):
    features = K.batch_flatten(K.permute_dimensions(x, (2, 0, 1)))
    gram = K.dot(features, K.transpose(features))
    return gram


def style_loss_per_layer(style, combination):
    S = gram_matrix(style)
    C = gram_matrix(combination)
    channels = 3
    size = resized_width * resized_height
    return K.sum(K.square(S - C)) / (4. * (channels ** 2) * (size ** 2))

def total_style_loss(feature_layers):
    loss = K.variable(0.)
    for layer_name in feature_layers:
        layer_features = outputs_dict[layer_name]
        style_reference_features = layer_features[1, :, :, :]
        combination_features = layer_features[2, :, :, :]
        sl = style_loss_per_layer(style_reference_features, combination_features)
        loss += (style_weight / len(feature_layers)) * sl
    return loss
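
To see why the Gram matrix captures style rather than layout, here is a NumPy sketch. The functions gram_matrix_np and style_loss_np are hypothetical re-implementations for illustration, and they take the normalisation constants from the feature map's own shape, which differs from the fixed constants in the Keras code above. Shuffling the pixel positions of a feature map destroys its content but leaves the channel correlations, and therefore the style loss, unchanged.

```python
import numpy as np

def gram_matrix_np(x):
    """Gram matrix of a (rows, cols, channels) feature map, one row per channel."""
    features = np.transpose(x, (2, 0, 1)).reshape(x.shape[2], -1)
    return features @ features.T

def style_loss_np(style, combination):
    """Single-layer style loss, normalised by channel count and spatial size."""
    channels = style.shape[2]
    size = style.shape[0] * style.shape[1]
    diff = gram_matrix_np(style) - gram_matrix_np(combination)
    return float(np.sum(np.square(diff)) / (4.0 * channels ** 2 * size ** 2))

rng = np.random.default_rng(0)
style = rng.normal(size=(8, 8, 3))

# Same pixels in a different spatial order: content changes, correlations do not.
perm = rng.permutation(64)
shuffled = style.reshape(64, 3)[perm].reshape(8, 8, 3)

print(gram_matrix_np(style).shape)                       # (3, 3)
print(np.isclose(style_loss_np(style, shuffled), 0.0))   # True
print(style_loss_np(style, 2.0 * style) > 0.0)           # True
```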

Variation loss

The last component of the loss is the variation loss. It is not part of the original paper and is not strictly necessary for the project to succeed. Even so, adding it has been found empirically to give better results, because it smooths the colour changes between neighbouring pixels.
Implementation:

def total_variation_loss(x):
    a = K.square(x[:, :resized_width - 1, :resized_height - 1, :] - x[:, 1:, :resized_height - 1, :])
    b = K.square(x[:, :resized_width - 1, :resized_height - 1, :] - x[:, :resized_width - 1, 1:, :])
    return K.sum(K.pow(a + b, 1.25))
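
The smoothing effect can be illustrated with a NumPy sketch that mirrors the function above. The name total_variation_np and the toy images are hypothetical. A flat image has zero variation, while adding pixel noise increases it.

```python
import numpy as np

def total_variation_np(x):
    """Total variation of a (1, rows, cols, channels) image, mirroring total_variation_loss."""
    a = np.square(x[:, :-1, :-1, :] - x[:, 1:, :-1, :])  # differences between neighbouring rows
    b = np.square(x[:, :-1, :-1, :] - x[:, :-1, 1:, :])  # differences between neighbouring columns
    return float(np.sum(np.power(a + b, 1.25)))

flat = np.full((1, 6, 6, 3), 128.0)  # constant colour everywhere
rng = np.random.default_rng(0)
noisy = flat + rng.normal(scale=20.0, size=flat.shape)

print(total_variation_np(flat))                              # 0.0
print(total_variation_np(noisy) > total_variation_np(flat))  # True
```

Minimising this term therefore pushes the optimizer towards images without abrupt pixel-to-pixel jumps.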

Total loss

Finally, the total loss combines all of these contributions. First, we need to extract the outputs of the specific layers we have chosen. To do this, we define a dictionary:

# Get the outputs of each key layer, through unique names.
outputs_dict = dict([(layer.name, layer.output) for layer in model.layers])

We then compute the loss by calling the functions defined above. Each component is multiplied by its own weight, which we can tune to make the effect stronger or lighter:

def total_loss():
    loss = K.variable(0.)

    # contribution of content_loss
    feature_layers_content = outputs_dict['block5_conv2']
    loss += content_weight * content_loss(feature_layers_content)

    # contribution of style_loss
    # (style_weight is already applied per layer inside total_style_loss)
    feature_layers_style = ['block1_conv1', 'block2_conv1',
                            'block3_conv1', 'block4_conv1',
                            'block5_conv1']
    loss += total_style_loss(feature_layers_style)

    # contribution of variation_loss
    loss += total_variation_weight * total_variation_loss(combination_image)
    return loss

Setting up the Neural Network

The VGG19 network takes a batch of three images as input: the content input image, the style reference image, and a symbolic tensor that holds the generated image. The first two are constant and are defined as variables using the keras.backend package. The third is defined as a placeholder, because it changes over time as the optimizer updates the result.

Once the variables are initialized, we join them into a single tensor that is later fed to the network.

# Get tensor representations of our images
base_image = K.variable(preprocess_image(base_image_path))
style_reference_image = K.variable(preprocess_image(style_reference_image_path))

# Placeholder for generated image
combination_image = K.placeholder((1, resized_width, resized_height, 3))

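# Combine the 3 images into a single Keras tensor
input_tensor = K.concatenate([base_image,
                              style_reference_image,
                              combination_image], axis=0)

# Assumption: the model construction is not shown in this post; here a pre-trained
# VGG19 without its classifier head is built on top of input_tensor.
model = vgg19.VGG19(input_tensor=input_tensor,
                    weights='imagenet', include_top=False)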

Once this is done, we need to define the loss, the gradients and the outputs. The original paper uses the L-BFGS algorithm as the optimizer. One limitation of the implementation used here is that the loss and the gradients have to be passed to it as separate functions. Since computing them independently would be very inefficient, we use an Evaluator class that computes the loss and the gradients in a single pass but returns them separately.

loss = total_loss()

# Get the gradients of the generated image
grads = K.gradients(loss, combination_image)
outputs = [loss]
outputs += grads

f_outputs = K.function([combination_image], outputs)

# Evaluate the loss and the gradients with respect to the generated image. This is called by the Evaluator,
# which exposes the loss and the gradients as two separate functions (as the L-BFGS optimizer expects)
# without paying for two separate evaluations
def eval_loss_and_grads(x):
    x = x.reshape((1, resized_width, resized_height, 3))
    outs = f_outputs([x])
    loss_value = outs[0]
    if len(outs[1:]) == 1:
        grad_values = outs[1].flatten().astype('float64')
    else:
        grad_values = np.array(outs[1:]).flatten().astype('float64')
    return loss_value, grad_values

# Evaluator returns the loss and the gradients through two separate methods, but both values come from
# a single evaluation. This saves computation time, since otherwise each would be calculated separately.
class Evaluator(object):

    def __init__(self):
        self.loss_value = None
        self.grad_values = None

    def loss(self, x):
        assert self.loss_value is None
        loss_value, grad_values = eval_loss_and_grads(x)
        self.loss_value = loss_value
        self.grad_values = grad_values
        return self.loss_value

    def grads(self, x):
        assert self.loss_value is not None
        grad_values = np.copy(self.grad_values)
        self.loss_value = None
        self.grad_values = None
        return grad_values

evaluator = Evaluator()
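
The caching idea behind the Evaluator can be tried out without Keras. The sketch below is pure Python and swaps in plain gradient descent on a one-dimensional quadratic instead of L-BFGS on an image. CachingEvaluator and quadratic are hypothetical names used only for illustration. The call counter shows that each loss/gradient pair costs one joint evaluation rather than two.

```python
class CachingEvaluator:
    """Computes loss and gradient together, but hands them out through two methods."""

    def __init__(self, loss_and_grad):
        self._loss_and_grad = loss_and_grad
        self._grad = None
        self.calls = 0  # counts how often the expensive function actually runs

    def loss(self, x):
        value, self._grad = self._loss_and_grad(x)
        self.calls += 1
        return value

    def grads(self, x):
        grad, self._grad = self._grad, None  # hand out the cached gradient once
        return grad

def quadratic(x):
    """Hypothetical stand-in for eval_loss_and_grads: f(x) = (x - 3)^2 and its derivative."""
    return (x - 3.0) ** 2, 2.0 * (x - 3.0)

demo = CachingEvaluator(quadratic)
x = 0.0
for _ in range(50):
    demo.loss(x)              # one joint evaluation ...
    x -= 0.1 * demo.grads(x)  # ... reused for the gradient step

print(demo.calls)           # 50
print(abs(x - 3.0) < 1e-3)  # True
```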

Final step

The final step is to run the optimizer repeatedly until we reach the loss or the visual result we want. We save the result at every iteration to check that the algorithm behaves as expected. If the result is not satisfactory, we can adjust the weights to improve the generated image.

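from scipy.optimize import fmin_l_bfgs_b

# Assumption: start from the preprocessed content image; the initial value of x
# is not shown in this post.
x = preprocess_image(base_image_path)

# The optimizer is fmin_l_bfgs_b
for i in range(iterations):
    print('Iteration: ', i)
    x, min_val, info = fmin_l_bfgs_b(evaluator.loss,
                                     x.flatten(),
                                     fprime=evaluator.grads,
                                     maxfun=15)

    print('Current loss value:', min_val)

    # Save current generated image
    # (save is assumed to be an image-writing helper imported elsewhere)
    img = deprocess_image(x.copy())
    fname = 'img/new' + str(i) + '.png'
    save(fname, img)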