4rldur0

댑덥딥 Week 4 Summary

This is a study group working through the 'Deep Learning for Everyone Season 2' lectures. https://deeplearningzerotoall.github.io/season2/lec_tensorflow.html


Remote session, 3 May 2023

08-1 Perceptron

Perceptron

- A type of artificial neural network

Artificial neural network? A model that imitates how biological neurons work

- Takes the sum of x·weight plus a bias as its input, and passes it through an activation function (e.g. sigmoid) to produce the output

- Created to solve the AND and OR problems → the XOR problem requires multiple layers / cannot be solved with a linear classifier


- The XOR problem with a single layer (see the sketch below)
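A minimal sketch of the single-layer attempt (my own code, not from the lecture): a single Linear + sigmoid trained with BCE gets stuck near cost ln 2 ≈ 0.693, i.e. it predicts about 0.5 for every input, because no single line can separate the XOR classes.

import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

X = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]]).to(device)
Y = torch.FloatTensor([[0], [1], [1], [0]]).to(device)

# a single perceptron: one linear layer + sigmoid
model = torch.nn.Sequential(torch.nn.Linear(2, 1, bias=True),
                            torch.nn.Sigmoid()).to(device)
criterion = torch.nn.BCELoss().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1)

for step in range(1001):
    optimizer.zero_grad()
    cost = criterion(model(X), Y)
    cost.backward()
    optimizer.step()

print(cost.item())   # stays near 0.693: XOR is not linearly separable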

08-2 Multi Layer Perceptron

Multilayer Perceptron

- Solving the XOR problem (with multiple layers)

- At first there was no way to train such a network → solved by the development of the backpropagation algorithm

# 2 layers
import torch

device = 'cuda' if torch.cuda.is_available() else 'cpu'

X = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]]).to(device)
Y = torch.FloatTensor([[0], [1], [1], [0]]).to(device)
# nn layers
linear1 = torch.nn.Linear(2, 2, bias=True)
linear2 = torch.nn.Linear(2, 1, bias=True)
sigmoid = torch.nn.Sigmoid()
model = torch.nn.Sequential(linear1, sigmoid, linear2, sigmoid).to(device)
# define cost/loss & optimizer
criterion = torch.nn.BCELoss().to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1)
for step in range(10001):
    optimizer.zero_grad()
    hypothesis = model(X)
    # cost/loss function
    cost = criterion(hypothesis, Y)
    cost.backward()
    optimizer.step()
    if step % 100 == 0:
        print(step, cost.item())
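A quick check of the trained model (my own addition):

with torch.no_grad():
    hypothesis = model(X)
    predicted = (hypothesis > 0.5).float()
    accuracy = (predicted == Y).float().mean()
    print('Hypothesis:', hypothesis.cpu().numpy())
    print('Accuracy:', accuracy.item())    # reaches 1.0 for XOR with the 2-layer model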
# 4 layers
X = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]]).to(device)
Y = torch.FloatTensor([[0], [1], [1], [0]]).to(device)
# nn layers
linear1 = torch.nn.Linear(2, 10, bias=True)
linear2 = torch.nn.Linear(10, 10, bias=True)
linear3 = torch.nn.Linear(10, 10, bias=True)
linear4 = torch.nn.Linear(10, 1, bias=True)
sigmoid = torch.nn.Sigmoid()
model = torch.nn.Sequential(linear1, sigmoid, linear2, sigmoid,
                            linear3, sigmoid, linear4, sigmoid).to(device)
...  # loss, optimizer, and training loop are the same as in the 2-layer version

backpropagation

- Compare the network's prediction with the target output → update the weights w
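In symbols, the chain rule that the code below applies by hand (my notation, matching the variable names in the code; σ is the sigmoid, C the binary cross-entropy cost):

$$ l_2 = a_1 w_2 + b_2, \qquad \hat{Y} = \sigma(l_2) $$

$$ \frac{\partial C}{\partial w_2} = \frac{\partial C}{\partial \hat{Y}} \cdot \frac{\partial \hat{Y}}{\partial l_2} \cdot \frac{\partial l_2}{\partial w_2} = a_1^{\top} \left( \frac{\partial C}{\partial \hat{Y}} \, \sigma'(l_2) \right) $$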

learning_rate = 1   # matches the lr=1 used with SGD above
X = torch.FloatTensor([[0, 0], [0, 1], [1, 0], [1, 1]]).to(device)
Y = torch.FloatTensor([[0], [1], [1], [0]]).to(device)
# nn layers
# note: torch.Tensor(size) returns uninitialized memory; a random init such as torch.randn() is more common
w1 = torch.Tensor(2, 2).to(device)
b1 = torch.Tensor(2).to(device)
w2 = torch.Tensor(2, 1).to(device)
b2 = torch.Tensor(1).to(device)

def sigmoid(x):
    # sigmoid function
    return 1.0 / (1.0 + torch.exp(-x))
    # return torch.div(torch.tensor(1), torch.add(torch.tensor(1.0), torch.exp(-x)))

def sigmoid_prime(x):
    # derivative of the sigmoid function
    return sigmoid(x) * (1 - sigmoid(x))

for step in range(10001):
    # forward
    l1 = torch.add(torch.matmul(X, w1), b1)
    a1 = sigmoid(l1)
    l2 = torch.add(torch.matmul(a1, w2), b2)
    Y_pred = sigmoid(l2)
    cost = -torch.mean(Y * torch.log(Y_pred) + (1 - Y) * torch.log(1 - Y_pred))

    # back prop (chain rule)
    # loss derivative
    d_Y_pred = (Y_pred - Y) / (Y_pred * (1.0 - Y_pred) + 1e-7)
    # layer 2
    d_l2 = d_Y_pred * sigmoid_prime(l2)
    d_b2 = d_l2
    d_w2 = torch.matmul(torch.transpose(a1, 0, 1), d_b2)
    # layer 1
    d_a1 = torch.matmul(d_b2, torch.transpose(w2, 0, 1))
    d_l1 = d_a1 * sigmoid_prime(l1)
    d_b1 = d_l1
    d_w1 = torch.matmul(torch.transpose(X, 0, 1), d_b1)

    # weight update (gradient descent)
    w1 = w1 - learning_rate * d_w1
    b1 = b1 - learning_rate * torch.mean(d_b1, 0)
    w2 = w2 - learning_rate * d_w2
    b2 = b2 - learning_rate * torch.mean(d_b2, 0)
    if step % 100 == 0:
        print(step, cost.item())

09-1 ReLU

Problem of sigmoid: trouble arises when computing the gradient (vanishing gradient) - at both ends of the curve the gradient is close to 0

→ ReLU: f(x) = max(0, x)

: outputs the input itself if it is positive, and 0 if it is negative

torch.sigmoid(x)
torch.tanh(x)
torch.relu(x)
torch.nn.functional.leaky_relu(x, 0.01)    # avoids the problem of f(x) = 0 for negative inputs
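A small check of the vanishing-gradient claim (my own sketch): the sigmoid's gradient at a large input is nearly 0, while ReLU's gradient stays 1.

import torch

x = torch.tensor([10.0], requires_grad=True)
torch.sigmoid(x).backward()
print(x.grad)            # ~4.5e-05: the sigmoid gradient has almost vanished

x = torch.tensor([10.0], requires_grad=True)
torch.relu(x).backward()
print(x.grad)            # 1.0: ReLU passes the gradient through for positive inputs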

Optimizer in PyTorch

torch.optim.SGD
torch.optim.Adadelta
torch.optim.Adagrad
torch.optim.Adam
torch.optim.SparseAdam
torch.optim.Adamax
torch.optim.ASGD
torch.optim.LBFGS
torch.optim.RMSprop
torch.optim.Rprop
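Any of these can replace the SGD line in the XOR example above, e.g. (lr=1e-3 is just a common default, not from the lecture):

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)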


09-2 Weight initialization

  1. Initialize with a constant value
  2. RBM/DBN
  3. Xavier/He

RBM/DBN

RBM (Restricted Boltzmann Machine): no connections within a layer; every node is fully connected to the nodes of the neighboring layer

- Realized through pre-training: X→Y, then Y→X′

(a) X1→Y2, Y2→X1′

(b) Fix the weights of the first layer, then repeat for the second layer: X2→Y3, Y3→X2′

(c) Fine-tuning (= backpropagation on the RBM-initialized weights)

- Rarely used these days

Xavier/He

- Initialization is varied according to the layer's characteristics; a simple method of just plugging numbers into a formula

- N_in: number of inputs of the layer, N_out: number of outputs of the layer

Xavier Normal initialization

Xavier Uniform initialization

He initialization: a version of Xavier initialization with N_out dropped

He Normal initialization

He Uniform initialization
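For reference, the standard formulas (with N_in, N_out as above):

$$ W \sim N(0,\ \mathrm{Var}(W)), \qquad \mathrm{Var}(W) = \frac{2}{N_{in} + N_{out}} \quad \text{(Xavier Normal)} $$

$$ W \sim U\left(-\sqrt{\tfrac{6}{N_{in} + N_{out}}},\ +\sqrt{\tfrac{6}{N_{in} + N_{out}}}\right) \quad \text{(Xavier Uniform)} $$

$$ W \sim N(0,\ \mathrm{Var}(W)), \qquad \mathrm{Var}(W) = \frac{2}{N_{in}} \quad \text{(He Normal)} $$

$$ W \sim U\left(-\sqrt{\tfrac{6}{N_{in}}},\ +\sqrt{\tfrac{6}{N_{in}}}\right) \quad \text{(He Uniform)} $$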

# nn layers
linear1 = torch.nn.Linear(784, 256, bias=True)
linear2 = torch.nn.Linear(256, 256, bias=True)
linear3 = torch.nn.Linear(256, 10, bias=True)
relu = torch.nn.ReLU()
# xavier initialization
torch.nn.init.xavier_uniform_(linear1.weight)
torch.nn.init.xavier_uniform_(linear2.weight)
torch.nn.init.xavier_uniform_(linear3.weight)
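He initialization is available from the same torch.nn.init module under the name kaiming (a sketch):

# He (kaiming) initialization, commonly used for layers followed by ReLU
torch.nn.init.kaiming_uniform_(linear1.weight, nonlinearity='relu')
torch.nn.init.kaiming_normal_(linear2.weight, nonlinearity='relu')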

09-3 Dropout

- A method for dealing with overfitting

- In each layer, only a randomly selected subset of nodes is used, according to a probability (≈ ratio) set in advance

- Produces a differently shaped network every time

dropout = torch.nn.Dropout(p=drop_prob)    # drop_prob: the probability of dropping a node

* Dropout is not used at test time.

model.eval()    # evaluation mode: disables dropout
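A minimal sketch of how dropout fits into the earlier model (reusing linear1-linear3 and relu from the 09-2 code; drop_prob = 0.3 is just an example value):

drop_prob = 0.3
dropout = torch.nn.Dropout(p=drop_prob)
model = torch.nn.Sequential(linear1, relu, dropout,
                            linear2, relu, dropout,
                            linear3).to(device)

model.train()   # training mode: dropout active
# ... training loop ...
model.eval()    # evaluation mode: dropout disabled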

Q. backpropagation = chain rule ?


In-person session, 6 May 2023
Q. Why does the loss value come out as nan?
