SGD in PyTorch


*Memos:

  • My post explains CGD (Classic Gradient Descent), Momentum and Nesterov's Momentum.
  • My post explains Module().

SGD() can do basic gradient descent, with or without Momentum or Nesterov's Momentum, as shown below. *Despite its name, SGD() in PyTorch implements Classic (Basic) Gradient Descent (CGD), not Stochastic Gradient Descent (SGD); the stochasticity comes from how you feed mini-batches to the model, not from the optimizer itself:

*Memos:

  • The 1st argument for initialization is params (Required-Type:generator).
  • The 2nd argument for initialization is lr (Optional-Default:0.001-Type:int or float). *It must be 0 <= x.
  • The 3rd argument for initialization is momentum (Optional-Default:0-Type:int or float). *It must be 0 <= x.
  • The 4th argument for initialization is dampening (Optional-Default:0-Type:int or float).
  • The 5th argument for initialization is weight_decay (Optional-Default:0-Type:int or float). *It must be 0 <= x.
  • The 6th argument for initialization is nesterov (Optional-Default:False-Type:bool). *If it's True, Nesterov's Momentum is used; if it's False, Momentum is used (when momentum is non-zero). nesterov=True requires a non-zero momentum and zero dampening, as shown in the sketch after the code examples below.
  • There is a maximize argument for initialization (Optional-Default:False-Type:bool). *It must be passed as a keyword argument (maximize=).
  • There is a foreach argument for initialization (Optional-Default:None-Type:bool). *It must be passed as a keyword argument (foreach=).
  • There is a differentiable argument for initialization (Optional-Default:False-Type:bool). *It must be passed as a keyword argument (differentiable=).
  • There is a fused argument for initialization (Optional-Default:None-Type:bool). *It must be passed as a keyword argument (fused=).
  • foreach and fused cannot both be True.
  • differentiable and fused cannot both be True.
  • step() can update parameters.
  • zero_grad() can reset gradients.

from torch import nn
from torch import optim

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear_layer = nn.Linear(in_features=4, out_features=5)

    def forward(self, x):
        return self.linear_layer(x)

mymodel = MyModel()

sgd = optim.SGD(params=mymodel.parameters())
sgd
# SGD (
# Parameter Group 0
#     dampening: 0
#     differentiable: False
#     foreach: None
#     fused: None
#     lr: 0.001
#     maximize: False
#     momentum: 0
#     nesterov: False
#     weight_decay: 0
# )

sgd.state_dict()
# {'state': {},
#  'param_groups': [{'lr': 0.001,
#    'momentum': 0,
#    'dampening': 0,
#    'weight_decay': 0,
#    'nesterov': False,
#    'maximize': False,
#    'foreach': None,
#    'differentiable': False,
#    'fused': None,
#    'params': [0, 1]}]}
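
*'params': [0, 1] lists the indices of mymodel's two parameters (the weight and bias of linear_layer).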

sgd.step() # Updates the parameters. *It's a no-op here because no gradients have been computed yet.
sgd.zero_grad() # Resets the gradients.
# None
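
Below is a minimal training-loop sketch showing how step() and zero_grad() are typically used together. *The input x and target y are made-up tensors just for illustration:

import torch

x = torch.randn(3, 4) # A made-up batch: 3 samples with 4 features (matches in_features=4).
y = torch.randn(3, 5) # A made-up target: 3 samples with 5 values (matches out_features=5).

loss_fn = nn.MSELoss()

for _ in range(10):
    sgd.zero_grad()               # Reset the gradients accumulated in the previous step.
    loss = loss_fn(mymodel(x), y) # Forward pass and loss computation.
    loss.backward()               # Compute the gradients of the loss w.r.t. the parameters.
    sgd.step()                    # Update the parameters using the gradients.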

sgd = optim.SGD(params=mymodel.parameters(), lr=0.001, momentum=0,
                dampening=0, weight_decay=0, nesterov=False, maximize=False,
                foreach=None, differentiable=False, fused=None)
sgd
# SGD (
# Parameter Group 0
#     dampening: 0
#     differentiable: False
#     foreach: None
#     fused: None
#     lr: 0.001
#     maximize: False
#     momentum: 0
#     nesterov: False
#     weight_decay: 0
# )
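
Finally, a minimal sketch of enabling Momentum and Nesterov's Momentum, as referenced in the memos above. *momentum=0.9 is just a common illustrative value, not a recommendation:

# Momentum:
sgd = optim.SGD(params=mymodel.parameters(), lr=0.001, momentum=0.9)

# Nesterov's Momentum. *nesterov=True requires a non-zero momentum and
# dampening=0, otherwise SGD() raises a ValueError:
sgd = optim.SGD(params=mymodel.parameters(), lr=0.001, momentum=0.9,
                nesterov=True)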
