Super Kai (Kazuya Ito)

LayerNorm in PyTorch


LayerNorm() can get the 1D or more D tensor of zero or more elements computed by Layer Normalization from the 1D or more D tensor of zero or more elements as shown below:
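
*For reference, Layer Normalization computes y = (x - E[x]) / sqrt(Var[x] + eps) * γ + β over the last dimension(s) given by normalized_shape, where E[x] and Var[x] are the mean and the biased variance (this is the formula from the PyTorch documentation for LayerNorm()).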

*Memos:

  • The 1st argument for initialization is normalized_shape (Required-Type: int, tuple or list of int, or torch.Size). *It must be 0 <= x.
  • The 2nd argument for initialization is eps (Optional-Default: 1e-05-Type: float).
  • The 3rd argument for initialization is elementwise_affine (Optional-Default: True-Type: bool).
  • The 4th argument for initialization is bias (Optional-Default: True-Type: bool). *My post explains bias argument.
  • The 5th argument for initialization is device (Optional-Default: None-Type: str, int or device()).
  • The 6th argument for initialization is dtype (Optional-Default: None-Type: dtype).
  • The 1st argument is input (Required-Type: tensor of float). *Memos:
    • It must be the 1D or more D tensor of zero or more elements.
    • The size of the deepest dimension(s) must be the same as normalized_shape.
    • Its device and dtype must be the same as LayerNorm()'s.
    • The output tensor's requires_grad is True even though the input tensor's requires_grad is False by default, because LayerNorm()'s weight and bias are parameters with requires_grad=True.
  • layernorm1.device and layernorm1.dtype don't work because LayerNorm() itself has no device or dtype attributes. *The sketch right after this list shows how to check them through the parameters instead.
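
The sketch below shows how to check LayerNorm()'s device and dtype through its parameters, and that elementwise_affine=False removes the learnable parameters. *This is plain PyTorch attribute access, assuming the default CPU and float32 setup:

import torch
from torch import nn

layernorm1 = nn.LayerNorm(normalized_shape=6)

layernorm1.weight.device # The parameters carry the module's device.
# device(type='cpu')

layernorm1.weight.dtype # The parameters carry the module's dtype.
# torch.float32

layernorm2 = nn.LayerNorm(normalized_shape=6, elementwise_affine=False)
print(layernorm2.weight, layernorm2.bias) # No learnable weight and bias.
# None None
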
import torch
from torch import nn

tensor1 = torch.tensor([8., -3., 0., 1., 5., -2.])

tensor1.requires_grad
# False

layernorm1 = nn.LayerNorm(normalized_shape=6)
tensor2 = layernorm1(input=tensor1)
tensor2
# tensor([1.6830, -1.1651, -0.3884, -0.1295, 0.9062, -0.9062],
#        grad_fn=<NativeLayerNormBackward0>)

tensor2.requires_grad
# True

layernorm1
# LayerNorm((6,), eps=1e-05, elementwise_affine=True)

layernorm1.normalized_shape
# (6,)

layernorm1.eps
# 1e-05

layernorm1.elementwise_affine
# True

layernorm1.bias
# Parameter containing:
# tensor([0., 0., 0., 0., 0., 0.], requires_grad=True)

layernorm1.weight
# Parameter containing:
# tensor([1., 1., 1., 1., 1., 1.], requires_grad=True)

layernorm2 = nn.LayerNorm(normalized_shape=6)
layernorm2(input=tensor2)
# tensor([1.6830, -1.1651, -0.3884, -0.1295, 0.9062, -0.9062],
#        grad_fn=<NativeLayerNormBackward0>)

layernorm = nn.LayerNorm(normalized_shape=6, eps=1e-05, 
                         elementwise_affine=True, bias=True,
                         device=None, dtype=None)
layernorm(input=tensor1)
# tensor([1.6830, -1.1651, -0.3884, -0.1295, 0.9062, -0.9062],
#        grad_fn=<NativeLayerNormBackward0>)

my_tensor = torch.tensor([[8., -3., 0.],
                          [1., 5., -2.]])
layernorm = nn.LayerNorm(normalized_shape=3)
layernorm(input=my_tensor)
# tensor([[1.3641, -1.0051, -0.3590],
#         [-0.1162, 1.2787, -1.1625]],
#        grad_fn=<NativeLayerNormBackward0>)

layernorm = nn.LayerNorm(normalized_shape=(2, 3))
layernorm(input=my_tensor)
# tensor([[1.6830, -1.1651, -0.3884],
#         [-0.1295, 0.9062, -0.9062]],
#        grad_fn=<NativeLayerNormBackward0>)

layernorm = nn.LayerNorm(normalized_shape=my_tensor.size())
layernorm(input=my_tensor)
# tensor([[1.6830, -1.1651, -0.3884],
#         [-0.1295, 0.9062, -0.9062]],
#        grad_fn=<NativeLayerNormBackward0>)

my_tensor = torch.tensor([[8.], [-3.], [0.],
                          [1.], [5.], [-2.]])
layernorm = nn.LayerNorm(normalized_shape=1)
layernorm(input=my_tensor)
# tensor([[0.], [0.], [0.], [0.], [0.], [0.]],
#        grad_fn=<NativeLayerNormBackward0>)

layernorm = nn.LayerNorm(normalized_shape=(6, 1))
layernorm(input=my_tensor)
# tensor([[1.6830], [-1.1651], [-0.3884], [-0.1295], [0.9062], [-0.9062]],
#        grad_fn=<NativeLayerNormBackward0>)

layernorm = nn.LayerNorm(normalized_shape=my_tensor.size())
layernorm(input=my_tensor)
# tensor([[1.6830], [-1.1651], [-0.3884], [-0.1295], [0.9062], [-0.9062]],
#        grad_fn=<NativeLayerNormBackward0>)

my_tensor = torch.tensor([[[8., -3., 0.],
                           [1., 5., -2.]]])
layernorm = nn.LayerNorm(normalized_shape=3)
layernorm(input=my_tensor)
# tensor([[[1.3641, -1.0051, -0.3590],
#          [-0.1162, 1.2787, -1.1625]]],
#        grad_fn=<NativeLayerNormBackward0>)
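
The numbers above can be reproduced manually with the formula. *The sketch below is mine, using the biased variance (unbiased=False) and the default eps=1e-05, and skipping weight and bias since they are 1s and 0s by default:

import torch

tensor1 = torch.tensor([8., -3., 0., 1., 5., -2.])

mean = tensor1.mean() # The mean over the normalized dimension.
var = tensor1.var(unbiased=False) # The biased variance, which LayerNorm() uses.
eps = 1e-05

(tensor1 - mean) / torch.sqrt(var + eps)
# tensor([ 1.6830, -1.1651, -0.3884, -0.1295,  0.9062, -0.9062])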

