loading...

Verification of Memory Reordering Using Nim language

xflywind profile image flywind ・5 min read

Verify「CPU memory reordering」

Question

According to Memory Reordering Caught in the Act open_in_new, Intel CPU may sometimes reorder memory instructions. Memory reordering has a great effect on lock-free programming. The goal of this article is to verify memory reordering.

Given two integers X and Y, somewhere in memory, they are initialized to 0. Now we have two processors which run parallelly and execute machine codes.

Intel CPU may execute "Memory instruction reordering" and the actual execution order has become:

There are four situations: r1 = 1, r2 = 0 or r1 = 0, r2 = 1 or r1 = 1, r2 = 1 or r1 = 0, r2 = 0. Among them, r1 = 0, r2 = 0 is caused by Intel CPU "Memory instruction reordering" which is also the goal of our verification.

CPU Info

Now list my CPU Info:

Name Processors Cores Cache
Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz 4 2 3 MB Intel®
Smart Cache
$ cat /proc/cpuinfo       # Watch cpu info
$ lscpu                   # View summary

Deduction

To verify this process, we use Nim language. Providing that we have four variables, run two worker threads in multi-core processor and disable compiler's Memory reordering optimization to watch the process of cpu's Memory reordering optimization.

First we initialize x and y to zero. Then we run x = 1; r1 = y; in worker thread 1 and run y = 1; r2 = x; in worker thread 2. Finally we print the number of which r1 = 0, r2 = 0 occurs (Number of occurrences, number of iterations).

Thread 1:

x = 1
r1 = y

Thread 2:

y = 1
r2 = x

Results:

r1 == 0 and r2 == 0

Programming

First, choose a "wake <-> wait" mechanism for the threads to keep the worker threads in order. For simplicity, we manually write a simple semaphore implementation.

This semaphore implementation uses Linux's eventfd and eventfd is an efficient thread communication mechanism which is very suitable as a semaphore implementation(reference eventfd(2) — Linux manual page).

import std/posix
import std/os

# Provide semaphore-like semantics for reads from the new file descriptor.
const EFD_SEMAPHORE = 1 

proc eventfd(initval: cuint, flags: cint): cint {.
  importc: "eventfd", 
  header: "sys/eventfd.h"
.}

type
  Semaphore = object
    fd: cint

template checkOSError(syscall: untyped) =
  let n = syscall
  if n < 0:
    raiseOSError(osLastError()) 

proc initSemaphore(): Semaphore =
  # create eventfd in semaphore mode
  result.fd = eventfd(0, EFD_SEMAPHORE)

proc wait(s: var Semaphore) =
  var buf: uint64
  checkOSError s.fd.read(buf.addr, 8) 

proc signal(s: var Semaphore) = 
  var buf: uint64 = 1
  checkOSError s.fd.write(buf.addr, 8)

Implement thread functions:

import std/random

var 
  beginSem1 = initSemaphore()   # for thread 1 synchronization
  endSem1 = initSemaphore()     # for thread 1 synchronization

  beginSem2 = initSemaphore()   # for thread 2 synchronization
  endSem2 = initSemaphore()     # for thread 2 synchronization

  x: int
  y: int

  r1: int
  r2: int

const Format{.strdefine.} = "logging"  # Specify the format of the console print log

proc threadFunc1() {.thread.} =
  randomize()                          # Initialize "Random Number Generator"
  while true:
    beginSem1.wait()                   # Wait for the main thread to initiate a signal
    while rand(100) mod 8 == 0: break  # Calculate the random number
                                       # deliberately make the two threads run staggered

    # ----------------------------- transaction! -------------------------------
    x = 1
    {.emit: """asm volatile("" ::: "memory");""".} # Disable compiler reordering
    r1 = y
    # --------------------------------------------------------------------------

    endSem1.signal()                   # Notify the main thread that an iteration is complete

proc threadFunc2() {.thread.} =
  randomize()                          # Initialize "Random Number Generator"
  while true:
    beginSem2.wait()                   # Wait for the main thread to initiate a signal
    while rand(100) mod 8 == 0: break  # Calculate the random number
                                       # deliberately make the two threads run staggered

    # ----------------------------- transaction! -------------------------------
    y = 1
    {.emit: """asm volatile("" ::: "memory");""".} # Disable compiler reordering
    r2 = x
    # --------------------------------------------------------------------------

    endSem2.signal()                   # Notify the main thread that an iteration is complete

proc main() =
  var
    thread1: Thread[void]
    thread2: Thread[void]

  createThread(thread1, threadFunc1)
  createThread(thread2, threadFunc2)

  # Repeat experiment
  var n = 0                            # Number of reorders
  var counter = 0                      # The number of worker thread iterations
  while true:
    x = 0
    y = 0

    beginSem1.signal()
    beginSem2.signal()
    endSem1.wait()
    endSem2.wait()

    counter.inc()
    if r1 == 0 and r2 == 0:            # CPU memory reordering occurs
      n.inc()
      when Format == "csv":            # Output csv format: -d:Format=csv
        echo $n, ", ", $counter
      else:                            # Output log format
        echo "Reorders detected (n=" & $n & ", counter=" & $counter & ")"

  joinThread(thread1)
  joinThread(thread2)

main()

Experimental results

Open the console, run the program, and output the log:

$ nim c -r -d:release reordering.nim

Reorders detected (n=1, counter=1)
Reorders detected (n=2, counter=509)
Reorders detected (n=3, counter=512)
Reorders detected (n=4, counter=21185)
Reorders detected (n=5, counter=21400)
Reorders detected (n=6, counter=21486)
Reorders detected (n=7, counter=21490)
Reorders detected (n=8, counter=21553)
Reorders detected (n=9, counter=31530)
Reorders detected (n=10, counter=31612)
Reorders detected (n=11, counter=32948)
Reorders detected (n=12, counter=36575)
Reorders detected (n=13, counter=39927)
Reorders detected (n=14, counter=59147)
Reorders detected (n=15, counter=59165)
Reorders detected (n=16, counter=59168)
Reorders detected (n=17, counter=60922)
Reorders detected (n=18, counter=62117)
Reorders detected (n=19, counter=62121)
Reorders detected (n=20, counter=62130)
Reorders detected (n=21, counter=62407)
Reorders detected (n=22, counter=65811)
Reorders detected (n=23, counter=65829)
Reorders detected (n=24, counter=69376)
Reorders detected (n=25, counter=72983)
Reorders detected (n=26, counter=76600)
Reorders detected (n=27, counter=78803)
Reorders detected (n=28, counter=79768)
Reorders detected (n=29, counter=83267)
Reorders detected (n=30, counter=83304)
...

Further, let's make a csv file and a chart to view the trend of CPU memory reordering.

We have defined the Format symbol in the program to help control the format of the printed log. -d:Format=cs prints csv data format (number of reordering occurrences\t worker thread iterations).

Write a python script (reordering.py), read the resulting csv file (reordering.csv) and generate a graph (reordering.png).

# reordering.py

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import csv
import seaborn as sns

sns.set(style="darkgrid")

xdata = [] # Number of worker thread iterations
ydata = [] # Occurrence of reorders

with open("reordering.csv") as f:
  fcsv = csv.reader(f)
  headers = next(fcsv)
  for row in fcsv:
    xdata.append(float(row[1]))
    ydata.append(float(row[0]))

df = pd.DataFrame(dict(counter=xdata, n=ydata))

g = sns.relplot(
  x="counter", 
  y="n", 
  height=4, 
  linewidth=2, 
  aspect=1.3, 
  kind="line", 
  data=df
)
g.fig.autofmt_xdate() 
plt.savefig('reordering.png')

Run our program:

$ nim c -r -d:release reordering.nim > reordering.csv
$ python3 reordering.py

Generate csv file (reordering.csv):

1, 1524
2, 6342
3, 6396
4, 6404
5, 6612
6, 6665
7, 6706
8, 6787
9, 6789
10, 6793
11, 6821
12, 6833
13, 6845
14, 6860
15, 6869
16, 6872
...
991, 4272196
992, 4272227
993, 4272232
994, 4272236
995, 4272242
996, 4272277
997, 4272280
998, 4272293
999, 4272305
1000, 4275173
...

Generate a graph (reordering.png) --- Basically present a linear trend:

Alt Text

Discussion

markdown guide