<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aleksei Gutikov</title>
    <description>The latest articles on DEV Community by Aleksei Gutikov (@agutikov).</description>
    <link>https://dev.to/agutikov</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F299532%2Fdbd09007-92e2-46a6-86da-c025f1b8ef41.jpeg</url>
      <title>DEV Community: Aleksei Gutikov</title>
      <link>https://dev.to/agutikov</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/agutikov"/>
    <language>en</language>
    <item>
      <title>VS Code sporadic freeze</title>
      <dc:creator>Aleksei Gutikov</dc:creator>
      <pubDate>Sat, 01 Mar 2025 23:17:52 +0000</pubDate>
      <link>https://dev.to/agutikov/vs-code-sporadic-freeze-38na</link>
      <guid>https://dev.to/agutikov/vs-code-sporadic-freeze-38na</guid>
      <description>&lt;p&gt;Just want to leave it somewhere, I've spent the whole night troubleshooting this.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;"Enable memory monitoring"&lt;/strong&gt; feature of &lt;strong&gt;Konsole&lt;/strong&gt; actually sets it's own &lt;code&gt;cgroup&lt;/code&gt; &lt;code&gt;memory.high&lt;/code&gt; limit.&lt;br&gt;
It could cause significant performance degradation of heavy processes started in terminal.&lt;br&gt;
In my case - it caused VS Code freezes.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcvfzil58z7q3bgsw46b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgcvfzil58z7q3bgsw46b.png" alt="Image description" width="800" height="678"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, this Konsole setting actually does this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;QFile&lt;/span&gt; &lt;span class="nf"&gt;memHighFile&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;_createdAppCGroupPath&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;QDir&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;separator&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;QStringLiteral&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"memory.high"&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="n"&gt;memHighFile&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;write&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;QStringLiteral&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"%1M"&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;arg&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;newMemHigh&lt;/span&gt;&lt;span class="p"&gt;).&lt;/span&gt;&lt;span class="n"&gt;toLocal8Bit&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you start some fancy modern IDE from terminal, for example like I'm used to:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;git clone git@github.com:xyz/foo_bar.git
&lt;span class="nb"&gt;cd &lt;/span&gt;foo_bar
code ./
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It will run in the same cgroup together with Konsole (and with all other tabs).&lt;/p&gt;

&lt;p&gt;My VS Code with all plugins uses about 1-1.4G RAM while browsing small projects.&lt;br&gt;
So just after 1-2 seconds after start when plugins occupied all available memory UI just freezes. Not completely, but it became very-very sloooooow.&lt;br&gt;
And processes started by plugins (/opt/visual-studio-code/code, cpptools, clang-tidy, gopls, ...) start generating a lot of small disk reads.&lt;/p&gt;

&lt;p&gt;It happens because all those executables and dynamic libraries that VS Code and plugins pull into the memory does not fit the limit and Linux starts swapping pages with the code being executed. &lt;/p&gt;

&lt;h1&gt;
  
  
  Now the story
&lt;/h1&gt;

&lt;p&gt;First I looked at &lt;code&gt;htop&lt;/code&gt; and &lt;code&gt;iotop&lt;/code&gt; - found a bunch of processes, that looked like vscode plugins, trashing my SSD.&lt;br&gt;
I was shocked! Brand new SSD is dying! Or I occasionally installed some malware and now it stealing my (empty :)) bitcoin wallet! Or my OS goes crazy! Bug in Linux kernel! WTF is going on?!&lt;/p&gt;

&lt;p&gt;The most mind-blowing thing I have found after series of experiments was that it worked fine if I start it from Applications menu, or with Alt+F2.&lt;/p&gt;

&lt;p&gt;Just imagine - the same text editor :), when you start it by click on the icon in the Applications menu works fine, and if you start it with command line - it just completely refuses to work.&lt;/p&gt;

&lt;p&gt;First I expected different versions. I'm newbie in Manjaro, so maybe I have installed two different versions somehow. &lt;code&gt;ps&lt;/code&gt;, &lt;code&gt;which&lt;/code&gt;, &lt;code&gt;locate&lt;/code&gt; and &lt;code&gt;pacman&lt;/code&gt; showed it was the same application from a single package.&lt;/p&gt;

&lt;p&gt;So my next guess was env vars. I was looking for some differences in the environment variables.&lt;br&gt;
No results naturally.&lt;/p&gt;

&lt;p&gt;Then tried to disable all plugins.&lt;br&gt;
Of course it started working smoothly :) Of course! So I started enabling and disabling different plugins trying to find the one broken that eats my SSD.&lt;br&gt;
At this point I could get even more confused. Because it obviously affects the behavior, but that is not a root cause.&lt;/p&gt;

&lt;p&gt;By this time I already understood that actually heavy IO load must not affect UI, must not cause freezes.&lt;br&gt;
And also I noticed that terminal application where I start the code from also became irresponsible. So I had to switch between Konsole and Yakuake starting vscode in one window and killing it in another one.&lt;/p&gt;

&lt;p&gt;And after combining those two things:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;lots of small disk reads&lt;/li&gt;
&lt;li&gt;and freeze of the terminal application where I start the IDE&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I finally recalled that yesterday I have, just in case, enabled this "memory monitoring" in Konsole.&lt;/p&gt;

&lt;p&gt;Doublefacepalm.&lt;/p&gt;

&lt;h1&gt;
  
  
  I've learned something today. Again
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;If you do not understand what those settings are really doing - be ready for magic things happening.&lt;/li&gt;
&lt;li&gt;In software sporadic issues are not sporadic. Mostly.&lt;/li&gt;
&lt;li&gt;You can handle any problem if apply enough patience and expertise.
&lt;/li&gt;
&lt;/ol&gt;

</description>
      <category>linux</category>
      <category>konsole</category>
      <category>kde</category>
      <category>cgroup</category>
    </item>
    <item>
      <title>
Who is faster: compare_exchange or fetch_add?</title>
      <dc:creator>Aleksei Gutikov</dc:creator>
      <pubDate>Wed, 15 Apr 2020 23:01:59 +0000</pubDate>
      <link>https://dev.to/agutikov/who-is-faster-compareexchange-or-fetchadd-1pjc</link>
      <guid>https://dev.to/agutikov/who-is-faster-compareexchange-or-fetchadd-1pjc</guid>
      <description>&lt;ul&gt;
&lt;li&gt;Differences and similarity&lt;/li&gt;
&lt;li&gt;
Benchmarks

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;shared_ptr&lt;/code&gt; implementaion&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;spinlock&lt;/code&gt; implementation&lt;/li&gt;
&lt;li&gt;Benchmark for &lt;code&gt;shared_ptr&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Benchmark for &lt;code&gt;spinlock&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;Measurements&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Resulting charts&lt;/li&gt;

&lt;li&gt;Conclusion&lt;/li&gt;

&lt;li&gt;Update&lt;/li&gt;

&lt;/ul&gt;

&lt;p&gt;C++ has out of the box cross-platfrorm atomic operations since C++11.&lt;/p&gt;

&lt;p&gt;There are several types of &lt;a href="https://en.cppreference.com/w/cpp/atomic" rel="noopener noreferrer"&gt;atomic operations&lt;/a&gt;.&lt;br&gt;
Here I want to compare 2 types of them:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;atomic_fetch_add&lt;/code&gt;, &lt;code&gt;atomic_fetch_sub&lt;/code&gt;, etc...&lt;/li&gt;
&lt;li&gt;&lt;code&gt;atomic_compare_exchange&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This kind of atomic ops existed in computer programming long before C++11.&lt;br&gt;
For example &lt;a href="https://en.wikipedia.org/wiki/Compare-and-swap" rel="noopener noreferrer"&gt;Compare-and-swap (CAS)&lt;/a&gt; and &lt;a href="https://en.wikipedia.org/wiki/Fetch-and-add" rel="noopener noreferrer"&gt;Fetch-and-add (FAA)&lt;/a&gt; where implemented as CPU instructions in Intel 80486 - &lt;a href="https://en.wikipedia.org/wiki/X86_instruction_listings#Added_with_80486" rel="noopener noreferrer"&gt;CMPXCHG and XADD&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Here I will not talk about origin of atomic operations and problem they are designed to solve - data races.&lt;/p&gt;

&lt;p&gt;Here I want to concentrate only on next 2 points:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Comparison of semantic and typical use cases of atomic CAS and FAA.&lt;/li&gt;
&lt;li&gt;Comparison of performance of CAS and FAA in typical and abnormal cases.&lt;/li&gt;
&lt;/ol&gt;
&lt;h2&gt;
  
  
  Differences and similarity &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Most understandable description of atomic operation that I know is equivalent code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;atomic_fetch_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;old&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt; &lt;span class="o"&gt;+=&lt;/span&gt; &lt;span class="n"&gt;add&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;old&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="nf"&gt;atomic_compare_exchange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;target&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;desired&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;expected&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;target&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;desired&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Difference in behaviour:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;code&gt;compare_exchange&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;&lt;code&gt;fetch_add&lt;/code&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;can fail&lt;/td&gt;
&lt;td&gt;can't fail - always succeeds&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;leaves memory unchanged if fails&lt;/td&gt;
&lt;td&gt;always changes memory&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;succeeds only within narrow contition - equality of values pointed to by &lt;code&gt;target&lt;/code&gt; and &lt;code&gt;expected&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;rollback of changes requires another memory write operation&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;implies a loop of retries&lt;/td&gt;
&lt;td&gt;no loops&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Both &lt;code&gt;compare_exchange&lt;/code&gt; and &lt;code&gt;fetch_add&lt;/code&gt; are equivalent, in the sense that it is possible to define (implement) one through another. &lt;br&gt;
But differences are huge and sometimes (you'll see) usage of inappropriate atomic operation leads to significant performance impact up to complete inoperability of program.&lt;/p&gt;

&lt;p&gt;Basic use cases:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;&lt;code&gt;compare_exchange&lt;/code&gt;&lt;/th&gt;
&lt;th&gt;&lt;code&gt;fetch_add&lt;/code&gt;&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;program thread waits for some changes made by the other thread&lt;/td&gt;
&lt;td&gt;program thread can continue progress in any case&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;lock, mutual exclusion (mutex)&lt;/td&gt;
&lt;td&gt;atomic counter, refcounter (shared_ptr, ...)&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
&lt;h2&gt;
  
  
  Benchmarks &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Basically, implemented &lt;code&gt;shared_ptr&lt;/code&gt; and &lt;code&gt;spinlock&lt;/code&gt; each with both &lt;code&gt;compare_exchange&lt;/code&gt; and &lt;code&gt;fetch_add&lt;/code&gt;.&lt;br&gt;
And compared their performance under &lt;strong&gt;aggressive&lt;/strong&gt; &lt;strong&gt;concurrent&lt;/strong&gt; read/write access.&lt;br&gt;
Also compared to standard &lt;a href="https://en.cppreference.com/w/cpp/thread/mutex" rel="noopener noreferrer"&gt;&lt;code&gt;std::mutex&lt;/code&gt;&lt;/a&gt; and &lt;a href="https://en.cppreference.com/w/cpp/memory/shared_ptr" rel="noopener noreferrer"&gt;&lt;code&gt;std::shared_ptr&lt;/code&gt;&lt;/a&gt;.&lt;br&gt;
Code: &lt;a href="https://github.com/agutikov/faa_vs_cmpxchg" rel="noopener noreferrer"&gt;https://github.com/agutikov/faa_vs_cmpxchg&lt;/a&gt;&lt;/p&gt;
&lt;h3&gt;
  
  
  &lt;code&gt;shared_ptr&lt;/code&gt; implementaion &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Instead of posting complete code of &lt;code&gt;shared_ptr&lt;/code&gt; here, I provide just difference between two implementations.&lt;br&gt;
Main part is &lt;strong&gt;decrement&lt;/strong&gt; of &lt;strong&gt;reference counter&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Implemented with &lt;code&gt;fetch_sub&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;decref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;atomic&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;*&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;atomic_fetch_sub&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Implemented with &lt;code&gt;compare_exchange&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="nf"&gt;decref&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;atomic&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;*&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;int&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="k"&gt;do&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;atomic_compare_exchange_weak&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;r&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="o"&gt;-&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Benchmark contains both implementations of &lt;code&gt;decref&lt;/code&gt;: with &lt;code&gt;std::atomic_compare_exchange_weak&lt;/code&gt; and &lt;code&gt;std::atomic_compare_exchange_strong&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  &lt;code&gt;spinlock&lt;/code&gt; implementation &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Main part of &lt;code&gt;spinlock&lt;/code&gt; is .&lt;/p&gt;

&lt;p&gt;Implemented with &lt;code&gt;fetch_add&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;spinlock&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;atomic&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;locked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;atomic_fetch_add&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;locked&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;locked&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;locked&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Implemented with &lt;code&gt;compare_exchange&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;spinlock&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;atomic&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;bool&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;locked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="n"&gt;v&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;atomic_compare_exchange_weak&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="p"&gt;(;;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;locked&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;load&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;atomic_compare_exchange_weak&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;v&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nb"&gt;true&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
                        &lt;span class="k"&gt;break&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
                    &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="p"&gt;}&lt;/span&gt;
                &lt;span class="c1"&gt;// benchmarked both variants with "pause" and without&lt;/span&gt;
                &lt;span class="kr"&gt;__asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"pause"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="p"&gt;}&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;locked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;false&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Benchmark contains both implementations of &lt;code&gt;spinlock&lt;/code&gt;: with &lt;code&gt;std::atomic_compare_exchange_weak&lt;/code&gt; and &lt;code&gt;std::atomic_compare_exchange_strong&lt;/code&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  Benchmark for &lt;code&gt;shared_ptr&lt;/code&gt; &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Creates single instance of &lt;code&gt;shred_ptr&amp;lt;int&amp;gt;&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Pass it into variable number of benchmark threads.&lt;/li&gt;
&lt;li&gt;Each thread do copy and move of shared_ptr N times in loop.&lt;/li&gt;
&lt;li&gt;N = Total_N / n_threads. So each run of benchmark with different number of thread do the same total number of lopp iterations.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Workload function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;template&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt; &lt;span class="k"&gt;template&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;typename&lt;/span&gt; &lt;span class="nc"&gt;T&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt; &lt;span class="k"&gt;typename&lt;/span&gt; &lt;span class="nc"&gt;shared_ptr_t&lt;/span&gt; &lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;shared_ptr_benchmark&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;shared_ptr_t&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="kt"&gt;int64_t&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;shared_ptr_t&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="n"&gt;shared_ptr_t&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;p2&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;move&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;
        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt;&lt;span class="n"&gt;p1&lt;/span&gt; &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="n"&gt;p2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="c1"&gt;// try to protect code from been optimized out by compiler&lt;/span&gt;
            &lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;-=&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;p2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt; &lt;span class="k"&gt;else&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"ERROR %p %p %p&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p1&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;p2&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;block&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;How to get callable for benchmark threads:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;shared_ptr_t&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nf"&gt;int&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;));&lt;/span&gt;

&lt;span class="k"&gt;auto&lt;/span&gt; &lt;span class="n"&gt;thread_work&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;shared_ptr_benchmark&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;shared_ptr_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;p&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
    &lt;span class="p"&gt;};&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Benchmark for &lt;code&gt;spinlock&lt;/code&gt; &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;ol&gt;
&lt;li&gt;Create single instance of &lt;code&gt;spinlock&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Pass it into variable number of benchmark threads.&lt;/li&gt;
&lt;li&gt;Each thread do fast modifications of global variables with holding a lock, N times in loop.&lt;/li&gt;
&lt;li&gt;N = Total_N / n_threads. So each run of benchmark with different number of thread do the same total number of loop iterations.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Workload function:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;spinlock_benchmark_data&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;value1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;uint8_t&lt;/span&gt; &lt;span class="n"&gt;padding&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;8192&lt;/span&gt;&lt;span class="p"&gt;];&lt;/span&gt;
    &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;value2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;

&lt;span class="k"&gt;template&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="k"&gt;typename&lt;/span&gt; &lt;span class="nc"&gt;spinlock_t&lt;/span&gt;&lt;span class="p"&gt;&amp;gt;&lt;/span&gt;
&lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;spinlock_benchmark&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;spinlock_t&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;slock&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="kt"&gt;int64_t&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
                        &lt;span class="n"&gt;spinlock_benchmark_data&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;global_data&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;slock&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;v1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;global_data&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;value1&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;global_data&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;value1&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;v2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;global_data&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;value2&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
        &lt;span class="n"&gt;global_data&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;value2&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

        &lt;span class="n"&gt;slock&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;unlock&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;v1&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="n"&gt;fprintf&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;stderr&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="s"&gt;"ERROR %lu != %lu&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;v2&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
            &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;abort&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;

        &lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="o"&gt;--&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;How to get callable for benchmark threads:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="n"&gt;spinlock_t&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;slock&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;spinlock_t&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="n"&gt;spinlock_benchmark_data&lt;/span&gt;&lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;global_data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="n"&gt;spinlock_benchmark_data&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;slock&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;global_data&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;hash&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="kr"&gt;thread&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;hasher&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="kt"&gt;size_t&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;hasher&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;this_thread&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;get_id&lt;/span&gt;&lt;span class="p"&gt;());&lt;/span&gt;
    &lt;span class="n"&gt;spinlock_benchmark&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="n"&gt;spinlock_t&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;slock&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;global_data&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  Measurements &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Measured:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Complete wall-clock time of execution of all benchmark threads with &lt;a href="https://en.cppreference.com/w/cpp/chrono/steady_clock" rel="noopener noreferrer"&gt;&lt;code&gt;std::chrono::steady_clock&lt;/code&gt;&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;CPU time spent, by &lt;a href="https://en.cppreference.com/w/cpp/chrono/c/clock" rel="noopener noreferrer"&gt;&lt;code&gt;std::clock&lt;/code&gt;&lt;/a&gt;.
Then results divided by total number of iterations performed to evaluate approximate read and CPU time spent for each iteration.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Resulting charts &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Benchmarks ran on a laptop with i7-8550U.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;shared_ptr&lt;/code&gt;: average CPU time per loop iteration, nanoseconds, less is better:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fgyijnzw2ithfl5yauhw0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fgyijnzw2ithfl5yauhw0.png" alt="shared_ptr cpu time"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;shared_ptr&lt;/code&gt;: average CPU time per loop iteration, relative to baseline &lt;code&gt;std::shared_ptr&lt;/code&gt;, less is better:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ft8rvrzp1zbhx19ibtpsj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Ft8rvrzp1zbhx19ibtpsj.png" alt="shared_ptr cpu time rel"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;shared_ptr&lt;/code&gt;: average wall-clock time per loop iteration, nanoseconds, less is better:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fklhhsi689onegfb855sp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fklhhsi689onegfb855sp.png" alt="shared_ptr time"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;shared_ptr&lt;/code&gt;: average wall-clock time per loop iteration, relative to baseline &lt;code&gt;std::shared_ptr&lt;/code&gt;, less is better:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F69z54c9ymk9abnpl3pwj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F69z54c9ymk9abnpl3pwj.png" alt="shared_ptr time rel"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;spinlock&lt;/code&gt;: average CPU time per loop iteration, nanoseconds, less is better:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fw53ndeu9sjgkvij6p5tf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fw53ndeu9sjgkvij6p5tf.png" alt="spinlock cpu time"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;spinlock&lt;/code&gt;: average CPU time per loop iteration, relative to baseline &lt;code&gt;std::mutex&lt;/code&gt;, less is better:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F13s9lytdd8mn0pbzcfza.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F13s9lytdd8mn0pbzcfza.png" alt="spinlock cpu time rel"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;spinlock&lt;/code&gt;: average CPU time per loop iteration, without &lt;code&gt;fetch_add&lt;/code&gt;, nanoseconds, less is better:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0fisejgbobxf75ba6umt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2F0fisejgbobxf75ba6umt.png" alt="spinlock cpu time wo faa"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;spinlock&lt;/code&gt;: average CPU time per loop iteration, without &lt;code&gt;fetch_add&lt;/code&gt;, relative to baseline &lt;code&gt;std::mutex&lt;/code&gt;, less is better:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnux61krs6h3zyg671wj0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fnux61krs6h3zyg671wj0.png" alt="spinlock cpu time rel wo faa"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;spinlock&lt;/code&gt;: average wall-clock time per loop iteration, nanoseconds, less is better:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Foq3la2wo3r5oyjwzen5z.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Foq3la2wo3r5oyjwzen5z.png" alt="spinlock time"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;spinlock&lt;/code&gt;: average wall-clock time per loop iteration, relative to baseline &lt;code&gt;std::mutex&lt;/code&gt;, less is better:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fq2onbxuoyltus9fmt6nb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fq2onbxuoyltus9fmt6nb.png" alt="spinlock time rel"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;spinlock&lt;/code&gt;: average wall-clock time per loop iteration, without &lt;code&gt;fetch_add&lt;/code&gt;, nanoseconds, less is better:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fkrexyohwy1tk5tjyqh08.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fkrexyohwy1tk5tjyqh08.png" alt="spinlock time wo faa"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;spinlock&lt;/code&gt;: average wall-clock time per loop iteration, without &lt;code&gt;fetch_add&lt;/code&gt;, relative to baseline &lt;code&gt;std::mutex&lt;/code&gt;, less is better:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fuoigkozdo8t7jvck0hoa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fuoigkozdo8t7jvck0hoa.png" alt="spinlock time rel wo faa"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  Conclusion &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;fetch_sub&lt;/code&gt; is twice faster then &lt;code&gt;compare_exchange&lt;/code&gt; for reference counters.&lt;/li&gt;
&lt;li&gt;But standard &lt;code&gt;std::shared_ptr&lt;/code&gt; is much faster than my implementation, but I still do not understand why.&lt;/li&gt;
&lt;li&gt;Spinlock implemented with &lt;code&gt;fetch_add&lt;/code&gt; cause so many memory modifications that it drastically (x100) increase time required for lock operation. It shows unacceptable performance while used in 4 threads and more.&lt;/li&gt;
&lt;li&gt;Simple spinlock implemented with &lt;code&gt;compare_exchange&lt;/code&gt; can be as fast as &lt;code&gt;std::mutex&lt;/code&gt; and even little-bit faster.&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Update &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Right after publication I realized that &lt;code&gt;atomic_fetch_or&lt;/code&gt; is much more suited for &lt;code&gt;spinlock&lt;/code&gt; implementation. Here it is:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight cpp"&gt;&lt;code&gt;&lt;span class="k"&gt;struct&lt;/span&gt; &lt;span class="nc"&gt;for_spinlock&lt;/span&gt;
&lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;atomic&lt;/span&gt;&lt;span class="o"&gt;&amp;lt;&lt;/span&gt;&lt;span class="kt"&gt;int&lt;/span&gt;&lt;span class="o"&gt;&amp;gt;&lt;/span&gt; &lt;span class="n"&gt;locked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

    &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="n"&gt;lock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;std&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="n"&gt;atomic_fetch_or&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;&amp;amp;&lt;/span&gt;&lt;span class="n"&gt;locked&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;!=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
            &lt;span class="kr"&gt;__asm&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s"&gt;"pause"&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
        &lt;span class="p"&gt;}&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;

    &lt;span class="kt"&gt;void&lt;/span&gt; &lt;span class="nf"&gt;unlock&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;
    &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="n"&gt;locked&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;};&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;And here is benchmark results:&lt;/p&gt;

&lt;p&gt;&lt;code&gt;spinlock&lt;/code&gt;: average CPU time per loop iteration, nanoseconds, less is better:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsjgpt84ayk0k0tlxldjr.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fsjgpt84ayk0k0tlxldjr.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;spinlock&lt;/code&gt;: average CPU time per loop iteration, relative to baseline &lt;code&gt;std::mutex&lt;/code&gt;, less is better:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fo2tv2ko40n5znlprcuh6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fo2tv2ko40n5znlprcuh6.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;spinlock&lt;/code&gt;: average wall-clock time per loop iteration, nanoseconds, less is better:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fu0ekb9rxutabb4r9f2f1.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fu0ekb9rxutabb4r9f2f1.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;code&gt;spinlock&lt;/code&gt;: average wall-clock time per loop iteration, relative to baseline &lt;code&gt;std::mutex&lt;/code&gt;, less is better:&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fs979bed732y6f1jf0p22.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fi%2Fs979bed732y6f1jf0p22.png" alt="Alt Text"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;So, better implementation of &lt;code&gt;spinlock&lt;/code&gt; with &lt;code&gt;atomic_fetch_or&lt;/code&gt; is only 2 times slower than &lt;code&gt;compare_exchange&lt;/code&gt; one.&lt;/p&gt;

</description>
      <category>cpp</category>
      <category>benchmark</category>
    </item>
  </channel>
</rss>
