<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Andrei Kvapil</title>
    <description>The latest articles on DEV Community by Andrei Kvapil (@kvaps).</description>
    <link>https://dev.to/kvaps</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F336431%2F5b7813f5-5ad3-4b0d-b897-65fa3876e49c.jpeg</url>
      <title>DEV Community: Andrei Kvapil</title>
      <link>https://dev.to/kvaps</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/kvaps"/>
    <language>en</language>
    <item>
      <title>How to resolve split-brain in DRBD9</title>
      <dc:creator>Andrei Kvapil</dc:creator>
      <pubDate>Mon, 19 Jul 2021 16:06:12 +0000</pubDate>
      <link>https://dev.to/kvaps/how-to-solve-split-brain-in-drbd9-15no</link>
      <guid>https://dev.to/kvaps/how-to-solve-split-brain-in-drbd9-15no</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--p2uxD_PC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kzyx2gkmcim5hgjk5dft.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--p2uxD_PC--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/kzyx2gkmcim5hgjk5dft.png" alt="split-brain"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;First, let's define what split-brain is. Each replica can be either connected to or disconnected from its peers. If a replica spontaneously goes into the &lt;code&gt;StandAlone&lt;/code&gt; state, it means that it refuses to accept the peer's state and will not synchronize with it. This is a classic split-brain situation.&lt;/p&gt;

&lt;p&gt;Resolving a split-brain between two replicas is done in the same way as for multiple replicas.&lt;/p&gt;

&lt;p&gt;First, let's decide which replica we want to keep the data from. To do this, look at the output of:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;drbdadm status &amp;lt;res&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
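
&lt;p&gt;In a split-brain situation the output looks roughly like this (an illustrative sketch; the resource and node names here are hypothetical):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;myres role:Secondary
  disk:Outdated
  nodeA connection:StandAlone
  nodeB role:Primary
    peer-disk:Diskless
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;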



&lt;p&gt;If the replica we need is in the &lt;code&gt;StandAlone&lt;/code&gt; or &lt;code&gt;Outdated&lt;/code&gt;/&lt;code&gt;Inconsistent&lt;/code&gt; state, it must first be switched to &lt;code&gt;UpToDate&lt;/code&gt;. To achieve this, go to the node with this replica and execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;drbdadm primary --force &amp;lt;res&amp;gt;
drbdadm secondary &amp;lt;res&amp;gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To make the rest of the replicas forget their state and synchronize the data from the &lt;code&gt;UpToDate&lt;/code&gt; replica, go to their nodes and execute:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;drbdadm disconnect &amp;lt;res&amp;gt;
drbdadm connect &amp;lt;res&amp;gt; --discard-my-data
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is worth mentioning that in the latest versions of LINSTOR, the auto-tiebreaker feature is enabled by default. This means that when you create a resource with two replicas, LINSTOR automatically adds a third diskless replica, which acts as an arbiter to ensure quorum. As a result, split-brain situations have to be resolved by hand less and less often these days.&lt;/p&gt;
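
&lt;p&gt;This behavior is controlled by a controller property (a sketch, assuming the &lt;code&gt;DrbdOptions/auto-add-quorum-tiebreaker&lt;/code&gt; property name from recent LINSTOR releases; check the documentation for your version):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# enabled by default in recent LINSTOR; set to false if you want to opt out
linstor controller set-property DrbdOptions/auto-add-quorum-tiebreaker false
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;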

</description>
      <category>linux</category>
      <category>storage</category>
      <category>drbd</category>
      <category>troubleshooting</category>
    </item>
    <item>
      <title>Troubleshooting DRBD9 in LINSTOR</title>
      <dc:creator>Andrei Kvapil</dc:creator>
      <pubDate>Mon, 19 Jul 2021 15:31:24 +0000</pubDate>
      <link>https://dev.to/kvaps/troubleshooting-drbd9-in-linstor-40fn</link>
      <guid>https://dev.to/kvaps/troubleshooting-drbd9-in-linstor-40fn</guid>
      <description>&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--adsGY0kU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://habrastorage.org/webt/ft/tb/2v/fttb2vkaex5-wur6zygsbkkwj2k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--adsGY0kU--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://habrastorage.org/webt/ft/tb/2v/fttb2vkaex5-wur6zygsbkkwj2k.png" alt=""&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Over the past few years of working closely with LINSTOR and DRBD9, I have accumulated a number of problems and solutions for them. I decided to collect them all into a single article. You may not face exactly the same problems, but it should at least help you understand the mechanics of managing and troubleshooting DRBD9 devices.&lt;/p&gt;

&lt;p&gt;There is not much information on this matter on the Internet. I hope you'll find this useful if you use or plan to use LINSTOR.&lt;/p&gt;



&lt;h2&gt;
  
  
  Case 1: Unknown and DELETING resources
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# linstor r l -r one-vm-10417-disk-0
╭────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName        ┊ Node   ┊ Port  ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-10417-disk-0 ┊ m14c18 ┊ 56263 ┊        ┊       ┊  Unknown ┊ 2021-07-09 14:20:31 ┊
┊ one-vm-10417-disk-0 ┊ m15c38 ┊ 56263 ┊ Unused ┊ Ok    ┊ Diskless ┊ 2021-04-08 07:46:43 ┊
┊ one-vm-10417-disk-0 ┊ m8c12  ┊ 56263 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2020-10-14 13:10:42 ┊
╰────────────────────────────────────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Usually this is nothing critical: the node hosting the resource is most likely &lt;code&gt;OFFLINE&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# linstor n l -n m14c18
╭─────────────────────────────────────────────────────────╮
┊ Node   ┊ NodeType  ┊ Addresses                ┊ State   ┊
╞═════════════════════════════════════════════════════════╡
┊ m14c18 ┊ SATELLITE ┊ 10.36.130.153:3367 (SSL) ┊ OFFLINE ┊
╰─────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check whether the linstor-satellite service is running on the node and whether it is reachable from the linstor-controller.&lt;/p&gt;
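
&lt;p&gt;For example (a sketch, assuming a systemd-managed satellite and the port shown in the listing above; adjust to your environment):&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# on the satellite node
systemctl status linstor-satellite

# from the controller node: check that the satellite port is reachable
nc -zv 10.36.130.153 3367
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;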

&lt;p&gt;If at least one replica of a resource is in the &lt;code&gt;Unknown&lt;/code&gt; state, then removing any of its other replicas will get stuck in &lt;code&gt;DELETING&lt;/code&gt;. In recent versions of LINSTOR, such a deleting replica can be brought back to life by executing &lt;code&gt;resource create&lt;/code&gt;, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# linstor r l -r one-vm-10417-disk-0
╭────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName        ┊ Node   ┊ Port  ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-10417-disk-0 ┊ m14c18 ┊ 56263 ┊        ┊       ┊  Unknown ┊ 2021-07-09 14:20:31 ┊
┊ one-vm-10417-disk-0 ┊ m15c38 ┊ 56263 ┊        ┊ Ok    ┊ DELETING ┊ 2021-04-08 07:46:43 ┊
┊ one-vm-10417-disk-0 ┊ m16c2  ┊ 56263 ┊        ┊ Ok    ┊ DELETING ┊ 2021-05-01 03:36:21 ┊
┊ one-vm-10417-disk-0 ┊ m8c12  ┊ 56263 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2020-10-14 13:10:42 ┊
╰────────────────────────────────────────────────────────────────────────────────────────╯

# linstor r c m15c38 one-vm-10417-disk-0 --diskless
╭────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName        ┊ Node   ┊ Port  ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-10417-disk-0 ┊ m14c18 ┊ 56263 ┊        ┊       ┊  Unknown ┊ 2021-07-09 14:20:31 ┊
┊ one-vm-10417-disk-0 ┊ m15c38 ┊ 56263 ┊ Unused ┊ Ok    ┊ Diskless ┊ 2021-04-08 07:46:43 ┊
┊ one-vm-10417-disk-0 ┊ m16c2  ┊ 56263 ┊        ┊ Ok    ┊ DELETING ┊ 2021-05-01 03:36:21 ┊
┊ one-vm-10417-disk-0 ┊ m8c12  ┊ 56263 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2020-10-14 13:10:42 ┊
╰────────────────────────────────────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In any case, if your node has died completely, the only way to get rid of an &lt;code&gt;Unknown&lt;/code&gt; resource is to perform &lt;code&gt;node lost&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# linstor node lost m14c18
# linstor r l -r one-vm-10417-disk-0
╭────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName        ┊ Node   ┊ Port  ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-10417-disk-0 ┊ m15c38 ┊ 56263 ┊ Unused ┊ Ok    ┊ Diskless ┊ 2021-04-08 07:46:43 ┊
┊ one-vm-10417-disk-0 ┊ m8c12  ┊ 56263 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2020-10-14 13:10:42 ┊
╰────────────────────────────────────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As we can see, the remaining &lt;code&gt;DELETING&lt;/code&gt; resources have also disappeared. This behavior is related to the DRBD logic: if there is a chance that the resource still exists somewhere, it might return to the cluster and conflict with the other members. To avoid this, &lt;code&gt;Unknown&lt;/code&gt; resources on a faulty node can only be deleted by removing the entire faulty node.&lt;/p&gt;

&lt;h2&gt;
  
  
  Case 2: Outdated replica
&lt;/h2&gt;

&lt;p&gt;We have found a replica which is &lt;code&gt;Outdated&lt;/code&gt; for some reason:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# linstor r l -r one-vm-5899-disk-0
╭──────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName       ┊ Node   ┊ Port ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞══════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-5899-disk-0 ┊ m11c30 ┊ 8306 ┊ Unused ┊ Ok    ┊ UpToDate ┊ 2021-02-03 09:43:02 ┊
┊ one-vm-5899-disk-0 ┊ m13c25 ┊ 8306 ┊ Unused ┊ Ok    ┊ Outdated ┊ 2021-02-02 17:51:26 ┊
┊ one-vm-5899-disk-0 ┊ m15c25 ┊ 8306 ┊ InUse  ┊ Ok    ┊ Diskless ┊ 2021-01-18 15:51:40 ┊
╰──────────────────────────────────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It can be fixed quite simply:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@m13c25:~# drbdadm disconnect one-vm-5899-disk-0
root@m13c25:~# drbdadm connect --discard-my-data one-vm-5899-disk-0
root@m13c25:~# drbdadm status one-vm-5899-disk-0
one-vm-5899-disk-0 role:Secondary
  disk:UpToDate
  m11c30 role:Secondary
    peer-disk:UpToDate
  m15c25 role:Primary
    peer-disk:Diskless
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; the &lt;code&gt;--discard-my-data&lt;/code&gt; option is only relevant in split-brain situations; in all other cases specifying it has no effect.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Case 3: Inconsistent replica
&lt;/h2&gt;

&lt;p&gt;Here we have a resource where one of the replicas became &lt;code&gt;Inconsistent&lt;/code&gt; for some reason:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;linstor r l -r one-vm-6372-disk-0
╭──────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName       ┊ Node   ┊ Port ┊ Usage  ┊ Conns ┊        State ┊ CreatedOn           ┊
╞══════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-6372-disk-0 ┊ m10c17 ┊ 8262 ┊ Unused ┊ Ok    ┊     UpToDate ┊ 2021-02-03 09:43:31 ┊
┊ one-vm-6372-disk-0 ┊ m13c35 ┊ 8262 ┊ Unused ┊ Ok    ┊ Inconsistent ┊                     ┊
┊ one-vm-6372-disk-0 ┊ m8c10  ┊ 8262 ┊ InUse  ┊ Ok    ┊     Diskless ┊ 2021-01-05 20:22:14 ┊
╰──────────────────────────────────────────────────────────────────────────────────────────╯

linstor v l -r one-vm-6372-disk-0
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node   ┊ Resource           ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse  ┊        State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ m10c17 ┊ one-vm-6372-disk-0 ┊ thindata             ┊     0 ┊    2261 ┊ /dev/drbd2261 ┊ 19.38 GiB ┊ Unused ┊     UpToDate ┊
┊ m13c35 ┊ one-vm-6372-disk-0 ┊ thindata             ┊     0 ┊    2261 ┊ /dev/drbd2261 ┊ 20.01 GiB ┊ Unused ┊ Inconsistent ┊
┊ m8c10  ┊ one-vm-6372-disk-0 ┊ DfltDisklessStorPool ┊     0 ┊    2261 ┊ /dev/drbd2261 ┊           ┊ InUse  ┊     Diskless ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;After logging in to the node, we find that it is stuck syncing:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@m13c35:~# drbdadm status one-vm-6372-disk-0
one-vm-6372-disk-0 role:Secondary
  disk:Inconsistent
  m10c17 role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:51.52
  m8c10 role:Primary
    peer-disk:Diskless
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's try to reconnect it to the second diskful replica:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@m13c35:~# drbdadm disconnect one-vm-6372-disk-0:m10c17
root@m13c35:~# drbdadm connect one-vm-6372-disk-0:m10c17
root@m13c35:~# drbdadm status one-vm-6372-disk-0
one-vm-6372-disk-0 role:Secondary
  disk:Inconsistent
  m10c17 role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:0.00
  m8c10 role:Primary
    peer-disk:Diskless
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hmm, now replication is stuck at zero percent. Damn, let's recreate the resource:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;linstor r d m13c35 one-vm-6372-disk-0
linstor rd ap one-vm-6372-disk-0
linstor r l -r one-vm-6372-disk-0
╭────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName       ┊ Node   ┊ Port ┊ Usage  ┊ Conns ┊              State ┊ CreatedOn           ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-6372-disk-0 ┊ m10c17 ┊ 8262 ┊ Unused ┊ Ok    ┊           UpToDate ┊ 2021-02-03 09:43:31 ┊
┊ one-vm-6372-disk-0 ┊ m13c35 ┊ 8262 ┊ Unused ┊ Ok    ┊ SyncTarget(43.43%) ┊ 2021-07-09 13:36:51 ┊
┊ one-vm-6372-disk-0 ┊ m8c10  ┊ 8262 ┊ InUse  ┊ Ok    ┊           Diskless ┊ 2021-01-05 20:22:14 ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hooray, replication has started!&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Case 8 shows that &lt;code&gt;drbdadm down / up&lt;/code&gt; on &lt;code&gt;m13c35&lt;/code&gt; would most likely have brought the replica back to life.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Case 4: StandAlone towards the diskless replica
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;linstor r l -r one-vm-8586-disk-0
╭───────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName       ┊ Node   ┊ Port ┊ Usage  ┊ Conns              ┊    State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-8586-disk-0 ┊ m11c42 ┊ 8543 ┊ Unused ┊ StandAlone(m13c34) ┊ Outdated ┊ 2020-11-28 22:07:23 ┊
┊ one-vm-8586-disk-0 ┊ m13c17 ┊ 8543 ┊ Unused ┊ Ok                 ┊ Diskless ┊                     ┊
┊ one-vm-8586-disk-0 ┊ m13c34 ┊ 8543 ┊ InUse  ┊ Connecting(m11c42) ┊ Diskless ┊ 2021-01-20 14:40:04 ┊
┊ one-vm-8586-disk-0 ┊ m15c36 ┊ 8543 ┊ Unused ┊ Ok                 ┊ UpToDate ┊                     ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we can see that the resource on &lt;code&gt;m11c42&lt;/code&gt; is in the &lt;code&gt;StandAlone&lt;/code&gt; state towards the diskless replica on &lt;code&gt;m13c34&lt;/code&gt;. Resources go &lt;code&gt;StandAlone&lt;/code&gt; when they detect data inconsistencies between themselves. The fix is quite simple:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@m11c42:~# drbdadm disconnect one-vm-8586-disk-0
root@m11c42:~# drbdadm connect one-vm-8586-disk-0 --discard-my-data 
root@m11c42:~# drbdadm status one-vm-8586-disk-0
one-vm-8586-disk-0 role:Secondary
  disk:UpToDate
  m13c17 role:Secondary
    peer-disk:Diskless
  m13c34 role:Primary
    peer-disk:Diskless
  m15c36 role:Secondary
    peer-disk:UpToDate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Case 5: StandAlone towards the diskful replica
&lt;/h2&gt;

&lt;p&gt;Here we have a different situation: the resource on &lt;code&gt;m11c44&lt;/code&gt; is in &lt;code&gt;StandAlone&lt;/code&gt; towards another diskful replica on &lt;code&gt;m10c27&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# linstor r l -r one-vm-8536-disk-0
╭───────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName       ┊ Node   ┊ Port ┊ Usage  ┊ Conns              ┊    State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-8536-disk-0 ┊ m10c27 ┊ 8656 ┊ Unused ┊ Connecting(m11c44) ┊ UpToDate ┊ 2021-02-02 17:41:36 ┊
┊ one-vm-8536-disk-0 ┊ m11c44 ┊ 8656 ┊ Unused ┊ StandAlone(m10c27) ┊ Outdated ┊ 2021-02-03 09:51:30 ┊
┊ one-vm-8536-disk-0 ┊ m13c29 ┊ 8656 ┊ Unused ┊ Ok                 ┊ Diskless ┊                     ┊
┊ one-vm-8536-disk-0 ┊ m13c9  ┊ 8656 ┊ InUse  ┊ Ok                 ┊ Diskless ┊ 2021-01-21 09:21:55 ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────╯

root@m11c44:~# drbdadm status one-vm-8536-disk-0
one-vm-8536-disk-0 role:Secondary
  disk:Outdated quorum:no
  m10c27 connection:StandAlone
  m13c29 role:Secondary
    peer-disk:Diskless
  m13c9 role:Primary
    peer-disk:Diskless
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;We can try to fix it, just like in the previous case:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@m11c44:~# drbdadm disconnect one-vm-8536-disk-0
root@m11c44:~# drbdadm connect one-vm-8536-disk-0 --discard-my-data 
root@m11c44:~# drbdadm status one-vm-8536-disk-0
one-vm-8536-disk-0 role:Secondary
  disk:Outdated quorum:no
  m10c27 connection:StandAlone
  m13c29 role:Secondary
    peer-disk:Diskless
  m13c9 role:Primary
    peer-disk:Diskless
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;But after connecting, the replica almost instantly returns to &lt;code&gt;StandAlone&lt;/code&gt;. In dmesg you can see the &lt;code&gt;Unrelated data, aborting!&lt;/code&gt; error for this resource:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[706520.163680] drbd one-vm-8536-disk-0/0 drbd2655 m10c27: drbd_sync_handshake:
[706520.163691] drbd one-vm-8536-disk-0/0 drbd2655 m10c27: self E54E31513A64A2EE:0000000000000000:35BC97142AF7A8A4:0000000000000000 bits:1266688 flags:3
[706520.163699] drbd one-vm-8536-disk-0/0 drbd2655 m10c27: peer 591D9E9CA26B4F98:66E67F43AB59AB30:4F01DD98B884F10E:0000000000000000 bits:24982941 flags:1100
[706520.163708] drbd one-vm-8536-disk-0/0 drbd2655 m10c27: uuid_compare()=unrelated-data by rule=history-both
[706520.163710] drbd one-vm-8536-disk-0/0 drbd2655: Unrelated data, aborting!
[706520.528669] drbd one-vm-8536-disk-0 m10c27: Aborting remote state change 1918960097
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It is easier to delete and recreate such a resource:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;linstor r d m11c44 one-vm-8536-disk-0      
linstor rd ap one-vm-8536-disk-0
linstor r l -r one-vm-8536-disk-0
╭───────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName       ┊ Node   ┊ Port ┊ Usage  ┊ Conns ┊             State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-8536-disk-0 ┊ m10c27 ┊ 8656 ┊ Unused ┊ Ok    ┊          UpToDate ┊ 2021-02-02 17:41:36 ┊
┊ one-vm-8536-disk-0 ┊ m11c44 ┊ 8656 ┊ Unused ┊ Ok    ┊ SyncTarget(0.48%) ┊ 2021-07-09 15:40:17 ┊
┊ one-vm-8536-disk-0 ┊ m13c29 ┊ 8656 ┊ Unused ┊ Ok    ┊          Diskless ┊                     ┊
┊ one-vm-8536-disk-0 ┊ m13c9  ┊ 8656 ┊ InUse  ┊ Ok    ┊          Diskless ┊ 2021-01-21 09:21:55 ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Case 6: Consistent replica
&lt;/h2&gt;

&lt;p&gt;Almost the same as the previous case, but instead of &lt;code&gt;StandAlone&lt;/code&gt;, the replica is marked as &lt;code&gt;Consistent&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;linstor r l -r one-vm-8379-disk-0
╭─────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName       ┊ Node   ┊ Port ┊ Usage  ┊ Conns              ┊      State ┊ CreatedOn           ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-8379-disk-0 ┊ m13c40 ┊ 8052 ┊ Unused ┊ StandAlone(m14c6)  ┊ Consistent ┊ 2021-02-02 18:03:36 ┊
┊ one-vm-8379-disk-0 ┊ m14c15 ┊ 8052 ┊ InUse  ┊ Ok                 ┊   Diskless ┊ 2021-02-03 07:53:58 ┊
┊ one-vm-8379-disk-0 ┊ m14c6  ┊ 8052 ┊ Unused ┊ StandAlone(m13c40) ┊   UpToDate ┊                     ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────╯

linstor v l -r one-vm-8379-disk-0
╭────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node   ┊ Resource           ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse  ┊      State ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ m13c40 ┊ one-vm-8379-disk-0 ┊ thindata             ┊     0 ┊    2051 ┊ /dev/drbd2051 ┊ 24.16 GiB ┊ Unused ┊ Consistent ┊
┊ m14c15 ┊ one-vm-8379-disk-0 ┊ DfltDisklessStorPool ┊     0 ┊    2051 ┊ /dev/drbd2051 ┊           ┊ InUse  ┊   Diskless ┊
┊ m14c6  ┊ one-vm-8379-disk-0 ┊ thindata             ┊     0 ┊    2051 ┊ /dev/drbd2051 ┊ 40.01 GiB ┊ Unused ┊   UpToDate ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In dmesg you can see the &lt;code&gt;Unrelated data&lt;/code&gt; errors:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@m14c6:~# dmesg  |grep one-vm-8379-disk-0 | grep 'Unrelated data'
[2983657.291734] drbd one-vm-8379-disk-0/0 drbd2051: Unrelated data, aborting!
[2983659.335697] drbd one-vm-8379-disk-0/0 drbd2051: Unrelated data, aborting!
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's recreate the device:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;linstor r d m13c40 one-vm-8379-disk-0
linstor rd ap one-vm-8379-disk-0
linstor r l -r one-vm-8379-disk-0
╭───────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName       ┊ Node   ┊ Port ┊ Usage  ┊ Conns ┊             State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-8379-disk-0 ┊ m11c44 ┊ 8052 ┊ Unused ┊ Ok    ┊ SyncTarget(8.62%) ┊ 2021-07-09 15:44:51 ┊
┊ one-vm-8379-disk-0 ┊ m14c15 ┊ 8052 ┊ InUse  ┊ Ok    ┊          Diskless ┊ 2021-02-03 07:53:58 ┊
┊ one-vm-8379-disk-0 ┊ m14c6  ┊ 8052 ┊ Unused ┊ Ok    ┊          UpToDate ┊                     ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replication started, hurray!&lt;/p&gt;

&lt;h2&gt;
  
  
  Case 7: Classic split-brain situation
&lt;/h2&gt;

&lt;p&gt;Here we have two diskful replicas that cannot agree with each other:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# linstor r l -r one-vm-8373-disk-2
╭───────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName       ┊ Node   ┊ Port ┊ Usage  ┊ Conns              ┊    State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-8373-disk-2 ┊ m11c12 ┊ 8069 ┊ InUse  ┊ Ok                 ┊ Diskless ┊ 2021-01-05 19:06:18 ┊
┊ one-vm-8373-disk-2 ┊ m13c23 ┊ 8069 ┊ Unused ┊ StandAlone(m14c6)  ┊ Outdated ┊                     ┊
┊ one-vm-8373-disk-2 ┊ m14c6  ┊ 8069 ┊ Unused ┊ StandAlone(m13c23) ┊ UpToDate ┊                     ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────╯

# linstor v l -r one-vm-8373-disk-2
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node   ┊ Resource           ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse  ┊    State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ m11c12 ┊ one-vm-8373-disk-2 ┊ DfltDisklessStorPool ┊     0 ┊    2068 ┊ /dev/drbd2068 ┊           ┊ InUse  ┊ Diskless ┊
┊ m13c23 ┊ one-vm-8373-disk-2 ┊ thindata             ┊     0 ┊    2068 ┊ /dev/drbd2068 ┊ 19.51 GiB ┊ Unused ┊ Outdated ┊
┊ m14c6  ┊ one-vm-8373-disk-2 ┊ thindata             ┊     0 ┊    2068 ┊ /dev/drbd2068 ┊ 19.51 GiB ┊ Unused ┊ UpToDate ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now we have to select the replica whose data we want to discard, and reconnect it with &lt;code&gt;--discard-my-data&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@m13c23:~# drbdadm status one-vm-8373-disk-2
one-vm-8373-disk-2 role:Secondary
  disk:Outdated quorum:no
  m11c12 role:Primary
    peer-disk:Diskless
  m14c6 connection:StandAlone

root@m13c23:~# drbdadm disconnect one-vm-8373-disk-2
root@m13c23:~# drbdadm connect one-vm-8373-disk-2 --discard-my-data 
root@m13c23:~# drbdadm status one-vm-8373-disk-2
one-vm-8373-disk-2 role:Secondary
  disk:Outdated quorum:no
  m11c12 role:Primary
    peer-disk:Diskless
  m14c6 connection:Connecting
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It switches to &lt;code&gt;Connecting&lt;/code&gt;; now we need to reconnect the second replica:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@m14c6:~# drbdadm disconnect  one-vm-8373-disk-2:m13c23
root@m14c6:~# drbdadm connect  one-vm-8373-disk-2:m13c23
root@m14c6:~# drbdadm status one-vm-8373-disk-2
one-vm-8373-disk-2 role:Secondary
  disk:UpToDate
  m11c12 role:Primary
    peer-disk:Diskless
  m13c23 role:Secondary congested:yes ap-in-flight:0 rs-in-flight:2264
    replication:SyncSource peer-disk:Inconsistent done:72.42

root@m14c6:~# drbdadm status one-vm-8373-disk-2
one-vm-8373-disk-2 role:Secondary
  disk:UpToDate
  m11c12 role:Primary
    peer-disk:Diskless
  m13c23 role:Secondary
    peer-disk:UpToDate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Case 8: Stuck SyncTarget
&lt;/h2&gt;

&lt;p&gt;The sync is stuck at 81.71% and does not move:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# linstor r l -r one-vm-7584-disk-0
╭────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName       ┊ Node   ┊ Port ┊ Usage  ┊ Conns ┊              State ┊ CreatedOn           ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-7584-disk-0 ┊ m11c24 ┊ 8006 ┊ InUse  ┊ Ok    ┊           Diskless ┊ 2021-01-18 13:55:17 ┊
┊ one-vm-7584-disk-0 ┊ m13c3  ┊ 8006 ┊ Unused ┊ Ok    ┊ SyncTarget(81.71%) ┊                     ┊
┊ one-vm-7584-disk-0 ┊ m8c37  ┊ 8006 ┊ Unused ┊ Ok    ┊           UpToDate ┊ 2021-02-03 09:47:01 ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's try to reconnect:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@m13c3:~# drbdadm status one-vm-7584-disk-0
one-vm-7584-disk-0 role:Secondary
  disk:Inconsistent
  m11c24 role:Primary
    peer-disk:Diskless
  m8c37 role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:81.71

root@m13c3:~# drbdadm disconnect one-vm-7584-disk-0:m8c37
root@m13c3:~# drbdadm connect one-vm-7584-disk-0:m8c37
root@m13c3:~# drbdadm status one-vm-7584-disk-0
one-vm-7584-disk-0 role:Secondary
  disk:Inconsistent
  m11c24 role:Primary
    peer-disk:Diskless
  m8c37 role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:0.00
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now replication is stuck at zero percent; let's try to bring the device down completely and start it again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@m13c3:~# drbdadm down one-vm-7584-disk-0
root@m13c3:~# drbdadm up one-vm-7584-disk-0
root@m13c3:~# drbdadm status one-vm-7584-disk-0
one-vm-7584-disk-0 role:Secondary
  disk:Inconsistent quorum:no
  m11c24 role:Primary
    peer-disk:Diskless
  m8c37 connection:Connecting

root@m13c3:~# drbdadm status one-vm-7584-disk-0
one-vm-7584-disk-0 role:Secondary
  disk:UpToDate
  m11c24 role:Primary
    peer-disk:Diskless
  m8c37 role:Secondary
    peer-disk:UpToDate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Hurray, job is done!&lt;/p&gt;
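&lt;p&gt;When babysitting a resync like this one, it is handy to pull just the progress figure out of &lt;code&gt;drbdadm status&lt;/code&gt;. A minimal sketch (the helper name is mine, not a DRBD tool):&lt;/p&gt;

```shell
# Minimal sketch: extract the resync progress ("done:NN.NN") from
# `drbdadm status` output. The helper name is an assumption, not a DRBD tool.
parse_resync_done() {
  grep -o 'done:[0-9.]*' | cut -d: -f2
}

# Example on a status line like the ones above:
printf '%s\n' 'replication:SyncTarget peer-disk:UpToDate done:81.71' | parse_resync_done
# prints: 81.71
```

&lt;p&gt;Piping the live output of &lt;code&gt;drbdadm status&lt;/code&gt; through it in a loop gives a crude progress watch.&lt;/p&gt;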

&lt;h2&gt;
  
  
  Case 9: Outdated replica, which is Connecting
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# linstor r l -r one-vm-7577-disk-2
╭────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName       ┊ Node   ┊ Port  ┊ Usage  ┊ Conns              ┊    State ┊ CreatedOn           ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-7577-disk-2 ┊ m10c21 ┊ 57064 ┊ Unused ┊ Ok                 ┊ Diskless ┊ 2021-02-05 20:52:31 ┊
┊ one-vm-7577-disk-2 ┊ m13c10 ┊ 57064 ┊ InUse  ┊ Ok                 ┊ UpToDate ┊ 2021-02-05 20:52:23 ┊
┊ one-vm-7577-disk-2 ┊ m14c29 ┊ 57064 ┊ Unused ┊ Connecting(m13c10) ┊ Outdated ┊ 2021-02-05 20:52:26 ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is similar to Case 2 and is fixed the same way:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@m14c29:~# drbdadm status one-vm-7577-disk-2
one-vm-7577-disk-2 role:Secondary
  disk:Outdated
  m10c21 role:Secondary
    peer-disk:Diskless
  m13c10 connection:Connecting

root@m14c29:~# drbdadm disconnect one-vm-7577-disk-2
root@m14c29:~# drbdadm connect one-vm-7577-disk-2
root@m14c29:~# drbdadm status one-vm-7577-disk-2
one-vm-7577-disk-2 role:Secondary
  disk:UpToDate
  m10c21 role:Secondary
    peer-disk:Diskless
  m13c10 role:Primary
    peer-disk:UpToDate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Case 10: Unconnected / Connecting / NetworkFailure
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Cases 10 and 11 are caused by a &lt;a href="https://github.com/LINBIT/linstor-server/issues/150#issuecomment-882489597"&gt;bug&lt;/a&gt; that should be fixed as of the LINSTOR v1.14.0 release&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Resources are periodically flapping between these states:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# linstor r l -r one-vm-10154-disk-0
╭─────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName        ┊ Node   ┊ Port  ┊ Usage  ┊ Conns              ┊    State ┊ CreatedOn           ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-10154-disk-0 ┊ m11c37 ┊ 56031 ┊ Unused ┊ Ok                 ┊ UpToDate ┊                     ┊
┊ one-vm-10154-disk-0 ┊ m15c6  ┊ 56031 ┊ Unused ┊ Connecting(m11c37) ┊ Diskless ┊ 2021-04-08 07:46:40 ┊
┊ one-vm-10154-disk-0 ┊ m8c11  ┊ 56031 ┊ Unused ┊ Unconnected(m8c8)  ┊ Outdated ┊                     ┊
┊ one-vm-10154-disk-0 ┊ m8c8   ┊ 56031 ┊ InUse  ┊ Unconnected(m8c11) ┊ Diskless ┊ 2021-04-08 09:04:32 ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────╯

# linstor r l -r one-vm-10154-disk-0
╭─────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName        ┊ Node   ┊ Port  ┊ Usage  ┊ Conns              ┊    State ┊ CreatedOn           ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-10154-disk-0 ┊ m11c37 ┊ 56031 ┊ Unused ┊ Ok                 ┊ UpToDate ┊                     ┊
┊ one-vm-10154-disk-0 ┊ m15c6  ┊ 56031 ┊ Unused ┊ Connecting(m11c37) ┊ Diskless ┊ 2021-04-08 07:46:40 ┊
┊ one-vm-10154-disk-0 ┊ m8c11  ┊ 56031 ┊ Unused ┊ Connecting(m8c8)   ┊ Outdated ┊                     ┊
┊ one-vm-10154-disk-0 ┊ m8c8   ┊ 56031 ┊ InUse  ┊ Connecting(m8c11)  ┊ Diskless ┊ 2021-04-08 09:04:32 ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────╯

# linstor r l -r one-vm-10154-disk-0
╭───────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName        ┊ Node   ┊ Port  ┊ Usage  ┊ Conns                ┊    State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-10154-disk-0 ┊ m11c37 ┊ 56031 ┊ Unused ┊ Ok                   ┊ UpToDate ┊                     ┊
┊ one-vm-10154-disk-0 ┊ m15c6  ┊ 56031 ┊ Unused ┊ Connecting(m11c37)   ┊ Diskless ┊ 2021-04-08 07:46:40 ┊
┊ one-vm-10154-disk-0 ┊ m8c11  ┊ 56031 ┊ Unused ┊ NetworkFailure(m8c8) ┊ Outdated ┊                     ┊
┊ one-vm-10154-disk-0 ┊ m8c8   ┊ 56031 ┊ InUse  ┊ Connecting(m8c11)    ┊ Diskless ┊ 2021-04-08 09:04:32 ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────╯

# linstor v l -r one-vm-10154-disk-0
╭───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node   ┊ Resource            ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse  ┊    State ┊
╞═══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ m11c37 ┊ one-vm-10154-disk-0 ┊ thindata             ┊     0 ┊    3194 ┊ None          ┊ 70.66 GiB ┊ Unused ┊ UpToDate ┊
┊ m15c6  ┊ one-vm-10154-disk-0 ┊ DfltDisklessStorPool ┊     0 ┊    3194 ┊ /dev/drbd3194 ┊           ┊        ┊  Unknown ┊
┊ m8c11  ┊ one-vm-10154-disk-0 ┊ thindata             ┊     0 ┊    3194 ┊ /dev/drbd3194 ┊ 31.06 GiB ┊ Unused ┊ Outdated ┊
┊ m8c8   ┊ one-vm-10154-disk-0 ┊ DfltDisklessStorPool ┊     0 ┊    3194 ┊ /dev/drbd3194 ┊           ┊ InUse  ┊ Diskless ┊
╰───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check dmesg: if you see a &lt;code&gt;Peer presented a node_id of X instead of Y&lt;/code&gt; error, you have hit a LINSTOR bug, and for some reason the node IDs got mixed up:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[15962703.499997] drbd one-vm-10154-disk-0 m8c8: Peer presented a node_id of 2 instead of 3
[15962703.500003] drbd one-vm-10154-disk-0 m8c8: conn( Connecting -&amp;gt; NetworkFailure )
[15962703.551966] drbd one-vm-10154-disk-0 m8c8: Restarting sender thread
[15962703.552245] drbd one-vm-10154-disk-0 m8c8: Connection closed
[15962703.552251] drbd one-vm-10154-disk-0 m8c8: helper command: /sbin/drbdadm disconnected
[15962703.554361] drbd one-vm-10154-disk-0 m8c8: helper command: /sbin/drbdadm disconnected exit code 0
[15962703.554390] drbd one-vm-10154-disk-0 m8c8: conn( NetworkFailure -&amp;gt; Unconnected )
[15962704.555917] drbd one-vm-10154-disk-0 m8c8: conn( Unconnected -&amp;gt; Connecting )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here you can see that m8c11 expects m8c8 to be the node with &lt;code&gt;node-id:3&lt;/code&gt;, while m8c8 actually presents itself as &lt;code&gt;node-id:2&lt;/code&gt;; this is exactly the "node_id of 2 instead of 3" mismatch from the dmesg output above:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@m8c11:~# drbdsetup status one-vm-10154-disk-0 --verbose
one-vm-10154-disk-0 node-id:0 role:Secondary suspended:no
  volume:0 minor:3194 disk:Outdated quorum:yes blocked:no
  m11c37 node-id:1 connection:Connected role:Secondary congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
  m8c8 node-id:3 connection:Unconnected role:Unknown congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Off peer-disk:DUnknown resync-suspended:no

root@m8c8:~# drbdsetup status one-vm-10154-disk-0 --verbose
one-vm-10154-disk-0 node-id:2 role:Primary suspended:no
  volume:0 minor:3194 disk:Diskless client:yes quorum:yes blocked:no
  m11c37 node-id:1 connection:Connected role:Secondary congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
  m8c11 node-id:0 connection:Unconnected role:Unknown congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Off peer-disk:Outdated resync-suspended:no
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
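&lt;p&gt;To diff the two views quickly, the peer-to-node-id pairs can be pulled out of the &lt;code&gt;drbdsetup status --verbose&lt;/code&gt; output. A minimal sketch (the helper name is mine):&lt;/p&gt;

```shell
# Minimal sketch: print "peer node-id" pairs from the indented peer lines of
# `drbdsetup status --verbose` output. The helper name is an assumption.
peer_node_ids() {
  awk '/^  [a-z0-9]+ node-id:/ {split($2, a, ":"); print $1, a[2]}'
}

# Example with peer lines shaped like the m8c8 output above:
printf '%s\n' '  m11c37 node-id:1 connection:Connected' '  m8c11 node-id:0 connection:Unconnected' | peer_node_ids
```

&lt;p&gt;Running this on both nodes and comparing the results with each node's own first status line makes the mismatch obvious.&lt;/p&gt;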



&lt;p&gt;This is a big problem for DRBD. In this case we have to migrate the virtual machine to a healthy replica, delete all the other resources, and create new ones:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# linstor r d m15c6 m8c11 one-vm-10154-disk-0
# linstor r l -r one-vm-10154-disk-0
╭───────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName        ┊ Node   ┊ Port  ┊ Usage ┊ Conns ┊    State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-10154-disk-0 ┊ m11c37 ┊ 56031 ┊ InUse ┊ Ok    ┊ UpToDate ┊                     ┊
┊ one-vm-10154-disk-0 ┊ m15c6  ┊ 56031 ┊       ┊ Ok    ┊ DELETING ┊ 2021-04-08 07:46:40 ┊
┊ one-vm-10154-disk-0 ┊ m8c11  ┊ 56031 ┊       ┊ Ok    ┊ DELETING ┊                     ┊
╰───────────────────────────────────────────────────────────────────────────────────────╯

# linstor r l -r one-vm-10154-disk-0
╭───────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName        ┊ Node   ┊ Port  ┊ Usage ┊ Conns ┊    State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-10154-disk-0 ┊ m11c37 ┊ 56031 ┊ InUse ┊ Ok    ┊ UpToDate ┊                     ┊
╰───────────────────────────────────────────────────────────────────────────────────────╯

# linstor rd ap one-vm-10154-disk-0
╭──────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName        ┊ Node   ┊ Port  ┊ Usage  ┊ Conns ┊              State ┊ CreatedOn           ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-10154-disk-0 ┊ m10c2  ┊ 56031 ┊ Unused ┊ Ok    ┊ SyncTarget(27.34%) ┊ 2021-07-09 14:55:47 ┊
┊ one-vm-10154-disk-0 ┊ m11c37 ┊ 56031 ┊ InUse  ┊ Ok    ┊           UpToDate ┊                     ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Case 11: One of the diskful replicas has no connection to the diskless replica
&lt;/h2&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; Cases 10 and 11 are caused by a &lt;a href="https://github.com/LINBIT/linstor-server/issues/150#issuecomment-882489597"&gt;bug&lt;/a&gt; that should be fixed as of the LINSTOR v1.14.0 release&lt;br&gt;
&lt;/p&gt;
&lt;/blockquote&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;linstor r l -r one-vm-8760-disk-0
╭────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName       ┊ Node   ┊ Port  ┊ Usage  ┊ Conns              ┊    State ┊ CreatedOn           ┊
╞════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-8760-disk-0 ┊ m13c18 ┊ 55165 ┊ Unused ┊ Connecting(m8c9)   ┊ Outdated ┊                     ┊
┊ one-vm-8760-disk-0 ┊ m14c27 ┊ 55165 ┊ Unused ┊ Ok                 ┊ UpToDate ┊ 2021-02-03 12:00:35 ┊
┊ one-vm-8760-disk-0 ┊ m8c9   ┊ 55165 ┊ InUse  ┊ Connecting(m13c18) ┊ Diskless ┊ 2021-04-08 09:04:07 ┊
╰────────────────────────────────────────────────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Let's try some magic here:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@m13c18:~# drbdadm down one-vm-8760-disk-0
root@m13c18:~# drbdadm up one-vm-8760-disk-0
root@m13c18:~# drbdadm status one-vm-8760-disk-0
one-vm-8760-disk-0 role:Secondary
  disk:Outdated
  m14c27 role:Secondary
    peer-disk:UpToDate
  m8c9 connection:Connecting
root@m13c18:~# drbdadm status one-vm-8760-disk-0
one-vm-8760-disk-0 role:Secondary
  disk:Outdated
  m14c27 role:Secondary
    peer-disk:UpToDate
  m8c9 connection:Unconnected

root@m8c9:~# drbdadm disconnect one-vm-8760-disk-0:m13c18
root@m8c9:~# drbdadm connect one-vm-8760-disk-0:m13c18
root@m8c9:~# drbdadm status one-vm-8760-disk-0
one-vm-8760-disk-0 role:Primary
  disk:Diskless
  m13c18 connection:Unconnected
  m14c27 role:Secondary
    peer-disk:UpToDate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Damn, the same situation as in the previous case:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;linstor r l -r one-vm-8760-disk-0
╭──────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName       ┊ Node   ┊ Port  ┊ Usage  ┊ Conns                ┊    State ┊ CreatedOn           ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-8760-disk-0 ┊ m13c18 ┊ 55165 ┊ Unused ┊ NetworkFailure(m8c9) ┊ Outdated ┊                     ┊
┊ one-vm-8760-disk-0 ┊ m14c27 ┊ 55165 ┊ Unused ┊ Ok                   ┊ UpToDate ┊ 2021-02-03 12:00:35 ┊
┊ one-vm-8760-disk-0 ┊ m8c9   ┊ 55165 ┊ InUse  ┊ Unconnected(m13c18)  ┊ Diskless ┊ 2021-04-08 09:04:07 ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Check dmesg:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;[14635995.191931] drbd one-vm-8760-disk-0 m8c9: conn( Unconnected -&amp;gt; Connecting )
[14635995.740020] drbd one-vm-8760-disk-0 m8c9: Peer presented a node_id of 3 instead of 2
[14635995.740051] drbd one-vm-8760-disk-0 m8c9: conn( Connecting -&amp;gt; NetworkFailure )
[14635995.775994] drbd one-vm-8760-disk-0 m8c9: Restarting sender thread
[14635995.777153] drbd one-vm-8760-disk-0 m8c9: Connection closed
[14635995.777174] drbd one-vm-8760-disk-0 m8c9: helper command: /sbin/drbdadm disconnected
[14635995.789649] drbd one-vm-8760-disk-0 m8c9: helper command: /sbin/drbdadm disconnected exit code 0
[14635995.789707] drbd one-vm-8760-disk-0 m8c9: conn( NetworkFailure -&amp;gt; Unconnected )
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Yep, exactly! The node IDs are messed up again.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@m8c9:~# drbdsetup status one-vm-8760-disk-0 --verbose
one-vm-8760-disk-0 node-id:3 role:Primary suspended:no
  volume:0 minor:1016 disk:Diskless client:yes quorum:yes blocked:no
  m13c18 node-id:0 connection:Unconnected role:Unknown congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Off peer-disk:Outdated resync-suspended:no
  m14c27 node-id:1 connection:Connected role:Secondary congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no

root@m13c18:~# drbdsetup status one-vm-8760-disk-0 --verbose
one-vm-8760-disk-0 node-id:0 role:Secondary suspended:no
  volume:0 minor:1016 disk:Outdated quorum:yes blocked:no
  m14c27 node-id:1 connection:Connected role:Secondary congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:UpToDate resync-suspended:no
  m8c9 node-id:2 connection:Unconnected role:Unknown congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Off peer-disk:DUnknown resync-suspended:no

root@m14c27:~# drbdsetup status one-vm-8760-disk-0 --verbose
one-vm-8760-disk-0 node-id:1 role:Secondary suspended:no
  volume:0 minor:1016 disk:UpToDate quorum:yes blocked:no
  m13c18 node-id:0 connection:Connected role:Secondary congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:Outdated resync-suspended:no
  m8c9 node-id:3 connection:Connected role:Primary congested:no ap-in-flight:0 rs-in-flight:0
    volume:0 replication:Established peer-disk:Diskless peer-client:yes resync-suspended:no
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;m13c18 sees m8c9 as &lt;code&gt;node-id:2&lt;/code&gt;, but in fact it is &lt;code&gt;node-id:3&lt;/code&gt;. We have to migrate the VM to a healthy replica and recreate the rest of them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# linstor r l -r one-vm-8760-disk-0
╭─────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName       ┊ Node   ┊ Port  ┊ Usage  ┊ Conns               ┊    State ┊ CreatedOn           ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-8760-disk-0 ┊ m13c18 ┊ 55165 ┊ Unused ┊ Unconnected(m8c9)   ┊ Outdated ┊                     ┊
┊ one-vm-8760-disk-0 ┊ m14c27 ┊ 55165 ┊ Unused ┊ Ok                  ┊ UpToDate ┊ 2021-02-03 12:00:35 ┊
┊ one-vm-8760-disk-0 ┊ m8c9   ┊ 55165 ┊ InUse  ┊ Unconnected(m13c18) ┊ Diskless ┊ 2021-04-08 09:04:07 ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────────╯

# linstor r l -r one-vm-8760-disk-0
╭───────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName       ┊ Node   ┊ Port  ┊ Usage  ┊ Conns ┊    State ┊ CreatedOn           ┊
╞═══════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-8760-disk-0 ┊ m13c18 ┊ 55165 ┊ Unused ┊ Ok    ┊ UpToDate ┊                     ┊
┊ one-vm-8760-disk-0 ┊ m14c27 ┊ 55165 ┊ InUse  ┊ Ok    ┊ UpToDate ┊ 2021-02-03 12:00:35 ┊
╰───────────────────────────────────────────────────────────────────────────────────────╯

# linstor r d m13c18 one-vm-8760-disk-0
# linstor rd ap one-vm-8760-disk-0
# linstor r l -r one-vm-8760-disk-0
╭─────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ ResourceName       ┊ Node   ┊ Port  ┊ Usage  ┊ Conns ┊              State ┊ CreatedOn           ┊
╞═════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ one-vm-8760-disk-0 ┊ m14c27 ┊ 55165 ┊ InUse  ┊ Ok    ┊           UpToDate ┊ 2021-02-03 12:00:35 ┊
┊ one-vm-8760-disk-0 ┊ m8c6   ┊ 55165 ┊ Unused ┊ Ok    ┊ SyncTarget(78.57%) ┊ 2021-07-09 15:30:55 ┊
╰─────────────────────────────────────────────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Case 12: Consistent replica
&lt;/h2&gt;

&lt;p&gt;We are on a diskless node and looking at the resource states:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@m11c39:~# drbdadm status one-vm-6967-disk-0
one-vm-6967-disk-0 role:Primary
  disk:Diskless
  m13c15 role:Secondary
    peer-disk:Consistent
  m14c40 role:Secondary
    peer-disk:UpToDate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is a very unpleasant situation. Both diskful replicas are &lt;code&gt;UpToDate&lt;/code&gt;, but the diskless replica works with only one of them; the second is marked as &lt;code&gt;Consistent&lt;/code&gt;. It occurs as a result of a bug with the diskless primary on 9.0.19. However, I also managed to catch it on 9.0.21, though much less often.&lt;/p&gt;

&lt;p&gt;When you try to disconnect the resource on the &lt;code&gt;m14c40&lt;/code&gt; node, you will see that this is impossible, since the diskless replica is currently using it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@m14c40:~# drbdadm disconnect one-vm-6967-disk-0
one-vm-6967-disk-0: State change failed: (-10) State change was refused by peer node
additional info from kernel:
Declined by peer m11c39 (id: 3), see the kernel log there
Command 'drbdsetup disconnect one-vm-6967-disk-0 3' terminated with exit code 11
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This can be fixed as follows:&lt;/p&gt;

&lt;p&gt;Do the disconnect and invalidate on the &lt;code&gt;Consistent&lt;/code&gt; node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@m13c15:~# drbdadm disconnect one-vm-6967-disk-0
root@m13c15:~# drbdadm invalidate one-vm-6967-disk-0
root@m13c15:~# drbdadm connect one-vm-6967-disk-0
root@m13c15:~# drbdadm status one-vm-6967-disk-0
one-vm-6967-disk-0 role:Secondary
  disk:Inconsistent
  m11c39 role:Primary
    peer-disk:Diskless
  m14c40 role:Secondary
    replication:SyncTarget peer-disk:UpToDate done:3.04
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here we see that the synchronization completed, but the resource remained in the &lt;code&gt;Inconsistent&lt;/code&gt; state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@m13c15:~# drbdadm status one-vm-6967-disk-0
one-vm-6967-disk-0 role:Secondary
  disk:Inconsistent
  m11c39 role:Primary
    peer-disk:Diskless
  m14c40 role:Secondary
    peer-disk:UpToDate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;To solve this situation, you need to perform the disconnect/connect operation with the other diskful replica:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@m13c15:~# drbdadm disconnect one-vm-6967-disk-0:m14c40
root@m13c15:~# drbdadm connect one-vm-6967-disk-0:m14c40
root@m13c15:~# drbdadm status one-vm-6967-disk-0
one-vm-6967-disk-0 role:Secondary
  disk:UpToDate
  m11c39 role:Primary
    peer-disk:Diskless
  m14c40 role:Secondary
    peer-disk:UpToDate
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Case 13: Forgotten resource
&lt;/h2&gt;

&lt;p&gt;In older versions of LINSTOR it was possible to face a bug when a resource was deleted but diskless replicas remained on the node:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;one-vm-7792-disk-0 role:Secondary
  disk:Diskless quorum:no
  m13c9 connection:Connecting
  m14c13 connection:Connecting
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This resource no longer exists in LINSTOR:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;linstor r l -r one-vm-7792-disk-0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;So we can safely shut it down via drbdsetup:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;root@m14c43:~# drbdsetup down one-vm-7792-disk-0
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Note:&lt;/strong&gt; drbdsetup is a lower-level utility than drbdadm. drbdsetup communicates directly with the kernel and does not require a config file for the drbd resource.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;h2&gt;
  
  
  Case 14: Corrupted bitmap
&lt;/h2&gt;

&lt;p&gt;And finally, the most delicious one: a DRBD bug that was found in version 9.0.19 and later fixed. Let's say you have just created a new replica on m10c23; it has synchronized and gone into this state:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# linstor v l -r one-vm-5460-disk-2
╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
┊ Node   ┊ Resource           ┊ StoragePool          ┊ VolNr ┊ MinorNr ┊ DeviceName    ┊ Allocated ┊ InUse  ┊    State ┊
╞══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════╡
┊ m10c23 ┊ one-vm-5460-disk-2 ┊ thindata             ┊     0 ┊    2665 ┊ /dev/drbd2665 ┊ 11.71 GiB ┊ Unused ┊ UpToDate ┊
┊ m11c35 ┊ one-vm-5460-disk-2 ┊ DfltDisklessStorPool ┊     0 ┊    2665 ┊ /dev/drbd2665 ┊           ┊ InUse  ┊ Diskless ┊
┊ m14c2  ┊ one-vm-5460-disk-2 ┊ diskless             ┊     0 ┊    2665 ┊ /dev/drbd2665 ┊           ┊ InUse  ┊ Diskless ┊
┊ m15c17 ┊ one-vm-5460-disk-2 ┊ thindata             ┊     0 ┊    2665 ┊ /dev/drbd2665 ┊ 28.01 GiB ┊ Unused ┊ UpToDate ┊
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Both replicas are &lt;code&gt;UpToDate&lt;/code&gt;, but note the Allocated size: one of them uses much less space than the other.&lt;br&gt;
According to DRBD's logic, the primary diskless replica works with all secondary replicas, that is, it reads from and writes to both diskful replicas at once.&lt;/p&gt;

&lt;p&gt;As a result, the virtual machine gets confused: it reads some data from the healthy replica and some from the bad one, thereby damaging its own file system.&lt;/p&gt;

&lt;p&gt;The cause of this problem is a broken bitmap, and now we need to fix it.&lt;br&gt;
DRBD keeps a kind of changelog inside the device that records where and what data was changed over time. Thanks to it, after a disconnect and reconnect only the changed data is synchronized, not the entire device. In other words, as a result of a DRBD bug, we now have an incorrect changelog.&lt;/p&gt;

&lt;p&gt;Note right away that classic DRBD and LINSTOR differ here. LINSTOR stores the day-zero value of the changelog in its metadata and sets this value every time a new replica is created, so changes to a new replica are synchronized only according to the changelog. Thanks to this, if the changelog is small, synchronization completes very quickly, unlike a full initial synchronization.&lt;/p&gt;

&lt;p&gt;The standard DRBD logic does not offer such an "improvement" and performs a full resync for every new replica; that is, even if your changelog is damaged, synchronization will always succeed.&lt;/p&gt;
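&lt;p&gt;To get a feel for the sizes involved: the changelog discussed here is DRBD's dirty bitmap, which tracks data at 4 KiB granularity, one bit per block. A quick back-of-the-envelope sketch (the bit count is a made-up example):&lt;/p&gt;

```shell
# Back-of-the-envelope: with one bitmap bit per 4 KiB block, a partial resync
# transfers roughly set_bits * 4 KiB of data. The bit count is hypothetical.
set_bits=262144                       # bits marked dirty in the bitmap
echo "$(( set_bits * 4 / 1024 )) MiB to resync"
# prints: 1024 MiB to resync
```

&lt;p&gt;This is why a resync driven by a small, correct changelog finishes quickly, and why a wrong changelog is so dangerous: whatever it fails to mark is simply never copied.&lt;/p&gt;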

&lt;p&gt;You can diagnose the differences between two replicas by running the command &lt;code&gt;drbdadm verify&lt;/code&gt; on one of them, for example:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;drbdadm verify one-vm-5460-disk-2:m15c17
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will check &lt;code&gt;one-vm-5460-disk-2&lt;/code&gt; against the replica located on &lt;code&gt;m15c17&lt;/code&gt;.&lt;/p&gt;
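&lt;p&gt;While the verify runs, the kernel logs an &lt;code&gt;Out of sync: start=..., size=...&lt;/code&gt; line for each mismatched range it finds. A minimal sketch for totalling them up (the helper name is mine; 1 sector = 512 bytes):&lt;/p&gt;

```shell
# Minimal sketch: sum the sectors from "Out of sync" kernel-log lines emitted
# during `drbdadm verify` and print the total in KiB (1 sector = 512 bytes).
oos_total_kib() {
  awk -F'size=' '/Out of sync/ {split($2, a, "[ (]"); s += a[1]} END {print s / 2}'
}

# Example on two log lines shaped like DRBD's output:
printf '%s\n' 'drbd r0: Out of sync: start=0, size=8 (sectors)' 'drbd r0: Out of sync: start=64, size=16 (sectors)' | oos_total_kib
# prints: 12
```

&lt;p&gt;Feeding it &lt;code&gt;dmesg&lt;/code&gt; after the verify finishes gives a rough idea of how much data differs between the replicas.&lt;/p&gt;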

&lt;p&gt;After that, all unsynchronized sectors will be marked for synchronization, and it will be enough to do:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;drbdadm disconnect one-vm-5460-disk-2
drbdadm connect one-vm-5460-disk-2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This will start the syncing. An alternative solution is to invalidate the entire replica at once and reconnect it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;drbdadm disconnect one-vm-5460-disk-2
drbdadm invalidate one-vm-5460-disk-2
drbdadm connect one-vm-5460-disk-2
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then it will be fully synchronized. But even though the sync is successful, it will continue to have the wrong changelog. Thus, all new replicas synchronized through the changelog will have the same problem again.&lt;/p&gt;

&lt;p&gt;Okay, we're done with the theory, now let's get back to our situation. Usually, in this case, it makes sense to immediately delete the created replica and perform the following actions:&lt;/p&gt;

&lt;p&gt;Shut down the virtual machine, or, if that is not possible, make an external snapshot, and then run:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dd if=/dev/drbd2665 of=/dev/drbd2665 status=progress bs=65536 conv=notrunc,sparse iflag=direct,fullblock oflag=direct
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Here dd reads the entire device and writes it back byte for byte, thereby correcting our changelog. Do not forget to commit the external snapshot (if you made one); after that you can safely create new replicas using LINSTOR.&lt;/p&gt;
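&lt;p&gt;If you want to convince yourself that such an in-place rewrite leaves the data intact, the same trick can be tried on a scratch file first (a sketch for illustration, not part of the original procedure):&lt;/p&gt;

```shell
# Sketch: read a file and write it back in place, just as the dd pass above
# does for the whole device. The content stays identical, but every block is
# rewritten, and on DRBD those writes are what refresh the changelog.
f=$(mktemp)
printf 'payload-1234567890' | dd of="$f" status=none
before=$(cksum "$f")
dd if="$f" of="$f" bs=4 conv=notrunc status=none
after=$(cksum "$f")
[ "$before" = "$after" ]   # checksums match: the data is unchanged
rm -f "$f"
```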




&lt;p&gt;That's all for now. Thank you for your attention, and I hope it is useful to you.&lt;/p&gt;

</description>
      <category>linux</category>
      <category>storage</category>
      <category>drbd</category>
      <category>troubleshooting</category>
    </item>
  </channel>
</rss>
