<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Bruno Calza</title>
    <description>The latest articles on DEV Community by Bruno Calza (@brunocalza).</description>
    <link>https://dev.to/brunocalza</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F557386%2F37cbaf34-349d-4b87-a4fb-1acabda59dc3.jpg</url>
      <title>DEV Community: Bruno Calza</title>
      <link>https://dev.to/brunocalza</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/brunocalza"/>
    <language>en</language>
    <item>
      <title>But how, exactly, databases use mmap?</title>
      <dc:creator>Bruno Calza</dc:creator>
      <pubDate>Thu, 21 Jan 2021 00:26:32 +0000</pubDate>
      <link>https://dev.to/brunocalza/but-how-exactly-databases-use-mmap-574n</link>
      <guid>https://dev.to/brunocalza/but-how-exactly-databases-use-mmap-574n</guid>
      <description>&lt;p&gt;In a previous post &lt;a href="https://dev.to/discovering-and-exploring-mmap-using-go/"&gt;Discovering and exploring mmap using Go&lt;/a&gt;, we talked about how databases have a major problem to solve, which is: &lt;strong&gt;how to deal with data stored in disk that is bigger than the available memory&lt;/strong&gt;. We talked about how many databases solve this problem using &lt;strong&gt;memory-mapped files&lt;/strong&gt; and explored &lt;strong&gt;mmap&lt;/strong&gt; capabilities.&lt;/p&gt;

&lt;p&gt;Knowing that databases use &lt;strong&gt;memory-mapped files&lt;/strong&gt; to solve the problem was not enough for me. It solved part of the mystery, but a question remained: &lt;strong&gt;how, exactly, do databases use &lt;em&gt;mmap&lt;/em&gt; to read and write data from disk?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;I decided to dig through a database's source code to answer that question. Plenty of databases use mmap, and some have since stopped using it. Some examples: &lt;a href="https://www.sqlite.org/index.html"&gt;SQLite&lt;/a&gt; has an option for accessing disk content directly using memory-mapped I/O[1]; &lt;a href="https://github.com/google/leveldb"&gt;LevelDB&lt;/a&gt; apparently used mmap in the past but moved away from it[2]; &lt;a href="https://lucene.apache.org/"&gt;Lucene&lt;/a&gt; offers it through &lt;em&gt;MMapDirectory&lt;/em&gt;[3]; &lt;a href="https://lmdb.readthedocs.io/"&gt;LMDB&lt;/a&gt; uses mmap[4]; &lt;a href="https://github.com/couchbase/moss"&gt;moss&lt;/a&gt;, a simple in-memory key/value database from Couchbase, uses mmap for durability of in-memory data[5]; and &lt;a href="https://www.mongodb.com/"&gt;MongoDB&lt;/a&gt; removed its &lt;strong&gt;mmap&lt;/strong&gt; storage engine in favor of &lt;em&gt;WiredTiger&lt;/em&gt;[6].&lt;/p&gt;

&lt;p&gt;I chose &lt;a href="https://github.com/boltdb/bolt"&gt;bolt&lt;/a&gt;, a simple &lt;strong&gt;key/value store&lt;/strong&gt; implemented in &lt;strong&gt;Go&lt;/strong&gt; by &lt;a href="https://twitter.com/benbjohnson"&gt;Ben Johnson&lt;/a&gt; and inspired by the &lt;strong&gt;LMDB&lt;/strong&gt; project, for this endeavor, mostly because of the simplicity of its source code and my familiarity with the Go language. I know a simple key/value store might not be the most complete source for learning all the details of reading and writing data to disk, but as I found out, it was more than enough to get a grasp of it.&lt;/p&gt;

&lt;p&gt;The original bolt repository is no longer maintained. A fork of &lt;strong&gt;bolt&lt;/strong&gt; called &lt;strong&gt;bbolt&lt;/strong&gt; is maintained and used by &lt;a href="https://github.com/etcd-io/bbolt"&gt;etcd&lt;/a&gt;. If you are not familiar with &lt;strong&gt;bolt&lt;/strong&gt;, I recommend the articles &lt;a href="https://npf.io/2014/07/intro-to-boltdb-painless-performant-persistence/"&gt;Intro to BoltDB: Painless Performant Persistence&lt;/a&gt; and &lt;a href="https://www.progville.com/go/bolt-embedded-db-golang/"&gt;Bolt — an embedded key/value database for Go &lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  The start
&lt;/h2&gt;

&lt;p&gt;I downloaded the code to my machine and opened it in my editor. I thought a good place to start digging was to find out where the database is initialized and look for any references to &lt;strong&gt;mmap&lt;/strong&gt; there. Like most embedded databases, &lt;strong&gt;bolt&lt;/strong&gt; has an &lt;a href="https://github.com/boltdb/bolt/blob/fd01fc79c553a8e99d512a07e8e0c63d4a3ccfc5/db.go#L150"&gt;Open&lt;/a&gt; method for opening the database or creating a new one if it does not exist. Inside it, I found a reference to a private &lt;strong&gt;mmap&lt;/strong&gt; function. That's a good start.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Memory map the data file.
if err := db.mmap(options.InitialMmapSize); err != nil {
    _ = db.close()
    return nil, err
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  How much memory should I allocate?
&lt;/h2&gt;

&lt;p&gt;The private &lt;a href="https://dev.to/p/b43be99e-c260-446b-a1b1-40261aaffcf3/func%20(db%20*DB)%20mmap(minsz%20int)%20error%20%7B"&gt;mmap&lt;/a&gt; is responsible for opening the memory-mapped file. In order to do this, it needs to figure out how much memory it is going to allocate. This task is accomplished by another method called &lt;a href="https://github.com/boltdb/bolt/blob/fd01fc79c553a8e99d512a07e8e0c63d4a3ccfc5/db.go#L308"&gt;mmapSize&lt;/a&gt;. Given the size of the database, this method figures out how many bytes of memory should be allocated.&lt;/p&gt;

&lt;p&gt;For small databases, it doubles the mapping size, starting at &lt;strong&gt;32KB&lt;/strong&gt;, until it reaches &lt;strong&gt;1GB&lt;/strong&gt;. If the database is larger than &lt;strong&gt;1GB&lt;/strong&gt;, it grows &lt;strong&gt;1GB&lt;/strong&gt; at a time.&lt;/p&gt;
&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Double the size from 32KB until 1GB.
for i := uint(15); i &amp;lt;= 30; i++ {
    if size &amp;lt;= 1&amp;lt;&amp;lt;i {
        return 1 &amp;lt;&amp;lt; i, nil
    }
}

...

// If larger than 1GB then grow by 1GB at a time.
sz := int64(size)
if remainder := sz % int64(maxMmapStep); remainder &amp;gt; 0 {
    sz += int64(maxMmapStep) - remainder
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;So far, so good. That's how &lt;strong&gt;bolt&lt;/strong&gt; figures out how much to allocate. But there is one more piece of the puzzle that will be very important when we talk about database storage layout. After figuring out how much to allocate, it needs to ensure that the allocated size is a multiple of the &lt;strong&gt;page size&lt;/strong&gt;. If you are not familiar with database storage and don't know what a &lt;strong&gt;page&lt;/strong&gt; is, don't worry, we'll come back to this.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Ensure that the mmap size is a multiple of the page size.
// This should always be true since we're incrementing in MBs.
pageSize := int64(db.pageSize)
if (sz % pageSize) != 0 {
    sz = ((sz / pageSize) + 1) * pageSize
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Shouldn't it check if we are allocating more than we have available? That's the last piece of the &lt;a href="https://github.com/boltdb/bolt/blob/fd01fc79c553a8e99d512a07e8e0c63d4a3ccfc5/db.go#L308"&gt;mmapSize&lt;/a&gt; method.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// If we've exceeded the max size then only grow up to the max size.
if sz &amp;gt; maxMapSize {
    sz = maxMapSize
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The &lt;a href="https://github.com/boltdb/bolt/blob/fd01fc79c553a8e99d512a07e8e0c63d4a3ccfc5/bolt_amd64.go#L4"&gt;maxMapSize&lt;/a&gt; constant is set to &lt;code&gt;0xFFFFFFFFFFFF&lt;/code&gt; on &lt;strong&gt;AMD64&lt;/strong&gt; architectures.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;The &lt;strong&gt;AMD64&lt;/strong&gt; architecture defines a 64-bit virtual address format, of which the low-order 48 bits are used in current implementations. This allows up to 256 TiB (2&lt;sup&gt;48&lt;/sup&gt; bytes) of virtual address space. - &lt;a href="https://en.wikipedia.org/wiki/X86-64"&gt;x86-64 Wiki&lt;/a&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is the limit of the &lt;strong&gt;bolt&lt;/strong&gt; database file.&lt;/p&gt;
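
&lt;p&gt;Putting the pieces of this section together, here is a minimal, self-contained sketch of the sizing rules in Go. It is a simplified re-implementation for illustration, not bolt's actual code: the function takes the page size as a parameter instead of reading it from the &lt;code&gt;DB&lt;/code&gt; struct, and the constant &lt;code&gt;minMmapSize&lt;/code&gt; is a name made up for this example.&lt;/p&gt;

```go
package main

import (
    "errors"
    "fmt"
)

const (
    minMmapSize = 32 * 1024          // 32KB; hypothetical name for the smallest mapping
    maxMmapStep = 1024 * 1024 * 1024 // 1GB; growth step for large files
    maxMapSize  = 0xFFFFFFFFFFFF     // the amd64 cap quoted above
)

// mmapSize returns how many bytes to map for a database of the given size.
// Illustration only: it mirrors the rules described in this section.
func mmapSize(size, pageSize int64) (int64, error) {
    // Double from 32KB until the candidate covers size, up to 1GB.
    for candidate := int64(minMmapSize); maxMmapStep >= candidate; candidate *= 2 {
        if candidate >= size {
            return candidate, nil
        }
    }
    if size > maxMapSize {
        return 0, errors.New("mmap too large")
    }
    // Above 1GB, round up to the next multiple of 1GB.
    sz := size
    if remainder := sz % maxMmapStep; remainder > 0 {
        sz += maxMmapStep - remainder
    }
    // Round up to a multiple of the page size.
    if sz%pageSize != 0 {
        sz = (sz/pageSize + 1) * pageSize
    }
    // Never grow past the maximum mapping size.
    if sz > maxMapSize {
        sz = maxMapSize
    }
    return sz, nil
}

func main() {
    for _, size := range []int64{1, 40000, maxMmapStep, maxMmapStep + 1} {
        sz, err := mmapSize(size, 4096)
        fmt.Println(size, sz, err)
    }
}
```

&lt;p&gt;Under these assumptions, a 40,000-byte database would get a 64KB mapping, and a database one byte over 1GB would get a 2GB mapping.&lt;/p&gt;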

&lt;h2&gt;
  
  
  Calling the system call
&lt;/h2&gt;

&lt;p&gt;Now that &lt;strong&gt;bolt&lt;/strong&gt; knows how much it should allocate, it calls the system call &lt;a href="https://dev.to/p/b43be99e-c260-446b-a1b1-40261aaffcf3/func%20mmap(db%20*DB,%20sz%20int)%20error%20%7B"&gt;mmap&lt;/a&gt;. Here is the full code for &lt;em&gt;Unix&lt;/em&gt;-like environments:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// mmap memory maps a DB's data file.
func mmap(db *DB, sz int) error {
    // Map the data file to memory.
    b, err := syscall.Mmap(int(db.file.Fd()), 0, sz, syscall.PROT_READ, syscall.MAP_SHARED|db.MmapFlags)
    if err != nil {
        return err
    }

    // Advise the kernel that the mmap is accessed randomly.
    if err := madvise(b, syscall.MADV_RANDOM); err != nil {
        return fmt.Errorf("madvise: %s", err)
    }

    // Save the original byte slice and convert to a byte array pointer.
    db.dataref = b
    db.data = (*[maxMapSize]byte)(unsafe.Pointer(&amp;amp;b[0]))
    db.datasz = sz
    return nil
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;The code is pretty straightforward. I have some observations to make about &lt;code&gt;syscall.PROT_READ&lt;/code&gt;, but I'll leave them for the next section. It is nice to see the call to madvise there, although I don't know its benefits. I would love to know how important that call is, whether it improves performance, and whether a different flag could be set to get different behavior from the OS for other use cases.&lt;/p&gt;

&lt;p&gt;The mapped memory is set to the variables &lt;code&gt;db.dataref&lt;/code&gt; and &lt;code&gt;db.data&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;db.dataref = b
db.data = (*[maxMapSize]byte)(unsafe.Pointer(&amp;amp;b[0]))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;I would like to understand why both variables are kept. I could not grasp what is going on in the conversion to &lt;code&gt;db.data&lt;/code&gt;. Either way, what we have to keep in mind is that it is through these variables that &lt;strong&gt;bolt&lt;/strong&gt; reads data from disk.&lt;/p&gt;

&lt;h2&gt;
  
  
  What about writes?
&lt;/h2&gt;

&lt;p&gt;While skimming through the source code, I looked for evidence of how &lt;strong&gt;mmap&lt;/strong&gt; was used for both reads and writes. I dug into both the &lt;a href="https://github.com/boltdb/bolt/blob/fd01fc79c553a8e99d512a07e8e0c63d4a3ccfc5/bucket.go#L266"&gt;Get&lt;/a&gt; and &lt;a href="https://github.com/boltdb/bolt/blob/fd01fc79c553a8e99d512a07e8e0c63d4a3ccfc5/bucket.go#L285"&gt;Put&lt;/a&gt; methods. I could not find any place where the data behind &lt;code&gt;db.dataref&lt;/code&gt; or &lt;code&gt;db.data&lt;/code&gt; was being updated. I discovered that the writes to disk happen when &lt;a href="https://github.com/boltdb/bolt/blob/fd01fc79c553a8e99d512a07e8e0c63d4a3ccfc5/tx.go#L144"&gt;Commit&lt;/a&gt; is called on a transaction. But there I could only find calls to &lt;a href="https://golang.org/pkg/os/#File.WriteAt"&gt;WriteAt&lt;/a&gt;. So I gave up trying to understand how &lt;strong&gt;mmap&lt;/strong&gt; was used for writes.&lt;/p&gt;

&lt;p&gt;Then, suddenly, while looking back at the call to &lt;strong&gt;mmap&lt;/strong&gt;, I noticed the &lt;code&gt;syscall.PROT_READ&lt;/code&gt; flag, which I had not noticed the first time I looked at the code. So &lt;strong&gt;mmap&lt;/strong&gt; is only used for reads in &lt;strong&gt;bolt&lt;/strong&gt;. Another place that indicates this is the definition of the &lt;a href="https://github.com/boltdb/bolt/blob/fd01fc79c553a8e99d512a07e8e0c63d4a3ccfc5/db.go#L100"&gt;DB struct&lt;/a&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;dataref  []byte   // mmap'ed readonly, write throws SEGV
data     *[maxMapSize]byte
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;That made perfect sense to me. Since flushes to disk are very hard to control when using &lt;strong&gt;mmap&lt;/strong&gt;, it is probably the safest approach. How &lt;strong&gt;bolt&lt;/strong&gt; does writing is a topic of another post. &lt;/p&gt;
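
&lt;p&gt;To make the read-through-mmap, write-through-the-file pattern concrete, here is a small sketch using only Go's standard &lt;code&gt;syscall&lt;/code&gt; package on a Unix-like system. It is not bolt's code: the temporary file, its contents, and the &lt;code&gt;demo&lt;/code&gt; function are made up for illustration. The mapping is read-only and shared, and the update is made with &lt;code&gt;WriteAt&lt;/code&gt; on the file.&lt;/p&gt;

```go
package main

import (
    "fmt"
    "os"
    "syscall"
)

// demo returns the mapped contents before and after a WriteAt on the file.
// Hypothetical helper for this example; Unix-like systems only.
func demo() (before, after string, err error) {
    f, err := os.CreateTemp("", "mmap-demo")
    if err != nil {
        return "", "", err
    }
    defer os.Remove(f.Name())
    defer f.Close()

    // Give the file some content so there is something to map.
    if _, err := f.WriteAt([]byte("hello"), 0); err != nil {
        return "", "", err
    }

    // Read-only shared mapping, the same protection and sharing flags seen above.
    data, err := syscall.Mmap(int(f.Fd()), 0, 5, syscall.PROT_READ, syscall.MAP_SHARED)
    if err != nil {
        return "", "", err
    }
    defer syscall.Munmap(data)
    before = string(data)

    // The write goes through the file descriptor, never through the mapping.
    if _, err := f.WriteAt([]byte("j"), 0); err != nil {
        return "", "", err
    }
    after = string(data)
    return before, after, nil
}

func main() {
    before, after, err := demo()
    if err != nil {
        panic(err)
    }
    fmt.Println(before, after)
}
```

&lt;p&gt;On Linux, a shared mapping and regular file writes go through the same page cache, so the second read of the mapping should reflect the write. Assigning to &lt;code&gt;data[0]&lt;/code&gt; directly would crash the program, because the mapping is read-only.&lt;/p&gt;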

&lt;h2&gt;
  
  
  How is the database file structured?
&lt;/h2&gt;

&lt;p&gt;We know how and when &lt;strong&gt;bolt&lt;/strong&gt; allocates memory and that &lt;strong&gt;mmap&lt;/strong&gt; is not used for writes. But how, exactly, does &lt;strong&gt;bolt&lt;/strong&gt; find the value of a key? To understand that, we have to understand how databases typically structure their files. I am not going to go deep here, mostly because I don't understand enough to go deep. I am just going to try to give a glimpse of what is going on.&lt;/p&gt;

&lt;p&gt;A file is just an array of bytes. We have to impose some structure on this array of bytes to work with it effectively. Databases organize their files on disk into blocks (chunks of bytes) called &lt;strong&gt;pages&lt;/strong&gt;. &lt;strong&gt;bolt&lt;/strong&gt; is no different. The database file can be pictured as a sequence of pages:&lt;br&gt;
&lt;a href="/content/images/2021/01/image--17-.png" class="article-body-image-wrapper"&gt;&lt;img src="/content/images/2021/01/image--17-.png" alt=""&gt;&lt;/a&gt;&lt;br&gt;
Each page has a &lt;strong&gt;fixed length in bytes&lt;/strong&gt;, typically the same size as the OS page (usually &lt;code&gt;4096 bytes&lt;/code&gt;). Here is the &lt;a href="https://github.com/boltdb/bolt/blob/fd01fc79c553a8e99d512a07e8e0c63d4a3ccfc5/db.go#L40"&gt;part&lt;/a&gt; of &lt;strong&gt;bolt&lt;/strong&gt; that sets the &lt;code&gt;pageSize&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// default page size for db is set to the OS page size.
var defaultPageSize = os.Getpagesize()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Every database has its own page layout. The page layout of &lt;strong&gt;bolt&lt;/strong&gt; is defined at &lt;a href="https://github.com/boltdb/bolt/blob/fd01fc79c553a8e99d512a07e8e0c63d4a3ccfc5/page.go#L28"&gt;page.go&lt;/a&gt; as&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;const (
    branchPageFlag   = 0x01
    leafPageFlag     = 0x02
    metaPageFlag     = 0x04
    freelistPageFlag = 0x10
)

const (
    bucketLeafFlag = 0x01
)

type pgid uint64

type page struct {
    id       pgid
    flags    uint16
    count    uint16
    overflow uint32
    ptr      uintptr
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;id&lt;/code&gt;: the page identifier, used to index the page. Given a page id, I can locate the page on disk through &lt;strong&gt;mmap&lt;/strong&gt;, since the file is just a sequence of contiguous fixed-length pages;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;flags&lt;/code&gt;: it tells the page type. There are four types of page: &lt;code&gt;meta&lt;/code&gt;, &lt;code&gt;freeList&lt;/code&gt;, &lt;code&gt;leaf&lt;/code&gt; and &lt;code&gt;branch&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;count&lt;/code&gt;: indicates the number of elements stored in the page;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;overflow&lt;/code&gt;: the number of subsequent pages that continue this one, used when the content does not fit in a single page;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;ptr&lt;/code&gt;: indicates the end of page header and start of page data. This is where the keys and values are going to be stored.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A visual representation of a page:&lt;br&gt;
&lt;a href="/content/images/2021/01/image--16-.png" class="article-body-image-wrapper"&gt;&lt;img src="/content/images/2021/01/image--16-.png" alt=""&gt;&lt;/a&gt;Layout of a page&lt;br&gt;
With this in mind, we can look at the code that retrieves a &lt;a href="https://github.com/boltdb/bolt/blob/fd01fc79c553a8e99d512a07e8e0c63d4a3ccfc5/db.go#L792"&gt;page&lt;/a&gt; given its id.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// page retrieves a page reference from the mmap based on the current page size.&lt;br&gt;
func (db *DB) page(id pgid) *page {&lt;br&gt;
    pos := id * pgid(db.pageSize)&lt;br&gt;
    return (*page)(unsafe.Pointer(&amp;amp;db.data[pos]))&lt;br&gt;
}&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  How to perform an efficient search?
&lt;/h2&gt;

&lt;p&gt;We know how the database file is structured on disk and we know how to retrieve a page from disk. But how does a &lt;code&gt;bucket.Get([]byte("key"))&lt;/code&gt; search work? We are not going to go into too much detail here. I hope the abstraction I created will be enough to give an idea of what is going on.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;What if the pages themselves, in the data part, contained references to other pages? And what if these references build up to form a B+Tree?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is exactly what &lt;strong&gt;bolt&lt;/strong&gt; does. Think of each page as a node of a &lt;strong&gt;B+Tree&lt;/strong&gt;. In a &lt;strong&gt;B+Tree&lt;/strong&gt; we have internal nodes and leaves. That's the reason for the &lt;code&gt;flags&lt;/code&gt; attribute: it indicates what kind of node the page is.&lt;br&gt;
&lt;a href="/content/images/2021/01/image--18-.png" class="article-body-image-wrapper"&gt;&lt;img src="/content/images/2021/01/image--18-.png" alt=""&gt;&lt;/a&gt;Abstract representation of a B+Tree in &lt;strong&gt;bolt&lt;/strong&gt;&lt;br&gt;
So to search for a key, you start at the root node and do a B+Tree traversal over mmapped disk pages. In that sense, &lt;strong&gt;bolt&lt;/strong&gt; is a memory-mapped B+Tree file. The more memory you have, the more it will behave like an in-memory key/value store.&lt;/p&gt;

&lt;p&gt;There are many more details about this process. Most of it I don't understand myself. So let's just keep it simple at this abstract level. &lt;/p&gt;
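
&lt;p&gt;Staying at that abstract level, the traversal can be sketched with a toy tree held in a Go map. Everything here is hypothetical and much simpler than bolt's on-disk format: the &lt;code&gt;node&lt;/code&gt; type stands in for a page, the map key stands in for the page id, and each branch key is assumed to be the smallest key reachable through the matching child.&lt;/p&gt;

```go
package main

import (
    "fmt"
    "sort"
)

// node is a toy stand-in for a page. Branch nodes hold child page ids,
// leaf nodes hold values. This is not bolt's real layout.
type node struct {
    leaf     bool
    keys     []string // sorted
    children []int    // branch only: children[i] covers keys from keys[i] onward
    values   []string // leaf only: values[i] belongs to keys[i]
}

// get walks from the root page down to a leaf and looks up key there.
func get(pages map[int]node, root int, key string) (string, bool) {
    n := pages[root]
    for !n.leaf {
        // Find the first separator greater than key, then step back one
        // to land on the child whose range contains key.
        i := sort.Search(len(n.keys), func(i int) bool { return n.keys[i] > key })
        if i > 0 {
            i--
        }
        n = pages[n.children[i]]
    }
    // Binary search inside the leaf.
    i := sort.SearchStrings(n.keys, key)
    if i == len(n.keys) || n.keys[i] != key {
        return "", false
    }
    return n.values[i], true
}

func main() {
    pages := map[int]node{
        0: {keys: []string{"a", "m"}, children: []int{1, 2}},
        1: {leaf: true, keys: []string{"a", "c"}, values: []string{"1", "3"}},
        2: {leaf: true, keys: []string{"m", "z"}, values: []string{"13", "26"}},
    }
    fmt.Println(get(pages, 0, "m"))
    fmt.Println(get(pages, 0, "b"))
}
```

&lt;p&gt;In a memory-mapped database, the step &lt;code&gt;pages[id]&lt;/code&gt; would be the page lookup by offset shown earlier, so each hop in the loop touches at most one page of the file.&lt;/p&gt;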

&lt;h2&gt;
  
  
  Resizing mmap
&lt;/h2&gt;

&lt;p&gt;When I started looking at the source code, I searched for all calls to &lt;code&gt;mmap&lt;/code&gt;. The first is in the Open method, as explained in the beginning. The second is in the &lt;a href="https://github.com/boltdb/bolt/blob/fd01fc79c553a8e99d512a07e8e0c63d4a3ccfc5/db.go#L827"&gt;allocate&lt;/a&gt; method.&lt;/p&gt;

&lt;p&gt;When &lt;strong&gt;bolt&lt;/strong&gt; is writing, it needs to make sure the newly allocated pages still fit inside the mapped region. If the allocation would reach the end of the current mapping, it remaps the file with a larger size.&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// Resize mmap() if we're at the end.&lt;br&gt;
p.id = db.rwtx.meta.pgid&lt;br&gt;
var minsz = int((p.id+pgid(count))+1) * db.pageSize&lt;br&gt;
if minsz &amp;gt;= db.datasz {&lt;br&gt;
    if err := db.mmap(minsz); err != nil {&lt;br&gt;
        return nil, fmt.Errorf("mmap allocate error: %s", err)&lt;br&gt;
    }&lt;br&gt;
}&lt;br&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;Reading &lt;strong&gt;bolt&lt;/strong&gt;'s source code is a very nice way of understanding database internals. It was not as intimidating as I thought it would be. We ignored most of its source code, trying to focus only on how &lt;strong&gt;mmap&lt;/strong&gt; is used to retrieve data from disk efficiently. There are many more concepts we can learn from &lt;strong&gt;bolt&lt;/strong&gt;, like transactions, atomicity, isolation, and concurrency control, but I'll leave those for other posts.&lt;/p&gt;

&lt;p&gt;It is important to remember that the strategy used by &lt;strong&gt;bolt&lt;/strong&gt; is just one of multiple strategies. Other databases use different page layouts and different data structures. However, at a higher level, I would guess the logic of mmapped databases is much the same.&lt;/p&gt;

&lt;p&gt;I am on a journey to learn more about databases. If you are interested you can follow me on &lt;a href="https://twitter.com/brunocalza"&gt;twitter&lt;/a&gt;, where I share more related content.&lt;/p&gt;

&lt;h3&gt;
  
  
  If you want to learn more about bolt
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://youjiali1995.github.io/storage/boltdb/"&gt;Boltdb source code analysis&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.jianshu.com/p/b86a69892990"&gt;BoltDB for block persistence (1)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.youtube.com/watch?v=bouIpFd9VGM"&gt;Go-nuts and Bolts: An Introduction to BoltDB&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  References
&lt;/h2&gt;

&lt;p&gt;[1]&lt;a href="https://sqlite.org/mmap.html"&gt;Memory-Mapped I/O&lt;/a&gt;&lt;br&gt;
[2]&lt;a href="https://groups.google.com/g/leveldb/c/C5Hh__JfdrQ"&gt;mmap based writing vs. stdio based writing&lt;/a&gt;&lt;br&gt;
[3]&lt;a href="https://jprante.github.io/lessons/2012/07/26/Mmap-with-Lucene.html"&gt;Memory-mapped files with Lucene: some more aspects&lt;/a&gt;&lt;br&gt;
[4]&lt;a href="https://symas.com/performance-tradeoffs-in-lmdb/"&gt;PERFORMANCE TRADEOFFS IN LMDB&lt;/a&gt;&lt;br&gt;
[5]&lt;a href="https://www.youtube.com/watch?v=ttebJcN5bgQ"&gt;Marty Schoch - Building a High-Performance Key/Value Store in Go&lt;/a&gt;&lt;br&gt;
[6]&lt;a href="https://docs.mongodb.com/manual/core/storage-engines/"&gt;Storage Engines&lt;/a&gt;&lt;/p&gt;

</description>
      <category>go</category>
      <category>database</category>
    </item>
    <item>
      <title>Discovering and exploring mmap using Go</title>
      <dc:creator>Bruno Calza</dc:creator>
      <pubDate>Thu, 14 Jan 2021 20:01:07 +0000</pubDate>
      <link>https://dev.to/brunocalza/discovering-and-exploring-mmap-using-go-54h1</link>
      <guid>https://dev.to/brunocalza/discovering-and-exploring-mmap-using-go-54h1</guid>
      <description>&lt;p&gt;Recently I've come to know the concept of &lt;strong&gt;memory-mapped files&lt;/strong&gt; while watching a lecture of the course &lt;a href="https://15445.courses.cs.cmu.edu/fall2019/"&gt;Intro to Database Systems&lt;/a&gt; of &lt;a href="https://twitter.com/andy_pavlo"&gt;Andy Pavlo&lt;/a&gt; on database storage. One of the main problems a database storage engine has to solve is &lt;strong&gt;how to deal with data in disk that is bigger than the available memory&lt;/strong&gt;. At a higher level, the main purpose of a disk-oriented storage engine is to manipulate data files in a disk. But if we assume that the data in the disk will eventually get bigger than the available memory, we cannot simply load the whole data file into memory, do the change, and write it back to disk.&lt;/p&gt;

&lt;p&gt;This is not a new problem in Computer Science. When operating systems were being developed in the early 1960s, a similar problem was faced: &lt;strong&gt;how can we run programs stored on disk that are larger than the available memory?&lt;/strong&gt; A solution to this problem was devised by a group in Manchester and implemented on the &lt;a href="https://en.wikipedia.org/wiki/Atlas_(computer)"&gt;Atlas Computer&lt;/a&gt; in 1961. It was called &lt;em&gt;virtual memory&lt;/em&gt;. &lt;em&gt;Virtual memory&lt;/em&gt; gives a running program the illusion that it has enough memory, even when the computer does not.&lt;/p&gt;

&lt;p&gt;We are not going to go deep into how &lt;em&gt;virtual memory&lt;/em&gt; works. Just keep in mind that when a program accesses memory, it is accessing &lt;em&gt;virtual memory&lt;/em&gt;. The data the program is trying to access may not actually be in memory, but that does not matter. The operating system will pretend that it is by fetching it from disk and placing it in memory, replacing an old chunk of memory that is not being used.&lt;/p&gt;

&lt;p&gt;So, one of the ways a database storage engine can solve the larger than memory problem is to make use of &lt;em&gt;virtual memory&lt;/em&gt; and the concept of &lt;strong&gt;memory-mapped files&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;In Linux, we can do this with the system call &lt;a href="https://man7.org/linux/man-pages/man2/mmap.2.html"&gt;mmap&lt;/a&gt;, which lets you map a file, no matter how big, directly into memory. If your program needs to manipulate the file, all it needs to do is manipulate the memory. The operating system handles the writes to disk for you.&lt;/p&gt;

&lt;p&gt;On some occasions, programmers find this method more convenient than the usual system calls: &lt;a href="https://man7.org/linux/man-pages/man2/open.2.html"&gt;open&lt;/a&gt;, &lt;a href="https://man7.org/linux/man-pages/man2/read.2.html"&gt;read&lt;/a&gt;, &lt;a href="https://man7.org/linux/man-pages/man2/write.2.html"&gt;write&lt;/a&gt;, &lt;a href="https://man7.org/linux/man-pages/man2/lseek.2.html"&gt;lseek&lt;/a&gt; and &lt;a href="https://man7.org/linux/man-pages/man2/close.2.html"&gt;close&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  A simple demonstration
&lt;/h3&gt;

&lt;p&gt;Here is a small example of how you can take advantage of this in Go using the package &lt;a href="https://github.com/edsrzf/mmap-go"&gt;mmap-go&lt;/a&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package main

import (
    "os"
    "fmt"
    "github.com/edsrzf/mmap-go"
)

func main() {
    f, _ := os.OpenFile("./file", os.O_RDWR, 0644)
    defer f.Close()

    mmap, _ := mmap.Map(f, mmap.RDWR, 0 )
    defer mmap.Unmap()
    fmt.Println(string(mmap))

    mmap[0] = 'X'
    mmap.Flush()
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--smW-4i7E--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://asciinema.org/a/pRS8PvTRHksnCVQgSOWvPBF3a.svg" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--smW-4i7E--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://asciinema.org/a/pRS8PvTRHksnCVQgSOWvPBF3a.svg" alt="asciicast"&gt;&lt;/a&gt;&lt;br&gt;
The beauty is that we could have a much bigger file, and the solution would still work. We would not have to worry about managing memory in order to avoid it filling up.&lt;/p&gt;

&lt;h3&gt;
  
  
  Detailing &lt;em&gt;mmap&lt;/em&gt; capabilities
&lt;/h3&gt;

&lt;p&gt;We're going to explore more &lt;em&gt;mmap&lt;/em&gt; functionalities from the point of view of the API provided by &lt;a href="https://github.com/edsrzf/mmap-go"&gt;mmap-go&lt;/a&gt;. There are probably more features that the &lt;a href="https://godoc.org/golang.org/x/sys/unix#Mmap"&gt;native syscall&lt;/a&gt; provides that this library does not implement.&lt;/p&gt;

&lt;h4&gt;
  
  
  The &lt;code&gt;prot&lt;/code&gt; argument
&lt;/h4&gt;

&lt;p&gt;Here is the &lt;code&gt;mmap.Map&lt;/code&gt; signature&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func Map(f *os.File, prot, flags int) (MMap, error) 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Let's look at &lt;code&gt;prot&lt;/code&gt; first. The &lt;code&gt;prot&lt;/code&gt; argument lets you specify the protection level of your mapping: &lt;code&gt;RDONLY&lt;/code&gt;, &lt;code&gt;RDWR&lt;/code&gt;, and &lt;code&gt;EXEC&lt;/code&gt; are the options provided by &lt;code&gt;mmap-go&lt;/code&gt;. These levels are pretty straightforward: &lt;code&gt;RDONLY&lt;/code&gt; means you can only read from the mapping, &lt;code&gt;RDWR&lt;/code&gt; means you can also write, and &lt;code&gt;EXEC&lt;/code&gt; means you can execute code in that mapping. Here is the description of &lt;code&gt;prot&lt;/code&gt; from the Linux &lt;code&gt;man&lt;/code&gt; page:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;The prot argument describes the desired memory protection of the
mapping (and must not conflict with the open mode of the file).
It is either PROT_NONE or the bitwise OR of one or more of the
following flags:

PROT_EXEC
    Pages may be executed.

PROT_READ
    Pages may be read.

PROT_WRITE
    Pages may be written.

PROT_NONE
    Pages may not be accessed.
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In the &lt;a href="https://godoc.org/golang.org/x/sys/unix"&gt;unix package&lt;/a&gt;, those flags are: &lt;code&gt;unix.PROT_EXEC&lt;/code&gt;, &lt;code&gt;unix.PROT_READ&lt;/code&gt;, &lt;code&gt;unix.PROT_WRITE&lt;/code&gt; and &lt;code&gt;unix.PROT_NONE&lt;/code&gt;.&lt;/p&gt;
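
&lt;p&gt;As a quick illustration of combining these flags with a bitwise OR, here is a sketch that uses the standard &lt;code&gt;syscall&lt;/code&gt; package, which exposes constants with the same names on Unix-like systems. To keep it free of any file, it asks for an anonymous private mapping (&lt;code&gt;MAP_ANON&lt;/code&gt;), meaning plain memory with no file behind it. The &lt;code&gt;scratch&lt;/code&gt; function is a name made up for this example.&lt;/p&gt;

```go
package main

import (
    "fmt"
    "syscall"
)

// scratch maps one anonymous, private, read-write region of 4096 bytes,
// copies msg into it, and reads it back. Hypothetical helper; Unix only.
func scratch(msg string) (string, error) {
    mem, err := syscall.Mmap(-1, 0, 4096,
        syscall.PROT_READ|syscall.PROT_WRITE, // readable and writable
        syscall.MAP_ANON|syscall.MAP_PRIVATE) // no backing file, not shared
    if err != nil {
        return "", err
    }
    defer syscall.Munmap(mem)

    n := copy(mem, msg)
    return string(mem[:n]), nil
}

func main() {
    fmt.Println(scratch("hello"))
}
```

&lt;p&gt;Dropping &lt;code&gt;PROT_WRITE&lt;/code&gt; from the call would make the &lt;code&gt;copy&lt;/code&gt; crash the program, since the pages could then only be read.&lt;/p&gt;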

&lt;h4&gt;
  
  
  Experimenting with &lt;code&gt;PROT_EXEC&lt;/code&gt; flag
&lt;/h4&gt;

&lt;p&gt;I became intrigued by the &lt;code&gt;EXEC&lt;/code&gt; flag and wanted to see an example of how it works. I Googled it and could not find any example. So I searched GitHub for &lt;code&gt;PROT_EXEC&lt;/code&gt; and found a good example in &lt;code&gt;C&lt;/code&gt;: &lt;a href="https://github.com/onesmash/MMapExecDemo"&gt;MMapExecDemo&lt;/a&gt;. I replicated this example in &lt;code&gt;Go&lt;/code&gt; using &lt;code&gt;mmap-go&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The first step was to create a function that I wanted to be put in memory by &lt;code&gt;mmap&lt;/code&gt; allocation, compile it, and get its assembly opcodes.&lt;/p&gt;

&lt;p&gt;I created the &lt;code&gt;inc&lt;/code&gt; function in &lt;code&gt;inc.go&lt;/code&gt; file&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;package inc

func inc(n int) int {
    return n + 1
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;compiled it with &lt;code&gt;go tool compile -S -N inc.go&lt;/code&gt;, then got its assembly by calling &lt;code&gt;go tool objdump -S inc.o&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;func inc(n int) int {
  0x22b                 48c744241000000000      MOVQ $0x0, 0x10(SP)
        return n + 1
  0x234                 488b442408              MOVQ 0x8(SP), AX
  0x239                 48ffc0                  INCQ AX
  0x23c                 4889442410              MOVQ AX, 0x10(SP)
  0x241                 c3                      RET
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;With this, we can represent our function as bytes in our code&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;code := []byte{
        0x48, 0xc7, 0x44, 0x24, 0x10, 0x00, 0x00, 0x00, 0x00,
        0x48, 0x8b, 0x44, 0x24, 0x08,
        0x48, 0xff, 0xc0,
        0x48, 0x89, 0x44, 0x24, 0x10,
        0xc3,
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We allocate our memory with &lt;code&gt;mmap&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;memory, err := mmap.MapRegion(nil, len(code), mmap.EXEC|mmap.RDWR, mmap.ANON, 0)
if err != nil {
    panic(err)
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;In this call, we're using a more complete function called &lt;code&gt;MapRegion&lt;/code&gt; that lets you specify how much memory you are allocating (&lt;code&gt;Map&lt;/code&gt; allocates the size of the underlying file) and the offset of the file.&lt;/p&gt;

&lt;p&gt;In the beginning, we said that the main purpose of &lt;code&gt;mmap&lt;/code&gt; was to create a mapping between a file and memory. But in this call we are not indicating any file. &lt;code&gt;mmap&lt;/code&gt; can be used just as a regular memory allocator by passing &lt;code&gt;nil&lt;/code&gt; as the &lt;code&gt;*os.File&lt;/code&gt; argument and &lt;code&gt;mmap.ANON&lt;/code&gt; as the &lt;code&gt;flags&lt;/code&gt; argument. We will talk more about &lt;code&gt;mmap.ANON&lt;/code&gt;. Since we are not mapping any file, the offset is &lt;code&gt;0&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;So we have memory allocated with the same size as our code, &lt;code&gt;len(code)&lt;/code&gt;. Since we set the flag &lt;code&gt;mmap.RDWR&lt;/code&gt;, we can copy our &lt;code&gt;code&lt;/code&gt; to &lt;code&gt;memory&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;copy(memory, code)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;We have the code of our &lt;code&gt;inc&lt;/code&gt; function in memory. In order to execute it, we have to cast that memory address to a function with a signature that matches the signature of our compiled &lt;code&gt;inc&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
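&lt;pre class="highlight plaintext"&gt;&lt;code&gt;// In the gc compiler, a func value is a pointer to a structure whose first word is the code address.
// &amp;memory points at the slice header, whose first word is the address of our bytes.
memory_ptr := &amp;memory
ptr := unsafe.Pointer(&amp;memory_ptr)
inc := *(*func(int) int)(ptr) // reinterpret memory_ptr as a func(int) int value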
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;When we call &lt;code&gt;inc&lt;/code&gt;, we are executing the code we put in memory. That only works because of the flag &lt;code&gt;mmap.EXEC&lt;/code&gt;. If that flag were not set, a &lt;code&gt;segmentation violation&lt;/code&gt; would occur.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;fmt.Println(inc(10)) // Prints 11
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;I don't know if this is a real use case. I just wanted to see what it meant to execute code that you put in memory. And there are probably other ways of achieving the same with regular memory allocation and calls to &lt;a href="https://man7.org/linux/man-pages/man2/mprotect.2.html"&gt;mprotect&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;One question that may come up is: the code is already in the &lt;code&gt;code&lt;/code&gt; variable, so can't we just execute it? No, because the memory statically allocated to &lt;code&gt;code&lt;/code&gt; is not executable. Can we make it executable? I tried to use &lt;a href="https://man7.org/linux/man-pages/man2/mprotect.2.html"&gt;mprotect&lt;/a&gt; on it but still got a &lt;code&gt;segmentation violation&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Here is the full working &lt;a href="https://gist.github.com/brunoac/b9ff4ad46c27926e5e4f078133d0de79"&gt;gist&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  The &lt;code&gt;flags&lt;/code&gt; argument
&lt;/h4&gt;

&lt;p&gt;We can have many processes mapping the same memory region. This argument lets us decide about the visibility of the updates happening in the mapping. There are many flags, and you can check them out at &lt;a href="https://man7.org/linux/man-pages/man2/mmap.2.html"&gt;mmap&lt;/a&gt;. The important ones are &lt;code&gt;unix.MAP_SHARED&lt;/code&gt;, &lt;code&gt;unix.MAP_PRIVATE&lt;/code&gt; and &lt;code&gt;unix.MAP_ANON&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;MAP_SHARED&lt;/code&gt; means that changes to the mapping are visible to all processes mapping the same region and are also carried through to the underlying mapped file, although we cannot control when.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;MAP_PRIVATE&lt;/code&gt; means the changes are private and other processes will not see them. And also, they are not carried through to the underlying file.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;MAP_ANON&lt;/code&gt; means that there is not going to be a mapped file. It is useful for communication between a process and its sub-processes through shared memory.&lt;/p&gt;

&lt;p&gt;I got confused by the &lt;code&gt;mmap-go&lt;/code&gt; library implementation. For the &lt;code&gt;flags&lt;/code&gt; argument it only provides &lt;code&gt;mmap.ANON&lt;/code&gt;, which we used in the example above. If you want your mapping to be private, you set the &lt;code&gt;mmap.COPY&lt;/code&gt; flag in the &lt;code&gt;prot&lt;/code&gt; argument instead. In any case, you can always use the flags provided by the &lt;code&gt;unix&lt;/code&gt; package directly.&lt;/p&gt;

&lt;h4&gt;
  
  
  Locking and flushing
&lt;/h4&gt;

&lt;p&gt;Two other nice methods, &lt;code&gt;Lock&lt;/code&gt; and &lt;code&gt;Flush&lt;/code&gt;, are provided by the API of &lt;code&gt;mmap-go&lt;/code&gt;. The &lt;code&gt;Lock&lt;/code&gt; method calls the &lt;a href="https://man7.org/linux/man-pages/man2/mlock.2.html"&gt;mlock&lt;/a&gt; system call, which prevents the mapping from being paged out to disk. The &lt;code&gt;Flush&lt;/code&gt; method calls the &lt;a href="https://man7.org/linux/man-pages/man2/msync.2.html"&gt;msync&lt;/a&gt; system call, which forces the data in memory to be written to disk. Together they are a good way to get more control over how and when data is flushed to disk.&lt;/p&gt;

&lt;h3&gt;
  
  
  Wrapping up
&lt;/h3&gt;

&lt;p&gt;I felt kind of stupid for learning about &lt;code&gt;mmap&lt;/code&gt; only after so long. I don't remember it being brought up in my college classes. For some reason, I was amazed by it and its capabilities and decided to dig deeper. I like databases and I'm aiming to get a better grasp of them, which means &lt;code&gt;mmap&lt;/code&gt; cannot be left out of my learning. In future posts, I'll try to cover the benefits and drawbacks of using &lt;code&gt;mmap&lt;/code&gt;, which projects use it, and what kind of problems it is suited for.&lt;/p&gt;

&lt;p&gt;Even though &lt;code&gt;mmap&lt;/code&gt; can be used to solve the database problem we stated in the beginning, and many modern databases use it, &lt;a href="https://twitter.com/andy_pavlo"&gt;Andy Pavlo&lt;/a&gt; advocates against it and has three lectures on how databases that don't use &lt;code&gt;mmap&lt;/code&gt; manage data.&lt;/p&gt;

&lt;p&gt;If you like this kind of content, follow me on &lt;a href="https://twitter.com/brunocalza"&gt;Twitter&lt;/a&gt;. You may find more related stuff there.&lt;/p&gt;

</description>
      <category>go</category>
      <category>database</category>
      <category>virtualmemory</category>
      <category>mmap</category>
    </item>
  </channel>
</rss>
