Paula Gearon

Posted on Aug 19, 2019 • Edited on Apr 22, 2020

Buffering Up

Part 4 in this meandering journey toward durable data structures. Part 3 was here, or go back to the beginning.

A word of warning: like the previous post, this one is also very code-heavy. I explain it all in detail, so it may feel like it bogs down. But this also forms the basis of a lot of work moving forward, so I think it's important to be explicit in every aspect.

Buffers

Today I'm going to switch gears and move to Java. There are several reasons for this:

Java is a higher-level language than C. I was tempted to show strings in past posts but figured I'd end up spending far too much time explaining it, with no benefit. Suffice to say that no one wants to get into why we want strncat instead of strcat, using largish buffers for holding strings temporarily, and possibly having to allocate larger buffers as your string grows in size.
Java is an Object-Oriented language. This allows objects to store some of the context, while C needed to explicitly pass context around. Objects are also referred to by reference, and pointers disappear.
Java has a type of class called a Buffer. These can represent large blocks of memory, just like we got from malloc in C.
Java can't layer records over a buffer like C can (or Erlang. This is a lovely feature in Erlang if you ever get to use that language). But it does provide addressed access to the buffer, with methods like getLong and getByte. It takes more work, but we can still get the functionality we want.
Java buffers are designed specifically for I/O, meaning that they can be sent to disk easily.
Java supports memory mapping files, and buffers are the interface for doing this.
Anything that can be done in Java can be done in any other language that runs on the Java Virtual Machine (JVM). This includes one of my target languages: Clojure.

My other targeted system is Javascript, so for anyone who isn't interested in the JVM, please be assured that I will get there! Eventually.

Getting a Buffer

In C we used the malloc call. We can do something similar in Java by asking for a buffer of the correct size. The size to ask for was the size of each element multiplied by the number of elements. In C it was easy to get the size of the structure with the sizeof operator, but no such operator exists in Java. This is partly because objects in Java always have extra space allocated for methods and type information, so the memory usage doesn't relate directly to the data being stored as it does in C. This means that we need to figure out the size for ourselves.

In this case, we're going to use 2 integers for the elements: the next pointer, and the value of the element. That's 2 * sizeof(int), except we don't have a sizeof operation. That's OK because for basic types the sizes are included:
int ELEMENT_SIZE = 2 * Integer.BYTES

We can then allocate a ByteBuffer of 10 elements with the static allocate method on ByteBuffer:

ByteBuffer buffer = ByteBuffer.allocate(ELEMENT_SIZE * 10);

Java is an object-oriented language. That means that we can use an object to hold the management of the buffer and allocation of elements for us. I'm going to call this object BufferList since it will manage a list that is represented in a buffer. This object is also the natural place to store the constant values that we need, like the array size, and offsets for the parts of the Element objects.

The first part of an Element is the integer holding the pointer for next. Since this is at the start of the object, it will be offset from the object by 0. The second part is the integer containing the value. This will be offset from the start of the object by the size of 1 integer. We can do this in bytes (1 * Integer.BYTES, which is 4 bytes), or if we present the same data as integers then we can represent it as an offset of 1.

So here is the start of our BufferList class:

  public static int ELEMENT_SIZE = 2 * Integer.BYTES;
  public static int NULL = -1;

  private static int NEXT_OFFSET = 0;
  private static int VALUE_OFFSET = 1;

  private ByteBuffer buffer;
  private int nextAvailable;

  public BufferList() {
    this(10);
  }

  public BufferList(int length) {
    buffer = ByteBuffer.allocate(ELEMENT_SIZE * length);
    nextAvailable = 0;
  }

This gives us the buffer to work with (just like the malloc in C), and also the size of each element and the locations of the fields within each element. We are handling it manually in Java whereas C had facilities to do it for us, but we are still getting the same effect.

Buffer Management

The BufferList object is not just holding the buffer; it is also managing it. To do this, it remembers the latest place that it can allocate space from using the nextAvailable value. The constructor initializes this to the start of the buffer, which is offset 0.

In our Javascript linked list we created a new element to add to the front of the list by using the Element constructor. To do the same thing in C, we had a helper function that allocated the required memory and set the values, this was then updated to a function that captured the required memory from the overall buffer.

To do the same thing in Java, we can introduce an addToHead function. This will call the Element constructor the same way as the Javascript code did, but it can also provide the context of the allocated buffer and the location of the next free space to use. An additional benefit of Java is method overloading, so an extra addToHead method can be created that doesn't need a list provided. This is just a shortcut to create a single element list, and just calls the main version of the method with a null value for the next part of the list.

  public Element addToHead(int value) {
    return addToHead(null, value);
  }

  public Element addToHead(Element nextElt, int value) {
    if (nextAvailable * ELEMENT_SIZE >= buffer.capacity()) {
      throw new RuntimeException("Out of memory");
    }
    return new Element(nextAvailable++, nextElt, value);
  }

The first thing that the addToHead method does is to ensure that the buffer isn't already full. That's not going to be the case in this code, since there is space for 10 elements, and I only ask for 9, but I get uncomfortable not doing the check. (I should have done it with the C code as well!)

We can see that the main work is done in the final line. It uses the current offset for the next element that can be allocated. It passes this, along with the arguments along to the Element constructor. After this, the next available element index is incremented. The buffer is not explicitly mentioned here, as it will be passed along automatically in the context of the BufferList class. We'll see how that happens now.

Elements

Since Element objects will always be associated with the buffer it was created with, then it makes sense for Element objects to be associated with the buffer manager. Java supports this by allowing the Element class to be an inner class of BufferList. This way it has automatic access to all of the static constants, and the allocated buffer.

The best way for an Element to be associated with its associated data is to take a slice of the buffer that refers to the relevant data. This is done by setting the position of the buffer as the start of the slice, and the limit as the end. Each method for these operations returns the modified buffer, so the operations can be chained:
buffer.position(offset).limit(offset + ELEMENT_SIZE).slice()

The data for these objects is going to consistently represent integers, so the bytes can be reinterpreted as integers by representing them in an IntBuffer.

In the C example, the element was constantly being derived out of the buffer from its index. In Java, the Element objects can get everything it needs during construction, which means that it will lose the index of the element if we don't save it, so that goes into an index field.

We also want a few different constructors.

Constructors

The first constructor takes just the index of the element. This allows it to capture the appropriate slice of the buffer, wrapping any values that may already be there. This is how we set up a fresh element but is also how we can read an existing element.

The next constructor uses the previous one to capture the appropriate slice from the main buffer and then sets the next and value fields. Note that since this code is using Element objects, then the next need not be passed as an element index value in the way the C code did it. Instead, it expects an actual element.

The final constructor is just a convenience that doesn't have a next value, indicating that it is just the element at the end of the list. All it does it call another constructor using a null value for the next element.

  Element(int index) {
    this.index = index;
    int offset = index * ELEMENT_SIZE;
    intBuffer = buffer.position(offset).limit(offset + ELEMENT_SIZE).slice().asIntBuffer();
  }

  Element(int index, Element next, int value) {
    this(index);
    setNext(next);
    setValue(value);
  }

  Element(int index, int value) {
    this(index, null, value);
  }

At this point, we have Element objects that wrap data from the buffer. So now we need to provide getter and setter methods that access that data.

Getters

The method for getIndex just references the index provided when the Element object was created. The getNext and getValue methods are more interesting.

getValue reads the integer from the offset for values, which was defined as VALUE_OFFSET.

getNext is different, as it needs to return an Element object. It reads the integer at the NEXT_OFFSET as the index for the element to be read. If this number is NULL (defined as -1) then no object is returned. Otherwise, the first constructor is called, creating an object that reads from the given position.

  public int getIndex() {
    return index;
  }

  public int getValue() {
    return intBuffer.get(VALUE_OFFSET);
  }

  public Element getNext() {
    int nextId = intBuffer.get(NEXT_OFFSET);
    return nextId == NULL ? null : new Element(nextId);
  }

Setters

The index of the element never changes, so it does not need a setter method. This means that only setNext and setValue need to be defined. Again, the next value is set to NULL (-1) if the provided object is null. Note that this is returned from setters, so that multiple setters can be called in a row, in the same way that the slice was done in the first constructor. This isn't necessary, but it's a useful standard to follow.

  public Element setValue(int value) {
    intBuffer.put(VALUE_OFFSET, value);
    return this;
  }

  public Element setNext(Element next) {
    intBuffer.put(NEXT_OFFSET, next == null ? NULL : next.getIndex());
    return this;
  }

Printing

Just like we did in the Javascript code for listString, we can traverse the list recursively. Unlike Javascript, converting numbers to strings with the + operator only ever occurs if the value on the left of the + sign is already a string. This isn't the case here, so we use the Integer.toString() method to do that for us. Otherwise, the function is quite similar to the Javascript function listString.

public static String listString(Element element) {
  Element next = element.getNext();
  String valueString = Integer.toString(element.getValue());
  if (next == null) return valueString;
  else return valueString + ", " + listString(next);
}

The listString function does not reference the buffer itself, instead using the current element to obtain its information. Because of this, the method can be static, meaning that it doesn't need to access any of the values that are local to its context. It doesn't really matter, but it's slightly more efficient.

We can then define a toString method on Elements to call this.

public String toString() {
  return listString(this);
}

Building a List

With the buffer management in BufferList and the Element objects defined, the code for building a list is quite similar to what we built in other languages.

BufferList bufferList = new BufferList(10);
BufferList.Element list = bufferList.addToHead(34);
list = bufferList.addToHead(list, 21);
list = bufferList.addToHead(list, 13);
list = bufferList.addToHead(list, 8);
list = bufferList.addToHead(list, 5);
list = bufferList.addToHead(list, 3);
list = bufferList.addToHead(list, 2);
list = bufferList.addToHead(list, 1);
list = bufferList.addToHead(list, 1);
System.out.println(list);

Running it all gives exactly the same result that we saw in previous iterations

~/src$ java buffer.ListExample
1, 1, 2, 3, 5, 8, 13, 21, 34

The complete java code for this program can be found here. It's all in a package called buffer, so don't forget to put it in a directory of that name if you want to compile it.

Recap

This post showed how the principles demonstrated when storing a Linked List structure in a memory buffer can be translated into another language. The tools of directly mapping the memory into the desired records may be missing, but object definitions can be used to fill this gap with getters and setters. At the same time, the encapsulation of context that objects provide can hide some of the manual aspects of managing the memory buffer.

Up Next

The astute reader may have noticed that buffers are in a Java package called nio. This stands for "New I/O" and refers to a library of operations for efficiently storing and retrieving data, both on the network and on disk. We will start looking at how to save and retrieve the data that we have been working with, all in the next post.

DEV Community