This post has moved to blog.zachklipp.com.
Well, so much for blogging every day. Better late than never, right? Anyway, I have a good excuse: I’ve been super busy writing code and tests, and it’s not going terribly. I'm calling this day 2.5 since I was up a large part of the night working but haven't done anything today yet.
Progress
Standup time. Here’s what I’ve done so far:
-
Built an abstract class that:
-
Adds a mutating method to
CharSequence
:
fun replace(start: Int, end: Int, text: CharSequence)
- Adds a method for efficiently copying parts of the buffer into other character arrays:
fun getChars(start: Int, end: Int, dest: CharArray, destOffset: Int)
Implements the rest of
CharSequence
on top of a length property and thegetChars
method, so implementers only need to implement 3 things.Returns a mutable instance of the same from
CharSequence.subSequence
, so it can act as a sort of mutating lense into the original sequence and not just a cheap read-only view.Supports platform-specific
GetChars
-like interfaces via an empty commonexpect interface
, and on Android uses this to extendGetChars
.
-
-
Built three concrete implementations of this abstract class:
- A basic piece table that doesn't support snapshots, both as a way to figure out how to actually do that and as a control to benchmark the snapshot-aware implementations.
- A piece table that supports snapshots in a very simplest and naïve way that doesn't try to be efficient but was very quick to build and is easier to reason about, and also acts as an upper-bound control for a more optimized implementation. It doesn't have an actual add buffer, it just stores the inserted texts directly in their pieces. Thanks to Alex Vanyo for suggesting this implementation!
- A snapshot-aware piece table that uses the tricks I described in my first post. It's a lot more complicated, but thanks to an extensive unit test suite shared with the other implementations (see below) I'm reasonably confident it actually works.
Taken the extensive unit test suite for the existing
GapBuffer
implementation, added a bunch of tests for the additionalCharSequence
contracts, added more tests for snapshot isolation, and parameterized the test suite to run over all three concrete implementations.Also copied the
GapBuffer
benchmarks and parametrized them to run all three implementations.
Learnings
GetChars
It's really a shame that neither Java nor Kotlin have a standard interface that lets a type declare that it supports efficient multi-character copying. The getChars
method on my piece table abstract class allows efficient copying of a slice of a CharSequence
into any part of a CharArray
. It's much more efficient for copying large strings than simply calling CharSequence.get()
for every single character. It takes its signature from what seems to have become the de facto standard for such a method, appearing on String
, StringBuilder
, and Android's GetChars
interface. Unfortunately, outside of that Android-specific one, there's no standard Kotlin or JVM interface that things can implement to declare they support this functionality. To make use of it for CharSequence
s of unknown types you have to manually check for a set of known types, and potentially fallback to iterating with get
. Android's TextUtils
class has a static method that will do this for the types listed above, but that method isn't available on any other platforms or for host-side unit tests. If only Kotlin supported typeclasses…
Gap buffer vs piece table
The Jetpack Microbenchmark library is really nice. Compose uses it extensively, and there were already benchmarks for the gap buffer implementation, so it's been easy to compare my implementations.
The non-snapshot piece table's performance is actually pretty close to the gap buffer on most benchmarks. It's better in some cases (dumping the buffer to a String
and random inserts on large buffers), but worse in others (contiguous replace operations, any replace operations on small buffers). That makes sense since moving the gap in a large buffer could mean shifting up to the entire character buffer, whereas in the piece table it only requires shifting the list of pieces. Array copies in small buffers are very quick but the piece table still has to pay the overhead of managing pieces, so I think a hybrid approach where the piece table uses a different data structure for small buffers might help in a lot of common cases.
Interestingly, the piece table is also a lot more consistent. The wall clock numbers for small and large buffer benchmarks are almost identical, where as the gap buffer starts small but takes a huge performance hit for random disconnected writes as the buffer grows.
Allocations are a lot worse for piece tables in every write case, which makes sense since it needs to allocate objects for pieces, which the gap buffer does not need to do.
Snapshot overhead
The naïve snapshot-aware piece table and the one that more closely matches the non-snapshot implementation are much closer in performance than I was expecting so far. They're also both much worse than the non-snapshot piece table, which I'm less surprised about since I haven't really done much optimizing yet. I'm planning to dig into this today and hopefully make some significant improvements.
That said, even the slowest snapshot-aware piece table numbers might be Good Enough for text fields, since the benchmark numbers are still smaller than a hundredth of a millisecond, and by plumbing a single buffer through all layers of the text field API we should also avoid a bunch of work that we're currently doing to copy the buffer to/from strings, which isn't being accounted for in these more focused benchmarks.
KotlinX persistent collections
I knew Compose used the kotlinx.collections.immutable library under the hood to implement SnapshotStateList
and friends, but I didn't realize that we actually embed the library's source itself. The reason, which seems obvious when you think about it, is that the public library is still not stable, so Compose can't have a dependency on it from its stable releases. This sucks, because Compose doesn't want to expose its fork of this library so it's internal in the runtime:runtime
module. And all this text stuff I'm doing is in the ui:ui-text
module, so I've had to make a second copy of the collections library. If we ever ended up shipping this, we could extract our fork into a separate Gradle module and mark everything with a special @OptIn
annotation to keep third-party code from accidentally using it. But persistent collections are a really useful tool and I really hope kotlinx.collections.immutable
hits 1.0 soon.
Next up
Here's what I'm planning to do today:
- Add a stress test for contention among multiple writers on different threads. The snapshot isolation tests I already wrote cover most of the concurrency handling behaviors, but I realized they don't ensure that the trick of directly mutating the last segment in the add buffer is synchronized correctly.
- Improve how the actual pieces are managed: there's a lot of room for implementations to be "correct" but still accumulate a lot of unnecessary fragmentation in the piece table. Fragmented piece tables will be slower to process for both reads and writes, since both require iterating over, in the worst case, the entire table.
- Also add unit tests for this.
- Dig into the Perfetto traces generated from the benchmarks (again, Jetpack Microbenchmark is awesome) to try to optimize things more and get closer to the existing gap buffer performance.
- Explore using a hybrid approach where small add buffers (less than the size of a full add buffer segment) are stored in a simpler data structure, maybe even just an array or immutable gap buffer that we copy for every mutation.
- Small buffers are probably overwhelmingly the most common case, since most text fields on mobile apps don't end up containing that much text.
- Copying small arrays is actually very fast and so I'm curious if this will be faster than the overhead of managing pieces and complex persistent data structures.
- It might also be more space efficient since text fields that only hold zip codes really shouldn't be allocating more than a few tens of bytes.
- Explore using a linked list for the piece table instead of a
PersistentList
, since the list only ever needs to be traversed in-order, sequentially, from the start (the current implementation uses indices, but that shouldn't be hard to change).
Top comments (0)