This is a personal note about how I tracked down and fixed a double-free bug caused by Crystal’s garbage collector interacting with libxml2. I used gdb and valgrind to trace the issue, understand where memory was allocated and freed, and eventually identify the root cause. I am not an advanced user of these tools, so this write-up serves as a reminder of the steps I took and what I learned along the way.
The Problem
For a few days, some of my tests had been crashing intermittently against the nightly builds of Crystal:
$ bin/drar_test --seed 6690 --verbose --parallel 1
free(): double free detected in tcache 2
- The crashes were non-deterministic: they didn’t always occur locally, and sometimes didn’t even appear in CI.
- The error message itself wasn’t very helpful at first, and I wasn’t sure where the issue was coming from.
Step 1: Reproducing the Issue
I started by reproducing the failure locally:
- I used the same app configuration as in CI.
- I tested different --seed arguments until I found a seed that reliably triggered the crash.
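The seed search above can also be scripted. This is only a minimal sketch, assuming the test binary lives at bin/drar_test and accepts the --seed and --parallel flags shown earlier; the helper name failing_seeds and the seed range are made up for illustration and were not part of my original workflow:

```crystal
# Hypothetical helper: runs `command` once per seed and returns the seeds
# for which the process did not exit successfully (for example, because it
# was aborted with SIGABRT).
def failing_seeds(command : String, seeds : Enumerable(Int32), extra_args = [] of String) : Array(Int32)
  seeds.select do |seed|
    status = Process.run(
      command,
      ["--seed", seed.to_s] + extra_args,
      output: Process::Redirect::Close,
      error: Process::Redirect::Close
    )
    !status.success?
  end
end

# Usage (assumed binary path and seed range):
# puts failing_seeds("bin/drar_test", 1..200, ["--parallel", "1"])
```

Because the crash depends on GC timing, a seed that fails once may pass on the next run, so any candidate seed is worth re-running a few times before relying on it.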
Step 2: Initial Investigation with GDB
Since the error originated from free(), I wanted to see what was happening at the crash:
$ crystal --version
Crystal 1.17.0 [d2c705b53] (2025-07-16)
$ crystal build --stats --threads 1 --time -o bin/drar_test ./test/ext/std/openssl/cipher_test.cr ...
$ gdb --args bin/drar_test --seed 6690 --verbose --parallel 1
> run
Run options: --seed 6690 --verbose --parallel 1...
free(): double free detected in tcache 2
Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44 return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;
At the crash, I used bt to print the backtrace:
> bt
...
#6 0x00007ffff7258ad8 in tcache_double_free_verify (e=<optimized out>) at malloc.c:3350
#7 0x00007ffff7e7413b in xmlFreeNodeList (cur=0x1da4db00) at /usr/src/debug/libxml2-2.12.10-5.fc43.x86_64/tree.c:3662
#8 0x00007ffff7e73e68 in xmlFreeDoc (cur=0x1da4b710) at /usr/src/debug/libxml2-2.12.10-5.fc43.x86_64/tree.c:1212
#9 0x0000000003e480a0 in finalize () at /home/miry/src/crystal/crystal/src/xml/document.cr:67
#10 0x0000000000482b86 in -> () at /home/miry/src/crystal/crystal/src/gc/boehm.cr:340
#11 0x00007ffff73cc517 in GC_invoke_finalizers () at extra/../finalize.c:1255
#12 0x00007ffff73cc801 in GC_notify_or_invoke_finalizers () at extra/../finalize.c:1342
#13 GC_notify_or_invoke_finalizers () at extra/../finalize.c:1282
#14 0x00007ffff73d8e77 in GC_generic_malloc_many (lb=<optimized out>, k=1, result=0x7ffff750b130 <first_thread+496>)
at extra/../mallocx.c:336
#15 0x00007ffff73e67b5 in GC_malloc_kind (bytes=<optimized out>, kind=<optimized out>) at extra/../thread_local_alloc.c:187
- The backtrace revealed that Crystal’s GC was finalizing an XML::Document and calling xmlFreeDoc.
- This was my first clue that the crash involved XML nodes being freed twice.
Step 3: Using Valgrind
I then ran the same binary under Valgrind:
$ valgrind --track-origins=yes --leak-check=full bin/drar_test 2> valgrind.logs
In the logs, I found:
==232953== Invalid free() / delete / delete[] / realloc()
==232953== at 0x1E2C6E43: free (vg_replace_malloc.c:990)
==232953== by 0x1E33B39B: xmlFreePropList (tree.c:2052)
==232953== by 0x1E33B39B: xmlFreePropList (tree.c:2047)
==232953== by 0x1E33B0A7: xmlFreeNodeList (tree.c:3638)
==232953== by 0x1E33AE67: xmlFreeDoc (tree.c:1212)
==232953== by 0x3EAB88F: *XML::Document#finalize:Nil (document.cr:67)
...
==232953== Address 0x1f4b6780 is 0 bytes inside a block of size 96 free'd
==232953== at 0x1E2C6E43: free (vg_replace_malloc.c:990)
==232953== by 0x1E33B39B: xmlFreePropList (tree.c:2052)
==232953== by 0x1E33B39B: xmlFreePropList (tree.c:2047)
==232953== by 0x1E33B0A7: xmlFreeNodeList (tree.c:3638)
==232953== by 0x1E33AE67: xmlFreeDoc (tree.c:1212)
==232953== by 0x3EAB88F: *XML::Document#finalize:Nil (document.cr:67)
...
==232953== Block was alloc'd at
==232953== at 0x1E2C3B26: malloc (vg_replace_malloc.c:447)
==232953== by 0x1E3374C5: xmlSAX2AttributeNs (SAX2.c:1880)
==232953== by 0x1E3393E8: xmlSAX2StartElementNs (SAX2.c:2299)
==232953== by 0x1E3289F1: xmlParseStartTag2.constprop.0 (parser.c:10091)
==232953== by 0x1E328EBB: xmlParseElementStart (parser.c:10473)
==232953== by 0x1E32AF84: xmlParseElement (parser.c:10406)
==232953== by 0x1E32B267: xmlParseDocument (parser.c:11190)
==232953== by 0x1E332F38: xmlDoRead (parser.c:14835)
==232953== by 0x3EAAE19: *XML::parse<String>:XML::Document (xml.cr:61)
==232953== by 0xFF0FF3: *ActionText::RichText#to_html:String (rich_text.cr:41)
==232953== by 0x41F7A90: *ActionText::RichTextTest#test_render_html_with_image_and_tags:Bool (rich_text_test.cr:70)
==232953== by 0x4B1003: ~proc223Proc(Minitest::Test, Nil)@lib/minitest/src/runnable.cr:17 (runnable.cr:17)
- Valgrind confirmed the double free: the same address (0x1f4b6780) was freed twice, both times from XML::Document#finalize, which the garbage collector invokes.
- It also showed the allocation site: the block was allocated during XML parsing (xmlSAX2AttributeNs), reached from XML.parse in rich_text.cr:41. This matched the findings from GDB and pointed at my own code.
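Finding the relevant addresses in a long Valgrind log can be scripted. The sketch below assumes the log lines look like the excerpt above ("Address 0x... is N bytes inside a block of size M free'd"); the helper name double_freed_addresses is hypothetical, and other Valgrind versions may word the line differently:

```crystal
# Hypothetical helper: collects the addresses that Valgrind reports as
# already freed, based on the "Address ... free'd" line format seen in
# the log excerpt above.
def double_freed_addresses(log : String) : Array(String)
  pattern = /Address (0x[0-9a-fA-F]+) is \d+ bytes inside a block of size \d+ free'd/
  addresses = [] of String
  log.each_line do |line|
    if match = line.match(pattern)
      addresses << match[1]
    end
  end
  addresses.uniq
end

# Usage (assumes the log file written by the valgrind command above):
# puts double_freed_addresses(File.read("valgrind.logs"))
```

Each address it returns can then be searched for in the log to find the matching free and allocation stacks.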
Step 4: Root Cause
The root cause turned out to be my own custom bindings.
I had extended Crystal's XML module with an extra binding from libxml2:
@[Link("xml2")]
lib LibXML
  fun xmlAddChild(parent : Node*, child : Node*)
end

class XML::Node
  # Links `child` under this node as-is, without copying or detaching it first.
  def add_child(child : Node)
    LibXML.xmlAddChild(self, child)
  end
end
The actual issue appeared in the way I was using it:
html = XML.parse_html "<article>some text </article>", options: XML::HTMLParserOptions::RECOVER | XML::HTMLParserOptions::NOIMPLIED | XML::HTMLParserOptions::NODEFDTD
html.xpath_nodes("//action-text-attachment").each do |parent|
  ...
  image = XML.parse("<img src='#{blob.redirect_url}'>").xpath_node("//img").not_nil!
  parent.add_child(image)
end
The problem:
image = XML.parse("<img src='#{blob.redirect_url}'>").xpath_node("//img").not_nil!
The double-free happens because libxml2 Nodes belong to a single Document:
- XML.parse creates a temporary XML::Document.
- xpath_node returns a child node (image) that still belongs to this temporary document.
- parent.add_child(image) inserts the node into another document (parent) without detaching or copying it.
- When the temporary document is finalized by Crystal’s GC, it frees all its nodes, including image.
- The parent document still references the same image node. Later, when the parent document is finalized, it tries to free image again → double free.
Valgrind and GDB confirmed this pattern: the same address (0x1f4b6780) was freed twice, first by the temporary document's finalizer and then by the parent document's finalizer.
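The ownership relation can be checked from Crystal without any custom bindings. This is a small sketch, assuming only the standard XML module; the variable names and the example.png value are placeholders, and well-formed XML is used to keep it simple:

```crystal
require "xml"

# Parse a throwaway document and pull one node out of it, as in the
# problematic snippet above.
temporary = XML.parse("<img src='example.png'/>")
image = temporary.xpath_node("//img").not_nil!

# The node returned by xpath_node still reports the temporary document
# as its owner, so that document will free it when it is finalized.
puts image.name
puts image.document == temporary
```

If the second line prints true, the node is still tied to the temporary document, which is exactly the situation in which linking it into another tree is unsafe.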
Step 5: Fixing the Problem
The solution is to insert a copy of the node into the document, rather than the original. The copy is fully independent and can safely be added to another document.
@[Link("xml2")]
lib LibXML
  fun xmlAddChild(parent : Node*, child : Node*)
  fun xmlCopyNode(old : Node*, extended : Int) : Node*
end

class XML::Node
  def add_child(child : Node)
    # extended = 1 asks libxml2 for a recursive copy
    # (properties, namespaces and children).
    copied_node = LibXML.xmlCopyNode(child, 1)
    LibXML.xmlAddChild(self, copied_node)
  end
end
- Now, the temporary document used to create the node can be freed safely without causing crashes.
- Any attributes, children, or other memory owned by the original document are copied correctly.
Step 6: Lessons Learned
- Nodes belong to a single document in libxml2; sharing them across documents without copying or detaching is unsafe.
- Valgrind and GDB are invaluable debugging tools:
  - Valgrind detects invalid frees and memory issues.
  - GDB lets you inspect the backtrace at the crash.
- Valgrind can be misleading at first because it does not trigger a crash; instead, you need to read the logs carefully to identify double-free memory addresses. Once found, it shows the allocation and free sites, which greatly helps in investigating the root cause.

