Michael Nikitochkin

Debugging a Double Free in Crystal with libxml2, GDB, and Valgrind

This is a personal note about how I tracked down and fixed a double-free bug caused by Crystal’s garbage collector interacting with libxml2. I used gdb and valgrind to trace the issue, understand where memory was allocated and freed, and eventually identify the root cause. I am not an advanced user of these tools, so this write-up serves as a reminder of the steps I took and what I learned along the way.

The Problem

For a few days, some of my tests had been crashing intermittently when run against the nightly builds of Crystal:

$ bin/drar_test --seed 6690 --verbose --parallel 1

free(): double free detected in tcache 2

  • The crashes were non-deterministic: they did not always occur locally, and sometimes did not appear in CI either.
  • The error message alone did not tell me where the problem was coming from.

Step 1: Reproducing the Issue

I started by reproducing the failure locally:

  • I used the same app configuration as in CI.
  • I tested different --seed arguments until I found a seed that reliably triggered the crash.
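
The seed search can be automated with a small loop. This is a minimal sketch, not the script I actually used: find_crashing_seed is a hypothetical helper, and it assumes the test binary accepts --seed as shown above and exits with a non-zero status when it aborts.

```shell
# Hypothetical helper: try seeds FIRST..LAST and print the first one for which
# the command exits with a non-zero status (an abort such as SIGABRT counts).
# Usage: find_crashing_seed FIRST LAST COMMAND [ARGS...]
find_crashing_seed() {
  first=$1
  last=$2
  shift 2
  seed=$first
  while [ "$seed" -le "$last" ]; do
    # Append the seed to the command line; all output is discarded.
    if ! "$@" --seed "$seed" >/dev/null 2>&1; then
      echo "$seed"
      return 0
    fi
    seed=$((seed + 1))
  done
  return 1  # no seed in the range made the command fail
}
```

For example, find_crashing_seed 1 10000 bin/drar_test --verbose --parallel 1 prints the first failing seed in that range. Because the crash is intermittent, it is worth re-running the reported seed a few times before relying on it.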

Step 2: Initial Investigation with GDB

Since the error originated from free(), I wanted to see what was happening at the crash:

$ crystal --version
Crystal 1.17.0 [d2c705b53] (2025-07-16)

$ crystal build --stats --threads 1 --time -o bin/drar_test ./test/ext/std/openssl/cipher_test.cr ...

$ gdb --args bin/drar_test --seed 6690 --verbose --parallel 1
> run
Run options: --seed 6690 --verbose --parallel 1...

free(): double free detected in tcache 2

Program received signal SIGABRT, Aborted.
__pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
44       return INTERNAL_SYSCALL_ERROR_P (ret) ? INTERNAL_SYSCALL_ERRNO (ret) : 0;

At the crash, I used bt to print the backtrace:

> bt
...
#6  0x00007ffff7258ad8 in tcache_double_free_verify (e=<optimized out>) at malloc.c:3350
#7  0x00007ffff7e7413b in xmlFreeNodeList (cur=0x1da4db00) at /usr/src/debug/libxml2-2.12.10-5.fc43.x86_64/tree.c:3662
#8  0x00007ffff7e73e68 in xmlFreeDoc (cur=0x1da4b710) at /usr/src/debug/libxml2-2.12.10-5.fc43.x86_64/tree.c:1212
#9  0x0000000003e480a0 in finalize () at /home/miry/src/crystal/crystal/src/xml/document.cr:67
#10 0x0000000000482b86 in -> () at /home/miry/src/crystal/crystal/src/gc/boehm.cr:340
#11 0x00007ffff73cc517 in GC_invoke_finalizers () at extra/../finalize.c:1255
#12 0x00007ffff73cc801 in GC_notify_or_invoke_finalizers () at extra/../finalize.c:1342
#13 GC_notify_or_invoke_finalizers () at extra/../finalize.c:1282
#14 0x00007ffff73d8e77 in GC_generic_malloc_many (lb=<optimized out>, k=1, result=0x7ffff750b130 <first_thread+496>)
    at extra/../mallocx.c:336
#15 0x00007ffff73e67b5 in GC_malloc_kind (bytes=<optimized out>, kind=<optimized out>) at extra/../thread_local_alloc.c:187

  • The backtrace showed that Crystal’s GC was running the finalizer of an XML::Document, which calls xmlFreeDoc, and that the abort happened inside xmlFreeNodeList.
  • This was my first clue that the crash involved XML nodes being freed twice.
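
For an intermittent crash it can be handy to capture the same backtrace without an interactive session, for example in CI. The sketch below assumes GDB is installed and uses its -batch, -ex, and --args options; gdb_backtrace is a hypothetical wrapper name, and the GDB environment variable is my own convention for overriding the binary.

```shell
# Hypothetical wrapper: run a program under GDB non-interactively and print
# the backtrace once the program stops (for example on SIGABRT).
# The GDB binary can be overridden through $GDB; it defaults to "gdb".
gdb_backtrace() {
  "${GDB:-gdb}" -batch -ex run -ex bt --args "$@"
}
```

A call such as gdb_backtrace bin/drar_test --seed 6690 --verbose --parallel 1 2>&1 | tee gdb.log keeps a copy of the output. If the program exits normally, there is no stack left to print and GDB reports that instead of a backtrace.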

Step 3: Using Valgrind

I then ran the same binary under Valgrind:

$ valgrind --track-origins=yes --leak-check=full bin/drar_test 2> valgrind.logs

In the logs, I found:


==232953== Invalid free() / delete / delete[] / realloc()
==232953==    at 0x1E2C6E43: free (vg_replace_malloc.c:990)
==232953==    by 0x1E33B39B: xmlFreePropList (tree.c:2052)
==232953==    by 0x1E33B39B: xmlFreePropList (tree.c:2047)
==232953==    by 0x1E33B0A7: xmlFreeNodeList (tree.c:3638)
==232953==    by 0x1E33AE67: xmlFreeDoc (tree.c:1212)
==232953==    by 0x3EAB88F: *XML::Document#finalize:Nil (document.cr:67)
...
==232953==  Address 0x1f4b6780 is 0 bytes inside a block of size 96 free'd
==232953==    at 0x1E2C6E43: free (vg_replace_malloc.c:990)
==232953==    by 0x1E33B39B: xmlFreePropList (tree.c:2052)
==232953==    by 0x1E33B39B: xmlFreePropList (tree.c:2047)
==232953==    by 0x1E33B0A7: xmlFreeNodeList (tree.c:3638)
==232953==    by 0x1E33AE67: xmlFreeDoc (tree.c:1212)
==232953==    by 0x3EAB88F: *XML::Document#finalize:Nil (document.cr:67)
...
==232953==  Block was alloc'd at
==232953==    at 0x1E2C3B26: malloc (vg_replace_malloc.c:447)
==232953==    by 0x1E3374C5: xmlSAX2AttributeNs (SAX2.c:1880)
==232953==    by 0x1E3393E8: xmlSAX2StartElementNs (SAX2.c:2299)
==232953==    by 0x1E3289F1: xmlParseStartTag2.constprop.0 (parser.c:10091)
==232953==    by 0x1E328EBB: xmlParseElementStart (parser.c:10473)
==232953==    by 0x1E32AF84: xmlParseElement (parser.c:10406)
==232953==    by 0x1E32B267: xmlParseDocument (parser.c:11190)
==232953==    by 0x1E332F38: xmlDoRead (parser.c:14835)
==232953==    by 0x3EAAE19: *XML::parse<String>:XML::Document (xml.cr:61)
==232953==    by 0xFF0FF3: *ActionText::RichText#to_html:String (rich_text.cr:41)
==232953==    by 0x41F7A90: *ActionText::RichTextTest#test_render_html_with_image_and_tags:Bool (rich_text_test.cr:70)
==232953==    by 0x4B1003: ~proc223Proc(Minitest::Test, Nil)@lib/minitest/src/runnable.cr:17 (runnable.cr:17)


  • Valgrind confirmed the double free: the same block (0x1f4b6780) was freed twice, both times through XML::Document#finalize, which the GC invokes.
  • It also showed where the block was allocated: in XML.parse, called from ActionText::RichText#to_html (rich_text.cr:41). That pointed me at the code to inspect.
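
Valgrind logs get long, so a filter helps to find the interesting entries. The following is a sketch under the assumption that the log uses the Memcheck line format shown above; freed_block_addresses is a hypothetical helper, not part of Valgrind.

```shell
# Hypothetical helper: list the distinct addresses that Valgrind reports as
# lying inside an already freed block, reading the log file given as $1.
freed_block_addresses() {
  # Matches lines such as:
  #   ==232953==  Address 0x1f4b6780 is 0 bytes inside a block of size 96 free'd
  sed -n "s/^==[0-9]*== *Address \(0x[0-9a-fA-F]*\) is .* free'd\$/\1/p" "$1" | sort -u
}
```

Running freed_block_addresses valgrind.logs lists the candidate addresses; searching the log for one of them then leads to the "Invalid free()" entry with its free and allocation stacks.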

Step 4: Root Cause

The root cause turned out to be my own hack: I had extended Crystal's XML module with an extra binding from libxml2:

@[Link("xml2")]
lib LibXML
  fun xmlAddChild(parent : Node*, child : Node*)
end

class XML::Node
  def add_child(child : Node)
    LibXML.xmlAddChild(self, child)
  end
end

The actual issue appeared in the way I was using it:

html = XML.parse_html "<article>some text </article>", options: XML::HTMLParserOptions::RECOVER | XML::HTMLParserOptions::NOIMPLIED | XML::HTMLParserOptions::NODEFDTD
html.xpath_nodes("//action-text-attachment").each do |parent|
  ...
  image = XML.parse("<img src='#{blob.redirect_url}'>").xpath_node("//img").not_nil!
  parent.add_child(image)
end

The problematic line is:

image = XML.parse("<img src='#{blob.redirect_url}'>").xpath_node("//img").not_nil!

The double free happens because a libxml2 node belongs to exactly one document, and that document frees all of its nodes when it is freed:

  1. XML.parse creates a temporary XML::Document.
  2. xpath_node returns a child node (image) that still belongs to this temporary document.
  3. parent.add_child(image) inserts the node into another document (parent) without detaching or copying it.
  4. When the temporary document is finalized by Crystal’s GC, it frees all its nodes, including image.
  5. The parent document still references the same image node. Later, when the parent document is finalized, it tries to free image again, which is the double free.

The Valgrind and GDB output is consistent with this pattern: the block at address 0x1f4b6780 was freed twice, both times from XML::Document#finalize, which is what happens when the temporary document and the parent document each free the same node.

Step 5: Fixing the Problem

The solution is to insert a copy of the node into the document, rather than the original. The copy is fully independent and can safely be added to another document.

@[Link("xml2")]
lib LibXML
  fun xmlAddChild(parent : Node*, child : Node*)
  fun xmlCopyNode(old : Node*, extended : Int) : Node*
end

class XML::Node
  def add_child(child : Node)
    copied_node = LibXML.xmlCopyNode(child, 1)
    LibXML.xmlAddChild(self, copied_node)
  end
end

  • Now the temporary document used to create the node can be freed safely without causing crashes.
  • Because xmlCopyNode is called with extended set to 1, the copy is recursive: attributes, namespaces, and children are duplicated rather than shared with the original document.

Step 6: Lessons Learned

  1. Nodes belong to a single document in libxml2; sharing them across documents without copying or detaching is unsafe.
  2. Valgrind and GDB are invaluable debugging tools:
    • Valgrind detects invalid frees and memory issues.
    • GDB lets you inspect the backtrace at the crash.
  3. Valgrind can be misleading at first because the program does not crash under it; you have to read the log carefully to find the address that is freed twice. Once you find it, the report shows where the block was allocated and where it was freed, which helps a lot in tracking down the root cause.

That's all, folks!
