DEV Community

Ilia Alshanetsky

Originally published at ilia.ws

php_clickhouse 0.8.1: Three Releases Later, Stable

The launch post for php_clickhouse 0.6.0 covered the framing: native binary protocol, soft fork of the stalled SeasClick, modern ClickHouse types, 30-40% faster than HTTP at high throughput. That post landed April 25, 2026. Today (May 1, 2026) the current tag is 0.8.1, and I'm calling the extension stable.

The six days in between were a focused quality cycle, not a feature sprint. Three buckets:

  • Performance. Insert and write paths build native ClickHouse columns one at a time directly from row-major input. Peak intermediate PHP memory dropped from N_rows × N_cols zvals to a single column (N_rows zvals).
  • Security. Strict full-consumption parsers across Map, narrow-int, Int128 / UInt128, geo, DateTime64, Time64, hex literals, and typed parameters. Wrong-type input throws instead of corrupting memory or coercing silently to zero. Recursive type-conversion gained a depth cap so adversarial server schemas can't blow the stack.
  • Stability. Per-Client state moved from file-scope std::map banks onto the zend_object itself. Unblocks ZTS, plugs leaks on bailout, fixes a refcount bug on the progress callback. Insert path recovers the native handle on every server-side rejection point so a thrown insert no longer wedges the connection.

Three releases (0.7.0, 0.8.0, 0.8.1) closed the API gap with the most widely used PHP HTTP client, refactored the extension's state model, hardened the insert surface, and surfaced one upstream undefined-behavior (UB) fix that has since been merged into clickhouse-cpp.

Here's the work.

0.7.0: Closing the Ergonomics Gap with smi2/phpClickHouse

The native binary protocol gives you 30-40% more throughput than HTTP. Most teams won't trade a familiar API for that, so the native client has to match the ergonomic surface of the most widely used PHP HTTP client, smi2/phpClickHouse. 0.7.0 is the release that actually does that.

What landed:

  • setSettings(array) for client-wide ClickHouse settings (max_execution_time, max_memory_usage, async_insert). Per-call settings go in a 5th array argument on select() / insert() / execute() / writeStart(), and per-call values override global ones.
  • Server-side typed parameters via the {name:Type} placeholder syntax. Routed through Query::SetParam so the server quotes and parses according to the declared type. Plain {name} placeholders keep their existing client-side identifier-substitution behavior. Arrays format as ClickHouse array literals so Array(UInt32), Array(String) round-trip cleanly.
  • setProgressCallback(?callable) invoked for every Progress packet during a query (rows, bytes, total_rows, written_rows, written_bytes).
  • getStatistics() returning rows_read, bytes_read, total_rows, written_rows, written_bytes, blocks, rows_before_limit, applied_limit, elapsed_ms from the last completed query. Reset at the start of each query.
  • Structured ClickHouseException: server_code (e.g. 159 for TIMEOUT_EXCEEDED), server_name (DB::Exception), query_id. Populated on server errors and on any throw with a query-id context.
  • insertAssoc(table, rows) derives the column list from the keys of the first row.
  • SQL helpers: databaseSize(), tablesSize(), partitions(), showTables(), showCreateTable(), getServerUptime(). Each validates identifiers against the safe-character set.
  • Sub-second timeouts via connect_timeout_ms, receive_timeout_ms, send_timeout_ms config keys. When present, they override the existing seconds-based keys.
  • Per-client query log accumulator: enableLogQueries(bool) toggles, getLogQueries() returns and clears. Each entry carries sql, query_id, elapsed_ms, rows_read, bytes_read, error_code, error_message.
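For the {name:Type} path, an Array(...) parameter value travels as a ClickHouse array literal. A minimal userland sketch of that formatting, assuming ints and strings as element types and backslash-escaping inside single quotes; ch_array_literal() is a hypothetical name, not part of the extension's API:

```php
<?php
// Hypothetical helper: render a PHP list as a ClickHouse array literal, the
// text form a server-side {name:Array(T)} parameter value takes. The escaping
// (backslash first, then single quote) is an assumption, not the extension's code.
function ch_array_literal(array $values): string
{
    $parts = [];
    foreach ($values as $v) {
        if (is_int($v)) {
            $parts[] = (string) $v;                      // Array(UInt32), Array(Int64), ...
        } elseif (is_string($v)) {
            // Sequential replacement: "\" -> "\\", then "'" -> "\'".
            $escaped = str_replace(['\\', "'"], ['\\\\', "\\'"], $v);
            $parts[] = "'" . $escaped . "'";             // Array(String)
        } elseif (is_array($v)) {
            $parts[] = ch_array_literal($v);             // Array(Array(T))
        } else {
            throw new InvalidArgumentException('unsupported element: ' . get_debug_type($v));
        }
    }
    return '[' . implode(',', $parts) . ']';
}

echo ch_array_literal([1, 2, 3]), "\n";      // [1,2,3]
echo ch_array_literal(['a', "it's"]), "\n";  // ['a','it\'s']
```

Sending this text as a typed parameter, rather than splicing it into the SQL, is what lets the server parse it according to the declared type.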

The other under-the-hood change in 0.7.0 was migrating to a stub-driven arginfo workflow (clickhouse.stub.php → generated clickhouse_arginfo.h). Method parameter and return types are now declared at the engine boundary and visible to Reflection, IDEs, and static analyzers. Behavior is unchanged for correctly typed callers; wrong-type callers now fail ZPP (zend_parse_parameters) type checks at the boundary instead of hitting a custom exception thrown inside the method body.

None of 0.7.0 is novel on its own. The point is that without these the native client made you pay an ergonomics tax to get the speed. 0.7.0 settles that tab.

0.8.0: Per-Object State, ZTS, and Streaming

The 0.6.0 / 0.7.0 surface stored per-Client state in seven file-scope std::map<int, ...> banks keyed on Z_OBJ_HANDLE: the Client*, the in-flight insert Block, the ClientStats, the global settings, the progress and profile callbacks, the log toggle, the query log buffer.

That works, but it has three durability problems baked in:

  1. No ZTS support. Threaded SAPIs share that file-scope state across threads. The 0.6.0 code gated MINIT with a hard error when --enable-zts was on. ClickHouse from RoadRunner / FrankenPHP / Swoole / php-pm was a non-starter.
  2. Leaks on bailout. PHP's userspace __destruct doesn't run on fatal errors, so the map entries (and the underlying Client* and any half-open insert stream) leaked.
  3. Refcount bug on the progress callback. The registered callable was stored as a struct copy without taking a reference, so once the caller's own copy went out of scope the callable was freed and the next progress packet hit a freed zval.

0.8.0 moved the per-Client state onto the zend_object itself via custom create_object / free_obj handlers. The seven file-scope maps disappear entirely. ZTS gating at MINIT was deleted in the same release.

The refactor unblocks three things at once:

  • Threaded SAPIs. No global state to thread-isolate, so ZTS Linux is a first-class target now. CI grew a linux-zts job (PHP 8.4 ZTS built from source).
  • Cleanup on bailout. free_obj runs unconditionally, including on fatal errors. The Client* and any half-open insert stream get torn down properly.
  • The progress-callback fix lands. setProgressCallback now uses ZVAL_COPY instead of a struct copy, so the callable doesn't get freed out from under the next packet.

A Windows config.w32 shipped in the same release, rewritten from a 9-line warning stub to a full Windows build script that mirrors config.m4's source list and flags. Optional --enable-clickhouse-openssl plumbing is mirrored via CHECK_LIB("libssl.lib", ...). CI exercises Windows as a build + extension-load smoke test (no live ClickHouse on Windows yet).

Streaming reads

0.8.0 introduced two new read paths for result sets that don't fit comfortably in a single PHP array:

```php
// $ch is a connected client; process() stands in for your per-row handler.
$it = $ch->selectStream("SELECT id, payload FROM events WHERE day = today()");
foreach ($it as $row) {   // blocks are walked lazily as the loop advances
    process($row);
}
```

selectStream() returns a ClickHouseRowIterator (Iterator + Countable) that walks blocks lazily. The iterator survives unset($client) because blocks own their column data via shared_ptr.

For unbounded streams where you don't want to count or rewind:

```php
// writeToS3() stands in for your own sink; rows are never accumulated.
$ch->selectStreamCallback(
    "SELECT id, body FROM events_unbounded",
    fn(array $row) => writeToS3($row),
);
```

The callback fires once per row as blocks arrive, never accumulating the full result.

The plain select() path is unchanged and remains the faster choice when you actually want a full PHP array. The streaming variants exist for the row-millions case where you don't.
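The lazy walk behind selectStream() can be pictured as a generator that turns column-major blocks into rows one at a time. This is a plain-PHP sketch, not the extension's code: the block shape (column name => list of values) and the functions rows_from_blocks() and fake_blocks() are hypothetical.

```php
<?php
// Hypothetical sketch: each block is column-major (name => list of values);
// rows are materialized one at a time, so only one row exists per step.
function rows_from_blocks(iterable $blocks): Generator
{
    foreach ($blocks as $block) {
        $names = array_keys($block);
        $count = $names === [] ? 0 : count($block[$names[0]]);
        for ($i = 0; $i < $count; $i++) {
            $row = [];
            foreach ($names as $name) {
                $row[$name] = $block[$name][$i];
            }
            yield $row;
        }
    }
}

// Blocks can themselves arrive lazily, so nothing is buffered up front.
function fake_blocks(): Generator
{
    yield ['id' => [1, 2], 'body' => ['a', 'b']];
    yield ['id' => [3], 'body' => ['c']];
}

foreach (rows_from_blocks(fake_blocks()) as $row) {
    echo $row['id'], ':', $row['body'], "\n";   // 1:a, 2:b, 3:c
}
```

The callback variant is the same loop with the yield replaced by a direct call to the user's function.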

Geo, LowCardinality(Nullable), and the Map matrix

The type surface expanded too:

  • Geo types Point, Ring, Polygon, MultiPolygon round-trip via ColumnGeo. Point as [Float64, Float64], the others as nested arrays.
  • LowCardinality(Nullable(String)) and LowCardinality(Nullable(FixedString)) round-trip on read and write.
  • The insert path now accepts any Map(K, V) over scalar K and V (String, all signed/unsigned integer widths, Float32/64, UUID) plus LowCardinality(String) keys and values. The read path mirrors the same matrix except for LowCardinality keys (a gap in the vendored clickhouse-cpp). Previously only five hardcoded combinations worked.
  • SimpleAggregateFunction(f, T) reads transparently as T.
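A sketch of the geo value shapes listed above, written as plain-PHP validators. The function names are hypothetical, and accepting int coordinates while rejecting NaN/INF is an assumption of this sketch rather than documented extension behavior:

```php
<?php
// Point = [x, y]; Ring = list of Points; Polygon = list of Rings (outer ring
// first, holes after); MultiPolygon = list of Polygons.
function is_coord(mixed $v): bool
{
    return is_int($v) || (is_float($v) && is_finite($v));
}

function is_point(mixed $v): bool
{
    return is_array($v) && array_is_list($v) && count($v) === 2
        && is_coord($v[0]) && is_coord($v[1]);
}

// True when $v is a list whose every element satisfies $item.
function is_list_of(mixed $v, callable $item): bool
{
    if (!is_array($v) || !array_is_list($v)) {
        return false;
    }
    foreach ($v as $x) {
        if (!$item($x)) {
            return false;
        }
    }
    return true;
}

function is_ring(mixed $v): bool         { return is_list_of($v, 'is_point'); }
function is_polygon(mixed $v): bool      { return is_list_of($v, 'is_ring'); }
function is_multipolygon(mixed $v): bool { return is_list_of($v, 'is_polygon'); }

$ring    = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 0.0]];
$polygon = [$ring];
$multi   = [$polygon, $polygon];
var_dump(is_multipolygon($multi));   // bool(true)
```

Requires PHP 8.1+ for array_is_list().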

Geo support removes one of the two big reasons people stayed on the HTTP client. The other was streaming, which also landed in 0.8.0.

Other 0.8.0 surfaces worth naming

  • selectStatement() returns a ClickHouseStatement result wrapper: Iterator, Countable, ArrayAccess, JsonSerializable, plus fetchOne() / fetchKeyPair() / fetchColumn() / toArray() / statistics(). Read-only (offsetSet / offsetUnset throw). Carries a per-call stats snapshot so it survives the client running other queries afterwards.
  • setVerbose(bool|callable) for protocol-level lifecycle tracing. Pass true for JSON lines on STDERR, or a callable invoked with ($eventName, $context). Events: select_start, data_block, select_finish, execute_start, execute_finish, server_exception. No-op when off, so the hot path stays cheap in production.
  • DDL helpers: isExists(), showDatabases(), showProcesslist(), getServerVersion(), tableSize(), truncateTable(), dropPartition(). All identifier args validated; dropPartition SQL-escapes the partition value.
  • Client introspection: resetConnection(), getServerInfo() (name, version, revision, timezone, display_name), getCurrentEndpoint() (host/port of the active endpoint when an endpoints[] pool is in use), setProfileCallback(), ping_before_query config key.
  • query_id echoed through getStatistics() so callers can correlate a stats snapshot to a server-side query in system.query_log.
  • smi2-style sugar: setSettings() returns $this for chaining, setSetting(key, value) for the single-key form, setDatabase(string) issues USE and updates the cached default used by databaseSize() / showTables(), getter aliases (getServerCode(), getServerName(), getQueryId()) on ClickHouseException.

IPv4 / IPv6 crash, fixed

This one's worth calling out as a bug-of-the-release. clickhouse-cpp v2.6.1 made ColumnIPv4 / ColumnIPv6 siblings of (not subclasses of) ColumnUInt32 / ColumnFixedString. The 0.6.0 / 0.7.0 read paths were doing As<ColumnUInt32>() / As<ColumnFixedString>() on IP columns, which now returned null instead of dispatching. The next dereference segfaulted the worker.

Fixed by switching to ColumnIPv*::AsString(row) for canonical dotted-quad / ::1 form. If you hit a crash on IP column reads pre-0.8.0, this is why.
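For reference, the canonical text forms look like this when produced in plain PHP with long2ip() and inet_ntop(). The wrappers are hypothetical and illustrate the output format only; the extension itself gets these strings from ColumnIPv*::AsString, and the input encodings (a UInt32 for IPv4, 16 network-order bytes for IPv6) are assumptions of the sketch:

```php
<?php
// IPv4: unsigned 32-bit value -> dotted quad.
function ipv4_to_string(int $u32): string
{
    if ($u32 < 0 || $u32 > 0xFFFFFFFF) {
        throw new RangeException("not a UInt32: $u32");
    }
    return long2ip($u32);
}

// IPv6: 16 raw bytes -> compressed canonical form such as "::1".
function ipv6_to_string(string $bytes): string
{
    if (strlen($bytes) !== 16) {
        throw new LengthException('IPv6 needs exactly 16 bytes');
    }
    return inet_ntop($bytes);
}

echo ipv4_to_string(0x7F000001), "\n";                  // 127.0.0.1
echo ipv6_to_string(str_repeat("\0", 15) . "\1"), "\n"; // ::1
```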

Distribution: pre-built binaries via PIE

Binaries for Linux glibc (x86_64 + arm64) and macOS (x86_64 + arm64) are now available. On a supported platform the install collapses to one line:

```shell
pie install iliaal/php_clickhouse
```

No vendored clickhouse-cpp build, no abseil compile, no five-minute make. TLS still requires the source build (pie install iliaal/php_clickhouse --enable-clickhouse-openssl), but that's a smaller set of users.

0.8.1: The Insert Path That Recovers

0.8.0 was the architecture release. 0.8.1 was the hardening pass: nine rounds of reviewer-driven fixes, mostly on the insert and write surface plus the type-conversion boundary.

The headline bug:

```
ClickHouseException: cannot execute query while inserting
```

If a server-side insert rejection (missing table, bad column, CHECK constraint, schema drift) threw out of BeginInsert / SendInsertBlock / EndInsert, the vendored client's inserting_ flag stayed set. Subsequent select / execute on the same handle threw the message above until the caller manually called resetConnection().

0.8.1 wraps every server-side rejection point in a connection-reset-then-rethrow. Same handle stays usable.

Destructor cleanup mirrors the same dirty/clean recovery split: an in-flight streaming insert with sent blocks is dropped via ResetConnection on unset() rather than committed via EndInsert. Clean sessions still EndInsert. Avoids partial commits on script bailout.
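The two recovery rules above (reset-then-rethrow on any rejection, and discard-if-dirty / commit-if-clean on destruction) can be modelled in userland. This is a hypothetical sketch: InsertConnection, InsertSession, and RecordingConnection are illustrative names, not php_clickhouse classes, and the real logic lives in the extension's C++ insert path:

```php
<?php
// Stand-in for the native handle; not a php_clickhouse API.
interface InsertConnection
{
    public function sendBlock(array $block): void;
    public function endInsert(): void;   // commit the insert
    public function reset(): void;       // discard in-flight insert state
}

final class InsertSession
{
    private bool $dirty = false;  // true once any block has been sent
    private bool $open = true;

    public function __construct(private InsertConnection $conn) {}

    public function write(array $block): void
    {
        if (!$this->open) {
            throw new LogicException('insert session is closed');
        }
        try {
            $this->conn->sendBlock($block);
            $this->dirty = true;
        } catch (Throwable $e) {
            $this->open = false;
            $this->conn->reset();  // reset, then rethrow: the handle stays usable
            throw $e;
        }
    }

    public function finish(): void
    {
        if (!$this->open) {
            throw new LogicException('insert session is closed');
        }
        $this->open = false;
        try {
            $this->conn->endInsert();
        } catch (Throwable $e) {
            $this->conn->reset();
            throw $e;
        }
    }

    public function __destruct()
    {
        if (!$this->open) {
            return;
        }
        $this->open = false;
        // Abandoned session: drop it if blocks were sent, commit it if clean.
        if ($this->dirty) {
            $this->conn->reset();
        } else {
            $this->conn->endInsert();
        }
    }
}

// Test double that records which calls were made.
final class RecordingConnection implements InsertConnection
{
    public array $calls = [];
    public bool $failNext = false;

    public function sendBlock(array $block): void
    {
        $this->calls[] = 'send';
        if ($this->failNext) {
            throw new RuntimeException('server rejected block');
        }
    }
    public function endInsert(): void { $this->calls[] = 'end'; }
    public function reset(): void { $this->calls[] = 'reset'; }
}

$conn = new RecordingConnection();
$session = new InsertSession($conn);
$session->write([1, 2]);
unset($session);                         // dirty: discarded, not committed
echo implode(',', $conn->calls), "\n";   // send,reset
```

A session destroyed before any block was sent records end instead, matching the "clean sessions still EndInsert" rule.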

Memory: column-at-a-time insert

Pre-0.8.1, insert() and write() materialized a full column-major PHP zval matrix from the user's row-major input before building the native ClickHouse columns. For a 1M-row × 30-column insert that's 30M zvals sitting in PHP memory while the column build runs.

0.8.1 builds native columns one at a time directly from the row-major input. Peak intermediate PHP memory drops from N_rows × N_cols to one column.

insertAssoc() benefited from the same change: no more positional copy of input rows. The column gatherer reads each column directly from the original associative rows, and key validation uses zend_hash_exists against the first row's HashTable instead of allocating a new std::string for every row key.
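The gathering step can be sketched in plain PHP. for_each_column() and its $emit callback are hypothetical stand-ins (the callback represents building the native column); the point is the loop order, which visits every row once per column so only one gathered column exists at a time:

```php
<?php
// Each gathered column is released when $column is reassigned on the next
// pass (as long as $emit does not keep it), so peak extra memory is one
// column rather than the full N_rows x N_cols matrix.
function for_each_column(array $rows, array $columns, bool $byName, callable $emit): void
{
    foreach (array_values($columns) as $pos => $name) {
        $key = $byName ? $name : $pos;   // insertAssoc() vs positional insert()
        $column = [];
        foreach ($rows as $i => $row) {
            if (!is_array($row) || !array_key_exists($key, $row)) {
                throw new InvalidArgumentException("row $i has no value for column '$name'");
            }
            $column[] = $row[$key];
        }
        $emit($name, $column);
    }
}

$rows = [
    ['id' => 1, 'name' => 'a'],
    ['id' => 2, 'name' => 'b'],
];
for_each_column($rows, ['id', 'name'], true, function (string $name, array $values): void {
    echo $name, ': ', implode(',', $values), "\n";   // "id: 1,2" then "name: a,b"
});
```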

Strict parsers across the type surface

Map, narrow-int (Int8 / Int16 / Int32 / their unsigned siblings), Int128 / UInt128, geo, DateTime64, Time64 insert paths now use full-consumption strict parsers. Non-numeric strings, fractional doubles, non-finite floats, and out-of-range values throw instead of silently coercing to 0 / 0.0 inside the column.
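A userland approximation of what a full-consumption strict parser means for the narrow widths. strict_narrow_int() is a hypothetical name, and the accepted input forms (int, integral finite float, all-digit decimal string) are this sketch's assumptions rather than the extension's exact rules:

```php
<?php
const CH_INT_RANGES = [
    'Int8'   => [-128, 127],
    'UInt8'  => [0, 255],
    'Int16'  => [-32768, 32767],
    'UInt16' => [0, 65535],
    'Int32'  => [-2147483648, 2147483647],
    'UInt32' => [0, 4294967295],
];

function strict_narrow_int(mixed $v, string $type): int
{
    if (!array_key_exists($type, CH_INT_RANGES)) {
        throw new InvalidArgumentException("unknown type $type");
    }
    [$min, $max] = CH_INT_RANGES[$type];

    if (is_int($v)) {
        $n = $v;
    } elseif (is_float($v) && is_finite($v) && floor($v) === $v) {
        // Integral float: range-check before casting so the cast is exact.
        if ($v < $min || $v > $max) {
            throw new RangeException("$type: $v out of range");
        }
        $n = (int) $v;
    } elseif (is_string($v) && preg_match('/^-?[0-9]+$/D', $v) === 1) {
        // The whole string must be digits: "12abc", " 12", "1.5" fall through.
        if ((float) $v < $min || (float) $v > $max) {
            throw new RangeException("$type: $v out of range");
        }
        $n = (int) $v;
    } else {
        // Non-numeric strings, fractional or non-finite floats, bool, null...
        throw new InvalidArgumentException("$type: cannot convert " . get_debug_type($v));
    }
    if ($n < $min || $n > $max) {
        throw new RangeException("$type: $n out of range");
    }
    return $n;
}
```

The key property is that nothing ever falls back to 0: every input either converts exactly or throws.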

UInt64 inserts gained a shared strict_zval_u64 parser that accepts decimal and hex strings above ZEND_LONG_MAX on both the scalar and Map(*, UInt64) paths. Reads continue to surface upper-half values as decimal strings.
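In userland the UInt64 case needs string arithmetic, because PHP ints stop at PHP_INT_MAX. A minimal sketch of the same idea, assuming inputs are non-negative ints, decimal strings, or 0x-prefixed hex strings; strict_u64() and dec_mul_add() are hypothetical names, and this is not the extension's strict_zval_u64:

```php
<?php
const U64_MAX_DEC = '18446744073709551615';

// Multiply a decimal digit string by a small $mul and add a small $add.
function dec_mul_add(string $dec, int $mul, int $add): string
{
    $carry = $add;
    $out = '';
    for ($i = strlen($dec) - 1; $i >= 0; $i--) {
        $t = (int) $dec[$i] * $mul + $carry;
        $out = ($t % 10) . $out;
        $carry = intdiv($t, 10);
    }
    while ($carry > 0) {
        $out = ($carry % 10) . $out;
        $carry = intdiv($carry, 10);
    }
    return ltrim($out, '0') ?: '0';
}

// Returns the canonical decimal string; this sketch takes only ints and strings.
function strict_u64(mixed $v): string
{
    if (is_int($v)) {
        if ($v < 0) {
            throw new RangeException("UInt64: negative value $v");
        }
        return (string) $v;
    }
    if (!is_string($v)) {
        throw new InvalidArgumentException('UInt64: cannot convert ' . get_debug_type($v));
    }
    if (preg_match('/^0[xX]([0-9a-fA-F]+)$/D', $v, $m) === 1) {
        $dec = '0';
        foreach (str_split($m[1]) as $h) {
            $dec = dec_mul_add($dec, 16, hexdec($h));
        }
    } elseif (preg_match('/^[0-9]+$/D', $v) === 1) {
        $dec = ltrim($v, '0') ?: '0';
    } else {
        throw new InvalidArgumentException("UInt64: not a decimal or hex string: '$v'");
    }
    // Digit strings of equal length compare correctly with strcmp().
    if (strlen($dec) > strlen(U64_MAX_DEC)
        || (strlen($dec) === strlen(U64_MAX_DEC) && strcmp($dec, U64_MAX_DEC) > 0)) {
        throw new RangeException("UInt64: $v exceeds 2^64-1");
    }
    return $dec;
}

echo strict_u64('0xFFFFFFFFFFFFFFFF'), "\n";   // 18446744073709551615
```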

The class of bug strict parsing eliminates is the worst kind of insert bug: the string "foo" lands in an Int32 column as 0, no error, no audit trail. Now it throws.

Validation and reentry

A few smaller fixes worth naming:

  • write() rejects rows narrower or wider than the writeStart column count. The previous path took the first row's element count as authoritative, so [1] against writeStart(t, ['a','b']) landed 1 into column a with b defaulted server-side.
  • insert() rejects rows with extra positional or named cells. A row like [1, 99] against a single-column table previously landed as 1 with 99 lost.
  • A failed later write() no longer commits previously sent blocks. The catch path tracks whether any block has been sent in the current writeStart() session and chooses ResetConnection (discard) over EndInsert (commit) on a dirty session.
  • insertAssoc() rejects integer-keyed later rows and any key-set drift from the first row. The first row defines the column set; every later row must match.
  • Enum8 / Enum16 inserts reject undeclared integers, NULL on non-Nullable columns, and unknown string names.
  • Single-token placeholder validator: {name} placeholders accept exactly one identifier and reject comma-separated lists. Comma-list callers must use array form.
  • Same-client reentry guard: a userland progress / profile callback that fires another query on the same handle now throws cleanly instead of crashing the worker on the next ReceiveData.
  • Recursive type-conversion depth cap (32) keeps deeply nested structures (Array(Array(...)), Map(K, Tuple(...))) from blowing the stack.
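Two of the row-shape rules above, sketched in userland. assoc_columns() and check_row_width() are hypothetical names; the extension enforces the equivalent checks in C++ before any column is built, and allowing later rows to list keys in a different order is an assumption of this sketch:

```php
<?php
// insertAssoc(): the first row defines the column set; every row must use
// string keys and exactly that key set.
function assoc_columns(array $rows): array
{
    if ($rows === []) {
        throw new InvalidArgumentException('insertAssoc(): no rows');
    }
    $first = $rows[array_key_first($rows)];
    if (!is_array($first) || $first === []) {
        throw new InvalidArgumentException('insertAssoc(): first row must be a non-empty array');
    }
    foreach ($rows as $i => $row) {
        if (!is_array($row) || count($row) !== count($first)) {
            throw new InvalidArgumentException("insertAssoc(): row $i does not match the first row's column count");
        }
        foreach ($row as $key => $_) {
            if (is_int($key) || !array_key_exists($key, $first)) {
                throw new InvalidArgumentException("insertAssoc(): row $i has unexpected key '$key'");
            }
        }
    }
    return array_keys($first);
}

// write(): every row must have exactly the writeStart() column count.
function check_row_width(array $row, int $expected): void
{
    if (count($row) !== $expected) {
        throw new InvalidArgumentException(
            'write(): row has ' . count($row) . " cells, writeStart() declared $expected"
        );
    }
}

echo implode(',', assoc_columns([['a' => 1, 'b' => 2], ['b' => 3, 'a' => 4]])), "\n";   // a,b
```

Under these rules, [1] against writeStart(t, ['a','b']) fails check_row_width([1], 2) instead of silently defaulting column b.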

23 new PHPTs (072–094) pin all of the above.
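The depth cap from the list above is easiest to see on a recursive parse of a type name. A minimal sketch, assuming a plain Name(arg, ...) grammar without quoted arguments (so DateTime64(3, 'UTC') or Enum8 definitions are out of scope); parse_type() and MAX_TYPE_DEPTH are hypothetical, and only the limit of 32 comes from the list:

```php
<?php
const MAX_TYPE_DEPTH = 32;

// Parse "Name" or "Name(arg, arg, ...)" into ['name' => ..., 'args' => [...]],
// refusing to recurse past MAX_TYPE_DEPTH levels of nesting.
function parse_type(string $t, int $depth = 0): array
{
    if ($depth > MAX_TYPE_DEPTH) {
        throw new OverflowException('type nesting exceeds ' . MAX_TYPE_DEPTH . ' levels');
    }
    $t = trim($t);
    $open = strpos($t, '(');
    if ($open === false) {
        return ['name' => $t, 'args' => []];
    }
    if (!str_ends_with($t, ')')) {
        throw new InvalidArgumentException("malformed type: $t");
    }
    $name  = substr($t, 0, $open);
    $inner = substr($t, $open + 1, -1);

    // Split on commas that are not inside nested parentheses.
    $args  = [];
    $level = 0;
    $start = 0;
    for ($i = 0, $n = strlen($inner); $i < $n; $i++) {
        $c = $inner[$i];
        if ($c === '(') {
            $level++;
        } elseif ($c === ')') {
            $level--;
        } elseif ($c === ',' && $level === 0) {
            $args[] = parse_type(substr($inner, $start, $i - $start), $depth + 1);
            $start = $i + 1;
        }
    }
    $args[] = parse_type(substr($inner, $start), $depth + 1);
    return ['name' => $name, 'args' => $args];
}
```

An adversarial schema such as forty nested Array(...) wrappers now throws OverflowException instead of recursing until the stack runs out.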

Upstream: One Fix Merged Back to clickhouse-cpp 🎆

The ASan job added in 0.8.0 surfaced a latent piece of undefined behavior in the vendored library. Nobody had hit it in production, but UBSan flagged it on every empty LowCardinality(String) value:

```
runtime error: null pointer passed as argument 2,
  which is declared to never be null
```

ColumnStringBlock::AppendUnsafe was calling memcpy(pos, str.data(), str.size()) unconditionally. When str was empty, str.data() could be NULL (a default-constructed std::string_view, for example, carries a null data pointer), and libc declares memcpy's second argument with __attribute__((nonnull)) regardless of the size. Every mainstream libc treats memcpy(_, NULL, 0) as a no-op in practice, so the bug was benign on real workloads, but the UBSan report was adding noise to the extension's ASan job and obscuring real findings.

Patch: guard the memcpy with if (str.size() > 0). Submitted upstream as clickhouse-cpp#489, merged 2026-04-27. The local patch recorded in lib/clickhouse-cpp/LOCAL_PATCHES.md will be dropped the next time the vendored library is bumped.

What's Still Missing

Two limitations carry forward from clickhouse-cpp v2.6.1:

  • SELECT ... WITH TOTALS and SETTINGS extremes=1 throw unimplemented 7 from the cpp layer. The vendored library does not dispatch the Totals / Extremes packet types (upstream issue #297). getTotals() / getExtremes() are deferred.
  • Map(LowCardinality(K), V) reads are not yet decoded by the vendored library (writes succeed). showProcesslist() selects a fixed projection of standard columns to avoid the unsupported Map columns (ProfileEvents, Settings, used_*).

If either blocks your workload, file an issue at github.com/iliaal/php_clickhouse with the schema and a minimal repro. Both are upstream and tracked.

The repo is at github.com/iliaal/php_clickhouse. Install via PIE: pie install iliaal/php_clickhouse (add --enable-clickhouse-openssl for TLS). The original launch post that framed the fork story sits at ilia.ws.
