On 2026-05-15, xAI pushed an update to xai-org/x-algorithm — the recommender behind X's For You feed. The release comes three years after Twitter's original 2023 release, under different leadership.
I cloned the repo and analyzed it.
The released code does not compile.
What's missing
There's no Cargo.toml anywhere in the repo. The Rust crates reference crate::params::FAVORITE_WEIGHT, crate::params::REPORT_WEIGHT, crate::params::OON_WEIGHT_FACTOR — sixty-plus named symbols — but crate::params is not in the tree. Same for crate::clients::* (referenced 58 times, all Prod*Client implementations missing), the entire xai_feature_switches plane, and the training code.
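Surfacing those dangling references takes only a few lines. A sketch of the kind of scan involved (the regex and file layout here are illustrative, not the exact tooling I ran):

```python
import re
from collections import Counter
from pathlib import Path

# Matches fully qualified references like crate::params::FAVORITE_WEIGHT.
PARAM_REF = re.compile(r"crate::params::([A-Z][A-Z0-9_]*)")

def referenced_params(repo_root: str) -> Counter:
    """Count crate::params::* symbols referenced across all .rs files."""
    counts: Counter = Counter()
    for path in Path(repo_root).rglob("*.rs"):
        if path.is_file():
            counts.update(PARAM_REF.findall(path.read_text(errors="ignore")))
    return counts
```

Sort the counter by frequency and you get the dial list: symbols referenced dozens of times with no definition anywhere in the tree.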
The values were withheld. That's the headline read.
The seams of the sanitization
What's interesting is the unevenness of the pass. The redaction ran out of energy in specific, telling ways:
const TWEET_EVENT_TOPIC: &str = "";
The topic name variable is preserved. The topic string is empty. A mechanical search-and-replace on string literals left the variable intact.
std::env::var("")
Same shape — the env-var-reading call survives; the env-var name is empty-stringed out.
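Both seams are exactly what a denylist-driven literal scrubber leaves behind. A guess at the shape of that pass (the denylist patterns and the input literals below are invented for illustration; only the output shape matches the repo):

```python
import re

# Hypothetical denylist: internal topic names, hostnames, env-var names.
SENSITIVE = re.compile(
    r'"(?:[a-z0-9_.-]*internal[a-z0-9_.-]*|tweet_events[a-z0-9_.-]*)"',
    re.IGNORECASE,
)

def scrub(source: str) -> str:
    """Blank the contents of sensitive string literals; touch nothing else."""
    return SENSITIVE.sub('""', source)

# Invented inputs, but the outputs are the seams the repo actually shows:
print(scrub('const TWEET_EVENT_TOPIC: &str = "tweet_events.prod";'))
# const TWEET_EVENT_TOPIC: &str = "";
print(scrub('std::env::var("KAFKA_HOST_INTERNAL")'))
# std::env::var("")
```

A pass like this never sees variable names or call sites, which is why the variable and the env::var call survive while the strings go blank.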
if ( and self.deluxe ...
Actual invalid Python in grox/classifiers/content/safety_ptos.py. A predicate was deleted; the and was left dangling. The file does not parse as shipped. Nobody re-ran the test suite after the cut.
ModelName.EAPI_REASONING_INTERNAL
Preserved verbatim in a public file. An internal model identifier with _INTERNAL in its name, shipped to a public release. That's a survivor.
PTOS_CUTOFF_TWEET_ID = 2_054_275_414_225_846_272
A real Snowflake date cutoff for "tweets after this date require PTOS_REVIEWED label or default to MediumRisk." That's a policy boundary, in plaintext, in a public file.
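Snowflake IDs carry their creation time in the top bits, so the cutoff decodes to a calendar date. Assuming the standard Twitter Snowflake layout (millisecond timestamp above the low 22 bits, epoch 1288834974657):

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # 2010-11-04, start of the Snowflake clock

def snowflake_to_datetime(snowflake_id: int) -> datetime:
    """Recover the creation timestamp from the top bits of a Snowflake ID."""
    ms = (snowflake_id >> 22) + TWITTER_EPOCH_MS
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

print(snowflake_to_datetime(2_054_275_414_225_846_272))
# 2026-05-12 19:00:00+00:00
```

Under that layout, the policy boundary sits at 2026-05-12 19:00:00 UTC — three days before the push.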
Zero TODO, zero FIXME, zero XXX across two hundred source files. Someone swept the comments. Combined with the syntax errors, the empty strings, and the _INTERNAL leak, it reads as a hurried mechanical pass over code that wasn't structured for public release.
If you're keeping score, this looks like a textbook screw-up: they shipped less than 2023 Twitter did (which shipped actual numeric weights — favorite=0.5, reply=13.5, reply_engaged_by_author=75, report=-369), the redaction was sloppy, and the consolation prize is that nothing in the released form is even runnable.
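For contrast, those 2023 numbers were directly executable: the ranker scored a post as a weighted sum of predicted engagement probabilities. A toy reconstruction (only the four weights quoted above come from the 2023 release; the probabilities and the bare additive form are my simplification):

```python
# Weights quoted from Twitter's 2023 release; probabilities are invented.
WEIGHTS = {
    "favorite": 0.5,
    "reply": 13.5,
    "reply_engaged_by_author": 75.0,
    "report": -369.0,
}

def score(probs: dict[str, float]) -> float:
    """Weighted sum of predicted engagement probabilities."""
    return sum(WEIGHTS[k] * p for k, p in probs.items())

# A post with decent reply odds but a small report risk:
print(score({"favorite": 0.10, "reply": 0.02, "report": 0.001}))
# A 0.1% report probability outweighs the engagement upside (score < 0).
```

That asymmetry — one number telling you a report costs 738 favorites — is the kind of thing the 2026 release withholds.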
What the redaction did not touch
But hold that read next to this.
Every weight has a symbol. Every threshold has a symbol. Every feature flag has a symbol. The values are gone. The schema is fully public.
Now think about who the competitor audience actually is. Meta, TikTok, Pinterest, Reddit, Discord — companies with their own users, their own engagement data, their own A/B testing infrastructure. What do they lack?
They don't lack data. They have their own users. What they lack is the list of dials xAI considers worth having.
"FAVORITE_WEIGHT = 1.0" is a single data point on someone else's product. But the set:
FAVORITE_WEIGHT, REPORT_WEIGHT, OON_WEIGHT_FACTOR, NEW_USER_OON_WEIGHT_FACTOR,
AUTHOR_DIVERSITY_DECAY, AUTHOR_DIVERSITY_FLOOR, MAX_POST_AGE, ...
That's not data. That's the design of the search space. The values are recoverable from there if you have your own data — which a serious competitor does.
Search space matters more than search values when you have your own engagement data. The release shipped the search space.
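Concretely, recovering values from the schema is ordinary black-box search. A sketch, with an invented offline objective standing in for replaying your own engagement logs (dial names from the release; everything else is illustration):

```python
import random

DIALS = ["FAVORITE_WEIGHT", "REPORT_WEIGHT", "OON_WEIGHT_FACTOR"]

def offline_metric(weights: dict[str, float]) -> float:
    """Stand-in for scoring these weights against your own engagement data.
    Toy objective with a known optimum, for illustration only."""
    target = {"FAVORITE_WEIGHT": 1.0, "REPORT_WEIGHT": -20.0, "OON_WEIGHT_FACTOR": 0.3}
    return -sum((weights[d] - target[d]) ** 2 for d in DIALS)

def random_search(trials: int = 20_000, seed: int = 0) -> dict[str, float]:
    """Black-box search over the published dial names."""
    rng = random.Random(seed)
    best, best_score = None, float("-inf")
    for _ in range(trials):
        cand = {d: rng.uniform(-30, 30) for d in DIALS}
        s = offline_metric(cand)
        if s > best_score:
            best, best_score = cand, s
    return best
```

The optimizer is beside the point. The point is that the dial list defines what you optimize over — and the release handed that list out.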
So which is it?
Reading A — the oopsie. Mechanical sanitization stripped numeric values. The same script left symbol names alone because it didn't have to scrub them. The schema leak is a side-effect. They thought they were hiding the operational details. They didn't realize that for serious competitors, the architecture is most of the operational details.
Reading B — the play. Someone at xAI knew exactly what they were doing. The values get you sued, regulated, front-paged. The names signal architectural sophistication, anchor recruiting conversations, lock competitors into your conceptual framework, and let you claim transparency without leaking what actually matters. The schema is meant to be visible.
You can't tell which is right from the artifact alone. But the diagnostic question generalizes.
Read the function, not the artifact
A partial disclosure is one observation. The interesting object is the function that produces the artifact — and that function shows itself across releases, not within one.
Reading A predicts: the next release will have more leaks of the same kind, because the underlying process is mechanical and the codebase keeps evolving.
Reading B predicts: the next release will close some of the current leaks and open new disclosure surfaces, because there's a calibration loop.
They diverge over a release cadence, not over a single release.
If I had to bet right now, I'd bet both are partially true: leadership chose architectural-transparency-without-operational-disclosure as the release shape (deliberate), and the actual execution of that shape was a sanitization pipeline that wasn't entirely careful (accidental). The empty-string Kafka topics and the broken Python are sloppy execution of a deliberate strategy.
The operational consequence is identical either way: the schema is public. What matters is what comes next.
What I'd take from this
When you can't grep for numbers, grep for symbol names. The set of names tells you what the operator considers a tunable axis. That's design intent, and it's hard to fake or obscure once it's in source.
The redaction's seams are diagnostic. Mechanical sanitization leaves footprints: empty strings, broken syntax, surviving _INTERNAL identifiers, zero-TODO source trees. If the seams are obvious, the redaction was probably mechanical and the disclosure was probably not curated artifact-by-artifact. If there are no seams, somebody was paying attention.
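The seam scan itself is mechanical. A sketch over the patterns this repo exhibits (file set and patterns are my choices, not a standard):

```python
import re
from pathlib import Path

SEAMS = {
    "empty_string_literal": re.compile(r'(?:=|\()\s*""'),    # blanked constants / env vars
    "internal_identifier": re.compile(r"\b\w*_INTERNAL\b"),  # survivors of the scrub
    "todo_marker": re.compile(r"\b(?:TODO|FIXME|XXX)\b"),    # absence is itself a seam
}

def seam_report(repo_root: str) -> dict[str, int]:
    """Count redaction footprints across a source tree."""
    counts = {name: 0 for name in SEAMS}
    for path in Path(repo_root).rglob("*"):
        if not path.is_file() or path.suffix not in {".rs", ".py"}:
            continue
        text = path.read_text(errors="ignore")
        for name, pattern in SEAMS.items():
            counts[name] += len(pattern.findall(text))
    return counts
```

A tree with blanked literals, surviving _INTERNAL names, and zero TODO markers scans as mechanically sanitized. A tree with none of those scans as curated — which is exactly the Reading A / Reading B split, run as a grep.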
Originally posted at bjro.dev. Co-authored with Claude Opus 4.7.