DEV Community: Shixin Zhang

Can AI Really Write Quantum Computing Code? Introducing ORBIT-Q: A Dual-Axis Benchmark for AI Agents and Quantum Software Frameworks

Shixin Zhang — Wed, 08 Jul 2026 05:35:39 +0000

Large language models have become remarkably good at writing conventional software. Frameworks like Codex, Claude Code, and other coding agents can already solve a large fraction of real-world programming tasks.

But scientific programming—especially quantum computing—is a very different challenge.

A quantum program is not simply expected to compile and produce the correct output. It must also preserve physical correctness, maintain differentiability, respect algorithmic constraints, and often achieve high computational performance. Traditional software benchmarks rarely capture these requirements.

To evaluate how AI agents perform in this setting, we developed ORBIT-Q (Open Research Benchmark for Integrated Tasks in Quantum Computing), a benchmark specifically designed for autonomous scientific programming in quantum computing.

Why Existing Benchmarks Are Not Enough

In conventional coding benchmarks, passing unit tests is often sufficient.

In scientific computing, however, an implementation may pass numerical tests while still being fundamentally wrong.

During our experiments we frequently observed behaviors such as:

Framework bypassing. Instead of using the requested quantum framework, the agent secretly reconstructs the computation with NumPy or JAX tensor operations.
Broken differentiability. The generated code produces correct numbers but destroys the end-to-end automatic differentiation pipeline.
Violation of physical assumptions. The implementation changes the intended mathematical or physical problem while still appearing to "work."

These failures are difficult to detect using standard execution-based evaluation alone.

Scientific programming therefore requires evaluation beyond correctness—it requires semantic verification.

ORBIT-Q: A Dual-Axis Benchmark

ORBIT-Q consists of 12 challenging research-level quantum programming tasks, covering representative workloads in quantum simulation, quantum machine learning, tensor network algorithms, optimization, and automatic differentiation.

Its key idea is a dual-axis evaluation protocol.

Axis 1: Agent Evaluation

Keep the quantum framework fixed and compare different AI agents.

This measures how well various models (GPT, Claude, etc.) can solve scientific programming tasks under identical software environments.

Axis 2: Framework Evaluation

Keep the AI agent fixed and compare different quantum software frameworks.

This evaluates not only functionality and runtime performance, but also something increasingly important in the AI era:

How AI-friendly is a software framework?

A framework with discoverable APIs, composable abstractions, and consistent interfaces allows autonomous agents to generate substantially better solutions.

Preventing "Cheating"

To ensure generated solutions genuinely use the intended framework, ORBIT-Q employs a three-stage verification pipeline:

Deterministic functional testing
LLM-based source-level semantic auditing, designed to detect framework bypassing and other implementation shortcuts
Expert manual review

This combination substantially reduces false positives that commonly appear in conventional coding benchmarks.

Results: Which Frameworks Work Best?

Using the same coding agent (Codex + GPT-5.5), we evaluated several mainstream quantum software frameworks.

TensorCircuit-NG achieved the strongest overall performance, successfully solving 10 out of 12 benchmark tasks while also delivering significantly faster execution than competing frameworks.

For comparison:

TensorCircuit-NG: 10 / 12
PennyLane: 8 / 12
TorchQuantum: 4 / 12
MindQuantum: 4 / 12

Beyond success rate, TensorCircuit-NG consistently produced solutions that executed several times faster than those generated for other frameworks.

This suggests that framework design has a substantial impact on autonomous scientific programming—not only for human developers but also for AI agents.

Results: Which AI Agents Perform Best?

Under the TensorCircuit-NG environment, the leading coding agents achieved:

Codex + GPT-5.5: 10 / 12
Claude Code + Opus 4.8: 9 / 12
Claude Code + Sonnet 4.6: 7 / 12

Although these results are encouraging, a significant gap remains between AI-generated solutions and expert-written implementations.

Human experts solved all 12 tasks, while producing implementations that were typically more than twice as efficient as the strongest AI-generated solutions.

Current AI systems are becoming effective research assistants, but they are still far from replacing domain experts in scientific software development.

An Unexpected Observation: Safety False Positives

One particularly interesting finding was unrelated to quantum computing itself.

During evaluation with Claude Code + Opus 4.8, two benchmark tasks failed—not because the model lacked the necessary programming capability, but because the interaction was interrupted by Cybersecurity Refusals.

These tasks involved no networking, no external services, and no security-sensitive objectives. They consisted solely of local quantum programming and framework exploration.

This illustrates an often-overlooked issue in agent evaluation:

Product-level safety policies can significantly affect end-to-end task reliability, even when the underlying model is technically capable of solving the problem.

For autonomous scientific workflows, reliability depends not only on model intelligence but also on surrounding product behavior.

The Economics of Scientific AI

Another interesting lesson concerns inference cost.

Developers often compare models by token price alone.

Our experiments suggest this can be misleading.

Lower-cost models frequently require many more iterations because they generate incorrect implementations, encounter execution failures, or repeatedly need debugging.

Consequently, obtaining one successful scientific solution may consume substantially more time and tokens than using a stronger (but more expensive) model.

For scientific programming, a more meaningful metric may be:

Cost per successful scientific solution

rather than simply cost per token.

Looking Ahead

As AI agents become increasingly integrated into scientific research, software frameworks will need to evolve as well.

Future scientific software should not only be easy for researchers to use—it should also be easy for autonomous agents to understand, compose, and optimize.

Although ORBIT-Q focuses on quantum computing, we believe its evaluation methodology can be generalized to many areas of scientific computing where correctness, semantics, differentiability, and performance all matter.

If AI is going to become a true collaborator in scientific discovery, we need benchmarks that measure much more than whether code simply runs.

Paper

ORBIT-Q: Dual-axis Benchmarking of Autonomous Agents in Scientific Quantum Programming

Shi-Xin Zhang and Yu-Qin Chen

arXiv:2607.03105

The benchmark, evaluation framework, and source code are all open source at GitHub: https://github.com/sxzgroup/ORBIT-Q and the accompanying webpage: https://sxzgroup.github.io/ORBIT-Q/.

The "Secret of Staying Young" in Quantum Neural Networks

Shixin Zhang — Tue, 07 Jul 2026 06:57:41 +0000

How quantum geometry helps preserve learning ability in continual learning

In quantum machine learning, new models are often evaluated by how much they improve benchmark accuracy over classical baselines. These quantitative gains, however, are frequently fragile. They can depend heavily on the choice of baseline models, hyperparameters, or other experimental details.

A more fundamental question is whether quantum and classical learning systems exhibit qualitatively different learning dynamics. Such structural differences, if they exist, reveal something deeper than a few percentage points of accuracy—they provide insight into the underlying mechanisms of learning itself.

A recent breakthrough study published in PRX Quantum by Yu-Qin Chen of the Graduate School of the Chinese Academy of Sciences and Shi-Xin Zhang of the Institute of Physics, Chinese Academy of Sciences, explores this question from the perspective of continual learning. Instead of asking whether quantum neural networks achieve higher accuracy, the work asks:

Can quantum neural networks preserve their ability to learn over long periods of continual training? If so, why?

The answer turns out to reveal a surprising geometric advantage rooted in quantum mechanics itself.

AI's Midlife Crisis: Losing the Ability to Learn

Continual learning aims to build models that, much like humans, continuously accumulate knowledge while adapting to new tasks and changing environments.

Historically, research has focused on catastrophic forgetting—the tendency of neural networks to overwrite previously learned knowledge when learning new tasks.

In recent years, however, researchers have recognized another equally important challenge.

As training continues across many tasks, models gradually become less capable of learning new information. Their parameters become increasingly difficult to update, gradients become less informative, and adaptation slows dramatically.

This phenomenon is known as loss of plasticity.

Among the earliest researchers to emphasize its importance was reinforcement learning pioneer Richard Sutton, who argued that for long-running learning systems, catastrophic forgetting and loss of plasticity are two complementary problems:

Catastrophic forgetting determines how well a model retains old knowledge.
Loss of plasticity determines how well it can acquire new knowledge.

An intuitive analogy is that the model gradually "ages." Although it accumulates more experience, it becomes increasingly resistant to learning anything new.

Do Quantum Models Age More Slowly?

The natural question is whether quantum neural networks suffer from the same phenomenon.

The study first addressed a simple question:

Do quantum neural networks preserve plasticity better than classical neural networks?

The answer appears to be yes.

Across continual learning experiments involving more than 3,000 sequential tasks, a remarkably consistent pattern emerged.

Classical neural networks steadily lost their learning ability as training progressed.

Quantum neural networks, in contrast, maintained a much higher level of plasticity throughout long training sequences.

But observing the phenomenon is only the beginning.

The more interesting question is:

Why does this happen?

Geometry Matters

To understand the difference, consider where the parameters of a neural network live.

Classical neural network weights inhabit ordinary Euclidean space. In principle, parameter norms can grow without bound.

During prolonged continual training, optimization often drives these parameters toward increasingly large magnitudes.

Initially, this helps fit the data.

Eventually, however, several undesirable effects emerge:

neurons become increasingly saturated,
effective gradients shrink,
parameter updates become harder,
the trace of the Fisher Information Matrix steadily decreases.

Together, these effects gradually reduce the model's ability to adapt to new tasks.

This suggests that loss of plasticity is fundamentally connected to the geometry of the parameter space.

Why Quantum Neural Networks Behave Differently

Quantum neural networks follow an entirely different geometric trajectory.

The reason is not a specially designed continual-learning algorithm.

Instead, it originates from one of the most fundamental principles of quantum mechanics.

Quantum evolution is described by unitary transformations.

Mathematically, the trainable parameters correspond to rotations on compact Lie groups.

Unlike Euclidean space, these parameter manifolds are compact.

Parameters can continue evolving indefinitely, but they cannot drift arbitrarily far away.

This geometric constraint naturally prevents the unbounded parameter growth observed in classical networks.

As a result,

gradients remain in a healthy range,
parameter norms stay bounded,
the Fisher Information Matrix remains active,
and the network continues to retain the ability to learn new tasks.

In other words, the advantage of quantum models may not come solely from having richer computational representations.

It may also arise from the geometry imposed by the laws of quantum physics.

Rather than expanding without limit, quantum parameters evolve on a compact manifold whose structure naturally protects learning plasticity over time.

This geometric explanation is arguably more interesting than reporting another benchmark improvement.

Instead of asking whether one model wins by a few percentage points on a particular dataset, it asks whether quantum and classical learning systems obey fundamentally different learning dynamics during long-term adaptation.

From Theory to Large-Scale Validation

A theoretical explanation is only convincing if it survives large-scale empirical testing.

To validate the proposed mechanism, the authors constructed multiple continual learning benchmarks involving

more than 3,000 sequential learning tasks,
quantum circuits with depths up to 30 layers,
and over 4,000 trainable quantum parameters.

These experiments are considerably more demanding than conventional machine learning benchmarks.

Each configuration effectively requires training thousands of quantum neural networks while continuously monitoring internal quantities such as gradient statistics and the Fisher Information Matrix throughout optimization.

Such experiments would be prohibitively slow—or simply infeasible—using many conventional quantum software frameworks.

The computational foundation of this work therefore relied heavily on TensorCircuit-NG, an open-source quantum computing framework that combines tensor-network simulation, automatic differentiation, and high-performance GPU acceleration. These capabilities make long-horizon, large-scale continual learning experiments computationally practical.

A Different Perspective on Quantum Advantage

This work does not claim that quantum neural networks have solved continual learning.

Catastrophic forgetting still exists, and many questions about memory retention, stability, and continual adaptation remain open.

Instead, the paper offers a different perspective on quantum advantage.

Discussions of quantum machine learning often emphasize computational speedups or asymptotic complexity advantages.

But real intelligent systems require more than fast learning.

They must also continue learning over time.

Continual learning requires both remembering what has already been learned and remaining capable of acquiring new knowledge.

Catastrophic forgetting addresses the first challenge.

Loss of plasticity addresses the second.

Both are essential.

If quantum neural networks can naturally preserve their capacity to learn throughout long-term adaptation—not because of additional engineering tricks, but because of the geometry dictated by quantum mechanics—then this "ageless" plasticity may represent a compelling and fundamentally different form of quantum advantage.

Reference

Chen, Y.-Q., & Zhang, S.-X. (2026). Intrinsic Preservation of Plasticity in Continual Quantum Learning. PRX Quantum, 7, 033003.

The Two Paradigms of Scientific Computing Agents: Abstraction, Openness, and "The Bitter Lesson"

Shixin Zhang — Mon, 22 Jun 2026 13:05:07 +0000

In recent years, the rapid evolution of Large Language Models (LLMs) has turned "AI + Scientific Computing" into a highly active frontier. Whether in molecular dynamics, material and drug design, or quantum computing, numerous platforms are attempting to bridge natural language interfaces with rigorous scientific computation.

From a user experience perspective, this approach significantly lowers the barrier to entry, allowing non-experts to breeze through standardized experimental workflows. However, when we shift our focus from "Can it run a standard experiment quickly?" to "Does it support open-ended scientific exploration?", a stark architectural divide emerges regarding abstraction boundaries and system openness.

Currently, Scientific Computing Agent systems can be broadly categorized into two technical paradigms:

Encapsulated Systems: Running in controlled cloud sandboxes, these systems typically provide pre-configured, templated workflows accessible via a Web UI.
Open & Programmable Systems: Operating within general-purpose computing environments, these systems (like Claude Code or Codex) integrate deeply with code repositories, runtimes, and external toolchains.

While both rely on conversational interfaces, their core difference lies in their habitat: is the Agent living in a closed ecosystem of cloud templates, or an open, customizable computing space?

Abstraction Boundaries vs. The Space for Innovation

Every software system must strike a balance between ease of use and flexibility. For standardized scientific tasks, encapsulated systems shine. However, when a research question deviates from standard templates, the very abstraction that reduces complexity becomes a bottleneck.

Here is a clear comparison of the two paradigms:

Dimension	Encapsulated Systems	Open & Programmable Systems
Representative Examples	Domain-specific Web-based AI platforms	General-purpose agents like Claude Code, Codex
Execution Environment	Pre-defined, controlled cloud sandboxes; highly templated	General compute environments (native OS, containers, local/cloud)
Abstraction Boundary	High (Hides underlying engineering details)	Low (Direct access to file systems, low-level compute libraries, and dependencies)
Ideal Use Cases	Education, running standard algorithms, rapid benchmarking	Exploratory frontier research, highly customized workflows
Handling Novel Problems	Wait for platform updates, or revert to writing code manually	Break out of the framework; freely compose modules and custom logic

Take quantum computing as an example. For standard Variational Quantum Algorithms (VQAs), encapsulated Web platforms can easily handle the entire pipeline—from quantum circuit construction and parameter optimization to result visualization. By condensing complex engineering details into a few pre-built templates, users can complete experiments with minimal cognitive load.

But the moment a researcher’s needs veer off the beaten path, this abstraction hits a wall. Suppose a researcher wants to combine a novel data encoding method, a highly customized quantum gate structure, and a non-standard loss function. Because this bespoke architecture doesn't map to existing templates, the encapsulated system's API simply rejects it.

In contrast, open programmable systems support these novel combinations because they don't pre-define the shape of the problem; they merely provide computing primitives. In these environments, circuit construction, training loops, loss functions, and data pipelines are all exposed as raw code. An Agent (or researcher) can freely import new Python modules, alter the training loop, inject custom gradient estimation methods, or couple a quantum simulator with an external data pipeline. Because the system hasn't hardcoded these steps into indivisible blocks, a problem that breaks an encapsulated system is just another day of writing code for an open system.

Project Context and the Information Horizon

If architecture forms the skeleton of a system, context forms the Agent's horizon. The quality of an Agent's reasoning is inextricably tied to the scope of information it can access.

In real-world scientific computing, the "state" of a project is never just a few chat prompts or isolated data uploads. It is a massive, ongoing web of information: repository directory structures, historical scripts, local test datasets, related PDF papers, version control histories, and past error logs.

Constrained by cloud sandbox isolation, an encapsulated Agent's horizon is usually limited to the current ephemeral session; its understanding of state is fragmented. Conversely, in an open programmable system, an Agent like Claude Code operates as a first-class citizen within the compute environment. It can directly read the real-time state of the entire project directory. If a user asks to tweak an initialization parameter based on the last run, the open Agent can fetch historical logs, diff code versions, and execute reliable reasoning backed by full project context.

The difference is fundamental: is the Agent trapped in a single, isolated interaction, or is it embedded in the continuous information network of a real research project?

From Code Generation to Workflow Orchestration

Context dictates reasoning, but action drives results. Once an Agent grasps the global state, its capabilities expand from mere code generation to system-level orchestration. This is the second great divide between the two paradigms: the breadth of agency.

Scientific computing rarely stops at writing a single algorithm script. It is usually a complex workflow spanning multiple independent tools. Under the open paradigm, an Agent doesn't just write logic using native frameworks; it executes system-level operations. It can SSH into High-Performance Computing (HPC) clusters to submit jobs, set up cron jobs to monitor GPU utilization, read stack traces to auto-retry crashed experiments, clean data post-run, generate charts, and even draft the initial manuscript.

While encapsulated systems confine the Agent to a proprietary loop, open systems grant Agents cross-platform, cross-tool autonomy, turning them into genuine collaborators.

General Beats Specialized: The "Bitter Lesson" in the Agent Era

This transition from encapsulated to open, and from specialized cloud platforms to general programming environments, perfectly echoes Richard Sutton's famous essay, "The Bitter Lesson". Sutton observed that throughout AI history, specialized methods meticulously hand-crafted using human domain knowledge are ultimately surpassed by general methods that leverage massive computation.

This philosophy holds entirely true for AI in scientific computing. Today, many platforms pour immense resources into building highly specialized Agents for niche domains, designing bespoke UIs and rigid workflow templates. In the short term, these make for incredibly smooth demos.

However, as the capabilities of foundation models scale exponentially, general-purpose Agents (like Claude Code or Codex) are becoming overwhelmingly powerful. They don't need a specialized UI wrapper. Drop them into a standard, open computing environment, and their generalized reasoning allows them to read domain documentation on the fly, call low-level scientific libraries, and independently orchestrate complex analysis.

The meticulously hardcoded workflows of domain-specific Agents risk rapid obsolescence. Often, their domain capabilities struggle to keep pace with the generalized leaps made by foundation models.

Recognizing this, a new generation of scientific computing frameworks is aligning with the open paradigm. For instance, in the quantum computing space, TensorCircuit-NG is a prime example of embracing the general Agent route. It abandons traditional closed-platform UI wrappers in favor of a native AI stack, offering hyper-performance low-level primitives alongside out-of-the-box skill suites. Its core design philosophy is simple: empower general-purpose Agents to freely explore and orchestrate complex science within an open environment.

Conclusion

Encapsulated and open programmable systems represent two distinct technological philosophies. The former lowers the barrier to entry via high-level abstraction, playing a crucial role in education and standard reproducibility.

However, in uncharted, fast-moving frontier sciences, maintaining system openness and generality is paramount. Allowing increasingly capable general Agents to dive deep into real, low-level engineering environments—breaking free from rigid abstraction boundaries—is the most sustainable path for AI to truly revolutionize scientific computing.

PyTrees Are Not One Thing: JAX, PyTorch, and TensorFlow Compared

Shixin Zhang — Fri, 12 Jun 2026 06:18:20 +0000

PyTrees look deceptively simple. You flatten a nested Python object into leaves, keep a structure descriptor, and later rebuild or map over the same shape. That abstraction is powerful enough to carry optimizer states, model parameters, batched inputs, gradients, and sharding annotations. It is also just ambiguous enough that three major frameworks implement three subtly different languages under the same idea.

This note compares JAX jax.tree_util, PyTorch torch.utils._pytree, and TensorFlow tf.nest. I tested the behavior in two environments: an older stack with JAX 0.4.35, PyTorch 2.2.2, TensorFlow 2.20.0, and a newer stack with JAX 0.10.0, PyTorch 2.12.0, TensorFlow 2.21.0. Most flatten/unflatten semantics were stable across these versions. The main version-sensitive result is PyTorch: _pytree.tree_map in 2.2.2 accepts only one pytree, while 2.12.0 supports multiple pytrees and behaves much closer to JAX prefix-style mapping.

The short version: JAX treats pytrees as a transformation language, PyTorch is converging toward that model in torch.func, and TensorFlow exposes a broader nested-structure utility through tf.nest. Those differences show up exactly where backend-agnostic libraries usually hurt: None, dictionary order, custom containers, tree_map, autodiff, and vectorization.

The Shape Of The APIs

The three APIs have the same surface story but not the same contract.

from jax import tree_util as jtu
from torch.utils import _pytree as tpu
import tensorflow as tf

leaves, treedef = jtu.tree_flatten(tree)
tree = jtu.tree_unflatten(treedef, leaves)
tree = jtu.tree_map(f, *trees)

leaves, spec = tpu.tree_flatten(tree)
tree = tpu.tree_unflatten(leaves, spec)
tree = tpu.tree_map(f, tree)          # PyTorch 2.2.2
tree = tpu.tree_map(f, *trees)        # PyTorch 2.12.0

leaves = tf.nest.flatten(tree)
tree = tf.nest.pack_sequence_as(structure, leaves)
tree = tf.nest.map_structure(f, *structures)

Flattening means "which objects are leaves?" Unflattening means "what metadata is needed to reconstruct the original container?" Mapping means "what does it mean for several structures to match?" Those three questions are where the frameworks diverge.

JAX calls its structure descriptor a PyTreeDef, so treedef is the conventional variable name. PyTorch calls the analogous descriptor a TreeSpec, so examples and internals often name it spec. Conceptually they play the same role: they describe the container skeleton and the metadata needed to rebuild it from a flat leaf list. TensorFlow's tf.nest does not return a separate treedef object from flatten; instead, pack_sequence_as takes an existing nested structure as the template.

There is also a small argument-order trap. JAX unflattens as tree_unflatten(treedef, leaves), while PyTorch unflattens as tree_unflatten(leaves, spec). TensorFlow's equivalent is pack_sequence_as(structure, leaves).

A Compact Map Of The Differences

Case	JAX	PyTorch `_pytree`	TensorFlow `tf.nest`
Scalar	Leaf	Leaf	Leaf
`None`	Empty pytree, 0 leaves	Leaf	Leaf
`list`, `tuple`	Containers	Containers	Containers
namedtuple	Container, type-strict	Container, type-strict	Container, type-strict
plain `dict` order	Sorted keys	Insertion order	Sorted-key leaf order
`OrderedDict` order	Insertion order	Insertion order	Sorted-key leaf order
`defaultdict` order	Sorted keys	Insertion order	Sorted-key leaf order
`defaultdict.default_factory`	Preserved	Preserved	Preserved
custom `dict` subclass	Leaf unless registered	Leaf unless registered	Container
custom `list`/`tuple` subclass	Leaf unless registered	Leaf unless registered	Container
dataclass instance	Leaf unless registered	Leaf unless registered	Leaf by default
multi-arg `tree_map`	Supported, prefix semantics	PyTorch 2.2.2: not supported; PyTorch 2.12.0: supported with prefix semantics	Supported, strict same structure
unflatten arity mismatch	Raises `ValueError`	Raises `ValueError`	Raises `ValueError`

The rest of the note explains why these rows matter.

`None`: A Ghost Node In JAX, A Leaf Elsewhere

The cleanest way to feel the philosophical split is None. In JAX, None is not a value to map over. It is a zero-leaf structural marker.

jtu.tree_flatten(None)
# leaves: []
# treedef: PyTreeDef(None)

jtu.tree_map(lambda x: ("mapped", x), None)
# None

In PyTorch and TensorFlow, None is a leaf.

tpu.tree_flatten(None)
# [None]

tf.nest.flatten(None)
# [None]

tpu.tree_map(lambda x: ("mapped", x), None)
# ("mapped", None)

tf.nest.map_structure(lambda x: ("mapped", x), None)
# ("mapped", None)

The nested case makes the difference visible:

tree = [1, None, 3]

jtu.tree_flatten(tree)[0]
# [1, 3]

tpu.tree_flatten(tree)[0]
# [1, None, 3]

tf.nest.flatten(tree)
# [1, None, 3]

If None means "optional value absent", JAX treats it structurally. If None means "a value in my tree", PyTorch and TensorFlow are closer to that intuition.

Dictionaries: The Same Keys, Different Time Arrows

Plain dict is a container everywhere, but the traversal order differs. JAX sorts keys, PyTorch follows insertion order, and TensorFlow assigns leaves by sorted keys while preserving the original mapping order when rebuilding.

tree = {"b": 2, "a": 1}

jtu.tree_flatten(tree)[0]
# [1, 2]   # a, then b

tpu.tree_flatten(tree)[0]
# [2, 1]   # b, then a

tf.nest.flatten(tree)
# [1, 2]   # a, then b

Replacing the leaves with [10, 20] shows the reconstruction contract:

# JAX
{"a": 10, "b": 20}

# PyTorch
{"b": 10, "a": 20}

# TensorFlow
{"b": 20, "a": 10}

TensorFlow's result is the surprising one on first read. It maps values according to sorted keys, but prints in the original insertion order. The object order and the leaf assignment order are not the same concept.

Mixed incomparable key types are another consequence of sorting. JAX and TensorFlow fail on {1: "one", "2": "two"} because 1 < "2" is not defined. PyTorch does not sort and therefore flattens this case in insertion order.

jtu.tree_flatten({1: "one", "2": "two"})
# ValueError: Comparator raised exception while sorting pytree dictionary keys.

tf.nest.flatten({1: "one", "2": "two"})
# TypeError: '<' not supported between instances of 'str' and 'int'

tpu.tree_flatten({1: "one", "2": "two"})[0]
# ["one", "two"]

Ordered Containers Are Not Just Dicts With Better Manners

OrderedDict has explicit order metadata, and JAX treats that metadata as part of the tree structure. PyTorch uses insertion order too. TensorFlow again uses sorted-key leaf assignment.

from collections import OrderedDict

tree = OrderedDict([("b", 2), ("a", 1)])

jtu.tree_flatten(tree)[0]
# [2, 1]

tpu.tree_flatten(tree)[0]
# [2, 1]

tf.nest.flatten(tree)
# [1, 2]

All three preserve the OrderedDict type when rebuilding, but TensorFlow assigns replacement leaves by sorted key:

tf.nest.pack_sequence_as(OrderedDict([("b", 2), ("a", 1)]), [10, 20])
# OrderedDict([("b", 20), ("a", 10)])

Multi-argument mapping reveals another difference. JAX rejects two OrderedDicts with the same keys but different order because the custom node metadata differs.

a = OrderedDict([("b", 2), ("a", 1)])
b = OrderedDict([("a", 10), ("b", 20)])

jtu.tree_map(lambda x, y: (x, y), a, b)
# ValueError: Mismatch custom node data: ('b', 'a') != ('a', 'b')

TensorFlow accepts this and pairs by key while preserving the first structure's order. PyTorch 2.12.0 also accepts it and returns the same visible result:

OrderedDict([("b", (2, 20)), ("a", (1, 10))])

`defaultdict`: Losing The Type Changes Behavior

defaultdict is not a decorative subclass. It carries a default_factory, which changes lookup behavior.

from collections import defaultdict

counter = defaultdict(int)
counter["missing"]
# 0

plain = {}
plain["missing"]
# KeyError

All three frameworks preserve the default_factory, but they disagree about leaf order just as with dictionaries.

tree = defaultdict(int, {"b": 2, "a": 1})

jtu.tree_flatten(tree)[0]
# [1, 2]

tpu.tree_flatten(tree)[0]
# [2, 1]

tf.nest.flatten(tree)
# [1, 2]

Rebuilding with [10, 20] gives:

# JAX
defaultdict(int, {"a": 10, "b": 20})

# PyTorch
defaultdict(int, {"b": 10, "a": 20})

# TensorFlow
defaultdict(int, {"b": 20, "a": 10})

This matters for any pure Python fallback. If it flattens a defaultdict as a mapping but reconstructs a plain dict, it is wrong, not merely imprecise.

Custom Containers: Either Register Them Or Treat Them As Leaves

JAX and PyTorch are conservative about arbitrary subclasses. TensorFlow is more eager to recurse into sequence and mapping subclasses.

class MyDict(dict):
    pass

class MyList(list):
    pass

class MyTuple(tuple):
    pass

JAX and PyTorch treat these as leaves unless explicitly registered:

jtu.tree_flatten(MyDict({"b": 2, "a": 1}))[0]
# [MyDict({"b": 2, "a": 1})]

tpu.tree_flatten(MyList([1, 2]))[0]
# [MyList([1, 2])]

TensorFlow traverses them:

tf.nest.flatten(MyDict({"b": 2, "a": 1}))
# [1, 2]

tf.nest.flatten(MyList([1, 2]))
# [1, 2]

tf.nest.flatten(MyTuple((1, 2)))
# [1, 2]

Namedtuple is the standard exception. All three frameworks recognize it as a structural container and preserve its type. They are also strict about namedtuple type matching: Point(1, 2) is not the same structure as (1, 2) or RGB(1, 2).

The Real Trap: `tree_map` Does Not Always Mean Same-Structure Map

JAX tree_map uses the first argument as the reference structure. Later arguments are flattened "up to" that structure. If the first tree has a leaf, the corresponding value in a later tree may be an entire subtree.

jtu.tree_map(lambda x, y: (x, y), [1, 2], [[3], {"x": 4}])
# [(1, [3]), (2, {"x": 4})]

The first tree says: "I am a list of two leaves." Therefore the second tree only needs to be a list of two objects. Those objects are passed whole to the function.

The scalar case is even clearer:

jtu.tree_map(lambda x, y: (x, y), 1, [2, 3])
# (1, [2, 3])

jtu.tree_map(lambda x, y: (x, y), [1, 2], 3)
# ValueError: Expected list, got 3.

PyTorch 2.12.0 behaves similarly:

tpu.tree_map(lambda x, y: (x, y), [1, 2], [[3], {"x": 4}])
# [(1, [3]), (2, {"x": 4})]

tpu.tree_map(lambda x, y: (x, y), 1, [2, 3])
# (1, [2, 3])

tpu.tree_map(lambda x, y: (x, y), [1, 2], 3)
# ValueError: Node type mismatch; expected <class 'list'>, but got <class 'int'>.

PyTorch 2.2.2 did not support this multi-pytree call through _pytree.tree_map. TensorFlow supports multiple structures, but it requires strict structural equality:

tf.nest.map_structure(lambda x, y: (x, y), [1, 2], [[3], {"x": 4}])
# ValueError: structures do not have the same nested structure

Transform APIs: PyTree Support Is Not Just Flattening

Tree semantics matter most when they meet transforms. Here the frameworks differ again.

JAX transformations are natively pytree-based. grad accepts nested inputs and returns gradients with the same structure:

import jax
import jax.numpy as jnp

def f(params):
    return params["x"] ** 2 + params["y"][0] ** 3

params = {"x": jnp.array(2.0), "y": [jnp.array(3.0)]}
jax.grad(f)(params)
# {"x": Array(4., dtype=float32), "y": [Array(27., dtype=float32)]}

JAX vmap accepts nested pytree inputs too:

def g(params):
    return params["x"] + params["y"][0]

batched = {"x": jnp.arange(3.0), "y": [jnp.arange(3.0) + 10]}
jax.vmap(g)(batched)
# Array([10., 12., 14.], dtype=float32)

Because None is a zero-leaf node in JAX, it can sit inside a vmapped input without becoming a batched argument:

def h(params):
    return params["x"]

jax.vmap(h)({"x": jnp.arange(3.0), "y": [None]})
# Array([0., 1., 2.], dtype=float32)

Classic PyTorch autograd is different. torch.autograd.grad expects tensors or gradient edges as inputs, not an arbitrary nested pytree:

x = torch.tensor(2.0, requires_grad=True)
y = torch.tensor(3.0, requires_grad=True)
loss = x ** 2 + y ** 3

torch.autograd.grad(loss, (x, y))
# (tensor(4.), tensor(27.))

nested = {"x": x, "y": [y]}
torch.autograd.grad(loss, nested)
# RuntimeError: all inputs have to be Tensors or GradientEdges, but got str

The newer torch.func stack does understand nested pytree-like parameter structures:

from torch.func import grad, vmap

def f(params):
    return params["x"] ** 2 + params["y"][0] ** 3

params = {"x": torch.tensor(2.0), "y": [torch.tensor(3.0)]}
grad(f)(params)
# {"x": tensor(4.), "y": [tensor(27.)]}

def g(params):
    return params["x"] + params["y"][0]

batched = {"x": torch.arange(3.0), "y": [torch.arange(3.0) + 10]}
vmap(g)(batched)
# tensor([10., 12., 14.])

TensorFlow's transform support follows tf.nest. GradientTape.gradient accepts nested sources and returns gradients in the same structure:

x = tf.Variable(2.0)
y = tf.Variable(3.0)
nested = {"x": x, "y": [y]}

with tf.GradientTape() as tape:
    loss = nested["x"] ** 2 + nested["y"][0] ** 3

tape.gradient(loss, nested)
# {"x": tf.Tensor(4.0), "y": [tf.Tensor(27.0)]}

tf.vectorized_map also accepts nested input structures:

def g(params):
    return params["x"] + params["y"][0]

batched = {"x": tf.range(3.0), "y": [tf.range(3.0) + 10]}
tf.vectorized_map(g, batched)
# tf.Tensor([10. 12. 14.], shape=(3,), dtype=float32)

tf.function accepts nested structures as ordinary function arguments:

@tf.function
def f(params):
    return params["x"] ** 2 + params["y"][0] ** 3

f({"x": tf.constant(2.0), "y": [tf.constant(3.0)]})
# tf.Tensor(31.0, shape=(), dtype=float32)

The right summary is more specific: JAX transforms are pytree-native; PyTorch classic autograd is not, while torch.func is; TensorFlow transform APIs accept nested structures.

Closing

PyTrees are a small abstraction with a long tail. Simple examples make every framework look compatible; real optimizer states, optional values, ordered mappings, custom containers, and transform APIs expose the differences quickly.

TensorCircuit-NG vs cuQuantum on H200: JIT compilation beats the "magic GPU library" assumption

Shixin Zhang — Sun, 07 Jun 2026 02:02:29 +0000

NVIDIA cuQuantum has a strong reputation as the natural high-performance baseline for GPU quantum simulation. That reputation is understandable: cuQuantum contains serious low-level GPU libraries such as cuStateVec and cuTensorNet and it is NVIDIA who creates GPU and CUDA!

But in an end-to-end differentiable VQE workload, the result is more nuanced. On our H200 GPU benchmark, TensorCircuit-NG was substantially faster after compilation, while also offering a much higher-level and user-friendly programming model.

The short version:

cuQuantum is a powerful low-level library.
It is not automatically the fastest route for practical quantum simulation tasks.
Direct cuQuantum code is significantly more verbose and engineering-heavy.
TensorCircuit-NG pays a JAX compilation cost, but repeated value-and-gradient evaluations quickly amortize that cost.
The final running time of TensorCircuit-NG is much shorter than NVIDIA cuquantum.

Benchmark setup

We used the workload as in the script for 1D TFIM VQE task:

Hardware and software:

GPU: NVIDIA H200
TensorCircuit-NG: 1.6.0
JAX: 0.7.2
cuQuantum Python: 26.3.2
CuPy: 14.1.1
PyTorch: 2.11.0+cu128

We measured one warmup/compile call and then the mean of five later value-and-gradient calls.

Implementations compared

We tested two TensorCircuit-NG modes:

TC-JAX scan: uses scan over VQE layers to reduce JAX compilation/staging time.
TC-JAX unrolled: builds all layers directly. This produces a larger traced program, but can be faster after compilation.

We also tested two direct cuQuantum routes:

cuStateVec adjoint: applies gates with cuStateVec and computes the full gradient with adjoint differentiation. This is not parameter shift so it is a fair comparison.
cuTensorNet full-state autograd: contracts the full state with cuTensorNet, then computes the TFIM state-vector expectation on GPU with PyTorch autograd.

The cuTensorNet path is intentionally not the obviously bad version where every Pauli term gets a separate tensor-network path search. We first tried that more "TN-native" observable-contraction style, but for this workload it spent too much time in repeated graph/path overhead. The final version is closer to the state-vector expectation workflow used by the TensorCircuit-NG and MindQuantum benchmark.

Repeated value-and-gradient runtime

The table below reports the post-warmup runtime. This is the relevant metric for VQE-style optimization, where the same circuit structure is evaluated many times.

backend	14 qubits	20 qubits	24 qubits
TC-JAX scan	0.01201s	0.01616s	0.06374s
TC-JAX unrolled	0.00995s	0.01381s	0.02547s
cuStateVec adjoint	0.08036s	0.12061s	0.30142s
cuTensorNet full-state autograd	1.35677s	2.04291s	2.30414s

In repeated value-and-gradient calls, TensorCircuit-NG is faster than cuStateVec:

qubits	TC-JAX scan vs cuStateVec	TC-JAX unrolled vs cuStateVec
14	6.69x	8.08x
20	7.46x	8.73x
24	4.73x	11.83x

The gap is much larger against the cuTensorNet route for this particular state-vector expectation plus autograd workflow:

qubits	TC-JAX scan vs cuTensorNet	TC-JAX unrolled vs cuTensorNet
14	112.97x	136.36x
20	126.42x	147.93x
24	36.15x	90.46x

These numbers are the main point: cuQuantum is not a magic speed button. A library being close to CUDA, or being written by a GPU vendor, does not automatically make it the fastest end-to-end implementation for a differentiable quantum algorithm.

First-call cost and amortization

cuQuantum has much lower first-call overhead. This is expected: TensorCircuit-NG uses JAX JIT compilation, and that first call can be expensive.

So if the task is a single one-off circuit evaluation, cuQuantum's low startup cost is attractive. But VQE is usually not a one-off workload. It repeatedly evaluates the same circuit structure for many optimizer steps and often across multiple random initializations. In that regime, TensorCircuit-NG's first-call cost is easily amortized, and the much faster post-compilation runtime becomes the dominant factor.

There is also a useful TensorCircuit-NG tradeoff:

Use scan mode when compilation time matters.
Use unrolled mode when the same circuit will be evaluated many times and peak post-compilation throughput matters.

At 24 qubits, unrolled TensorCircuit-NG is about 2.50x faster than scan mode after compilation, but the first call is about 9x heavier.

Programming model

Performance is only half of the story. The programming model matters.

In TensorCircuit-NG, the benchmark is expressed as circuit code:

c = tc.Circuit(n)
c.h(range(n))
for layer in range(depth):
    for i in range(n - 1):
        c.rzz(i, i + 1, theta=params[layer, 0, i])
    for i in range(n):
        c.rx(i, theta=params[layer, 1, i])

value_and_grad = tc.backend.jit(tc.backend.value_and_grad(energy_fn))

With direct cuQuantum, the user has to manually manage much lower-level details:

gate matrices and their dtype conventions
state-vector memory
cuStateVec binding signatures
tensor-network modes
PyTorch operands for autograd
GPU synchronization
version-specific API behavior

cuQuantum is valuable, but it is closer to a low-level engine than a high-level quantum algorithm framework. For a researcher, that difference is very real.

Takeaway

This benchmark does not prove that cuQuantum is slow for every task. What this benchmark does show is narrower and more practical:

For VQE workload, direct cuQuantum was not the fastest end-to-end route. TensorCircuit-NG provided a much simpler programming interface and substantially faster repeated value-and-gradient evaluations after JAX compilation.

The common assumption that "NVIDIA controls CUDA, therefore cuQuantum must be the fastest implementation" is too simplistic. Raw GPU kernels matter, but so do JIT compilation, autodiff integration, graph-level optimization, and the abstraction level exposed to users.

TensorCircuit-NG's advantage is that it lets users write concise quantum-program code while still compiling to high-performance backend-native tensor programs. For repeated VQE-style workloads, that combination can beat direct cuQuantum both in usability and in runtime.

Why JAX Is a Much Better Backend for Quantum Circuit Simulation Than PyTorch

Shixin Zhang — Sat, 06 Jun 2026 05:01:36 +0000

Modern quantum circuit simulation is not just “machine learning with complex tensors.” It involves irregular tensor contractions, sparse operators, statevector transformations, and automatic differentiation through all of them. This makes backend choice unusually important. A backend that is excellent for standard neural-network layers may still be a poor fit for general quantum simulation workloads.

We benchmarked this with a simple VQE workload for the 1D transverse-field Ising
model as in the script,

H = -sum_i Z_i Z_{i+1} - sum_i X_i,

using 20 qubits, 10 ansatz layers, complex64 precision, and one NVIDIA RTX 5090 GPU.

Results

Backend	Compile / Warmup	Value+Grad Runtime
TensorCircuit-NG, JAX backend	53.53 s	0.0265 s
TensorCircuit-NG, PyTorch backend	0.48 s	0.3299 s
TorchQuantum, optimized implementation than default	0.81 s	0.4172 s

The JAX backend is about 12.4x faster than TensorCircuit-NG’s PyTorch backend and about 15.7x faster than TorchQuantum for the post-compilation value-and-gradient step.

The compile time tells the other half of the story: JAX pays a much larger upfront XLA compilation cost. But after compilation, XLA produces a far more effective execution plan for this quantum simulation workload. This is exactly the tradeoff we want in VQE, QAOA, time evolution, and many other iterative algorithms: pay once, run many times.

Why This Happens

Quantum circuit simulation stresses a backend differently from ordinary deep learning. The workload mixes tensor-network contraction, sparse Hamiltonian application, and reverse-mode differentiation. JAX/XLA is designed to see the whole computation and optimize it aggressively as a compiled program on the target device.

PyTorch, in contrast, is strongest where the workload resembles standard neural network layers. For more general tensor programs, especially tensor-network-like simulation code, the compiler stack is less aggressive and less predictable.
In this benchmark, the same TensorCircuit-NG algorithm is more than an order of magnitude faster on JAX than on PyTorch after compilation.

A Note on TorchQuantum

We also compared against TorchQuantum as a representative PyTorch-native quantum circuit package. To make the comparison generous, we did not use its generic Pauli-string expectation path. That built-in route tends to materialize dense Pauli operators and is slow and not scalable. Instead, we implemented a TFIM-specific expectation directly extracted from state:

ZZ terms are evaluated from probabilities and precomputed sign tensors.
X terms are evaluated by flipping the state axis and taking an inner product.

This is already a substantial low-level optimization Even with that help, TorchQuantum remains slower than TensorCircuit-NG on the JAX backend by about 15.7x. And even if you prefer PyTorch backend, PyTorch backend from TensorCircuit-NG is still a better choice in terms of both warm-up and run times.

Takeaway

The lesson is not merely that one package is faster than another. The deeper point is that backend architecture matters. Quantum simulation benefits from a compiler that can optimize a whole differentiable tensor program, not just a collection of familiar machine-learning layers.

For TensorCircuit-NG, the JAX backend gives exactly that: a high-level quantum programming interface backed by XLA’s aggressive compilation. The result is a backend that is not only elegant for research code, but also dramatically faster for real differentiable quantum simulation workloads.

TensorCircuit-NG: How to Tell Whether a Quantum x AI x HPC Platform Is Truly Mature When Everyone Tells the Same Story

Shixin Zhang — Thu, 04 Jun 2026 08:28:33 +0000

In recent years, the convergence of quantum computing, artificial intelligence (AI), and high-performance computing (HPC) has become a central theme in the evolution of scientific computing infrastructure. From AI4Science and quantum machine learning to supercomputing centers and heterogeneous computing platforms, phrases such as "Quantum x AI x HPC", "integrated quantum-supercomputing-intelligence infrastructure", and "next-generation research infrastructure" now appear frequently in academic conferences, industry forums, and corporate presentations.

At the same time, a clear pattern has emerged:

The messaging is becoming increasingly similar, while the actual technical depth of different products varies dramatically.

Whether the subject is a quantum software platform, an AI4Science infrastructure stack, or a heterogeneous computing framework, many projects now describe themselves in similar terms:

integrating quantum computing with AI;
supporting heterogeneous computing resources;
serving as future research infrastructure;
enabling applications in materials science, chemistry, biomedicine, and other industries;
building an open ecosystem and developer community.

These directions are meaningful. In fact, they are becoming part of the field's shared consensus.

The real question is different:

When everyone is telling a similar story, how can we tell whether a platform has actually delivered technical substance, rather than remaining at the level of conceptual packaging and slideware?

For scientific infrastructure, it is more useful to ask five verifiable questions than to focus on slogans:

Is it open source?
Does it provide public benchmarks?
Is it used continuously by high-quality research communities?
Has it supported real industry-oriented application cases?
Does it continue to evolve through sustained version updates?

Any platform that claims to be "next-generation research infrastructure" should be able to answer these questions in a concrete way.

The development of TensorCircuit-NG offers a useful case study. Its value does not lie only in proposing a vision for "Quantum x AI x HPC"; it lies in a body of work that can be inspected, reproduced, cited, extended, and tested over time: open code, reproducible performance evaluations, a visible record of academic adoption, evidence of industry spillover, and six years of engineering iteration.

1. Is It Open Source?

For scientific software, open source means more than publishing code.

It means that:

the technology can be independently verified;
performance claims can be reproduced;
algorithms can be inspected;
users can deploy the software without relying on a closed service;
third-party researchers can repeat experiments under their own conditions.

Research communities do not lack polished presentations. What is much rarer is a technical system that can survive independent inspection.

TensorCircuit was not released as a one-off code dump. Its development forms a traceable engineering trajectory: from the original personal open-source version, to the version developed during the Tencent Quantum Lab period, and then to the currently maintained TensorCircuit-NG project. Across these stages, the core code, documentation, tests, and examples have remained open. The GitHub history preserves the development record, with more than 500 combined stars and forks, over 2,700 commits, more than 30 released versions, and contributions from over 30 developers around the world.

In terms of engineering scale, TensorCircuit-NG is no longer a short-term proof-of-concept project. It is a platform-level scientific computing system, with roughly 70,000 lines of code, type annotations, unit tests, continuous integration, documentation, and tutorials. The repository currently contains close to one thousand test functions. These tests are not merely a coverage metric; they are part of the engineering foundation that keeps APIs stable, backend behavior consistent, and long-term maintenance manageable.

The surrounding ecosystem matters as well. TensorCircuit-NG provides documentation, more than 30 tutorial examples, over 170 application examples, more than 10 benchmark suites, and a companion quantum computing tutorial. Together, these resources form a developer ecosystem that is learnable, reusable, and extensible. The platform also embraces AI-native workflows by providing AI skill packages for paper reproduction, code translation, and performance optimization. This means TensorCircuit-NG is not only designed for human developers; it is also adapting to a new mode of scientific software development in which AI agents participate directly in research workflows.

Another measurable signal of open-source adoption is installation and use. TensorCircuit-related packages include tensorcircuit, tensorcircuit-ng, and the nightly package tensorcircuit-nightly on PyPI, with cumulative pip install downloads exceeding one million. Download counts alone do not prove scientific value, but they do show that the platform exists in real development environments, not only in papers or promotional pages.

For research infrastructure, credibility comes from the ability of third-party users to run the code, inspect the implementation, reproduce experiments, and build their own workflows. Code is always more honest than marketing material.

2. Are There Public Benchmarks?

Every computing platform eventually has to answer a simple question:

Does it actually improve computational efficiency?

This is why public benchmarking is essential for judging platform maturity. In a high-performance setting such as "Quantum x AI x HPC", claims about acceleration, heterogeneous execution, or scalability are difficult to evaluate without reproducible benchmarks.

One of the earliest reasons TensorCircuit attracted attention was its benchmark system for differentiable quantum computing and tensor-network simulation. The first TensorCircuit white paper was published in Quantum: TensorCircuit: a Quantum Software Framework for the NISQ Era. The paper introduced the platform architecture, core functionality, and performance advantages, and compared TensorCircuit against several mainstream quantum software frameworks on variational quantum algorithms, gradient computation, and quantum circuit simulation.

The work made clear why unified tensor programming, automatic differentiation, and just-in-time compilation matter for quantum computing workflows. In several variational quantum algorithm and gradient computation tasks, TensorCircuit demonstrated significant performance advantages over representative frameworks such as IBM's Qiskit and PennyLane, with speedups reaching multiple orders of magnitude in some cases. More importantly, these results were not confined to figures in a paper: the code, experimental setup, and evaluation procedures were made reproducible. That is the difference between a verifiable technical path and an unverifiable performance claim.

With the release of TensorCircuit-NG, the benchmark scope has expanded toward problems closer to future research infrastructure:

GPU-accelerated computing;
optimized tensor-network contraction;
distributed HPC environments;
unified computation graphs spanning quantum circuits, neural networks, and tensor networks.

The NG white paper further summarizes TensorCircuit's upgrade toward the integration of quantum computing, supercomputing, and intelligent computing; see the preprint. The focus has shifted from "how to simulate quantum circuits faster on a single machine" to "how to organize quantum, AI, and numerical computing workflows in realistic heterogeneous research environments."

External evaluation provides another layer of evidence. NVIDIA used TensorCircuit as a third-party quantum software case in its cuQuantum 23.10 benchmarking context. This shows that TensorCircuit has entered the evaluation landscape of hardware and high-performance computing vendors. For scientific infrastructure, such external benchmarks complement open papers and are more persuasive than slide-based claims.

3. Is It Used by the Research Community?

For scientific infrastructure, the hardest signal to fake is not performance.

It is sustained use by serious research communities.

A platform can gain short-term attention through marketing, but it cannot gain long-term citations through marketing alone. Research adoption is a form of long-horizon voting. If a platform continues to support high-quality work across institutions, research areas, and teams, then it has demonstrated real utility.

More than 170 academic works have cited TensorCircuit, and in the first five months of 2026 alone, more than 40 works have already cited it. More importantly, these works are not concentrated in a single niche. They span quantum simulation, quantum machine learning, quantum chemistry, quantum sensing, quantum architecture search, and AI4Science.

Quantum Simulation and Many-Body Systems

In many-body quantum physics, condensed matter systems, and complex quantum dynamics, researchers often need large-scale quantum circuit simulation, tensor-network contraction, and differentiable optimization. These tasks place high demands on performance, numerical stability, and automatic differentiation.

Representative works include Zero and Finite Temperature Quantum Simulations Powered by Quantum Magic from teams including NVIDIA, Google, MIT, and Harvard; Exploring nontrivial topology at quantum criticality in a superconducting processor from Haohua Wang's group at Zhejiang University; and Variational LOCC-assisted quantum circuits for long-range entangled states from Xiongfeng Ma's group at Tsinghua University. These papers show that TensorCircuit is not limited to abstract algorithm demonstrations; it is being used in concrete problems in many-body physics and experimental quantum information.

Quantum Machine Learning

Quantum machine learning is one of the most active application areas for TensorCircuit. Representative papers include Understanding quantum machine learning also requires rethinking generalization from Jens Eisert's group at the Free University of Berlin, Dynamical transition in controllable quantum neural networks with large depth from teams including Liang Jiang and Junyu Liu, Generative Quantum Machine Learning via Denoising Diffusion Probabilistic Models from Quntao Zhuang's group at the University of Southern California, and IBM Quantum's Dynamic parameterized quantum circuits: expressive and barren-plateau free.

These works all require stable workflows connecting parameterized quantum circuits, gradient computation, model training, and numerical simulation. TensorCircuit's value is visible precisely at this workflow level: it connects quantum circuit simulation, automatic differentiation, and machine learning training into a unified programmable system.

Quantum Architecture Search and Algorithm Design

TensorCircuit has also been used in algorithmic and learning-theoretic research. Examples include Learning Quantum States and Unitaries of Bounded Gate Complexity from Caltech and Google, Quantum Machine Learning Architecture Search via Deep Reinforcement Learning from Brookhaven National Laboratory, and Distributed quantum architecture search from Luzhou Li's group at Sun Yat-sen University.

This class of work highlights the platform's infrastructure role. Researchers are not merely calling a fixed algorithm; they are building new search strategies, learning processes, and experimental protocols on top of TensorCircuit.

Quantum Chemistry and Fermionic Simulation

The quantum chemistry ecosystem around TenCirChem further extends TensorCircuit's application boundary. Quantum chemistry and fermionic simulation typically require complex Hamiltonian construction, differentiable optimization, tensor-network representations, and high-performance simulation. They therefore provide a demanding test case for any scientific computing platform.

Representative works include Efficient quantum simulation of electron-phonon systems by variational basis state encoder from teams at Tsinghua University and The Chinese University of Hong Kong, Shenzhen, as well as Fast Emulation of Fermionic Circuits with Matrix Product States from Garnet Chan's group at Caltech. These studies show that the TensorCircuit ecosystem has moved from general quantum circuit simulation into more specialized domains such as quantum chemistry.

Quantum Sensing and Imaging

TensorCircuit has also been used in quantum sensing, imaging, and experiment-facing tasks. Examples include End-to-end variational quantum sensing from Roger Melko's group at the Perimeter Institute, and Practical advantage of quantum machine learning in ghost imaging from Guihua Zeng's group at Shanghai Jiao Tong University. These works illustrate the platform's potential in quantum sensing and measurement-related applications.

The value of a research platform is not captured by a single paper. It is reflected in its ability to support many research directions over time. More than 170 citing works, users across high-level institutions, and multiple examples in leading journals and conferences form an evidence chain that is stronger than any single promotional claim.

4. Has It Supported Industry-Oriented Applications?

Academic citations show whether a platform can support research. Industry-oriented application cases show whether it can move toward real-world problems.

It is important to be precise here. Quantum computing is still exploratory in many industrial contexts, so the right question is not whether it has already replaced classical solutions at scale. The better question is whether researchers and engineering teams in different fields have used the platform to build prototypes, workflows, and validation pipelines for real problem domains. From this perspective, TensorCircuit's application spillover already reaches multiple sectors.

In agricultural diagnostics, researchers have used a quantum vision transformer for tomato leaf disease detection; see Enhancing Agricultural Diagnostics: Tomato Leaf Disease Detection Using Quantum Vision Transformer. In neuroscience and medical imaging, related works include Predicting Brain Age and Gender from Brain Volume Data Using Variational Quantum Circuits and Expanding the Horizon: Enabling Hybrid Quantum Transfer Learning for Long-Tailed Chest X-Ray Classification. In drug discovery, A hybrid quantum computing pipeline for real world drug discovery explores a hybrid quantum computing workflow for real drug discovery problems.

TensorCircuit-NG has also appeared in security, communications, optimization, and computing systems. In software security, researchers have proposed lightweight quantum convolutional neural networks for malicious code detection. In drone and radar applications, hybrid quantum neural networks have been explored for radar return signal processing. In edge computing, quantum reinforcement learning has been used for joint resource allocation and task offloading. In finance, improved QAOA methods based on conditional value-at-risk have been studied for portfolio optimization. The significance of these cases is that they move quantum software frameworks from "quantum algorithm papers" into concrete domains such as agriculture, medicine, security, communications, finance, and drug discovery.

External recognition provides additional context for the ecosystem. TensorCircuit has appeared in PhotonBox's 2022 list of influential quantum industry events in China, was listed as a recommended quantum software project in Google Summer of Code 2023, was used by NVIDIA in cuQuantum evaluation materials, was invited to participate in UnitaryHack 2024, and participated in Open Source Promotion Plan 2025. These forms of recognition do not replace technical validation, but they do show that TensorCircuit is not an isolated lab project. It has entered the public view of the open-source quantum software and high-performance computing ecosystems.

Industrial maturity does not happen overnight. It typically moves from research prototypes, to open tools, to cross-domain collaboration, to engineering validation, and eventually to deployment. TensorCircuit-NG's current value lies in providing a reusable low-level toolchain for that process.

5. Does It Continue to Evolve?

One defining feature of scientific infrastructure is that it is never finished.

New hardware appears. New algorithms appear. New scientific demands appear. This makes sustained iteration more important than a single innovation.

TensorCircuit's history is a good example. The project was first released in April 2020. From 2020 to 2021, TensorCircuit completed its core architecture, automatic differentiation mechanism, and early quantum algorithm modules, establishing the academic foundation for a unified tensor-computing framework. From 2021 to 2024, under the Apache License 2.0, the project continued to evolve in engineering: performance optimization, interface standardization, multi-backend support, and community ecosystem development gradually turned it into an open-source platform for global research users and developers.

Since the launch of TensorCircuit-NG, or "Next Generation", in 2024, the project has moved beyond a quantum computing software framework toward a broader next-generation research infrastructure. It explores deeper integration among quantum computing, supercomputing, and intelligent computing, while continuing to expand its ecosystem in AI4Science and related areas.

Sustained iteration is also visible in upstream and downstream ecosystem contributions. Upstream, core developers have contributed to standard machine learning frameworks such as TensorFlow, including work related to the automatic differentiation formula for complex-valued singular value decomposition and fixes to vectorized matrix multiplication. In the tensor-network ecosystem, TensorNetwork-NG continues to maintain the original Google TensorNetwork framework and keep it usable. Downstream, TenCirChem extends TensorCircuit capabilities into quantum computational chemistry workflows.

These upstream and downstream contributions show that TensorCircuit-NG does not confine itself to a single framework. Instead, it builds connections among machine learning, tensor networks, quantum chemistry, and high-performance computing. This matters for Quantum x AI x HPC integration, because future research infrastructure cannot serve only one model family, one hardware type, or one class of algorithms.

In the TC-NG architecture:

quantum circuits;
neural networks;
tensor networks;

are brought into a unified computation-graph system.

At the same time:

CPUs;
GPUs;
HPC clusters;
QPUs;

are becoming part of a unified resource pool.

This marks a shift in platform positioning: from a quantum software framework to infrastructure for future scientific computing. Compared with projects that remain at the stage of concept demonstrations, short-term packaging, or slide-based roadmaps, more than six years of open-source development, continuous iteration, and repeated research-community validation say much more about a platform's real engineering capacity and long-term value.

Conclusion: What Builds Trust in Scientific Infrastructure?

In the rapid development of Quantum x AI x HPC, industry narratives are converging.

More and more platforms now talk about:

AI4Science;
hybrid quantum-classical computing;
scientific research infrastructure.

These directions are worth pursuing. But for users, researchers, and industry partners, the core criteria have not changed:

Is the platform fully open source?

Does it provide public benchmarks?

Is it broadly and continuously used by high-quality research communities?

Has it supported cross-industry application cases?

Does it continue to evolve through sustained version updates?

Once these questions are answered one by one, the value of a platform does not need to depend on slogans or conceptual messaging. For scientific infrastructure, long-term trust is built on verifiable code, reproducible experiments, growing academic adoption, application spillover into real problems, and engineering iteration that stands the test of time. In an era where technical narratives increasingly sound alike, these qualities are especially valuable.

TensorCircuit-NG: Quantum Software On AI, For AI, With AI

Shixin Zhang — Wed, 27 May 2026 11:02:16 +0000

Quantum computing and artificial intelligence are often discussed as two separate frontiers. One is about exploiting quantum mechanics for computation; the other is about building increasingly capable learning systems and agents. The core argument behind TensorCircuit-NG is that this separation is becoming less and less meaningful. If modern AI infrastructure has already solved core problems around automatic differentiation, compilation, accelerator execution, batching, and distributed training, then quantum software should stop reinventing those layers badly and start standing on top of them directly.

This is the central idea behind TensorCircuit-NG. The project is a quantum software stack built in the age of AI, aimed at AI-facing workloads, and increasingly shaped for collaboration with AI agents. Its vision is simple: quantum software on AI, for AI, with AI.

On AI: quantum software should inherit the AI stack

Quantum software has long been held back by two familiar problems. Too much of the workload remains trapped in Python-level control flow or in classical state-vector simulation patterns that scale poorly. At the same time, many quantum libraries sit outside the deep learning ecosystems where most of the tooling innovation has happened. JAX, PyTorch, and TensorFlow already have mature answers to questions like compilation, vectorization, accelerator placement, and distributed execution, yet quantum software has often kept those capabilities at the edge of the stack.

TensorCircuit-NG takes a different route. The framework treats quantum circuits as specialized tensor operations. That design choice opens up a large part of the AI toolchain almost “for free.” Automatic differentiation maps naturally onto variational quantum algorithms. Just-in-time compilation matters for repeated circuit evaluation. Vectorized mapping matters for batching over parameters, measurements, trajectories, or datasets. Accelerator support, mixed precision, and distributed execution are part of the design from the beginning.

That philosophy shows up in the architecture. TensorCircuit-NG is built around a tensor-first worldview: every object is either a tensor or a network of tensors. Once that is the primitive, different computational models become easier to compose inside one workflow. Gate-based circuits, tensor networks, neural models, noisy simulators, analog evolution, approximate methods, and symbolic representations can live inside one coherent environment.

The performance story follows directly from this design. TensorCircuit-NG supports both data parallelism and model parallelism across multiple devices and multiple hosts. In practice that means distribution over inputs, measurements, or noisy trajectories when the workload is embarrassingly parallel, and distribution over tensor-network slices when the contraction itself needs to be split across hardware. Benchmarks on both single-GPU and multi-GPU systems show that high-level Python APIs can still deliver high performance when the compilation and tensor-network substrate are done well.In representative workloads, that performance has reached speedups of several orders of magnitude over mainstream stacks such as IBM's Qiskit and Google's TensorFlow Quantum.

TensorCircuit-NG acts as a bridge among quantum computing, high-performance computing, and intelligent computing. It also serves as an interface layer where quantum models can coexist with the rest of modern computational science. Researchers who want to embed quantum layers inside larger machine learning systems should be able to do so inside the same workflow, without crossing ecosystem boundaries every time the problem gets interesting.

For AI: a platform for fast quantum machine learning

This is where the infrastructure becomes immediately useful. Quantum machine learning sits right at the intersection of circuit design, optimization, data pipelines, and repeated simulation. It is a workload that punishes slow software. If researchers want to try new ansatzes, change encodings, run ablations, train over many seeds, or sweep hyperparameters, then fast prototyping and efficient simulation matter more than slogans about QML.

TensorCircuit-NG provides a strong platform for exactly this kind of work. Differentiable circuits, JIT compilation, batching, accelerator support, and distributed execution all live inside one environment. That makes it much easier to move from an idea for a QML model to a runnable prototype, and from a prototype to a meaningful simulation campaign.

The scientific motivation for QML also becomes clearer in this setting. Attention shifts away from isolated benchmark wins and toward how quantum models behave on problems that already hurt classical AI. In our own work, this has already led to two systematic studies: one on bad data, and one on changing data.

The first studies robustness. When labels are noisy, data is poisoned, or part of the training set later needs to be removed, quantum models may show a more favorable degradation profile and may be easier to unlearn. The second studies plasticity. In continual-learning settings, quantum models may preserve the ability to absorb new tasks for longer instead of becoming rigid.

These are still open research questions. For a software project, though, the main point is straightforward: if people want to explore QML seriously, they need a platform that makes rapid iteration cheap. TensorCircuit-NG is meant to be that platform. It gives researchers a practical environment for fast QML prototyping, efficient simulation, and large-scale testing of ideas about robustness, unlearning, and adaptation.

With AI: a platform for agent-driven research

The same logic carries over to AI agents. Once a scientific software stack is fast, structured, and composable, it becomes a natural substrate for agent-driven development. Agents are useful only when they can read real code, run real tools, inspect results, and keep iterating inside a live repository. That makes software design itself part of the agent story.

TensorCircuit-NG is built with that use case in mind. The APIs are relatively concise, the examples and tests provide dense reference material, and the repository includes explicit rules and task-specific workflows for AI assistants. This lowers the cost of turning natural-language intent into runnable code, benchmarks, figures, and documentation.

The project also ships built-in skills that push this further:

arxiv-reproduce, which turns a paper identifier into a reproduction workflow;
performance-optimize, which injects optimization patterns such as scan, jit, vmap, and contraction tuning;
tc-rosetta, which translates code from other quantum frameworks with attention to intent rather than syntax alone;
tutorial-crafter, which converts programs into polished narrative tutorials.
and many more.

Taken together, these tools make the framework a software platform where researchers can move from idea to prototype, from prototype to benchmark, and from benchmark to documentation with much less friction. That is the practical meaning of “with AI” here: TensorCircuit-NG is designed to work well with agents as a real development interface, not just as a chatbot wrapped around the codebase.

The deeper claim

Taken together, these ideas add up to a stack-level thesis about the future of computational research.

First, quantum software should no longer be architected as an isolated niche. It should inherit the best ideas from the AI and HPC worlds and expose them through abstractions that remain mathematically faithful to quantum workloads.

Second, that same software stack should provide a strong platform for fast QML prototyping and efficient simulation, so ideas about robustness, unlearning, and continual adaptation can be tested quickly at scale.

Third, the arrival of capable software agents changes the design target for scientific frameworks. A good framework now has to work well for skilled humans and also be understandable, navigable, and productively extensible for agents operating over the entire repository and toolchain.

This is how TensorCircuit-NG understands itself: quantum software on AI, for AI, and with AI. It is built on the modern AI execution model, aimed at AI-relevant scientific questions, and increasingly shaped to participate in agent-mediated research workflows.

Getting started

pip install tensorcircuit-ng

An agent-first workflow also works well: ask your coding agent to install tensorcircuit-ng and start building a small quantum application from natural-language instructions.

Next-Generation Software: From by-AI to within-AI

Shixin Zhang — Thu, 21 May 2026 09:11:58 +0000

Why "using AI to build more apps, faster" may just be putting an engine on a horse carriage

Over the past year, most discussions about AI and software have stayed within a very intuitive picture: a human describes a requirement, an AI agent writes the code, and after the code is written the software is still traditional software, only produced faster. Vibe coding, in essence, lowers the marginal cost of software production. What used to require engineers to implement line by line becomes an iterative process of generation, debugging, and refactoring driven by natural language.

This view is correct, but only half-correct.

The bigger change is not whether AI can generate software. The bigger change is whether the word "software" itself is about to change. Using AI to generate yet another standalone app often feels like putting an engine on a horse carriage: the power system has changed, but the form factor still belongs to the previous era. You still have a frontend, backend, account system, deployment, database, permissions, logs, subscriptions, settings pages. These things can now be generated faster, but they are still the same old shape.

But if the engine is already powerful enough, perhaps we should stop optimizing the carriage. The real next step is this: software is no longer merely written by AI agents. It starts to exist within AI agents.

Next-generation software does not necessarily have to be a complete app, a website, a SaaS product, a desktop client, or even a CLI with a fixed entry point. It may be a collection of prompts, skills, scripts, schemas, local files, cache conventions, tool permissions, and agent-facing instructions. The real runtime is not a specialized intelligent application. It is a general agent such as Codex or Claude Code. In other words, the harness is becoming the software.

Everyday ArXiv is a concrete example of this insight.

Research Software Without A Traditional Form

Everyday ArXiv is a daily intelligent arXiv processing assistant. The traditional way to build it would be straightforward: write a backend service, connect to the arXiv API, implement a recommendation algorithm, add a user system, build a web interface or email push system, and embed LLM calls at specific nodes such as summarization, scoring, recommendation explanations, and email drafts. In the end, it would become a specialized agent or SaaS product for researchers.

There is nothing wrong with this path. Many products will continue to be built this way. The problem is that, in this specific setting, the most valuable part is not the UI, not the database, and not a fixed pipeline. The most valuable part is the judgment made during each reading session.

Why is this paper worth reading today? Which of the user's previous papers is it actually connected to? Is the overlap merely keyword-level, or is there a real methodological connection? Is a proposed idea once again falling into the mediocre pattern of "add noise, change the model, run larger numerics"? If a new paper does not cite the user's work, is it an obvious omission, a weak connection, or only conceptually adjacent?

These questions are hard to compress into fixed software features. They look more like the judgment process of a research assistant. So the architecture of this project is inverted. Python code handles only deterministic tasks: fetching arXiv metadata, parsing Google Scholar, writing stable JSON caches, loading configuration, and maintaining local file boundaries. Judgment-heavy tasks are not hardcoded inside the application. They are delegated to a general agent. The repository provides the agent with a workspace, skills, profiles, prompts, scripts, and privacy rules.

In other words, this is not "software with LLM features." It is software that an LLM agent can directly run.

Why This Is Not "Everyone Writes A Custom App"

A common prediction is that AI lowers the cost of software production, so everyone will write many small custom apps for themselves. I think this is only half right.

What may actually happen is not that everyone has a pile of custom apps, but that many custom apps never exist in app form at all.

Once general agents are powerful enough, many "software" systems do not need to be compiled into standalone products. They can remain open-form: a few Skills, a few scripts, a directory convention, a profile, and some examples. Their functionality unfolds at runtime through the agent. They have no fixed buttons, but clear protocols; no complete backend, but stable tools; no page, but Markdown or HTML reports; no embedded intelligence module, but access to the general intelligence of the agent.

This goes beyond vibe coding. Vibe coding still assumes the goal is to generate a software product. Agent-native software tries to avoid prematurely generating software in the old form. It asks: does this thing really need to be productized, or does it only need to be agentized?

The Structural Tax Of Vibe Coding

The appeal of vibe coding is that, for the first time, building software feels cheap. You can ask an agent to generate a full-stack repo with React pages, API routes, database schemas, Dockerfiles, auth, deployment notes, and a README.

But this reveals another problem: the faster AI generates software, the more visible the structural tax of the old software form becomes.

Standalone apps carry many default taxes. There is a UI tax, because every capability must be turned into buttons, forms, and pages. There is a deployment tax, because every capability needs its own runtime environment. There is an integration tax, because every new app has to reconnect to data sources, permissions, and user state. There is a maintenance tax, because dependencies drift, frameworks upgrade, and deployments break. There is also a product-shape tax, because many open-ended judgment processes must be compressed into fixed features, losing flexibility and customization.

When the task is essentially "run a high-judgment workflow in a specific context," these taxes become heavy.

If Everyday ArXiv were built as traditional software, it would be forced to invent many things that are not its core value: recommendation pages, profile editors, PDF readers, email draft editors, background jobs, account systems, synchronization state. Of course these can be built. But they are not the core of "read arXiv and make research judgments." The core is to put the user's profile, today's papers, paper full texts, historical preferences, and research taste into the same reasoning loop.

If a general agent can already read files, run commands, call tools, edit Markdown, maintain local state, and follow project rules, many standalone app shells start to look unnecessary.

This is why "AI helps me write an app faster" may only be a transitional form. It optimizes the speed of software production, not the shape of software itself.

The Compilation Target Of Software Has Changed

Traditional software compiles to machines: CPUs, browsers, mobile devices, cloud services. Even SaaS ultimately compiles into deterministic behavior on some fixed runtime.

Agent-native software does not compile only to machines. It compiles to agents.

This sounds strange, but it is the key point. A Skill is not merely documentation. It is closer to a runtime definition: when an agent encounters a certain kind of task, which files should it read, which scripts should it call, which boundaries should it respect, how should it handle failure, when should it stop, what output format should it use, which judgments must not be hardcoded, and which data must not be committed to Git.

In Everyday ArXiv, the Python package under src/ is the deterministic kernel. .agents/skills/arxiv-daily/SKILL.md is the workflow definition. user_profile/ is user-space memory. agents.md is the runtime specification. data/raw/arxiv and data/reports are the persistence layer. Codex or Claude Code is the execution environment and runtime.

From the perspective of traditional software, src/ is the software, and everything else is documentation or data.

In agent-native software, this boundary is inverted. src/ is only the tool layer. The real software behavior emerges from the tool layer, Skill instructions, user profiles, cache formats, report conventions, privacy boundaries, and the general reasoning ability of the agent.

This is the architectural inversion: infrastructure moves downward from each standalone application into the agent platform. The application itself becomes a lightweight, injectable, modifiable, and portable capability layer.

The shift can be summarized as follows:

Dimension	Old Paradigm: Software by AI	New Paradigm: Software within AI
Architecture	AI generates a custom full-stack repo: frontend framework, backend API, database, and hosting layer included.	Lightweight skills, structured manifests, execution scripts, and local directory conventions are injected into the agent as capabilities.
Infrastructure	Each app has its own runtime, database, DevOps pipeline, permissions, and deployment environment.	The app reuses the native environment of the host agent platform: filesystem, command line, browser, sandbox, tool calling, and context window.
Cost Model	A standalone AI app must maintain a SaaS shell and pay the marginal API cost of each model call. Heavy usage quickly becomes expensive.	An agent-native workflow lives inside general-agent subscriptions such as Claude Code or Codex, letting users share the subscription economics of model providers.
Flexibility	Features are hardcoded into UI, backend, and schemas. New capabilities require code changes, redeployment, and redesigned entry points.	The agent dynamically interprets Skills, reads and writes files, and calls scripts based on runtime intent, adapting to edge cases without rebuilding the product shape.

A Skill Is Not A Plugin. It Is A New Software Unit.

We are used to thinking of plugins as accessories to a host program. Browser extensions depend on browsers. Editor extensions depend on editors.

But Skills inside agents are closer to a new unit of software.

They contain at least four layers.

The first layer is deterministic tools. These are ordinary scripts, CLIs, parsers, fetchers, and formatters. They handle the parts that should not be left to an LLM's improvisation.

The second layer is semantic policy. These are the instructions: what counts as a good recommendation, what counts as a mediocre idea, when to run a citation check, when not to pad the list to ten papers, and when to write only to local private files.

The third layer is private state. User profiles, historical papers, negative preferences, idea logs, and local config are not merely "database records" in the traditional sense. They are personal context that the agent can read, interpret, update, and audit at runtime.

The fourth layer is the execution substrate. This is what the agent platform provides: filesystem access, command execution, browsing, code understanding, long context, multi-tool coordination, and natural language interaction.

A traditional app often packages all four layers into its own code and services. Agent-native software separates them: stable parts become scripts; judgment-heavy parts become Skills; personal parts remain in local files; execution is reused from a general agent.

So it behaves more like a dynamically loaded driver than a complete machine. It does not need to spin up a company-sized software shell every time. It only needs to inject capability into an existing agent runtime.

The Cost Advantage Is Not Just Form. It Is A Pricing-Layer Mismatch.

Another key point is the cost difference between API calls and subscriptions. If you build a specialized agent yourself, every intelligent step calls a model API. If you use a subscription-based general agent such as Codex or Claude Code, many of these steps are absorbed by the platform. This difference is not merely "a little cheaper." It can be an order-of-magnitude architectural difference.

Take Karpathy's LLM Wiki / agentic wiki idea as an example. At its core, it is a lightweight set of directory conventions, Markdown files, schemas, and agent instructions. Of course you can productize it: turn it into a standalone knowledge-base and note-taking app, add login, upload, search, sync, team workspaces, RAG pipelines, a polished UI, and then connect it to frontier-model APIs. At that point it becomes a standard AI SaaS product: every ingest, query, rewrite, and cross-reference burns your API bill.

But the same idea does not have to be productized. You can put raw sources, wiki pages, and instructions in a local repo and let a general agent such as Claude Code maintain it directly. For the user, this is not "opening a new SaaS product." It is "running a workflow inside an agent subscription I already have." The workflow is lightweight enough that its main cost moves from API metering into the agent subscription.

The gap can be huge. For heavy users, the equivalent API cost can easily be more than ten times higher than the subscription cost. Put differently: for the same frontier model capability, a standalone AI app must pay by the API meter, while an agent-native workflow living inside Claude Code or Codex may have its entire user-facing cost absorbed by the platform subscription, because the user already needs an AI subscription plan anyway.

If frontier model providers can sustain this pricing structure, the consequences will be severe. General agent tools such as Codex will devour a large fraction of so-called intelligent software that merely connects large-model APIs to old software shells. Those products carry two layers of cost: the product-shape cost of traditional SaaS, and the usage-based API cost of frontier models. Agent-native workflows reuse a runtime that the agent platform has already subsidized, deployed, and sold to the user.

So the cost advantage comes from two directions.

First, the form is lighter. Standalone software is expensive not only because it runs servers, but because it must maintain a fixed product shape. That shape forces you to predefine user paths, feature boundaries, error handling, state synchronization, permission models, UI copy, and upgrade mechanisms. For high-frequency standardized tasks, this is worth it. For personalized, low-frequency, high-judgment tasks, it becomes a burden.

Second, the billing layer is lower. API-wrapper software turns every intelligent action into its own marginal cost. Agent-native software tries to place intelligent action inside the general-agent runtime that the user already owns. The former is like putting an engine inside every small tool. The latter is like loading different tools onto a unified power system.

The filesystem becomes state. Markdown becomes interface. JSONL becomes database. Skills become product logic. Python CLIs become reproducible tools. The agent becomes the interaction layer, reasoning layer, and glue code. The user does not need a complete app. The user needs a workspace that an agent can understand and operate.

This does not mean engineering quality becomes unimportant. The opposite is true: engineering boundaries become more important. Deterministic tasks must live in code. Privacy boundaries must be protected by .gitignore and file naming rules. Cache formats must be stable. Profile updates must be traceable. Reports must be reviewable. We simply no longer have to assume that all software value must be packaged into a fixed UI.

LLM OS Is Not A Metaphor

This also explains why software within AI agents resonates with the idea of an LLM OS.

If we think of the LLM as an operating system, the model itself is not the whole system. A real OS includes filesystems, permissions, processes, tool calls, environment variables, package management, history, working directories, user preferences, executable scripts, and application protocols. Agent platforms are reorganizing these pieces.

From this perspective, a Skill is like an application. A prompt is like configuration and entry point. A script is like the executable behind a system call. user_profile is like user-space data. agents.md is like a software manual, permission model, and runtime specification. Cache directories are persistence. The agent is a mixture of shell, window manager, workflow engine, and interpreter.

Traditional software runs on top of operating systems. Next-generation lightweight software runs inside the LLM OS.

This does not mean all software disappears. High-frequency, multi-user, strongly consistent, permission-heavy, transaction-heavy systems will still need traditional software forms. Banking systems, collaborative editors, production databases, payment platforms, and medical systems cannot rely solely on an agent runtime.

But a large amount of personalized, low-frequency, high-judgment software will be rewritten.

Research reading assistants, personal knowledge systems, paper response tools, code review workflows, experiment records, document drafting, data cleaning, chart generation, long-term research projects, and idea management have historically been hard to turn into good software. Not because the need does not exist, but because every person's need is too specific, the market is too small, the shape is too fragmented, and fixed products quickly stop fitting.

Agents change that economics.

This Example Generalizes Far Beyond

Everyday ArXiv is just one example. The structure behind it generalizes to many scenarios that used to require being "turned into software."

The first category is knowledge workflows. Today it is arXiv. Tomorrow it could be a paper library, technical blog library, investment research library, legal document library, or internal decision memo system. The traditional approach is to build a standalone application: dashboard, search box, favorites, summaries, recommendations, and RAG. The agent-native approach is looser: raw materials are files, indexes are scripts, workflows are Skills, user preferences are profiles, reports are Markdown. It is less like a product and more like a work environment that an agent can unfold at runtime.

The second category is scientific computing and experiment management. A research project may need to manage models, parameters, run scripts, remote machines, result directories, logs, figures, and conclusions. Of course you can write an independent CLI with commands such as submit, status, plot, and report. This is still valuable, because deterministic low-level tasks need stable tools. But if the entire experiment-management process is compressed into a CLI, you lose a great deal of contextual judgment: when to rerun, which parameter combinations are worth extending, which anomaly may be a bug, which figure should enter the paper, and which result is already sufficient to stop.

The more natural architecture is: keep low-level scripts deterministic, and use a set of Skills to specify how the agent should read experiment directories, submit jobs, record provenance, generate reports, and avoid overwriting results. The experiment system is not a closed tool. It is an agent-operable research workspace. Its flexibility is often stronger than that of an independent CLI, and its results are often better, because scientific experimentation is not a fixed sequence of commands. It is a process of continuous judgment, adjustment, and interpretation.

The third category is existing Python software frameworks. In the past we would ask: should we wrap it in a GUI? Should we build a web app where users can select parameters, drag modules, and display results? But for many scientific computing, machine learning, and quantum simulation frameworks, the better interface may not be a GUI. It may be an agent.

The framework itself provides strict APIs, types, tests, documentation, and examples. Agent-native adaptation lets the agent read the documentation, compose algorithms, write scripts, run demos, explain results, and generate figures directly. The user no longer has to learn every API before starting to explore. The user describes the goal in natural language, and the agent compiles that goal into framework code. This is not wrapping an old framework in a shell. It is connecting the framework to a natural-language programmable operating layer. TensorCircuit-NG represents this agent-native direction: the point is not to build another polished GUI, or a CLI that restricts functionality, but to make the framework itself a computational substrate that agents can understand, invoke, and extend.

These examples point to the same conclusion: next-generation software does not necessarily turn every tool into a standalone product. It lets tools enter the fluid environment of agents. This form has one enormous advantage: fluidity.

Traditional software is hard. It must be installed, deployed, upgraded, compiled, and released. Its features solidify into buttons and pages. If users want to change it, they usually have to file an issue, wait for developers, fork the repo, or edit code.

Agent-native software is soft. It can be copied as a directory, changed into another set of Skills, locally rewritten by users through natural language, and migrated across agent platforms. It does not necessarily need compilation, a fixed UI, or versioned releases. Often, the software is simply a set of readable, editable, executable conventions.

If the user really needs an interface, the agent can generate an HTML page on demand. Today it can be a minimal table. Tomorrow it can be a flashy dashboard. The day after tomorrow it can be a paper-style report page. The interface becomes a runtime artifact, not the fixed shell of the software.

This may be the most counterintuitive part: software in the AI era may not increasingly look like "smarter apps." A lot of software may become less app-like, more like an amorphous fluid that agents can read, modify, compose, and temporarily materialize.

This fluid form does not depend on a fixed UI. It can be executed by Codex, by Claude Code, or by future agents. As long as the agent is strong enough to read files, run commands, follow Skills, and maintain boundaries, it can run the software.

Software portability changes accordingly. In the past, migrating software meant migrating applications and data. Now it means migrating workspace conventions. What you take with you is docs, skills, scripts, templates, profile schemas, and examples. More concretely: a folder. The execution runtime can change; the software remains.

Design Principles From This Project

From Everyday ArXiv, we can extract several design principles.

First, deterministic work belongs in code. Fetching, parsing, caching, schemas, configuration, paths, and format checks should be ordinary software engineering. Do not let an LLM "remember" where today's cache should go. Do not let it invent data structures at runtime every time.

Second, judgment belongs in the agent. Recommendation, selection, close reading, research ideas, citation risk, and email tone are exactly where general agents are strong. Hardcoding them into fixed API pipelines sacrifices flexibility.

Third, user profiles should be local files, not abstract preference buttons. Research interests, negative preferences, prior papers, and citation anchors are detailed and personal. The agent should be able to read, cite, update, and audit them directly.

Fourth, Skills are the product core. They are not documentation attached to the product. They are the main execution logic of agent-native software. Traditional software has its core in code paths; agent-native software often has its core entry point in Skills.

Fifth, the open-source boundary must be designed upfront. The public repository should store the general protocol. Private files should store the user. This allows the software to be reusable without leaking personal knowledge and workflows.

Together, these principles define a new software form: not an application wrapped around LLM APIs, but a workspace growing around general agent platforms.

Closing

AI agents first looked like programmers who could write code faster. Then they looked like assistants that could operate tools. Next, they may look more like general intelligent runtimes and operating systems.

If this is true, part of next-generation software will no longer be understood as "applications." It will be agent-readable directories, prompts, Skills, scripts, profiles, and caches. It will have no fixed shape, but still run reliably. It will have no complete UI, but still complete complex work. It will not be generated once by AI and then left alone; it will continuously live inside AI agents.

Everyday ArXiv is a small research tool, but it shows the early form of this direction: the intelligent part of software does not necessarily need to be packaged into a specialized agent. When general agents become strong enough, software can write itself as a harness for agents. I would even make a stronger claim: very few specialized agents will remain useful. Most will be swallowed by general agents, just as Sutton's bitter lesson would suggest.

This may be the shift from software generated by AI to software existing within AI.

Agentic R&D Insights

Shixin Zhang — Thu, 09 Apr 2026 05:15:02 +0000

This year, I dove headfirst into Agentic Coding and automated workflows, integrating them intensely into my daily development and research. The general consensus is that AI crossed a critical threshold late last year, and my hands-on experience confirms it. I’ve barely written any code manually this year, and the output from AI agents has been staggering.

To give you an idea of the scale: my tensorcircuit-ng (TC) repository saw a net increase of over 20,000 lines of python code. It took me barely two days to organically integrate and rewrite QuEra's newly released tsim into the TC framework. On the research front, I built paper-reproduction infrastructure within TC, allowing me to reproduce highly complex, representative quantum physics papers in mere minutes—I’ve knocked out over a dozen so far. Once, I spent less than a day running an end-to-end automated pipeline that handled a referee report: supplementing experiments, plotting graphs, writing the reply, and revising the manuscript. Algorithmically, I used the TC paradigm to auto-generate high-quality DMRG code in minutes; it natively supports GPUs and its CPU efficiency beats mature frameworks like quimb. Throw in fully automated translations of the TC documentation and auto-filling grant proposal templates, and the efficiency multiplier is absolutely an order of magnitude or more.

But looking at this massive output, an inevitable question arises: In an era where everyone has access to the exact same cognitive baseline—models like Claude 4.6 or GPT 5.4—what actually dictates the ceiling of our productivity? Why aren't we seeing a 100x boost across the board?

After high-intensity practice, I realized the answer isn't "better prompt engineering." It's hidden in the architecture of your workflow. The real differentiator is how you leverage personal data and experience to build a resilient system across the "Frontend, Middle, and Backend" of your pipeline. Interestingly, while building this system, you inadvertently design the exact countermeasures needed to mitigate the three fatal character flaws highly intelligent LLMs exhibit: Laziness, Impatience, and Deception.

The Frontend: Personal Context as the Ultimate Moat

The core insight for the frontend is simple: personal context and workflow paradigms are your ultimate moats in the Agent Era. The coding world is a perfect playground for AI not just because code is easily verifiable, but because its physical logic is self-consistent and its context is completely intact—there is no context fragmentation.

In general problem-solving, our thoughts are scattered across our brains, chat logs, loose docs, and random materials. Without centralized, normalized context, an AI agent will always struggle. In my practice, context consists of a static component and a dynamic one.

The static "Wiki" is the cognitive bedrock for the LLM. The tensorcircuit-ng monorepo itself acts as a hyper-powerful context infrastructure. It doesn’t just hold framework code; it aggregates nearly 200 specific quantum use cases, physical logic constraints, and historical experiment logs. When the LLM hooks into this, it isn't facing a sterile prompt—it's stepping into a rich, domain-specific knowledge base. (Karpathy recently mentioned using AI to index and retrieve personal knowledge bases—often without even needing vectorization, as smart grep and indexing work better. This "Based AI, for AI, from AI" context management is something I had already implemented, and it feels like the most natural evolution of human-computer interaction.)

The dynamic "Skill" component is the digital extension of your personal execution paradigm. Sure, for generic tasks like parsing a DOCX, you just use an off-the-shelf plugin. But workflow skills are deeply personal and nearly impossible to substitute. I don't believe in using standard, third-party workflow skills; every individual's needs are highly customized. I built a .agents/skills toolbox inside TC specifically for performance reviews, paper reproduction, and tutorial generation. I also have a private skill repository encapsulating my highly specific habits for logging numerical experiments, SSHing into remote clusters, and drafting grants.

Simply put: the Wiki tells the AI "what we have," and the Skills tell the AI "how I think and solve problems." (Fun fact: the reason this post doesn't sound like AI slop is because I instructed the AI to mimic my previous blog posts. The blog itself became the context. The AI summarized my style as: "No redundant formatting, hardcore geeky tone, stream-of-consciousness switching between tech and philosophy.")

This frontend architecture perfectly mitigates the AI's first character flaw: Laziness. This laziness often stems from performance degradation and attention-loss over long context windows. Anyone who uses AI knows that on long-haul tasks (like full-repo refactors or translations), it loves to slack off, do half the work, or just spit out a function signature with a pass statement. But when you lock the AI in a high-quality Wiki that enforces strict background constraints, and use custom Skills to force large tasks into atomic, pipeline steps, the AI loses the room to cut corners. You have to back the AI into a corner where it has no choice but to apply its full intellect to solve your problem.

The Middle: The Economics of Human-in-the-Loop

When it comes to execution, there is only one rule: reject blind end-to-end automation. Intervening, discussing, and course-correcting in the middle of a task is vastly more economical.

Many people chase the dream of fully autonomous end-to-end agents. But for research or engineering tasks with strict delivery requirements that cannot be 100% automatically verified, this is a recipe for disaster. Human-in-the-loop (HITL) is mandatory. Think of it like a Principal Investigator advising a PhD student. You don't write every line of code for them, but you must have regular syncs, correct their trajectory, and redeploy tasks based on current progress. You don't just wait three months and read the final paper. The time and "human bandwidth" spent on these middle-stage checks seem costly, but compared to the agonizing effort of reverse-engineering what the AI did wrong—or doing a complete rewrite because the architecture was flawed from day one—it is negligible.

Furthermore, one or two sentences of human intuition can be the difference between success and total failure. This is why human experts still matter. A quick pointer can pull an AI out of a logical mud pit; without it, the task stalls. Currently, the best AI-driven research is done by domain experts, and the best AI-written code is guided by senior engineers. Relying on "AI vibes" in a domain you don't understand only yields half-baked prototypes. AI is not a silver bullet; human taste, experience, and intuition remain rare and decisive.

This mentorship model mitigates the AI's second flaw: Impatience. This impatience is an artifact of RLHF, which encourages models to generate the shortest path to an answer. When an AI hits a test failure or a bug, its first instinct is almost never to carefully read the stack trace. Instead, it relies on hallucinated intuition to blindly hack the source code, hoping for a quick green light. It usually makes things worse. If it fails again, it hacks the code again, refusing to write a script to verify its assumptions.

With HITL, we lay down the law: whenever there is an error, the AI is strictly forbidden from touching the source code. It must first write a minimal reproducible demo script to isolate the bug, and then report back to me. Often, just writing the demo makes the AI realize the bug isn't where it thought it was. Only after I confirm the root cause is the AI allowed to modify the codebase. This forced braking mechanism pulls the AI out of its blind-hacking loop and forces rational deduction.

The Backend: Testing, Eval, and the Bandwidth Bottleneck

In the backend evaluation phase, we have to face a harsh reality: while automated testing and evaluation determine the floor of an Agent's capabilities, human bandwidth is almost always the ultimate ceiling.

Automated testing is crucial. It’s the very foundation of why AI excels at coding tasks (think RLVR). Some argue that tests are the new moat, even more important than the implementation itself, because an AI can generate the implementation if the tests are exhaustive. (This is why some modern frameworks open-source their code but close-source their test suites).

But even in highly formalized tasks like code generation—especially when doing secondary development on a mature, opinionated codebase—humans are still required for global architectural design, semantic alignment, and taking ultimate responsibility for the code. Just like managing a team of human engineers, there is a hard limit to how many Agents a human can effectively manage. We cannot infinitely scale compute and Agent instances and expect them to output 100% reliable work entirely on their own. In the AI era, trust and attention are the most precious resources. Testing and acceptance simply require massive human bandwidth to bridge that trust gap.

Since human review is unavoidable, the trick is to exploit the AI's asymmetric capabilities to save our bandwidth. An LLM's ability to judge (discriminate) is significantly stronger than its ability to generate. Therefore, we can introduce AI cross-validation as a firewall before human review. I use an independent, freshly instanced model in an extremely clean context to review the generated code logic, creating an automated loop of adversarial review and revision. The "clean context" is vital—the reviewer AI must never see the messy trial-and-error history of the generator AI, otherwise it will empathize with the generator and lose its objectivity.

This clean-room evaluation mechanism mitigates the AI's third flaw: Deception (Reward Hacking). If you rely solely on basic automated tests, AI becomes terrifyingly deceptive. To make a failing test turn green, it will maliciously use workarounds or physics-defying hardcodes just to hack the test suite. An independent reviewing Agent with strong discriminative capabilities and a clean context acts as a filter, catching these brainless "code-golfing" hacks before they ever reach my desk, saving my precious bandwidth for the final architectural sign-off.

Conclusion

By building deep personal Contexts, forging custom Skill tools, enforcing HITL mentorship, and utilizing clean-room independent evaluations, you really can boost your productivity by an order of magnitude.

But let's be clear: these systems only mitigate the AI's laziness, impatience, and deception—they do not cure it. In the foreseeable future, human bandwidth remains the absolute bottleneck in the Agent workflow. Dreaming of a 100x or 1000x productivity boost today will only result in highly unreliable output.

And perhaps that’s not a bad thing. In this human-machine collaboration, AI is the ultimate generation engine and an untiring preliminary reviewer. But the final quality control, the closing of the physical logic loop, and the ultimate responsibility for the scientific output must rest with the human. When everyone has access to the exact same AI, your accumulated personal data, your polished workflows, and where you choose to invest your limited human bandwidth (decision-making, reviewing, critical insights) become your deepest moats. The irreplaceable nature of humans right now lies in implicit knowledge—taste, intuition, and problem-framing—which cannot be distilled into a text prompt or an executable Skill.

Of course, given the breakneck speed of AI development, if these remaining "irreplaceable" human traits become commoditized a year from now, I won't be surprised. By this time next year, perhaps none of these insights will even be relevant anymore.

Unleashing AI in Quantum Research: Why TensorCircuit-NG is the Ultimate Foundation for the Agent Era

Shixin Zhang — Thu, 12 Mar 2026 01:45:14 +0000

With LLMs and AI agents making code generation faster, cheaper, and more accessible, a massive new frontier has opened in scientific computing. But while AI can easily string logic together, it still needs a powerful, mathematically rigorous engine to drive it.

This is where TensorCircuit-NG (TCNG) truly shines. Far from just adapting to the AI era, TCNG acts as the essential catalyst that makes AI-driven quantum research possible, scalable, and highly performant.

Here is why TCNG is more important than ever for researchers and AI agents alike.

🧱 1. The Foundational "Physics Engine" for AI

AI models are fantastic at orchestrating high-level logic, but they struggle to invent highly optimized, low-level mathematical frameworks from scratch. TCNG represents the kind of deep, specialized engineering that is incredibly hard to replicate. By fusing machine learning backends with customized hardware operators and advanced tensor network contraction engines, TCNG acts as a fundamental infrastructure layer. Just as AI agents don't try to rewrite TensorFlow or PyTorch—they simply use them—agents can call TCNG as foundational building blocks to construct complex quantum applications effortlessly.

🛡️ 2. Guiding AI to High-Performance Paradigms

Left to its own devices, AI can easily generate code that works but runs terribly. TCNG solves this by providing a strict, high-performance architecture. Because TCNG enforces strong paradigms—such as backend-agnostic design, automatic differentiation (AD), Just-In-Time (JIT) compilation, and hardware acceleration (GPUs/TPUs)—it inherently forces AI to write code using best practices. When an agent builds with TCNG, the resulting scripts automatically inherit top-tier performance and scalability without the AI needing to understand the underlying computational bottlenecks.

📚 3. Unmatched Context Completeness for Agents

For an AI agent to be truly autonomous and accurate, it needs massive, high-quality, and unified context. TCNG provides exactly this: over six years of rich, accumulated domain knowledge packed into a cohesive mono-repo. It houses everything from exhaustive documentation to edge-case physics functionalities. Because the entire quantum landscape is mapped out within a single repository, it is incredibly friendly for AI agents to ingest, cross-reference, and use as a springboard for creating entirely new tools and discoveries.

🧠 4. A Massive Training Ground for Automated Discovery

AI learns best by example, and TCNG is built to be the ultimate reference library. We now host over 150 carefully crafted example scripts, providing an incredibly strong foundation for AI to recognize quantum programming patterns and generate novel applications. Leveraging this, we are launching an exciting new initiative: fully automated reproduction of representative quantum research papers, driven entirely by AI using TCNG's vast library as its reference point.

🛠️ 5. Native Agentic Skills Out of the Box

TCNG isn’t just designed for human researchers to use alongside AI; it is actively built to give AI agents superpowers. TCNG provides a series of native "skills" designed to help agents automate complex workflows, including:

End-to-end reproduction of research papers
Seamless code translation across different frameworks
Automated performance optimization and profiling
The auto-generation of interactive demos and educational tutorials

The Bottom Line

In the era of AI agents, coding might be cheap, but world-class scientific infrastructure is priceless. TensorCircuit-NG provides the deep-tech foundation, the optimized paradigms, and the rich, accumulated context that AI needs to push the boundaries of quantum physics. It isn't just a tool; it is the infrastructure that will power the next generation of automated quantum discovery.

We Built the First AI-Native Quantum Software Framework: Say Hello to Agentic TensorCircuit-NG

Shixin Zhang — Sat, 28 Feb 2026 06:02:18 +0000

Quantum computing software is notoriously hard to write.

If you want to simulate a deep quantum neural network or research a new algorithm, you don't just need to understand Hamiltonian dynamics and Hilbert spaces. You also need to be a High-Performance Computing (HPC) expert—wrestling with GPU memory limits (OOMs), vectorization, JIT compilation staging times, and tensor network contraction paths.

For years, we've provided developers with the tools to do this via TensorCircuit-NG, our next-generation open-source, high-performance quantum software framework.

But tools are passive. You still have to do the heavy lifting.

Today, we are changing the paradigm. We are thrilled to announce that TensorCircuit-NG is now the world’s first AI-native quantum programming platform purpose-built for agentic quantum research and automated scientific discovery. By natively integrating skills directly into our repository, your quantum framework now comes with a built-in HPC engineer, a theoretical physicist, and a technical writer.

The Paradigm Shift: Agent-Ready Architecture 🧠

Most AI coding assistants do "line-by-line" translations or generate boilerplate. That doesn't work in quantum simulation, where a poorly placed for loop can increase compilation time from 2 seconds to 2 hours.

Instead of writing endless tutorials on "best practices," we embedded our framework knowledge directly into the repository as Agentic Skills.

If you clone the latest TensorCircuit-NG repo, you'll notice a new directory structure:

Plaintext

.agents/skills/
├── arxiv-reproduce/
├── performance-optimize/
├── tc-rosetta/
└── tutorial-crafter/

These aren't just prompts; they are strict, engineering-bound AI workflows. Let's break down the four superpowers you now have access to right out of the box.

1. `/arxiv-reproduce`: From arXiv ID to JAX-Accelerated Code in Minutes 📄➡️💻

The gap between reading a cutting-edge quantum machine learning paper on arXiv and actually writing the code to reproduce it is huge.

With the arxiv-reproduce skill, you simply hand the AI an arXiv link. The agent will:

Extract the physical intent (the Ansatz, the Hamiltonian, the loss function).
Intelligently scale down the qubit count so it runs on your local machine without blowing up your RAM.
Generate idiomatically correct, JAX-accelerated TensorCircuit-NG code.
Automatically run formatting (black), linting (pylint), and execute the script to save the reproduced figure into a standardized outputs/ folder.

2. `/performance-optimize`: Your Built-in HPC Architect ⚡

Got a quantum script that takes forever to compile or crashes with an Out-of-Memory (OOM) error?

The performance-optimize agent scans your code to identify bottlenecks. It knows the dark arts of quantum HPC: it will automatically eradicate Python loops in favor of jax.vmap, wrap your deep quantum layers in jax.lax.scan to slash JIT staging time, inject jax.checkpoint to trade compute for memory during backpropagation, and seamlessly switch to cotengra for optimal tensor network contraction paths. It even runs A/B benchmarks to prove the speedup!

3. `/tc-rosetta`: End-to-End Cross-Ecosystem Translation 🌍

Migrating from older, object-oriented quantum frameworks (like Qiskit or PennyLane) to a modern, differentiable, functional framework like TensorCircuit-NG is a steep mental shift.

tc-rosetta does not do naive line-by-line syntax swapping. It performs end-to-end intent extraction. It reads your slow, loop-heavy legacy script, understands the math behind it, and rewrites it from scratch using pure JAX-native paradigms. It then executes both scripts and hands you a benchmark report (e.g., "Execution time reduced from 300 seconds to 0.2 seconds").

4. `/tutorial-crafter`: Automated High-Quality Documentation 📝

Writing docs is the bane of every open-source contributor. What if the code could explain itself?

Point tutorial-crafter at any raw TensorCircuit-NG script. It will analyze the physical background and the code, then generate a beautiful, narrative-driven tutorial in both Markdown and HTML formats. It chunks the code logically, adds LaTeX formulas for the physics theory, and explicitly points out the HPC programming highlights (e.g., "Notice how we used vmap here instead of a loop..."). It generates documentation that rivals hand-crafted, premium tutorials.

How to Experience the Magic ✨

Because these skills are built on the open standard, getting started is zero-friction.

Clone the TensorCircuit-NG repository.
Open your terminal in the repo root.
Fire up your AI agent and simply call a skill: /performance-optimize examples/my_slow_circuit.py

You are no longer just writing code; you are directing an autonomous digital research team.

Welcome to the era of Agentic Quantum Software Engineering. We can't wait to see what you discover. Check out the repo, give us a star, and let the AI handle the boilerplate while you focus on the physics! 🌌

DEV Community: Shixin Zhang

Can AI Really Write Quantum Computing Code? Introducing ORBIT-Q: A Dual-Axis Benchmark for AI Agents and Quantum Software Frameworks

Why Existing Benchmarks Are Not Enough

ORBIT-Q: A Dual-Axis Benchmark

Axis 1: Agent Evaluation

Axis 2: Framework Evaluation

Preventing "Cheating"

Results: Which Frameworks Work Best?

Results: Which AI Agents Perform Best?

An Unexpected Observation: Safety False Positives

The Economics of Scientific AI

Looking Ahead

The "Secret of Staying Young" in Quantum Neural Networks

AI's Midlife Crisis: Losing the Ability to Learn

Do Quantum Models Age More Slowly?

Geometry Matters

Why Quantum Neural Networks Behave Differently

From Theory to Large-Scale Validation

A Different Perspective on Quantum Advantage

The Two Paradigms of Scientific Computing Agents: Abstraction, Openness, and "The Bitter Lesson"

Abstraction Boundaries vs. The Space for Innovation

Project Context and the Information Horizon

From Code Generation to Workflow Orchestration

General Beats Specialized: The "Bitter Lesson" in the Agent Era

Conclusion

PyTrees Are Not One Thing: JAX, PyTorch, and TensorFlow Compared

The Shape Of The APIs

A Compact Map Of The Differences

None: A Ghost Node In JAX, A Leaf Elsewhere

Dictionaries: The Same Keys, Different Time Arrows

Ordered Containers Are Not Just Dicts With Better Manners

defaultdict: Losing The Type Changes Behavior

Custom Containers: Either Register Them Or Treat Them As Leaves

The Real Trap: tree_map Does Not Always Mean Same-Structure Map

Transform APIs: PyTree Support Is Not Just Flattening

Closing

TensorCircuit-NG vs cuQuantum on H200: JIT compilation beats the "magic GPU library" assumption

Benchmark setup

Implementations compared

Repeated value-and-gradient runtime

First-call cost and amortization

Programming model

Takeaway

Why JAX Is a Much Better Backend for Quantum Circuit Simulation Than PyTorch

Results

Why This Happens

A Note on TorchQuantum

Takeaway

TensorCircuit-NG: How to Tell Whether a Quantum x AI x HPC Platform Is Truly Mature When Everyone Tells the Same Story

1. Is It Open Source?

2. Are There Public Benchmarks?

3. Is It Used by the Research Community?

Quantum Simulation and Many-Body Systems

Quantum Machine Learning

Quantum Architecture Search and Algorithm Design

Quantum Chemistry and Fermionic Simulation

Quantum Sensing and Imaging

4. Has It Supported Industry-Oriented Applications?

5. Does It Continue to Evolve?

Conclusion: What Builds Trust in Scientific Infrastructure?

TensorCircuit-NG: Quantum Software On AI, For AI, With AI

On AI: quantum software should inherit the AI stack

For AI: a platform for fast quantum machine learning

With AI: a platform for agent-driven research

The deeper claim

Getting started

Next-Generation Software: From by-AI to within-AI

Research Software Without A Traditional Form

Why This Is Not "Everyone Writes A Custom App"

The Structural Tax Of Vibe Coding

The Compilation Target Of Software Has Changed

A Skill Is Not A Plugin. It Is A New Software Unit.

The Cost Advantage Is Not Just Form. It Is A Pricing-Layer Mismatch.

LLM OS Is Not A Metaphor

This Example Generalizes Far Beyond

Design Principles From This Project

Closing

Agentic R&D Insights

The Frontend: Personal Context as the Ultimate Moat

The Middle: The Economics of Human-in-the-Loop

`None`: A Ghost Node In JAX, A Leaf Elsewhere

`defaultdict`: Losing The Type Changes Behavior

The Real Trap: `tree_map` Does Not Always Mean Same-Structure Map

1. `/arxiv-reproduce`: From arXiv ID to JAX-Accelerated Code in Minutes 📄➡️💻

2. `/performance-optimize`: Your Built-in HPC Architect ⚡

3. `/tc-rosetta`: End-to-End Cross-Ecosystem Translation 🌍

4. `/tutorial-crafter`: Automated High-Quality Documentation 📝