<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Shixin Zhang</title>
    <description>The latest articles on DEV Community by Shixin Zhang (@refractionray).</description>
    <link>https://dev.to/refractionray</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.us-east-2.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3763205%2F5c5af020-0b22-4443-aa68-28b3150f48e4.png</url>
      <title>DEV Community: Shixin Zhang</title>
      <link>https://dev.to/refractionray</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/refractionray"/>
    <language>en</language>
    <item>
      <title>PyTrees Are Not One Thing: JAX, PyTorch, and TensorFlow Compared</title>
      <dc:creator>Shixin Zhang</dc:creator>
      <pubDate>Fri, 12 Jun 2026 06:18:20 +0000</pubDate>
      <link>https://dev.to/refractionray/pytrees-are-not-one-thing-jax-pytorch-and-tensorflow-compared-hjh</link>
      <guid>https://dev.to/refractionray/pytrees-are-not-one-thing-jax-pytorch-and-tensorflow-compared-hjh</guid>
      <description>&lt;p&gt;PyTrees look deceptively simple. You flatten a nested Python object into leaves, keep a structure descriptor, and later rebuild or map over the same shape. That abstraction is powerful enough to carry optimizer states, model parameters, batched inputs, gradients, and sharding annotations. It is also just ambiguous enough that three major frameworks implement three subtly different languages under the same idea.&lt;/p&gt;

&lt;p&gt;This note compares JAX &lt;code&gt;jax.tree_util&lt;/code&gt;, PyTorch &lt;code&gt;torch.utils._pytree&lt;/code&gt;, and TensorFlow &lt;code&gt;tf.nest&lt;/code&gt;. I tested the behavior in two environments: an older stack (JAX 0.4.35, PyTorch 2.2.2, TensorFlow 2.20.0) and a newer stack (JAX 0.10.0, PyTorch 2.12.0, TensorFlow 2.21.0). Most flatten/unflatten semantics were stable across these versions. The main version-sensitive result is in PyTorch: &lt;code&gt;_pytree.tree_map&lt;/code&gt; in 2.2.2 accepts only one pytree, while 2.12.0 accepts multiple pytrees and behaves much more like JAX's prefix-style mapping.&lt;/p&gt;

&lt;p&gt;The short version: JAX treats pytrees as a transformation language, PyTorch is converging toward that model in &lt;code&gt;torch.func&lt;/code&gt;, and TensorFlow exposes a broader nested-structure utility through &lt;code&gt;tf.nest&lt;/code&gt;. Those differences show up exactly where backend-agnostic libraries usually hurt: &lt;code&gt;None&lt;/code&gt;, dictionary order, custom containers, &lt;code&gt;tree_map&lt;/code&gt;, autodiff, and vectorization.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Shape Of The APIs
&lt;/h2&gt;

&lt;p&gt;The three APIs have the same surface story but not the same contract.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;jax&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tree_util&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;jtu&lt;/span&gt;
&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;torch.utils&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;_pytree&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tpu&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tensorflow&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;

&lt;span class="n"&gt;leaves&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;treedef&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jtu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jtu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_unflatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;treedef&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;leaves&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;jtu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;trees&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="n"&gt;leaves&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;spec&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_unflatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;leaves&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;spec&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;          &lt;span class="c1"&gt;# PyTorch 2.2.2
&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;trees&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;        &lt;span class="c1"&gt;# PyTorch 2.12.0
&lt;/span&gt;
&lt;span class="n"&gt;leaves&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pack_sequence_as&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;structure&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;leaves&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map_structure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt;&lt;span class="n"&gt;structures&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Flattening answers "which objects are leaves?" Unflattening answers "what metadata is needed to reconstruct the original container?" Mapping answers "what does it mean for several structures to match?" The frameworks diverge on all three questions.&lt;/p&gt;

&lt;p&gt;JAX calls its structure descriptor a &lt;code&gt;PyTreeDef&lt;/code&gt;, so &lt;code&gt;treedef&lt;/code&gt; is the conventional variable name. PyTorch calls the analogous descriptor a &lt;code&gt;TreeSpec&lt;/code&gt;, so examples and internals often name it &lt;code&gt;spec&lt;/code&gt;. Conceptually they play the same role: they describe the container skeleton and the metadata needed to rebuild it from a flat leaf list. TensorFlow's &lt;code&gt;tf.nest&lt;/code&gt; does not return a separate treedef object from &lt;code&gt;flatten&lt;/code&gt;; instead, &lt;code&gt;pack_sequence_as&lt;/code&gt; takes an existing nested &lt;code&gt;structure&lt;/code&gt; as the template.&lt;/p&gt;

&lt;p&gt;There is also a small argument-order trap. JAX unflattens as &lt;code&gt;tree_unflatten(treedef, leaves)&lt;/code&gt;, while PyTorch unflattens as &lt;code&gt;tree_unflatten(leaves, spec)&lt;/code&gt;. TensorFlow's equivalent is &lt;code&gt;pack_sequence_as(structure, leaves)&lt;/code&gt;, which takes the structure first, as JAX does.&lt;/p&gt;
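
&lt;p&gt;To make the role of the structure descriptor concrete, here is a minimal pure-Python sketch of a flatten/unflatten round trip over lists and tuples. The &lt;code&gt;flatten&lt;/code&gt; and &lt;code&gt;unflatten&lt;/code&gt; helpers and the tuple-based spec are hypothetical stand-ins written for this note, not the implementation of any of the three frameworks. The argument order of &lt;code&gt;unflatten&lt;/code&gt; follows the JAX convention purely as an assumption of the sketch.&lt;/p&gt;

```python
def flatten(tree):
    """Return (leaves, spec) for nested lists/tuples.

    The spec is a toy stand-in for a treedef: None marks a leaf slot,
    and (container_type, child_specs) marks a container.
    """
    if isinstance(tree, (list, tuple)):
        leaves, child_specs = [], []
        for child in tree:
            child_leaves, child_spec = flatten(child)
            leaves.extend(child_leaves)
            child_specs.append(child_spec)
        return leaves, (type(tree), child_specs)
    return [tree], None


def unflatten(spec, leaves):
    """Rebuild a tree from a spec and a flat leaf list (spec first, as in JAX)."""
    remaining = iter(leaves)

    def build(node):
        if node is None:
            return next(remaining)  # consume one leaf per leaf slot
        container, child_specs = node
        return container(build(child) for child in child_specs)

    return build(spec)


tree = (1, [2, 3])
leaves, spec = flatten(tree)
rebuilt = unflatten(spec, [10, 20, 30])
print(leaves)   # [1, 2, 3]
print(rebuilt)  # (10, [20, 30])
```

&lt;p&gt;The spec carries only container types and shape, so any leaf list of the right length can be packed back into the same skeleton.&lt;/p&gt;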

&lt;h2&gt;
  
  
  A Compact Map Of The Differences
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Case&lt;/th&gt;
&lt;th&gt;JAX&lt;/th&gt;
&lt;th&gt;PyTorch &lt;code&gt;_pytree&lt;/code&gt;
&lt;/th&gt;
&lt;th&gt;TensorFlow &lt;code&gt;tf.nest&lt;/code&gt;
&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Scalar&lt;/td&gt;
&lt;td&gt;Leaf&lt;/td&gt;
&lt;td&gt;Leaf&lt;/td&gt;
&lt;td&gt;Leaf&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;None&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Empty pytree, 0 leaves&lt;/td&gt;
&lt;td&gt;Leaf&lt;/td&gt;
&lt;td&gt;Leaf&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;list&lt;/code&gt;, &lt;code&gt;tuple&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Containers&lt;/td&gt;
&lt;td&gt;Containers&lt;/td&gt;
&lt;td&gt;Containers&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;namedtuple&lt;/td&gt;
&lt;td&gt;Container, type-strict&lt;/td&gt;
&lt;td&gt;Container, type-strict&lt;/td&gt;
&lt;td&gt;Container, type-strict&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;plain &lt;code&gt;dict&lt;/code&gt; order&lt;/td&gt;
&lt;td&gt;Sorted keys&lt;/td&gt;
&lt;td&gt;Insertion order&lt;/td&gt;
&lt;td&gt;Sorted-key leaf order&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;OrderedDict&lt;/code&gt; order&lt;/td&gt;
&lt;td&gt;Insertion order&lt;/td&gt;
&lt;td&gt;Insertion order&lt;/td&gt;
&lt;td&gt;Sorted-key leaf order&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;defaultdict&lt;/code&gt; order&lt;/td&gt;
&lt;td&gt;Sorted keys&lt;/td&gt;
&lt;td&gt;Insertion order&lt;/td&gt;
&lt;td&gt;Sorted-key leaf order&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;defaultdict.default_factory&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;Preserved&lt;/td&gt;
&lt;td&gt;Preserved&lt;/td&gt;
&lt;td&gt;Preserved&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;custom &lt;code&gt;dict&lt;/code&gt; subclass&lt;/td&gt;
&lt;td&gt;Leaf unless registered&lt;/td&gt;
&lt;td&gt;Leaf unless registered&lt;/td&gt;
&lt;td&gt;Container&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;custom &lt;code&gt;list&lt;/code&gt;/&lt;code&gt;tuple&lt;/code&gt; subclass&lt;/td&gt;
&lt;td&gt;Leaf unless registered&lt;/td&gt;
&lt;td&gt;Leaf unless registered&lt;/td&gt;
&lt;td&gt;Container&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;dataclass instance&lt;/td&gt;
&lt;td&gt;Leaf unless registered&lt;/td&gt;
&lt;td&gt;Leaf unless registered&lt;/td&gt;
&lt;td&gt;Leaf by default&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;multi-arg &lt;code&gt;tree_map&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Supported, prefix semantics&lt;/td&gt;
&lt;td&gt;PyTorch 2.2.2: not supported; PyTorch 2.12.0: supported with prefix semantics&lt;/td&gt;
&lt;td&gt;Supported, strict same structure&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;unflatten arity mismatch&lt;/td&gt;
&lt;td&gt;Raises &lt;code&gt;ValueError&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Raises &lt;code&gt;ValueError&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;Raises &lt;code&gt;ValueError&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The rest of the note explains why these rows matter.&lt;/p&gt;

&lt;h2&gt;
  
  
  &lt;code&gt;None&lt;/code&gt;: A Ghost Node In JAX, A Leaf Elsewhere
&lt;/h2&gt;

&lt;p&gt;The cleanest way to feel the philosophical split is &lt;code&gt;None&lt;/code&gt;. In JAX, &lt;code&gt;None&lt;/code&gt; is not a value to map over. It is a zero-leaf structural marker.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;jtu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# leaves: []
# treedef: PyTreeDef(None)
&lt;/span&gt;
&lt;span class="n"&gt;jtu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mapped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# None
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;In PyTorch and TensorFlow, &lt;code&gt;None&lt;/code&gt; is a leaf.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# [None]
&lt;/span&gt;
&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# [None]
&lt;/span&gt;
&lt;span class="n"&gt;tpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mapped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# ("mapped", None)
&lt;/span&gt;
&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map_structure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;mapped&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# ("mapped", None)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The nested case makes the difference visible:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;jtu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# [1, 3]
&lt;/span&gt;
&lt;span class="n"&gt;tpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# [1, None, 3]
&lt;/span&gt;
&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# [1, None, 3]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



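&lt;p&gt;If &lt;code&gt;None&lt;/code&gt; means "optional value absent", JAX's structural treatment fits: the slot contributes no leaves and mapping leaves it untouched. If &lt;code&gt;None&lt;/code&gt; means "a value in my tree", PyTorch and TensorFlow are closer to that intuition, because the mapped function actually receives it.&lt;/p&gt;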
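
&lt;p&gt;The two conventions can be modeled with a single flag. The sketch below is a toy flattener written for illustration; the &lt;code&gt;none_is_leaf&lt;/code&gt; parameter is a hypothetical switch of this sketch and is not claimed to be an option of any of the three libraries.&lt;/p&gt;

```python
def flatten(tree, none_is_leaf):
    """Toy flatten over lists/tuples; not any framework's real implementation."""
    if tree is None and not none_is_leaf:
        return []  # JAX-like: None is an empty subtree with zero leaves
    if isinstance(tree, (list, tuple)):
        leaves = []
        for child in tree:
            leaves.extend(flatten(child, none_is_leaf))
        return leaves
    return [tree]  # any other object, including None when none_is_leaf, is a leaf


tree = [1, None, 3]
jax_like = flatten(tree, none_is_leaf=False)
torch_tf_like = flatten(tree, none_is_leaf=True)
print(jax_like)       # [1, 3]
print(torch_tf_like)  # [1, None, 3]
```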

&lt;h2&gt;
  
  
  Dictionaries: The Same Keys, Different Time Arrows
&lt;/h2&gt;

&lt;p&gt;Plain &lt;code&gt;dict&lt;/code&gt; is a container everywhere, but the traversal order differs. JAX sorts keys, PyTorch follows insertion order, and TensorFlow assigns leaves by sorted keys while preserving the original mapping order when rebuilding.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="n"&gt;jtu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# [1, 2]   # a, then b
&lt;/span&gt;
&lt;span class="n"&gt;tpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# [2, 1]   # b, then a
&lt;/span&gt;
&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# [1, 2]   # a, then b
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Replacing the leaves with &lt;code&gt;[10, 20]&lt;/code&gt; shows the reconstruction contract:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# JAX
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# PyTorch
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# TensorFlow
&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



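&lt;p&gt;TensorFlow's result is the surprising one on first read. It assigns values in sorted-key order (&lt;code&gt;a&lt;/code&gt; gets 10, &lt;code&gt;b&lt;/code&gt; gets 20) but rebuilds the dictionary in its original insertion order, so &lt;code&gt;b&lt;/code&gt; prints first. The object order and the leaf assignment order are not the same concept.&lt;/p&gt;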
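
&lt;p&gt;The distinction between leaf assignment order and object order is easy to reproduce without any framework. The helpers below are hypothetical names for this sketch; they model only the observable behavior described above for flat dictionaries and are not the libraries' code.&lt;/p&gt;

```python
def pack_sorted_assign(template, leaves):
    """TensorFlow-like model: assign leaves by sorted key, rebuild in template order."""
    assigned = dict(zip(sorted(template), leaves))
    return {key: assigned[key] for key in template}


def pack_insertion(template, leaves):
    """PyTorch-like model: assign leaves in insertion order."""
    return dict(zip(template, leaves))


tree = {"b": 2, "a": 1}
tf_like = pack_sorted_assign(tree, [10, 20])
torch_like = pack_insertion(tree, [10, 20])
print(tf_like)     # {'b': 20, 'a': 10}
print(torch_like)  # {'b': 10, 'a': 20}
```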

&lt;p&gt;Mixed incomparable key types are another consequence of sorting. JAX and TensorFlow fail on &lt;code&gt;{1: "one", "2": "two"}&lt;/code&gt; because &lt;code&gt;1 &amp;lt; "2"&lt;/code&gt; is not defined. PyTorch does not sort and therefore flattens this case in insertion order.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;jtu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_flatten&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;one&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;two&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="c1"&gt;# ValueError: Comparator raised exception while sorting pytree dictionary keys.
&lt;/span&gt;
&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;one&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;two&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;span class="c1"&gt;# TypeError: '&amp;lt;' not supported between instances of 'str' and 'int'
&lt;/span&gt;
&lt;span class="n"&gt;tpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_flatten&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;one&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;two&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;})[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# ["one", "two"]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
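
&lt;p&gt;The underlying failure is plain Python behavior and can be checked with the standard library alone. This sketch only demonstrates that sorting mixed &lt;code&gt;int&lt;/code&gt; and &lt;code&gt;str&lt;/code&gt; keys raises, while insertion-order traversal does not; it makes no claim about how each framework wraps the error.&lt;/p&gt;

```python
mixed = {1: "one", "2": "two"}

# Any strategy that sorts the keys has to compare an int with a str.
try:
    sorted(mixed)
    sort_failed = False
except TypeError:
    sort_failed = True

# Insertion-order traversal never compares keys, so it succeeds.
insertion_order_leaves = list(mixed.values())

print(sort_failed)             # True
print(insertion_order_leaves)  # ['one', 'two']
```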



&lt;h2&gt;
  
  
  Ordered Containers Are Not Just Dicts With Better Manners
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;OrderedDict&lt;/code&gt; has explicit order metadata, and JAX treats that metadata as part of the tree structure. PyTorch uses insertion order too. TensorFlow again uses sorted-key leaf assignment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;OrderedDict&lt;/span&gt;

&lt;span class="n"&gt;tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OrderedDict&lt;/span&gt;&lt;span class="p"&gt;([(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;

&lt;span class="n"&gt;jtu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# [2, 1]
&lt;/span&gt;
&lt;span class="n"&gt;tpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# [2, 1]
&lt;/span&gt;
&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# [1, 2]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All three preserve the &lt;code&gt;OrderedDict&lt;/code&gt; type when rebuilding, but TensorFlow assigns replacement leaves by sorted key:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;pack_sequence_as&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;OrderedDict&lt;/span&gt;&lt;span class="p"&gt;([(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)]),&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="c1"&gt;# OrderedDict([("b", 20), ("a", 10)])
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Multi-argument mapping reveals another difference. JAX rejects two &lt;code&gt;OrderedDict&lt;/code&gt;s with the same keys but different order because the custom node metadata differs.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;a&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OrderedDict&lt;/span&gt;&lt;span class="p"&gt;([(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;
&lt;span class="n"&gt;b&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;OrderedDict&lt;/span&gt;&lt;span class="p"&gt;([(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)])&lt;/span&gt;

&lt;span class="n"&gt;jtu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="n"&gt;a&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;b&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# ValueError: Mismatch custom node data: ('b', 'a') != ('a', 'b')
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;TensorFlow accepts this and pairs by key while preserving the first structure's order. PyTorch 2.12.0 also accepts it and returns the same visible result:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nc"&gt;OrderedDict&lt;/span&gt;&lt;span class="p"&gt;([(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)),&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;))])&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
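&lt;p&gt;A minimal pure-Python sketch of that key-pairing behavior, assuming flat mappings only. The helper name &lt;code&gt;map_by_key&lt;/code&gt; is hypothetical and belongs to none of the three libraries, and the contents of &lt;code&gt;a&lt;/code&gt; are inferred from the result shown above. It only illustrates why the output follows the first structure's key order.&lt;/p&gt;

```python
from collections import OrderedDict


def map_by_key(fn, first, *rest):
    """Pair values by key, iterating in the first mapping's order.

    Hypothetical helper for illustration; handles flat mappings only.
    """
    for other in rest:
        if set(other) != set(first):
            raise ValueError("mappings do not share the same key set")
    # Rebuild with the first mapping's type so an OrderedDict stays one.
    return type(first)(
        (key, fn(first[key], *(other[key] for other in rest))) for key in first
    )


a = OrderedDict([("b", 2), ("a", 1)])
b = OrderedDict([("a", 10), ("b", 20)])
paired = map_by_key(lambda x, y: (x, y), a, b)
print(list(paired.items()))
```

&lt;p&gt;Pairing by key rather than by position is what makes the differing insertion orders harmless here; a positional zip of the two mappings would pair &lt;code&gt;2&lt;/code&gt; with &lt;code&gt;10&lt;/code&gt;.&lt;/p&gt;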



&lt;h2&gt;
  
  
  &lt;code&gt;defaultdict&lt;/code&gt;: Losing The Type Changes Behavior
&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;defaultdict&lt;/code&gt; is not a decorative subclass. It carries a &lt;code&gt;default_factory&lt;/code&gt;, which changes lookup behavior.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;collections&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;defaultdict&lt;/span&gt;

&lt;span class="n"&gt;counter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;counter&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;missing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# 0
&lt;/span&gt;
&lt;span class="n"&gt;plain&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{}&lt;/span&gt;
&lt;span class="n"&gt;plain&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;missing&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# KeyError
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;All three frameworks preserve the &lt;code&gt;default_factory&lt;/code&gt;, but they disagree about leaf order, just as they do with plain dictionaries.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tree&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="n"&gt;jtu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# [1, 2]
&lt;/span&gt;
&lt;span class="n"&gt;tpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;)[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# [2, 1]
&lt;/span&gt;
&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tree&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# [1, 2]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Rebuilding each structure from the replacement leaves &lt;code&gt;[10, 20]&lt;/code&gt;, assigned in that framework's own flatten order, gives:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# JAX
&lt;/span&gt;&lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# PyTorch
&lt;/span&gt;&lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;

&lt;span class="c1"&gt;# TensorFlow
&lt;/span&gt;&lt;span class="nf"&gt;defaultdict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;int&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;})&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This matters for any pure Python fallback. If it flattens a &lt;code&gt;defaultdict&lt;/code&gt; as a mapping but reconstructs a plain &lt;code&gt;dict&lt;/code&gt;, it is wrong, not merely imprecise.&lt;/p&gt;

&lt;h2&gt;
  
  
  Custom Containers: Either Register Them Or Treat Them As Leaves
&lt;/h2&gt;

&lt;p&gt;JAX and PyTorch are conservative about arbitrary subclasses. TensorFlow is more eager to recurse into sequence and mapping subclasses.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MyDict&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;dict&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MyList&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;list&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;

&lt;span class="k"&gt;class&lt;/span&gt; &lt;span class="nc"&gt;MyTuple&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;tuple&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;pass&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;JAX and PyTorch treat these as leaves unless explicitly registered:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;jtu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;MyDict&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}))[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# [MyDict({"b": 2, "a": 1})]
&lt;/span&gt;
&lt;span class="n"&gt;tpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;MyList&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]))[&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="c1"&gt;# [MyList([1, 2])]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;TensorFlow traverses them:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;MyDict&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;a&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;}))&lt;/span&gt;
&lt;span class="c1"&gt;# [1, 2]
&lt;/span&gt;
&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;MyList&lt;/span&gt;&lt;span class="p"&gt;([&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;]))&lt;/span&gt;
&lt;span class="c1"&gt;# [1, 2]
&lt;/span&gt;
&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;flatten&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nc"&gt;MyTuple&lt;/span&gt;&lt;span class="p"&gt;((&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;)))&lt;/span&gt;
&lt;span class="c1"&gt;# [1, 2]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;namedtuple&lt;/code&gt; is the standard exception to the subclasses-are-leaves rule in JAX and PyTorch. All three frameworks recognize it as a structural container and preserve its type. They are also strict about namedtuple type matching: &lt;code&gt;Point(1, 2)&lt;/code&gt; is not the same structure as the plain tuple &lt;code&gt;(1, 2)&lt;/code&gt;, nor as an instance of a different namedtuple type such as &lt;code&gt;RGB(1, 2)&lt;/code&gt;.&lt;/p&gt;
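&lt;p&gt;The strictness is easy to overlook because plain Python equality is lenient for tuples. The sketch below contrasts the two checks; the helper names and the second namedtuple type &lt;code&gt;Pair&lt;/code&gt; are hypothetical, and the &lt;code&gt;_fields&lt;/code&gt; test is a common heuristic rather than any framework's exact detection logic.&lt;/p&gt;

```python
from collections import namedtuple

Point = namedtuple("Point", ["x", "y"])
Pair = namedtuple("Pair", ["first", "second"])  # hypothetical second type


def is_namedtuple(obj):
    """Heuristic check: a tuple subclass that exposes _fields."""
    return isinstance(obj, tuple) and hasattr(obj, "_fields")


def same_node_type(left, right):
    """Strict check: container types must be identical, not just equal."""
    return type(left) is type(right)


def rebuild_like(template, leaves):
    """Rebuild a tuple-like node while preserving a namedtuple's type."""
    if is_namedtuple(template):
        return type(template)(*leaves)
    return type(template)(leaves)


print(Point(1, 2) == (1, 2))                    # True: value equality ignores the type
print(same_node_type(Point(1, 2), (1, 2)))      # False
print(same_node_type(Point(1, 2), Pair(1, 2)))  # False
rebuilt_point = rebuild_like(Point(1, 2), [10, 20])
```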

&lt;h2&gt;
  
  
  The Real Trap: &lt;code&gt;tree_map&lt;/code&gt; Does Not Always Mean Same-Structure Map
&lt;/h2&gt;

&lt;p&gt;JAX &lt;code&gt;tree_map&lt;/code&gt; uses the first argument as the reference structure. Later arguments are flattened "up to" that structure. If the first tree has a leaf, the corresponding value in a later tree may be an entire subtree.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;jtu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;}])&lt;/span&gt;
&lt;span class="c1"&gt;# [(1, [3]), (2, {"x": 4})]
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The first tree says: "I am a list of two leaves." Therefore the second tree only needs to be a list of two objects. Those objects are passed whole to the function.&lt;/p&gt;
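&lt;p&gt;The prefix rule can be approximated in a few lines of pure Python. This is a sketch of the idea under simplifying assumptions (only plain &lt;code&gt;list&lt;/code&gt;, &lt;code&gt;tuple&lt;/code&gt;, and &lt;code&gt;dict&lt;/code&gt; count as containers), not JAX's implementation; &lt;code&gt;prefix_map&lt;/code&gt; is a hypothetical name and its error messages are made up.&lt;/p&gt;

```python
def prefix_map(fn, prefix, *rest):
    """Map fn over the leaves of prefix; matching values in rest are passed whole.

    Only plain list, tuple, and dict are treated as containers in this sketch.
    """
    if isinstance(prefix, (list, tuple)):
        for other in rest:
            if type(other) is not type(prefix) or len(other) != len(prefix):
                raise ValueError(f"expected {type(prefix).__name__}, got {other!r}")
        return type(prefix)(
            prefix_map(fn, item, *(other[i] for other in rest))
            for i, item in enumerate(prefix)
        )
    if isinstance(prefix, dict):
        for other in rest:
            if not isinstance(other, dict) or set(other) != set(prefix):
                raise ValueError(f"expected dict with keys {list(prefix)}, got {other!r}")
        return {
            key: prefix_map(fn, value, *(other[key] for other in rest))
            for key, value in prefix.items()
        }
    # prefix is a leaf here: whatever sits at this position in rest is not
    # inspected any further and goes to fn as a whole object.
    return fn(prefix, *rest)


pairs = prefix_map(lambda x, y: (x, y), [1, 2], [[3], {"x": 4}])
scalar = prefix_map(lambda x, y: (x, y), 1, [2, 3])
```

&lt;p&gt;The asymmetry comes from the last line: recursion is driven only by the first argument, so a later argument may be deeper than the prefix but never shallower.&lt;/p&gt;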

&lt;p&gt;The scalar case is even clearer:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;jtu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="c1"&gt;# (1, [2, 3])
&lt;/span&gt;
&lt;span class="n"&gt;jtu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# ValueError: Expected list, got 3.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PyTorch 2.12.0 behaves similarly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;}])&lt;/span&gt;
&lt;span class="c1"&gt;# [(1, [3]), (2, {"x": 4})]
&lt;/span&gt;
&lt;span class="n"&gt;tpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="c1"&gt;# (1, [2, 3])
&lt;/span&gt;
&lt;span class="n"&gt;tpu&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tree_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# ValueError: Node type mismatch; expected &amp;lt;class 'list'&amp;gt;, but got &amp;lt;class 'int'&amp;gt;.
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;PyTorch 2.2.2 did not support this multi-pytree call through &lt;code&gt;_pytree.tree_map&lt;/code&gt;. TensorFlow supports multiple structures, but it requires strict structural equality:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;nest&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;map_structure&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="k"&gt;lambda&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;[[&lt;/span&gt;&lt;span class="mi"&gt;3&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="mi"&gt;4&lt;/span&gt;&lt;span class="p"&gt;}])&lt;/span&gt;
&lt;span class="c1"&gt;# ValueError: structures do not have the same nested structure
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  Transform APIs: PyTree Support Is Not Just Flattening
&lt;/h2&gt;

&lt;p&gt;Tree semantics matter most when they meet transforms. Here the frameworks differ again.&lt;/p&gt;

&lt;p&gt;JAX transformations are natively pytree-based. &lt;code&gt;grad&lt;/code&gt; accepts nested inputs and returns gradients with the same structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;jax&lt;/span&gt;
&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;jax.numpy&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;jnp&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;

&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;jnp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;jnp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;array&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="p"&gt;)]}&lt;/span&gt;
&lt;span class="n"&gt;jax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# {"x": Array(4., dtype=float32), "y": [Array(27., dtype=float32)]}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;JAX &lt;code&gt;vmap&lt;/code&gt; accepts nested pytree inputs too:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;g&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;batched&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;jnp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;jnp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;span class="n"&gt;jax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;batched&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# Array([10., 12., 14.], dtype=float32)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because &lt;code&gt;None&lt;/code&gt; is a zero-leaf node in JAX, it can sit inside a vmapped input without becoming a batched argument:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;h&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;jax&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;h&lt;/span&gt;&lt;span class="p"&gt;)({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;jnp&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="bp"&gt;None&lt;/span&gt;&lt;span class="p"&gt;]})&lt;/span&gt;
&lt;span class="c1"&gt;# Array([0., 1., 2.], dtype=float32)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
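&lt;p&gt;To make "zero-leaf node" concrete, here is a small pure-Python sketch in which &lt;code&gt;None&lt;/code&gt; is a container with no children. It is an illustration of the idea rather than JAX's code: the function names are hypothetical, only &lt;code&gt;dict&lt;/code&gt;, &lt;code&gt;list&lt;/code&gt;, and &lt;code&gt;tuple&lt;/code&gt; are traversed, and a string stands in for the batched array.&lt;/p&gt;

```python
def flatten(tree):
    """Collect leaves depth-first; None is a node with no children."""
    if tree is None:
        return []  # contributes zero leaves
    if isinstance(tree, dict):
        return [leaf for key in sorted(tree) for leaf in flatten(tree[key])]
    if isinstance(tree, (list, tuple)):
        return [leaf for item in tree for leaf in flatten(item)]
    return [tree]


def unflatten(template, leaves):
    """Rebuild template's shape from new leaves; None positions stay None."""
    it = iter(leaves)

    def build(node):
        if node is None:
            return None
        if isinstance(node, dict):
            # Consume leaves in sorted-key order, then restore the original order.
            built = {key: build(node[key]) for key in sorted(node)}
            return {key: built[key] for key in node}
        if isinstance(node, (list, tuple)):
            return type(node)(build(item) for item in node)
        return next(it)

    return build(template)


tree = {"x": "batched-array", "y": [None]}
leaves = flatten(tree)
restored = unflatten(tree, ["new-array"])
```

&lt;p&gt;Under this model a transform that operates on the flat leaf list only ever sees the value under &lt;code&gt;"x"&lt;/code&gt;, and the &lt;code&gt;None&lt;/code&gt; reappears in place when the structure is rebuilt.&lt;/p&gt;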



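&lt;p&gt;Classic PyTorch autograd is different. &lt;code&gt;torch.autograd.grad&lt;/code&gt; expects tensors or gradient edges as &lt;code&gt;inputs&lt;/code&gt;, not an arbitrary nested pytree. A dict passed as &lt;code&gt;inputs&lt;/code&gt; is iterated as a sequence of its keys, which is why the error below complains about &lt;code&gt;str&lt;/code&gt;:&lt;br&gt;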
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;

&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tensor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;requires_grad&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tensor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;requires_grad&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;

&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;autograd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="c1"&gt;# (tensor(4.), tensor(27.))
&lt;/span&gt;
&lt;span class="n"&gt;nested&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;autograd&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;grad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nested&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# RuntimeError: all inputs have to be Tensors or GradientEdges, but got str
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
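&lt;p&gt;A common way to bridge a flat-only API and a nested structure is to flatten, call, and rebuild. The sketch below shows that adapter pattern in pure Python so it runs without a tensor library: every name in it is hypothetical, and &lt;code&gt;analytic_grad&lt;/code&gt; is a hand-written stand-in for a call such as &lt;code&gt;torch.autograd.grad(loss, leaves)&lt;/code&gt;. With real tensors one would presumably reach for the flatten and unflatten helpers in &lt;code&gt;torch.utils._pytree&lt;/code&gt; instead of the toy flattener here.&lt;/p&gt;

```python
def flatten_with_rebuild(tree):
    """Return (leaves, rebuild) for nested dicts, lists, and tuples."""
    if isinstance(tree, dict):
        keys, children = list(tree), list(tree.values())  # insertion order
        wrap = lambda built: dict(zip(keys, built))
    elif isinstance(tree, (list, tuple)):
        children = list(tree)
        wrap = type(tree)
    else:
        return [tree], lambda new_leaves: new_leaves[0]

    parts = [flatten_with_rebuild(child) for child in children]
    leaves = [leaf for part_leaves, _ in parts for leaf in part_leaves]

    def rebuild(new_leaves):
        built, start = [], 0
        for part_leaves, part_rebuild in parts:
            stop = start + len(part_leaves)
            built.append(part_rebuild(new_leaves[start:stop]))
            start = stop
        return wrap(built)

    return leaves, rebuild


def call_flat_api(flat_fn, tree):
    """Adapt a flat-only function (tuple in, tuple out) to a nested tree."""
    leaves, rebuild = flatten_with_rebuild(tree)
    return rebuild(list(flat_fn(tuple(leaves))))


def analytic_grad(leaves):
    """Stand-in for a flat gradient call: d/dx and d/dy of x ** 2 + y ** 3."""
    x, y = leaves
    return (2 * x, 3 * y ** 2)


nested = {"x": 2.0, "y": [3.0]}
grads = call_flat_api(analytic_grad, nested)
print(grads)  # {'x': 4.0, 'y': [27.0]}
```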



&lt;p&gt;The newer &lt;code&gt;torch.func&lt;/code&gt; stack does understand nested pytree-like parameter structures:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;from&lt;/span&gt; &lt;span class="n"&gt;torch.func&lt;/span&gt; &lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;grad&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;vmap&lt;/span&gt;

&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;

&lt;span class="n"&gt;params&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tensor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;tensor&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="p"&gt;)]}&lt;/span&gt;
&lt;span class="nf"&gt;grad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;f&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# {"x": tensor(4.), "y": [tensor(27.)]}
&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;g&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;batched&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;torch&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;arange&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;span class="nf"&gt;vmap&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;)(&lt;/span&gt;&lt;span class="n"&gt;batched&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# tensor([10., 12., 14.])
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;TensorFlow's transform support follows &lt;code&gt;tf.nest&lt;/code&gt;. &lt;code&gt;GradientTape.gradient&lt;/code&gt; accepts nested sources and returns gradients in the same structure:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;tensorflow&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;

&lt;span class="n"&gt;x&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Variable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;y&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Variable&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;nested&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;x&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;y&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;

&lt;span class="k"&gt;with&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;GradientTape&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="n"&gt;tape&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;
    &lt;span class="n"&gt;loss&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;nested&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;nested&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;

&lt;span class="n"&gt;tape&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;gradient&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;loss&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;nested&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# {"x": tf.Tensor(4.0), "y": [tf.Tensor(27.0)]}
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;tf.vectorized_map&lt;/code&gt; also accepts nested input structures:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;g&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="n"&gt;batched&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;10&lt;/span&gt;&lt;span class="p"&gt;]}&lt;/span&gt;
&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;vectorized_map&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;g&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;batched&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="c1"&gt;# tf.Tensor([10. 12. 14.], shape=(3,), dtype=float32)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;code&gt;tf.function&lt;/code&gt; accepts nested structures as ordinary function arguments:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="nd"&gt;@tf.function&lt;/span&gt;
&lt;span class="k"&gt;def&lt;/span&gt; &lt;span class="nf"&gt;f&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;return&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;2&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;][&lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;**&lt;/span&gt; &lt;span class="mi"&gt;3&lt;/span&gt;

&lt;span class="nf"&gt;f&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;x&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;constant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;2.0&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;y&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;tf&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;constant&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;3.0&lt;/span&gt;&lt;span class="p"&gt;)]})&lt;/span&gt;
&lt;span class="c1"&gt;# tf.Tensor(31.0, shape=(), dtype=float32)
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The right summary is more specific: JAX transforms are pytree-native; PyTorch classic autograd is not, while &lt;code&gt;torch.func&lt;/code&gt; is; TensorFlow transform APIs accept nested structures.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;PyTrees are a small abstraction with a long tail. Simple examples make every framework look compatible; real optimizer states, optional values, ordered mappings, custom containers, and transform APIs expose the differences quickly.&lt;/p&gt;

</description>
      <category>jax</category>
      <category>torch</category>
      <category>tensorflow</category>
    </item>
    <item>
      <title>TensorCircuit-NG vs cuQuantum on H200: JIT compilation beats the "magic GPU library" assumption</title>
      <dc:creator>Shixin Zhang</dc:creator>
      <pubDate>Sun, 07 Jun 2026 02:02:29 +0000</pubDate>
      <link>https://dev.to/refractionray/tensorcircuit-ng-vs-cuquantum-on-h200-jit-compilation-beats-the-magic-gpu-library-assumption-d5c</link>
      <guid>https://dev.to/refractionray/tensorcircuit-ng-vs-cuquantum-on-h200-jit-compilation-beats-the-magic-gpu-library-assumption-d5c</guid>
      <description>&lt;p&gt;NVIDIA cuQuantum has a strong reputation as the natural high-performance baseline for GPU quantum simulation. That reputation is understandable: cuQuantum contains serious low-level GPU libraries such as cuStateVec and cuTensorNet, and NVIDIA is the company that builds both the GPUs and CUDA.&lt;/p&gt;

&lt;p&gt;But in an end-to-end differentiable VQE workload, the result is more nuanced. On our H200 GPU benchmark, TensorCircuit-NG was substantially faster after compilation, while also offering a much higher-level and user-friendly programming model.&lt;/p&gt;

&lt;p&gt;The short version:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;cuQuantum is a powerful low-level library.&lt;/li&gt;
&lt;li&gt;It is not automatically the fastest route for practical quantum simulation tasks.&lt;/li&gt;
&lt;li&gt;Direct cuQuantum code is significantly more verbose and engineering-heavy.&lt;/li&gt;
&lt;li&gt;TensorCircuit-NG pays a JAX compilation cost, but repeated value-and-gradient evaluations quickly amortize that cost.&lt;/li&gt;
&lt;li&gt;After compilation, TensorCircuit-NG's per-call runtime is much shorter than that of direct cuQuantum.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Benchmark setup
&lt;/h2&gt;

&lt;p&gt;We used the 1D TFIM VQE workload defined in &lt;a href="https://github.com/tensorcircuit/tensorcircuit-ng/blob/master/examples/benchmark_cuquantum_vs_tc_vqe.py" rel="noopener noreferrer"&gt;the script&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Hardware and software:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPU: NVIDIA H200&lt;/li&gt;
&lt;li&gt;TensorCircuit-NG: &lt;code&gt;1.6.0&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;JAX: &lt;code&gt;0.7.2&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;cuQuantum Python: &lt;code&gt;26.3.2&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;CuPy: &lt;code&gt;14.1.1&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;PyTorch: &lt;code&gt;2.11.0+cu128&lt;/code&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We measured one warmup/compile call and then the mean of five later value-and-gradient calls.&lt;/p&gt;
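&lt;p&gt;The measurement protocol above can be sketched as a small timing helper. This is a minimal illustration under stated assumptions, not the code from the benchmark script: &lt;code&gt;time_warmup_and_mean&lt;/code&gt; and its &lt;code&gt;sync&lt;/code&gt; parameter are hypothetical names introduced here. The first call is timed separately because it includes any compilation, and the later calls are averaged.&lt;/p&gt;

```python
import time
from statistics import mean


def time_warmup_and_mean(fn, *args, repeats=5, sync=lambda out: out):
    """Return (first_call_seconds, mean_later_seconds) for fn(*args).

    Hypothetical helper: `sync` should block until the result is really
    computed, because GPU frameworks often return before the work finishes.
    """
    # First call: includes tracing/compilation for JIT-based backends.
    start = time.perf_counter()
    sync(fn(*args))
    first = time.perf_counter() - start

    # Later calls: steady-state runtime, averaged over `repeats` runs.
    later = []
    for _ in range(repeats):
        start = time.perf_counter()
        sync(fn(*args))
        later.append(time.perf_counter() - start)
    return first, mean(later)


# Toy stand-in for a value-and-gradient function.
first, steady = time_warmup_and_mean(lambda x: x * x, 3.0)
print(f"first call: {first:.6f}s, mean of later calls: {steady:.6f}s")
```

&lt;p&gt;With a JAX function, passing &lt;code&gt;sync=jax.block_until_ready&lt;/code&gt; would be the natural choice, since asynchronous dispatch can otherwise make a call look faster than it is.&lt;/p&gt;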

&lt;h2&gt;
  
  
  Implementations compared
&lt;/h2&gt;

&lt;p&gt;We tested two TensorCircuit-NG modes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;TC-JAX scan&lt;/strong&gt;: uses &lt;code&gt;scan&lt;/code&gt; over VQE layers to reduce JAX compilation/staging time.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;TC-JAX unrolled&lt;/strong&gt;: builds all layers directly. This produces a larger traced program, but can be faster after compilation.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;We also tested two direct cuQuantum routes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;cuStateVec adjoint&lt;/strong&gt;: applies gates with cuStateVec and computes the full gradient with adjoint differentiation. Because this avoids parameter shift, whose cost grows with the number of parameters, it is a fair baseline.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;cuTensorNet full-state autograd&lt;/strong&gt;: contracts the full state with cuTensorNet, then computes the TFIM state-vector expectation on GPU with PyTorch autograd.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The cuTensorNet path is intentionally not the obviously bad version where every Pauli term gets a separate tensor-network path search. We first tried that more "TN-native" observable-contraction style, but for this workload it spent too much time in repeated graph/path overhead. The final version is closer to the state-vector expectation workflow used by the TensorCircuit-NG and MindQuantum benchmark.&lt;/p&gt;

&lt;h2&gt;
  
  
  Repeated value-and-gradient runtime
&lt;/h2&gt;

&lt;p&gt;The table below reports the post-warmup runtime. This is the relevant metric for VQE-style optimization, where the same circuit structure is evaluated many times.&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;backend&lt;/th&gt;
&lt;th&gt;14 qubits&lt;/th&gt;
&lt;th&gt;20 qubits&lt;/th&gt;
&lt;th&gt;24 qubits&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;TC-JAX scan&lt;/td&gt;
&lt;td&gt;0.01201s&lt;/td&gt;
&lt;td&gt;0.01616s&lt;/td&gt;
&lt;td&gt;0.06374s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TC-JAX unrolled&lt;/td&gt;
&lt;td&gt;0.00995s&lt;/td&gt;
&lt;td&gt;0.01381s&lt;/td&gt;
&lt;td&gt;0.02547s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cuStateVec adjoint&lt;/td&gt;
&lt;td&gt;0.08036s&lt;/td&gt;
&lt;td&gt;0.12061s&lt;/td&gt;
&lt;td&gt;0.30142s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;cuTensorNet full-state autograd&lt;/td&gt;
&lt;td&gt;1.35677s&lt;/td&gt;
&lt;td&gt;2.04291s&lt;/td&gt;
&lt;td&gt;2.30414s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;In repeated value-and-gradient calls, TensorCircuit-NG is faster than cuStateVec:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;qubits&lt;/th&gt;
&lt;th&gt;TC-JAX scan vs cuStateVec&lt;/th&gt;
&lt;th&gt;TC-JAX unrolled vs cuStateVec&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;6.69x&lt;/td&gt;
&lt;td&gt;8.08x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;7.46x&lt;/td&gt;
&lt;td&gt;8.73x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;4.73x&lt;/td&gt;
&lt;td&gt;11.83x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The gap is much larger against the cuTensorNet route for this particular state-vector expectation plus autograd workflow:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;qubits&lt;/th&gt;
&lt;th&gt;TC-JAX scan vs cuTensorNet&lt;/th&gt;
&lt;th&gt;TC-JAX unrolled vs cuTensorNet&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;14&lt;/td&gt;
&lt;td&gt;112.97x&lt;/td&gt;
&lt;td&gt;136.36x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;20&lt;/td&gt;
&lt;td&gt;126.42x&lt;/td&gt;
&lt;td&gt;147.93x&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;24&lt;/td&gt;
&lt;td&gt;36.15x&lt;/td&gt;
&lt;td&gt;90.46x&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These numbers are the main point: cuQuantum is not a magic speed button. A library being close to CUDA, or being written by a GPU vendor, does not automatically make it the fastest end-to-end implementation for a differentiable quantum algorithm.&lt;/p&gt;
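&lt;p&gt;The speedup factors in the two tables above are plain ratios of the post-warmup runtimes. The short script below recomputes them from the runtime table; the dictionary layout and the &lt;code&gt;speedup&lt;/code&gt; helper are illustrative names, and only the timings themselves come from the benchmark.&lt;/p&gt;

```python
# Post-warmup value-and-gradient runtimes in seconds, copied from the table.
runtimes = {
    "TC-JAX scan": {14: 0.01201, 20: 0.01616, 24: 0.06374},
    "TC-JAX unrolled": {14: 0.00995, 20: 0.01381, 24: 0.02547},
    "cuStateVec adjoint": {14: 0.08036, 20: 0.12061, 24: 0.30142},
    "cuTensorNet full-state autograd": {14: 1.35677, 20: 2.04291, 24: 2.30414},
}


def speedup(baseline, candidate, qubits):
    """How many times faster `candidate` is than `baseline` at this size."""
    return runtimes[baseline][qubits] / runtimes[candidate][qubits]


for baseline in ("cuStateVec adjoint", "cuTensorNet full-state autograd"):
    for qubits in (14, 20, 24):
        scan = speedup(baseline, "TC-JAX scan", qubits)
        unrolled = speedup(baseline, "TC-JAX unrolled", qubits)
        print(f"{baseline}, {qubits} qubits: scan {scan:.2f}x, unrolled {unrolled:.2f}x")
```

&lt;p&gt;Rounded to two decimals, the printed values match the speedup tables.&lt;/p&gt;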

&lt;h2&gt;
  
  
  First-call cost and amortization
&lt;/h2&gt;

&lt;p&gt;cuQuantum has much lower first-call overhead. This is expected: TensorCircuit-NG uses JAX JIT compilation, and that first call can be expensive.&lt;/p&gt;

&lt;p&gt;So if the task is a single one-off circuit evaluation, cuQuantum's low startup cost is attractive. But VQE is usually not a one-off workload. It repeatedly evaluates the same circuit structure for many optimizer steps and often across multiple random initializations. In that regime, TensorCircuit-NG's first-call cost is easily amortized, and the much faster post-compilation runtime becomes the dominant factor.&lt;/p&gt;

&lt;p&gt;There is also a useful TensorCircuit-NG tradeoff:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Use &lt;strong&gt;scan mode&lt;/strong&gt; when compilation time matters.&lt;/li&gt;
&lt;li&gt;Use &lt;strong&gt;unrolled mode&lt;/strong&gt; when the same circuit will be evaluated many times and peak post-compilation throughput matters.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;At 24 qubits, unrolled TensorCircuit-NG is about &lt;code&gt;2.50x&lt;/code&gt; faster than scan mode after compilation, but the first call is about &lt;code&gt;9x&lt;/code&gt; heavier.&lt;/p&gt;
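&lt;p&gt;The scan-versus-unrolled tradeoff at 24 qubits can be written as a break-even estimate. This is a rough model, not a measured result. It assumes the total time is the first-call cost plus a constant per-call runtime, writes the scan-mode first-call time as the symbol C (in seconds) because its value is not reported here, and takes the unrolled first call to be 9C. The per-call runtimes are taken from the table above.&lt;/p&gt;

```latex
\[
T_{\text{scan}}(k) = C + 0.06374\,k,
\qquad
T_{\text{unrolled}}(k) = 9C + 0.02547\,k
\]
\[
T_{\text{unrolled}}(k) \le T_{\text{scan}}(k)
\iff
k \ge \frac{8C}{0.06374 - 0.02547} \approx 209\,C
\]
```

&lt;p&gt;Under these assumptions, unrolled mode pays off after roughly 209 value-and-gradient calls for every second of scan-mode first-call time.&lt;/p&gt;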

&lt;h2&gt;
  
  
  Programming model
&lt;/h2&gt;

&lt;p&gt;Performance is only half of the story. The programming model matters.&lt;/p&gt;

&lt;p&gt;In TensorCircuit-NG, the benchmark is expressed as circuit code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;c&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nc"&gt;Circuit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;h&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;layer&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;depth&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt; &lt;span class="o"&gt;-&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rzz&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="o"&gt;+&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;theta&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;layer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;n&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="n"&gt;c&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rx&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;theta&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;layer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;i&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;

&lt;span class="n"&gt;value_and_grad&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;backend&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;jit&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;tc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;backend&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;value_and_grad&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;energy_fn&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With direct cuQuantum, the user has to manually manage much lower-level details:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;gate matrices and their dtype conventions&lt;/li&gt;
&lt;li&gt;state-vector memory&lt;/li&gt;
&lt;li&gt;cuStateVec binding signatures&lt;/li&gt;
&lt;li&gt;tensor-network modes&lt;/li&gt;
&lt;li&gt;PyTorch operands for autograd&lt;/li&gt;
&lt;li&gt;GPU synchronization&lt;/li&gt;
&lt;li&gt;version-specific API behavior&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;cuQuantum is valuable, but it is closer to a low-level engine than a high-level quantum algorithm framework. For a researcher, that difference is very real.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;This benchmark does &lt;strong&gt;not&lt;/strong&gt; prove that cuQuantum is slow for every task. What it does show is narrower and more practical:&lt;/p&gt;

&lt;p&gt;For this VQE workload, direct cuQuantum was not the fastest end-to-end route. TensorCircuit-NG provided a much simpler programming interface and substantially faster repeated value-and-gradient evaluations after JAX compilation.&lt;/p&gt;

&lt;p&gt;The common assumption that "NVIDIA controls CUDA, therefore cuQuantum must be the fastest implementation" is too simplistic. Raw GPU kernels matter, but so do JIT compilation, autodiff integration, graph-level optimization, and the abstraction level exposed to users.&lt;/p&gt;

&lt;p&gt;TensorCircuit-NG's advantage is that it lets users write concise quantum-program code while still compiling to high-performance backend-native tensor programs. For repeated VQE-style workloads, that combination can beat direct cuQuantum both in usability and in runtime.&lt;/p&gt;

</description>
      <category>python</category>
      <category>gpu</category>
      <category>cuda</category>
    </item>
    <item>
      <title>Why JAX Is a Much Better Backend for Quantum Circuit Simulation Than PyTorch</title>
      <dc:creator>Shixin Zhang</dc:creator>
      <pubDate>Sat, 06 Jun 2026 05:01:36 +0000</pubDate>
      <link>https://dev.to/refractionray/why-jax-is-a-much-better-backend-for-quantum-circuit-simulation-than-pytorch-ak6</link>
      <guid>https://dev.to/refractionray/why-jax-is-a-much-better-backend-for-quantum-circuit-simulation-than-pytorch-ak6</guid>
      <description>&lt;p&gt;Modern quantum circuit simulation is not just “machine learning with complex tensors.” It involves irregular tensor contractions, sparse operators, statevector transformations, and automatic differentiation through all of them. This makes backend choice unusually important. A backend that is excellent for standard neural-network layers may still be a poor fit for general quantum simulation workloads.&lt;/p&gt;

&lt;p&gt;We benchmarked this with a simple VQE workload for the 1D transverse-field Ising&lt;br&gt;
model as in &lt;a href="https://github.com/tensorcircuit/tensorcircuit-ng/blob/master/examples/benchmark_jax_vs_torch_vqe.py" rel="noopener noreferrer"&gt;the script&lt;/a&gt;,&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;H = -sum_i Z_i Z_{i+1} - sum_i X_i,
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;using 20 qubits, 10 ansatz layers, complex64 precision, and one NVIDIA RTX 5090 GPU. &lt;/p&gt;

&lt;h2&gt;
  
  
  Results
&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Backend&lt;/th&gt;
&lt;th&gt;Compile / Warmup&lt;/th&gt;
&lt;th&gt;Value+Grad Runtime&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;TensorCircuit-NG, JAX backend&lt;/td&gt;
&lt;td&gt;53.53 s&lt;/td&gt;
&lt;td&gt;0.0265 s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TensorCircuit-NG, PyTorch backend&lt;/td&gt;
&lt;td&gt;0.48 s&lt;/td&gt;
&lt;td&gt;0.3299 s&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;TorchQuantum, with a hand-optimized expectation instead of the default&lt;/td&gt;
&lt;td&gt;0.81 s&lt;/td&gt;
&lt;td&gt;0.4172 s&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The JAX backend is about &lt;strong&gt;12.4x faster&lt;/strong&gt; than TensorCircuit-NG’s PyTorch backend and about &lt;strong&gt;15.7x faster&lt;/strong&gt; than TorchQuantum for the post-compilation value-and-gradient step.&lt;/p&gt;

&lt;p&gt;The compile time tells the other half of the story: JAX pays a much larger upfront XLA compilation cost. But after compilation, XLA produces a far more effective execution plan for this quantum simulation workload. This is exactly the tradeoff we want in VQE, QAOA, time evolution, and many other iterative algorithms: pay once, run many times.&lt;/p&gt;
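&lt;p&gt;The "pay once, run many times" tradeoff can be made concrete with a break-even estimate from the table above. This is a back-of-the-envelope sketch: it assumes the total cost is the reported compile/warmup time plus a constant per-step runtime, and &lt;code&gt;break_even_steps&lt;/code&gt; is a hypothetical helper name.&lt;/p&gt;

```python
import math


def break_even_steps(warmup_a, step_a, warmup_b, step_b):
    """Smallest step count at which backend A is no slower in total than B.

    Assumes A has the larger warmup and the smaller per-step time, and that
    total time is modeled as warmup + steps * step_time.
    """
    return math.ceil((warmup_a - warmup_b) / (step_b - step_a))


# Timings in seconds from the results table (compile/warmup, value+grad).
jax_warmup, jax_step = 53.53, 0.0265
torch_warmup, torch_step = 0.48, 0.3299
tq_warmup, tq_step = 0.81, 0.4172

print(break_even_steps(jax_warmup, jax_step, torch_warmup, torch_step))  # 175
print(break_even_steps(jax_warmup, jax_step, tq_warmup, tq_step))        # 135
```

&lt;p&gt;Under this model the JAX backend overtakes the PyTorch backend after about 175 value-and-gradient steps and TorchQuantum after about 135, which is a small number for a typical optimization loop.&lt;/p&gt;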

&lt;h2&gt;
  
  
  Why This Happens
&lt;/h2&gt;

&lt;p&gt;Quantum circuit simulation stresses a backend differently from ordinary deep learning. The workload mixes tensor-network contraction, sparse Hamiltonian application, and reverse-mode differentiation. JAX/XLA is designed to see the whole computation and optimize it aggressively as a compiled program on the target device.&lt;/p&gt;

&lt;p&gt;PyTorch, in contrast, is strongest where the workload resembles standard neural network layers. For more general tensor programs, especially tensor-network-like simulation code, the compiler stack is less aggressive and less predictable.&lt;br&gt;
In this benchmark, the same TensorCircuit-NG algorithm is more than an order of magnitude faster on JAX than on PyTorch after compilation.&lt;/p&gt;

&lt;h2&gt;
  
  
  A Note on TorchQuantum
&lt;/h2&gt;

&lt;p&gt;We also compared against TorchQuantum as a representative PyTorch-native quantum circuit package. To make the comparison generous, we did not use its generic Pauli-string expectation path. That built-in route tends to materialize dense Pauli operators, which is slow and does not scale. Instead, we implemented a TFIM-specific expectation computed directly from the state vector:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;ZZ&lt;/code&gt; terms are evaluated from probabilities and precomputed sign tensors.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;X&lt;/code&gt; terms are evaluated by flipping the state axis and taking an inner product.&lt;/li&gt;
&lt;/ul&gt;
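&lt;p&gt;The two tricks in the list above can be sketched in NumPy. This is an illustration of the idea only, not the TorchQuantum code used in the benchmark: the function name &lt;code&gt;tfim_energy&lt;/code&gt; is hypothetical, an open chain is assumed, and the sign tensors are rebuilt on each call rather than precomputed.&lt;/p&gt;

```python
import numpy as np


def tfim_energy(psi, n):
    """Energy of H = -sum_i Z_i Z_{i+1} - sum_i X_i for an n-qubit state.

    Assumes an open chain (n - 1 ZZ bonds) and a normalized state vector.
    """
    psi = np.asarray(psi).reshape((2,) * n)
    probs = np.abs(psi) ** 2
    sign = np.array([1.0, -1.0])  # Z eigenvalues for bit values 0 and 1

    def axis_sign(i):
        # Sign tensor for qubit i, broadcastable against the full state.
        shape = [1] * n
        shape[i] = 2
        return sign.reshape(shape)

    energy = 0.0
    # ZZ terms: diagonal in the computational basis, so probabilities suffice.
    for i in range(n - 1):
        energy -= float(np.sum(probs * axis_sign(i) * axis_sign(i + 1)))
    # X terms: X_i flips bit i, so the expectation is an inner product
    # between the state and the state flipped along axis i.
    for i in range(n):
        energy -= float(np.real(np.vdot(psi, np.flip(psi, axis=i))))
    return energy


n = 3
plus = np.ones(2 ** n) / np.sqrt(2 ** n)  # every qubit in the |+> state
print(tfim_energy(plus, n))  # close to -3: each X term gives 1, ZZ terms give 0
```

&lt;p&gt;Neither step builds a dense operator, so memory stays at the size of the state vector.&lt;/p&gt;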

&lt;p&gt;This TFIM-specific expectation is already a substantial low-level optimization. Even with that help, TorchQuantum remains about 15.7x slower than TensorCircuit-NG on the JAX backend. And even if you prefer PyTorch, TensorCircuit-NG's PyTorch backend is still the better choice in both warm-up and run time.&lt;/p&gt;

&lt;h2&gt;
  
  
  Takeaway
&lt;/h2&gt;

&lt;p&gt;The lesson is not merely that one package is faster than another. The deeper point is that backend architecture matters. Quantum simulation benefits from a compiler that can optimize a whole differentiable tensor program, not just a collection of familiar machine-learning layers.&lt;/p&gt;

&lt;p&gt;For TensorCircuit-NG, the JAX backend gives exactly that: a high-level quantum programming interface backed by XLA’s aggressive compilation. The result is a backend that is not only elegant for research code, but also dramatically faster for real differentiable quantum simulation workloads.&lt;/p&gt;

</description>
      <category>jax</category>
      <category>pytorch</category>
    </item>
    <item>
      <title>TensorCircuit-NG: How to Tell Whether a Quantum x AI x HPC Platform Is Truly Mature When Everyone Tells the Same Story</title>
      <dc:creator>Shixin Zhang</dc:creator>
      <pubDate>Thu, 04 Jun 2026 08:28:33 +0000</pubDate>
      <link>https://dev.to/refractionray/tensorcircuit-ng-how-to-tell-whether-a-quantum-x-ai-x-hpc-platform-is-truly-mature-when-everyone-577p</link>
      <guid>https://dev.to/refractionray/tensorcircuit-ng-how-to-tell-whether-a-quantum-x-ai-x-hpc-platform-is-truly-mature-when-everyone-577p</guid>
      <description>&lt;p&gt;In recent years, the convergence of quantum computing, artificial intelligence (AI), and high-performance computing (HPC) has become a central theme in the evolution of scientific computing infrastructure. From AI4Science and quantum machine learning to supercomputing centers and heterogeneous computing platforms, phrases such as "Quantum x AI x HPC", "integrated quantum-supercomputing-intelligence infrastructure", and "next-generation research infrastructure" now appear frequently in academic conferences, industry forums, and corporate presentations.&lt;/p&gt;

&lt;p&gt;At the same time, a clear pattern has emerged:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The messaging is becoming increasingly similar, while the actual technical depth of different products varies dramatically.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Whether the subject is a quantum software platform, an AI4Science infrastructure stack, or a heterogeneous computing framework, many projects now describe themselves in similar terms:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;integrating quantum computing with AI;&lt;/li&gt;
&lt;li&gt;supporting heterogeneous computing resources;&lt;/li&gt;
&lt;li&gt;serving as future research infrastructure;&lt;/li&gt;
&lt;li&gt;enabling applications in materials science, chemistry, biomedicine, and other industries;&lt;/li&gt;
&lt;li&gt;building an open ecosystem and developer community.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These directions are meaningful. In fact, they are becoming part of the field's shared consensus.&lt;/p&gt;

&lt;p&gt;The real question is different:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;When everyone is telling a similar story, how can we tell whether a platform has actually delivered technical substance, rather than remaining at the level of conceptual packaging and slideware?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;For scientific infrastructure, it is more useful to ask five verifiable questions than to focus on slogans:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Is it open source?&lt;/li&gt;
&lt;li&gt;Does it provide public benchmarks?&lt;/li&gt;
&lt;li&gt;Is it used continuously by high-quality research communities?&lt;/li&gt;
&lt;li&gt;Has it supported real industry-oriented application cases?&lt;/li&gt;
&lt;li&gt;Does it continue to evolve through sustained version updates?&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Any platform that claims to be "next-generation research infrastructure" should be able to answer these questions in a concrete way.&lt;/p&gt;

&lt;p&gt;The development of &lt;a href="https://github.com/tensorcircuit/tensorcircuit-ng" rel="noopener noreferrer"&gt;TensorCircuit-NG&lt;/a&gt; offers a useful case study. Its value does not lie only in proposing a vision for "Quantum x AI x HPC"; it lies in a body of work that can be inspected, reproduced, cited, extended, and tested over time: open code, reproducible performance evaluations, a visible record of academic adoption, evidence of industry spillover, and six years of engineering iteration.&lt;/p&gt;

&lt;h2&gt;
  
  
  1. Is It Open Source?
&lt;/h2&gt;

&lt;p&gt;For scientific software, open source means more than publishing code.&lt;/p&gt;

&lt;p&gt;It means that:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the technology can be independently verified;&lt;/li&gt;
&lt;li&gt;performance claims can be reproduced;&lt;/li&gt;
&lt;li&gt;algorithms can be inspected;&lt;/li&gt;
&lt;li&gt;users can deploy the software without relying on a closed service;&lt;/li&gt;
&lt;li&gt;third-party researchers can repeat experiments under their own conditions.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Research communities do not lack polished presentations. What is much rarer is a technical system that can survive independent inspection.&lt;/p&gt;

&lt;p&gt;TensorCircuit was not released as a one-off code dump. Its development forms a traceable engineering trajectory: from the original personal open-source version, to the version developed during the Tencent Quantum Lab period, and then to the currently maintained TensorCircuit-NG project. Across these stages, the core code, documentation, tests, and examples have remained open. The GitHub history preserves the development record, with more than 500 combined stars and forks, over 2,700 commits, more than 30 released versions, and contributions from over 30 developers around the world.&lt;/p&gt;

&lt;p&gt;In terms of engineering scale, TensorCircuit-NG is no longer a short-term proof-of-concept project. It is a platform-level scientific computing system, with roughly 70,000 lines of code, type annotations, unit tests, continuous integration, documentation, and tutorials. The repository currently contains close to one thousand test functions. These tests are not merely a coverage metric; they are part of the engineering foundation that keeps APIs stable, backend behavior consistent, and long-term maintenance manageable.&lt;/p&gt;

&lt;p&gt;The surrounding ecosystem matters as well. TensorCircuit-NG provides documentation, more than 30 tutorial examples, over 170 application examples, more than 10 benchmark suites, and a companion quantum computing tutorial. Together, these resources form a developer ecosystem that is learnable, reusable, and extensible. The platform also embraces AI-native workflows by providing AI skill packages for paper reproduction, code translation, and performance optimization. This means TensorCircuit-NG is not only designed for human developers; it is also adapting to a new mode of scientific software development in which AI agents participate directly in research workflows.&lt;/p&gt;

&lt;p&gt;Another measurable signal of open-source adoption is installation and use. TensorCircuit-related packages include &lt;code&gt;tensorcircuit&lt;/code&gt;, &lt;code&gt;tensorcircuit-ng&lt;/code&gt;, and the nightly package &lt;code&gt;tensorcircuit-nightly&lt;/code&gt; on PyPI, with cumulative &lt;code&gt;pip install&lt;/code&gt; downloads exceeding one million. Download counts alone do not prove scientific value, but they do show that the platform exists in real development environments, not only in papers or promotional pages.&lt;/p&gt;

&lt;p&gt;For research infrastructure, credibility comes from the ability of third-party users to run the code, inspect the implementation, reproduce experiments, and build their own workflows. Code is always more honest than marketing material.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Are There Public Benchmarks?
&lt;/h2&gt;

&lt;p&gt;Every computing platform eventually has to answer a simple question:&lt;/p&gt;

&lt;p&gt;Does it actually improve computational efficiency?&lt;/p&gt;

&lt;p&gt;This is why public benchmarking is essential for judging platform maturity. In a high-performance setting such as "Quantum x AI x HPC", claims about acceleration, heterogeneous execution, or scalability are difficult to evaluate without reproducible benchmarks.&lt;/p&gt;

&lt;p&gt;One of the earliest reasons TensorCircuit attracted attention was its benchmark system for differentiable quantum computing and tensor-network simulation. The first TensorCircuit white paper was published in &lt;em&gt;Quantum&lt;/em&gt;: &lt;a href="https://quantum-journal.org/papers/q-2023-02-02-912/" rel="noopener noreferrer"&gt;TensorCircuit: a Quantum Software Framework for the NISQ Era&lt;/a&gt;. The paper introduced the platform architecture, core functionality, and performance advantages, and compared TensorCircuit against several mainstream quantum software frameworks on variational quantum algorithms, gradient computation, and quantum circuit simulation.&lt;/p&gt;

&lt;p&gt;The work made clear why unified tensor programming, automatic differentiation, and just-in-time compilation matter for quantum computing workflows. In several variational quantum algorithm and gradient computation tasks, TensorCircuit demonstrated significant performance advantages over representative frameworks such as PennyLane and IBM's Qiskit, with speedups reaching multiple orders of magnitude in some cases. More importantly, these results were not confined to figures in a paper: the code, experimental setup, and evaluation procedures were made reproducible. That is the difference between a verifiable technical path and an unverifiable performance claim.&lt;/p&gt;

&lt;p&gt;With the release of TensorCircuit-NG, the benchmark scope has expanded toward problems closer to future research infrastructure:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;GPU-accelerated computing;&lt;/li&gt;
&lt;li&gt;optimized tensor-network contraction;&lt;/li&gt;
&lt;li&gt;distributed HPC environments;&lt;/li&gt;
&lt;li&gt;unified computation graphs spanning quantum circuits, neural networks, and tensor networks.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The NG white paper further summarizes TensorCircuit's upgrade toward the integration of quantum computing, supercomputing, and intelligent computing; see the &lt;a href="https://arxiv.org/abs/2602.14167" rel="noopener noreferrer"&gt;preprint&lt;/a&gt;. The focus has shifted from "how to simulate quantum circuits faster on a single machine" to "how to organize quantum, AI, and numerical computing workflows in realistic heterogeneous research environments."&lt;/p&gt;

&lt;p&gt;External evaluation provides another layer of evidence. NVIDIA used TensorCircuit as a third-party quantum software case in its cuQuantum 23.10 benchmarking context. This shows that TensorCircuit has entered the evaluation landscape of hardware and high-performance computing vendors. For scientific infrastructure, such external benchmarks complement open papers and are more persuasive than slide-based claims.&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Is It Used by the Research Community?
&lt;/h2&gt;

&lt;p&gt;For scientific infrastructure, the hardest signal to fake is not performance.&lt;/p&gt;

&lt;p&gt;It is sustained use by serious research communities.&lt;/p&gt;

&lt;p&gt;A platform can gain short-term attention through marketing, but it cannot gain long-term citations through marketing alone. Research adoption is a form of long-horizon voting. If a platform continues to support high-quality work across institutions, research areas, and teams, then it has demonstrated real utility.&lt;/p&gt;

&lt;p&gt;More than 170 academic works have cited TensorCircuit, including more than 40 in the first five months of 2026 alone. More importantly, these works are not concentrated in a single niche. They span quantum simulation, quantum machine learning, quantum chemistry, quantum sensing, quantum architecture search, and AI4Science.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quantum Simulation and Many-Body Systems
&lt;/h3&gt;

&lt;p&gt;In many-body quantum physics, condensed matter systems, and complex quantum dynamics, researchers often need large-scale quantum circuit simulation, tensor-network contraction, and differentiable optimization. These tasks place high demands on performance, numerical stability, and automatic differentiation.&lt;/p&gt;

&lt;p&gt;Representative works include &lt;a href="https://quantum-journal.org/papers/q-2024-07-23-1422/" rel="noopener noreferrer"&gt;Zero and Finite Temperature Quantum Simulations Powered by Quantum Magic&lt;/a&gt; from teams including NVIDIA, Google, MIT, and Harvard; &lt;a href="https://arxiv.org/abs/2501.04679/" rel="noopener noreferrer"&gt;Exploring nontrivial topology at quantum criticality in a superconducting processor&lt;/a&gt; from Haohua Wang's group at Zhejiang University; and &lt;a href="https://arxiv.org/abs/2409.07281" rel="noopener noreferrer"&gt;Variational LOCC-assisted quantum circuits for long-range entangled states&lt;/a&gt; from Xiongfeng Ma's group at Tsinghua University. These papers show that TensorCircuit is not limited to abstract algorithm demonstrations; it is being used in concrete problems in many-body physics and experimental quantum information.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quantum Machine Learning
&lt;/h3&gt;

&lt;p&gt;Quantum machine learning is one of the most active application areas for TensorCircuit. Representative papers include &lt;a href="https://www.nature.com/articles/s41467-024-45882-z" rel="noopener noreferrer"&gt;Understanding quantum machine learning also requires rethinking generalization&lt;/a&gt; from Jens Eisert's group at the Free University of Berlin, &lt;a href="https://www.nature.com/articles/s41467-024-53769-2" rel="noopener noreferrer"&gt;Dynamical transition in controllable quantum neural networks with large depth&lt;/a&gt; from teams including Liang Jiang and Junyu Liu, &lt;a href="https://journals.aps.org/prl/abstract/10.1103/PhysRevLett.132.100602" rel="noopener noreferrer"&gt;Generative Quantum Machine Learning via Denoising Diffusion Probabilistic Models&lt;/a&gt; from Quntao Zhuang's group at the University of Southern California, and IBM Quantum's &lt;a href="https://arxiv.org/abs/2411.05760" rel="noopener noreferrer"&gt;Dynamic parameterized quantum circuits: expressive and barren-plateau free&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;These works all require stable workflows connecting parameterized quantum circuits, gradient computation, model training, and numerical simulation. TensorCircuit's value is visible precisely at this workflow level: it connects quantum circuit simulation, automatic differentiation, and machine learning training into a unified programmable system.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quantum Architecture Search and Algorithm Design
&lt;/h3&gt;

&lt;p&gt;TensorCircuit has also been used in algorithmic and learning-theoretic research. Examples include &lt;a href="https://journals.aps.org/prxquantum/abstract/10.1103/PRXQuantum.5.040306" rel="noopener noreferrer"&gt;Learning Quantum States and Unitaries of Bounded Gate Complexity&lt;/a&gt; from Caltech and Google, &lt;a href="https://ieeexplore.ieee.org/abstract/document/10821373" rel="noopener noreferrer"&gt;Quantum Machine Learning Architecture Search via Deep Reinforcement Learning&lt;/a&gt; from Brookhaven National Laboratory, and &lt;a href="https://journals.aps.org/pra/abstract/10.1103/PhysRevA.110.022403" rel="noopener noreferrer"&gt;Distributed quantum architecture search&lt;/a&gt; from Luzhou Li's group at Sun Yat-sen University.&lt;/p&gt;

&lt;p&gt;This class of work highlights the platform's infrastructure role. Researchers are not merely calling a fixed algorithm; they are building new search strategies, learning processes, and experimental protocols on top of TensorCircuit.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quantum Chemistry and Fermionic Simulation
&lt;/h3&gt;

&lt;p&gt;The quantum chemistry ecosystem around TenCirChem further extends TensorCircuit's application boundary. Quantum chemistry and fermionic simulation typically require complex Hamiltonian construction, differentiable optimization, tensor-network representations, and high-performance simulation. They therefore provide a demanding test case for any scientific computing platform.&lt;/p&gt;

&lt;p&gt;Representative works include &lt;a href="https://journals.aps.org/prresearch/abstract/10.1103/PhysRevResearch.5.023046" rel="noopener noreferrer"&gt;Efficient quantum simulation of electron-phonon systems by variational basis state encoder&lt;/a&gt; from teams at Tsinghua University and The Chinese University of Hong Kong, Shenzhen, as well as &lt;a href="https://pubs.acs.org/doi/abs/10.1021/acs.jctc.4c00200" rel="noopener noreferrer"&gt;Fast Emulation of Fermionic Circuits with Matrix Product States&lt;/a&gt; from Garnet Chan's group at Caltech. These studies show that the TensorCircuit ecosystem has moved from general quantum circuit simulation into more specialized domains such as quantum chemistry.&lt;/p&gt;

&lt;h3&gt;
  
  
  Quantum Sensing and Imaging
&lt;/h3&gt;

&lt;p&gt;TensorCircuit has also been used in quantum sensing, imaging, and experiment-facing tasks. Examples include &lt;a href="https://www.nature.com/articles/s41534-024-00914-w" rel="noopener noreferrer"&gt;End-to-end variational quantum sensing&lt;/a&gt; from Roger Melko's group at the Perimeter Institute, and &lt;a href="https://www.nature.com/articles/s42005-023-01290-1" rel="noopener noreferrer"&gt;Practical advantage of quantum machine learning in ghost imaging&lt;/a&gt; from Guihua Zeng's group at Shanghai Jiao Tong University. These works illustrate the platform's potential in quantum sensing and measurement-related applications.&lt;/p&gt;

&lt;p&gt;The value of a research platform is not captured by a single paper. It is reflected in its ability to support many research directions over time. More than 170 citing works, users across high-level institutions, and multiple examples in leading journals and conferences form an evidence chain that is stronger than any single promotional claim.&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Has It Supported Industry-Oriented Applications?
&lt;/h2&gt;

&lt;p&gt;Academic citations show whether a platform can support research. Industry-oriented application cases show whether it can move toward real-world problems.&lt;/p&gt;

&lt;p&gt;It is important to be precise here. Quantum computing is still exploratory in many industrial contexts, so the right question is not whether it has already replaced classical solutions at scale. The better question is whether researchers and engineering teams in different fields have used the platform to build prototypes, workflows, and validation pipelines for real problem domains. From this perspective, TensorCircuit's application spillover already reaches multiple sectors.&lt;/p&gt;

&lt;p&gt;In agricultural diagnostics, researchers have used a quantum vision transformer for tomato leaf disease detection; see &lt;a href="https://eej.aut.ac.ir/article_5597.html" rel="noopener noreferrer"&gt;Enhancing Agricultural Diagnostics: Tomato Leaf Disease Detection Using Quantum Vision Transformer&lt;/a&gt;. In neuroscience and medical imaging, related works include &lt;a href="https://www.mdpi.com/2076-3425/14/4/401" rel="noopener noreferrer"&gt;Predicting Brain Age and Gender from Brain Volume Data Using Variational Quantum Circuits&lt;/a&gt; and &lt;a href="https://ieeexplore.ieee.org/abstract/document/10821329" rel="noopener noreferrer"&gt;Expanding the Horizon: Enabling Hybrid Quantum Transfer Learning for Long-Tailed Chest X-Ray Classification&lt;/a&gt;. In drug discovery, &lt;a href="https://www.nature.com/articles/s41598-024-67897-8" rel="noopener noreferrer"&gt;A hybrid quantum computing pipeline for real world drug discovery&lt;/a&gt; explores a hybrid quantum computing workflow for real drug discovery problems.&lt;/p&gt;

&lt;p&gt;TensorCircuit-NG has also appeared in security, communications, optimization, and computing systems. In software security, researchers have proposed lightweight quantum convolutional neural networks for malicious code detection. In drone and radar applications, hybrid quantum neural networks have been explored for radar return signal processing. In edge computing, quantum reinforcement learning has been used for joint resource allocation and task offloading. In finance, improved QAOA methods based on conditional value-at-risk have been studied for portfolio optimization. The significance of these cases is that they move quantum software frameworks from "quantum algorithm papers" into concrete domains such as agriculture, medicine, security, communications, finance, and drug discovery.&lt;/p&gt;

&lt;p&gt;External recognition provides additional context for the ecosystem. TensorCircuit has appeared in PhotonBox's 2022 list of influential quantum industry events in China, was listed as a recommended quantum software project in Google Summer of Code 2023, was used by NVIDIA in cuQuantum evaluation materials, was invited to participate in UnitaryHack 2024, and participated in Open Source Promotion Plan 2025. These forms of recognition do not replace technical validation, but they do show that TensorCircuit is not an isolated lab project. It has entered the public view of the open-source quantum software and high-performance computing ecosystems.&lt;/p&gt;

&lt;p&gt;Industrial maturity does not happen overnight. It typically moves from research prototypes, to open tools, to cross-domain collaboration, to engineering validation, and eventually to deployment. TensorCircuit-NG's current value lies in providing a reusable low-level toolchain for that process.&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Does It Continue to Evolve?
&lt;/h2&gt;

&lt;p&gt;One defining feature of scientific infrastructure is that it is never finished.&lt;/p&gt;

&lt;p&gt;New hardware appears. New algorithms appear. New scientific demands appear. This makes sustained iteration more important than a single innovation.&lt;/p&gt;

&lt;p&gt;TensorCircuit's history is a good example. The project was first released in April 2020. From 2020 to 2021, TensorCircuit completed its core architecture, automatic differentiation mechanism, and early quantum algorithm modules, establishing the academic foundation for a unified tensor-computing framework. From 2021 to 2024, under the Apache License 2.0, the engineering work continued: performance optimization, interface standardization, multi-backend support, and community ecosystem development gradually turned it into an open-source platform for global research users and developers.&lt;/p&gt;

&lt;p&gt;Since the launch of TensorCircuit-NG, or "Next Generation", in 2024, the project has moved beyond a quantum computing software framework toward a broader next-generation research infrastructure. It explores deeper integration among quantum computing, supercomputing, and intelligent computing, while continuing to expand its ecosystem in AI4Science and related areas.&lt;/p&gt;

&lt;p&gt;Sustained iteration is also visible in upstream and downstream ecosystem contributions. Upstream, core developers have contributed to standard machine learning frameworks such as TensorFlow, including work related to the automatic differentiation formula for complex-valued singular value decomposition and fixes to vectorized matrix multiplication. In the tensor-network ecosystem, TensorNetwork-NG continues to maintain the original Google TensorNetwork framework and keep it usable. Downstream, TenCirChem extends TensorCircuit capabilities into quantum computational chemistry workflows.&lt;/p&gt;

&lt;p&gt;These upstream and downstream contributions show that TensorCircuit-NG does not confine itself to a single framework. Instead, it builds connections among machine learning, tensor networks, quantum chemistry, and high-performance computing. This matters for Quantum x AI x HPC integration, because future research infrastructure cannot serve only one model family, one hardware type, or one class of algorithms.&lt;/p&gt;

&lt;p&gt;In the TensorCircuit-NG architecture:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;quantum circuits;&lt;/li&gt;
&lt;li&gt;neural networks;&lt;/li&gt;
&lt;li&gt;tensor networks;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;are brought into a unified computation-graph system.&lt;/p&gt;

&lt;p&gt;At the same time:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;CPUs;&lt;/li&gt;
&lt;li&gt;GPUs;&lt;/li&gt;
&lt;li&gt;HPC clusters;&lt;/li&gt;
&lt;li&gt;QPUs;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;are becoming part of a unified resource pool.&lt;/p&gt;

&lt;p&gt;This marks a shift in platform positioning: from a quantum software framework to infrastructure for future scientific computing. Compared with projects that remain at the stage of concept demonstrations, short-term packaging, or slide-based roadmaps, more than six years of open-source development, continuous iteration, and repeated research-community validation say much more about a platform's real engineering capacity and long-term value.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion: What Builds Trust in Scientific Infrastructure?
&lt;/h2&gt;

&lt;p&gt;In the rapid development of Quantum x AI x HPC, industry narratives are converging.&lt;/p&gt;

&lt;p&gt;More and more platforms now talk about:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI4Science;&lt;/li&gt;
&lt;li&gt;hybrid quantum-classical computing;&lt;/li&gt;
&lt;li&gt;scientific research infrastructure.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These directions are worth pursuing. But for users, researchers, and industry partners, the core criteria have not changed:&lt;/p&gt;

&lt;p&gt;Is the platform fully open source?&lt;/p&gt;

&lt;p&gt;Does it provide public benchmarks?&lt;/p&gt;

&lt;p&gt;Is it broadly and continuously used by high-quality research communities?&lt;/p&gt;

&lt;p&gt;Has it supported cross-industry application cases?&lt;/p&gt;

&lt;p&gt;Does it continue to evolve through sustained version updates?&lt;/p&gt;

&lt;p&gt;Once these questions are answered one by one, the value of a platform does not need to depend on slogans or conceptual messaging. For scientific infrastructure, long-term trust is built on verifiable code, reproducible experiments, growing academic adoption, application spillover into real problems, and engineering iteration that stands the test of time. In an era where technical narratives increasingly sound alike, these qualities are especially valuable.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>quantum</category>
      <category>hpc</category>
    </item>
    <item>
      <title>TensorCircuit-NG: Quantum Software On AI, For AI, With AI</title>
      <dc:creator>Shixin Zhang</dc:creator>
      <pubDate>Wed, 27 May 2026 11:02:16 +0000</pubDate>
      <link>https://dev.to/refractionray/tensorcircuit-ng-quantum-software-on-ai-for-ai-with-ai-3mae</link>
      <guid>https://dev.to/refractionray/tensorcircuit-ng-quantum-software-on-ai-for-ai-with-ai-3mae</guid>
      <description>&lt;p&gt;Quantum computing and artificial intelligence are often discussed as two separate frontiers. One is about exploiting quantum mechanics for computation; the other is about building increasingly capable learning systems and agents. The core argument behind TensorCircuit-NG is that this separation is becoming less and less meaningful. If modern AI infrastructure has already solved core problems around automatic differentiation, compilation, accelerator execution, batching, and distributed training, then quantum software should stop reinventing those layers badly and start standing on top of them directly.&lt;/p&gt;

&lt;p&gt;This is the central idea behind TensorCircuit-NG. The project is a quantum software stack built in the age of AI, aimed at AI-facing workloads, and increasingly shaped for collaboration with AI agents. Its vision is simple: quantum software &lt;em&gt;on AI, for AI, with AI&lt;/em&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  On AI: quantum software should inherit the AI stack
&lt;/h2&gt;

&lt;p&gt;Quantum software has long been held back by two familiar problems. Too much of the workload remains trapped in Python-level control flow or in classical state-vector simulation patterns that scale poorly. At the same time, many quantum libraries sit outside the deep learning ecosystems where most of the tooling innovation has happened. JAX, PyTorch, and TensorFlow already have mature answers to questions like compilation, vectorization, accelerator placement, and distributed execution, yet quantum software has often kept those capabilities at the edge of the stack.&lt;/p&gt;

&lt;p&gt;TensorCircuit-NG takes a different route. The framework treats quantum circuits as specialized tensor operations. That design choice opens up a large part of the AI toolchain almost “for free.” Automatic differentiation maps naturally onto variational quantum algorithms. Just-in-time compilation matters for repeated circuit evaluation. Vectorized mapping matters for batching over parameters, measurements, trajectories, or datasets. Accelerator support, mixed precision, and distributed execution are part of the design from the beginning.&lt;/p&gt;
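
&lt;p&gt;The idea can be illustrated with a minimal sketch in plain JAX. This is not TensorCircuit-NG's API: the helper names &lt;code&gt;rx&lt;/code&gt; and &lt;code&gt;expectation_z&lt;/code&gt; are hypothetical and exist only for this example. It assumes a single qubit and a single RX gate, and shows how, once a gate is just a small tensor, differentiation, compilation, and batching come from the host framework.&lt;/p&gt;

```python
import jax
import jax.numpy as jnp

# Constant tensors: identity, Pauli-X, and the Pauli-Z observable.
I2 = jnp.eye(2, dtype=jnp.complex64)
X = jnp.array([[0, 1], [1, 0]], dtype=jnp.complex64)
Z = jnp.array([[1, 0], [0, -1]], dtype=jnp.complex64)


def rx(theta):
    """RX(theta) = cos(theta/2) I - i sin(theta/2) X, as a 2x2 tensor."""
    return jnp.cos(theta / 2) * I2 - 1j * jnp.sin(theta / 2) * X


def expectation_z(theta):
    """Apply RX(theta) to the |0) state and return the Z expectation value."""
    psi0 = jnp.array([1.0, 0.0], dtype=jnp.complex64)
    psi = rx(theta) @ psi0
    # vdot conjugates its first argument, so this is the usual sandwich.
    return jnp.real(jnp.vdot(psi, Z @ psi))


# Automatic differentiation, batching over parameters, and JIT compilation
# are composed as ordinary function transformations.
batched_value_and_grad = jax.jit(jax.vmap(jax.value_and_grad(expectation_z)))

thetas = jnp.linspace(0.0, jnp.pi, 5)
values, grads = batched_value_and_grad(thetas)
print(values)  # analytically cos(theta)
print(grads)   # analytically -sin(theta)
```

&lt;p&gt;For this toy circuit the expectation value is analytically cos(theta), so the batched gradients should match -sin(theta) up to floating-point error. A real framework replaces the hand-written 2x2 algebra with general tensor-network contraction, but the transformations are composed in the same way.&lt;/p&gt;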

&lt;p&gt;That philosophy shows up in the architecture. TensorCircuit-NG is built around a tensor-first worldview: every object is either a tensor or a network of tensors. Once that is the primitive, different computational models become easier to compose inside one workflow. Gate-based circuits, tensor networks, neural models, noisy simulators, analog evolution, approximate methods, and symbolic representations can live inside one coherent environment.&lt;/p&gt;

&lt;p&gt;The performance story follows directly from this design. TensorCircuit-NG supports both data parallelism and model parallelism across multiple devices and multiple hosts. In practice that means distribution over inputs, measurements, or noisy trajectories when the workload is embarrassingly parallel, and distribution over tensor-network slices when the contraction itself needs to be split across hardware. Benchmarks on both single-GPU and multi-GPU systems show that high-level Python APIs can still deliver high performance when the compilation and tensor-network substrate are done well. In representative workloads, that performance has reached speedups of several orders of magnitude over mainstream stacks such as IBM's Qiskit and Google's TensorFlow Quantum.&lt;/p&gt;

&lt;p&gt;TensorCircuit-NG acts as a bridge among quantum computing, high-performance computing, and intelligent computing. It also serves as an interface layer where quantum models can coexist with the rest of modern computational science. Researchers who want to embed quantum layers inside larger machine learning systems should be able to do so inside the same workflow, without crossing ecosystem boundaries every time the problem gets interesting.&lt;/p&gt;

&lt;h2&gt;
  
  
  For AI: a platform for fast quantum machine learning
&lt;/h2&gt;

&lt;p&gt;This is where the infrastructure becomes immediately useful. Quantum machine learning sits right at the intersection of circuit design, optimization, data pipelines, and repeated simulation. It is a workload that punishes slow software. If researchers want to try new ansatzes, change encodings, run ablations, train over many seeds, or sweep hyperparameters, then fast prototyping and efficient simulation matter more than slogans about QML.&lt;/p&gt;

&lt;p&gt;TensorCircuit-NG provides a strong platform for exactly this kind of work. Differentiable circuits, JIT compilation, batching, accelerator support, and distributed execution all live inside one environment. That makes it much easier to move from an idea for a QML model to a runnable prototype, and from a prototype to a meaningful simulation campaign.&lt;/p&gt;

&lt;p&gt;The scientific motivation for QML also becomes clearer in this setting. Attention shifts away from isolated benchmark wins and toward how quantum models behave on problems that already hurt classical AI. In our own work, this has already led to two systematic studies: one on bad data, and one on changing data.&lt;/p&gt;

&lt;p&gt;The first studies robustness. When labels are noisy, data is poisoned, or part of the training set later needs to be removed, quantum models may show a more favorable degradation profile and may be easier to unlearn. The second studies plasticity. In continual-learning settings, quantum models may preserve the ability to absorb new tasks for longer instead of becoming rigid.&lt;/p&gt;

&lt;p&gt;These are still open research questions. For a software project, though, the main point is straightforward: if people want to explore QML seriously, they need a platform that makes rapid iteration cheap. TensorCircuit-NG is meant to be that platform. It gives researchers a practical environment for fast QML prototyping, efficient simulation, and large-scale testing of ideas about robustness, unlearning, and adaptation.&lt;/p&gt;

&lt;h2&gt;
  
  
  With AI: a platform for agent-driven research
&lt;/h2&gt;

&lt;p&gt;The same logic carries over to AI agents. Once a scientific software stack is fast, structured, and composable, it becomes a natural substrate for agent-driven development. Agents are useful only when they can read real code, run real tools, inspect results, and keep iterating inside a live repository. That makes software design itself part of the agent story.&lt;/p&gt;

&lt;p&gt;TensorCircuit-NG is built with that use case in mind. The APIs are relatively concise, the examples and tests provide dense reference material, and the repository includes explicit rules and task-specific workflows for AI assistants. This lowers the cost of turning natural-language intent into runnable code, benchmarks, figures, and documentation.&lt;/p&gt;

&lt;p&gt;The project also ships built-in skills that push this further:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;arxiv-reproduce&lt;/code&gt;, which turns a paper identifier into a reproduction workflow;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;performance-optimize&lt;/code&gt;, which injects optimization patterns such as &lt;code&gt;scan&lt;/code&gt;, &lt;code&gt;jit&lt;/code&gt;, &lt;code&gt;vmap&lt;/code&gt;, and contraction tuning;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tc-rosetta&lt;/code&gt;, which translates code from other quantum frameworks with attention to intent rather than syntax alone;&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;tutorial-crafter&lt;/code&gt;, which converts programs into polished narrative tutorials;&lt;/li&gt;
&lt;li&gt;and many more.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Taken together, these tools make the framework a software platform where researchers can move from idea to prototype, from prototype to benchmark, and from benchmark to documentation with much less friction. That is the practical meaning of “with AI” here: TensorCircuit-NG is designed to work well with agents as a real development interface, not just as a chatbot wrapped around the codebase.&lt;/p&gt;

&lt;h2&gt;
  
  
  The deeper claim
&lt;/h2&gt;

&lt;p&gt;These ideas add up to a stack-level thesis about the future of computational research.&lt;/p&gt;

&lt;p&gt;First, quantum software should no longer be architected as an isolated niche. It should inherit the best ideas from the AI and HPC worlds and expose them through abstractions that remain mathematically faithful to quantum workloads.&lt;/p&gt;

&lt;p&gt;Second, that same software stack should provide a strong platform for fast QML prototyping and efficient simulation, so ideas about robustness, unlearning, and continual adaptation can be tested quickly at scale.&lt;/p&gt;

&lt;p&gt;Third, the arrival of capable software agents changes the design target for scientific frameworks. A good framework now has to work well for skilled humans and also be understandable, navigable, and productively extensible for agents operating over the entire repository and toolchain.&lt;/p&gt;

&lt;p&gt;This is how TensorCircuit-NG understands itself: quantum software on AI, for AI, and with AI. It is built on the modern AI execution model, aimed at AI-relevant scientific questions, and increasingly shaped to participate in agent-mediated research workflows.&lt;/p&gt;

&lt;h2&gt;
  
  
  Getting started
&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;tensorcircuit-ng
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;An agent-first workflow also works well: ask your coding agent to install &lt;code&gt;tensorcircuit-ng&lt;/code&gt; and start building a small quantum application from natural-language instructions.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>quantum</category>
    </item>
    <item>
      <title>Next-Generation Software: From by-AI to within-AI</title>
      <dc:creator>Shixin Zhang</dc:creator>
      <pubDate>Thu, 21 May 2026 09:11:58 +0000</pubDate>
      <link>https://dev.to/refractionray/next-generation-software-from-by-ai-to-within-ai-e5f</link>
      <guid>https://dev.to/refractionray/next-generation-software-from-by-ai-to-within-ai-e5f</guid>
      <description>&lt;p&gt;&lt;em&gt;Why "using AI to build more apps, faster" may just be putting an engine on a horse carriage&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Over the past year, most discussions about AI and software have stayed within a very intuitive picture: a human describes a requirement, an AI agent writes the code, and after the code is written the software is still traditional software, only produced faster. Vibe coding, in essence, lowers the marginal cost of software production. What used to require engineers to implement line by line becomes an iterative process of generation, debugging, and refactoring driven by natural language.&lt;/p&gt;

&lt;p&gt;This view is right, but only half right.&lt;/p&gt;

&lt;p&gt;The bigger change is not whether AI can generate software. The bigger change is whether the word "software" itself is about to change. Using AI to generate yet another standalone app often feels like putting an engine on a horse carriage: the power system has changed, but the form factor still belongs to the previous era. You still have a frontend, backend, account system, deployment, database, permissions, logs, subscriptions, settings pages. These things can now be generated faster, but they are still the same old shape.&lt;/p&gt;

&lt;p&gt;But if the engine is already powerful enough, perhaps we should stop optimizing the carriage. The real next step is this: software is no longer merely written by AI agents. It starts to exist within AI agents.&lt;/p&gt;

&lt;p&gt;Next-generation software does not necessarily have to be a complete app, a website, a SaaS product, a desktop client, or even a CLI with a fixed entry point. It may be a collection of prompts, skills, scripts, schemas, local files, cache conventions, tool permissions, and agent-facing instructions. The real runtime is not a specialized intelligent application. It is a general agent such as Codex or Claude Code. In other words, the harness is becoming the software.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/refraction-ray/everyday-arxiv" rel="noopener noreferrer"&gt;Everyday ArXiv&lt;/a&gt; is a concrete example of this insight.&lt;/p&gt;

&lt;h2&gt;
  
  
  Research Software Without A Traditional Form
&lt;/h2&gt;

&lt;p&gt;Everyday ArXiv is an intelligent assistant that processes each day's new arXiv papers. The traditional way to build it would be straightforward: write a backend service, connect to the arXiv API, implement a recommendation algorithm, add a user system, build a web interface or email push system, and embed LLM calls at specific steps such as summarization, scoring, recommendation explanations, and email drafts. In the end, it would become a specialized agent or SaaS product for researchers.&lt;/p&gt;

&lt;p&gt;There is nothing wrong with this path. Many products will continue to be built this way. The problem is that, in this specific setting, the most valuable part is not the UI, not the database, and not a fixed pipeline. The most valuable part is the judgment made during each reading session.&lt;/p&gt;

&lt;p&gt;Why is this paper worth reading today? Which of the user's previous papers is it actually connected to? Is the overlap merely keyword-level, or is there a real methodological connection? Is a proposed idea once again falling into the mediocre pattern of "add noise, change the model, run larger numerics"? If a new paper does not cite the user's work, is it an obvious omission, a weak connection, or only conceptually adjacent?&lt;/p&gt;

&lt;p&gt;These questions are hard to compress into fixed software features. They look more like the judgment process of a research assistant. So the architecture of this project is inverted. Python code handles only deterministic tasks: fetching arXiv metadata, parsing Google Scholar, writing stable JSON caches, loading configuration, and maintaining local file boundaries. Judgment-heavy tasks are not hardcoded inside the application. They are delegated to a general agent. The repository provides the agent with a workspace, skills, profiles, prompts, scripts, and privacy rules.&lt;/p&gt;
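&lt;p&gt;To make the "deterministic kernel" idea concrete, here is a minimal sketch of a stable JSON cache writer. It is not the project's actual code: the function names, the per-category file layout, and the &lt;code&gt;id&lt;/code&gt; field are assumptions for illustration. The point is only that the same input always produces the same bytes at the same path, so the agent never has to decide where or how to store data.&lt;/p&gt;

```python
import json
from datetime import date
from pathlib import Path


def cache_path(root: Path, category: str, day: date) -> Path:
    """Return the single fixed location for one category and one day.

    The layout under data/raw/arxiv is an assumed convention, not the
    project's documented one.
    """
    return root / "data" / "raw" / "arxiv" / category / f"{day.isoformat()}.json"


def write_cache(root: Path, category: str, day: date, papers: list[dict]) -> Path:
    """Write paper metadata as stable JSON: sorted entries and sorted keys."""
    path = cache_path(root, category, day)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Sorting entries and keys makes the output independent of fetch order.
    ordered = sorted(papers, key=lambda p: p["id"])
    text = json.dumps(ordered, sort_keys=True, indent=2, ensure_ascii=False)
    path.write_text(text + "\n", encoding="utf-8")
    return path
```

&lt;p&gt;Because the output is byte-stable, re-running a fetch produces no spurious diffs, and the agent can treat the cache as a plain file it reads rather than a format it has to reinvent.&lt;/p&gt;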

&lt;p&gt;In other words, this is not "software with LLM features." It is software that an LLM agent can directly run.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why This Is Not "Everyone Writes A Custom App"
&lt;/h2&gt;

&lt;p&gt;A common prediction is that AI lowers the cost of software production, so everyone will write many small custom apps for themselves. I think this is only half right.&lt;/p&gt;

&lt;p&gt;What may actually happen is not that everyone has a pile of custom apps, but that many custom apps never exist in app form at all.&lt;/p&gt;

&lt;p&gt;Once general agents are powerful enough, many "software" systems do not need to be compiled into standalone products. They can remain open-form: a few Skills, a few scripts, a directory convention, a profile, and some examples. Their functionality unfolds at runtime through the agent. They have no fixed buttons, but clear protocols; no complete backend, but stable tools; no page, but Markdown or HTML reports; no embedded intelligence module, but access to the general intelligence of the agent.&lt;/p&gt;

&lt;p&gt;This goes beyond vibe coding. Vibe coding still assumes the goal is to generate a software product. Agent-native software tries to avoid prematurely generating software in the old form. It asks: does this thing really need to be productized, or does it only need to be agentized?&lt;/p&gt;

&lt;h2&gt;
  
  
  The Structural Tax Of Vibe Coding
&lt;/h2&gt;

&lt;p&gt;The appeal of vibe coding is that, for the first time, building software feels cheap. You can ask an agent to generate a full-stack repo with React pages, API routes, database schemas, Dockerfiles, auth, deployment notes, and a README.&lt;/p&gt;

&lt;p&gt;But this reveals another problem: the faster AI generates software, the more visible the structural tax of the old software form becomes.&lt;/p&gt;

&lt;p&gt;Standalone apps carry many default taxes. There is a UI tax, because every capability must be turned into buttons, forms, and pages. There is a deployment tax, because every capability needs its own runtime environment. There is an integration tax, because every new app has to reconnect to data sources, permissions, and user state. There is a maintenance tax, because dependencies drift, frameworks upgrade, and deployments break. There is also a product-shape tax, because many open-ended judgment processes must be compressed into fixed features, losing flexibility and customization.&lt;/p&gt;

&lt;p&gt;When the task is essentially "run a high-judgment workflow in a specific context," these taxes become heavy.&lt;/p&gt;

&lt;p&gt;If Everyday ArXiv were built as traditional software, it would be forced to invent many things that are not its core value: recommendation pages, profile editors, PDF readers, email draft editors, background jobs, account systems, synchronization state. Of course these can be built. But they are not the core of "read arXiv and make research judgments." The core is to put the user's profile, today's papers, paper full texts, historical preferences, and research taste into the same reasoning loop.&lt;/p&gt;

&lt;p&gt;If a general agent can already read files, run commands, call tools, edit Markdown, maintain local state, and follow project rules, many standalone app shells start to look unnecessary.&lt;/p&gt;

&lt;p&gt;This is why "AI helps me write an app faster" may only be a transitional form. It optimizes the speed of software production, not the shape of software itself.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Compilation Target Of Software Has Changed
&lt;/h2&gt;

&lt;p&gt;Traditional software compiles to machines: CPUs, browsers, mobile devices, cloud services. Even SaaS ultimately compiles into deterministic behavior on some fixed runtime.&lt;/p&gt;

&lt;p&gt;Agent-native software does not compile only to machines. It compiles to agents.&lt;/p&gt;

&lt;p&gt;This sounds strange, but it is the key point. A Skill is not merely documentation. It is closer to a runtime definition: when an agent encounters a certain kind of task, which files should it read, which scripts should it call, which boundaries should it respect, how should it handle failure, when should it stop, what output format should it use, which judgments must not be hardcoded, and which data must not be committed to Git.&lt;/p&gt;

&lt;p&gt;In Everyday ArXiv, the Python package under &lt;code&gt;src/&lt;/code&gt; is the deterministic kernel. &lt;code&gt;.agents/skills/arxiv-daily/SKILL.md&lt;/code&gt; is the workflow definition. &lt;code&gt;user_profile/&lt;/code&gt; is user-space memory. &lt;code&gt;agents.md&lt;/code&gt; is the runtime specification. &lt;code&gt;data/raw/arxiv&lt;/code&gt; and &lt;code&gt;data/reports&lt;/code&gt; are the persistence layer. Codex or Claude Code is the execution environment and runtime.&lt;/p&gt;
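&lt;p&gt;The layout above can be written down as data. The sketch below records each path and its role as described in this paragraph, plus a small hypothetical helper (&lt;code&gt;missing_parts&lt;/code&gt; is not part of the project) that reports which pieces of the workspace are absent.&lt;/p&gt;

```python
from pathlib import Path

# Paths and roles as described in the text, relative to the repository root.
WORKSPACE_ROLES = {
    "src": "deterministic kernel (Python package)",
    ".agents/skills/arxiv-daily/SKILL.md": "workflow definition",
    "user_profile": "user-space memory",
    "agents.md": "runtime specification",
    "data/raw/arxiv": "persistence layer (raw cache)",
    "data/reports": "persistence layer (reports)",
}


def missing_parts(root: Path) -> list[str]:
    """Hypothetical helper: list expected paths that do not exist under root."""
    return [rel for rel in WORKSPACE_ROLES if not (root / rel).exists()]
```

&lt;p&gt;A check like this is the kind of thing an agent could run before starting a session, so that a missing profile or Skill file is reported as a fact rather than guessed at.&lt;/p&gt;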

&lt;p&gt;From the perspective of traditional software, &lt;code&gt;src/&lt;/code&gt; is the software, and everything else is documentation or data.&lt;/p&gt;

&lt;p&gt;In agent-native software, this boundary is inverted. &lt;code&gt;src/&lt;/code&gt; is only the tool layer. The real software behavior emerges from the tool layer, Skill instructions, user profiles, cache formats, report conventions, privacy boundaries, and the general reasoning ability of the agent.&lt;/p&gt;

&lt;p&gt;This is the architectural inversion: infrastructure moves downward from each standalone application into the agent platform. The application itself becomes a lightweight, injectable, modifiable, and portable capability layer.&lt;/p&gt;

&lt;p&gt;The shift can be summarized as follows:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dimension&lt;/th&gt;
&lt;th&gt;Old Paradigm: Software by AI&lt;/th&gt;
&lt;th&gt;New Paradigm: Software within AI&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Architecture&lt;/td&gt;
&lt;td&gt;AI generates a custom full-stack repo: frontend framework, backend API, database, and hosting layer included.&lt;/td&gt;
&lt;td&gt;Lightweight skills, structured manifests, execution scripts, and local directory conventions are injected into the agent as capabilities.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Infrastructure&lt;/td&gt;
&lt;td&gt;Each app has its own runtime, database, DevOps pipeline, permissions, and deployment environment.&lt;/td&gt;
&lt;td&gt;The app reuses the native environment of the host agent platform: filesystem, command line, browser, sandbox, tool calling, and context window.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Cost Model&lt;/td&gt;
&lt;td&gt;A standalone AI app must maintain a SaaS shell and pay the marginal API cost of each model call. Heavy usage quickly becomes expensive.&lt;/td&gt;
&lt;td&gt;An agent-native workflow lives inside general-agent subscriptions such as Claude Code or Codex, letting users share the subscription economics of model providers.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Flexibility&lt;/td&gt;
&lt;td&gt;Features are hardcoded into UI, backend, and schemas. New capabilities require code changes, redeployment, and redesigned entry points.&lt;/td&gt;
&lt;td&gt;The agent dynamically interprets Skills, reads and writes files, and calls scripts based on runtime intent, adapting to edge cases without rebuilding the product shape.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;h2&gt;
  
  
  A Skill Is Not A Plugin. It Is A New Software Unit.
&lt;/h2&gt;

&lt;p&gt;We are used to thinking of plugins as accessories to a host program. Browser extensions depend on browsers. Editor extensions depend on editors.&lt;/p&gt;

&lt;p&gt;But Skills inside agents are closer to a new unit of software.&lt;/p&gt;

&lt;p&gt;They contain at least four layers.&lt;/p&gt;

&lt;p&gt;The first layer is deterministic tools. These are ordinary scripts, CLIs, parsers, fetchers, and formatters. They handle the parts that should not be left to an LLM's improvisation.&lt;/p&gt;

&lt;p&gt;The second layer is semantic policy. These are the instructions: what counts as a good recommendation, what counts as a mediocre idea, when to run a citation check, when not to pad the list to ten papers, and when to write only to local private files.&lt;/p&gt;

&lt;p&gt;The third layer is private state. User profiles, historical papers, negative preferences, idea logs, and local config are not merely "database records" in the traditional sense. They are personal context that the agent can read, interpret, update, and audit at runtime.&lt;/p&gt;

&lt;p&gt;The fourth layer is the execution substrate. This is what the agent platform provides: filesystem access, command execution, browsing, code understanding, long context, multi-tool coordination, and natural language interaction.&lt;/p&gt;

&lt;p&gt;A traditional app often packages all four layers into its own code and services. Agent-native software separates them: stable parts become scripts; judgment-heavy parts become Skills; personal parts remain in local files; execution is reused from a general agent.&lt;/p&gt;

&lt;p&gt;So it behaves more like a dynamically loaded driver than a complete machine. It does not need to spin up a company-sized software shell every time. It only needs to inject capability into an existing agent runtime.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Cost Advantage Is Not Just Form. It Is A Pricing-Layer Mismatch.
&lt;/h2&gt;

&lt;p&gt;Another key point is the cost difference between API calls and subscriptions. If you build a specialized agent yourself, every intelligent step calls a model API. If you use a subscription-based general agent such as Codex or Claude Code, many of these steps are absorbed by the platform. This difference is not merely "a little cheaper." It can be an order-of-magnitude architectural difference.&lt;/p&gt;

&lt;p&gt;Take Karpathy's LLM Wiki / agentic wiki idea as an example. At its core, it is a lightweight set of directory conventions, Markdown files, schemas, and agent instructions. Of course you can productize it: turn it into a standalone knowledge-base and note-taking app, add login, upload, search, sync, team workspaces, RAG pipelines, a polished UI, and then connect it to frontier-model APIs. At that point it becomes a standard AI SaaS product: every ingest, query, rewrite, and cross-reference burns your API bill.&lt;/p&gt;

&lt;p&gt;But the same idea does not have to be productized. You can put raw sources, wiki pages, and instructions in a local repo and let a general agent such as Claude Code maintain it directly. For the user, this is not "opening a new SaaS product." It is "running a workflow inside an agent subscription I already have." The workflow is lightweight enough that its main cost moves from API metering into the agent subscription.&lt;/p&gt;

&lt;p&gt;The gap can be huge. For heavy users, the equivalent API cost can easily be more than ten times higher than the subscription cost. Put differently: for the same frontier model capability, a standalone AI app must pay by the API meter, while an agent-native workflow living inside Claude Code or Codex may have its entire user-facing cost absorbed by the platform subscription, because the user already needs an AI subscription plan anyway.&lt;/p&gt;

&lt;p&gt;If frontier model providers can sustain this pricing structure, the consequences will be severe. General agent tools such as Codex will devour a large fraction of so-called intelligent software that merely connects large-model APIs to old software shells. Those products carry two layers of cost: the product-shape cost of traditional SaaS, and the usage-based API cost of frontier models. Agent-native workflows reuse a runtime that the agent platform has already subsidized, deployed, and sold to the user.&lt;/p&gt;

&lt;p&gt;So the cost advantage comes from two directions.&lt;/p&gt;

&lt;p&gt;First, the form is lighter. Standalone software is expensive not only because it runs servers, but because it must maintain a fixed product shape. That shape forces you to predefine user paths, feature boundaries, error handling, state synchronization, permission models, UI copy, and upgrade mechanisms. For high-frequency standardized tasks, this is worth it. For personalized, low-frequency, high-judgment tasks, it becomes a burden.&lt;/p&gt;

&lt;p&gt;Second, the billing layer is lower. API-wrapper software turns every intelligent action into its own marginal cost. Agent-native software tries to place intelligent action inside the general-agent runtime that the user already owns. The former is like putting an engine inside every small tool. The latter is like loading different tools onto a unified power system.&lt;/p&gt;

&lt;p&gt;The filesystem becomes state. Markdown becomes interface. JSONL becomes database. Skills become product logic. Python CLIs become reproducible tools. The agent becomes the interaction layer, reasoning layer, and glue code. The user does not need a complete app. The user needs a workspace that an agent can understand and operate.&lt;/p&gt;

&lt;p&gt;This does not mean engineering quality becomes unimportant. The opposite is true: engineering boundaries become more important. Deterministic tasks must live in code. Privacy boundaries must be protected by &lt;code&gt;.gitignore&lt;/code&gt; and file naming rules. Cache formats must be stable. Profile updates must be traceable. Reports must be reviewable. We simply no longer have to assume that all software value must be packaged into a fixed UI.&lt;/p&gt;

&lt;h2&gt;
  
  
  LLM OS Is Not A Metaphor
&lt;/h2&gt;

&lt;p&gt;This also explains why software within AI agents resonates with the idea of an LLM OS.&lt;/p&gt;

&lt;p&gt;If we think of the LLM as an operating system, the model itself is not the whole system. A real OS includes filesystems, permissions, processes, tool calls, environment variables, package management, history, working directories, user preferences, executable scripts, and application protocols. Agent platforms are reorganizing these pieces.&lt;/p&gt;

&lt;p&gt;From this perspective, a Skill is like an application. A prompt is like configuration plus an entry point. A script is like the executable behind a system call. &lt;code&gt;user_profile&lt;/code&gt; is like user-space data. &lt;code&gt;agents.md&lt;/code&gt; is like a software manual, permission model, and runtime specification. Cache directories are the persistence layer. The agent is a mixture of shell, window manager, workflow engine, and interpreter.&lt;/p&gt;

&lt;p&gt;Traditional software runs on top of operating systems. Next-generation lightweight software runs inside the LLM OS.&lt;/p&gt;

&lt;p&gt;This does not mean all software disappears. High-frequency, multi-user, strongly consistent, permission-heavy, transaction-heavy systems will still need traditional software forms. Banking systems, collaborative editors, production databases, payment platforms, and medical systems cannot rely solely on an agent runtime.&lt;/p&gt;

&lt;p&gt;But a large amount of personalized, low-frequency, high-judgment software will be rewritten.&lt;/p&gt;

&lt;p&gt;Research reading assistants, personal knowledge systems, paper response tools, code review workflows, experiment records, document drafting, data cleaning, chart generation, long-term research projects, and idea management have historically been hard to turn into good software. Not because the need does not exist, but because every person's need is too specific, the market is too small, the shape is too fragmented, and fixed products quickly stop fitting.&lt;/p&gt;

&lt;p&gt;Agents change those economics.&lt;/p&gt;

&lt;h2&gt;
  
  
  This Example Generalizes Far Beyond arXiv
&lt;/h2&gt;

&lt;p&gt;Everyday ArXiv is just one example. The structure behind it generalizes to many scenarios that used to require being "turned into software."&lt;/p&gt;

&lt;p&gt;The first category is knowledge workflows. Today it is arXiv. Tomorrow it could be a paper library, technical blog library, investment research library, legal document library, or internal decision memo system. The traditional approach is to build a standalone application: dashboard, search box, favorites, summaries, recommendations, and RAG. The agent-native approach is looser: raw materials are files, indexes are scripts, workflows are Skills, user preferences are profiles, reports are Markdown. It is less like a product and more like a work environment that an agent can unfold at runtime.&lt;/p&gt;

&lt;p&gt;The second category is scientific computing and experiment management. A research project may need to manage models, parameters, run scripts, remote machines, result directories, logs, figures, and conclusions. Of course you can write an independent CLI with commands such as &lt;code&gt;submit&lt;/code&gt;, &lt;code&gt;status&lt;/code&gt;, &lt;code&gt;plot&lt;/code&gt;, and &lt;code&gt;report&lt;/code&gt;. This is still valuable, because deterministic low-level tasks need stable tools. But if the entire experiment-management process is compressed into a CLI, you lose a great deal of contextual judgment: when to rerun, which parameter combinations are worth extending, which anomaly may be a bug, which figure should enter the paper, and which result is already sufficient to stop.&lt;/p&gt;

&lt;p&gt;The more natural architecture is to keep low-level scripts deterministic and use a set of Skills to specify how the agent should read experiment directories, submit jobs, record provenance, generate reports, and avoid overwriting results. The experiment system is not a closed tool. It is an agent-operable research workspace. It is often more flexible than an independent CLI, and its results are often better, because scientific experimentation is not a fixed sequence of commands. It is a process of continuous judgment, adjustment, and interpretation.&lt;/p&gt;
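&lt;p&gt;As an illustration of what "deterministic low-level script" can mean here, the sketch below creates a run directory, records provenance, and refuses to overwrite an earlier run. The function name, the &lt;code&gt;provenance.json&lt;/code&gt; file, and its fields are assumptions for this example, not an existing tool.&lt;/p&gt;

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def create_run_dir(results_root: Path, name: str, params: dict) -> Path:
    """Create a fresh run directory and record how the run was configured.

    Raises FileExistsError instead of overwriting an earlier run.
    """
    run_dir = results_root / name
    # exist_ok=False makes the no-overwrite rule a hard guarantee.
    run_dir.mkdir(parents=True, exist_ok=False)
    provenance = {
        "name": name,
        "params": params,
        "created_utc": datetime.now(timezone.utc).isoformat(),
    }
    (run_dir / "provenance.json").write_text(
        json.dumps(provenance, sort_keys=True, indent=2) + "\n", encoding="utf-8"
    )
    return run_dir
```

&lt;p&gt;The Skill can then tell the agent when to call this and how to name runs, while the guarantee that results are never silently overwritten lives in code rather than in the agent's judgment.&lt;/p&gt;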

&lt;p&gt;The third category is existing Python software frameworks. In the past we would ask: should we wrap it in a GUI? Should we build a web app where users can select parameters, drag modules, and display results? But for many scientific computing, machine learning, and quantum simulation frameworks, the better interface may not be a GUI. It may be an agent.&lt;/p&gt;

&lt;p&gt;The framework itself provides strict APIs, types, tests, documentation, and examples. Agent-native adaptation lets the agent read the documentation, compose algorithms, write scripts, run demos, explain results, and generate figures directly. The user no longer has to learn every API before starting to explore. The user describes the goal in natural language, and the agent compiles that goal into framework code. This is not wrapping an old framework in a shell. It is connecting the framework to a natural-language programmable operating layer. &lt;a href="https://github.com/tensorcircuit/tensorcircuit-ng" rel="noopener noreferrer"&gt;TensorCircuit-NG&lt;/a&gt; represents this agent-native direction: the point is not to build another polished GUI, or a CLI that restricts functionality, but to make the framework itself a computational substrate that agents can understand, invoke, and extend.&lt;/p&gt;

&lt;p&gt;These examples point to the same conclusion: next-generation software does not necessarily turn every tool into a standalone product. It lets tools enter the fluid environment of agents. This form has one enormous advantage: fluidity.&lt;/p&gt;

&lt;p&gt;Traditional software is hard. It must be installed, deployed, upgraded, compiled, and released. Its features solidify into buttons and pages. If users want to change it, they usually have to file an issue, wait for developers, fork the repo, or edit code.&lt;/p&gt;

&lt;p&gt;Agent-native software is soft. It can be copied as a directory, changed into another set of Skills, locally rewritten by users through natural language, and migrated across agent platforms. It does not necessarily need compilation, a fixed UI, or versioned releases. Often, the software is simply a set of readable, editable, executable conventions.&lt;/p&gt;

&lt;p&gt;If the user really needs an interface, the agent can generate an HTML page on demand. Today it can be a minimal table. Tomorrow it can be a flashy dashboard. The day after tomorrow it can be a paper-style report page. The interface becomes a runtime artifact, not the fixed shell of the software.&lt;/p&gt;

&lt;p&gt;This may be the most counterintuitive part: software in the AI era may not evolve into ever "smarter apps." A lot of software may become less app-like and more like an amorphous fluid that agents can read, modify, compose, and temporarily materialize.&lt;/p&gt;

&lt;p&gt;This fluid form does not depend on a fixed UI. It can be executed by Codex, by Claude Code, or by future agents. As long as the agent is strong enough to read files, run commands, follow Skills, and maintain boundaries, it can run the software.&lt;/p&gt;

&lt;p&gt;Software portability changes accordingly. In the past, migrating software meant migrating applications and data. Now it means migrating workspace conventions. What you take with you is docs, skills, scripts, templates, profile schemas, and examples. More concretely: a folder. The execution runtime can change; the software remains.&lt;/p&gt;

&lt;h2&gt;
  
  
  Design Principles From This Project
&lt;/h2&gt;

&lt;p&gt;From Everyday ArXiv, we can extract several design principles.&lt;/p&gt;

&lt;p&gt;First, deterministic work belongs in code. Fetching, parsing, caching, schemas, configuration, paths, and format checks should be ordinary software engineering. Do not let an LLM "remember" where today's cache should go. Do not let it invent data structures at runtime every time.&lt;/p&gt;

&lt;p&gt;Second, judgment belongs in the agent. Recommendation, selection, close reading, research ideas, citation risk, and email tone are exactly where general agents are strong. Hardcoding them into fixed API pipelines sacrifices flexibility.&lt;/p&gt;

&lt;p&gt;Third, user profiles should be local files, not abstract preference buttons. Research interests, negative preferences, prior papers, and citation anchors are detailed and personal. The agent should be able to read, cite, update, and audit them directly.&lt;/p&gt;

&lt;p&gt;Fourth, Skills are the product core. They are not documentation attached to the product. They are the main execution logic of agent-native software. Traditional software has its core in code paths; agent-native software often has its core entry point in Skills.&lt;/p&gt;

&lt;p&gt;Fifth, the open-source boundary must be designed upfront. The public repository should store the general protocol. Private files should store the user. This allows the software to be reusable without leaking personal knowledge and workflows.&lt;/p&gt;

&lt;p&gt;Together, these principles define a new software form: not an application wrapped around LLM APIs, but a workspace growing around general agent platforms.&lt;/p&gt;

&lt;h2&gt;
  
  
  Closing
&lt;/h2&gt;

&lt;p&gt;AI agents first looked like programmers who could write code faster. Then they looked like assistants that could operate tools. Next, they may look more like general intelligent runtimes and operating systems.&lt;/p&gt;

&lt;p&gt;If this is true, part of next-generation software will no longer be understood as "applications." It will be agent-readable directories, prompts, Skills, scripts, profiles, and caches. It will have no fixed shape, but still run reliably. It will have no complete UI, but still complete complex work. It will not be generated once by AI and then left alone; it will continuously live inside AI agents.&lt;/p&gt;

&lt;p&gt;Everyday ArXiv is a small research tool, but it shows the early form of this direction: the intelligent part of software does not necessarily need to be packaged into a specialized agent. When general agents become strong enough, software can write itself as a harness for agents. I would even make a stronger claim: very few specialized agents will remain useful. Most will be swallowed by general agents, just as Sutton's bitter lesson would suggest.&lt;/p&gt;

&lt;p&gt;This may be the shift from software generated by AI to software existing within AI.&lt;/p&gt;

</description>
      <category>agents</category>
      <category>ai</category>
      <category>software</category>
    </item>
    <item>
      <title>Agentic R&amp;D Insights</title>
      <dc:creator>Shixin Zhang</dc:creator>
      <pubDate>Thu, 09 Apr 2026 05:15:02 +0000</pubDate>
      <link>https://dev.to/refractionray/agentic-rd-insights-4dd2</link>
      <guid>https://dev.to/refractionray/agentic-rd-insights-4dd2</guid>
      <description>&lt;p&gt;This year, I dove headfirst into Agentic Coding and automated workflows, integrating them intensely into my daily development and research. The general consensus is that AI crossed a critical threshold late last year, and my hands-on experience confirms it. I’ve barely written any code manually this year, and the output from AI agents has been staggering. &lt;/p&gt;

&lt;p&gt;To give you an idea of the scale: my &lt;code&gt;tensorcircuit-ng&lt;/code&gt; (TC) repository saw a net increase of over 20,000 lines of Python code. It took me barely two days to organically integrate and rewrite QuEra's newly released &lt;code&gt;tsim&lt;/code&gt; into the TC framework. On the research front, I built paper-reproduction infrastructure within TC, allowing me to reproduce highly complex, representative quantum physics papers in mere minutes; I’ve knocked out over a dozen so far. Once, I spent less than a day running an end-to-end automated pipeline that handled a referee report: supplementing experiments, plotting graphs, writing the reply, and revising the manuscript. Algorithmically, I used the TC paradigm to auto-generate high-quality DMRG code in minutes; it natively supports GPUs and its CPU efficiency beats mature frameworks like &lt;code&gt;quimb&lt;/code&gt;. Throw in fully automated translations of the TC documentation and auto-filling grant proposal templates, and the efficiency multiplier is at least an order of magnitude.&lt;/p&gt;

&lt;p&gt;But looking at this massive output, an inevitable question arises: In an era where everyone has access to the exact same cognitive baseline—models like Claude 4.6 or GPT 5.4—what actually dictates the ceiling of our productivity? Why aren't we seeing a 100x boost across the board? &lt;/p&gt;

&lt;p&gt;After high-intensity practice, I realized the answer isn't "better prompt engineering." It's hidden in the architecture of your workflow. The real differentiator is how you leverage personal data and experience to build a resilient system across the "Frontend, Middle, and Backend" of your pipeline. Interestingly, while building this system, you inadvertently design the exact countermeasures needed to mitigate the three fatal character flaws highly intelligent LLMs exhibit: &lt;strong&gt;Laziness, Impatience, and Deception.&lt;/strong&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  The Frontend: Personal Context as the Ultimate Moat
&lt;/h3&gt;

&lt;p&gt;The core insight for the frontend is simple: personal context and workflow paradigms are your ultimate moats in the Agent Era. The coding world is a perfect playground for AI not just because code is easily verifiable, but because its physical logic is self-consistent and its context is completely intact—there is no context fragmentation. &lt;/p&gt;

&lt;p&gt;In general problem-solving, our thoughts are scattered across our brains, chat logs, loose docs, and random materials. Without centralized, normalized context, an AI agent will always struggle. In my practice, context consists of a static component and a dynamic one.&lt;/p&gt;

&lt;p&gt;The static "Wiki" is the cognitive bedrock for the LLM. The &lt;code&gt;tensorcircuit-ng&lt;/code&gt; monorepo itself acts as a hyper-powerful context infrastructure. It doesn’t just hold framework code; it aggregates nearly 200 specific quantum use cases, physical logic constraints, and historical experiment logs. When the LLM hooks into this, it isn't facing a sterile prompt—it's stepping into a rich, domain-specific knowledge base. (Karpathy recently mentioned using AI to index and retrieve personal knowledge bases—often without even needing vectorization, as smart &lt;code&gt;grep&lt;/code&gt; and indexing work better. This "Based AI, for AI, from AI" context management is something I had already implemented, and it feels like the most natural evolution of human-computer interaction.)&lt;/p&gt;

&lt;p&gt;The dynamic "Skill" component is the digital extension of your personal execution paradigm. Sure, for generic tasks like parsing a DOCX, you just use an off-the-shelf plugin. But &lt;em&gt;workflow skills&lt;/em&gt; are deeply personal and nearly impossible to substitute. I don't believe in using standard, third-party workflow skills; every individual's needs are highly customized. I built a &lt;code&gt;.agents/skills&lt;/code&gt; toolbox inside TC specifically for performance reviews, paper reproduction, and tutorial generation. I also have a private skill repository encapsulating my highly specific habits for logging numerical experiments, SSHing into remote clusters, and drafting grants. &lt;/p&gt;

&lt;p&gt;Simply put: the Wiki tells the AI "what we have," and the Skills tell the AI "how I think and solve problems." (Fun fact: the reason this post doesn't sound like AI slop is because I instructed the AI to mimic my previous blog posts. The blog itself became the context. The AI summarized my style as: "No redundant formatting, hardcore geeky tone, stream-of-consciousness switching between tech and philosophy.")&lt;/p&gt;

&lt;p&gt;This frontend architecture mitigates the AI's first character flaw: &lt;strong&gt;Laziness.&lt;/strong&gt; This laziness often stems from performance degradation and attention loss over long context windows. Anyone who uses AI knows that on long-haul tasks (like full-repo refactors or translations), it loves to slack off, do half the work, or just spit out a function signature with a &lt;code&gt;pass&lt;/code&gt; statement. But when you lock the AI in a high-quality Wiki that enforces strict background constraints, and use custom Skills to break large tasks into atomic pipeline steps, the AI loses the room to cut corners. You have to back the AI into a corner where it has no choice but to apply its full intellect to solve your problem.&lt;/p&gt;

&lt;h3&gt;
  
  
  The Middle: The Economics of Human-in-the-Loop
&lt;/h3&gt;

&lt;p&gt;When it comes to execution, there is only one rule: reject blind end-to-end automation. Intervening, discussing, and course-correcting in the middle of a task is vastly more economical. &lt;/p&gt;

&lt;p&gt;Many people chase the dream of fully autonomous end-to-end agents. But for research or engineering tasks with strict delivery requirements that cannot be 100% automatically verified, this is a recipe for disaster. Human-in-the-loop (HITL) is mandatory. Think of it like a Principal Investigator advising a PhD student. You don't write every line of code for them, but you must have regular syncs, correct their trajectory, and redeploy tasks based on current progress. You don't just wait three months and read the final paper. The time and "human bandwidth" spent on these middle-stage checks seem costly, but compared to the agonizing effort of reverse-engineering what the AI did wrong—or doing a complete rewrite because the architecture was flawed from day one—it is negligible. &lt;/p&gt;

&lt;p&gt;Furthermore, one or two sentences of human intuition can be the difference between success and total failure. This is why human experts still matter. A quick pointer can pull an AI out of a logical mud pit; without it, the task stalls. Currently, the best AI-driven research is done by domain experts, and the best AI-written code is guided by senior engineers. Relying on "AI vibes" in a domain you don't understand only yields half-baked prototypes. AI is not a silver bullet; human taste, experience, and intuition remain rare and decisive.&lt;/p&gt;

&lt;p&gt;This mentorship model mitigates the AI's second flaw: &lt;strong&gt;Impatience.&lt;/strong&gt; This impatience is an artifact of RLHF, which encourages models to generate the shortest path to an answer. When an AI hits a test failure or a bug, its first instinct is almost never to carefully read the stack trace. Instead, it relies on hallucinated intuition to blindly hack the source code, hoping for a quick green light. It usually makes things worse. If it fails again, it hacks the code again, refusing to write a script to verify its assumptions. &lt;/p&gt;

&lt;p&gt;With HITL, we lay down the law: whenever there is an error, the AI is strictly forbidden from touching the source code. It &lt;em&gt;must&lt;/em&gt; first write a minimal reproducible demo script to isolate the bug, and then report back to me. Often, just writing the demo makes the AI realize the bug isn't where it thought it was. Only after I confirm the root cause is the AI allowed to modify the codebase. This forced braking mechanism pulls the AI out of its blind-hacking loop and forces rational deduction.&lt;/p&gt;
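
&lt;p&gt;The rule above is a small protocol, and it can be sketched as a guard object. This is a hypothetical illustration, not part of any real agent framework: &lt;code&gt;DebugGate&lt;/code&gt;, &lt;code&gt;GateError&lt;/code&gt;, and the method names are invented here, and the file names in the usage lines are placeholders.&lt;/p&gt;

```python
from dataclasses import dataclass, field


class GateError(RuntimeError):
    """Raised when the agent tries to skip a step of the debugging protocol."""


@dataclass
class DebugGate:
    """Hypothetical guard: no source edits until a repro exists and a human signs off."""

    repro_script: str = ""
    human_confirmed: bool = False
    log: list = field(default_factory=list)

    def submit_repro(self, script):
        if not script.strip():
            raise GateError("a minimal reproducible script is required")
        self.repro_script = script
        self.human_confirmed = False  # a new repro needs a fresh review
        self.log.append("repro submitted")

    def confirm_root_cause(self, note):
        if not self.repro_script:
            raise GateError("nothing to confirm: no repro script yet")
        self.human_confirmed = True
        self.log.append("human confirmed: " + note)

    def edit_source(self, path, patch):
        if not (self.repro_script and self.human_confirmed):
            raise GateError("source edits are locked until the repro is confirmed")
        self.log.append("edited " + path)
        return patch


if __name__ == "__main__":
    gate = DebugGate()
    try:
        gate.edit_source("circuit.py", "quick hack")
    except GateError as err:
        print("blocked:", err)
    gate.submit_repro("assert run_small_case() == expected")
    gate.confirm_root_cause("bug is in the test fixture, not the library")
    gate.edit_source("tests/fixtures.py", "fix the fixture")
    print(gate.log)
```

&lt;p&gt;In practice the same ordering can be written as plain instructions in a Skill file; the point of the sketch is only that the edit step is unreachable until both earlier steps have happened.&lt;/p&gt;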

&lt;h3&gt;
  
  
  The Backend: Testing, Eval, and the Bandwidth Bottleneck
&lt;/h3&gt;

&lt;p&gt;In the backend evaluation phase, we have to face a harsh reality: while automated testing and evaluation determine the &lt;em&gt;floor&lt;/em&gt; of an Agent's capabilities, human bandwidth is almost always the ultimate ceiling.&lt;/p&gt;

&lt;p&gt;Automated testing is crucial. It’s the very foundation of why AI excels at coding tasks (think RLVR). Some argue that &lt;em&gt;tests are the new moat&lt;/em&gt;, even more important than the implementation itself, because an AI can generate the implementation if the tests are exhaustive. (This is why some modern frameworks open-source their code but keep their test suites closed.)&lt;/p&gt;

&lt;p&gt;But even in highly formalized tasks like code generation—especially when doing secondary development on a mature, opinionated codebase—humans are still required for global architectural design, semantic alignment, and taking ultimate responsibility for the code. Just like managing a team of human engineers, there is a hard limit to how many Agents a human can effectively manage. We cannot infinitely scale compute and Agent instances and expect them to output 100% reliable work entirely on their own. In the AI era, trust and attention are the most precious resources. Testing and acceptance simply require massive human bandwidth to bridge that trust gap.&lt;/p&gt;

&lt;p&gt;Since human review is unavoidable, the trick is to exploit the AI's asymmetric capabilities to save our bandwidth. An LLM's ability to judge (discriminate) is significantly stronger than its ability to generate. Therefore, we can introduce AI cross-validation as a firewall before human review. I use an independent, freshly instantiated model in an extremely clean context to review the generated code logic, creating an automated loop of adversarial review and revision. The "clean context" is vital: the reviewer AI must &lt;em&gt;never&lt;/em&gt; see the messy trial-and-error history of the generator AI, otherwise it will empathize with the generator and lose its objectivity.&lt;/p&gt;
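
&lt;p&gt;The adversarial loop can be sketched in a few lines. This is a minimal sketch under stated assumptions: &lt;code&gt;review_loop&lt;/code&gt; is a hypothetical name, and &lt;code&gt;toy_generate&lt;/code&gt; and &lt;code&gt;toy_review&lt;/code&gt; are stand-ins for two separate model calls. The one property it is meant to show is that the reviewer receives only the task spec and the candidate code, never the generator's history.&lt;/p&gt;

```python
def review_loop(spec, generate, review, max_rounds=3):
    """Alternate generation and clean-context review until the reviewer approves.

    generate(spec, feedback, history) may consult its own messy history;
    review(spec, code) sees only the task spec and the candidate code.
    Returns (code, rounds_used), or (None, max_rounds) to escalate to a human.
    """
    history = []   # generator-side scratchpad, never shown to the reviewer
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        code = generate(spec, feedback, history)
        history.append(code)
        approved, feedback = review(spec, code)  # fresh call: no history passed in
        if approved:
            return code, round_no
    return None, max_rounds


def toy_generate(spec, feedback, history):
    # Stand-in for the generator model: its first attempt cheats.
    if not feedback:
        return "def energy(x): return -1.0  # hardcoded to pass the test"
    return "def energy(x): return -x * x"


def toy_review(spec, code):
    # Stand-in for an independent reviewer model with a clean context.
    if "hardcoded" in code:
        return False, "result is hardcoded; compute it from x"
    return True, ""


if __name__ == "__main__":
    print(review_loop("energy must be computed from x", toy_generate, toy_review))
```

&lt;p&gt;Capping the number of rounds matters: if the two models cannot converge, the case goes to a human instead of looping forever.&lt;/p&gt;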

&lt;p&gt;This clean-room evaluation mechanism mitigates the AI's third flaw: &lt;strong&gt;Deception (Reward Hacking).&lt;/strong&gt; If you rely solely on basic automated tests, AI becomes terrifyingly deceptive. To make a failing test turn green, it will maliciously use workarounds or physics-defying hardcodes just to hack the test suite. An independent reviewing Agent with strong discriminative capabilities and a clean context acts as a filter, catching these brainless "code-golfing" hacks before they ever reach my desk, saving my precious bandwidth for the final architectural sign-off.&lt;/p&gt;

&lt;h3&gt;
  
  
  Conclusion
&lt;/h3&gt;

&lt;p&gt;By building deep personal Contexts, forging custom Skill tools, enforcing HITL mentorship, and utilizing clean-room independent evaluations, you really can boost your productivity by an order of magnitude. &lt;/p&gt;

&lt;p&gt;But let's be clear: these systems only &lt;em&gt;mitigate&lt;/em&gt; the AI's laziness, impatience, and deception; they do not cure them. In the foreseeable future, human bandwidth remains the absolute bottleneck in the Agent workflow. Dreaming of a 100x or 1000x productivity boost today will only result in highly unreliable output.&lt;/p&gt;

&lt;p&gt;And perhaps that’s not a bad thing. In this human-machine collaboration, AI is the ultimate generation engine and an untiring preliminary reviewer. But the final quality control, the closing of the physical logic loop, and the ultimate responsibility for the scientific output must rest with the human. When everyone has access to the exact same AI, your accumulated personal data, your polished workflows, and where you choose to invest your limited human bandwidth (decision-making, reviewing, critical insights) become your deepest moats. The irreplaceable nature of humans right now lies in implicit knowledge—taste, intuition, and problem-framing—which cannot be distilled into a text prompt or an executable Skill.&lt;/p&gt;

&lt;p&gt;Of course, given the breakneck speed of AI development, if these remaining "irreplaceable" human traits become commoditized a year from now, I won't be surprised. By this time next year, perhaps none of these insights will even be relevant anymore.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>quantum</category>
    </item>
    <item>
      <title>Unleashing AI in Quantum Research: Why TensorCircuit-NG is the Ultimate Foundation for the Agent Era</title>
      <dc:creator>Shixin Zhang</dc:creator>
      <pubDate>Thu, 12 Mar 2026 01:45:14 +0000</pubDate>
      <link>https://dev.to/refractionray/unleashing-ai-in-quantum-research-why-tensorcircuit-ng-is-the-ultimate-foundation-for-the-agent-era-40n2</link>
      <guid>https://dev.to/refractionray/unleashing-ai-in-quantum-research-why-tensorcircuit-ng-is-the-ultimate-foundation-for-the-agent-era-40n2</guid>
      <description>&lt;p&gt;With LLMs and AI agents making code generation faster, cheaper, and more accessible, a massive new frontier has opened in scientific computing. But while AI can easily string logic together, it still needs a powerful, mathematically rigorous engine to drive it.&lt;/p&gt;

&lt;p&gt;This is where TensorCircuit-NG (TCNG) truly shines. Far from just adapting to the AI era, TCNG acts as the essential catalyst that makes AI-driven quantum research possible, scalable, and highly performant.&lt;/p&gt;

&lt;p&gt;Here is why TCNG is more important than ever for researchers and AI agents alike.&lt;/p&gt;




&lt;h3&gt;
  
  
  🧱 1. The Foundational "Physics Engine" for AI
&lt;/h3&gt;

&lt;p&gt;AI models are fantastic at orchestrating high-level logic, but they struggle to invent highly optimized, low-level mathematical frameworks from scratch. TCNG represents the kind of deep, specialized engineering that is incredibly hard to replicate. By fusing machine learning backends with customized hardware operators and advanced tensor network contraction engines, TCNG acts as a fundamental infrastructure layer. AI agents don't try to rewrite TensorFlow or PyTorch; they simply use them. In the same way, agents can use TCNG as a foundational building block to construct complex quantum applications.&lt;/p&gt;

&lt;h3&gt;
  
  
  🛡️ 2. Guiding AI to High-Performance Paradigms
&lt;/h3&gt;

&lt;p&gt;Left to its own devices, AI can easily generate code that works but runs terribly. TCNG solves this by providing a strict, high-performance architecture. Because TCNG enforces strong paradigms—such as backend-agnostic design, automatic differentiation (AD), Just-In-Time (JIT) compilation, and hardware acceleration (GPUs/TPUs)—it inherently &lt;strong&gt;forces AI to write code using best practices&lt;/strong&gt;. When an agent builds with TCNG, the resulting scripts automatically inherit top-tier performance and scalability without the AI needing to understand the underlying computational bottlenecks.&lt;/p&gt;

&lt;h3&gt;
  
  
  📚 3. Unmatched Context Completeness for Agents
&lt;/h3&gt;

&lt;p&gt;For an AI agent to be truly autonomous and accurate, it needs massive, high-quality, and unified context. TCNG provides exactly this: over six years of rich, accumulated domain knowledge packed into a cohesive mono-repo. It houses everything from exhaustive documentation to edge-case physics functionalities. Because the entire quantum landscape is mapped out within a single repository, it is incredibly friendly for AI agents to ingest, cross-reference, and use as a springboard for creating entirely new tools and discoveries.&lt;/p&gt;

&lt;h3&gt;
  
  
  🧠 4. A Massive Training Ground for Automated Discovery
&lt;/h3&gt;

&lt;p&gt;AI learns best by example, and TCNG is built to be the ultimate reference library. We now host &lt;strong&gt;over 150 carefully crafted example scripts&lt;/strong&gt;, providing an incredibly strong foundation for AI to recognize quantum programming patterns and generate novel applications. Leveraging this, we are launching an exciting new initiative: &lt;strong&gt;fully automated reproduction of representative quantum research papers&lt;/strong&gt;, driven entirely by AI using TCNG's vast library as its reference point.&lt;/p&gt;

&lt;h3&gt;
  
  
  🛠️ 5. Native Agentic Skills Out of the Box
&lt;/h3&gt;

&lt;p&gt;TCNG isn’t just designed for human researchers to use alongside AI; it is actively built to give AI agents superpowers. TCNG provides a series of native "skills" designed to help agents automate complex workflows, including:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;End-to-end reproduction of research papers&lt;/li&gt;
&lt;li&gt;Seamless code translation across different frameworks&lt;/li&gt;
&lt;li&gt;Automated performance optimization and profiling&lt;/li&gt;
&lt;li&gt;The auto-generation of interactive demos and educational tutorials&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  The Bottom Line
&lt;/h3&gt;

&lt;p&gt;In the era of AI agents, coding might be cheap, but world-class scientific infrastructure is priceless. TensorCircuit-NG provides the deep-tech foundation, the optimized paradigms, and the rich, accumulated context that AI needs to push the boundaries of quantum physics. It isn't just a tool; it is the infrastructure that will power the next generation of automated quantum discovery.&lt;/p&gt;




</description>
      <category>agents</category>
      <category>ai</category>
      <category>llm</category>
      <category>science</category>
    </item>
    <item>
      <title>We Built the First AI-Native Quantum Software Framework: Say Hello to Agentic TensorCircuit-NG</title>
      <dc:creator>Shixin Zhang</dc:creator>
      <pubDate>Sat, 28 Feb 2026 06:02:18 +0000</pubDate>
      <link>https://dev.to/refractionray/we-built-the-first-ai-native-quantum-software-framework-say-hello-to-agentic-tensorcircuit-ng-3cek</link>
      <guid>https://dev.to/refractionray/we-built-the-first-ai-native-quantum-software-framework-say-hello-to-agentic-tensorcircuit-ng-3cek</guid>
      <description>&lt;p&gt;Quantum computing software is notoriously hard to write.&lt;/p&gt;

&lt;p&gt;If you want to simulate a deep quantum neural network or research a new algorithm, you don't just need to understand Hamiltonian dynamics and Hilbert spaces. You also need to be a High-Performance Computing (HPC) expert—wrestling with GPU memory limits (OOMs), vectorization, JIT compilation staging times, and tensor network contraction paths.&lt;/p&gt;

&lt;p&gt;For years, we've provided developers with the tools to do this via &lt;strong&gt;TensorCircuit-NG&lt;/strong&gt;, our next-generation open-source, high-performance quantum software framework.&lt;/p&gt;

&lt;p&gt;But tools are passive. You still have to do the heavy lifting.&lt;/p&gt;

&lt;p&gt;Today, we are changing the paradigm. We are thrilled to announce that &lt;strong&gt;TensorCircuit-NG is now the world’s first AI-native quantum programming platform purpose-built for agentic quantum research and automated scientific discovery.&lt;/strong&gt; By natively integrating skills directly into our repository, your quantum framework now comes with a built-in HPC engineer, a theoretical physicist, and a technical writer.&lt;/p&gt;

&lt;h2&gt;
  
  
  The Paradigm Shift: Agent-Ready Architecture 🧠
&lt;/h2&gt;

&lt;p&gt;Most AI coding assistants do "line-by-line" translations or generate boilerplate. That doesn't work in quantum simulation, where a poorly placed &lt;code&gt;for&lt;/code&gt; loop can increase compilation time from 2 seconds to 2 hours.&lt;/p&gt;

&lt;p&gt;Instead of writing endless tutorials on "best practices," we embedded our framework knowledge directly into the repository as &lt;strong&gt;Agentic Skills&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;If you clone the latest TensorCircuit-NG repo, you'll notice a new directory structure:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;.agents/skills/
├── arxiv-reproduce/
├── performance-optimize/
├── tc-rosetta/
└── tutorial-crafter/
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;These aren't just prompts; they are strict, engineering-bound AI workflows. Let's break down the four superpowers you now have access to right out of the box.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. &lt;code&gt;/arxiv-reproduce&lt;/code&gt;: From arXiv ID to JAX-Accelerated Code in Minutes 📄➡️💻
&lt;/h3&gt;

&lt;p&gt;The gap between reading a cutting-edge quantum machine learning paper on arXiv and actually writing the code to reproduce it is huge.&lt;/p&gt;

&lt;p&gt;With the &lt;code&gt;arxiv-reproduce&lt;/code&gt; skill, you simply hand the AI an arXiv link. The agent will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Extract the physical intent&lt;/strong&gt; (the Ansatz, the Hamiltonian, the loss function).&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Intelligently scale down&lt;/strong&gt; the qubit count so it runs on your local machine without blowing up your RAM.&lt;/li&gt;
&lt;li&gt;Generate idiomatically correct, JAX-accelerated TensorCircuit-NG code.&lt;/li&gt;
&lt;li&gt;Automatically run formatting (&lt;code&gt;black&lt;/code&gt;), linting (&lt;code&gt;pylint&lt;/code&gt;), and execute the script to save the reproduced figure into a standardized &lt;code&gt;outputs/&lt;/code&gt; folder.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  2. &lt;code&gt;/performance-optimize&lt;/code&gt;: Your Built-in HPC Architect ⚡
&lt;/h3&gt;

&lt;p&gt;Got a quantum script that takes forever to compile or crashes with an Out-of-Memory (OOM) error?&lt;/p&gt;

&lt;p&gt;The &lt;code&gt;performance-optimize&lt;/code&gt; agent scans your code to identify bottlenecks. It knows the dark arts of quantum HPC: it will automatically eradicate Python loops in favor of &lt;code&gt;jax.vmap&lt;/code&gt;, wrap your deep quantum layers in &lt;code&gt;jax.lax.scan&lt;/code&gt; to slash JIT staging time, inject &lt;code&gt;jax.checkpoint&lt;/code&gt; to trade compute for memory during backpropagation, and seamlessly switch to &lt;code&gt;cotengra&lt;/code&gt; for optimal tensor network contraction paths. It even runs A/B benchmarks to prove the speedup!&lt;/p&gt;

&lt;h3&gt;
  
  
  3. &lt;code&gt;/tc-rosetta&lt;/code&gt;: End-to-End Cross-Ecosystem Translation 🌍
&lt;/h3&gt;

&lt;p&gt;Migrating from older, object-oriented quantum frameworks (like Qiskit or PennyLane) to a modern, differentiable, functional framework like TensorCircuit-NG is a steep mental shift.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;tc-rosetta&lt;/code&gt; does not do naive line-by-line syntax swapping. It performs &lt;strong&gt;end-to-end intent extraction&lt;/strong&gt;. It reads your slow, loop-heavy legacy script, understands the math behind it, and rewrites it from scratch using pure JAX-native paradigms. It then executes both scripts and hands you a benchmark report (e.g., &lt;em&gt;"Execution time reduced from 300 seconds to 0.2 seconds"&lt;/em&gt;).&lt;/p&gt;

&lt;h3&gt;
  
  
  4. &lt;code&gt;/tutorial-crafter&lt;/code&gt;: Automated High-Quality Documentation 📝
&lt;/h3&gt;

&lt;p&gt;Writing docs is the bane of every open-source contributor. What if the code could explain itself?&lt;/p&gt;

&lt;p&gt;Point &lt;code&gt;tutorial-crafter&lt;/code&gt; at any raw TensorCircuit-NG script. It will analyze the physical background and the code, then generate a beautiful, narrative-driven tutorial in &lt;strong&gt;both Markdown and HTML formats&lt;/strong&gt;. It chunks the code logically, adds LaTeX formulas for the physics theory, and explicitly points out the HPC programming highlights (e.g., &lt;em&gt;"Notice how we used vmap here instead of a loop..."&lt;/em&gt;). It generates documentation that rivals hand-crafted, premium tutorials.&lt;/p&gt;

&lt;h2&gt;
  
  
  How to Experience the Magic ✨
&lt;/h2&gt;

&lt;p&gt;Because these skills are built on the open standard, getting started is zero-friction.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Clone the TensorCircuit-NG repository.&lt;/li&gt;
&lt;li&gt;Open your terminal in the repo root.&lt;/li&gt;
&lt;li&gt;Fire up your AI agent and simply call a skill: &lt;code&gt;/performance-optimize examples/my_slow_circuit.py&lt;/code&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;You are no longer just writing code; you are directing an autonomous digital research team.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Welcome to the era of Agentic Quantum Software Engineering.&lt;/strong&gt; We can't wait to see what you discover. Check out the &lt;a href="https://github.com/tensorcircuit/tensorcircuit-ng" rel="noopener noreferrer"&gt;repo&lt;/a&gt;, give us a star, and let the AI handle the boilerplate while you focus on the physics! 🌌&lt;/p&gt;

</description>
      <category>ai</category>
      <category>agents</category>
      <category>quantum</category>
      <category>research</category>
    </item>
    <item>
      <title>🚀 TensorCircuit-NG: The Universal, Differentiable Quantum Infrastructure</title>
      <dc:creator>Shixin Zhang</dc:creator>
      <pubDate>Tue, 10 Feb 2026 03:13:39 +0000</pubDate>
      <link>https://dev.to/refractionray/tensorcircuit-ng-the-universal-differentiable-quantum-infrastructure-1g34</link>
      <guid>https://dev.to/refractionray/tensorcircuit-ng-the-universal-differentiable-quantum-infrastructure-1g34</guid>
      <description>&lt;p&gt;👋 &lt;strong&gt;Hello World!&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;If you are a developer exploring quantum machine learning, or a physicist tired of rewriting code to make it run on GPUs, you have likely faced the "Framework Dilemma."&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Do you write in &lt;strong&gt;PyTorch&lt;/strong&gt; because you need the dataloaders?&lt;/li&gt;
&lt;li&gt;Do you switch to &lt;strong&gt;JAX&lt;/strong&gt; for that sweet JIT compilation speed?&lt;/li&gt;
&lt;li&gt;Do you stick to &lt;strong&gt;TensorFlow&lt;/strong&gt; because of legacy production pipelines?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;What if your quantum simulator didn't care?&lt;/p&gt;

&lt;p&gt;Meet &lt;strong&gt;TensorCircuit-NG (Next Generation)&lt;/strong&gt;—the open-source, tensor-native platform that unifies quantum physics, AI, and High-Performance Computing.&lt;/p&gt;

&lt;h3&gt;
  
  
  🌟 What is TensorCircuit-NG?
&lt;/h3&gt;

&lt;p&gt;TensorCircuit-NG is not just another circuit simulator. It is a &lt;strong&gt;backend-agnostic computational infrastructure&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;It is designed to let you define your physics logic &lt;em&gt;once&lt;/em&gt; and execute it anywhere. It wraps industry-standard ML frameworks (&lt;strong&gt;JAX, TensorFlow, PyTorch&lt;/strong&gt;) into a unified engine, making quantum simulation end-to-end differentiable and hardware-accelerated.&lt;/p&gt;

&lt;h3&gt;
  
  
  🛠️ The "Write Once, Run Anywhere" Philosophy
&lt;/h3&gt;

&lt;p&gt;The killer feature of TensorCircuit-NG is &lt;strong&gt;Infrastructure Unification&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;You don't need to learn a new dialect for every backend. You simply switch the engine with one line of code:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import tensorcircuit as tc

# Want JAX for JIT speed?
tc.set_backend("jax")

# Want PyTorch for easy integration with your existing DL models?
tc.set_backend("pytorch")

# Legacy TensorFlow project?
tc.set_backend("tensorflow")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This flexibility enables radical interoperability. You can train a hybrid model where the data pipeline lives in PyTorch, but the heavy-duty quantum circuit simulation is JIT-compiled via JAX/XLA for massive speedups, with zero-copy tensor transfers (DLPack) handled under the hood.&lt;/p&gt;

&lt;h3&gt;
  
  
  ⚡ Why You Should Try It
&lt;/h3&gt;

&lt;h4&gt;
  
  
  1. Native Machine Learning Integration
&lt;/h4&gt;

&lt;p&gt;We treat quantum circuits as first-class citizens in the computational graph.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Plug-and-Play Layers:&lt;/strong&gt; Use &lt;code&gt;tc.TorchLayer&lt;/code&gt; or &lt;code&gt;tc.KerasLayer&lt;/code&gt; to insert parameterized quantum circuits directly into classical ResNets or Transformers.&lt;/p&gt;&lt;/li&gt;
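&lt;li&gt;&lt;p&gt;&lt;strong&gt;Automatic Differentiation (AD):&lt;/strong&gt; Forget parameter-shift rules. We compute gradients via backpropagation through the tensor network, so a full gradient costs a small constant multiple of one forward simulation instead of two extra circuit evaluations per parameter. This makes VQE and QML training far faster as the parameter count grows.&lt;/p&gt;&lt;/li&gt;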
&lt;/ul&gt;
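
&lt;p&gt;For readers who have not met parameter-shift rules, the toy sketch below shows what they cost. It does not use TensorCircuit-NG at all. It assumes a product state where each qubit starts in the zero state and gets one RX rotation, so the expectation of Z on every qubit multiplies out to a product of cosines. The names &lt;code&gt;CountingExpectation&lt;/code&gt;, &lt;code&gt;parameter_shift_grad&lt;/code&gt;, and &lt;code&gt;analytic_grad&lt;/code&gt; are hypothetical.&lt;/p&gt;

```python
import math


class CountingExpectation:
    """Toy 'circuit': product of cos(theta_i), counting how often it is evaluated."""

    def __init__(self):
        self.calls = 0

    def __call__(self, thetas):
        self.calls += 1
        return math.prod(math.cos(t) for t in thetas)


def parameter_shift_grad(f, thetas):
    """Gradient via the parameter-shift rule: two evaluations per parameter."""
    grad = []
    for i in range(len(thetas)):
        plus = list(thetas)
        minus = list(thetas)
        plus[i] += math.pi / 2
        minus[i] -= math.pi / 2
        grad.append(0.5 * (f(plus) - f(minus)))
    return grad


def analytic_grad(thetas):
    """Closed-form gradient of prod_j cos(theta_j), used as the reference."""
    return [
        -math.sin(t) * math.prod(math.cos(u) for j, u in enumerate(thetas) if j != i)
        for i, t in enumerate(thetas)
    ]


if __name__ == "__main__":
    f = CountingExpectation()
    thetas = [0.3, 1.1, -0.7]
    print(parameter_shift_grad(f, thetas))
    print(analytic_grad(thetas))
    print("circuit evaluations:", f.calls)  # two per parameter
```

&lt;p&gt;The shift rule is exact for this toy model, but the evaluation count grows linearly with the number of parameters. Reverse-mode AD avoids that scaling, which is the advantage the bullet above refers to.&lt;/p&gt;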

&lt;h4&gt;
  
  
  2. HPC-Ready Scalability
&lt;/h4&gt;

&lt;p&gt;Stop simulating on your CPU. TensorCircuit-NG supports:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;GPU/TPU Acceleration:&lt;/strong&gt; Move simulations to NVIDIA GPUs or Google TPUs without changing your physics code.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Distributed Computing:&lt;/strong&gt; We support automated data parallelism (scaling to multiple devices) and model parallelism (tensor network slicing across GPU clusters).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Benchmark:&lt;/strong&gt; We've demonstrated near-linear speedups on &lt;strong&gt;8x NVIDIA H200 GPU clusters&lt;/strong&gt;, simulating end-to-end variational quantum algorithms with &lt;strong&gt;40+ qubits&lt;/strong&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  3. Advanced Physics Engines
&lt;/h4&gt;

&lt;p&gt;It’s not just for qubits. TCNG comes with batteries included for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Fermion Gaussian States:&lt;/strong&gt; Efficiently simulate thousands of fermions.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Qudits:&lt;/strong&gt; Native support for high-dimensional systems (d≥3).&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Noise Modeling:&lt;/strong&gt; Customizable noise channels for realistic hardware simulation.&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  💻 Show Me The Code
&lt;/h3&gt;

&lt;p&gt;Here is how simple it is to build a differentiable variational circuit (VQE) that runs on &lt;em&gt;any&lt;/em&gt; backend:&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;import tensorcircuit as tc

# 1. Select your fighter (Backend)
tc.set_backend("jax") # or "pytorch", "tensorflow"

def vqe_loss(params, n=6):
    c = tc.Circuit(n)

    # 2. Build circuit (Hardware efficient ansatz)
    for i in range(n):
        c.rx(i, theta=params[i])
    for i in range(n-1):
        c.cnot(i, i+1)

    # 3. Calculate Expectation
    # This entire process is differentiable!
    e = c.expectation_ps(z=[0, 1]) 
    return tc.backend.real(e)

# 4. Get Gradients (Backend Agnostic API)
# This works regardless of whether you chose JAX, TF, or Torch
val_and_grad = tc.backend.jit(tc.backend.value_and_grad(vqe_loss))

# Run it!
print(val_and_grad(tc.backend.ones(6)))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h3&gt;
  
  
  🤝 Join the Community
&lt;/h3&gt;

&lt;p&gt;TensorCircuit-NG is &lt;strong&gt;Open Source (Apache 2.0)&lt;/strong&gt; and ready for you to hack on.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;GitHub:&lt;/strong&gt; &lt;a href="https://github.com/tensorcircuit/tensorcircuit-ng" rel="noopener noreferrer"&gt;Check out the Repository&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Install:&lt;/strong&gt; &lt;code&gt;pip install tensorcircuit-ng&lt;/code&gt; &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Docs:&lt;/strong&gt; &lt;a href="https://tensorcircuit-ng.readthedocs.io/en/latest/" rel="noopener noreferrer"&gt;Read the Documentation&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Whether you are building the next QML image classifier or simulating many-body physics, we'd love to see what you build.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Happy Coding! ⚛️&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>quantum</category>
      <category>ai</category>
      <category>opensource</category>
      <category>differentiable</category>
    </item>
  </channel>
</rss>
