<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Nicholas Volkhin</title>
    <description>The latest articles on DEV Community by Nicholas Volkhin (@sbwerewolf).</description>
    <link>https://dev.to/sbwerewolf</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3875429%2F4b72e857-51ba-48a7-9f85-069e61293461.jpg</url>
      <title>DEV Community: Nicholas Volkhin</title>
      <link>https://dev.to/sbwerewolf</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/sbwerewolf"/>
    <language>en</language>
    <item>
      <title>How to Parse Large XML Files in PHP Without Running Out of Memory</title>
      <dc:creator>Nicholas Volkhin</dc:creator>
      <pubDate>Sun, 12 Apr 2026 20:01:05 +0000</pubDate>
      <link>https://dev.to/sbwerewolf/how-to-parse-large-xml-files-in-php-without-running-out-of-memory-234o</link>
      <guid>https://dev.to/sbwerewolf/how-to-parse-large-xml-files-in-php-without-running-out-of-memory-234o</guid>
      <description>&lt;p&gt;XML is still everywhere: supplier feeds, marketplace catalogs, partner
exports, legacy APIs, SOAP-ish payloads, ETL jobs. None of that is
glamorous, but plenty of production systems still depend on it.&lt;/p&gt;

&lt;p&gt;The real problem starts when the file is no longer small.&lt;/p&gt;

&lt;p&gt;At that point, the question is not really &lt;strong&gt;"How do I parse XML in
PHP?"&lt;/strong&gt; It becomes:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;How do I process a large XML document safely, extract only the
records I care about, and keep the rest of my application working
with normal PHP data structures?&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is a very different problem.&lt;/p&gt;

&lt;p&gt;In many real-world integrations, you do not need the whole XML
document in memory. You do not need to traverse every branch of the
tree. You do not need a rich DOM-style model.&lt;/p&gt;

&lt;p&gt;You usually need something much simpler:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;scan the file efficiently;&lt;/li&gt;
&lt;li&gt;find repeated business records such as &lt;code&gt;product&lt;/code&gt;, &lt;code&gt;offer&lt;/code&gt;, or &lt;code&gt;item&lt;/code&gt;;&lt;/li&gt;
&lt;li&gt;extract those records;&lt;/li&gt;
&lt;li&gt;turn them into arrays;&lt;/li&gt;
&lt;li&gt;pass them to the rest of your pipeline.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the approach I use in modern PHP projects, and it is the one
I recommend for large XML workloads.&lt;/p&gt;
&lt;h2&gt;Why naive XML parsing stops working&lt;/h2&gt;

&lt;p&gt;For small files, the usual PHP XML tools are perfectly fine.&lt;/p&gt;

&lt;p&gt;A typical first solution looks like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$xml&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;simplexml_load_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'feed.xml'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$xml&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;products&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;product&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$product&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="c1"&gt;// process product&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;There is nothing wrong with that when the file is small and the
document structure is simple.&lt;/p&gt;

&lt;p&gt;The trouble is that &lt;code&gt;simplexml_load_file()&lt;/code&gt; parses the whole file
into an in-memory tree before the first loop iteration runs. For large
feeds, that is often the wrong tradeoff.&lt;/p&gt;

&lt;p&gt;If you only need repeated business records from a large XML file,
materializing the entire document in memory is unnecessary work. It
also makes your pipeline more fragile as feeds grow over time.&lt;/p&gt;

&lt;p&gt;This is why large-XML handling should start with a different mental
model:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Do not load the document. Stream through it and extract only what
matters.&lt;/p&gt;
&lt;/blockquote&gt;
&lt;h2&gt;The real task is usually extraction, not XML manipulation&lt;/h2&gt;

&lt;p&gt;In practice, most XML processing jobs in application code look like
this:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the file contains many repeated records;&lt;/li&gt;
&lt;li&gt;you only need a subset of them;&lt;/li&gt;
&lt;li&gt;you only need some fields from each record;&lt;/li&gt;
&lt;li&gt;the result will end up in arrays, JSON, a database, or a queue.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That means the business task is usually not "work with XML as a
document."&lt;/p&gt;

&lt;p&gt;It is:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Find the repeated records I care about and turn them into
application-friendly data.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That distinction matters because it leads directly to the right
low-memory approach.&lt;/p&gt;
&lt;h2&gt;The memory-safe foundation: XMLReader&lt;/h2&gt;

&lt;p&gt;In PHP, the standard low-level tool for memory-safe XML traversal is
&lt;code&gt;XMLReader&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;Instead of loading the entire document, it lets you move through the
XML cursor-style, node by node.&lt;/p&gt;

&lt;p&gt;That is exactly what you want when the file is large.&lt;/p&gt;

&lt;p&gt;Here is a minimal baseline example:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="nv"&gt;$reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'feed.xml'&lt;/span&gt;&lt;span class="p"&gt;))&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RuntimeException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Cannot open XML file.'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="k"&gt;while&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;read&lt;/span&gt;&lt;span class="p"&gt;())&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;nodeType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ELEMENT&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'product'&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="nv"&gt;$nodeXml&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;readOuterXML&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;

        &lt;span class="nv"&gt;$product&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;simplexml_load_string&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$nodeXml&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

        &lt;span class="nv"&gt;$data&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
            &lt;span class="s1"&gt;'id'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$product&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'name'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$product&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'price'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;float&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$product&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;price&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
            &lt;span class="s1"&gt;'available'&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;string&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="nv"&gt;$product&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;available&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="p"&gt;];&lt;/span&gt;

        &lt;span class="c1"&gt;// process $data immediately&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is already much better than loading the full file up front.&lt;/p&gt;

&lt;p&gt;It gives you the right execution model:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;sequential reading;&lt;/li&gt;
&lt;li&gt;low memory pressure;&lt;/li&gt;
&lt;li&gt;immediate processing of extracted records.&lt;/li&gt;
&lt;/ul&gt;
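
&lt;p&gt;The same loop can be wrapped in a generator so that calling code simply iterates over records. The sketch below is one possible shape, not a canonical one: &lt;code&gt;streamElements&lt;/code&gt; is a hypothetical helper name, &lt;code&gt;feed.xml&lt;/code&gt; and &lt;code&gt;product&lt;/code&gt; are placeholders, and it assumes matching elements never nest inside each other. It also uses &lt;code&gt;XMLReader::next()&lt;/code&gt; to skip a record's subtree once the fragment has been read, so the cursor does not walk through the same child nodes a second time.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;/**
 * Yield each matching element as a SimpleXMLElement, one at a time.
 * streamElements() is a hypothetical helper, not part of any library.
 */
function streamElements(string $uri, string $elementName): Generator
{
    $reader = new XMLReader();

    if (! $reader-&amp;gt;open($uri)) {
        throw new RuntimeException('Cannot open XML file.');
    }

    try {
        // Advance to the first matching element, if there is one.
        while ($reader-&amp;gt;read()) {
            if (
                $reader-&amp;gt;nodeType === XMLReader::ELEMENT
                &amp;amp;&amp;amp; $reader-&amp;gt;name === $elementName
            ) {
                break;
            }
        }

        // While positioned on a matching element, parse only that fragment.
        while (
            $reader-&amp;gt;nodeType === XMLReader::ELEMENT
            &amp;amp;&amp;amp; $reader-&amp;gt;name === $elementName
        ) {
            $fragment = simplexml_load_string($reader-&amp;gt;readOuterXml());

            if ($fragment !== false) {
                yield $fragment;
            }

            // Skip this record's subtree and move to the next node with
            // the same name; next() returns false when none is left.
            if (! $reader-&amp;gt;next($elementName)) {
                break;
            }
        }
    } finally {
        // Runs even if the caller stops iterating early.
        $reader-&amp;gt;close();
    }
}

// 'feed.xml' and 'product' are placeholders for your own file and record name.
foreach (streamElements('feed.xml', 'product') as $product) {
    // Process one record here, e.g. (string) $product-&amp;gt;name.
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Only one record fragment is held in memory at a time, and the &lt;code&gt;finally&lt;/code&gt; block closes the reader whether the loop finishes or the caller breaks out of it.&lt;/p&gt;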

&lt;p&gt;If your XML task is simple and one-off, this may be enough.&lt;/p&gt;

&lt;p&gt;But once you do this in more than one project, the weak points show
up quickly.&lt;/p&gt;
&lt;h2&gt;Where raw XMLReader starts to hurt&lt;/h2&gt;

&lt;p&gt;&lt;code&gt;XMLReader&lt;/code&gt; is powerful, but it is also low-level.&lt;/p&gt;

&lt;p&gt;The moment your extraction task becomes slightly more realistic, you
start accumulating glue code:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;repeated node-selection logic;&lt;/li&gt;
&lt;li&gt;conversion of XML fragments into arrays;&lt;/li&gt;
&lt;li&gt;nested element handling;&lt;/li&gt;
&lt;li&gt;attributes versus values;&lt;/li&gt;
&lt;li&gt;optional nodes;&lt;/li&gt;
&lt;li&gt;repeated fields like multiple &lt;code&gt;&amp;lt;picture&amp;gt;&lt;/code&gt; tags;&lt;/li&gt;
&lt;li&gt;serialization to JSON-friendly structures;&lt;/li&gt;
&lt;li&gt;duplicated extraction code across projects.&lt;/li&gt;
&lt;/ul&gt;
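
&lt;p&gt;To make a few of these items concrete, here is a hedged sketch of the per-record glue for one hypothetical &lt;code&gt;offer&lt;/code&gt; shape: an &lt;code&gt;id&lt;/code&gt; attribute, an optional &lt;code&gt;vendor&lt;/code&gt; element, and repeated &lt;code&gt;&amp;lt;picture&amp;gt;&lt;/code&gt; elements. The element names, the example URLs, and the &lt;code&gt;offerToArray&lt;/code&gt; helper are assumptions for illustration only.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;/**
 * Convert one parsed offer fragment into a plain array.
 * The field names describe a hypothetical feed, not a real schema.
 */
function offerToArray(SimpleXMLElement $offer): array
{
    // Repeated fields: collect every picture child, however many exist.
    $pictures = [];
    foreach ($offer-&amp;gt;picture as $picture) {
        $pictures[] = (string) $picture;
    }

    return [
        // Attributes use array syntax, child elements use property syntax.
        'id' =&amp;gt; (string) $offer['id'],
        'name' =&amp;gt; (string) $offer-&amp;gt;name,
        // Optional node: keep null instead of an empty string when absent.
        'vendor' =&amp;gt; isset($offer-&amp;gt;vendor) ? (string) $offer-&amp;gt;vendor : null,
        'price' =&amp;gt; (float) $offer-&amp;gt;price,
        'currency' =&amp;gt; (string) $offer-&amp;gt;price['currency'],
        'pictures' =&amp;gt; $pictures,
    ];
}

// Sample fragment with no vendor element and two picture elements.
$offer = simplexml_load_string(&amp;lt;&amp;lt;&amp;lt;'XML'
&amp;lt;offer id="1001"&amp;gt;
  &amp;lt;name&amp;gt;Keyboard&amp;lt;/name&amp;gt;
  &amp;lt;price currency="USD"&amp;gt;49.90&amp;lt;/price&amp;gt;
  &amp;lt;picture&amp;gt;https://example.com/1.jpg&amp;lt;/picture&amp;gt;
  &amp;lt;picture&amp;gt;https://example.com/2.jpg&amp;lt;/picture&amp;gt;
&amp;lt;/offer&amp;gt;
XML);

var_dump(offerToArray($offer));
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;None of this is hard, but every feed needs its own variant of it, which is where the duplication comes from.&lt;/p&gt;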

&lt;p&gt;At that point, memory is no longer the only concern.&lt;/p&gt;

&lt;p&gt;Maintainability becomes the real cost.&lt;/p&gt;

&lt;p&gt;This is the line I care about most in application code: not just "can
I stream it," but "can I keep the extraction logic readable after the
third similar integration?"&lt;/p&gt;
&lt;h2&gt;A more practical extraction-first approach&lt;/h2&gt;

&lt;p&gt;This is exactly why I built &lt;strong&gt;XmlExtractKit&lt;/strong&gt; for PHP, published as
&lt;code&gt;sbwerewolf/xml-navigator&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The goal is not to replace &lt;code&gt;XMLReader&lt;/code&gt;, but to keep its streaming
model while moving application code closer to the actual business task.&lt;/p&gt;

&lt;p&gt;Instead of managing the cursor manually and assembling records by
hand, I want code that says:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;open a large XML stream;&lt;/li&gt;
&lt;li&gt;match the elements I care about;&lt;/li&gt;
&lt;li&gt;get plain PHP arrays back.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is a streaming example using the library:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;&lt;span class="kn"&gt;use&lt;/span&gt; &lt;span class="nc"&gt;SbWereWolf\XmlNavigator\Parsing\FastXmlParser&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="k"&gt;require_once&lt;/span&gt; &lt;span class="k"&gt;__DIR__&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="s1"&gt;'/vendor/autoload.php'&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;

&lt;span class="nv"&gt;$uri&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nb"&gt;tempnam&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nb"&gt;sys_get_temp_dir&lt;/span&gt;&lt;span class="p"&gt;(),&lt;/span&gt; &lt;span class="s1"&gt;'xml-extract-kit-'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="nb"&gt;file_put_contents&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$uri&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;&amp;lt;&amp;lt;&amp;lt;'XML'
&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;
&amp;lt;catalog&amp;gt;
  &amp;lt;offer id="1001" available="true"&amp;gt;
    &amp;lt;name&amp;gt;Keyboard&amp;lt;/name&amp;gt;
    &amp;lt;price currency="USD"&amp;gt;49.90&amp;lt;/price&amp;gt;
  &amp;lt;/offer&amp;gt;
  &amp;lt;service id="s-1"&amp;gt;
    &amp;lt;name&amp;gt;Warranty&amp;lt;/name&amp;gt;
  &amp;lt;/service&amp;gt;
  &amp;lt;offer id="1002" available="false"&amp;gt;
    &amp;lt;name&amp;gt;Mouse&amp;lt;/name&amp;gt;
    &amp;lt;price currency="USD"&amp;gt;19.90&amp;lt;/price&amp;gt;
  &amp;lt;/offer&amp;gt;
&amp;lt;/catalog&amp;gt;
XML&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="nv"&gt;$reader&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;open&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$uri&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$reader&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;throw&lt;/span&gt; &lt;span class="k"&gt;new&lt;/span&gt; &lt;span class="nc"&gt;RuntimeException&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Cannot open XML file.'&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;$offers&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;FastXmlParser&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="nf"&gt;extractPrettyPrint&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="k"&gt;static&lt;/span&gt; &lt;span class="k"&gt;fn&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="kt"&gt;XMLReader&lt;/span&gt; &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt; &lt;span class="kt"&gt;bool&lt;/span&gt; &lt;span class="o"&gt;=&amp;gt;&lt;/span&gt;
        &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;nodeType&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="nc"&gt;XMLReader&lt;/span&gt;&lt;span class="o"&gt;::&lt;/span&gt;&lt;span class="no"&gt;ELEMENT&lt;/span&gt;
        &lt;span class="o"&gt;&amp;amp;&amp;amp;&lt;/span&gt; &lt;span class="nv"&gt;$cursor&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;===&lt;/span&gt; &lt;span class="s1"&gt;'offer'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="k"&gt;foreach&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$offers&lt;/span&gt; &lt;span class="k"&gt;as&lt;/span&gt; &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="k"&gt;echo&lt;/span&gt; &lt;span class="nb"&gt;json_encode&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
        &lt;span class="nv"&gt;$offer&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="no"&gt;JSON_PRETTY_PRINT&lt;/span&gt; &lt;span class="o"&gt;|&lt;/span&gt; &lt;span class="no"&gt;JSON_UNESCAPED_SLASHES&lt;/span&gt;
    &lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="mf"&gt;.&lt;/span&gt; &lt;span class="kc"&gt;PHP_EOL&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="nv"&gt;$reader&lt;/span&gt;&lt;span class="o"&gt;-&amp;gt;&lt;/span&gt;&lt;span class="nf"&gt;close&lt;/span&gt;&lt;span class="p"&gt;();&lt;/span&gt;
&lt;span class="nb"&gt;unlink&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nv"&gt;$uri&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The output is application-friendly, with one JSON object per matched &lt;code&gt;offer&lt;/code&gt;:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"offer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@attributes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1001"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"available"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"true"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Keyboard"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"@value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"49.90"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"@attributes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;





&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"offer"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"@attributes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1002"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"available"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"false"&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"name"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Mouse"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"price"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"@value"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"19.90"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"@attributes"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"currency"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"USD"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is still a streaming workflow. The difference is that the code
is now centered on the extraction task instead of low-level cursor management.&lt;/p&gt;

&lt;p&gt;That becomes more valuable when the XML structure is nested,
partially optional, or reused across multiple integrations.&lt;/p&gt;

&lt;h2&gt;Why plain arrays are often the right output&lt;/h2&gt;

&lt;p&gt;A lot of application code does not really want XML.&lt;/p&gt;

&lt;p&gt;It wants data.&lt;/p&gt;

&lt;p&gt;Once the relevant record has been extracted, the rest of the system
usually prefers:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;plain arrays;&lt;/li&gt;
&lt;li&gt;normalized values;&lt;/li&gt;
&lt;li&gt;JSON-ready structures;&lt;/li&gt;
&lt;li&gt;data that can be validated, transformed, and persisted.&lt;/li&gt;
&lt;/ul&gt;
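
&lt;p&gt;As a small illustration of what "normalized values" can mean, the sketch below takes an array shaped like the JSON output shown earlier and flattens it into typed fields. &lt;code&gt;normalizeOffer&lt;/code&gt; is a hypothetical helper, and the input literal is copied from that output rather than produced by a library call. It assumes &lt;code&gt;price&lt;/code&gt; always arrives with the &lt;code&gt;@value&lt;/code&gt; and &lt;code&gt;@attributes&lt;/code&gt; keys shown there.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;/**
 * Flatten one extracted offer into typed, application-level fields.
 * normalizeOffer() is a hypothetical helper; the '@attributes' and
 * '@value' keys follow the output shown earlier in this article.
 */
function normalizeOffer(array $record): array
{
    $offer = $record['offer'] ?? [];
    $attributes = $offer['@attributes'] ?? [];
    $price = $offer['price'] ?? [];

    return [
        'id' =&amp;gt; (int) ($attributes['id'] ?? 0),
        // The sample feed carries booleans as the strings "true" and "false".
        'available' =&amp;gt; ($attributes['available'] ?? 'false') === 'true',
        'name' =&amp;gt; (string) ($offer['name'] ?? ''),
        'price' =&amp;gt; (float) ($price['@value'] ?? 0.0),
        'currency' =&amp;gt; (string) ($price['@attributes']['currency'] ?? ''),
    ];
}

// Input copied from the first JSON object in the example output.
$record = [
    'offer' =&amp;gt; [
        '@attributes' =&amp;gt; ['id' =&amp;gt; '1001', 'available' =&amp;gt; 'true'],
        'name' =&amp;gt; 'Keyboard',
        'price' =&amp;gt; [
            '@value' =&amp;gt; '49.90',
            '@attributes' =&amp;gt; ['currency' =&amp;gt; 'USD'],
        ],
    ],
];

echo json_encode(normalizeOffer($record), JSON_PRETTY_PRINT) . PHP_EOL;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Because the input is already a plain array, this step needs no XML API at all, so it can be unit-tested without a feed file.&lt;/p&gt;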

&lt;p&gt;That is why I think "XML extraction" is a more useful framing than
"XML handling."&lt;/p&gt;

&lt;p&gt;Most business systems do not want to live inside an XML tree. They
want to move past it as quickly as possible.&lt;/p&gt;

&lt;p&gt;If the XML document is just a transport format, then the best
workflow is usually:&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;XML stream -&amp;gt; selected nodes -&amp;gt; PHP arrays&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;That is the design center of my library.&lt;/p&gt;

&lt;h2&gt;When this approach makes sense&lt;/h2&gt;

&lt;p&gt;This style of XML processing works especially well when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the XML file is large;&lt;/li&gt;
&lt;li&gt;the document contains many repeated records;&lt;/li&gt;
&lt;li&gt;you only need part of the document;&lt;/li&gt;
&lt;li&gt;the extracted data should be processed immediately;&lt;/li&gt;
&lt;li&gt;the rest of the application works with arrays, not DOM objects.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Typical examples include:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;supplier and marketplace feeds;&lt;/li&gt;
&lt;li&gt;product catalogs;&lt;/li&gt;
&lt;li&gt;partner imports and exports;&lt;/li&gt;
&lt;li&gt;ETL jobs;&lt;/li&gt;
&lt;li&gt;queue payload preparation;&lt;/li&gt;
&lt;li&gt;legacy integration endpoints that still speak XML.&lt;/li&gt;
&lt;/ul&gt;
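
&lt;p&gt;For ETL jobs and queue payloads, streamed records are often grouped into fixed-size batches before they are written or published. The following is a minimal sketch of that step, assuming any iterable source of records. &lt;code&gt;inBatches&lt;/code&gt; is a hypothetical helper name, and the inline generator only stands in for a real streamed source.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight php"&gt;&lt;code&gt;/**
 * Group any iterable of records into fixed-size batches.
 * inBatches() is a hypothetical helper name.
 */
function inBatches(iterable $records, int $size): Generator
{
    if ($size &amp;lt; 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $batch = [];

    foreach ($records as $record) {
        $batch[] = $record;

        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }

    // Flush the final, possibly smaller, batch.
    if ($batch !== []) {
        yield $batch;
    }
}

// Stand-in for a streamed source; a real one would yield extracted records.
$records = (static function (): Generator {
    foreach ([1001, 1002, 1003, 1004, 1005] as $id) {
        yield ['id' =&amp;gt; $id];
    }
})();

foreach (inBatches($records, 2) as $batch) {
    // Replace with a bulk insert or a queue publish call.
    echo json_encode($batch) . PHP_EOL;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Memory use stays bounded by the batch size rather than by the size of the feed.&lt;/p&gt;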

&lt;h2&gt;When you probably do not need it&lt;/h2&gt;

&lt;p&gt;There are also cases where this is the wrong tool.&lt;/p&gt;

&lt;p&gt;You probably do not need a streaming extraction approach when:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;the XML is small;&lt;/li&gt;
&lt;li&gt;loading the whole file is acceptable;&lt;/li&gt;
&lt;li&gt;you need full-document manipulation;&lt;/li&gt;
&lt;li&gt;your task is closer to DOM transformation than record extraction;&lt;/li&gt;
&lt;li&gt;the XML structure is simple enough that a tiny one-off script will do.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is important to say explicitly.&lt;/p&gt;

&lt;p&gt;Not every XML task needs an extraction-first workflow. But the ones
that do usually benefit from it immediately.&lt;/p&gt;

&lt;h2&gt;A useful rule of thumb&lt;/h2&gt;

&lt;p&gt;Here is the simplest practical rule I know:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;if the XML is small and you need the whole document, convenience 
APIs are fine;&lt;/li&gt;
&lt;li&gt;if the XML is large and you only need repeated records, stream it;&lt;/li&gt;
&lt;li&gt;if you keep solving the same streaming extraction problem in 
multiple projects, stop writing the same glue code over and over.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That is the point where a focused library becomes worth it.&lt;/p&gt;

&lt;h2&gt;Conclusion&lt;/h2&gt;

&lt;p&gt;Large XML files are not primarily a parsing problem.&lt;/p&gt;

&lt;p&gt;They are an extraction problem.&lt;/p&gt;

&lt;p&gt;If you treat them like full in-memory documents, you often pay too
much in memory and complexity. If you treat them like streams of
repeated business records, the solution becomes safer, simpler, and
much easier to fit into modern PHP pipelines.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;XMLReader&lt;/code&gt; gives you the right low-level foundation for that model.&lt;/p&gt;

&lt;p&gt;And if your real task is not "load XML," but "extract matching
records and turn them into plain PHP arrays," then &lt;strong&gt;XmlExtractKit&lt;/strong&gt;
(&lt;code&gt;sbwerewolf/xml-navigator&lt;/code&gt;) was built exactly for that workflow.&lt;/p&gt;

&lt;h2&gt;Try it&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;composer require sbwerewolf/xml-navigator
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



</description>
      <category>opensource</category>
      <category>php</category>
      <category>xml</category>
      <category>parsing</category>
    </item>
  </channel>
</rss>
