Nicholas Volkhin

Posted on Apr 15 • Edited on Apr 17

Converting XML Feeds to Plain PHP Arrays in Modern PHP

#xml #xmlreader #etl #integration

When people say they need to “work with XML” in PHP, the phrasing itself is often slightly misleading.

In most business applications, XML is not the format you actually want to keep around.

It is just the format you received.

A supplier feed arrives as XML. A marketplace export arrives as XML. A partner integration still speaks XML. A legacy endpoint responds with XML. But once the data enters your application, the rest of the code usually does not want an XML tree.

It wants ordinary PHP data.

That is the practical framing I use in modern PHP projects:

XML is usually a transport format. The real goal is to convert the useful parts into plain PHP arrays as early as possible.

Once you look at the problem this way, a lot of implementation decisions become much clearer.

Why arrays are usually the real target

Most application code does not benefit from carrying XML semantics deeper into the stack than necessary.

Your service layer, validation logic, queue payloads, DTO mappers, logging, database writers, and JSON APIs usually work best with plain associative arrays.

That means the useful pipeline often looks like this:

XML feed → extracted records → plain PHP arrays → validation / normalization / persistence

This is one of the reasons I built XmlExtractKit for PHP, published as sbwerewolf/xml-navigator.

The package is designed around a very boring but very common need:

take XML input;
extract the records that matter;
get plain PHP arrays back;
keep the rest of the application free from low-level XML handling.

That is a better fit for modern application code than dragging cursor logic or DOM structures through multiple layers.

A typical XML feed problem

Suppose a partner sends you a product feed like this:

<?xml version="1.0" encoding="UTF-8"?>
<feed generated_at="2026-03-28T09:00:00Z">
  <offer id="206111" available="true">
    <name>USB-C Dock</name>
    <price currency="USD">129.90</price>
    <picture>https://cdn.example.test/1.jpg</picture>
    <picture>https://cdn.example.test/2.jpg</picture>
  </offer>
</feed>

What does the rest of your application usually want from this?

Not an XML tree.

Usually something closer to this:

[
    'feed' => [
        '@attributes' => [
            'generated_at' => '2026-03-28T09:00:00Z',
        ],
        'offer' => [
            '@attributes' => [
                'id' => '206111',
                'available' => 'true',
            ],
            'name' => 'USB-C Dock',
            'price' => [
                '@value' => '129.90',
                '@attributes' => [
                    'currency' => 'USD',
                ],
            ],
            'picture' => [
                'https://cdn.example.test/1.jpg',
                'https://cdn.example.test/2.jpg',
            ],
        ],
    ],
]

That structure is already much more useful.

You can serialize it, validate it, map it to a DTO, send it to a queue, store it, or normalize it further.

That is why I think “XML to arrays” is a much more practical category than “XML processing” for a lot of real PHP work.
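To make the "map it to a DTO" step concrete, here is a minimal sketch that takes the array shown above and builds a typed object from it. `OfferDto` and `fromArray()` are hypothetical names used only for illustration; they are not part of XmlExtractKit. The sketch also assumes `picture` arrives as a list, which holds for this feed because the tag is repeated.

```php
<?php

declare(strict_types=1);

// Hypothetical DTO for illustration; not part of XmlExtractKit.
final class OfferDto
{
    /** @param list<string> $pictures */
    public function __construct(
        public int $id,
        public bool $available,
        public string $name,
        public string $price,
        public string $currency,
        public array $pictures,
    ) {
    }

    /** Builds the DTO from the 'offer' part of the readable array. */
    public static function fromArray(array $offer): self
    {
        return new self(
            id: (int) $offer['@attributes']['id'],
            available: $offer['@attributes']['available'] === 'true',
            name: $offer['name'],
            // Kept as a string so nothing is rounded before money-handling code sees it.
            price: $offer['price']['@value'],
            currency: $offer['price']['@attributes']['currency'],
            // Assumption: the tag is repeated, so this is already a list.
            pictures: $offer['picture'],
        );
    }
}

// The same array as in the article, written out by hand.
$data = [
    'feed' => [
        '@attributes' => ['generated_at' => '2026-03-28T09:00:00Z'],
        'offer' => [
            '@attributes' => ['id' => '206111', 'available' => 'true'],
            'name' => 'USB-C Dock',
            'price' => [
                '@value' => '129.90',
                '@attributes' => ['currency' => 'USD'],
            ],
            'picture' => [
                'https://cdn.example.test/1.jpg',
                'https://cdn.example.test/2.jpg',
            ],
        ],
    ],
];

$dto = OfferDto::fromArray($data['feed']['offer']);
```

Every value in the source array is a string, so the type decisions (integer id, boolean flag, price kept as a string) are made in one place, at the mapping step.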

The first decision: readable arrays or normalized hierarchy

One thing I like about XmlExtractKit is that it makes this tradeoff explicit.

There are two main output styles:

readable output, via FastXmlToArray::prettyPrint();
normalized output, via FastXmlToArray::convert().

They solve related but different problems.

Readable output: best for application code

If your goal is to move XML into ordinary PHP code quickly, readable arrays are usually the right default.

Here is a direct conversion example:

use SbWereWolf\XmlNavigator\Conversion\FastXmlToArray;

require_once __DIR__ . '/vendor/autoload.php';

$xml = <<<'XML'
<feed generated_at="2026-03-28T09:00:00Z">
  <offer id="206111" available="true">
    <name>USB-C Dock</name>
    <price currency="USD">129.90</price>
    <picture>https://cdn.example.test/1.jpg</picture>
    <picture>https://cdn.example.test/2.jpg</picture>
  </offer>
</feed>
XML;

$result = FastXmlToArray::prettyPrint($xml);

echo json_encode(
    $result,
    JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES
);

Output:

{
  "feed": {
    "@attributes": {
      "generated_at": "2026-03-28T09:00:00Z"
    },
    "offer": {
      "@attributes": {
        "id": "206111",
        "available": "true"
      },
      "name": "USB-C Dock",
      "price": {
        "@value": "129.90",
        "@attributes": {
          "currency": "USD"
        }
      },
      "picture": [
        "https://cdn.example.test/1.jpg",
        "https://cdn.example.test/2.jpg"
      ]
    }
  }
}

This output format is intentionally convenient.

It is useful when you want to:

return a structured payload from a service;
serialize data to JSON;
inspect logs or debug dumps;
pass a transformed record into validation or normalization code;
feed the result into downstream application logic.

The array shape follows a few simple rules:

attributes go under @attributes;
element text goes under @value when attributes are also present;
repeated child tags become indexed arrays.

That is exactly the kind of shape that works well in typical modern PHP code.
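These rules mean the shape of a field depends on the input: text may be a plain string or sit under `@value`, and a child may be a single node or a list. A minimal sketch of two helpers that smooth this over is below. `textOf()` and `listOf()` are hypothetical names, and the sketch assumes that a tag occurring only once is not wrapped in an indexed array. It requires PHP 8.1 or later for `array_is_list()`.

```php
<?php

declare(strict_types=1);

/**
 * Returns element text whether it arrived as a plain string
 * or under '@value' (the case when attributes are present).
 */
function textOf(array|string|null $node): ?string
{
    if (is_array($node)) {
        return $node['@value'] ?? null;
    }

    return $node;
}

/**
 * Always returns a list of child nodes.
 * Assumption: a tag that occurs once is not wrapped in an indexed array,
 * so a single node is wrapped here.
 */
function listOf(array|string|null $node): array
{
    if ($node === null) {
        return [];
    }

    if (is_array($node) && array_is_list($node)) {
        return $node;
    }

    return [$node];
}

// Hand-written sample records in the readable shape.
$one = [
    'name' => 'Cable',
    'picture' => 'https://cdn.example.test/1.jpg',
];

$many = [
    'name' => 'USB-C Dock',
    'price' => ['@value' => '129.90', '@attributes' => ['currency' => 'USD']],
    'picture' => [
        'https://cdn.example.test/1.jpg',
        'https://cdn.example.test/2.jpg',
    ],
];

$onePictures = listOf($one['picture']);
$manyPictures = listOf($many['picture']);
$price = textOf($many['price']);
```

With helpers like these, downstream code can loop over `listOf($offer['picture'] ?? null)` without checking how many pictures the partner happened to send.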

When normalized output is the better choice

Readable output is great for many pipelines, but sometimes you want a structure that is more explicit and more stable for traversal.

That is where FastXmlToArray::convert() comes in.

Instead of optimizing for immediate readability, it gives each node the same predictable contract:

n = element name;
v = direct value;
a = attributes;
s = child sequence.

Here is the same feed converted into normalized hierarchy form:

use SbWereWolf\XmlNavigator\Conversion\FastXmlToArray;

require_once __DIR__ . '/vendor/autoload.php';

$xml = <<<'XML'
<feed generated_at="2026-03-28T09:00:00Z">
  <offer id="206111" available="true">
    <name>USB-C Dock</name>
    <price currency="USD">129.90</price>
    <picture>https://cdn.example.test/1.jpg</picture>
    <picture>https://cdn.example.test/2.jpg</picture>
  </offer>
</feed>
XML;

$result = FastXmlToArray::convert($xml);

var_export($result);

Output:

array (
  'n' => 'feed',
  'a' =>
  array (
    'generated_at' => '2026-03-28T09:00:00Z',
  ),
  's' =>
  array (
    0 =>
    array (
      'n' => 'offer',
      'a' =>
      array (
        'id' => '206111',
        'available' => 'true',
      ),
      's' =>
      array (
        0 =>
        array (
          'n' => 'name',
          'v' => 'USB-C Dock',
        ),
        1 =>
        array (
          'n' => 'price',
          'v' => '129.90',
          'a' =>
          array (
            'currency' => 'USD',
          ),
        ),
        2 =>
        array (
          'n' => 'picture',
          'v' => 'https://cdn.example.test/1.jpg',
        ),
        3 =>
        array (
          'n' => 'picture',
          'v' => 'https://cdn.example.test/2.jpg',
        ),
      ),
    ),
  ),
)

This output is not as immediately pleasant to read, but it is very useful when you care about consistent traversal and adapters.

That becomes valuable when you want to:

build wrappers on top of a stable node contract;
walk the structure programmatically;
distinguish explicitly between element names, values, attributes, and children;
create internal tooling that should not depend on the shape of one specific XML document.

In other words, prettyPrint() fits when the output is the destination, and convert() fits when the output is an intermediate representation.
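Because every node shares the same n / v / a / s keys, a generic walker needs no knowledge of the document. The sketch below is plain PHP over a hand-written copy of the normalized array above; `findByName()` is a hypothetical helper, not a library function.

```php
<?php

declare(strict_types=1);

/**
 * Yields every node with the given element name, depth-first,
 * in document order. Relies only on the 'n' and 's' keys.
 */
function findByName(array $node, string $name): Generator
{
    if (($node['n'] ?? null) === $name) {
        yield $node;
    }

    foreach ($node['s'] ?? [] as $child) {
        yield from findByName($child, $name);
    }
}

// Hand-written copy of the normalized output shown above.
$tree = [
    'n' => 'feed',
    'a' => ['generated_at' => '2026-03-28T09:00:00Z'],
    's' => [
        [
            'n' => 'offer',
            'a' => ['id' => '206111', 'available' => 'true'],
            's' => [
                ['n' => 'name', 'v' => 'USB-C Dock'],
                ['n' => 'price', 'v' => '129.90', 'a' => ['currency' => 'USD']],
                ['n' => 'picture', 'v' => 'https://cdn.example.test/1.jpg'],
                ['n' => 'picture', 'v' => 'https://cdn.example.test/2.jpg'],
            ],
        ],
    ],
];

// Collect with foreach rather than iterator_to_array(), because
// 'yield from' can repeat integer keys across nested generators.
$pictures = [];
foreach (findByName($tree, 'picture') as $picture) {
    $pictures[] = $picture['v'];
}
```

The same function works unchanged on any other document converted to this form, which is the point of a stable node contract.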

Arrays are only useful if they are easy to navigate

Sometimes a plain array is enough.

Sometimes you want something slightly higher-level without going back to low-level XML logic.

That is where XmlElement fits very nicely.

You can take the normalized hierarchy returned by FastXmlToArray::convert() and wrap it in XmlElement for convenient traversal.

Here is a simple example:

use SbWereWolf\XmlNavigator\Conversion\FastXmlToArray;
use SbWereWolf\XmlNavigator\Navigation\XmlElement;

require_once __DIR__ . '/vendor/autoload.php';

$xml = <<<'XML'
<catalog region="eu">
  <offer id="1001" available="true">
    <name>Keyboard</name>
    <tag>office</tag>
    <tag>usb</tag>
  </offer>
</catalog>
XML;

$root = new XmlElement(FastXmlToArray::convert($xml));
$offer = $root->pull('offer')->current();

echo $root->name() . PHP_EOL;
echo $root->get('region') . PHP_EOL;
echo ($root->hasElement('offer') ? 'yes' : 'no') . PHP_EOL;

foreach ($offer->attributes() as $attribute) {
    echo $attribute->name() . '=' . $attribute->value() . PHP_EOL;
}

$tagValues = array_map(
    static fn (XmlElement $tag): string => $tag->value(),
    $offer->elements('tag')
);

var_export($tagValues);

This is a useful middle ground.

The data is still array-based and application-friendly, but navigation becomes clearer:

name() for the current element name;
get() for attributes;
hasElement() to check for children;
pull() or elements() to navigate down the structure.

This is often cleaner than passing around raw nested arrays with hardcoded indexes everywhere.

Why this matters in feed processing

Feed processing is usually repetitive.

You receive XML, extract records, normalize them, validate them, and push them further into the pipeline.

That means the most practical XML question is often not:

“Which library can represent XML most completely?”

It is:

“Which approach gets me from XML to application-ready records with the least friction?”

That is why plain PHP arrays are such a strong target format for feed work.

They are easy to:

inspect;
serialize;
compare in tests;
transform;
validate;
store;
hand off to other services.
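As one illustration of the "validate" item, a record in the readable shape can be checked with ordinary array code. This is a minimal sketch, not a recommended validation library: `validateOffer()` and its rules are hypothetical, and the field names follow the offer examples used earlier.

```php
<?php

declare(strict_types=1);

/**
 * Returns a list of problems; an empty list means the record passed.
 * The rules are illustrative only.
 */
function validateOffer(array $offer): array
{
    $errors = [];

    $id = $offer['@attributes']['id'] ?? '';
    if (!is_string($id) || preg_match('/\A\d+\z/', $id) !== 1) {
        $errors[] = 'id must be a non-empty string of digits';
    }

    $name = $offer['name'] ?? '';
    if (!is_string($name) || trim($name) === '') {
        $errors[] = 'name is required';
    }

    // Price text may be a plain string or sit under '@value'.
    $price = $offer['price'] ?? null;
    if (is_array($price)) {
        $price = $price['@value'] ?? null;
    }
    if (!is_string($price) || !is_numeric($price) || (float) $price < 0) {
        $errors[] = 'price must be a non-negative number';
    }

    return $errors;
}

$valid = [
    '@attributes' => ['id' => '1'],
    'name' => 'Keyboard',
    'price' => '49.90',
];

$invalid = [
    '@attributes' => ['id' => 'abc'],
    'name' => '  ',
    'price' => ['@value' => 'free', '@attributes' => ['currency' => 'USD']],
];

$validErrors = validateOffer($valid);
$invalidErrors = validateOffer($invalid);
```

Since both the input and the result are plain arrays, the same records and error lists are easy to compare in tests with a strict `===` check.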

By contrast, keeping XML structures alive deep into the business layer usually increases the amount of incidental complexity.

What about large feeds?

For large XML feeds, the array-conversion story should not force you back into full-document loading.

This is where the streaming entry points matter.

If you want readable application-friendly output directly from selected nodes in a large document, there is FastXmlParser::extractPrettyPrint().

Here is a compact example:

use SbWereWolf\XmlNavigator\Parsing\FastXmlParser;

require_once __DIR__ . '/vendor/autoload.php';

$uri = tempnam(sys_get_temp_dir(), 'xml-extract-kit-');
file_put_contents($uri, <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <offer id="1">
    <name>Keyboard</name>
    <price>49.90</price>
  </offer>
  <service id="s-1">
    <name>Warranty</name>
  </service>
  <offer id="2">
    <name>Mouse</name>
    <price>19.90</price>
  </offer>
</catalog>
XML);

$reader = XMLReader::open($uri);

if ($reader === false) {
    throw new RuntimeException('Cannot open XML file.');
}

// The callback selects which elements are extracted; here, every <offer>.
$offers = FastXmlParser::extractPrettyPrint(
    $reader,
    static fn (XMLReader $cursor): bool => $cursor->name === 'offer'
);

foreach ($offers as $offer) {
    echo json_encode(
        $offer,
        JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES
    ) . PHP_EOL;
}

$reader->close();
unlink($uri);

That gives you the same extraction-first workflow as in the earlier articles, but with output that is already convenient for application code.

So the package is not making you choose between streaming and readable arrays.

It is designed to give you both.
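On the consuming side, extracted records can be grouped into fixed-size batches without collecting the whole feed first. The sketch below is plain PHP and does not call the library: `inBatches()` and `fakeOffers()` are hypothetical, and `fakeOffers()` stands in for whatever iterable the extraction step returns. The record shape here is a placeholder, and the exact shape yielded by extractPrettyPrint() may differ.

```php
<?php

declare(strict_types=1);

/**
 * Groups records from any iterable into arrays of at most $size items.
 * Only the current batch is held in memory.
 */
function inBatches(iterable $records, int $size): Generator
{
    if ($size < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $batch = [];
    foreach ($records as $record) {
        $batch[] = $record;
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }

    // Emit the final, possibly smaller, batch.
    if ($batch !== []) {
        yield $batch;
    }
}

/** Stand-in for a streaming source; the record shape is a placeholder. */
function fakeOffers(): Generator
{
    yield ['@attributes' => ['id' => '1'], 'name' => 'Keyboard', 'price' => '49.90'];
    yield ['@attributes' => ['id' => '2'], 'name' => 'Mouse', 'price' => '19.90'];
    yield ['@attributes' => ['id' => '3'], 'name' => 'Webcam', 'price' => '59.90'];
}

$batchSizes = [];
foreach (inBatches(fakeOffers(), 2) as $batch) {
    // A real pipeline would validate and persist the batch here.
    $batchSizes[] = count($batch);
}
```

Batching like this keeps database writes or queue pushes efficient while memory use stays bounded by the batch size rather than the feed size.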

A useful decision rule

For practical PHP work, I think the following rule holds up well:

use prettyPrint() when you want readable arrays now;
use convert() when you want a stable internal node model;
use XmlElement when you want to traverse normalized arrays more comfortably;
use extractPrettyPrint() when the XML is large and you only want selected records in readable form;
use extractHierarchy() when the XML is large and you want selected records in normalized form.

This is a much more actionable way to think about XML work than asking for a single “best XML library.”

One more practical point: XML should not leak everywhere

I think one of the easiest mistakes in integration code is to let transport concerns leak too far.

A feed arrives as XML, so suddenly everything downstream starts thinking in XML terms:

node trees;
cursor state;
fragment parsing;
nested traversal rules.

That is usually unnecessary.

A much cleaner architecture is:

receive XML;
convert it into a representation your application actually likes;
keep business logic focused on plain PHP data.

This is exactly why array-first conversion is so useful. It creates a boundary.

The XML stays near the integration edge, where it belongs.
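One way to express that boundary in code is an interface whose signature mentions only plain arrays. This is a sketch under assumptions: `OfferSource`, `ArrayOfferSource`, and `offerNames()` are hypothetical names, and the in-memory implementation exists only so the example runs without any XML. An XML-backed implementation would live at the integration edge and wrap the conversion calls there.

```php
<?php

declare(strict_types=1);

/** Hypothetical boundary: business code only ever sees plain arrays. */
interface OfferSource
{
    /** @return iterable<array<string, mixed>> */
    public function offers(): iterable;
}

/** In-memory implementation so the sketch runs without any XML. */
final class ArrayOfferSource implements OfferSource
{
    public function __construct(private array $records)
    {
    }

    public function offers(): iterable
    {
        return $this->records;
    }
}

/** Business logic: no XML types appear in the signature or the body. */
function offerNames(OfferSource $source): array
{
    $names = [];
    foreach ($source->offers() as $offer) {
        $names[] = $offer['name'];
    }

    return $names;
}

$source = new ArrayOfferSource([
    ['id' => '1', 'name' => 'Keyboard', 'price' => '49.90'],
    ['id' => '2', 'name' => 'Mouse', 'price' => '19.90'],
]);

$names = offerNames($source);
```

Swapping the feed format later then means writing one new `OfferSource` implementation, while `offerNames()` and everything behind it stay untouched.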

Conclusion

In modern PHP projects, XML is often not the thing you want to work with. It is the thing you need to get past.

That is why converting XML feeds to plain PHP arrays is such a practical strategy.

Readable arrays are ideal when you want immediate application-friendly data. Normalized arrays are ideal when you want a stable traversal model. And for large feeds, streaming extraction keeps memory use low without sacrificing useful output.

That combination is what I wanted from XmlExtractKit:

XML as input;
arrays as output;
streaming when needed;
low friction in the application layer.

If that is the kind of PHP XML workflow you deal with, sbwerewolf/xml-navigator is built for exactly that use case.

Try it

composer require sbwerewolf/xml-navigator

Explore the demo project

git clone https://github.com/SbWereWolf/xml-extract-kit-demo-repo.git
cd xml-extract-kit-demo-repo
composer install
