
Nicholas Volkhin

Converting XML Feeds to Plain PHP Arrays in Modern PHP

When people say they need to “work with XML” in PHP, the phrasing is
often slightly misleading.

In most business applications, XML is not the format you actually
want to keep around.

It is just the format you received.

A supplier feed arrives as XML. A marketplace export arrives as XML.
A partner integration still speaks XML. A legacy endpoint responds
with XML. But once the data enters your application, the rest of the
code usually does not want an XML tree.

It wants ordinary PHP data.

That is the practical framing I use in modern PHP projects:

XML is usually a transport format. The real goal is to convert the
useful parts into plain PHP arrays as early as possible.

Once you look at the problem this way, a lot of implementation
decisions become much clearer.

Why arrays are usually the real target

Most application code does not benefit from carrying XML semantics
deeper into the stack than necessary.

Your service layer, validation logic, queue payloads, DTO mappers,
logging, database writers, and JSON APIs usually work best with plain
associative arrays.

That means the useful pipeline often looks like this:

XML feed → extracted records → plain PHP arrays → validation / normalization / persistence

This is one of the reasons I built XmlExtractKit for PHP,
published as sbwerewolf/xml-navigator.

The package is designed around a very boring but very common need:

  • take XML input;
  • extract the records that matter;
  • get plain PHP arrays back;
  • keep the rest of the application free from low-level XML handling.

That is a better fit for modern application code than dragging cursor
logic or DOM structures through multiple layers.

A typical XML feed problem

Suppose a partner sends you a product feed like this:

<?xml version="1.0" encoding="UTF-8"?>
<feed generated_at="2026-03-28T09:00:00Z">
  <offer id="206111" available="true">
    <name>USB-C Dock</name>
    <price currency="USD">129.90</price>
    <picture>https://cdn.example.test/1.jpg</picture>
    <picture>https://cdn.example.test/2.jpg</picture>
  </offer>
</feed>

What does the rest of your application usually want from this?

Not an XML tree.

Usually something closer to this:

[
    'feed' => [
        '@attributes' => [
            'generated_at' => '2026-03-28T09:00:00Z',
        ],
        'offer' => [
            '@attributes' => [
                'id' => '206111',
                'available' => 'true',
            ],
            'name' => 'USB-C Dock',
            'price' => [
                '@value' => '129.90',
                '@attributes' => [
                    'currency' => 'USD',
                ],
            ],
            'picture' => [
                'https://cdn.example.test/1.jpg',
                'https://cdn.example.test/2.jpg',
            ],
        ],
    ],
]

That structure is already much more useful.

You can serialize it, validate it, map it to a DTO, send it to a
queue, store it, or normalize it further.

That is why I think “XML to arrays” is a much more practical category
than “XML processing” for a lot of real PHP work.
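To make the "map it to a DTO" step concrete, here is a minimal sketch that flattens the offer from the array above into a typed record. The function name normalizeOffer() and the output keys are my own illustration, not part of the package, and the input is hard-coded so the snippet runs without any dependency.

```php
<?php
declare(strict_types=1);

// The offer from the readable array above, hard-coded for the example.
$offer = [
    '@attributes' => ['id' => '206111', 'available' => 'true'],
    'name' => 'USB-C Dock',
    'price' => [
        '@value' => '129.90',
        '@attributes' => ['currency' => 'USD'],
    ],
    'picture' => [
        'https://cdn.example.test/1.jpg',
        'https://cdn.example.test/2.jpg',
    ],
];

/**
 * Hypothetical normalizer: flattens one offer and casts XML strings
 * to the types the application wants.
 */
function normalizeOffer(array $offer): array
{
    return [
        'id' => (int) $offer['@attributes']['id'],
        'available' => $offer['@attributes']['available'] === 'true',
        'name' => $offer['name'],
        // Kept as a decimal string to avoid float rounding on money.
        'price' => $offer['price']['@value'],
        'currency' => $offer['price']['@attributes']['currency'],
        'pictures' => $offer['picture'],
    ];
}

$record = normalizeOffer($offer);
echo json_encode($record, JSON_UNESCAPED_SLASHES) . PHP_EOL;
```

Nothing in this function knows that the data started life as XML, which is the point of converting early.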

The first decision: readable arrays or normalized hierarchy

One thing I like about XmlExtractKit is that it makes this tradeoff
explicit.

There are two main output styles:

  • readable output, via FastXmlToArray::prettyPrint();
  • normalized output, via FastXmlToArray::convert().

They solve related but different problems.

Readable output: best for application code

If your goal is to move XML into ordinary PHP code quickly, readable
arrays are usually the right default.

Here is a direct conversion example:

use SbWereWolf\XmlNavigator\Conversion\FastXmlToArray;

require_once __DIR__ . '/vendor/autoload.php';

$xml = <<<'XML'
<feed generated_at="2026-03-28T09:00:00Z">
  <offer id="206111" available="true">
    <name>USB-C Dock</name>
    <price currency="USD">129.90</price>
    <picture>https://cdn.example.test/1.jpg</picture>
    <picture>https://cdn.example.test/2.jpg</picture>
  </offer>
</feed>
XML;

$result = FastXmlToArray::prettyPrint($xml);

echo json_encode(
    $result,
    JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES
);

Output:

{
  "feed": {
    "@attributes": {
      "generated_at": "2026-03-28T09:00:00Z"
    },
    "offer": {
      "@attributes": {
        "id": "206111",
        "available": "true"
      },
      "name": "USB-C Dock",
      "price": {
        "@value": "129.90",
        "@attributes": {
          "currency": "USD"
        }
      },
      "picture": [
        "https://cdn.example.test/1.jpg",
        "https://cdn.example.test/2.jpg"
      ]
    }
  }
}

This output format is intentionally convenient.

It is useful when you want to:

  • return a structured payload from a service;
  • serialize data to JSON;
  • inspect logs or debug dumps;
  • pass a transformed record into validation or normalization code;
  • feed the result into downstream application logic.

The array shape follows a few simple rules:

  • attributes go under @attributes;
  • element text goes under @value when attributes are also present, and is a plain string otherwise;
  • repeated child tags become indexed arrays.

That is exactly the kind of shape that works well in typical modern
PHP code.
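One consequence of these rules is that the shape depends on the data: judging by the output above, a tag that occurs once is not wrapped in a list, and text is either a plain string or sits under @value. Here is a small sketch of two helpers that smooth this over. Both asList() and textOf() are hypothetical names of my own, not package functions, and array_is_list() requires PHP 8.1 or later.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns child nodes as a list whether the tag
 * occurred once or several times.
 */
function asList(mixed $node): array
{
    if ($node === null) {
        return [];
    }

    // Repeated tags arrive as an indexed list; anything else is one node.
    return is_array($node) && array_is_list($node) ? $node : [$node];
}

/**
 * Hypothetical helper: returns element text whether the node is a plain
 * string or an array carrying '@value' next to '@attributes'.
 */
function textOf(mixed $node): ?string
{
    if (is_array($node)) {
        return $node['@value'] ?? null;
    }

    return $node === null ? null : (string) $node;
}

// One picture and two pictures can now be handled by the same loop.
$onePicture = asList('https://cdn.example.test/1.jpg');
$twoPictures = asList([
    'https://cdn.example.test/1.jpg',
    'https://cdn.example.test/2.jpg',
]);

// A price with a currency attribute and a bare name read the same way.
$price = textOf(['@value' => '129.90', '@attributes' => ['currency' => 'USD']]);
$name = textOf('USB-C Dock');
```

With helpers like these, downstream code does not need to branch on how many pictures a particular offer happened to contain.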

When normalized output is the better choice

Readable output is great for many pipelines, but sometimes you want a
structure that is more explicit and more stable for traversal.

That is where FastXmlToArray::convert() comes in.

Instead of optimizing for immediate readability, it describes every
node with the same small set of keys, omitting any key that would be
empty:

  • n = element name;
  • v = direct value;
  • a = attributes;
  • s = child sequence.

Here is the same feed converted into normalized hierarchy form:

use SbWereWolf\XmlNavigator\Conversion\FastXmlToArray;

require_once __DIR__ . '/vendor/autoload.php';

$xml = <<<'XML'
<feed generated_at="2026-03-28T09:00:00Z">
  <offer id="206111" available="true">
    <name>USB-C Dock</name>
    <price currency="USD">129.90</price>
    <picture>https://cdn.example.test/1.jpg</picture>
    <picture>https://cdn.example.test/2.jpg</picture>
  </offer>
</feed>
XML;

$result = FastXmlToArray::convert($xml);

var_export($result);

Output:

array (
  'n' => 'feed',
  'a' =>
  array (
    'generated_at' => '2026-03-28T09:00:00Z',
  ),
  's' =>
  array (
    0 =>
    array (
      'n' => 'offer',
      'a' =>
      array (
        'id' => '206111',
        'available' => 'true',
      ),
      's' =>
      array (
        0 =>
        array (
          'n' => 'name',
          'v' => 'USB-C Dock',
        ),
        1 =>
        array (
          'n' => 'price',
          'v' => '129.90',
          'a' =>
          array (
            'currency' => 'USD',
          ),
        ),
        2 =>
        array (
          'n' => 'picture',
          'v' => 'https://cdn.example.test/1.jpg',
        ),
        3 =>
        array (
          'n' => 'picture',
          'v' => 'https://cdn.example.test/2.jpg',
        ),
      ),
    ),
  ),
)

This output is not as immediately pleasant to read, but it is very
useful when you care about consistent traversal and adapters.

That becomes valuable when you want to:

  • build wrappers on top of a stable node contract;
  • walk the structure programmatically;
  • distinguish explicitly between element names, values, attributes, and children;
  • create internal tooling that should not depend on the shape of one specific XML document.

In other words, prettyPrint() is great when the output is the
destination. convert() is great when the output is an intermediate
representation.
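To show what "walk the structure programmatically" can look like, here is a minimal recursive sketch over the n / v / a / s shape. The function valuesByName() is a hypothetical helper of my own, and the input is the normalized output from above written out by hand, so the snippet runs without the package installed.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: collects the direct values of every node with
 * the given name, depth-first, from an n/v/a/s hierarchy.
 */
function valuesByName(array $node, string $name): array
{
    $found = [];

    if (($node['n'] ?? null) === $name && isset($node['v'])) {
        $found[] = $node['v'];
    }

    // 's' is absent on leaf nodes in the output above, so default to [].
    foreach ($node['s'] ?? [] as $child) {
        $found = array_merge($found, valuesByName($child, $name));
    }

    return $found;
}

$feed = [
    'n' => 'feed',
    'a' => ['generated_at' => '2026-03-28T09:00:00Z'],
    's' => [
        [
            'n' => 'offer',
            'a' => ['id' => '206111', 'available' => 'true'],
            's' => [
                ['n' => 'name', 'v' => 'USB-C Dock'],
                ['n' => 'price', 'v' => '129.90', 'a' => ['currency' => 'USD']],
                ['n' => 'picture', 'v' => 'https://cdn.example.test/1.jpg'],
                ['n' => 'picture', 'v' => 'https://cdn.example.test/2.jpg'],
            ],
        ],
    ],
];

$pictures = valuesByName($feed, 'picture');
var_export($pictures);
```

The walker never asks what kind of document it is looking at. Because every node has the same keys, the same function works for any feed.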

Arrays are only useful if they are easy to navigate

Sometimes a plain array is enough.

Sometimes you want something slightly higher-level without going back
to low-level XML logic.

That is where XmlElement fits very nicely.

You can take the normalized hierarchy returned by
FastXmlToArray::convert() and wrap it in XmlElement for
convenient traversal.

Here is a simple example:

use SbWereWolf\XmlNavigator\Conversion\FastXmlToArray;
use SbWereWolf\XmlNavigator\Navigation\XmlElement;

require_once __DIR__ . '/vendor/autoload.php';

$xml = <<<'XML'
<catalog region="eu">
  <offer id="1001" available="true">
    <name>Keyboard</name>
    <tag>office</tag>
    <tag>usb</tag>
  </offer>
</catalog>
XML;

$root = new XmlElement(FastXmlToArray::convert($xml));
// pull() returns an iterator over matching children; current() takes the first one.
$offer = $root->pull('offer')->current();

echo $root->name() . PHP_EOL;
echo $root->get('region') . PHP_EOL;
echo ($root->hasElement('offer') ? 'yes' : 'no') . PHP_EOL;

foreach ($offer->attributes() as $attribute) {
    echo $attribute->name() . '=' . $attribute->value() . PHP_EOL;
}

$tagValues = array_map(
    static fn (XmlElement $tag): string => $tag->value(),
    $offer->elements('tag')
);

var_export($tagValues);

This is a useful middle ground.

The data is still array-based and application-friendly, but
navigation becomes clearer:

  • name() for the current element name;
  • get() for attributes;
  • hasElement() to check for children;
  • pull() or elements() to navigate down the structure.

This is often cleaner than passing around raw nested arrays with
hardcoded indexes everywhere.

Why this matters in feed processing

Feed processing is usually repetitive.

You receive XML, extract records, normalize them, validate them, and
push them further into the pipeline.

That means the most practical XML question is often not:

“Which library can represent XML most completely?”

It is:

“Which approach gets me from XML to application-ready records with
the least friction?”

That is why plain PHP arrays are such a strong target format for feed
work.

They are easy to:

  • inspect;
  • serialize;
  • compare in tests;
  • transform;
  • validate;
  • store;
  • hand off to other services.

By contrast, keeping XML structures alive deep into the business
layer usually increases the amount of incidental complexity.

What about large feeds?

For large XML feeds, the array-conversion story should not force you
back into full-document loading.

This is where the streaming entry points matter.

If you want readable application-friendly output directly from
selected nodes in a large document, there is
FastXmlParser::extractPrettyPrint().

Here is a compact example:

use SbWereWolf\XmlNavigator\Parsing\FastXmlParser;

require_once __DIR__ . '/vendor/autoload.php';

$uri = tempnam(sys_get_temp_dir(), 'xml-extract-kit-');
file_put_contents($uri, <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
  <offer id="1">
    <name>Keyboard</name>
    <price>49.90</price>
  </offer>
  <service id="s-1">
    <name>Warranty</name>
  </service>
  <offer id="2">
    <name>Mouse</name>
    <price>19.90</price>
  </offer>
</catalog>
XML);

$reader = XMLReader::open($uri);

if ($reader === false) {
    throw new RuntimeException('Cannot open XML file.');
}

// The predicate receives the reader cursor and returns true for the elements to extract.
$offers = FastXmlParser::extractPrettyPrint(
    $reader,
    static fn (XMLReader $cursor): bool => $cursor->name === 'offer'
);

foreach ($offers as $offer) {
    echo json_encode(
        $offer,
        JSON_PRETTY_PRINT | JSON_UNESCAPED_SLASHES
    ) . PHP_EOL;
}

$reader->close();
unlink($uri);

That gives you the same extraction-first workflow as in the earlier
articles, but with output that is already convenient for application
code.

So the package is not making you choose between streaming and
readable arrays.

It is designed to give you both.
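Since the extracted offers are consumed with a plain foreach, they compose with ordinary PHP iteration helpers. As one example, here is a sketch that groups streamed records into batches before they go to validation or persistence. The inBatches() helper is hypothetical, and a small stand-in generator with an illustrative record shape replaces the real extraction so the snippet is self-contained.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: groups any iterable of records into fixed-size
 * batches without materializing the whole stream.
 */
function inBatches(iterable $records, int $size): Generator
{
    if ($size < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $batch = [];
    foreach ($records as $record) {
        $batch[] = $record;
        if (count($batch) === $size) {
            yield $batch;
            $batch = [];
        }
    }

    if ($batch !== []) {
        yield $batch; // final, possibly shorter batch
    }
}

// Stand-in for the extracted offers; the record shape is illustrative only.
$offers = (static function (): Generator {
    yield ['id' => '1', 'name' => 'Keyboard'];
    yield ['id' => '2', 'name' => 'Mouse'];
    yield ['id' => '3', 'name' => 'Monitor'];
})();

$batchSizes = [];
foreach (inBatches($offers, 2) as $batch) {
    // A real pipeline would validate and persist the batch here.
    $batchSizes[] = count($batch);
}

var_export($batchSizes);
```

Only one batch is held in memory at a time, so the memory profile of the streaming extraction carries through to the rest of the pipeline.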

A useful decision rule

For practical PHP work, I think the following rule holds up well:

  • use prettyPrint() when you want readable arrays now;
  • use convert() when you want a stable internal node model;
  • use XmlElement when you want to traverse normalized arrays more comfortably;
  • use extractPrettyPrint() when the XML is large and you only want selected records in readable form;
  • use extractHierarchy() when the XML is large and you want selected records in normalized form.

This is a much more actionable way to think about XML work than
asking for a single “best XML library.”

One more practical point: XML should not leak everywhere

I think one of the easiest mistakes in integration code is to let
transport concerns leak too far.

A feed arrives as XML, so suddenly everything downstream starts
thinking in XML terms:

  • node trees;
  • cursor state;
  • fragment parsing;
  • nested traversal rules.

That is usually unnecessary.

A much cleaner architecture is:

  1. receive XML;
  2. convert it into a representation your application actually likes;
  3. keep business logic focused on plain PHP data.

This is exactly why array-first conversion is so useful. It creates
a boundary.

The XML stays near the integration edge, where it belongs.
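One way to express that boundary in code is a small interface that hands out plain records. The sketch below is an assumption about how an application might be structured, not something the package prescribes, and OfferSource, ArrayOfferSource, and countAvailable() are hypothetical names.

```php
<?php
declare(strict_types=1);

/** Hypothetical boundary: business code depends on this, not on XML. */
interface OfferSource
{
    /** @return iterable<array<string, mixed>> plain offer records */
    public function offers(): iterable;
}

/**
 * In-memory source, handy for tests. An XML-backed implementation
 * would live at the integration edge and return the same arrays.
 */
final class ArrayOfferSource implements OfferSource
{
    public function __construct(private array $records)
    {
    }

    public function offers(): iterable
    {
        return $this->records;
    }
}

/** Business logic sees only plain arrays. */
function countAvailable(OfferSource $source): int
{
    $count = 0;
    foreach ($source->offers() as $offer) {
        if (($offer['available'] ?? false) === true) {
            $count++;
        }
    }

    return $count;
}

$source = new ArrayOfferSource([
    ['id' => 1, 'available' => true],
    ['id' => 2, 'available' => false],
]);

$available = countAvailable($source);
echo $available . PHP_EOL;
```

Swapping the in-memory source for one that reads a feed changes nothing in countAvailable(), and that is how I check that XML has not leaked past the edge.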

Conclusion

In modern PHP projects, XML is often not the thing you want to work
with. It is the thing you need to get past.

That is why converting XML feeds to plain PHP arrays is such a
practical strategy.

Readable arrays are ideal when you want immediate
application-friendly data. Normalized arrays are ideal when you want
a stable traversal model. And for large feeds, streaming extraction
keeps memory use bounded without sacrificing useful output.

That combination is what I wanted from XmlExtractKit:

  • XML as input;
  • arrays as output;
  • streaming when needed;
  • low friction in the application layer.

If that is the kind of PHP XML workflow you deal with,
sbwerewolf/xml-navigator is built for exactly that use case.

composer require sbwerewolf/xml-navigator
