Nicholas Volkhin

Posted on Apr 17

When to Use XmlExtractKit Instead of General XML Tools in PHP

#opensource #php #xml #parsing

One of the easiest ways to make XML work painful in PHP is to start with the wrong question.

A lot of developers ask:

“What is the best XML library for PHP?”

That sounds reasonable, but it is usually the wrong framing.

There is no single best XML tool for every job.

The real question is narrower:

What kind of XML task am I solving?

That matters because XML work in PHP usually falls into very different categories:

load a small document and read a few values;
manipulate the full document tree;
stream through a large file safely;
extract repeated business records from large XML;
validate or transform XML as XML.

Those are not the same problem, so they should not lead to the same tool choice.

This is the distinction I care about when I use or build XML tooling for PHP.

My package, XmlExtractKit (sbwerewolf/xml-navigator), is not trying to win every XML scenario. It is built for one narrower and very common class of work:

large XML → selected nodes → plain PHP arrays

If that is your actual task, it can be a much better fit than a more general XML tool. If it is not your task, you should probably use something else.

First: what XmlExtractKit is actually for

Before comparing tool categories, it helps to be explicit about the package goal.

XmlExtractKit is built for the boring XML jobs that show up in real systems:

feeds;
imports and exports;
marketplace catalogs;
partner integrations;
ETL pipelines;
SOAP-ish payloads;
legacy endpoints where XML is still the transport format.

In those systems, the application usually does not want to live inside an XML tree.

It usually wants to:

read XML safely;
extract only matching records;
convert them to arrays;
continue with validation, normalization, persistence, or queue publishing.

That is why the package is centered around entry points such as:

FastXmlToArray::prettyPrint();
FastXmlToArray::convert();
FastXmlParser::extractPrettyPrint();
FastXmlParser::extractHierarchy();
XmlElement for traversal of normalized arrays.

So the right comparison is not “Is this package better than every XML library?”

The right comparison is:

Is my task primarily full-document XML work, or is it extraction-oriented application work?

Category 1: small XML, simple reads

Sometimes the job is tiny.

You receive a small XML payload, you need two or three values, and that is it.

Typical cases:

read a config-like XML file;
parse a short API response;
inspect a small test fixture;
run a one-off maintenance script.

For work like that, convenience usually matters more than architecture.

A simple API that loads the whole document can be perfectly fine because:

the file is small;
memory usage is not a concern;
the code is short-lived or trivial;
you do not need a reusable extraction workflow.

This is not where I would reach for XmlExtractKit first.

If the XML is small and the task is simple, the cheapest solution is often the right one.
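For a sense of scale, a small read like this can be a handful of lines with SimpleXML. This is only a minimal sketch: the payload, its element names, and the variable names are made up for illustration, and it assumes the SimpleXML extension is enabled.

```php
<?php
// Hypothetical config-like payload; the names are illustrative only.
$xml = <<<'XML'
<settings env="staging">
    <timeout>30</timeout>
    <endpoint>https://example.test/api</endpoint>
</settings>
XML;

// Load the whole document at once: fine for a small file.
$settings = simplexml_load_string($xml);
if ($settings === false) {
    throw new RuntimeException('Could not parse the settings XML');
}

// Attributes are read with array syntax, child elements with property syntax.
$env = (string) $settings['env'];
$timeout = (int) $settings->timeout;
$endpoint = (string) $settings->endpoint;
```

Nothing here streams or normalizes anything, and for two or three values it does not need to.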

Category 2: full-document manipulation

There is another class of XML work that is very different from extraction.

Sometimes you really do need the whole document tree.

Examples:

insert or remove nodes across different branches;
reorder sections of the document;
update attributes in multiple places;
build or rewrite XML as XML;
perform document-level transformations.

That is a full-document problem.

In that case, tree-oriented tools and more general XML tooling make much more sense than an extraction-first package.

Why?

Because the center of gravity is different.

You are not trying to stream through repeated records and emit arrays. You are trying to work with the XML document itself as a structured tree.

XmlExtractKit is not trying to be:

an XML editor;
a full XML query language;
a schema validation framework;
a document transformation engine;
a large abstraction layer over every XML concern.

If your real work is document-wide manipulation, use tools designed for document-wide manipulation.
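As an illustration of what document-wide work tends to look like, here is a rough sketch using PHP's built-in DOM extension. The catalog structure and attribute names are hypothetical; the point is only that the code edits several branches of one in-memory tree and ends with XML, not arrays.

```php
<?php
// Hypothetical document; the structure is made up for illustration.
$xml = <<<'XML'
<catalog>
    <section id="new"><item sku="A1"/></section>
    <section id="archive"><item sku="B2" status="old"/></section>
</catalog>
XML;

$doc = new DOMDocument();
$doc->preserveWhiteSpace = false;
$doc->loadXML($xml);

$xpath = new DOMXPath($doc);

// Update an attribute in multiple places.
foreach ($xpath->query('//item') as $item) {
    $item->setAttribute('checked', 'yes');
}

// Remove nodes from one branch (copy the list first, then modify the tree).
foreach (iterator_to_array($xpath->query('//item[@status="old"]')) as $old) {
    $old->parentNode->removeChild($old);
}

// Insert a node into another branch.
$target = $xpath->query('//section[@id="new"]')->item(0);
$added = $doc->createElement('item');
$added->setAttribute('sku', 'C3');
$target->appendChild($added);

// The end result is still XML.
$result = $doc->saveXML($doc->documentElement);
```

Every step depends on having the full tree available, which is exactly what an extraction-first streaming approach avoids.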

Category 3: large XML, but low-level control is enough

Now we get closer to the problems XmlExtractKit is meant to address.

Suppose the XML file is large.

You know loading it fully is a bad idea, so you switch to XMLReader and stream through it node by node. That is already the correct direction.

For some projects, raw XMLReader is enough.

That is true when:

the extraction rule is very simple;
the script is one-off;
the output shape is minimal;
you do not expect to reuse the logic;
you are comfortable writing and maintaining cursor-level code.

In those cases, a hand-written loop is often fine.

A minimal baseline might look like this:

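$reader = XMLReader::open('feed.xml');
if ($reader === false) {
    throw new RuntimeException('Cannot open feed.xml');
}

while ($reader->read()) {
    if (
        $reader->nodeType === XMLReader::ELEMENT
        && $reader->name === 'offer'
    ) {
        // Materialize only the current <offer> subtree, not the whole file
        $xml = $reader->readOuterXml();
        $offer = simplexml_load_string($xml);
        if ($offer === false) {
            continue; // skip a record that cannot be parsed
        }

        $data = [
            'id' => (string) $offer['id'],
            'name' => (string) $offer->name,
            'price' => (string) $offer->price,
        ];

        // process $data
    }
}

$reader->close();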

There is nothing wrong with this if the task stays small.

The problem is that many XML integrations do not stay small.

Sooner or later, you accumulate:

more fields;
optional nodes;
attributes and values in different places;
nested structures;
repeated child elements;
normalization rules;
multiple feeds with similar logic;
duplication across projects.

That is the point where low-level control stops being the main concern.

The main concern becomes maintainable extraction.
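To make that growth concrete, here is a sketch of what the hand-written loop tends to turn into once optional nodes and repeated children appear. It uses only built-in XMLReader and SimpleXML. The feed shape, the readOffers() helper, and the output keys are all hypothetical.

```php
<?php
// Hypothetical helper: streams <offer> elements and maps each one to an array.
function readOffers(XMLReader $reader): Generator
{
    while ($reader->read()) {
        if ($reader->nodeType !== XMLReader::ELEMENT || $reader->name !== 'offer') {
            continue;
        }

        $offer = simplexml_load_string($reader->readOuterXml());
        if ($offer === false) {
            continue; // skip a record that cannot be parsed
        }

        // Optional node: <oldprice> may be absent.
        $oldPrice = isset($offer->oldprice) ? (string) $offer->oldprice : null;

        // Repeated children: zero or more <picture> elements.
        $pictures = [];
        foreach ($offer->picture as $picture) {
            $pictures[] = (string) $picture;
        }

        yield [
            'id' => (string) $offer['id'],
            'name' => trim((string) $offer->name),           // normalization rule
            'price' => (float) (string) $offer->price,
            'currency' => (string) $offer->price['currency'], // '' when missing
            'old_price' => $oldPrice,
            'pictures' => $pictures,
        ];
    }
}

// Inline sample so the sketch is self-contained; a real feed would be a file.
$xml = <<<'XML'
<feed>
    <offer id="1">
        <name> Keyboard </name>
        <price currency="USD">129.90</price>
        <picture>a.jpg</picture>
        <picture>b.jpg</picture>
    </offer>
    <offer id="2">
        <name>Mouse</name>
        <price>19.50</price>
        <oldprice>25.00</oldprice>
    </offer>
</feed>
XML;

$reader = new XMLReader();
$reader->XML($xml);
$offers = iterator_to_array(readOffers($reader), false);
$reader->close();
```

None of this is hard, but every new field or feed variant adds another branch to code like this, and the same mapping logic tends to get copied between projects.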

Category 4: large XML and repeated extraction tasks

This is the sweet spot for XmlExtractKit.

If your task looks like this:

open a large XML stream;
select only matching elements;
convert them to arrays;
hand those arrays to application code;
repeat this pattern across projects;

then a focused extraction toolkit is often the better choice.

The value is not just performance. The value is the shape of the code.

A streaming extraction example with XmlExtractKit looks like this:

use SbWereWolf\XmlNavigator\Parsing\FastXmlParser;

require_once __DIR__ . '/vendor/autoload.php';

$reader = XMLReader::open('feed.xml');

foreach (
    FastXmlParser::extractPrettyPrint(
        $reader,
        static fn (XMLReader $cursor): bool =>
            $cursor->nodeType === XMLReader::ELEMENT
            && $cursor->name === 'offer'
    ) as $offer
) {
    // process $offer as a plain PHP array
}

$reader->close();

Or, when you want a stable normalized structure for traversal and not just a pretty-printed array:

use SbWereWolf\XmlNavigator\Parsing\FastXmlParser;

require_once __DIR__ . '/vendor/autoload.php';

$reader = XMLReader::open('feed.xml');

foreach (
    FastXmlParser::extractHierarchy(
        $reader,
        static fn (XMLReader $cursor): bool =>
            $cursor->nodeType === XMLReader::ELEMENT
            && $cursor->name === 'offer'
    ) as $offer
) {
    // process normalized hierarchy
}

$reader->close();

This is still a streaming model. It still relies on XMLReader underneath. But the application code is now centered on the real task:

target the elements you care about;
get arrays back;
continue with your business pipeline.

That is the exact problem the package is trying to solve.

Category 5: XML as XML versus XML as transport

This distinction is more important than it sounds.

Some teams work with XML as a primary document format. In that world, XML structure itself is the thing they care about most.

Other teams work with XML only because an external system forces them to.

In those projects, XML is just a transport envelope.

The application does not want to “stay in XML.” It wants to get out of XML as early as possible.

That is usually what happens in:

feed processing;
integration middleware;
import jobs;
back-office syncs;
data ingestion pipelines.

If that describes your system, you will usually benefit more from:

XML stream → selected nodes → arrays

than from a broad, document-centric XML toolkit.

That is why FastXmlToArray::prettyPrint() and FastXmlToArray::convert() are important entry points in XmlExtractKit. They help you turn XML into application-friendly structures early instead of making the rest of your code care about cursor state or DOM traversal.

Category 6: normalized traversal after conversion

There is one more case where a focused extraction/conversion toolkit helps.

Sometimes you do not want a raw “pretty” array for immediate processing. You want a stable internal shape you can traverse predictably.

That is where FastXmlToArray::convert() and XmlElement fit nicely.

For example:

use SbWereWolf\XmlNavigator\Conversion\FastXmlToArray;
use SbWereWolf\XmlNavigator\Navigation\XmlElement;

require_once __DIR__ . '/vendor/autoload.php';

$xml = <<<'XML'
<product sku="KB-1001">
    <name>Mechanical Keyboard</name>
    <price currency="USD">129.90</price>
</product>
XML;

$root = new XmlElement(FastXmlToArray::convert($xml));

$name = $root->pull('name')->current()?->value();
$currency = $root->pull('price')->current()?->get('currency');
$price = $root->pull('price')->current()?->value();

This is useful when you want:

a stable normalized hierarchy;
traversal without re-parsing XML repeatedly;
a representation that is still close to XML structure, but easier to work with than raw cursor logic.

That is another area where a focused toolkit can be more practical than either a tiny one-off parser or a much broader XML stack.

A simple decision matrix

Here is the practical version.

Your task	Best starting point
Small XML, quick read, no reuse expected	A simple full-document approach
Full-document manipulation or transformation	A document/tree-oriented XML tool
Large XML, one-off extraction, you are comfortable with low-level code	Raw `XMLReader`
Large XML, repeated record extraction, output should be arrays	`FastXmlParser::extractPrettyPrint()`
Large XML, repeated record extraction, but you want normalized hierarchy	`FastXmlParser::extractHierarchy()`
Convert XML to arrays for later traversal	`FastXmlToArray::convert()` + `XmlElement`
Convert XML to readable PHP arrays immediately	`FastXmlToArray::prettyPrint()`
You need custom key names in the output structure	`XmlConverter` or `XmlParser`

This is the most useful way to think about the package.

Not as a universal XML winner, but as the right answer for a very specific class of jobs.

When XmlExtractKit is probably the better fit

I would reach for XmlExtractKit when most of these are true:

the XML can be large;
I only need some of the document;
the file contains repeated business records;
I want arrays, not DOM-heavy application code;
I expect similar extraction tasks in more than one project;
I want to keep the rest of the system unaware of XML cursor mechanics.

Typical examples include:

supplier catalog imports;
marketplace feed ingestion;
partner exports;
ETL pipelines;
XML payload normalization before queueing or persistence;
old integrations being consumed by otherwise modern PHP systems.

When a general XML tool is probably the better fit

I would not pick XmlExtractKit first when most of these are true:

the XML is small;
I need full-document traversal and rewriting;
the end result should remain XML, not arrays;
I need schema-heavy or transformation-heavy tooling;
the work is more about XML documents than application data extraction.

That is not a weakness of the package. It is exactly what a focused tool should look like.

A sharp tool is useful because it knows what it is not trying to be.

The practical mistake to avoid

The biggest mistake is to force every XML task into the same mental model.

Developers often do one of these two things:

use a full-document approach for a large extraction problem;
use low-level cursor code for a recurring application-level extraction problem that really wants a better abstraction.

Both create unnecessary cost.

The first creates avoidable memory pressure and awkward processing flows.

The second creates avoidable glue code and long-term maintenance pain.

XmlExtractKit exists in the space between those mistakes.

It is for the case where:

XMLReader is the right low-level engine,
but raw XMLReader is too close to the metal for the amount of extraction work you actually do.

Conclusion

The useful question is not:

“What is the best XML tool in PHP?”

The useful question is:

“Am I manipulating XML documents, or extracting application data from XML streams?”

If your task is primarily document-centric, general XML tools are the right place to start.

If your task is primarily extraction-centric, especially for large feeds, repeated records, and array-based application pipelines, then XmlExtractKit can be a much better fit.

That is the core positioning of the package:

stream XML, extract only what matters, and keep working with plain PHP arrays.

If that is the problem you keep solving, then a focused tool is often more useful than a general one.

Try it

composer require sbwerewolf/xml-navigator

Explore the demo project

git clone https://github.com/SbWereWolf/xml-extract-kit-demo-repo.git
cd xml-extract-kit-demo-repo
composer install
