Why Getting an XML Invoice Doesn't End Manual AP Data Extraction
In Belgium, over one million businesses registered on the Peppol network in the first weeks of 2026. Croatia processed four million e-invoices in its first twenty-eight days. Thirteen EU countries now enforce mandatory e-invoicing, and seven more will join them before the end of 2027. The narrative accompanying these rollouts has been consistent: structured XML invoices will eliminate manual data entry, reduce errors, and deliver straight-through processing. But when Ardent Partners surveyed 204 AP organizations for its 2025 State of ePayables report, the numbers told a different story. Only 51.4% of invoices arrived electronically. Just 35.4% processed straight through without human touch. And 66% of AP teams still manually key invoice data into their ERP — a figure that rose from the previous year, not fell. This article examines the gap between what e-invoicing promised and what AP teams actually experience, and why that gap is structural, not transitional.
Key Takeaways
- E-invoicing mandates promise to end manual data entry — but 66% of AP (accounts payable) teams still key invoice data into their ERP by hand, and that number went up in 2025.
- Four structurally different XML schemas can land in the same inbox and each be a valid e-invoice under its national mandate, because EN 16931 and the national schemas around it were written for tax authority interoperability, not for what your ERP actually expects.
- Running one content-reading extraction pipeline collapses German XRechnung, French Factur-X, Italian FatturaPA, and emailed PDFs into the same six columns. None of them needs a per-country XML mapping, and ImageToTable.ai handles the entire mixed-format batch in one run.
The Black Box: What Happens After an XML Invoice Lands in Your Inbox
An e-invoice is not just a digital document. It is a data payload: a structured XML file, in UBL or CII syntax, carrying invoice information in machine-readable fields according to the European standard EN 16931, published in 2017 by the European Committee for Standardization (CEN) through its technical committee CEN/TC 434. The standard defines over 160 semantic data fields: invoice number, issue date, seller and buyer tax identifiers, line-item quantities, unit prices, VAT categories, payment terms, delivery addresses, and so on. In theory, this structured payload should flow directly from the supplier's system into the buyer's ERP, with no human eyes, no keyboard, and no copy-paste.
The theory breaks at the first step: your ERP.
Most ERP systems — SAP, Oracle NetSuite, Microsoft Dynamics 365, Workday — do not ingest raw EN 16931 XML natively. They expect invoice data in their own internal format, mapped to their own field names, through their own API or import template. A Peppol BIS 3.0 invoice arrives as UBL 2.1 XML with a tag structure like <cbc:InvoiceTypeCode>380</cbc:InvoiceTypeCode>. Your ERP expects a field called Invoice Type with a value of Commercial Invoice. Someone — or something — needs to translate. That translation layer is what the e-invoicing narrative treats as a solved problem. For many AP teams, it is not solved. It is where the manual work moves, not where it ends.
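As a rough illustration of that translation step, the sketch below reads the InvoiceTypeCode from a UBL 2.1 invoice and maps it to an ERP-side label. The `ERP_INVOICE_TYPES` table and the function name are assumptions made for illustration. Only the pairing of 380 with Commercial Invoice comes from the example above, and a real ERP would define its own labels and import API.

```python
import xml.etree.ElementTree as ET

# Namespace URI behind the "cbc" prefix in UBL 2.1 documents.
CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"

# Hypothetical ERP-side labels keyed by UNCL 1001 document type code.
# Only the 380 entry is taken from the article; the rest is illustrative.
ERP_INVOICE_TYPES = {
    "380": "Commercial Invoice",
    "381": "Credit Note",
}


def erp_invoice_type(ubl_xml: str) -> str:
    """Translate the UBL invoice type code into the label the ERP expects."""
    root = ET.fromstring(ubl_xml)
    code = root.findtext(f"{{{CBC}}}InvoiceTypeCode")
    if code is None:
        raise ValueError("document has no cbc:InvoiceTypeCode element")
    code = code.strip()
    if code not in ERP_INVOICE_TYPES:
        # An unmapped code is exactly where a human usually gets pulled in.
        raise ValueError(f"unmapped invoice type code: {code}")
    return ERP_INVOICE_TYPES[code]


SAMPLE = (
    '<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" '
    f'xmlns:cbc="{CBC}">'
    "<cbc:InvoiceTypeCode>380</cbc:InvoiceTypeCode>"
    "</Invoice>"
)

print(erp_invoice_type(SAMPLE))
```

Even this single field needs a lookup table that someone owns and updates. Multiply that by every coded field and every schema, and the size of the integration layer becomes clearer.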
According to the Billentis/EESPA market report covering 2019-2025, less than 20% of businesses send structured electronic invoices via EDI or equivalent networks. Roughly two-thirds issue PDF invoices by email. Even among the businesses that receive XML invoices, the data rarely reaches the ERP without an intermediary step: a middleware platform, a Peppol access point, an integration layer, or — in the case of the 66% of AP teams still doing manual entry — a pair of human eyes reading from a visual representation of the XML and typing into a screen. The e-invoice is structured. The last mile is not.
An e-invoice is machine-readable by design. Your ERP is machine-readable by design. The problem is that they were not designed to read each other. Between them sits an integration layer that someone has to build, configure, and maintain — and for a surprising number of AP teams, that layer is still a person.
One Invoice, 164 Fields, 6 You Actually Need
EN 16931 specifies what an electronic invoice must or may contain, and the list is expansive: seller and buyer legal entities, tax representative, payee, delivery information, payment instructions, allowance and charge details, line-item-level tax breakdowns, invoice period, order reference, contract reference, project reference, and more. A fully populated EN 16931 invoice contains well over 100 discrete data points across multiple hierarchical nesting levels.
Your AP team needs six of them.
The fields your ERP actually requires for posting — supplier name, invoice number, invoice date, net amount, VAT amount, due date — are a small subset of what the XML contains. The other 150+ fields are noise. They are there for tax authority validation, for the Peppol network's routing logic, for the supplier's archival compliance. They are not there for you. But every integration that does a full XML import pulls all of them in, and someone needs to map, validate, and maintain those mappings for every supplier, every country, and every XML schema variant.
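To make the contrast concrete, here is a minimal sketch of selective extraction from a UBL 2.1 invoice: six paths, six columns, everything else ignored. The column names follow the six fields listed above. The element paths are the ones commonly used in Peppol-style UBL invoices, but treat them as assumptions to verify against your suppliers' files, since some suppliers populate the party name in a different element.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

NS = {
    "cbc": "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2",
    "cac": "urn:oasis:names:specification:ubl:schema:xsd:CommonAggregateComponents-2",
}

# Column name -> path inside a UBL 2.1 invoice (assumed paths, see lead-in).
SIX_FIELDS = {
    "Supplier": "cac:AccountingSupplierParty/cac:Party/"
                "cac:PartyLegalEntity/cbc:RegistrationName",
    "Invoice #": "cbc:ID",
    "Date": "cbc:IssueDate",
    "Net": "cac:LegalMonetaryTotal/cbc:TaxExclusiveAmount",
    "VAT": "cac:TaxTotal/cbc:TaxAmount",  # first TaxTotal only
    "Due Date": "cbc:DueDate",
}


def extract_six(ubl_xml: str) -> dict:
    """Return only the six posting fields; missing fields come back as None."""
    root = ET.fromstring(ubl_xml)
    row = {}
    for column, path in SIX_FIELDS.items():
        text = root.findtext(path, namespaces=NS)
        row[column] = text.strip() if text else None
    for column in ("Net", "VAT"):
        if row[column] is not None:
            row[column] = Decimal(row[column])  # avoid float rounding on money
    return row


SAMPLE = f"""<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"
         xmlns:cbc="{NS['cbc']}" xmlns:cac="{NS['cac']}">
  <cbc:ID>INV-2026-0451</cbc:ID>
  <cbc:IssueDate>2026-02-03</cbc:IssueDate>
  <cbc:DueDate>2026-03-05</cbc:DueDate>
  <cac:AccountingSupplierParty><cac:Party><cac:PartyLegalEntity>
    <cbc:RegistrationName>Example Supplier NV</cbc:RegistrationName>
  </cac:PartyLegalEntity></cac:Party></cac:AccountingSupplierParty>
  <cac:TaxTotal><cbc:TaxAmount currencyID="EUR">210.00</cbc:TaxAmount></cac:TaxTotal>
  <cac:LegalMonetaryTotal>
    <cbc:TaxExclusiveAmount currencyID="EUR">1000.00</cbc:TaxExclusiveAmount>
  </cac:LegalMonetaryTotal>
</Invoice>"""

print(extract_six(SAMPLE))
```

Note that this still hard-codes one schema. The same six columns would need a second set of paths for CII, a third for FatturaPA, and so on, which is the maintenance burden the next section describes.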
This reality points to a counterintuitive economics problem that most e-invoicing ROI models miss. The cost of setting up and maintaining full XML schema mappings across dozens or hundreds of suppliers can exceed the cost of simply extracting the six fields you need from any document format — XML, PDF, or image. The e-invoicing infrastructure was built to close the tax authority's information gap. It was not built to close your AP team's data extraction gap. Those are two different problems, and solving the first does not automatically solve the second.
The Vertex 2025 e-invoicing research report surveyed businesses across mandated markets and found that technology integration ranks as the number-one pain point for 55% of respondents, rising to 63% among companies operating in multiple countries. Half of all respondents flagged data governance as a significant concern. These are not companies that have failed to adopt e-invoicing. These are companies that have adopted it and are now dealing with what comes after.
The number of fields in an e-invoice that your AP team does not need is not a technical curiosity. It is the reason full XML import is often more expensive than selective extraction. Every field you don't need is a field you pay to map, validate, and maintain — across every supplier, every schema, and every ERP upgrade cycle.
Four Countries, Four XML Dialects, One AP Inbox
EN 16931 is a standard, not a format. It defines the semantic meaning of invoice fields and allows two syntaxes: UBL 2.1 and UN/CEFACT Cross Industry Invoice (CII). Each country then publishes a CIUS (Core Invoice Usage Specification) that tailors the standard to national tax rules by tightening field requirements, and some national systems, such as Italy's FatturaPA and Poland's KSeF, use their own XML schemas outside those two syntaxes altogether. The result is a landscape where four structurally different XML schemas can arrive in the same inbox, each a valid e-invoice under its national mandate, while being incompatible with each other's import mappings.
| Country | Schema / Format | Syntax Base | Container | What Makes It Different |
|---|---|---|---|---|
| Germany | XRechnung | UBL 2.1 or CII | Pure XML | No visual layer. Mandatory for B2G, expanding to B2B through 2028. The field for service date is not explicitly mandatory but the recipient must validate it under §14 UStG (German VAT Act) — a compliance gap that stops touchless processing. |
| Germany & France | ZUGFeRD / Factur-X | CII D22B | PDF/A-3 with embedded XML | Hybrid format. Five profiles from MINIMUM (header data only) to EXTENDED (full line-item detail). Discrepancies between the visual PDF layer and the embedded XML are a documented operational hazard. |
| Italy | FatturaPA | Custom XML | Pure XML via SdI | Pre-dates EN 16931. Mandatory since 2019 for all B2B, B2C, B2G. Uses its own XML schema with Italian-specific fields (CIG, CUP procurement codes) that have no equivalent in other national schemas. |
| Poland | KSeF FA(3) | Proprietary XML | Pure XML via national platform | Real-time clearance model. The tax authority validates every invoice before delivery. The XML schema is the FA(3) format — successor to FA(2) — and is not aligned with UBL or CII syntax. |
If your company operates across Germany, France, Italy, and Poland — a footprint that describes thousands of mid-market European businesses — your AP inbox receives four structurally different XML schemas that all call themselves e-invoices. You need four separate import mappings, four sets of validation rules, and four maintenance pipelines that break whenever a national tax authority updates its schema. The update cadence is not theoretical. Poland's KSeF migration from FA(2) to FA(3) required every integrated system to re-map its field definitions. France updated its PPF requirements between the 2025 pilot phase and the 2026 go-live. Germany's XRechnung specification is on version 3.0.1 as of early 2026.
This is not an argument against e-invoicing. It is an argument against the assumption that receiving structured data means receiving data in your structure. The EN 16931 standard was designed for interoperability between tax authorities, not between your supplier's ERP and yours. The national CIUS layer exists precisely because each country's tax code requires different fields, different codes, different validations. Interoperability at the tax level does not deliver interoperability at the AP workflow level — and the gap between those two things is what AP teams live in every day.
If you are building a separate XML import pipeline for each country your suppliers operate in, you are solving a problem that multiplies with every new mandate. The alternative — reading what's actually on the invoice rather than what schema generated it — collapses all four countries into one extraction pipeline.
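For a sense of what the per-country route involves, the sketch below shows only its first step: working out which dialect has arrived before any mapping can run. It dispatches on the local name of the root element. The root names used for FatturaPA and KSeF are assumptions based on the published schemas and should be checked against real files, and a production router would also check the namespace URI and the CIUS identifier rather than the root name alone.

```python
import xml.etree.ElementTree as ET

# Root element local name -> dialect label (assumed names, see lead-in).
ROOT_TO_DIALECT = {
    "Invoice": "UBL 2.1",
    "CrossIndustryInvoice": "UN/CEFACT CII",
    "FatturaElettronica": "FatturaPA",
    "Faktura": "KSeF FA",
}


def detect_dialect(xml_text: str) -> str:
    """Classify an XML invoice by its root element, ignoring the namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree renders qualified tags as "{namespace}local"; keep the local part.
    local_name = root.tag.rsplit("}", 1)[-1]
    return ROOT_TO_DIALECT.get(local_name, "unknown")


print(detect_dialect(
    '<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2"/>'
))
```

Each label returned here would then need its own field mapping, validation rules, and regression tests. The detection step is cheap; the four branches behind it are what multiply.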
The PDF That Won't Go Away
The e-invoicing mandate timeline is accelerating. Belgium went live in January 2026 with near-universal scope. Poland followed in February for large taxpayers. France activates in September 2026 for large and mid-sized companies. Germany's B2B mandate phases in through 2028. For a detailed breakdown of each deadline and its legal basis, see our Europe e-invoicing mandate timeline.
But every mandate contains exclusions, and those exclusions produce a PDF residue that no upcoming deadline will eliminate. The pattern is consistent across jurisdictions:
- Cross-border suppliers are exempt. A German company receiving invoices from a US or Chinese supplier is not covered by any EU e-invoicing mandate. Those suppliers will continue sending PDFs, email attachments, and paper invoices indefinitely.
- B2C transactions are excluded. Consumer invoices, receipts, and retail transactions fall outside structured e-invoicing scope entirely — and yet these documents frequently land in AP workflows for expense reconciliation.
- Small businesses get delayed timelines or permanent exemptions. France defers issuance obligations for micro-enterprises to September 2027. Germany's threshold-based phase-in means businesses below certain turnover levels have no obligation at all. These are often exactly the long-tail suppliers whose invoices already take the most processing time.
- Existing supplier relationships don't convert overnight. A supplier integrated for EDI in 2015 may have no incentive to migrate to Peppol BIS 3.0. Their PDF workflow works. Your mandate doesn't change their systems — it changes your obligation to report, not their obligation to format.
Ardent Partners' data confirms the scale: only 51.4% of invoices arrive electronically, and that figure reflects two decades of e-invoicing progress. The remaining 48.6% — PDF attachments, scanned paper, email bodies, faxes — represent a structural half of the invoice volume that no mandate timeline will bring to zero. Even in Italy, where the SdI system has been mandatory since 2019 and processes over 2 billion e-invoices annually, cross-border PDF invoices continue to arrive daily. The mandate guarantees government reporting. It does not guarantee a clean AP inbox.
Gennai's 2026 State of Invoice Automation report puts the full automation figure at 8% of finance teams. Eight percent. After two decades of e-invoicing development, after billions in market investment, after thirteen live European mandates. The gap is not a transitional inconvenience. It is the permanent operating condition of a global AP function.
E-invoicing mandates close the tax authority's information gap. Your AP team's data extraction gap persists through an entirely different set of channels — cross-border trade, B2C spillover, supplier inertia, and the long tail of businesses that your mandate does not cover. These channels are not closing. They are structural features of global commerce.
One Pipeline for Both Worlds
If your AP team runs one workflow for XML invoices and another for PDF invoices, you have not automated invoice processing. You have doubled the number of workflows your team needs to maintain, each with its own break points, its own integration surface, its own training requirement. The alternative is not to abandon e-invoicing compliance. It is to run a single extraction pipeline that handles both structured XML and unstructured PDF through the same lens, producing the same output schema regardless of what format arrived.
This is the approach that the column-name extraction model was designed for. Instead of building per-country XML schema mappings, you define the fields your ERP actually needs — Supplier, Invoice #, Date, Net, VAT, Due Date — once. Those six column names become the extraction target for every document that enters the pipeline, whether it is a Peppol BIS 3.0 XML from a Belgian supplier, a Factur-X hybrid PDF from a French vendor, a scanned paper invoice from a Chinese manufacturer, or an emailed PDF from a domestic SME not yet covered by the mandate.
The mechanism matters. Unlike schema-based import, which requires precise knowledge of each XML tag structure, Custom Column Extraction reads the content of the document — the actual invoice data — and locates the values that match your column definitions by understanding what each field means, not where it sits in an XML hierarchy. A UBL invoice that writes the invoice number as <cbc:ID>INV-2026-0451</cbc:ID> and a PDF that prints "Invoice INV-2026-0451" in the top-right corner produce the same extraction result into your Invoice # column. No schema mapping. No country-specific configuration. One pipeline.
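The convergence described above can be illustrated with a deliberately simple sketch: one reader for the UBL tag, one for printed text, both filling the same Invoice # column. The regular expression is a hypothetical stand-in for content understanding and is not how any particular product locates values; it only shows that two very different inputs can be normalized to one output.

```python
import re
import xml.etree.ElementTree as ET

CBC = "urn:oasis:names:specification:ubl:schema:xsd:CommonBasicComponents-2"

# Hypothetical pattern for printed numbers such as "Invoice INV-2026-0451".
INVOICE_NO = re.compile(
    r"Invoice\s*(?:No\.?|Number|#)?\s*:?\s*([A-Z]{2,5}-\d{4}-\d{3,})"
)


def invoice_number_from_ubl(xml_text: str):
    """Read the document-level cbc:ID of a UBL invoice."""
    value = ET.fromstring(xml_text).findtext(f"{{{CBC}}}ID")
    return value.strip() if value else None


def invoice_number_from_text(page_text: str):
    """Find the invoice number in text taken from a PDF or OCR layer."""
    match = INVOICE_NO.search(page_text)
    return match.group(1) if match else None


UBL_SAMPLE = (
    '<Invoice xmlns="urn:oasis:names:specification:ubl:schema:xsd:Invoice-2" '
    f'xmlns:cbc="{CBC}"><cbc:ID>INV-2026-0451</cbc:ID></Invoice>'
)
PDF_TEXT_SAMPLE = "ACME GmbH\nInvoice INV-2026-0451\nDate: 03.02.2026"

rows = [
    {"Invoice #": invoice_number_from_ubl(UBL_SAMPLE)},
    {"Invoice #": invoice_number_from_text(PDF_TEXT_SAMPLE)},
]
print(rows)
```

Both rows carry the same value in the same column, which is the property the downstream ERP import cares about.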
For a deeper look at how this approach works across different invoice formats, languages, and number conventions, see our guide on extracting data from invoices with different formats into one unified table.
FAQ
Doesn't e-invoicing eliminate the need for data extraction entirely?
It eliminates one category of data extraction: the kind where a human reads a PDF and types values into an ERP. It does not eliminate the need for a data translation layer between the supplier's XML schema and your ERP's field structure. For companies that have built and maintain that translation layer across all their suppliers and all their operating countries, e-invoicing does deliver straight-through processing. Gennai's 2026 State of Invoice Automation report puts only 8% of finance teams in that state. For the other 92%, an extraction layer that reads both XML and PDF through the same mechanism replaces two separate manual workflows with one automated one.
Can't I just build XML import mappings once per country and be done?
You can, and some organizations do. What most initial estimates understate is the maintenance cost. National tax authorities update their schemas: Poland migrated from FA(2) to FA(3), Germany's XRechnung spec is on version 3.0.1, and France's PPF requirements evolved between pilot and go-live. Each change requires regression testing across the supplier base. For a company operating in four countries with 200 suppliers, mapping maintenance is a recurring operational expense, not a one-time IT project. A content-reading extraction approach sidesteps this by not depending on any XML tag structure: it reads the data itself, not the schema that delivered it.
What about suppliers that send both XML and PDF versions of the same invoice?
This is common with ZUGFeRD/Factur-X hybrid formats, which embed an XML data layer inside a PDF/A-3 container. The PDF layer and the XML layer can diverge: the PDF might contain a complete line-item breakdown while the XML is a MINIMUM profile with no line items, or the XML might reflect a corrected version while the PDF shows the original. A visual extraction approach reads the actual rendered content, which is the version your AP team would see and verify against. Comparing those rendered values with the embedded XML also surfaces discrepancies that a blind XML import would miss.
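One way to act on that divergence risk is a cross-check between the two layers. The sketch below assumes both layers have already been reduced to dictionaries keyed by the same column names; the function name, the column names, and the one-cent tolerance are illustrative choices, not part of any format specification.

```python
from decimal import Decimal


def find_discrepancies(rendered: dict, embedded: dict,
                       tolerance: Decimal = Decimal("0.01")) -> dict:
    """Return {column: (rendered_value, embedded_value)} where the layers disagree.

    Decimal amounts are compared within a tolerance; everything else must
    match exactly. A column present in only one layer counts as a mismatch.
    """
    issues = {}
    for column in sorted(rendered.keys() | embedded.keys()):
        a, b = rendered.get(column), embedded.get(column)
        if isinstance(a, Decimal) and isinstance(b, Decimal):
            if abs(a - b) > tolerance:
                issues[column] = (a, b)
        elif a != b:
            issues[column] = (a, b)
    return issues


# Values read from the visible PDF page versus the embedded XML (made-up data).
pdf_layer = {"Invoice #": "INV-2026-0451", "Net": Decimal("1000.00"),
             "VAT": Decimal("210.00")}
xml_layer = {"Invoice #": "INV-2026-0451", "Net": Decimal("950.00"),
             "VAT": Decimal("210.00")}

print(find_discrepancies(pdf_layer, xml_layer))
```

An invoice that returns an empty dictionary can proceed; anything else is a candidate for human review before posting.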
How does batch processing work when I have a mix of XML and PDF invoices?
With a unified extraction pipeline, batch processing treats XML and PDF as two input formats for the same job. Upload a folder containing 20 Peppol XMLs from Belgian suppliers, 15 emailed PDFs from domestic vendors, and 5 scanned paper invoices from cross-border suppliers — define your columns once, process the entire batch in a single run, and receive one spreadsheet with all 40 invoices in consistent columns. There is no pre-sorting by format, no separate workflows, no manual re-entry for the PDF portion of the batch.
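The shape of such a batch run can be sketched in a few lines. Here `extract` is a hypothetical callable standing in for whatever extraction engine is used, and `stub_extract` exists only so the example runs; the point is that file type never appears in the loop and every row lands in the same columns.

```python
import csv
import io
from pathlib import Path

COLUMNS = ["Supplier", "Invoice #", "Date", "Net", "VAT", "Due Date"]


def run_batch(paths, extract) -> str:
    """Run one extractor over every file and return a single CSV table.

    `extract` is a hypothetical callable: Path -> dict keyed by COLUMNS.
    """
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["Source file"] + COLUMNS,
                            lineterminator="\n")
    writer.writeheader()
    for path in sorted(paths):
        row = extract(path)
        # Missing fields become empty cells so the column layout never shifts.
        writer.writerow({"Source file": path.name,
                         **{c: row.get(c, "") for c in COLUMNS}})
    return out.getvalue()


def stub_extract(path: Path) -> dict:
    """Placeholder extractor used only to make this sketch runnable."""
    return {"Supplier": "Example Supplier", "Invoice #": path.stem}


batch = (
    [Path(f"peppol_{i:02d}.xml") for i in range(20)]   # Belgian Peppol XMLs
    + [Path(f"email_{i:02d}.pdf") for i in range(15)]  # emailed PDFs
    + [Path(f"scan_{i:02d}.jpg") for i in range(5)]    # scanned paper
)
table = run_batch(batch, stub_extract)
print(table.splitlines()[0])
```

With the 20, 15, and 5 files from the example above, the result is one header line followed by 40 rows.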
Does this approach work with Peppol specifically?
Yes. Peppol is a transport network, not an invoice format. The actual file format is UBL 2.1 XML structured according to Peppol BIS Billing 3.0. A visual extraction approach reads the invoice data from the content layer, regardless of whether it arrived via Peppol, email, supplier portal, or any other channel. The Peppol network solves the delivery problem — getting the invoice from the supplier to you. The extraction layer solves the data problem — getting the invoice data into your ERP in the structure your ERP expects.
The Metric That Matters
The e-invoicing industry measures progress by mandate coverage: how many countries, how many businesses, how many invoices pass through government platforms. Those metrics measure tax compliance — a legitimate and important goal. They do not measure what AP teams care about: how many invoices posted to the ERP today without a human touching a keyboard.
If that second number is lower than you expected after your e-invoicing investment, the problem is not that you chose the wrong e-invoicing platform. It is that e-invoicing platforms were designed to solve a different problem. Yours is not the gap between paper and digital. It is the gap between "arrived in the right format" and "arrived in the right fields." Those are two separate gaps. Closing the first was never going to close the second.
The extraction layer that sits between your e-invoicing platform and your ERP is not a temporary bridge to a fully automated future. It is the permanent infrastructure of a world where supplier invoices arrive in multiple formats from multiple jurisdictions under multiple regulatory regimes — and always will. The question is whether that extraction layer is a person, a collection of brittle per-country XML mappings, or a single pipeline that reads what's on the invoice regardless of how it got there.
Test it on your own invoices. XML and PDF, in the same batch, against the columns your ERP actually needs. See if the gap between "received" and "posted" shrinks to what the e-invoicing mandate always implied it would.