How to Extract Data from Spanish Invoices (Facturas) Into Excel
A Spanish invoice (factura) looks like a standard EU invoice at first glance — supplier name, date, line items, total. But three fields make it structurally different from a German, French, or Italian invoice: a tax ID in NIF format that follows an 8-digits-plus-check-letter pattern, a tax breakdown that can carry up to three IVA rates on the same page (21%, 10%, and 4%), and an IRPF withholding line that subtracts from the total instead of adding to it. If you type these fields into Excel manually, the format differences create friction at every step. If you use a template-based extraction tool trained on generic EU invoice layouts, the NIF validation fails, the IVA breakdown lands in the wrong columns, and the IRPF field goes entirely unrecognized. This article walks through every field on a Spanish invoice, explains what each one means for extraction, and shows how to get clean, reconciled data into a spreadsheet without manual entry.
Key Takeaways
- An extraction tool that works perfectly on German and Italian invoices can still silently produce wrong totals on a Spanish factura.
- Three IVA rates (21%, 10%, 4%) on the same page break template-based tools that expect one VAT line, and IRPF retention — a tax line that subtracts from the total instead of adding to it — goes unrecognized entirely.
- A semantic reader like ImageToTable.ai extracts every IVA rate, IRPF withholding, and NIF check letter regardless of layout, so one reconciliation formula (Total = Base + ΣIVA − IRPF) catches every mismatched row in the spreadsheet.
What Makes a Spanish Invoice Different from Other EU Formats
The EU's VAT Directive (2006/112/EC) sets a common framework for invoice content across all member states. Every EU invoice must show a supplier VAT number, a date, a description, and the VAT amount. That minimum baseline is the same from Lisbon to Helsinki. What makes a Spanish invoice different is what Spain adds on top.
Spain's invoicing obligations are governed by Real Decreto 1619/2012, which specifies mandatory fields beyond the EU minimum. The Agencia Estatal de Administración Tributaria (AEAT, Spain's tax authority) enforces these rules through quarterly tax filings. An invoice that misses a required field can block IVA (VAT) deduction for the recipient and trigger penalties of 1-2% of the transaction value.
Four structural differences separate a Spanish invoice (factura) from the generic European template:
| Field | Generic EU Invoice | Spanish Invoice (Factura) |
|---|---|---|
| Tax ID format | VAT number (country code + alphanumeric, variable length) | NIF/CIF (Número de Identificación Fiscal): 8 digits + 1 check letter for individuals, 1 letter + 7 digits + 1 check for companies |
| VAT rates | Usually 1-2 rates (standard + reduced) | Up to 3 IVA rates on the same invoice: 21% (general), 10% (reducido), 4% (super-reducido) — each with its own Base Imponible and Cuota |
| Income tax withholding | Not present on most EU invoices | IRPF (Impuesto sobre la Renta de las Personas Físicas) retention — typically 15% or 7% subtracted from the total. Appears as a negative tax line |
| Invoice numbering | Sequential number, no series requirement | Serie + Número: invoices organized into named series (e.g. 2026-F-0001), with separate mandatory series for corrective invoices (Rectificativas) |
These aren't cosmetic differences. A template-based extraction tool that expects a single VAT rate per invoice will split a Spanish invoice's 21% and 10% IVA lines across two separate rows — or worse, merge them into a single column without preserving which base corresponds to which rate. A tool trained on German or UK invoice layouts will have no concept of a retention line, because in those markets the invoice total is simply net + VAT. On a Spanish invoice, the total is often net + IVA − IRPF, and getting the arithmetic wrong means the extracted figure won't match what was actually paid.
The extraction challenge isn't that Spanish invoices are harder to read. It's that they carry data categories most extraction tools weren't built to recognize. Once you know what those categories are, the extraction step becomes predictable.
Spanish Invoice Fields You Need to Extract — and What They Mean
Under Real Decreto 1619/2012, Article 6, a full invoice (factura completa) must contain at minimum eleven data points. A simplified invoice (factura simplificada), used for amounts under €400 or in sectors like hospitality and transport, omits the recipient's NIF and address. For extraction purposes, you need to account for both variants: a batch of 30 supplier invoices is likely to contain a mix.
| Field (Spanish Name) | What It Contains | Extraction Note |
|---|---|---|
| Número y Serie (Number and Series) | Sequential invoice number within a named series. Example: 2026-F-000123 or R-2026-045 | Series prefix indicates type (F = factura, R = rectificativa). Extract both as separate columns to track series continuity |
| Fecha de Expedición (Issue Date) | Date the invoice was issued. For B2B, it must be issued before the 16th day of the month following the month of the operation (the month in which the IVA accrued) | Spanish date format is DD/MM/YYYY. AI extraction normalizes to your locale automatically |
| Fecha de Operación (Operation Date) | Date the goods or services were supplied, if different from issue date | Present on roughly 40% of invoices. If blank, operation date = issue date |
| NIF/CIF Emisor (Issuer Tax ID) | Supplier's tax identification number. Individuals: 8 digits + 1 letter (e.g. 12345678Z). Companies: 1 letter + 7 digits + 1 check character (e.g. B12345674) | The final character is a check character calculated from the digits before it (always a letter for individuals; a digit or a letter for companies, depending on the entity type). An invalid check character flags a data entry error, a misread during extraction, or a fraudulent invoice |
| Razón Social (Legal Name) | Full registered business name of both issuer and recipient | May differ from trading name. Always extract Razón Social rather than the logo brand name |
| Dirección Fiscal (Tax Address) | Registered fiscal address for both parties | Required on factura completa, absent on factura simplificada for the recipient |
| Descripción (Description) | Description of goods or services, including quantity and unit price | Must be detailed enough to identify the transaction. Vague descriptions increase audit risk |
| Base Imponible (Taxable Base) | Net amount before tax, broken down per IVA rate applied | A single invoice may contain multiple Base Imponible lines — one per IVA rate plus one for exempt operations |
| Tipo de IVA (IVA Rate) + Cuota IVA (IVA Amount) | Percentage (21%, 10%, or 4%) and corresponding euro amount for each rate | Extract as pairs: each rate must stay associated with its base and its amount |
| Retención IRPF (IRPF Withholding) | Percentage (typically 15% or 7%) and euro amount withheld. Mainly on B2B service invoices from individual professionals (autónomos) | This is a negative line. Total Payable = Base Imponible + IVA − IRPF. Template-based tools often fail to parse a negative tax line |
| Importe Total (Total Amount) | Final amount due, including all taxes and withholdings | When IRPF is present, the total does NOT equal Base + IVA. Cross-verify: Total = Base + IVA − IRPF |
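The check-character rule in the NIF/CIF row is easy to automate once the tax ID sits in its own column. Below is a minimal Python sketch, assuming the publicly documented algorithms: for individuals (and NIE holders, whose X/Y/Z prefix maps to 0/1/2) the letter is the number modulo 23 looked up in a fixed 23-letter table; for companies the control character is computed from the seven middle digits. The function name `is_valid_nif` is hypothetical, and the sketch is deliberately simplified: it enforces a digit for A/B/E/H prefixes and a letter for P/Q/S, accepts either form for the other company prefixes it handles, and does not cover special K/L/M identifiers.

```python
import re

DNI_LETTERS = "TRWAGMYFPDXBNJZSQVHLCKE"  # check letter = DNI_LETTERS[number % 23]
CIF_LETTERS = "JABCDEFGHI"               # letter form of the company control digit 0-9
CIF_PREFIXES = "ABCDEFGHJNPQRSUVW"       # entity-type letters handled in this sketch


def _cif_control_digit(digits: str) -> int:
    """Control digit for the 7 middle digits of a company NIF (CIF)."""
    total = 0
    for position, ch in enumerate(digits, start=1):
        d = int(ch)
        if position % 2 == 0:
            total += d                             # even positions: add as-is
        else:
            doubled = d * 2
            total += doubled // 10 + doubled % 10  # odd positions: double, then add its digits
    return (10 - total % 10) % 10


def is_valid_nif(raw: str) -> bool:
    """Return True if the extracted NIF/NIE/CIF has a consistent check character."""
    nif = re.sub(r"[\s.-]", "", raw).upper()

    # Individual: 8 digits + check letter (e.g. 12345678Z)
    if re.fullmatch(r"\d{8}[A-Z]", nif):
        return nif[-1] == DNI_LETTERS[int(nif[:8]) % 23]

    # Foreign resident (NIE): X/Y/Z + 7 digits + check letter; X=0, Y=1, Z=2
    if re.fullmatch(r"[XYZ]\d{7}[A-Z]", nif):
        number = int(str("XYZ".index(nif[0])) + nif[1:8])
        return nif[-1] == DNI_LETTERS[number % 23]

    # Company: entity letter + 7 digits + control digit or letter
    if re.fullmatch(r"[A-Z]\d{7}[0-9A-J]", nif) and nif[0] in CIF_PREFIXES:
        d = _cif_control_digit(nif[1:8])
        control = nif[-1]
        if nif[0] in "ABEH":
            return control == str(d)
        if nif[0] in "PQS":
            return control == CIF_LETTERS[d]
        return control in (str(d), CIF_LETTERS[d])

    return False
```

With this sketch, `is_valid_nif("12345678Z")` and `is_valid_nif("B12345674")` return `True`, while `is_valid_nif("B12345678")` returns `False` because the computed control digit for `1234567` is 4. Rows that fail the check are the ones to compare against the original PDF first.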
The most common extraction error is treating the Importe Total as Base Imponible + IVA and ignoring the IRPF deduction. On a €1,000 service invoice with 21% IVA and 15% IRPF, the total is €1,060, not €1,210. A tool that ignores the IRPF line will overstate the payable amount by €150 per invoice; one that adds IRPF instead of subtracting it overstates it by €300. At 30 such invoices a month, even the smaller error adds up to a €4,500 monthly discrepancy in your accounts payable ledger.
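The arithmetic is worth checking mechanically. Here is a minimal Python sketch, assuming both IVA and IRPF are calculated on the same Base Imponible (as in the €1,000 example) and using `Decimal` so currency values don't pick up floating-point error. It compares the correct total with the two typical mistakes: dropping the IRPF line, and adding it instead of subtracting it.

```python
from decimal import Decimal

base = Decimal("1000.00")
iva = base * Decimal("21") / 100    # Cuota IVA at 21%: 210.00
irpf = base * Decimal("15") / 100   # Retención IRPF at 15%: 150.00

correct_total = base + iva - irpf   # 1060.00: what is actually payable
ignoring_irpf = base + iva          # 1210.00: IRPF line dropped
adding_irpf = base + iva + irpf     # 1360.00: IRPF sign flipped

print("Overstated by (IRPF ignored):", ignoring_irpf - correct_total)  # 150.00
print("Overstated by (IRPF added):  ", adding_irpf - correct_total)    # 300.00

invoices_per_month = 30  # illustrative monthly volume from the paragraph above
print("Monthly gap (IRPF ignored):", (ignoring_irpf - correct_total) * invoices_per_month)  # 4500.00
```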
The IVA Breakdown Problem: Why One Invoice Can Have Three Tax Rates
Spain applies three IVA rates under Ley 37/1992 (LIVA): 21% general, 10% reduced, and 4% super-reduced. The 10% rate covers food products, passenger transport, hospitality, and certain other services. The 4% rate applies to essential goods: bread, milk, books, medicines. A single invoice from a wholesale food distributor might include both 10% items (prepared foods) and 4% items (basic staples) plus 21% on packaging or delivery charges.
When this happens, the invoice must show each rate on its own line with a separate Base Imponible (taxable base) and Cuota (tax amount). Lumping different rates together is non-compliant under Real Decreto 1619/2012. For extraction, this means a single invoice can produce multiple tax rows:
| IVA Rate | Base Imponible | Cuota IVA | Applicable Category |
|---|---|---|---|
| 21% (General) | €200.00 | €42.00 | Packaging, delivery, non-food items |
| 10% (Reducido) | €500.00 | €50.00 | Prepared foods, transport |
| 4% (Super-reducido) | €300.00 | €12.00 | Bread, milk, basic staples |
| Total | €1,000.00 | €104.00 | |
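A quick consistency check on rows like these catches many misreads before they reach the spreadsheet: each Cuota IVA should equal its Base Imponible times the rate, and the per-rate rows should add up to the totals. Below is a minimal Python sketch, assuming the rows have already been extracted as (rate, base, cuota) tuples and allowing one cent of rounding per line; both are assumptions of this sketch, not rules from the text.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

# (IVA rate %, Base Imponible, Cuota IVA) as printed on the invoice
iva_rows = [
    (Decimal("21"), Decimal("200.00"), Decimal("42.00")),
    (Decimal("10"), Decimal("500.00"), Decimal("50.00")),
    (Decimal("4"), Decimal("300.00"), Decimal("12.00")),
]


def check_iva_rows(rows):
    """Return (total base, total cuota, rates whose cuota != base x rate)."""
    mismatched = []
    for rate, base, cuota in rows:
        expected = (base * rate / 100).quantize(CENT, rounding=ROUND_HALF_UP)
        if abs(expected - cuota) > CENT:
            mismatched.append(rate)
    total_base = sum((base for _, base, _ in rows), Decimal("0"))
    total_cuota = sum((cuota for _, _, cuota in rows), Decimal("0"))
    return total_base, total_cuota, mismatched


print(check_iva_rows(iva_rows))
```

For the table above it returns totals of 1000.00 and 104.00 and an empty mismatch list; a misread such as 24.00 instead of 42.00 on the 21% line would put 21 in the mismatch list.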
Many generic invoice extraction tools handle this poorly. They either capture only the first IVA line and ignore the rest, or they sum all IVA amounts into one column without preserving the rate-specific breakdown. That missing breakdown matters: your libro registro de facturas recibidas (the received-invoices register behind Modelo 303, the quarterly IVA return) records the base, rate, and cuota separately for each IVA rate on an invoice. If your extraction output shows €104.00 of IVA with no rate attribution, you cannot complete that register correctly or trace the input-VAT figures in boxes 28-31 of Modelo 303 back to individual invoices.
This is where the extraction method matters. Template-based OCR tools look for a label like "IVA" or "VAT" and grab the number next to it. On a Spanish invoice with three IVA lines, that label appears three times with three different numbers. A semantic extraction approach, one that understands what each rate means rather than where it sits on the page, captures all three rates as distinct data points with their bases and amounts intact. You define columns like "Base Imponible 21%", "Cuota IVA 21%", "Base Imponible 10%", and so on. The AI reads the whole page, identifies each IVA block, and places the correct value in each column regardless of layout.
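Once each IVA block has been identified, turning it into fixed rate-specific columns is a simple flattening step. Here is a minimal Python sketch, assuming a hypothetical intermediate format where each block is a dict with `rate`, `base`, and `cuota` keys (this is not ImageToTable.ai's actual output format). Rates missing from an invoice stay as empty cells (`None`) rather than zero, and a rate outside 21/10/4 gets its own column instead of being dropped.

```python
from decimal import Decimal

STANDARD_RATES = ("21", "10", "4")  # Spanish IVA rates pre-seeded as columns


def iva_columns(iva_blocks):
    """Flatten extracted IVA blocks into 'Base Imponible X%' / 'Cuota IVA X%' columns."""
    row = {}
    for rate in STANDARD_RATES:
        row[f"Base Imponible {rate}%"] = None
        row[f"Cuota IVA {rate}%"] = None

    for block in iva_blocks:
        rate = block["rate"]
        for key, value in ((f"Base Imponible {rate}%", block["base"]),
                           (f"Cuota IVA {rate}%", block["cuota"])):
            # Two blocks at the same rate (e.g. split across pages) are summed
            row[key] = value if row.get(key) is None else row[key] + value
    return row


blocks = [
    {"rate": "21", "base": Decimal("200.00"), "cuota": Decimal("42.00")},
    {"rate": "4", "base": Decimal("300.00"), "cuota": Decimal("12.00")},
]
row = iva_columns(blocks)
print(row["Base Imponible 4%"], row["Cuota IVA 10%"])  # 300.00 None
```

Keeping blanks distinct from zeros pays off later: a blank 10% column means the rate was not on the invoice, while a zero would suggest an extracted line with no value.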
When the AEAT cross-references your Modelo 303 against your supplier invoices, it is the IVA figures that get compared. A mismatch between the input VAT you report and the IVA documented on your invoices is a red flag that can trigger an inspection.
IRPF Retention (Retención IRPF): The Field That Makes Your Totals Not Add Up
IRPF retention is the single most confusing field on a Spanish invoice for anyone outside Spain. It is an advance payment of income tax, not a sales tax. When an individual professional (autónomo) invoices a Spanish business for services, the client must withhold a percentage of the base amount and pay it directly to the AEAT on the professional's behalf. The professional receives the net amount after withholding, and the withheld sum is credited against their tax liability when they file their annual IRPF return (Modelo 100, the Declaración de la Renta).
The withholding rates are set by Artículo 101 de la Ley del IRPF:
| Rate | Applies To | Duration |
|---|---|---|
| 15% | Professional services (consultants, lawyers, architects, IT, designers): standard rate | From the fourth calendar year of activity onward (counting the year of registration as the first) |
| 7% | New professionals in their first three calendar years of activity | Year of registration plus the following two years |
| 19% | Intellectual property royalties and certain artistic income | Ongoing |
| 1-2% | Agricultural and livestock activities | Ongoing |
| 0% | Invoices to private individuals or non-resident clients; companies (S.L./S.A.) invoicing other businesses | Ongoing; IRPF does not apply to company-to-company invoices |
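If you know when a supplier registered as an autónomo, you can sanity-check the extracted Retención IRPF (%) against the table. Below is a minimal Python sketch that implements only the 7% and 15% rows, using the 7% row's definition (year of registration plus the following two years). Both function names are hypothetical, and the sketch ignores the other rows of the table.

```python
def expected_professional_irpf(registration_year: int, invoice_year: int) -> int:
    """Expected IRPF withholding % on a professional-services invoice (7% or 15%)."""
    if invoice_year < registration_year:
        raise ValueError("invoice year is before the registration year")
    # 7% applies in the registration year and the two following calendar years
    return 7 if invoice_year - registration_year <= 2 else 15


def irpf_rate_flag(extracted_rate: int, registration_year: int, invoice_year: int) -> str:
    """'OK' if the extracted rate matches the expected one, else 'CHECK'."""
    expected = expected_professional_irpf(registration_year, invoice_year)
    return "OK" if extracted_rate == expected else "CHECK"
```

For a professional registered in 2024, `expected_professional_irpf(2024, 2026)` returns 7 and `expected_professional_irpf(2024, 2027)` returns 15. A CHECK can have legitimate reasons, so treat it as a prompt to look at the invoice rather than an automatic rejection.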
On the invoice itself, IRPF appears as a negative line item below the IVA calculation. A typical layout:
Base Imponible: €1,000.00
IVA (21%): +€210.00
IRPF (15%): −€150.00
Total a Pagar: €1,060.00
For extraction, the IRPF line presents a specific challenge: it looks like a discount but behaves like a tax. Some invoices label it "Retención IRPF," others abbreviate it as "IRPF" or "Ret. IRPF." The percentage and amount may appear on the same line or split across two. A template expecting all monetary fields to be positive values will misread −150 as 150 and produce a total of €1,360 instead of €1,060.
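Both problems, label variants and the sign, can be handled by normalizing the IRPF line before it reaches the total check. Here is a minimal Python sketch, assuming amounts are printed in Spanish format (dot for thousands, comma for decimals) and that the label text and amount text have already been extracted separately. The regex covers only the three label variants mentioned above, and the function names are hypothetical.

```python
import re
from decimal import Decimal
from typing import Optional

# "Retención IRPF", "Ret. IRPF", or plain "IRPF" at the start of the label
IRPF_LABEL = re.compile(r"\s*(?:Retenci[oó]n\s+|Ret\.\s*)?IRPF\b", re.IGNORECASE)


def parse_es_amount(text: str) -> Decimal:
    """Parse a Spanish-formatted amount such as '1.060,00 €' or '-150,00'."""
    cleaned = (text.replace("€", "")
                   .replace("\u2212", "-")   # typographic minus sign
                   .replace("\u00a0", "")    # non-breaking space
                   .replace(" ", ""))
    cleaned = cleaned.replace(".", "").replace(",", ".")  # 1.060,00 -> 1060.00
    return Decimal(cleaned)


def irpf_deduction(label: str, amount_text: str) -> Optional[Decimal]:
    """Return the IRPF amount as a positive value to subtract, or None if not an IRPF line."""
    if not IRPF_LABEL.match(label):
        return None
    # The printed sign varies between suppliers; IRPF always reduces the total
    return abs(parse_es_amount(amount_text))
```

Returning a positive amount and always subtracting it downstream means that "−150,00", "-150,00", and "150,00" all lead to the same €1,060 total.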
The cross-check is simple: once extracted, verify that Importe Total = Base Imponible + ∑Cuota IVA − Retención IRPF. If the numbers don't reconcile, the extraction missed either an IVA rate or the IRPF line.
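The same cross-check can be written as a short Python sketch, assuming the bases, cuotas, IRPF amount, and total have already been parsed into `Decimal` values. The function name and the two-cent tolerance (to absorb per-line rounding) are assumptions; IRPF is passed through `abs()` so a negative printed sign cannot flip the result.

```python
from decimal import Decimal
from typing import Sequence


def reconcile(bases: Sequence[Decimal], cuotas: Sequence[Decimal], irpf: Decimal,
              importe_total: Decimal, tolerance: Decimal = Decimal("0.02")) -> str:
    """'OK' if Importe Total = sum(bases) + sum(cuotas) - IRPF within tolerance, else 'DIFF'."""
    expected = sum(bases, Decimal("0")) + sum(cuotas, Decimal("0")) - abs(irpf)
    return "OK" if abs(expected - importe_total) <= tolerance else "DIFF"


# €1,000 service invoice, 21% IVA, 15% IRPF
print(reconcile([Decimal("1000.00")], [Decimal("210.00")], Decimal("150.00"), Decimal("1060.00")))  # OK
# Same invoice with the total misread as Base + IVA
print(reconcile([Decimal("1000.00")], [Decimal("210.00")], Decimal("150.00"), Decimal("1210.00")))  # DIFF
```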
Invoice Numbering Rules and Why They Matter for Data Reconciliation
Spanish invoice numbering under Real Decreto 1619/2012 follows a Serie + Número structure. A series is a named prefix (like "2026-F" for standard invoices or "R-2026" for corrective ones), and numbers run sequentially within that series with no gaps allowed. The series system exists because Spanish law requires separate numbering streams for different invoice types: ordinary invoices, simplified invoices, and corrective invoices (facturas rectificativas) must each have their own series.
A corrective invoice (factura rectificativa) does not simply cancel the original. It references the original invoice by number and date, states the reason for correction, and shows either the difference from the original amounts or the full corrected amounts (rectificación por diferencias or por sustitución). The adjustment can be positive (you undercharged) or negative (you overcharged, or the client returned goods). Under Real Decreto 1619/2012, a rectificativa must be issued as soon as the error is detected and no later than four years after the tax on the original operation accrued; after that, the invoice can no longer be corrected this way.
For data reconciliation, the series prefix acts as a document type classifier. A batch of invoices that mixes F-series (standard) and R-series (rectificativa) needs different treatment: an R-series invoice does not represent a new payable — it adjusts an existing one. Failing to separate them means counting the same expense twice or, in the case of a negative rectificativa, reconciling a payable that never existed.
The extraction strategy: define two columns — "Invoice Series" and "Invoice Number" — and extract them independently. Then set up a computed column or Excel formula that flags any R-series prefix for manual review. The AI reads the series prefix exactly as printed, whether it's "2026-F-000123," "R-2026/045," or "REC-001."
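If you want to apply the same split yourself after export, for example to invoice IDs that came out as a single string, here is a minimal Python sketch. It assumes the invoice number is the trailing run of digits and that everything before it (minus one "-" or "/" separator) is the series; that heuristic and the `split_invoice_id` name are hypothetical, not a rule from Real Decreto 1619/2012.

```python
import re
from typing import NamedTuple


class InvoiceId(NamedTuple):
    series: str         # e.g. "2026-F", "R-2026", "REC"
    number: str         # kept as text so leading zeros survive
    needs_review: bool  # True for R-series (rectificativa) prefixes


# Lazy series, optional separator, then the trailing digits as the number
_INVOICE_ID = re.compile(r"(?P<series>.*?)[-/]?(?P<number>\d+)")


def split_invoice_id(raw: str) -> InvoiceId:
    match = _INVOICE_ID.fullmatch(raw.strip())
    if match is None:
        raise ValueError(f"no trailing invoice number in {raw!r}")
    series = match.group("series")
    return InvoiceId(series, match.group("number"), series.upper().startswith("R"))


for raw in ("2026-F-000123", "R-2026/045", "REC-001"):
    print(split_invoice_id(raw))
```

In the spreadsheet itself, a formula such as `=IF(LEFT(B2,1)="R","REVIEW","")` does the same flagging, assuming the series sits in column B.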
Step-by-Step: Extracting Spanish Invoice Data into Excel
Here is the workflow for extracting data from a set of Spanish invoices (facturas) into a single Excel spreadsheet. The approach uses semantic AI extraction — the tool reads the invoice by understanding what each field means rather than where it sits on the page. This matters because Spanish invoices from different suppliers can place the same field in completely different positions: one supplier puts NIF in the top-right corner, another places it in a footer block below the line items, and a third embeds it in a QR code. Positional extraction breaks when layout changes. Semantic extraction does not.
Upload your Spanish invoices
Drag and drop PDFs, scanned images, or screenshots of facturas into the upload area. The tool accepts PDFs, JPGs, PNGs, and web screenshots. Batch upload all invoices at once — 10, 50, or more in a single session. No need to pre-sort by supplier, format, or invoice type.
Define your extraction columns
Type the column names matching the Spanish invoice fields you need: "NIF Emisor," "Razón Social," "Número de Factura," "Serie," "Fecha de Expedición," "Base Imponible 21%," "Cuota IVA 21%," "Base Imponible 10%," "Cuota IVA 10%," "Base Imponible 4%," "Cuota IVA 4%," "Retención IRPF (%)," "Retención IRPF (€)," "Importe Total." The column names you type become the headers of your output spreadsheet. You can also add a computed column to auto-flag reconciliation errors, for example: "Verification (OK if the sum of all Base Imponible and Cuota IVA columns − Retención IRPF (€) = Importe Total, otherwise DIFF)."
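If you'd rather keep the verification check as a live Excel formula in the exported file, you can generate it row by row. Below is a minimal Python sketch that only builds the formula text; the column letters in the example are a hypothetical layout. Empty cells (a rate not on the invoice) count as zero in Excel arithmetic, so missing rates don't break the sum, and IRPF is wrapped in `ABS()` so it is subtracted whether it was stored as 150 or -150.

```python
def verification_formula(row: int, amount_cols: list[str], irpf_col: str,
                         total_col: str, tolerance: float = 0.01) -> str:
    """Excel formula: OK if all bases + cuotas - |IRPF| match the total, else DIFF."""
    amounts = "+".join(f"{col}{row}" for col in amount_cols)
    return (f"=IF(ABS({amounts}-ABS({irpf_col}{row})-{total_col}{row})<={tolerance},"
            '"OK","DIFF")')


# Hypothetical layout: F-K hold the three Base Imponible / Cuota IVA pairs,
# M holds Retención IRPF (€), N holds Importe Total
print(verification_formula(2, ["F", "G", "H", "I", "J", "K"], "M", "N"))
# =IF(ABS(F2+G2+H2+I2+J2+K2-ABS(M2)-N2)<=0.01,"OK","DIFF")
```

Excel stores formulas inside the .xlsx file with English function names and comma separators, which is the form a library writing the file should emit. If you type the formula by hand in a Spanish-locale Excel, use the localized form instead: `SI` in place of `IF`, semicolons between arguments, and `0,01` for the tolerance.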
Process and review
Click process. The AI reads every invoice, locates each field by understanding its semantic meaning (not its pixel position), and populates the output table. Review the results in the browser. The verification column flags any invoice where the IVA breakdown and IRPF don't reconcile, so you can spot-check those first. Processing takes 5-10 seconds per page.
Export to Excel and reconcile
Download the complete dataset as an XLSX file. Each row is one invoice, each column is one field. The data is pre-normalized: dates in consistent format, amounts as numeric values (not text strings), and verification columns pre-computed. From here, the spreadsheet feeds directly into your accounting software, Modelo 303 preparation, or AP reconciliation workflow.
Files are processed securely and not stored.
Format Variations: Factura Completa, Simplificada, Rectificativa, and Proforma
Not every invoice you receive from a Spanish supplier will be a factura completa with all eleven mandatory fields. Spanish tax law recognizes several invoice types, and a typical AP inbox contains a mix:
| Invoice Type | Use Case | Key Extraction Difference |
|---|---|---|
| Factura Completa (Ordinaria) | Standard B2B invoice. All mandatory fields present. | Full extraction: all 11 fields available. Contains both issuer and recipient NIF, full address, and complete tax breakdown. |
| Factura Simplificada | Amounts not exceeding €400 (IVA included), or up to €3,000 in sectors such as hospitality, transport, and parking. Mostly B2C transactions. | Missing recipient NIF and address. IVA rate still required but no per-rate Base Imponible breakdown; the total with IVA included suffices. Not valid for input VAT deduction by the recipient unless it also shows the recipient's NIF and address and the IVA amount separately. |
| Factura Rectificativa | Correcting a previously issued invoice. Must reference original invoice and state reason for correction. | Issued in its own series, commonly prefixed R- or REC-. Shows either the difference from the original or the full corrected amounts alongside the original values. Extraction must capture the referenced invoice number and the amounts being corrected. |
| Factura Recapitulativa | Bundles multiple operations for the same client into a single monthly bill. | May contain line items from different dates and different IVA rates. Each line may have its own tax treatment. |
| Factura Proforma | Preliminary invoice, not a tax document. Used for quotations or advance payment requests. | Not a valid tax invoice. Should include "Proforma" label prominently. Do not include in IVA declarations. Extract for reference only. |
The extraction strategy adapts to whichever type lands in your inbox. For a simplificada without recipient NIF, the NIF column simply stays blank and the row remains valid. For a rectificativa, the series prefix triggers a flag for manual review. The AI reads what's on the page rather than expecting a fixed schema.
Beyond Single-Invoice Extraction: What Comes Next
Extracting one Spanish invoice into Excel solves the data entry problem for that one document. The larger operational challenge (processing 30, 50, or 200 supplier invoices at once, consolidating IVA across all of them for Modelo 303, and aggregating IRPF withholdings for cross-checking Modelo 111) is covered in the guide on batch processing Spanish supplier invoices into an accounts payable spreadsheet, which walks through multi-supplier workflows and quarterly tax consolidation.
For businesses processing Spanish invoices alongside invoices from other markets, the same extraction approach works across formats. A tool that reads a page visually rather than parsing XML does not care whether the invoice is a Spanish factura, a Mexican CFDI, or a French facture: it reads the numbers and labels the same way a person would. For a comparison of extraction costs across Spanish-speaking markets, see the guides on Spanish-speaking markets document extraction on a budget and on affordable CFDI extraction for Mexican small businesses.
FAQ
Does AI extraction work with handwritten Spanish invoices?
Yes. The extraction handles both printed and handwritten Spanish invoices, including cursive script. The same column-name approach works: the AI identifies "NIF" and "Base Imponible" by their semantic meaning regardless of whether the text is typed or handwritten. Accuracy on clear handwriting is comparable to printed text; heavily stylized or faint handwriting may require spot-checking.
Can the tool handle FacturaE XML files, or only PDFs?
The extraction tool reads the visual layer of documents — PDFs, images, and screenshots. It does not parse XML directly. If you have a FacturaE XML file, the data is already structured and machine-readable; you do not need extraction. The tool is designed for the far more common scenario: the supplier sent a PDF version of the invoice, not the XML, and you need to get the data out of the visual representation.
What happens when IVA rates change or a new rate is introduced?
Since the extraction is semantic rather than template-based, it adapts to any IVA rate printed on the invoice. If the Spanish government introduces a new rate or changes an existing one, the AI reads whatever rate appears on the page. No template updates or retraining required. The same applies to IRPF rate changes.
How does the tool handle invoices from non-Spanish EU suppliers with reverse-charge IVA?
When a Spanish business receives an invoice from another EU country under the reverse-charge mechanism (inversión del sujeto pasivo), the invoice carries a reverse-charge mention such as "IVA: Inversión del sujeto pasivo" or "0% IVA (Art. 84 LIVA)" and no tax amount. The AI extracts the 0% IVA notation and the legal article reference. Because reverse-charge IVA is self-assessed by the recipient rather than charged by the supplier, the extracted data correctly shows no IVA charged on this invoice; you calculate the self-assessed amount separately. For intra-EU transactions, that self-assessed IVA is declared on Modelo 303 (as output VAT and, where deductible, as input VAT), and the operation is also listed on Modelo 349, the recapitulative statement of intra-community operations.
How accurate is the extraction on multi-page Spanish invoices?
The AI processes all pages of a multi-page document as a single unit. Line items spanning multiple pages are captured in sequence. Running headers, footers, and page numbers are recognized as repeated elements and not duplicated in the output. Accuracy on multi-page invoices is comparable to single-page documents.
The Extraction Step Spanish Invoices Actually Need
Spanish invoices carry more structural complexity than their EU counterparts: multi-rate IVA, IRPF retention that reverses the usual tax arithmetic, and a series-based numbering system with separate streams for corrections. Most extraction tools were built for a simpler invoice model. The result is either incomplete data (missing IVA rates, missing IRPF) or incorrect totals (a total that doesn't match what was paid, because the IRPF subtraction was ignored). Semantic extraction changes the equation by reading the invoice the way a Spanish contable would: identifying each field by what it means, not by where a template expects to find it. The output is a clean spreadsheet where Base Imponible 21% sits in one column, Base Imponible 10% in another, IRPF in its own column, and the total reconciles.