How to Extract French Invoice (Facture) Data to Excel
Most extraction tools can find a "total amount" on a PDF. But when a French supplier invoice (facture fournisseur) arrives from a wholesaler in Lyon, displaying three VAT rates on different line items, a SIREN buried in the footer, and a TVA intracommunautaire that starts with "FR" plus a two-digit check key, a generic tool's single "tax" column turns into something your comptable (accountant) sends back for manual correction. The extraction problem isn't about getting some data into a spreadsheet. It's about getting the right data into the columns that match France's accounting structure: the structure your CA3 declaration and your software expect.
Key Takeaways
- Most extraction tools produce a 'Tax' column and assume one tax rate, but a French supplier invoice can carry three different VAT rates alongside 13 legally mandatory fields.
- Collapse three TVA rates into one 'Tax' column and the CA3 VAT return becomes a manual reconciliation your comptable sends back. On the invoice itself, the Code Général des Impôts sets a penalty of €15 per missing or incorrect mandatory item.
- ImageToTable.ai names each column to match what French accounting expects — the company ID (SIREN), the EU VAT number (TVA Intracommunautaire), and a separate VAT column per rate — so the spreadsheet feeds directly into Pennylane or Sage without a single manual fix.
What Makes a French Invoice Different for Data Extraction
A French facture is not a translated US invoice with a "French" label. It is a legal document governed by two statutes: Article L441-9 of the Code de commerce, which lists mandatory fields for all commercial invoices between professionals, and Article 289 of the Code Général des Impôts (CGI), which adds VAT-specific requirements — collectively detailed in Article 242 nonies A of Annexe II to the CGI. A missing field is not an inconvenience. Under Article 1737 of the CGI, the penalty is €15 per missing or incorrect item, capped at one-quarter of the invoice amount.
This legal framing creates extraction challenges that generic invoice OCR tools — built for English-language layouts with a single tax rate and a simple vendor name — cannot handle:
- SIREN vs. TVA intracommunautaire. A French supplier shows two identification numbers: the SIREN (9 digits, unique company identifier in the Sirene register) and the TVA intracommunautaire (FR followed by a 2-digit check key and the SIREN). The extraction tool must capture both, and they belong in different spreadsheet columns for different compliance purposes.
- Multi-rate VAT on a single invoice. A Metro cash-and-carry invoice for a restaurant might show food items at 5.5% TVA, non-alcoholic drinks at 10%, and a piece of kitchen equipment at 20%. A generic extraction that outputs one "tax" column collapses three distinct TVA bases into a number your comptable cannot post to the correct accounts.
- Date de facturation vs. date de livraison. The invoice date (when the document was issued) and the delivery or service date (when the transaction occurred) can differ, and both have legal significance under Article 289 of the CGI.
These are not edge cases. They are standard on any invoice from a French professional supplier to another French business.
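As a rough illustration of keeping the two identifiers apart, the sketch below splits a VAT ID into its check key and embedded SIREN, and runs the Luhn checksum that SIREN numbers are built on. The function names are hypothetical, not part of any tool mentioned here, and the sample values are placeholders.

```python
import re


def is_valid_siren(siren: str) -> bool:
    """Return True if `siren` is exactly 9 digits and passes the Luhn checksum."""
    if not re.fullmatch(r"\d{9}", siren):
        return False
    total = 0
    # Walk right to left, doubling every second digit (Luhn algorithm).
    for position, char in enumerate(reversed(siren)):
        digit = int(char)
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def split_vat_id(raw: str) -> tuple[str, str]:
    """Split a French VAT ID into (check key, SIREN), ignoring spaces."""
    compact = re.sub(r"\s+", "", raw).upper()
    match = re.fullmatch(r"FR([0-9A-Z]{2})(\d{9})", compact)
    if match is None:
        raise ValueError(f"not a French VAT ID: {raw!r}")
    return match.group(1), match.group(2)


key, siren = split_vat_id("FR12 345 678 901")
print(key, siren, is_valid_siren(siren))
```

A check like this can flag a row where the "Supplier SIREN" column received some other 9-digit number from the page. It says nothing about whether the company actually exists in the Sirene register.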
The Mandatory Fields That Define Your Extraction Columns
Before designing extraction columns, you need to know what the law requires on every French invoice. Under Article L441-9 of the Code de commerce and Article 242 nonies A of Annexe II to the CGI, a valid facture must contain these mandatory fields (mentions obligatoires). Each one corresponds to a column you will set up in your extraction spreadsheet:
| # | Mandatory Field | Extraction Column Name | Purpose |
|---|---|---|---|
| 1 | Numéro de facture | Invoice Number | Unique, sequential, no gaps permitted. This is the reference for audit trails and duplicate detection. |
| 2 | Date d'émission | Invoice Date | Determines the tax period for TVA deduction (compte 44566). |
| 3 | Date de la livraison / prestation | Service Date | May differ from the invoice date. Required when distinct. |
| 4 | Identité du vendeur (dénomination sociale, adresse) | Supplier Name | Legal name and siège social (registered office) address of the supplier. |
| 5 | Numéro SIREN/SIRET du vendeur | Supplier SIREN | 9-digit unique identification number. The SIRET adds 5 digits for the establishment. Both appear on the invoice. |
| 6 | N° TVA intracommunautaire | Supplier VAT ID | Format: FR + 2-digit key + SIREN (e.g., FR12 345 678 901). Required for intra-EU transactions and VAT deduction validation. |
| 7 | Identité de l'acheteur (dénomination sociale, adresse) | Buyer Name | Your company's legal name and address. The e-invoicing reform also adds the buyer's SIREN to the required mentions on B2B invoices as it phases in. |
| 8 | Désignation des biens ou services | Description | Precise description — nature, quantity, unit of each item or service. |
| 9 | Prix unitaire HT | Unit Price (excl. tax) | The hors taxes (HT) price per unit. Any discounts, rebates, or deductions must appear explicitly. |
| 10 | Taux de TVA applicable | VAT Rate (%) | If multiple rates apply (20%, 10%, 5.5%, 2.1%), each taxable base and its corresponding tax amount must appear separately. |
| 11 | Montant total HT | Subtotal (excl. tax) | Total before tax. Used to post the expense to the appropriate compte de charges (class 6 account). |
| 12 | Montant total TVA | Total VAT | Total VAT amount. Posts to compte 44566 (TVA déductible sur autres biens et services) for deductible VAT. |
| 13 | Montant total TTC | Total (incl. tax) | The toutes taxes comprises (TTC) total — the amount actually paid. Posts to compte 401 (fournisseurs — trade payables). |
The table above represents the legal minimum. In practice, you will also want to extract payment terms (conditions de paiement), the IBAN/BIC for payment, and pénalités de retard (late payment penalties), which are mandatory on French business invoices under Article L441-10 of the Code de commerce.
This list also explains why tooling matters. A field-specific invoice extraction that lets you define exactly which columns to capture — rather than outputting every token it finds — produces a spreadsheet your comptable can post without rework. When you define "Supplier SIREN" as a column, the extraction tool should return the 9-digit SIREN from the invoice, not the TVA intracommunautaire, not the RCS registration number, and not every 9-digit number it encounters on the page.
Handling Multiple TVA Rates in a Single Invoice
The single biggest source of extraction errors on French invoices is multi-rate TVA. A single facture from a food wholesaler like Metro or Transgourmet routinely carries three VAT rates — 5.5% for basic food items, 10% for prepared foods or non-alcoholic beverages, and 20% for equipment or non-food items. To compound the problem, different suppliers display the TVA breakdown differently: some use a summary block at the bottom, others embed the rate next to each line item, and still others present a separate TVA annex page.
The correct approach is to extract at the line-item level with per-row VAT rates:
| Description | Qty | Unit Price HT | VAT Rate | VAT Amount | Line Total TTC |
|---|---|---|---|---|---|
| Filet de poulet (kg) | 10 | 8.50 | 5.5% | 4.68 | 89.68 |
| Eau minérale 1.5L (pack de 6) | 4 | 3.20 | 5.5% | 0.70 | 13.50 |
| Soda cola 33cl (carton de 24) | 2 | 9.90 | 10% | 1.98 | 21.78 |
| Film alimentaire professionnel | 1 | 45.00 | 20% | 9.00 | 54.00 |
This line-item approach has a downstream advantage that makes your comptable's month-end faster. Each row independently carries its own TVA rate and amount. When the data enters your accounting software (Pennylane, EBP Comptabilité, Cegid, or Sage), the software posts the total TVA déductible to compte 44566 (TVA déductible sur autres biens et services) and the HT base to the appropriate expense account (compte 601 for food bought as raw materials, compte 607 for goods bought for resale, compte 602 for consumable supplies). The alternative, extracting a single total VAT line and then reverse-engineering which items fell under which rate, is exactly the kind of spreadsheet rework that extraction is supposed to eliminate.
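To make the per-rate arithmetic concrete, here is a minimal sketch that recomputes the sample table and groups the HT base and VAT by rate. It assumes VAT is rounded to the cent on each line (half up); some suppliers round once per rate instead, which can shift a total by a cent. `LINES` and `totals_by_rate` are illustrative names.

```python
from collections import defaultdict
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")

# (description, quantity, unit price HT, VAT rate in percent), from the sample table.
LINES = [
    ("Filet de poulet (kg)", 10, Decimal("8.50"), Decimal("5.5")),
    ("Eau minérale 1.5L (pack de 6)", 4, Decimal("3.20"), Decimal("5.5")),
    ("Soda cola 33cl (carton de 24)", 2, Decimal("9.90"), Decimal("10")),
    ("Film alimentaire professionnel", 1, Decimal("45.00"), Decimal("20")),
]


def totals_by_rate(lines):
    """Return {rate: (base HT, VAT)}, with VAT rounded to the cent per line."""
    totals = defaultdict(lambda: [Decimal("0"), Decimal("0")])
    for _description, qty, unit_price, rate in lines:
        base = qty * unit_price
        vat = (base * rate / 100).quantize(CENT, rounding=ROUND_HALF_UP)
        totals[rate][0] += base
        totals[rate][1] += vat
    return {rate: (base, vat) for rate, (base, vat) in totals.items()}


for rate, (base, vat) in sorted(totals_by_rate(LINES).items()):
    print(f"TVA {rate}%: base HT {base}, TVA {vat}")
```

Using `Decimal` rather than floats keeps the cent rounding exact, which matters when the per-rate figures have to match the supplier's summary block.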
If your supplier invoice also carries items at the 2.1% rate (applicable to certain pharmaceutical products and press publications) or items that are TVA-exempt (like certain financial services or educational services), the same principle applies: each line carries its own rate. A correctly structured extraction output requires zero manual recalculation at month-end.
For larger operations processing dozens or hundreds of supplier invoices per period, a batch extraction workflow applies the same column structure to every invoice in a folder, producing a consolidated spreadsheet with consistent TVA columns regardless of how differently each supplier formats its factures.
Step by Step: Extract French Invoice Data into Excel
Here is the extraction workflow from a supplier PDF to an accounting-ready spreadsheet. Each step addresses a specific French-invoice requirement identified in the sections above.
Document-level extraction captures the header fields listed in the mandatory fields table above.
For line-item-level extraction — where you need each row of a facture broken into its own spreadsheet row with individual TVA rates per line — define "Description," "Quantity," "Unit Price HT," "VAT Rate," "VAT Amount," and "Line Total TTC" as your column names alongside the header fields. The output is a complete accounting-ready import file.
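One way to picture the resulting import file is the sketch below, which repeats the header fields on every line-item row and writes a CSV. The column names come from this article. The invoice number, the supplier values, and the semicolon delimiter (a common choice for French-locale Excel) are assumptions for illustration only.

```python
import csv
import io

HEADER_COLUMNS = ["Invoice Number", "Invoice Date", "Supplier Name",
                  "Supplier SIREN", "Supplier VAT ID"]
LINE_COLUMNS = ["Description", "Quantity", "Unit Price HT",
                "VAT Rate", "VAT Amount", "Line Total TTC"]


def flatten_invoice(header: dict, lines: list[dict]) -> list[dict]:
    """Build one spreadsheet row per line item, repeating the header fields."""
    header_part = {col: header.get(col, "") for col in HEADER_COLUMNS}
    return [
        {**header_part, **{col: line.get(col, "") for col in LINE_COLUMNS}}
        for line in lines
    ]


def to_csv(rows: list[dict]) -> str:
    """Serialize rows to CSV text with a fixed, consistent column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=HEADER_COLUMNS + LINE_COLUMNS,
                            delimiter=";")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up header values; the line items reuse the sample table.
header = {"Invoice Number": "FA-0001", "Invoice Date": "2025-01-15",
          "Supplier Name": "Exemple Grossiste SARL"}
lines = [
    {"Description": "Filet de poulet (kg)", "Quantity": 10, "Unit Price HT": "8.50",
     "VAT Rate": "5.5%", "VAT Amount": "4.68", "Line Total TTC": "89.68"},
    {"Description": "Film alimentaire professionnel", "Quantity": 1, "Unit Price HT": "45.00",
     "VAT Rate": "20%", "VAT Amount": "9.00", "Line Total TTC": "54.00"},
]
print(to_csv(flatten_invoice(header, lines)))
```

Fixing the column order in one place means every supplier's invoice lands in the same layout, whatever the original document looked like.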
Where the Extracted Data Goes: Mapping to French Accounting Entries
Extraction is worth the effort only if the data flows into your accounting without rework. Here is how spreadsheet columns map to the French Plan Comptable Général (PCG) entries that your accounting software will post:
| Extracted Column | PCG Account | Account Name | Entry Type |
|---|---|---|---|
| Total (TTC) | 401 | Fournisseurs — Trade payables | Credit (what you owe the supplier) |
| Subtotal (HT) | 607 / 602 / 606* | Purchases of goods for resale / consumable supplies / non-stocked materials and supplies | Debit (expense booking, account varies by purchase type) |
| Total VAT | 44566 | TVA déductible sur autres biens et services | Debit (recoverable VAT on purchases) |
| Service Date | — | Determines tax period for TVA deduction | — |
| Supplier VAT ID | — | TVA deduction validation (VIES check) | — |
* The class 6 expense account varies by the nature of the purchase. Food raw materials → compte 601. Consumable supplies → compte 602. Studies and purchased services → compte 604. Non-stocked materials and supplies → compte 606, with office supplies under compte 6064. Goods bought for resale → compte 607. Your chart of accounts (plan comptable) determines the precise mapping.
This structure means the extraction output can feed directly into the CA3 TVA declaration at period-end. The sum of the Total VAT column (posted to compte 44566) across all supplier invoices becomes the TVA déductible figure on the CA3 form, and the Subtotal (HT) figures by expense category give the purchase bases behind that deduction. A single extraction workflow, done consistently across all incoming factures fournisseurs, replaces the manual process of reading each invoice and typing values into separate accounting screens.
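The mapping above can be sketched as a balanced purchase entry: expense account and 44566 on the debit side, 401 on the credit side. This is a simplified illustration, not how any particular accounting package builds its journal. `purchase_entry` is a hypothetical helper, and the default expense account is an assumption you would replace per purchase type.

```python
from decimal import Decimal

ZERO = Decimal("0")


def purchase_entry(subtotal_ht: Decimal, total_vat: Decimal, total_ttc: Decimal,
                   expense_account: str = "601"):
    """Return (account, debit, credit) lines for one supplier invoice."""
    # Refuse to build an entry from figures that do not add up.
    if subtotal_ht + total_vat != total_ttc:
        raise ValueError("HT + TVA does not equal TTC; check the extracted values")
    return [
        (expense_account, subtotal_ht, ZERO),  # expense booking (class 6)
        ("44566", total_vat, ZERO),            # deductible VAT
        ("401", ZERO, total_ttc),              # amount owed to the supplier
    ]


# Totals of the sample line-item table earlier in this article.
entry = purchase_entry(Decimal("162.60"), Decimal("16.36"), Decimal("178.96"))
for account, debit, credit in entry:
    print(account, debit, credit)
```

The HT + TVA = TTC check doubles as a cheap extraction sanity test: a misread digit in any of the three columns stops the entry before it reaches the journal.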
For teams already using French accounting software, the final step is straightforward. Pennylane accepts structured data imports. EBP Comptabilité imports Excel files into its journal d'achats (purchase journal). Cegid and Sage both support CSV/Excel imports for supplier invoices. The key is that the spreadsheet columns are named and formatted consistently — which is exactly what a structured invoice processing workflow provides.
The 2026 E-Invoicing Reform and What It Means for Your Extraction Workflow
France's e-invoicing reform (réforme de la facturation électronique), formalized by Article 91 of the 2024 Finance Law and Décret n°2024-266 of March 25, 2024, introduces its obligations on a phased timeline. September 1, 2026: every business registered for TVA in France must be capable of receiving electronic invoices from its suppliers, whether through a Plateforme de Dématérialisation Partenaire (PDP, a certified private platform) or the Portail Public de Facturation (PPF, the free state-run central portal), and large and mid-sized companies must begin issuing them. September 1, 2027: small and medium-sized businesses, including TPE and micro-enterprises, must also begin issuing electronic invoices.
The practical impact on your extraction workflow depends on your supplier mix. Large suppliers and government entities — who already use Chorus Pro for B2G invoices — will transition early, sending you Factur-X hybrid PDFs (PDF/A-3 with embedded XML) or UBL/CII structured invoices through a PDP. Small artisan suppliers and micro-enterprises will continue sending standard PDFs well into 2027 and beyond. Your extraction system needs to handle both: structured data that arrives pre-extracted from a PDP, and unstructured PDFs that still need AI-driven field extraction.
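For the structured side, a minimal sketch of reading the per-rate VAT breakdown from a CII XML payload (the format embedded in a Factur-X PDF) might look like this. The namespaces and element names reflect my understanding of the CII schema and should be checked against the Factur-X specification. Pulling the XML attachment out of the PDF needs a PDF library and is left out. The sample document is a made-up fragment, not a complete valid invoice.

```python
import xml.etree.ElementTree as ET
from decimal import Decimal

# Assumed CII namespaces; verify against the Factur-X specification.
NS = {
    "rsm": "urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100",
    "ram": "urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100",
}

SAMPLE = """
<rsm:CrossIndustryInvoice
    xmlns:rsm="urn:un:unece:uncefact:data:standard:CrossIndustryInvoice:100"
    xmlns:ram="urn:un:unece:uncefact:data:standard:ReusableAggregateBusinessInformationEntity:100">
  <rsm:SupplyChainTradeTransaction>
    <ram:ApplicableHeaderTradeSettlement>
      <ram:ApplicableTradeTax>
        <ram:CalculatedAmount>5.38</ram:CalculatedAmount>
        <ram:BasisAmount>97.80</ram:BasisAmount>
        <ram:RateApplicablePercent>5.5</ram:RateApplicablePercent>
      </ram:ApplicableTradeTax>
      <ram:ApplicableTradeTax>
        <ram:CalculatedAmount>9.00</ram:CalculatedAmount>
        <ram:BasisAmount>45.00</ram:BasisAmount>
        <ram:RateApplicablePercent>20</ram:RateApplicablePercent>
      </ram:ApplicableTradeTax>
    </ram:ApplicableHeaderTradeSettlement>
  </rsm:SupplyChainTradeTransaction>
</rsm:CrossIndustryInvoice>
""".strip()


def vat_breakdown(cii_xml: str) -> dict:
    """Return {rate: (base HT, VAT)} from the header-level tax blocks."""
    root = ET.fromstring(cii_xml)
    path = ".//ram:ApplicableHeaderTradeSettlement/ram:ApplicableTradeTax"
    breakdown = {}
    for tax in root.iterfind(path, NS):
        rate = Decimal(tax.findtext("ram:RateApplicablePercent", namespaces=NS))
        base = Decimal(tax.findtext("ram:BasisAmount", namespaces=NS))
        vat = Decimal(tax.findtext("ram:CalculatedAmount", namespaces=NS))
        breakdown[rate] = (base, vat)
    return breakdown


print(vat_breakdown(SAMPLE))
```

The output has the same per-rate shape you would build from an extracted PDF, so both paths can feed the same spreadsheet columns.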
Receiving an electronic invoice is a compliance checkbox. Extracting its structured data into your accounting software — in columns your comptable can use — remains your workflow problem. The reform solves the transmission channel. It does not solve the integration.
This is also why investing in an extraction layer that sits between invoice reception and accounting software makes structural sense. A PDP routes the invoice to your inbox. An AI tool reads the invoice — whether it arrives as a Factur-X, a standard PDF, or a photographed paper copy — and lands the data in your spreadsheet. Your accounting software imports the spreadsheet. None of these three components needs to change when the other two evolve. You can read more about managing this transition cost-effectively as a small French business.
FAQ
Can AI extract data from handwritten French invoices?
Yes, to a point. ImageToTable.ai can read printed text and handwriting on French invoices using a vision model, including handwritten amounts and scribbled supplier notes that appear on smaller artisan invoices. Accuracy on handwriting is lower than on printed text. If an invoice is entirely handwritten with cursive French script, expect lower fidelity than on a printed Metro or Transgourmet facture. The tool works by understanding the meaning of fields, not by template-matching characters, so it can identify a handwritten "Montant total" even when the penmanship varies. But it is not a substitute for requesting a legible invoice from your supplier.
Does the extraction handle invoice line items or only header fields?
Both. You define the columns. For header-only extraction, list fields like "Invoice Number," "Supplier Name," "Subtotal HT," "Total VAT," "Total TTC." For line-item extraction, add "Description," "Quantity," "Unit Price HT," "VAT Rate," "VAT Amount," and "Line Total TTC." The tool recognizes line items as repeating data structures within the invoice and populates one spreadsheet row per line item, with the header fields repeated for each row.
Can it validate the TVA intracommunautaire number format?
ImageToTable.ai extracts the VAT ID as it appears on the invoice — e.g., "FR12 345 678 901." It does not perform real-time VIES (VAT Information Exchange System) validation within the extraction interface. However, once the data is in your spreadsheet, you can run a VIES check using the European Commission's validation service or your accounting software's built-in validation. The extraction step gives you the structured data to validate; the validation step happens in your accounting workflow.
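Before a VIES lookup, an offline check can catch misread digits. The sketch below uses the commonly documented key formula for French VAT numbers, key = (12 + 3 × (SIREN mod 97)) mod 97, and covers numeric keys only. The function names are hypothetical, and passing the check does not prove the number is registered.

```python
import re


def expected_vat_key(siren: str) -> str:
    """Compute the 2-digit numeric check key for a 9-digit SIREN."""
    return f"{(12 + 3 * (int(siren) % 97)) % 97:02d}"


def vat_id_is_consistent(raw: str) -> bool:
    """Return True if the VAT ID has the FR + key + SIREN shape and the key matches."""
    compact = re.sub(r"\s+", "", raw).upper()
    match = re.fullmatch(r"FR(\d{2})(\d{9})", compact)
    if match is None:
        return False
    key, siren = match.groups()
    return key == expected_vat_key(siren)


print(vat_id_is_consistent("FR44 732 829 320"))
print(vat_id_is_consistent("FR12 345 678 901"))
```

Under this formula the illustrative "FR12 345 678 901" used in this article does not carry a matching key, which is expected for a placeholder. On a real invoice, a mismatch usually means a misread character worth a second look.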
How does the tool handle the different French TVA rates on a single invoice?
When you define a line-item column structure — with a dedicated "VAT Rate" column — the AI reads the per-item TVA rate and populates it for each row, distinguishing between 20%, 10%, 5.5%, and 2.1% rates on the same invoice. For invoices that show only a summary TVA block (e.g., "TVA 5.5%: €23.40, TVA 20%: €45.00"), define columns like "Total HT at 5.5%," "TVA 5.5%," "Total HT at 20%," "TVA 20%," and the tool will extract the sub-totals per rate. This is more manual than line-item extraction but handles summary-format supplier invoices correctly.
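For summary-format invoices, a small consistency check helps confirm that each extracted base, rate, and VAT amount belong together. This is a sketch with a hypothetical function name and a one-cent tolerance chosen as an assumption to absorb supplier rounding differences.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def vat_matches(base_ht: Decimal, rate_percent: Decimal, vat: Decimal,
                tolerance: Decimal = CENT) -> bool:
    """Return True if `vat` equals base × rate, within `tolerance`."""
    expected = (base_ht * rate_percent / 100).quantize(CENT, rounding=ROUND_HALF_UP)
    return abs(expected - vat) <= tolerance


# Base and VAT at 5.5% agree; the second call pairs a base with the wrong rate.
print(vat_matches(Decimal("97.80"), Decimal("5.5"), Decimal("5.38")))
print(vat_matches(Decimal("225.00"), Decimal("10"), Decimal("45.00")))
```

A failed check typically points to values landing in the wrong rate's columns, the most common slip when a summary block lists several rates side by side.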
What if my French suppliers send invoices in different formats — PDF, scanned paper, email body?
ImageToTable.ai accepts PDF, JPG, PNG, WebP, and AVIF. For email-body invoices, save the email as a PDF or take a screenshot and upload it. The tool does not require uniform formatting or a template per supplier — it reads each invoice by understanding what the fields mean, not by matching a layout. If you have a variety of supplier formats, the same column definition produces consistent output across all of them.
What about Chorus Pro invoices — can the tool extract data from those?
Chorus Pro invoices destined for the public sector (B2G) are typically available as PDFs that you can download from the Chorus Pro portal. Upload those PDFs to ImageToTable.ai and extract fields like any other invoice. The tool does not connect directly to the Chorus Pro API. For high-volume B2G invoice processing, most organizations use their PDP's integration to route data into their accounting software. ImageToTable.ai covers the gap when that integration is not available or when you need a quick extraction from a one-off Chorus Pro invoice.
A Spreadsheet Your Comptable Can Actually Use
The difference between a generic extraction and one built for French invoices is visible in the spreadsheet itself. A generic tool produces columns named "Tax" and "Vendor" — forcing your comptable to decide which value on a 13-field facture lands where. A French-aware extraction produces "Supplier SIREN," "TVA Intracommunautaire," "Subtotal (HT)," "TVA 20%," "TVA 10%," "TVA 5.5%" — columns that map directly to the PCG accounts and CA3 declaration lines your accounting software expects. The extraction step is the same click. The month-end rework is what disappears.
Upload a French facture and see what comes out. No login, no setup — three minutes to a spreadsheet your comptable will not send back.