How to Extract French Payslip Data to Excel
When a French accounting firm (cabinet d'expertise comptable) takes over a new client's payroll, the first document they request is not the general ledger. It is 12 months of bulletins de paie (payslips) for every employee. Each PDF is a mirror of the monthly DSN (Déclaration Sociale Nominative), the single electronic declaration transmitted to URSSAF, CNAV, CPAM, and Pôle Emploi since January 2017 under Décret n°2016-611. If the salaire brut on the December bulletin does not match the DSN total, the discrepancy must be located and explained before the firm can certify the books (arrêter les comptes). The bottleneck is not reading the payslips. It is the 3 minutes per bulletin spent manually typing Salaire Brut (gross salary), CSG déductible (deductible social contribution), and Net à payer (net pay) into a spreadsheet row, and then doing it again for 49 more employees, across 12 months.
Key Takeaways
- At three minutes per bulletin de paie, a 50-employee company spends 30 hours every year typing the sixteen legally mandated fields into Excel — before any verification even begins.
- 30 hours of typing, and the accountant still cannot sign off — because DSN reconciliation demands Salaire Brut, CSG deductible, and Net Imposable match across the bulletin PDF and the electronic declaration, and one misclassified cotisation breaks the entire month.
- One computed column, CSG Check = Brut × 98.25% × 9.2% − extracted CSG, lets ImageToTable.ai flag rows that deviate by more than €1 during extraction, turning a 600-row spreadsheet into a three-row investigation list.
What Makes the French Bulletin de Paie One of Europe's Most Complex Documents to Extract
The French payslip is not built for data extraction tools. It is built for compliance, and France's social protection system, funded by some of the highest employer social contribution rates in Europe, demands that every euro of contribution be traced. Sixteen fields are mandatory under Article R3243-1 of the Code du travail. Article R3243-4 additionally prohibits certain mentions: an employer cannot show strike hours as such, nor single out time spent on employee representation duties. A non-compliant bulletin de paie carries a fine of up to €450 per document.
This legal density creates three layers of extraction difficulty that generic OCR tools — designed for English-language payslips with a handful of deduction lines — cannot handle:
- The three-section layout. A French bulletin de paie divides into top (employer and employee identity — SIRET, NAF code, convention collective), body (gross earnings → cotisations → net), and bottom (annual cumuls, leave balances, net social amount). Each section uses different typography conventions, and the same field name — "Total" — appears in the body as a subtotal, in the bottom as a cumul annuel (annual total), and in the header as a reference number. A coordinate-based template that picks up "the second Total" from the top will get the wrong number on a Silae-generated PDF versus a PayFit one.
- Five mandatory cotisation groups. The 2018 simplification reform grouped the previously ~50 lines of social contributions into five categories: Santé (health), Accidents du travail (workplace accidents), Retraite (pension — sécurité sociale plafonnée, sécurité sociale déplafonnée, and complémentaire AGIRC-ARRCO), Famille (family), and Chômage (unemployment). But each group still contains both a part salariale (employee share) and a part patronale (employer share), displayed in separate columns. The CSG (Contribution Sociale Généralisée) and CRDS (Contribution au Remboursement de la Dette Sociale) sit in their own section with their own calculation base — 98.25% of gross salary, not 100%. An extraction that treats all "cotisation" lines as a single tax column will blend employee and employer contributions into a number that means nothing to either party.
- Net imposable ≠ Net à payer. The amount that appears on the employee's annual tax declaration (net imposable, taxable net) is not the amount transferred to their bank account (net à payer, net pay). In simplified form, net imposable = salaire brut − total employee contributions + the non-deductible contributions added back (CSG non déductible at 2.4% and CRDS at 0.5%); the employer's share of complementary health cover is also added to the taxable base. Net à payer starts from the net before tax (brut minus all employee contributions), subtracts the impôt sur le revenu prélevé à la source (income tax withheld at source, or PAS), and adjusts for items such as transport reimbursements and meal vouchers. A generic extraction that outputs one "Net Pay" column collapses two legally distinct figures. This distinction matters because the DSN uses net imposable, while the employee's bank statement reflects net à payer.
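To make the second point above concrete, here is a minimal sketch of keeping the part salariale and part patronale apart while summing cotisation lines per group. The function name, the tuple layout, and the sample amounts are all assumptions for illustration; they are not taken from a real bulletin or from any tool's output format.

```python
from collections import defaultdict
from decimal import Decimal


def totals_by_group(lines):
    """Sum employee and employer shares separately for each cotisation group.

    `lines` is an iterable of (group, part_salariale, part_patronale) tuples.
    """
    totals = defaultdict(
        lambda: {"salariale": Decimal("0"), "patronale": Decimal("0")}
    )
    for group, part_salariale, part_patronale in lines:
        totals[group]["salariale"] += part_salariale
        totals[group]["patronale"] += part_patronale
    return dict(totals)


# Illustrative figures only (hypothetical, not from a real payslip).
sample_lines = [
    ("Retraite", Decimal("207.00"), Decimal("256.50")),
    ("Retraite", Decimal("12.00"), Decimal("60.60")),
    ("Sante", Decimal("0.00"), Decimal("210.00")),
]
totals = totals_by_group(sample_lines)
```

Keeping two totals per group means the Cotisations Salariales and Cotisations Patronales columns can each be reconciled against their own DSN block instead of a blended figure.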
These are not edge cases. They are the standard anatomy of every bulletin de paie issued by a French employer since January 2018, when the bulletin de paie clarifié (simplified payslip) format became mandatory for all enterprises. The reform made payslips shorter — from ~50 lines to ~20 — but it did not make them simpler to extract. Grouping contributions under five headings rearranged the layout without standardizing it across payroll software providers.
The Sixteen Mandatory Fields Under Article R3243-1 — Mapping Each One to Your Spreadsheet Columns
Before building extraction columns, you need the legal inventory. Article R3243-1 lists exactly what must appear on every bulletin de paie. Each item below maps to a column name you will define in your extraction workflow:
| R3243-1 Item | Field | Recommended Column Name | DSN Verification Role |
|---|---|---|---|
| 1° | Employer name and address | Employer Name (Nom Employeur) | Must match SIRET register |
| 2° | NAF/APE code + SIRET | SIRET | Primary DSN employer identifier |
| 3° | Convention collective (collective bargaining agreement) | Convention Collective | Determines cotisation rates |
| 4° | Employee name and position | Employee Name (Nom Salarié) | Must match NIR (social security number) |
| 5° | Social security number (NIR) | NIR | DSN employee block identifier |
| 6° | Classification level (coefficient) | Classification (Coefficient) | Determines base salary grid |
| 7° | Pay period and hours worked | Pay Period (Periode), Hours Worked (Heures Travaillees) | DSN: heures for contribution calc |
| 8° | Overtime hours + premium rates | Overtime Hours (Heures Sup), Overtime Rate (Taux Majore) | Exonération fiscale on overtime |
| 9° | Gross salary (salaire brut) | Salaire Brut | Foundation of all DSN contribution calcs |
| 10° | Nature and amount of salary accessories | Accessories (Accessoires Salaire) | Bonuses, commissions, in-kind benefits |
| 11° | Employee cotisations by group | Cotisations Salariales | DSN employee contributions block |
| 12° | Employer cotisations by group | Cotisations Patronales | DSN employer contributions block |
| 13° | Net imposable (taxable net) | Net Imposable | DGFiP (tax authority) annual feed |
| 14° | Net à payer (net pay to employee) | Net a Payer | Final DSN reconciliation checkpoint |
| 15° | Prélèvement à la source (PAS) — rate and amount | PAS Rate (Taux PAS), PAS Amount (Montant PAS) | Individualized rate from DGFiP |
| 16° | Date of payment | Payment Date (Date Paiement) | DSN monthly period reference |
For an accounting firm reconciling a full-year payroll, these sixteen fields form 600 rows of data for 50 employees — 9,600 data points. One mismatched SIRET or a single misaligned Cotisations Salariales column cascades into hours of manual correction. The extraction tool needs to capture each field by what it means, not by where it sits on the page.
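One way to keep that inventory honest is to hold the table as data and check each export against it. The sketch below assumes the English column names from the table above (shortened, without the French glosses) and follows the table's own item numbering; `header_row` and `missing_columns` are hypothetical helper names.

```python
# Column names per table item, as recommended in the table above.
R3243_1_COLUMNS = {
    1: ["Employer Name"],
    2: ["SIRET"],
    3: ["Convention Collective"],
    4: ["Employee Name"],
    5: ["NIR"],
    6: ["Classification"],
    7: ["Pay Period", "Hours Worked"],
    8: ["Overtime Hours", "Overtime Rate"],
    9: ["Salaire Brut"],
    10: ["Accessories"],
    11: ["Cotisations Salariales"],
    12: ["Cotisations Patronales"],
    13: ["Net Imposable"],
    14: ["Net a Payer"],
    15: ["PAS Rate", "PAS Amount"],
    16: ["Payment Date"],
}


def header_row():
    """Flatten the sixteen items into one ordered spreadsheet header."""
    return [
        name
        for item in sorted(R3243_1_COLUMNS)
        for name in R3243_1_COLUMNS[item]
    ]


def missing_columns(extracted_headers):
    """List the expected columns that an extracted sheet does not contain."""
    present = set(extracted_headers)
    return [name for name in header_row() if name not in present]
```

Because three items map to two columns each, the sixteen fields become nineteen spreadsheet columns in this layout.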
Article R3243-4 also prohibits certain mentions: the exercise of the right to strike and employee representation activity must not be identifiable on the payslip. These prohibitions do not affect extraction directly, but they mean a compliant bulletin de paie will use neutral labels like "Absence non rémunérée" (unpaid absence) rather than "Grève" (strike), which changes the text string the AI sees on the PDF.
The Payroll Software Landscape — Why Silae, PayFit, and ADP Generate Structurally Different PDFs
Article R3243-1 mandates what must appear, not how it must be laid out. There is no government-prescribed template — and five vendors dominate the French payroll software market, each with its own PDF rendering engine.
| Software | Market Position | PDF Export Format | Extraction Challenge |
|---|---|---|---|
| Silae | Leader — used by chartered accountants for 30%+ of French private-sector payroll | Compact 2-column layout, dense grouping | Cotisations merged into a single block; CSG and CRDS share a row label that differs by collective bargaining agreement (convention collective) |
| PayFit | Modern SaaS for SMEs, single-column responsive design | Single-column, wide spacing, web-font rendering | Net social amount positioned in a sidebar column that template tools miss; PAS line sometimes on a separate page |
| Sage Paie | SMEs in the Sage ecosystem (Sage 50, Sage 100), strong in retail and services | Multi-section with sectional headers, conventional tabular layout | Employer cotisations rendered below the main table in a section that some OCR engines skip as "footer content" |
| ADP | Large enterprises and multinationals | Multi-page detailed breakdown, separate annexes for specific regimes | Net imposable and net à payer often appear on different pages; supplementary pages for specific employee groups (cadres vs non-cadres) |
| Cegid RH | Mid-market with full HRIS ambition | Standardized blocks, consistent across Cegid product versions | Leave balance tables inserted between body and footer sections, breaking the vertical flow that linear OCR depends on |
The operational consequence: an accounting firm that serves 10 clients, each using a different payroll provider — or the same provider with different configuration — cannot build one template per software and expect it to survive a version upgrade. Silae, PayFit, and Sage each change their PDF layout periodically. Template-based extraction breaks. Semantic extraction — where the AI looks for "Salaire Brut" by what the label means, not pixel coordinates — does not.
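As a rough illustration of matching by label rather than by position, the sketch below normalizes a label (case, accents, spacing) and looks it up in an alias table. This is a deliberately crude stand-in, not a description of how any AI extraction tool works internally, and the alias entries are assumptions about how vendors might word the same field.

```python
import unicodedata

# Hypothetical alias table: normalized label variant -> canonical column name.
LABEL_ALIASES = {
    "salaire brut": "Salaire Brut",
    "total brut": "Salaire Brut",
    "net imposable": "Net Imposable",
    "net fiscal": "Net Imposable",
    "net a payer": "Net a Payer",
    "net a payer au salarie": "Net a Payer",
}


def normalize_label(text):
    """Lowercase, strip accents, and collapse runs of whitespace."""
    decomposed = unicodedata.normalize("NFKD", text)
    no_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(no_accents.lower().split())


def canonical_column(label):
    """Return the canonical column for a printed label, or None if unknown."""
    return LABEL_ALIASES.get(normalize_label(label))
```

A bare "Total" returns None here, which is the desired behavior: an ambiguous label should be left unresolved rather than guessed from its position on the page.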
This is the same challenge that makes Korean payslip extraction difficult: the law defines the content, but Douzone, ECOUNT, and PAYZON each render it differently. France's situation is more acute because the DSN adds a verification layer: the extracted data must reconcile with a monthly electronic declaration that has already been transmitted to several government bodies (URSSAF, CNAV, CPAM, and Pôle Emploi among them).
Step-by-Step: Extracting French Payslip Data to a Verifiable Excel File
This workflow is built around a single premise: extraction is not finished when you have columns. It is finished when the columns can be cross-checked against the DSN. The steps below assume you have a folder of bulletin de paie PDFs — whether exported from Silae, PayFit, Sage, ADP, or Cegid — and your spreadsheet needs to be structured for a comptable (accountant) to verify, not just to read.
Files are processed securely and not stored.
Upload the Bulletin de Paie Files — Batch Is the Default
Drag and drop all PDFs — whether 12 monthly bulletins for one employee or 50 monthly bulletins across a full workforce. The tool accepts PDF, JPG, and PNG. For a cabinet d'expertise comptable conducting an annual payroll review, the typical upload is 600 files (50 employees × 12 months). Each is processed individually but exported to a single spreadsheet — one row per bulletin.
Define the Columns That Match the DSN Structure
Type the column names as they appear in your target checklist. For DSN verification specifically, the minimum set is: Employee Name, NIR, Pay Period, SIRET, Salaire Brut, Cotisations Salariales Total, Cotisations Patronales Total, CSG Deductible, CSG Non Deductible, CRDS, Net Imposable, Net a Payer, PAS Rate, PAS Amount. These fourteen columns give enough signal to cross-check any DSN monthly block, and NIR plus Pay Period form the key used to match each row to the DSN during reconciliation. The tool reads each column name for its semantic meaning: "Salaire Brut" finds the gross salary field regardless of whether Silae placed it top-left or PayFit placed it center.
Add a Computed Verification Column
Create a column called CSG Check (Brut × 98.25% × 9.2% − Extracted CSG), where Extracted CSG is the sum of the CSG Deductible and CSG Non Deductible columns. This is a computed column: the tool runs the calculation during extraction and outputs the difference between the expected CSG amount and the extracted value. A result within ±€1 per bulletin indicates both the gross extraction and the CSG extraction are likely correct. Anything beyond ±€1 goes on the review list, and a result exceeding €5 signals either an extraction error, a DSN entry error, or an edge case (exonération, specific convention collective adjustment). Computed columns turn extraction from a data-capture task into a data-verification task in the same pass.
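The same check can be reproduced outside the tool to audit an exported sheet. The following is a minimal sketch using the rates quoted above; the function names and the three status labels are assumptions, and the middle "review" band simply covers differences between €1 and €5.

```python
from decimal import Decimal, ROUND_HALF_UP

CSG_BASE_FACTOR = Decimal("0.9825")  # assiette CSG as a share of gross
CSG_RATE = Decimal("0.092")          # total CSG rate (deductible + non-deductible)
CENT = Decimal("0.01")


def csg_check(brut, extracted_csg):
    """Expected CSG (rounded to the cent) minus the extracted CSG."""
    expected = (brut * CSG_BASE_FACTOR * CSG_RATE).quantize(
        CENT, rounding=ROUND_HALF_UP
    )
    return expected - extracted_csg


def classify(diff):
    """Bucket a CSG Check result using the 1 euro and 5 euro thresholds."""
    gap = abs(diff)
    if gap <= Decimal("1"):
        return "ok"
    if gap > Decimal("5"):
        return "investigate"
    return "review"
```

Using `Decimal` rather than floats keeps the comparison exact at the cent level, which matters when the tolerance itself is only one euro.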
Export and Cross-Check with the DSN
Export to Excel (XLSX). The spreadsheet now contains one row per bulletin de paie with columns that mirror the DSN structure. Import the monthly DSN export from the payroll software or Net-entreprises.fr. Cross-reference: Salaire Brut on the bulletin must match the DSN employee block gross; Net Imposable on the bulletin must match the DSN net fiscal; PAS Amount must match the DSN PAS block. Any row where the CSG Check column deviates by more than €1 is your shortlist for manual investigation — before certifying the annual accounts.
From Extraction to Verification — Using CSG, CRDS, and PAS Ratios to Catch DSN Discrepancies
The extraction step gives you data. The verification step gives you confidence. French payroll law — specifically the CSG and CRDS rates published annually by URSSAF — provides built-in cross-check formulas that convert raw extraction into auditable output.
Here are the three verification ratios every payroll reconciliation spreadsheet should contain, with the computed column formula you can configure:
| Verification | Formula | Acceptable Deviation | What a Deviation Indicates |
|---|---|---|---|
| CSG total | Brut × 98.25% × 9.2% | ±€1 | Gross extraction error, exonération not accounted for, or incorrect base calculation |
| CSG deductible split | CSG Total × (6.8/9.2) | ±€1 | Misclassification of CSG deductible vs non-deductible — affects Net Imposable directly |
| CRDS | Brut × 98.25% × 0.5% | ±€0.50 | CRDS miscalculation or extraction picked up a different contribution line labeled similarly |
The 98.25% factor is not arbitrary. It is the assiette CSG (CSG calculation base): French law applies a flat 1.75% abatement for professional expenses to gross pay, up to four times the social security ceiling. Employer contributions to complementary health insurance (mutuelle) and prévoyance are added to the base without that abatement, which is one common reason a row legitimately fails the simple check. For a salaire brut of €3,000, the CSG base is €2,947.50. CSG at 9.2% = €271.17. If the extracted CSG reads €245, the discrepancy of about €26 tells you something is wrong: either the extracted Brut is incorrect, the employee has a specific exonération, or the DSN entry was miscalculated. You know that a problem exists before you know what the problem is. That is the difference between extraction and verification.
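The three ratios in the table can be computed together for one row. This sketch reuses the €3,000 example from the paragraph above; the dictionary keys and function names are hypothetical, and the simple base (gross × 98.25%) ignores any employer contributions that enter the CSG base.

```python
from decimal import Decimal, ROUND_HALF_UP

BASE_FACTOR = Decimal("0.9825")
CENT = Decimal("0.01")
RATES = {
    "csg_total": Decimal("0.092"),
    "csg_deductible": Decimal("0.068"),  # same as CSG total x (6.8 / 9.2)
    "crds": Decimal("0.005"),
}
TOLERANCES = {
    "csg_total": Decimal("1"),
    "csg_deductible": Decimal("1"),
    "crds": Decimal("0.50"),
}


def expected_amounts(brut):
    """Expected CSG total, deductible CSG, and CRDS for a gross salary."""
    assiette = brut * BASE_FACTOR
    return {
        name: (assiette * rate).quantize(CENT, rounding=ROUND_HALF_UP)
        for name, rate in RATES.items()
    }


def deviations(brut, extracted):
    """Return only the checks whose gap exceeds the accepted tolerance."""
    expected = expected_amounts(brut)
    return {
        name: expected[name] - extracted[name]
        for name in expected
        if abs(expected[name] - extracted[name]) > TOLERANCES[name]
    }
```

For a €3,000 gross, the expected values are €271.17, €200.43, and €14.74; feeding in an extracted CSG of €245 returns a single deviation of €26.17 on the CSG total.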
These computed checks are the payroll equivalent of what accountants do when extracting French invoices (factures): cross-referencing the TVA rates against the line-item totals to confirm the extraction captured the correct tax breakdown, not just a generic total.
The DSN-Payslip Reconciliation Workflow That Accounting Firms Actually Use
Once the extraction is done and the verification columns are in place, the spreadsheet becomes a working document — not a final answer, but a structured comparison between two data sources (bulletin de paie PDF and DSN export). Here is the reconciliation workflow that French accounting firms follow:
Export DSN Data from the Payroll Software
Every compliant French payroll platform — Silae, PayFit, Sage Paie, ADP, Cegid — can export a DSN data extract. This extract contains, per employee per month, the same fields that appear on the bulletin de paie. Export it as CSV and open it alongside your extraction spreadsheet.
Match Rows by Employee NIR + Pay Period
The social security number (NIR, Numéro d'Inscription au Répertoire) is the unique key. Match each bulletin de paie row to its corresponding DSN row using NIR and the pay period (période de paie). If a bulletin has no matching DSN row for a given month, that is a red flag: either the DSN was not transmitted, which exposes the employer to late-filing penalties, or the bulletin is from a different period.
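The matching step amounts to a join on a two-part key. A minimal sketch, assuming both sources have been loaded as lists of dictionaries with hypothetical `nir` and `period` keys:

```python
def match_rows(bulletins, dsn_rows):
    """Pair bulletin rows with DSN rows on (nir, period).

    Returns (matched, missing): matched is a list of (bulletin, dsn) pairs,
    missing is the list of bulletins with no DSN counterpart.
    """
    dsn_index = {(row["nir"], row["period"]): row for row in dsn_rows}
    matched, missing = [], []
    for bulletin in bulletins:
        key = (bulletin["nir"], bulletin["period"])
        if key in dsn_index:
            matched.append((bulletin, dsn_index[key]))
        else:
            missing.append(bulletin)
    return matched, missing
```

Anything that lands in `missing` is the red-flag case described above and is worth checking before comparing any amounts.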
Flag Deviations Above €1
Compare Salaire Brut, Cotisations Salariales, Net Imposable, and Net à Payer between the bulletin and DSN columns. Flag any deviation above €1. Sort the spreadsheet by the CSG Check computed column (descending by absolute difference). The rows at the top are your investigation list. For most months, the list will be empty — and that silence is the point. An empty deviation list means the accountant can sign off on payroll reconciliation in minutes rather than hours.
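The flag-and-sort step could look like the sketch below. It assumes each matched pair is a (bulletin, DSN) tuple of dictionaries with `Decimal` amounts and that the bulletin row already carries a `csg_check` value; the key names are hypothetical.

```python
from decimal import Decimal

COMPARED_FIELDS = (
    "salaire_brut",
    "cotisations_salariales",
    "net_imposable",
    "net_a_payer",
)
THRESHOLD = Decimal("1")


def flag_deviations(pairs):
    """Build the investigation list, largest absolute CSG Check first."""
    flagged = []
    for bulletin, dsn in pairs:
        gaps = {
            field: bulletin[field] - dsn[field]
            for field in COMPARED_FIELDS
            if abs(bulletin[field] - dsn[field]) > THRESHOLD
        }
        if gaps or abs(bulletin["csg_check"]) > THRESHOLD:
            flagged.append({
                "nir": bulletin["nir"],
                "period": bulletin["period"],
                "csg_check": bulletin["csg_check"],
                "gaps": gaps,
            })
    flagged.sort(key=lambda row: abs(row["csg_check"]), reverse=True)
    return flagged
```

An empty return value is the "silence" described above: no field differs by more than €1 and no CSG Check falls outside tolerance.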
The Five-Year Retention Obligation — and Why Structured Excel Beats a PDF Folder
Under Article L3243-4 of the Code du travail, an employer must retain a copy of every bulletin de paie for five years. For a 50-employee company, that is 3,000 PDFs. A folder of 3,000 PDFs has zero searchability. An Excel file with 3,000 rows — extracted from those same PDFs — is searchable, sortable, auditable, and filterable by date, employee, or contribution group.
This is where extraction serves a purpose beyond reconciliation. When a former employee from 2022 requests their cumul annuel for pension verification (reconstitution de carrière), the HR department does not dig through a PDF archive. They filter the Excel sheet by NIR, check the December row for the 2022 annual totals, and respond in under a minute. The extraction spreadsheet becomes the company's own digital payslip register — structured, searchable, and compliant with the five-year retention mandate in a format that is actually usable for data retrieval.
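The lookup described above is a two-condition filter. The sketch below assumes the register has been saved as CSV with hypothetical `nir`, `period` (formatted `YYYY-MM`), and `cumul_brut` columns; the sample rows are invented.

```python
import csv
import io


def find_december_row(csv_text, nir, year):
    """Return the December row for one employee and year, or None if absent."""
    wanted_period = f"{year}-12"
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["nir"] == nir and row["period"] == wanted_period:
            return row
    return None


# Invented sample register (placeholder NIRs and amounts).
SAMPLE_REGISTER = """nir,period,cumul_brut
0000000000001,2022-11,33000.00
0000000000001,2022-12,36000.00
0000000000002,2022-12,28800.00
"""

december = find_december_row(SAMPLE_REGISTER, "0000000000001", 2022)
```

Returning None for a missing row, rather than raising, lets the caller distinguish "no bulletin archived for that period" from a malformed file.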
FAQ — French Payslip Data Extraction
Can AI extraction handle both the old detailed format and the simplified 2018 format?
Yes. The bulletin de paie clarifié format introduced in January 2018 grouped contributions into five categories — Santé, Accidents du travail, Retraite, Famille, Chômage — but the underlying field labels (Salaire Brut, Net Imposable, Net à Payer) are the same in both formats. A semantic extraction tool that reads field labels by meaning, not position, works on both formats. The pre-2018 detailed format, with ~50 individual contribution lines, actually provides more granular data for extraction — the simplification reform reduced visual clutter for employees but did not remove data points that the DSN already transmits.
Does the extraction work with payslips that include the net social amount (montant net social)?
Yes. Since July 2023, French bulletins de paie must display a net social amount — the reference figure used to determine eligibility for social benefits like the RSA (Revenu de Solidarité Active) and prime d'activité. This amount sits between net imposable and net à payer on the payslip. The extraction tool captures it as a distinct field if you define a column for it. The net social amount is not used in DSN verification — it serves a separate administrative purpose — but capturing it in the same spreadsheet row keeps all payslip data in one place for future reference.
Can I extract only specific months — for example, just the December bulletins for the annual cumuls?
Yes. If your goal is annual reconciliation, you only need the December bulletin de paie — which lists all annual cumuls (cumul annuel) at the bottom. Upload only the December files and define columns for the cumul fields: Cumul Brut, Cumul Net Imposable, Cumul Heures, Cumul PAS. This gives you the full-year totals in a single row per employee without processing 12 months of data.
What about specific employee regimes — cadres (executives), VRP (sales representatives), or apprentis (apprentices)?
French payroll has distinct contribution regimes for cadres (additional AGIRC-ARRCO tranche B cotisations, different prévoyance rates), VRP (specific URSSAF risk code 511TG), and apprentis (exonérations on most cotisations). If your extraction spans multiple employee categories, define separate columns for cadre-specific fields — for example, Cadre Retraite Complementaire Tranche B — and leave those cells empty for non-cadre employees. The extraction tool will only populate a cell when it finds the corresponding field on the bulletin. Blank cells are not errors; they accurately reflect the absence of that contribution category.
Does this work for handwritten or scanned payslips — not just native PDFs?
Yes — the AI reads visual layout, not embedded text layers. A scanned bulletin de paie (printed and re-digitized), a photo taken with a smartphone, or a JPEG screenshot of a payroll portal all go through the same semantic extraction pipeline. Handwritten annotations on a printed bulletin — such as a manager's note — add visual noise but do not prevent the AI from finding the printed field labels. However, heavily degraded scans (creased paper, extreme skew, water damage) reduce accuracy. The same principle applies to pay stub extraction from any payroll provider — format matters less than legibility.
How does extraction compare to exporting a payroll register directly from Silae or PayFit?
A payroll register export from Silae or PayFit contains the data inside the software. But that export reflects what the software calculated — not necessarily what the employee's PDF bulletin physically shows. These can diverge: a manual adjustment made after the bulletin was generated, a correction applied in a subsequent month, or a version discrepancy between the payroll database and the PDF archive. Extracting data directly from the PDF bulletins de paie gives you the document that was actually delivered to the employee — which is the legal record under Article L3243-1. The software export is the calculation; the bulletin PDF is the evidence. For payroll register extraction, the same verification logic applies — the register is the aggregate, the bulletins are the proof.
A French payslip carries 30+ data points mandated by one of the world's most tightly regulated payroll systems. Extraction gets you the data. Computed verification columns — anchored to URSSAF rates — tell you whether it is right. Both belong in the same workflow.