Tax Document Extraction

Convert VAT Return Form to Excel — and Cross-Check Box Arithmetic Before You File

Tax forms are unusual among business documents in that their fields are defined in terms of each other: Box 3 = Box 1 − Box 2, Section C Total = sum of rows 1–8. Traditional OCR extracts the number in each box in isolation and does not verify whether the extracted Box 3 actually equals Box 1 minus Box 2. Computed Columns extract each box independently and then verify arithmetic consistency, flagging discrepancies before the data enters your accounting system. A single misread digit then shows up as a mismatch you can review, instead of as a wrong figure in your filing.

Encrypted processing · Automatic data deletion after conversion

PDF & Scans
Arithmetic Check
XLSX/CSV

What You Can Extract from VAT Return Forms

Type the column names you need — the AI finds these values on every tax form by understanding what each box label means, whether the form is an HMRC VAT100 with Box 1–Box 9 numbering or an EU return with a completely different box layout. It reads the tax period, registration number, all box values, and supplementary totals from any tax authority's form without template setup.

VAT Period Start
VAT Period End
VAT Registration Number
Box 1 (Output VAT)
Box 2 (Input VAT)
Box 3 (Net VAT Payable)
Box 4 (Sales excl. VAT)
Box 5 (Purchases excl. VAT)
Total Sales
Total Purchases
Declaration Date
Declaration Method

The tool uses Custom Column Extraction: you type the column names you want — "Box 1 (Output VAT)," "Box 2 (Input VAT)," "VAT Registration Number" — and the AI locates the matching values on each form by understanding what each box label means, not by matching a fixed template or reading coordinates. This means one set of column names works across HMRC VAT100, German Umsatzsteuervoranmeldung, French CA3, and Dutch BTW aangifte simultaneously, even though each has different box numbering and layout. You can also define Computed Columns — for example, a column named "Net VAT Verification (Box 1 − Box 2)" — and the AI calculates whether the extracted Box 3 matches Box 1 minus Box 2, flagging any row where they differ. This cross-verification happens during extraction so your spreadsheet arrives with discrepancies already highlighted, not discovered later during reconciliation.

Tax Forms Have Arithmetic Built Into Their Structure — Traditional OCR Ignores It

Invoices and receipts contain mostly standalone values: beyond a subtotal, tax, and total, few fields depend on one another. Tax forms are different. Box 3 is defined as Box 1 minus Box 2. Section C Total is the sum of lines 1 through 8. These arithmetic relationships aren't formatting; they're part of the legal definition of the return. Traditional OCR reads each box in isolation and never checks whether the numbers add up. An accountant on Reddit describes finding transcription errors during review that would have been caught if the software had verified the arithmetic. It didn't, so the error reached the client's filing.

01

Traditional OCR reads each box independently — Box 1, Box 2, and Box 3 are just three separate numbers on the page. There is no cross-field verification. If the OCR misreads Box 1 as £45,280 instead of £45,230, it enters £45,280 into your spreadsheet. Box 3 is also extracted — and the system has no way to ask: does the extracted Box 3 actually equal Box 1 minus Box 2? The £50 error enters your accounting system silently, and nobody notices until an audit or a mismatch with HMRC's own records. At scale, an accountant processing 30 client returns per quarter has to manually verify 20+ arithmetic relationships per form — 600 checks — on top of the data entry itself.

02

A single misread digit creates a cascade of downstream errors in your accounting system. Box 1 goes in wrong by £50. Your accounting software accepts it because there's no validation step between extraction and import. The quarterly VAT liability in your ledger no longer matches what you filed. By the time you reconcile (often weeks later, when HMRC sends a statement or when you run your own quarterly review), you're tracing back through multiple systems to find the origin of the mismatch. Finding the error can cost more than the error itself, and the next filing deadline is already approaching.

03

Multi-box dependencies compound the problem, and most returns have many of them. It's not just Box 3 = Box 1 − Box 2. The net VAT figure must match across the summary section and the detail section. Total Outputs must equal the sum of the individual output lines. Section totals must equal the sum of their rows. A form like HMRC VAT100 has nine boxes with relationships between them; a full German Umsatzsteuervoranmeldung has over 60 fields with multiple interlocked arithmetic constraints. Manually cross-checking all of these for every client return in every filing period quickly becomes impractical. So most firms skip it and trust the extraction. That trust is what produces the reconciliation problem later.

01

Custom Column Extraction reads every box independently, then Computed Columns verify the arithmetic. You define columns for Box 1, Box 2, and the printed Box 3. Then you define a Computed Column: "Box 3 Check (Box 1 − Box 2)." The AI extracts Box 1 and Box 2 from the form, calculates the difference, and compares it against the printed Box 3 it also extracted. If the printed Box 3 matches Box 1 − Box 2, the Computed Column outputs "OK." If it doesn't, either one of Box 1, Box 2, or Box 3 was misread or the figures on the form itself don't add up, and the Computed Column outputs the discrepancy. Every row with a non-zero discrepancy is flagged for review before the data enters your accounting system.
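For readers who want to see the logic spelled out, the check described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the tool's internal implementation: the function name box3_check is hypothetical, the Box 2 and printed Box 3 figures are invented for the example, and it follows this page's simplified Box 3 = Box 1 − Box 2 relationship.

```python
from decimal import Decimal


def box3_check(box1: str, box2: str, printed_box3: str) -> str:
    """Return "OK" if printed Box 3 equals Box 1 - Box 2, else the difference.

    Hypothetical helper for illustration. Amounts are passed as strings so
    Decimal keeps exact pence instead of binary floating point.
    """
    computed = Decimal(box1) - Decimal(box2)
    discrepancy = computed - Decimal(printed_box3)
    return "OK" if discrepancy == 0 else str(discrepancy)


# Assumed figures: Box 2 and the printed Box 3 are invented for the example.
clean_read = box3_check("45230.00", "12000.00", "33230.00")
misread = box3_check("45280.00", "12000.00", "33230.00")  # Box 1 misread by 50
print(clean_read, misread)
```

With these assumed figures, the clean read returns "OK" and the misread returns a difference of 50.00, which is the value a reviewer would see in the flagged row.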

02

The £50 misread in Box 1 never reaches your accounting system because the discrepancy is caught during extraction. If Box 1 is extracted as £45,280 but the real value is £45,230, the Computed Column calculates Box 1 − Box 2 and gets a result that doesn't match the printed Box 3. The discrepancy column shows the mismatch immediately — you review that row, spot the digit error, correct it, and the corrected value flows into your ledger. The error is caught at the extraction boundary, not weeks later during reconciliation. Across 30 client returns per quarter, the Computed Columns run all arithmetic checks automatically — you only review the flagged rows, not all 600 relationships.

03

Multiple Computed Columns verify every arithmetic relationship on the form simultaneously — one extraction pass, all checks run. Define Computed Columns for "Section C Total Check (sum of lines 1-8 vs printed total)," "Box 5 Cross-Check (summary Box 5 vs detail section Box 5)," and "Total Outputs Check (sum of individual output lines vs printed total)." The AI extracts all box values across the entire return — including multi-page forms with continuation sheets — and runs every arithmetic verification in the same extraction pass. Your downloaded Excel file arrives with all data extracted and all discrepancies already flagged. You review only the flagged rows and import everything else with confidence. This is your last line of defense against a transcription error that would otherwise survive all the way to the filing.
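Running several checks in one pass amounts to evaluating a list of "computed versus printed" pairs against the same extracted record. The sketch below is a hedged illustration only: the field names, the check names, and every amount are assumptions made for the example and are not taken from any real form or from the tool itself.

```python
from decimal import Decimal

# Hypothetical extracted return: field names and amounts are invented.
form = {
    "box1": Decimal("45230.00"),
    "box2": Decimal("12000.00"),
    "box3": Decimal("33230.00"),
    "section_c_lines": [Decimal("100.00")] * 7 + [Decimal("250.50")],
    "section_c_total": Decimal("950.00"),  # printed total; the lines sum to 950.50
}

# Each check maps a name to a function returning (computed value, printed value).
checks = {
    "Box 3 Check": lambda f: (f["box1"] - f["box2"], f["box3"]),
    "Section C Total Check": lambda f: (sum(f["section_c_lines"]), f["section_c_total"]),
}


def run_checks(f):
    """Run every check once and return {check name: computed - printed}."""
    results = {}
    for name, check in checks.items():
        computed, printed = check(f)
        results[name] = computed - printed
    return results


discrepancies = run_checks(form)
flagged = [name for name, diff in discrepancies.items() if diff != 0]
print(flagged)
```

Keeping the checks in a single mapping means a new relationship is one more entry rather than a new pass over the data, which mirrors the idea of adding one more Computed Column.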

How a Quarter's Worth of VAT Returns Gets Extracted with Arithmetic Verification in One Pass

Upload — whatever returns you received, as-is

Upload a batch that includes a digitally submitted HMRC VAT100 PDF for Q4, a scanned paper return from a client who still files by post (slightly skewed scan with a crease through Box 3), a multi-page German Umsatzsteuervoranmeldung with continuation sheets, and a French CA3 for a subsidiary. Formats vary: clean digital PDF, scanned paper with artifacts, multi-page with cross-page totals. No pre-sorting by jurisdiction, no splitting multi-page returns into individual files. The AI processes all forms in a single batch. If you also receive supporting schedules or supplementary declarations alongside the returns, upload them together; the tool handles mixed document types in the same batch.

Define columns — what your accounting system needs, plus arithmetic verification

Type the column names for your output spreadsheet: VAT Period Start, VAT Period End, VAT Registration Number, Box 1 (Output VAT), Box 2 (Input VAT), Box 3 (Net VAT Payable), Box 4 (Sales excl. VAT), Box 5 (Purchases excl. VAT). Then add the verification columns: Box 3 Verification (Box 1 − Box 2; output difference if non-zero) and Cross-Page Total Check (sum of detail sections vs summary total). For the German multi-page return, the AI reads all continuation sheets and aggregates values. For the French CA3, it reads the French field labels and maps them to your English column names, using the same column definitions with no per-country template setup. For the scanned paper return with the crease, the AI reads around the crease and extracts the values; the Computed Column then verifies whether Box 3 matches Box 1 − Box 2 and flags the row if the crease caused a misread in any of those three boxes.

Output — one spreadsheet, arithmetic checks already run, only flagged rows need review

Download an Excel file where each row represents one VAT return. Box values are extracted with box numbers preserved as column data (Box 1, Box 2, Box 3, Box 4, Box 5), along with the period and registration number. The Computed Columns have already run: a column shows the difference between the printed Box 3 and the calculated Box 1 − Box 2. In this example, the three clean returns show "0" in the discrepancy column; those rows are verified and ready for import. If the crease on the scanned return caused a misread in Box 1, Box 2, or Box 3, its discrepancy column shows a non-zero value: you review that row, compare the extracted numbers against the original form, and correct the digit, after which the entire batch is ready for import. Export as XLSX, CSV, or JSON for direct import into Xero, QuickBooks, Sage, or your accounting system, knowing that every arithmetic check you defined has either passed or been reviewed.
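If you export CSV and want to pull out only the rows that need review, a short script is enough. The following is a minimal sketch under assumptions: the column headers, the registration numbers, and the amounts are all invented, and a real export will use whatever column names you typed when defining the extraction.

```python
import csv
import io
from decimal import Decimal

# Hypothetical CSV export: column names and values are invented for the example.
exported = """VAT Registration Number,Box 1,Box 2,Box 3,Box 3 Verification
GB111111111,45230.00,12000.00,33230.00,0
GB222222222,9800.00,2100.00,7700.00,0
GB333333333,45280.00,12000.00,33230.00,50.00
GB444444444,15000.00,4000.00,11000.00,0
"""


def flagged_rows(csv_text, check_column="Box 3 Verification"):
    """Return the rows whose verification column holds a non-zero difference."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if Decimal(row[check_column]) != 0]


needs_review = flagged_rows(exported)
print([row["VAT Registration Number"] for row in needs_review])
```

To read a downloaded file instead of the inline sample, replace the io.StringIO wrapper with an open file handle; the filtering logic stays the same.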

When VAT Return Extraction Works Best — and Where to Verify

VAT return extraction is highly reliable for structured government forms. A few conditions are worth understanding before you process a large batch — particularly those that affect box-value accuracy, since a misread box flows into the arithmetic cross-check.

Reliably extracts

Government-issued tax forms with standard box layouts — extract with near-perfect accuracy including box numbers as field labels.

Forms that report the same quantities under different country-specific box numbering: the same column definitions work across jurisdictions.

Digitally submitted PDFs (HMRC VAT100, MTD-compatible returns, EU electronic filing PDFs) — clean source produces clean extraction.

Multi-page returns with continuation sheets — all pages extracted; verify cross-page totals with a Computed Column that sums detail sections and compares against the summary page.
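A cross-page total check of the kind mentioned above reduces to summing the detail amounts from every page and subtracting the printed summary figure. This sketch assumes a simple list-of-pages structure and invented amounts; it is meant to show the arithmetic, not how the tool stores pages internally.

```python
from decimal import Decimal

# Hypothetical multi-page return: each inner list holds the detail amounts
# extracted from one page. All amounts are invented for the example.
detail_pages = [
    [Decimal("1200.00"), Decimal("340.50")],  # first detail page
    [Decimal("89.99"), Decimal("410.01")],    # continuation sheet
]
printed_summary_total = Decimal("2040.50")


def cross_page_difference(pages, printed_total):
    """Sum every detail amount across all pages and subtract the printed total."""
    computed = sum((amount for page in pages for amount in page), Decimal("0"))
    return computed - printed_total


difference = cross_page_difference(detail_pages, printed_summary_total)
missing_sheet = cross_page_difference(detail_pages[:1], printed_summary_total)
print(difference, missing_sheet)
```

A zero difference means the detail lines account for the summary figure. A negative difference, as in the second call where the continuation sheet is left out, suggests that some detail lines were not picked up.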

Verify these cases

Handwritten corrections on printed tax forms — accuracy depends on handwriting legibility. If a preparer crossed out a typed value and wrote a correction by hand in a small box, the AI reads the visible corrected figure. Flag these returns with a Computed Column and verify the corrected value against supporting schedules before filing.

Amended or corrective returns where both original and corrected values appear on the same form — the AI may extract both sets of numbers if both are visible. Define explicit column names for the corrected values and cross-check with the original values where your retention obligations require both.

Non-standard regional forms outside UK/EU (e.g., certain state-level sales tax returns, local municipal tax filings) — box numbering may differ or use non-numeric labels. The AI can still extract labeled values, but column names in your extraction prompt should match the actual labels on the form. Run a single test form first to verify field mapping.

This tool extracts numbers from boxes but does NOT perform tax calculation or determine tax liability; it reads what's on the form. Computed Columns verify that the extracted numbers are internally consistent, but they do not recalculate your VAT liability according to tax law. A passing arithmetic check shows that the extracted figures agree with each other. It does not show that the form was filled out correctly, and it cannot detect a misread in a box that no check covers.

Frequently Asked Questions

How does Computed Column arithmetic verification work for VAT returns — and why does it matter?

Tax forms define some fields in terms of others: in this page's example, Box 3 is Box 1 minus Box 2. Traditional OCR extracts each box value independently and has no mechanism to ask whether the extracted numbers satisfy these relationships. Computed Columns solve this: you define a column like "Box 3 Check (Box 1 − Box 2)" and the AI calculates the expected net VAT from the extracted Box 1 and Box 2, then compares it against the printed Box 3 it also extracted. If they differ, the discrepancy is output in that column, flagging the row for review before it enters your accounting system. You can define multiple Computed Columns to verify all arithmetic relationships on the form in the same pass. The output spreadsheet arrives with discrepancies already identified, so your review time is spent only on flagged rows rather than on manually verifying every arithmetic relationship on every return.

Does this work with VAT return forms from countries other than the UK?

Yes. The AI handles standard VAT/GST return formats from multiple jurisdictions: UK HMRC VAT100, German Umsatzsteuervoranmeldung, French CA3, Dutch BTW aangifte, Indian GSTR-3B, and others. Each country uses different box numbering and labels, but the AI reads each form's structure contextually rather than matching a fixed template. The same column names ("Box 1 (Output VAT)," "Box 2 (Input VAT)," "VAT Registration Number") work across forms because the AI understands what each box label means in context. For less common regional forms with unusual labeling, we recommend running one test form first to verify that your column names map correctly to the form's specific labels. The Computed Column arithmetic checks work in any jurisdiction, provided each check is written to match the relationships that form defines. The Box 3 = Box 1 − Box 2 example on this page is a simplified pattern; on the HMRC VAT100 itself, for instance, Box 3 is the sum of Box 1 and Box 2, and the net VAT in Box 5 is the difference between Box 3 and Box 4.

Can I batch process VAT returns from multiple clients or tax periods in one go?

Yes. Upload VAT returns from multiple clients, multiple quarters, or multiple jurisdictions in a single batch. The AI processes each form individually and compiles all results into one Excel spreadsheet — one row per return — with the VAT Period Start, Period End, and Registration Number extracted so you can filter by client or quarter. The Computed Column arithmetic checks run on every row independently, so each return in the batch gets its own verification. This is ideal for accounting firms managing several clients during filing season: upload all returns at once, review only the flagged rows, and export the verified data per client. For recurring processing, the Collection Link feature lets clients upload their own returns to your processing queue — no account required for them — so returns arrive ready for batch processing without email attachments or file transfers.
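Once the batch is in one spreadsheet, splitting it per client is a simple grouping step on the registration number. The sketch below is illustrative only: the row structure, registration numbers, dates, and check values are invented, and it assumes period end dates are stored in ISO format (YYYY-MM-DD) so they sort correctly as text.

```python
from collections import defaultdict

# Hypothetical batch output: one dict per extracted return (values invented).
rows = [
    {"VAT Registration Number": "GB111111111", "VAT Period End": "2024-06-30", "Box 3 Check": "OK"},
    {"VAT Registration Number": "GB222222222", "VAT Period End": "2024-03-31", "Box 3 Check": "50.00"},
    {"VAT Registration Number": "GB111111111", "VAT Period End": "2024-03-31", "Box 3 Check": "OK"},
]


def group_by_client(batch):
    """Group rows by registration number, ordering each client's returns by period."""
    grouped = defaultdict(list)
    for row in batch:
        grouped[row["VAT Registration Number"]].append(row)
    for client_rows in grouped.values():
        client_rows.sort(key=lambda r: r["VAT Period End"])  # ISO dates sort as text
    return dict(grouped)


by_client = group_by_client(rows)
print({client: len(returns) for client, returns in by_client.items()})
```

Each value in the result is one client's returns in period order, which can then be written to a separate sheet or file per client.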

How accurate is the extraction for scanned paper returns compared to digital PDFs?

For cleanly scanned paper returns at 200+ dpi with standard box layouts, extraction accuracy is comparable to digital PDFs. The accuracy ceiling is set by scan quality, not the AI's reading capability. A flat, well-lit scan of a cleanly printed return extracts reliably: box numbers, values, and supplementary fields are all preserved. A skewed scan, a scan with page creases or shadows, or a multi-generation copy (printed → faxed → scanned) may produce lower accuracy on the values nearest the physical artifact. This is where the Computed Column verification provides its value: if a crease causes a misread in Box 1, Box 2, or Box 3, the arithmetic check catches it because the extracted Box 3 won't match Box 1 − Box 2. A misread in a box that no check covers (for example, a crease through Box 5 when only the Box 3 check is defined) passes unnoticed, so define a Computed Column for every box that has a relationship you can test. For paper returns that arrive by post, we recommend scanning at 200+ dpi on a flatbed scanner rather than using a phone photo to maximize extraction reliability.

Is my tax data secure during processing?

All file transfers use TLS 1.3 encryption. Your documents are processed in an isolated session and automatically deleted from our servers within 24 hours. Your tax data is never used to train or improve our AI models — it remains your data alone. For accounting firms with specific data residency or retention requirements, the processing is designed to minimize data persistence: upload, extract, download, and the source documents are purged. The extracted spreadsheet stays on your machine — we don't retain extracted data beyond the processing window.
