Can AI Extract Tables from Images?
Yes — How Well It Works
Yes. AI can extract table data — rows, columns, headers, and cell values — from images of bordered and borderless tables. Bordered tables achieve near-perfect structure recognition, often exceeding 95%. Borderless tables with clear column spacing work well in the 85–95% range. Dense borderless tables with merged cells and hierarchical headers remain the hardest case, typically landing at 60–85% and requiring some manual cleanup. The gap isn't about whether AI "understands" tables — it does — but about the inherent ambiguity a table image presents: when there are no visual boundaries, even a human reader has to guess which cell a value belongs to.
Key Takeaways
- AI's table extraction accuracy crashes from 96% on bordered tables to 60% on merged-cell tables — and the fault isn't in the model, it's in the input image that already erased the hierarchical structure.
- A merged cell spanning three rows means "this category covers the next three items" — obvious to a human reader, but to an AI operating on a flat image, it's an inference problem where the tree structure no longer exists in the data.
- Define your output columns explicitly — Item Description, Quantity, Unit Price — and the AI locates values by semantic meaning rather than reconstructing the table grid, sidestepping the merged-cell ambiguity entirely.
How Well It Works, by Table Type
The question "can AI extract tables from images" has no single answer. It depends entirely on the table in your image — not on the AI's general capability. Decades of computer vision research, culminating in the TableBank benchmark (417,234 labeled tables) and PubTabNet (568,000+ table images), have produced a clear picture of what works and what doesn't. Here's the breakdown:
| Table Type | Structure Accuracy (S-TEDS) | Content + Structure (TEDS) | What Makes It Work / Fail |
|---|---|---|---|
| Bordered tables | 96–98% | 90–95% | Grid lines provide unambiguous cell boundaries. Vision AI detects lines as separators; column detection is near-perfect. |
| Borderless, clear spacing | 88–95% | 85–93% | Whitespace between columns is sufficient when wide and consistent. AI infers column boundaries from alignment patterns. |
| Borderless, dense layout | 70–85% | 65–80% | Narrow gaps between columns blur boundaries. Values like "2,400,000" next to "12.5%" with thin spacing get merged into one cell about a third of the time. |
| Merged cells | 60–80% | 55–75% | Rowspan/colspan break the grid assumption. AI must infer which rows a merged cell spans — trivial for humans, structurally ambiguous for algorithms. |
| Handwritten tables | 50–70% | 40–65% | Double challenge: recognize handwriting and infer table structure from irregular alignment. Even human data entry operators slow down significantly on these. |
These numbers come from the TEDS metric (Tree-Edit-Distance-based Similarity) used across academic benchmarks. S-TEDS measures structural fidelity — are the right number of rows and columns present, with correct cell spanning? TEDS adds content accuracy — are the values inside each cell correct? A 2025 benchmark of vision-language models on PubTabNet found that general-purpose VLMs achieve 74–85% TEDS on raw extraction, while specialized table-focused models with pre-processing pipelines push into the 93%+ range (NGTR framework, IJCAI 2025).
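For reference, the TEDS score is usually written as below. The notation is ours: $T_a$ is the predicted table tree and $T_b$ the ground-truth tree, both taken as HTML table trees.

$$
\mathrm{TEDS}(T_a, T_b) = 1 - \frac{\mathrm{EditDist}(T_a, T_b)}{\max(|T_a|, |T_b|)}
$$

Here $\mathrm{EditDist}$ is the tree edit distance between the two trees and $|T|$ is the number of nodes in $T$. A score of 1 (100%) means the trees are identical. S-TEDS applies the same formula with cell contents ignored, so only the structure counts.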
The practical implication: if you're extracting data from clean, bordered invoice tables or structured reports, AI is mature and production-ready. If you're dealing with a scanned contract appendix containing a dense, borderless pricing table with merged category headers — expect to spend time on manual verification. For a broader look at what AI document extraction can and can't do across all document types, see our guide on what AI document extraction actually is.
What AI Gets Right
Three table scenarios where modern vision AI consistently delivers production-grade results:
Bordered tables from any source. Whether it's a PDF invoice, a screenshot from an accounting portal, or a photo of a printed purchase order: if the table has visible grid lines, AI detects the cell boundaries with near-perfect precision. The reason is straightforward: horizontal and vertical lines form an unambiguous grid that a model can parse almost deterministically, much as it would parse a spreadsheet. Rule-based PDF parsers like Camelot (which are not OCR tools and expect text-based PDFs) achieve 90%+ on well-bordered tables too, but AI adds the ability to handle photos, scans, and curved or skewed borders that break rule-based line detectors.
Cleanly spaced borderless tables. Modern invoices from SaaS platforms, professional services firms, and design-forward vendors often use whitespace-based table layouts — no grid lines, just generous spacing between columns. These tables are designed to be readable by humans through alignment alone, and AI models trained on millions of table images have learned the same visual cue. When column gaps are consistent and wider than the gaps between words within a cell, AI correctly identifies boundaries 90%+ of the time.
Multi-page tables with consistent structure. When a table spans multiple pages with the same column layout on every page — common in bank statements, financial reports, and utility bills — AI can process each page independently and merge the results into a single continuous spreadsheet. This is where the batch-first design of modern extraction tools becomes critical: you upload all pages at once and get one unified output table, not N separate extracts that need manual stitching.
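The stitching step itself can be sketched in a few lines. This is a minimal illustration, not how any particular tool does it: it assumes each page has already been extracted as a list of rows and that a page may or may not repeat the header row. The function name `merge_pages` and the sample data are hypothetical.

```python
def merge_pages(pages):
    """Concatenate per-page row lists into one table, keeping a single header row."""
    merged = []
    header = None
    for page in pages:
        if not page:
            continue  # skip pages where nothing was extracted
        if header is None:
            header = page[0]  # assumption: the first non-empty page starts with the header
            merged.append(header)
        # Drop the first row of a page only when it repeats the header exactly.
        start = 1 if page[0] == header else 0
        merged.extend(page[start:])
    return merged


pages = [
    [["Date", "Description", "Amount"], ["2025-01-02", "Coffee", "4.50"]],
    [["Date", "Description", "Amount"], ["2025-01-03", "Rent", "900.00"]],
    [["2025-01-04", "Books", "30.00"]],  # continuation page without a header
]
table = merge_pages(pages)
```

With the sample input, `table` holds one header row followed by the three transactions in page order. Real statements often need fuzzier header matching (for example, ignoring case or trailing whitespace), which this sketch leaves out.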
Where It Struggles
Being specific about failure modes builds more trust than claiming 99% accuracy on everything. Here are the scenarios where AI table extraction still requires human oversight:
Merged cells with directional semantics. A cell spanning three rows in a row-header (category) column means "this is the parent category for these three rows." To a human, that's obvious. To an AI, it's a structural inference problem: reconstructing a tree from a flat grid. When merged cells cover four or more rows, or when merged cells appear in both row and column headers simultaneously, accuracy drops sharply. A 2024 comparative study of PDF parsing tools found that parser accuracy degraded the most on documents with non-standard layouts and complex cell spanning.
This is not a failure of AI intelligence — it's a failure of the input format. A table image with merged cells has lost the hierarchical structure that would exist in the original source (HTML with rowspan attributes, or a spreadsheet with merged regions). The AI is being asked to reconstruct information that was removed when the table was rendered as a flat image. That's an inference problem, not a recognition one.
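A toy sketch makes the information loss concrete. It assumes a simplified two-column table (category, item) and a hypothetical `flatten` helper that mimics rendering: a spanning cell shows its text once and leaves the rows below it visually blank.

```python
def flatten(categories, items):
    """Render (text, rowspan) category cells plus item names into a flat grid.

    A spanning cell prints its text on its first row only, which is
    roughly what a reader (or a model) sees in the rendered image.
    """
    grid = []
    item_iter = iter(items)
    for text, rowspan in categories:
        for offset in range(rowspan):
            label = text if offset == 0 else ""
            grid.append([label, next(item_iter)])
    return grid


items = ["Bolt", "Nut", "Washer"]

# Source A: one "Hardware" cell that spans all three rows.
merged = flatten([("Hardware", 3)], items)

# Source B: "Hardware" applies to the first row only; the other rows have no category.
unmerged = flatten([("Hardware", 1), ("", 1), ("", 1)], items)

same_grid = merged == unmerged
```

Both sources flatten to the same grid, so `same_grid` is true: the rowspan that distinguished them is gone, and anything working from the flat version has to infer it.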
Hierarchical headers. Tables where column headers have parent-child relationships (e.g., "Q1 2025" spanning "Jan," "Feb," "Mar") and row headers also have parent groups create a two-dimensional hierarchy. Most AI models output a flat table — one header row, then data rows. They don't natively preserve the hierarchical relationship unless explicitly prompted to. The result is often a technically correct flat table that has lost the multi-level structure the original author intended. Our Custom Column Extraction approach sidesteps this by letting you define the output schema upfront, rather than asking AI to infer it from the image.
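One common way to keep a two-level header in a flat output is to prefix each child header with its parent. The sketch below assumes the parent of every column has already been resolved (which is the hard part); `flatten_headers` and the sample headers are hypothetical.

```python
def flatten_headers(parents, children, sep=" / "):
    """Combine a parent header row and a child header row into flat column names.

    An empty parent string means the column has no parent group.
    """
    return [f"{p}{sep}{c}" if p else c for p, c in zip(parents, children)]


parents = ["", "Q1 2025", "Q1 2025", "Q1 2025"]
children = ["Region", "Jan", "Feb", "Mar"]
columns = flatten_headers(parents, children)
```

Here `columns` becomes `["Region", "Q1 2025 / Jan", "Q1 2025 / Feb", "Q1 2025 / Mar"]`, so the grouping survives as part of each column name even though the table itself is flat.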
Dense borderless grids with variable cell widths. When a table has no borders, narrow gaps between columns, and cells containing varying amounts of text (some short values, some long descriptions), the whitespace boundaries become ambiguous. A cell containing "Invoice #2405-001" next to a cell containing "Office Supplies — Stationery (Bulk Order)" might be interpreted as three separate columns if the AI misjudges the whitespace thresholds.
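The threshold sensitivity is easy to reproduce with plain text. This sketch is a stand-in for the visual problem, not a model of how vision AI works: it splits one line on runs of spaces, with a hypothetical `min_gap` parameter playing the role of the whitespace threshold.

```python
import re


def split_columns(line, min_gap):
    """Split a text line into cells wherever at least `min_gap` spaces occur in a row."""
    return re.split(r" {%d,}" % min_gap, line.strip())


# Two real cells separated by a 3-space gap; the second cell happens to
# contain a wide 2-space gap between words.
line = "Invoice #2405-001" + " " * 3 + "Office Supplies" + " " * 2 + "Stationery (Bulk Order)"

strict = split_columns(line, min_gap=3)  # 2 cells, as intended
loose = split_columns(line, min_gap=2)   # 3 cells: the description is split in two
```

With `min_gap=3` the line yields the two intended cells; dropping the threshold to 2 turns the description into two separate columns, which is the kind of over-splitting described above.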
Handwritten tables. Even when the handwriting itself is legible (which vision AI handles at 85–95% accuracy, as covered in our guide to AI handwriting recognition), the structural problem compounds. Handwritten tables have irregular column alignment — values drift left or right, row heights vary, and lines are rarely straight. The AI must solve two hard problems simultaneously: text recognition and structure inference from an irregular grid.
How Traditional Methods Compare
Before vision AI, extracting tables from images meant stitching together multiple fragile tools. Understanding the old approach explains why AI's table extraction capabilities are a genuine step change.
| Method | How It Works | Bordered Table Accuracy | Borderless Table Accuracy | Merged Cells |
|---|---|---|---|---|
| Camelot (lattice mode) | Detects visual lines in the PDF/image and computes cell intersections | ~68% overall (across document types) | Fails entirely — lattice mode requires visible borders | Fails — no line detection means no grid |
| Tabula | Extracts text positions from PDF, groups by spatial proximity | ~73% overall | 50–70% — stream mode guesses column boundaries from whitespace | Copies merged cell value into arbitrary adjacent cells, losing semantics |
| pdfplumber | Character-level text extraction with explicit whitespace analysis | ~72% overall | 55–75% — more configurable than Tabula but same fundamental approach | No merged-cell handling; outputs flat cells |
| Vision AI / VLM | Reads the table as a visual scene — understands structure, text, and relationships simultaneously | 90–98% | 85–95% (spaced) / 65–80% (dense) | 60–80% — infers spanning from context but not perfectly |
The traditional approach has a fundamental architectural problem: it separates text recognition from structure recognition. First, OCR extracts text and positions. Then, a separate algorithm — often hand-tuned heuristics — tries to reconstruct the table grid from those positions. If OCR misreads a character (common on low-resolution images) or mispositions a word (common with skewed documents), the structure inference fails downstream with no way to recover. The errors compound.
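A back-of-the-envelope sketch of the compounding effect, under two loud assumptions: the stages fail independently, and the accuracy figures are invented for illustration (they are not measurements of any tool).

```python
import math


def pipeline_accuracy(*stage_accuracies):
    """Upper-bound accuracy of a pipeline whose stages fail independently."""
    return math.prod(stage_accuracies)


# Hypothetical figures: OCR gets 95% of cells right, structure inference 90%.
combined = pipeline_accuracy(0.95, 0.90)
```

Under those assumptions `combined` comes out at roughly 0.855: two stages that each look respectable on their own leave about one cell in seven wrong once chained.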
Vision AI avoids this entirely. It reads the table image as a visual scene — the same way you do — understanding that a number under the "Total" header belongs to that column not because it's at pixel coordinate X, but because it semantically aligns with everything else in the "Total" column. This isn't just a better OCR — it's a fundamentally different approach to the problem, which our AI vs traditional OCR comparison explores in detail.
How to Get the Best Results
Five practices that consistently improve AI table extraction accuracy, regardless of which tool you use:
1. Start with the highest resolution available. AI models see the image as a grid of pixels — more pixels means finer distinction between adjacent cells. A 2025 analysis of vision LLMs on the PubTabNet benchmark found that image upscaling was the most common pre-processing improvement, invoked in 64% of successful extractions on low-quality inputs. If you're photographing a printed table, use the highest resolution your phone camera supports and hold the phone parallel to the document to avoid perspective distortion.
2. Crop to the table region. Vision AI works better when the table fills most of the frame. Extra content around the table — surrounding text, logos, page headers — adds noise that can confuse column detection. Crop your image to just the table area before extraction.
3. Define your output columns explicitly. The most reliable approach isn't asking AI "extract everything" — it's telling AI what to extract. When you specify column names like "Item Description," "Quantity," "Unit Price," and "Line Total," the AI knows exactly what fields to look for and where they belong in the output. This is the principle behind Custom Column Extraction: the AI matches data to your schema by understanding the document's content, not by guessing the table structure. For the full explanation of how this works, see how schema-driven extraction differs from full-table parsing.
4. For borderless tables, pre-process with contrast enhancement. If your table has no visible borders and thin column spacing, increasing image contrast can help AI distinguish column boundaries. Even a simple levels adjustment in any image editor — darkening text, brightening the background — improves whitespace detection.
5. Verify merged-cell outputs. This is the one step you should never skip. When a table has merged cells, scan the extracted spreadsheet for rows where a value seems to be missing or repeated incorrectly. The AI's structural inference on merged cells is good enough to save you enormous time (it gets roughly 60–80% of the structure right, per the table above), but the remainder can introduce errors that cascade through downstream analysis if unchecked. Treat AI extraction as a first draft that needs a 60-second human scan, not a black box that requires no oversight.
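Practices 3 and 5 combine naturally into a small post-extraction check. The sketch below assumes the extraction result arrives as a list of dictionaries keyed by your column names; `EXPECTED_COLUMNS`, `find_suspect_rows`, and the sample rows are hypothetical, and the two rules (empty required cell, exact repeat of the previous row) are only a starting point.

```python
EXPECTED_COLUMNS = ["Item Description", "Quantity", "Unit Price", "Line Total"]


def find_suspect_rows(rows, columns=EXPECTED_COLUMNS):
    """Return (row_index, reason) pairs for rows that deserve a human look."""
    suspects = []
    previous = None
    for index, row in enumerate(rows):
        # Rule 1: a required column is absent or blank.
        missing = [c for c in columns if not str(row.get(c, "")).strip()]
        if missing:
            suspects.append((index, "missing: " + ", ".join(missing)))
        # Rule 2: the row is an exact copy of the one before it.
        elif previous is not None and row == previous:
            suspects.append((index, "identical to previous row"))
        previous = row
    return suspects


rows = [
    {"Item Description": "Bolt M8", "Quantity": "100", "Unit Price": "0.12", "Line Total": "12.00"},
    {"Item Description": "Bolt M8", "Quantity": "100", "Unit Price": "0.12", "Line Total": "12.00"},
    {"Item Description": "", "Quantity": "50", "Unit Price": "0.20", "Line Total": "10.00"},
]
suspects = find_suspect_rows(rows)
```

For the sample data, row 1 is flagged as a repeat and row 2 as missing its description. A flagged row is not necessarily wrong (genuine duplicates exist), so the output is a reading list for the human scan, not an automatic fix.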
Real Examples: What to Expect
Example 1: A printed purchase order with bordered line-item table. You photograph a PO from a supplier. The table has clear borders, standard columns (Item, Description, Qty, Unit Price, Total), and no merged cells. AI will extract this with near-perfect accuracy — every row, every cell value, correctly aligned. You'll spend zero time on cleanup. This is the sweet spot where AI table extraction is genuinely faster and more accurate than manual data entry.
Example 2: A bank statement PDF with a borderless transaction table. Bank statements typically use whitespace-based table layouts: date, description, debit, credit, and balance columns separated by consistent gaps. AI handles this well — 90–95% accuracy on structure and content. The most common error is misaligning long transaction descriptions that spill into the adjacent debit/credit column. A quick scroll through the output catches these in under a minute.
Example 3: A scanned contract appendix with a dense pricing grid. This is the hardest case: no borders, thin column spacing, merged category headers spanning multiple sub-columns, and data values of varying lengths. Expect 65–80% structural accuracy. The AI will get most data points right but may jumble the relationship between merged header categories and their sub-columns. Plan for 5–10 minutes of manual correction on a 20-row table.
FAQ
Can AI extract tables from a photo taken with my phone?
Yes, and often surprisingly well — provided the photo is sharp, well-lit, and taken straight-on (not at an angle). The main failure mode with phone photos is perspective distortion: a table photographed from an angle creates slanted lines that confuse both traditional OCR and AI structure recognition. Hold the phone parallel to the document surface and the results will be comparable to a flatbed scan. For document types that are commonly photographed rather than scanned, see our guide on extracting data from screenshots and photos.
Does AI work better with PDFs or images?
It depends on the PDF. A text-native PDF (where you can select and copy text) contains positioning data that AI can use as an additional signal, often improving accuracy by 5–10 percentage points over a pure image. A scanned image-only PDF is equivalent to an image. AI handles both — but if you have a choice, provide the original text-native PDF rather than a screenshot of it.
Can AI handle tables with multi-line text inside cells?
Yes, and this is actually an area where AI outperforms traditional methods significantly. When a cell contains a paragraph of text — common in contract exhibits, specification sheets, and clinical reports — traditional OCR loses track of row boundaries because line breaks within a cell look like row breaks. Vision AI reads the cell as a whole entity and preserves the text within it, understanding that a line break inside "Scope of Work: The contractor shall..." doesn't start a new row.
How does AI handle tables with different currencies or number formats?
AI reads numeric values in context — it recognizes "1.500,00" as a European-formatted number (1,500.00) and "$1,500.00" as US-formatted, even if both appear in the same table. This works because vision AI doesn't rely on pattern-matching numeric strings; it understands the document's language, the surrounding column context, and the likely meaning of the value. Cross-format tables — such as a commercial invoice with mixed currency formats — are handled correctly in most cases.
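If you post-process extracted values yourself, a rule-based normalizer shows why context matters. This is a plain heuristic, not how a vision model reasons: the hypothetical `parse_amount` treats whichever of "." or "," appears last as the decimal separator.

```python
import re
from decimal import Decimal


def parse_amount(text):
    """Parse a US- or European-formatted amount into a Decimal (heuristic)."""
    digits = re.sub(r"[^\d.,-]", "", text)  # drop currency symbols and spaces
    last_dot = digits.rfind(".")
    last_comma = digits.rfind(",")
    if last_comma > last_dot:
        # Comma comes last: treat it as the decimal separator (European style).
        digits = digits.replace(".", "").replace(",", ".")
    else:
        # Dot comes last (or no separators): commas are thousands separators.
        digits = digits.replace(",", "")
    return Decimal(digits)


european = parse_amount("1.500,00")
american = parse_amount("$1,500.00")
```

Both calls produce the same value, 1500.00. The heuristic breaks on inputs like "1,500" with no decimal part, which it reads as 1.5; resolving that requires knowing the column and the document's locale, which is exactly the contextual signal described above.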
Can AI extract tables that span multiple pages?
Yes. Modern vision AI can detect when a table continues onto the next page and merge the results into a single output spreadsheet. A 2025 study using the PubTables-v2 dataset achieved 99.5% recall on identifying cross-page table continuations. The practical requirement: all pages must be uploaded together as a batch so the AI can see the continuity. Processing pages one at a time loses the cross-page context.
Do I need to train the AI on my table format first?
No. This is a common misconception carried over from template-based OCR tools like Docparser or Parseur, where you must define parsing zones or rules for each new document layout. Vision AI uses semantic understanding: it reads your table the way a human would, without needing prior exposure to your specific format. The trade-off: template-based tools can achieve higher accuracy on formats they've been explicitly configured for, but they break when the format changes. AI handles format variation automatically but with lower peak accuracy on any single fixed format. For a detailed breakdown of this trade-off, see traditional OCR vs AI extraction.
What happens when a table contains both text and checkboxes or symbols?
Vision AI reads checkboxes and symbols contextually: a checked box next to "Expedited Shipping" is understood as "shipping method = expedited," not as an isolated symbol. This works because the AI sees the checkbox and the label text together as one semantic unit, similar to how it processes key-value pairs elsewhere on the page. The accuracy on checkbox data is generally 85–95%, comparable to printed text in bordered tables.
The bottom line: AI is ready for bordered and well-spaced tables today. It saves enormous time even on hard cases — because editing a mostly-correct extraction is faster than typing everything from scratch. And as vision models improve, the "hard" category shrinks each year. The data backs this up: S-TEDS scores on PubTabNet have risen from ~65% in 2020 to ~93%+ in 2025, and the trend line hasn't flattened yet.
For a hands-on comparison of how AI extraction performs against manual data entry on real-world documents, see our time-and-accuracy breakdown of AI vs manual transcription. Or, explore our roundup of the best table extraction tools in 2026 to see how different tools compare on the metrics that matter for your workflow.