Custom Column Extraction for Handwritten Documents: Define Your Fields Once, Process Any Penmanship
Template-based extraction fails on handwriting. Custom Column Extraction lets you define field names once and extract them from any handwritten form — no per-document setup.
Why Template-Based Extraction Was Never Going to Work on Handwriting
Template-based extraction tools operate on a simple premise: draw a box around the invoice number on one page, and the software reads whatever sits inside that same box on every subsequent page. For printed forms from a single source — a known vendor, a standardized government document — this works. The layout doesn't change. The "Invoice Number" field always sits at x=340, y=120.
Handwriting breaks this assumption at every level. A contractor's handwritten invoice has no box — the total might be scrawled in the bottom-right corner, circled twice, with "$" added as an afterthought. A nurse's handwritten patient form might squeeze the date into a margin because the printed date field was too small. A warehouse receiver's handwritten count on a delivery note sits wherever there was white space left on the carbon copy. Templates require positional consistency. Handwriting guarantees positional variability.
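Here is a toy Python sketch that makes the positional premise concrete. It assumes a page has already been reduced to text tokens with x/y coordinates, which is a simplification because real tools work on images. The field name, box, and coordinates are made up, apart from echoing the x=340, y=120 example above. A template is just a fixed box per field, so the lookup that works on a printed invoice comes back empty when a handwritten invoice puts its number somewhere else.

```python
from dataclasses import dataclass


@dataclass
class Token:
    """A piece of recognized text and where it sits on the page (hypothetical model)."""
    text: str
    x: int
    y: int


# A template maps each field to a fixed box: (x_min, y_min, x_max, y_max).
TEMPLATE = {"Invoice Number": (300, 100, 400, 140)}


def extract_with_template(tokens, template):
    """Return the text of tokens inside each field's box, or '' if the box is empty."""
    result = {}
    for field, (x0, y0, x1, y1) in template.items():
        hits = [t.text for t in tokens if x0 <= t.x <= x1 and y0 <= t.y <= y1]
        result[field] = " ".join(hits)
    return result


# Printed invoice: the number sits exactly where the template expects it.
printed = [Token("INV-1042", 340, 120), Token("Total: $250", 340, 700)]
# Handwritten invoice: the number was written near the top-left instead.
handwritten = [Token("Inv # 1043", 80, 40), Token("$310", 520, 760)]

print(extract_with_template(printed, TEMPLATE))      # {'Invoice Number': 'INV-1042'}
print(extract_with_template(handwritten, TEMPLATE))  # {'Invoice Number': ''}
```

No amount of tuning the box fixes the second case, because the next handwritten invoice will put the number somewhere else again.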
This isn't a minor inconvenience — it's a fundamental category mismatch. Template-based extraction treats every document as a spatial puzzle: find the right coordinates. Handwriting is a semantic puzzle: find the right meaning. The two approaches are solving different problems. You can't draw enough bounding boxes to cover every possible location where someone might write "Total Due" — and if you could, the tool would still misread the handwriting because it's matching shapes, not understanding context. To understand why shape-matching alone falls short, see our breakdown of how AI handwriting recognition differs from traditional OCR.
What "Custom Column Extraction" Actually Means — and Why It's a Different Paradigm
Custom Column Extraction reverses the workflow. Instead of telling the tool where to look (coordinates, templates, bounding boxes), you tell the tool what you want — and let it figure out where on each page that information lives.
Here's what that looks like in practice. You open a blank interface and type the field names you need, exactly the way you'd type column headers in a spreadsheet:
| Column Name | What the AI Understands |
|---|---|
| Invoice Number | "Find the value that looks like an invoice reference — it might be labeled 'Inv #', 'Reference No.', or just appear as a number near the top" |
| Date | "Find a date value — it could be handwritten as '5/12' or 'May 12, 2026' or '12.05.26', anywhere on the page" |
| Total Amount | "Find the final monetary total — look for the largest number near the bottom, often preceded by '$', 'Total', or 'Amount Due'" |
You're not programming a template. You're not training a model. You're naming the data points you care about — and the AI uses its understanding of document structure, field semantics, and visual context to locate each value. The column names you typed become the headers of your output spreadsheet. The AI fills each row with the matching values it finds on each page.
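The spreadsheet side of that description is easy to picture in code. The Python sketch below assumes the extraction step, which is the AI's job and is stubbed out here with hand-written dictionaries, returns one dictionary per page keyed by column name. The function name and sample values are hypothetical. Your column names become the CSV header, each page becomes a row, missing values stay blank, and anything outside your columns is dropped.

```python
import csv
import io


def build_spreadsheet(columns, page_results):
    """Write one CSV row per page, using the user's column names as headers.

    Columns the extractor did not return are left blank; extra keys that are
    not among the defined columns are ignored.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, extrasaction="ignore", restval="")
    writer.writeheader()
    for result in page_results:
        writer.writerow(result)
    return buffer.getvalue()


columns = ["Invoice Number", "Date", "Total Amount"]
# Hypothetical per-page outputs standing in for the AI extraction step.
pages = [
    {"Invoice Number": "1042", "Date": "2026-05-12", "Total Amount": "250.00"},
    {"Invoice Number": "1043", "Total Amount": "310.00", "Notes": "paid cash"},
]
print(build_spreadsheet(columns, pages))
```

The second page has no date, so its Date cell is empty. Its stray "Notes" value never reaches the output because "Notes" was not one of the columns you asked for.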
This is where the paradigm shift lives. Template-based tools ask you to adapt your documents to the tool's rigid coordinate system. Custom column extraction adapts the tool to your documents — any handwriting style, any layout, any number of pages. The interface is a column name. The output is a spreadsheet. Everything in between — the visual parsing, the handwriting decoding, the field matching — is the AI's job, not yours.
The mental model shift: Template-based extraction says "the value lives at this coordinate." Custom column extraction says "the value is whatever answers this question." One requires you to know the document before processing it. The other requires you to know what information you need — regardless of what the document looks like.
Define Once, Process Any Penmanship: How the AI Finds Your Fields Across Documents
The hardest problem in handwriting extraction isn't reading individual letters — it's identifying which handwritten scribble corresponds to which field when every page looks different. A printed invoice from a known vendor has predictable structure: the invoice number sits in the top-right, the total sits in the bottom-right, and line items fill the middle. A handwritten document from a different person each time has none of this predictability. The "Total" could be anywhere.
This is why column-name extraction depends on semantic anchoring rather than positional anchoring. When you type "Total Amount" as a column name, the AI doesn't start scanning from a fixed set of coordinates. It processes the entire page as a visual scene and asks: "what on this page represents a final monetary total?" To answer, it weighs several signals at once: nearby labels such as "Total" or "Amount Due", currency symbols like "$", where the value sits on the page, and how it compares with the other numbers around it.
This multilayered approach is what makes "define once, process any penmanship" possible. The column name provides the semantic target. The AI's vision model provides the flexibility to hit that target regardless of where or how the answer is written. The same column definition that extracts "Invoice Number" from a neat block-print invoice in blue ink also finds it on a messy cursive receipt in pencil — because it's not looking for a shape, it's looking for an answer to a question.
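For intuition only, the Python sketch below combines those signals with a hand-tuned score. It is a deliberately crude stand-in. The weights, the 0.7 "near the bottom" threshold, and the candidate format are all assumptions, and a vision model does not work by adding up rules like these. The sketch simply shows why several weak clues together can pick out the total when no single position or label is reliable.

```python
import re

# Words that often label a final total (taken from the examples above).
LABELS = ("total", "amount due")


def parse_amount(text):
    """Return the first number in a string like '$1,250.00', or None."""
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    return float(match.group().replace(",", "")) if match else None


def score(candidate, largest):
    """Score a (text, nearby_label, y_fraction) candidate; y_fraction 1.0 = page bottom."""
    text, label, y = candidate
    value = parse_amount(text)
    if value is None:
        return 0.0
    s = 0.0
    if any(word in label.lower() for word in LABELS):
        s += 2.0  # labelled like a total
    if "$" in text:
        s += 1.0  # currency symbol present
    if y >= 0.7:
        s += 1.0  # near the bottom of the page
    if value == largest:
        s += 1.0  # largest number on the page
    return s


def pick_total(candidates):
    """Return the text of the highest-scoring candidate, or None if there are none."""
    if not candidates:
        return None
    amounts = [parse_amount(text) for text, _, _ in candidates]
    largest = max((a for a in amounts if a is not None), default=None)
    return max(candidates, key=lambda c: score(c, largest))[0]


candidates = [
    ("12", "Qty", 0.4),
    ("$45.00", "", 0.5),
    ("$310", "Total Due", 0.9),
    ("5/12", "", 0.1),
]
print(pick_total(candidates))  # $310
```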
If you've ever needed to extract only specific fields from a form while ignoring everything else, the column-name approach extends naturally — you define only the data points you need and let the AI filter. For a deeper look at this selective-extraction workflow specifically, read our guide on how to extract only the specific data fields you need from handwritten forms.
Real-World Workflow: From a Stack of Mixed Handwriting to a Single Spreadsheet
Here's what a full custom column extraction workflow looks like, start to finish. The scenario: you're an accountant at a small construction firm. Every Friday, seven subcontractors drop off their handwritten timesheets. Each subcontractor has a different handwriting style. Each fills out the form in a slightly different way — some write the date in the corner, some in a designated box, some don't write a date at all and just note the week number. You need four data points from each timesheet: Worker Name, Date, Hours Worked, and Job Site.
You type four column names: Worker Name, Date, Hours Worked, Job Site. That's it. No field mapping, no coordinate boxes, no training samples. These four names are now your permanent extraction template for all handwritten timesheets going forward.

Next Friday, the same seven subcontractors drop off another set of timesheets, possibly in the same handwriting, possibly joined by a new subcontractor whose handwriting you've never seen before. You use the same four column names. The AI handles the rest. The columns persist across sessions, so you're not redefining your fields every week. The extraction template becomes part of your workflow infrastructure, not a per-batch configuration chore.
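If you script your own pipeline around the extraction step, "define once" can be as simple as saving the column list to a file and reloading it for each batch. Here is a minimal Python sketch that assumes a local JSON file. The file name and helper names are hypothetical, and the sketch says nothing about how the product itself stores your columns.

```python
import json
import tempfile
from pathlib import Path


def save_columns(columns, path):
    """Store the column names so later batches can reuse them unchanged."""
    Path(path).write_text(json.dumps({"columns": columns}, indent=2), encoding="utf-8")


def load_columns(path):
    """Read back the saved column names, in their original order."""
    return json.loads(Path(path).read_text(encoding="utf-8"))["columns"]


# Demo in a throwaway directory; in practice the file would live somewhere permanent.
with tempfile.TemporaryDirectory() as tmp:
    config = Path(tmp) / "timesheet_columns.json"  # hypothetical file name
    save_columns(["Worker Name", "Date", "Hours Worked", "Job Site"], config)
    # ...next Friday, the same definition is loaded instead of retyped...
    print(load_columns(config))
```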
Frequently Asked Questions
Do I need to define a separate set of columns for each person's handwriting?
No. That's the entire point. A column name like "Total Amount" works across any handwriting style because the AI isn't recognizing the shape of the handwritten word "Total". It's recognizing that a particular value on the page is a monetary sum, regardless of how it's written. The column definitions are handwriting-agnostic. Define them once, use them for every batch.
What if two people write the same field differently — one uses "5/12" and the other writes "May 12"?
The AI normalizes date formats during extraction. Whether someone writes "5/12", "12 May 2026", "05/12/26", or "May 12th", the output lands in a consistent format in your spreadsheet. This normalization applies to dates, currency amounts, and other structured data types — you don't need to clean up formatting variations manually.
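Normalization needs one decision that the handwriting can't make for you: "5/12" means May 12 in the US and 5 December in much of Europe. The Python sketch below makes that choice an explicit `month_first` parameter and fills in a `default_year` when no year is written. Both parameters, the two-digit-year rule, and the function itself are assumptions for illustration, not a description of how the product normalizes dates. The function returns `None` for anything it can't interpret, so the cell stays blank instead of holding a guess.

```python
import re
from datetime import date

MONTHS = {m: i for i, m in enumerate(
    ["jan", "feb", "mar", "apr", "may", "jun",
     "jul", "aug", "sep", "oct", "nov", "dec"], start=1)}


def normalize_date(raw, default_year, month_first=True):
    """Convert a handwritten-style date string to ISO format (YYYY-MM-DD), or None."""
    s = raw.strip().lower()
    # Numeric forms: 5/12, 05/12/26, 12.05.26, 5-12-2026
    m = re.fullmatch(r"(\d{1,2})[/.\-](\d{1,2})(?:[/.\-](\d{2}|\d{4}))?", s)
    if m:
        a, b, y = m.groups()
        month, day = (int(a), int(b)) if month_first else (int(b), int(a))
    else:
        # Month-first words: "May 12th", "May 12, 2026"
        m = re.fullmatch(r"([a-z]+)\.?\s+(\d{1,2})(?:st|nd|rd|th)?,?(?:\s+(\d{4}))?", s)
        if m:
            name, d, y = m.groups()
        else:
            # Day-first words: "12 May 2026"
            m = re.fullmatch(r"(\d{1,2})(?:st|nd|rd|th)?\s+([a-z]+)\.?,?(?:\s+(\d{4}))?", s)
            if not m:
                return None
            d, name, y = m.groups()
        month = MONTHS.get(name[:3])
        if month is None:
            return None
        day = int(d)
    year = default_year if y is None else int(y)
    if year < 100:
        year += 2000  # assumption: two-digit years mean 20xx
    try:
        return date(year, month, day).isoformat()
    except ValueError:  # e.g. month 13 or February 30
        return None


for raw in ["5/12", "12 May 2026", "05/12/26", "May 12th"]:
    print(raw, "->", normalize_date(raw, default_year=2026))
```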
How many columns can I define?
There's no hard limit, but the practical sweet spot is between 5 and 30 columns. Define too few and you might miss data you later need. Define too many and you increase the chance that some columns won't have corresponding values on every document — which is fine, the AI leaves those cells empty rather than making up data. The system is built for realistic extraction scopes: not "every possible field on the page," but "the fields you actually need for your downstream process."
Can I define columns that don't appear explicitly on the document?
Yes. This is called an inferred column — a column where the AI reasons about the document rather than finding a pre-existing value. For example, you could define a column called "Category (options: Meals/Transport/Office/Other)" and the AI would examine a handwritten receipt, determine it's from a restaurant, and fill in "Meals" — even though the word "Meals" appears nowhere on the receipt. Inferred columns work for classification, flagging, and any data point where the answer is derivable from context rather than directly written.
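If you post-process results yourself, the "(options: ...)" syntax in that example also suggests a cheap safeguard. Split the definition into a column name and its allowed values, then blank out any answer that falls outside the list. The Python sketch below assumes the exact syntax shown above. The helper names are hypothetical, and the classification itself still comes from the AI.

```python
import re


def parse_column(definition):
    """Split 'Name (options: A/B/C)' into ('Name', ['A', 'B', 'C']).

    Plain column names come back with an empty options list.
    """
    m = re.fullmatch(r"\s*(.+?)\s*\(\s*options:\s*(.+?)\s*\)\s*", definition)
    if not m:
        return definition.strip(), []
    name, opts = m.groups()
    return name, [o.strip() for o in opts.split("/") if o.strip()]


def constrain(answer, options):
    """Keep an answer only if it matches an allowed option (case-insensitive)."""
    if not options:
        return answer  # free-form column: nothing to check
    for option in options:
        if answer.strip().lower() == option.lower():
            return option  # return the canonical spelling
    return ""  # outside the allowed set: leave blank for review


name, options = parse_column("Category (options: Meals/Transport/Office/Other)")
print(name, options)                  # Category ['Meals', 'Transport', 'Office', 'Other']
print(constrain("meals", options))    # Meals
print(repr(constrain("Groceries", options)))  # ''
```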
What happens if the AI can't find a field on a particular page?
The cell is left blank. The AI doesn't guess or invent values to fill gaps — an empty cell means "I couldn't confidently find this field on this page." You can then manually review that specific document. This is a deliberate design choice: a blank cell is actionable (you know to check), while a hallucinated value is dangerous (you might not catch it until it causes a downstream error).
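Because blanks carry meaning, they're easy to turn into a review queue. This Python sketch assumes each output row carries the name of its source file under a `source` key, which is a hypothetical convention, and the sample rows are made up. It lists which columns are empty on which documents. Only `None` and empty or whitespace-only strings count as blank, so a legitimate `0` in Hours Worked isn't flagged.

```python
def is_blank(value):
    """Treat None and empty/whitespace-only strings as blank; 0 is a real value."""
    return value is None or str(value).strip() == ""


def review_queue(rows, columns, source_key="source"):
    """Return (source, missing_columns) for every row with at least one blank cell."""
    queue = []
    for row in rows:
        missing = [c for c in columns if is_blank(row.get(c))]
        if missing:
            queue.append((row.get(source_key, "?"), missing))
    return queue


columns = ["Worker Name", "Date", "Hours Worked", "Job Site"]
# Made-up sample rows; the second timesheet had no readable date.
rows = [
    {"source": "timesheet_01.jpg", "Worker Name": "R. Diaz", "Date": "2026-05-08",
     "Hours Worked": "38", "Job Site": "Elm St"},
    {"source": "timesheet_02.jpg", "Worker Name": "K. Osei", "Date": "",
     "Hours Worked": "40", "Job Site": "Elm St"},
]
for source, missing in review_queue(rows, columns):
    print(f"{source}: check {', '.join(missing)}")  # timesheet_02.jpg: check Date
```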
Custom column extraction starts with a question — "what do you actually need from these documents?" The rest is the AI's interpretation of your handwritten pages through that lens. Try it on a batch of your own documents and see how the same column names hold up across different handwriting styles.