Can AI Read COI Documents?
Yes — From ACORD Forms to Compliance Data
Yes. AI can read and extract data from ACORD 25 Certificate of Insurance forms — including policy numbers, coverage types, limits, effective and expiration dates, and additional insured status. AI handles both digital and scanned COIs from any insurance agency, even when agencies modify the standard ACORD layout. The accuracy on clean digital certificates reaches 95-99% for structured fields. Hand-filled paper forms and decades-old typewriter certificates are where accuracy drops — but for the COIs that arrive in most construction and property management inboxes today, AI is production-ready.
Key Takeaways
- Twenty hours a month retyping subcontractor COI data makes you wonder if compliance work is really just expensive typing.
- Position-based COI tools silently misread coverage limits the moment an insurance agency lays out its ACORD 25 form differently, and a single mistyped aggregate limit across 200 certificates becomes a seven-figure liability gap nobody spotted.
- Eliminate the transcription entirely and your job shifts from retyping numbers to verifying them — ten seconds of compliance review per certificate where you used to spend five minutes of data entry.
How Well AI Reads COI Documents Today
The ACORD 25 ("Certificate of Liability Insurance") is the standard form used across U.S. commercial insurance. It condenses the insured's liability coverage, often spread across several policies, onto a single page: policy numbers, the general liability limits, auto liability, workers' compensation, umbrella coverage, and the parties involved (certificate holder, additional insured, producer, carrier). For a construction project manager verifying subcontractor compliance, each of those fields matters. A mistyped aggregate limit can mean a seven-figure liability gap.
AI handles the ACORD 25 structure well because the form is dense but predictable. Every certificate carries the same field categories in roughly the same arrangement — even when individual agencies shift margins, change fonts, or add their own headers. This is where the mechanism behind the extraction matters. Position-based tools that draw bounding boxes around expected field locations break the moment an agency uses different form software. The fields shift by half an inch and the tool reads the wrong text.
Semantic extraction — the approach behind modern AI document reading — works differently. Instead of looking at where a field sits on the page, it reads the entire document and identifies each value by what it means. It knows that "GEN'L AGGREGATE LIMIT" from one agency and "GENERAL AGGREGATE" from another refer to the same coverage type — regardless of abbreviation, position, or font size. This is Custom Column Extraction: you define the output columns you need ("Policy Number," "GL Each Occurrence Limit," "Expiration Date"), and the AI locates each value by understanding the document's content — not by matching coordinates. For a deeper look at how this mechanism compares to older approaches, see how COI data extraction works.
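To make "define the output columns you need" concrete, here is a minimal post-processing sketch. It assumes the extractor returns each defined column as a raw string; the parsers, the MM/DD/YYYY date format, and the names `COLUMN_PARSERS`, `parse_money`, `parse_acord_date`, and `normalize_row` are illustrative assumptions, not part of any particular tool. Typed values make the later review and filtering steps easier.

```python
import re
from datetime import date, datetime
from typing import Callable, Dict, Optional


def parse_money(value: str) -> Optional[int]:
    """Turn '$1,000,000' or '$1,000,000.00' into 1000000; None if unreadable."""
    cleaned = (value or "").replace("$", "").replace(",", "").strip()
    cleaned = re.sub(r"\.\d{2}$", "", cleaned)  # drop trailing cents such as ".00"
    return int(cleaned) if re.fullmatch(r"\d+", cleaned) else None


def parse_acord_date(value: str) -> Optional[date]:
    """Parse a date printed as MM/DD/YYYY (assumed format); None if it doesn't fit."""
    try:
        return datetime.strptime((value or "").strip(), "%m/%d/%Y").date()
    except ValueError:
        return None


# Hypothetical column spec: the column names from the text, each with a parser.
COLUMN_PARSERS: Dict[str, Callable[[str], object]] = {
    "Policy Number": lambda v: (v or "").strip(),
    "GL Each Occurrence Limit": parse_money,
    "Expiration Date": parse_acord_date,
}


def normalize_row(raw: Dict[str, str]) -> Dict[str, object]:
    """Apply each column's parser; columns the extractor left out become '' or None."""
    return {column: parse(raw.get(column, "")) for column, parse in COLUMN_PARSERS.items()}


print(normalize_row({"Policy Number": " GL-1234 ", "GL Each Occurrence Limit": "$1,000,000",
                     "Expiration Date": "06/30/2025"}))
```

Returning `None` instead of guessing keeps unreadable values visible, so they land in the review queue rather than being silently filled.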
The coverage grid on an ACORD 25 deserves special attention because it's where most extraction tools stumble. The general liability section alone packs six limit lines into a compact table. Five of them, with their common label variations, are:
| Sub-Limit | Common Label Variations | AI Extraction Accuracy (Digital PDF) |
|---|---|---|
| Each Occurrence | EACH OCCURRENCE, PER OCC, EA OCC | 96–99% |
| Damage to Rented Premises | DAMAGE TO RENTED PREM, FIRE DAMAGE, RENTED PREMISES | 94–98% |
| Medical Expense | MED EXP, MEDICAL PAYMENTS, MED EXP (Any one person) | 95–98% |
| Personal & Advertising Injury | PERS & ADV INJURY, PERSONAL INJURY, PI | 93–97% |
| General Aggregate | GEN'L AGGREGATE, GENERAL AGGREGATE | 96–99% |
The accuracy numbers above apply to clean, digitally generated ACORD 25 PDFs, the format that dominates modern insurance agency output. When column alignment shifts between agencies (one places "Each Occurrence" left-aligned, another center-aligns all sub-limit labels), a position-based tool can read a dollar amount from the wrong cell. Semantic extraction avoids this because it reads the label-to-value relationship by meaning: "$1,000,000" next to a label that means "each occurrence" is correctly attributed regardless of grid alignment. The same care applies one line further down the grid: PRODUCTS-COMP/OP AGG is the products-completed operations aggregate, a separate limit from the general aggregate rather than another label for it, so give it its own column if you track it.
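A vision model does not work from a lookup table, so the following is only a rough illustration of the mapping semantic extraction has to get right: a hypothetical keyword-based normalizer (`GL_LABEL_RULES` and `canonical_gl_column` are made-up names) that sends the label variants from the table to canonical column names. It assumes the printed label has already been read as text, and it keeps the products-completed operations aggregate separate from the general aggregate, which on the ACORD 25 form is its own line.

```python
import re
from typing import Optional

# Hypothetical rules, checked in order. Specific patterns come first so that
# "PRODUCTS-COMP/OP AGG" is not mistaken for the general aggregate and
# "DAMAGE TO RENTED PREMISES (Ea occurrence)" is not mistaken for each occurrence.
GL_LABEL_RULES = [
    ("GL Products-Comp/Op Aggregate", r"PRODUCTS|COMP\s*/\s*OP"),
    ("GL General Aggregate", r"GEN(ERAL|'L)?\s+AGG"),
    ("GL Damage to Rented Premises", r"RENTED|FIRE\s+DAMAGE"),
    ("GL Medical Expense", r"MED(ICAL)?\s+(EXP|PAYMENTS)"),
    ("GL Personal & Adv Injury", r"PERS(ONAL)?\b.*INJURY|^PI$"),
    ("GL Each Occurrence", r"EACH\s+OCC|PER\s+OCC|EA\s+OCC"),
]


def canonical_gl_column(label: str) -> Optional[str]:
    """Map a printed GL limit label to a canonical column name, or None if unknown."""
    text = label.upper().strip()
    for column, pattern in GL_LABEL_RULES:
        if re.search(pattern, text):
            return column
    return None


for label in ["GEN'L AGGREGATE", "PRODUCTS-COMP/OP AGG", "EA OCC",
              "DAMAGE TO RENTED PREMISES (Ea occurrence)"]:
    print(f"{label} -> {canonical_gl_column(label)}")
```

The ordering trick shows why keyword rules are brittle: a label such as "GEN'L AGGREGATE LIMIT APPLIES PER:" (a checkbox row, not a dollar limit) would still be sent to the general aggregate column. Reading by meaning is what avoids that class of mistake.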
What AI Gets Right on COI Documents
Digital ACORD 25 PDFs from major carriers. This is the baseline case and where AI performs at its best. When an insurance agency generates a certificate from its management system — whether Applied Epic, Vertafore, or another platform — the output is a clean, machine-generated PDF with consistent fonts, clear field boundaries, and predictable data placement. AI reads these at 95-99% accuracy for the structured fields that drive compliance decisions: policy numbers, dollar amounts, dates, and named entities. The remaining 1-5% are edge cases like unusually formatted NAIC numbers or agency-specific abbreviations that fall outside common patterns.
Scanned COIs emailed from subcontractors. In construction, subcontractors rarely send original digital certificates. They email scanned copies — sometimes clean flatbed scans, sometimes smartphone photos from a job-site trailer. AI handles clean scans with minimal accuracy loss (2-5 percentage points). The vision model compensates for slight skew and resolution variation the same way it handles mixed-agency formats — by reading semantically rather than relying on pixel-perfect alignment.
Batch processing across different insurance agencies. A general contractor managing 40 subcontractors receives COIs from a dozen different agencies, each with its own ACORD 25 formatting quirks. Because semantic extraction does not depend on per-agency templates, you can dump all the certificates into one batch — PDFs from State Farm, Liberty Mutual, Travelers, and regional carriers — and extract the same fields from all of them in a single processing run. The output is one spreadsheet with one row per COI, regardless of how each agency laid out its form. Batch subcontractor COI tracking becomes practical only when the extraction layer handles format variation without configuration.
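The "one spreadsheet, one row per COI" output can be sketched with Python's standard `csv` module, assuming each certificate's extraction comes back as a dict keyed by column name. The column list, the sample values, and the function name `batch_to_csv` are hypothetical; CSV stands in here for whatever spreadsheet format your tool writes.

```python
import csv
import io
from typing import Dict, List

# Hypothetical column set; a certificate missing a field gets an empty cell
# so every row in the batch keeps the same shape.
COLUMNS = ["Source File", "Named Insured", "Policy Number", "Carrier",
           "GL Each Occurrence", "GL General Aggregate", "Expiration Date"]


def batch_to_csv(extracted: List[Dict[str, str]]) -> str:
    """Merge per-certificate results (one dict per COI) into one CSV table."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(extracted)
    return buffer.getvalue()


batch = [
    {"Source File": "framing_sub.pdf", "Named Insured": "Example Framing LLC",
     "Policy Number": "GL-0001", "GL Each Occurrence": "1000000"},
    {"Source File": "electrical_sub.pdf", "Named Insured": "Example Electric Inc",
     "Carrier": "Example Mutual", "Expiration Date": "06/30/2025"},
]
print(batch_to_csv(batch))
```

The fixed `fieldnames` list is what keeps the table uniform: it does not matter which agency's layout a value came from, only which column it was extracted into.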
Additional insured and certificate holder identification. The bottom section of an ACORD 25 — certificate holder name, additional insured status, description of operations — is text-dense and varies more between agencies than the coverage grid. AI handles this section by understanding it as natural language rather than a fixed-position field. Whether the additional insured is listed as "XYZ General Contractors, Inc." or "XYZ GC" with a CG 20 10 endorsement reference, the AI extracts the entity name and any attached endorsement codes. What it does not do — and no extraction tool claims to do — is evaluate whether the listed endorsement language meets your contractual requirements. That is a compliance professional's judgment call.
Where AI Reading COI Documents Still Struggles
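The scenarios where AI accuracy drops meaningfully come down to two causes: degraded visual quality that makes characters ambiguous even to a human reader (hand-filled forms, aging typewriter certificates, job-site photos), and layouts that stray far from the standard ACORD structure.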
Hand-filled paper ACORD forms. Before digital certificate issuance became standard, insurance agents filled out physical ACORD 25 forms by hand, in ballpoint pen on carbon-copy paper. These forms still circulate, particularly from smaller regional agencies and for older policies. The problem compounds: handwriting reduces character recognition accuracy, and carbon-copy degradation (fading, smudging, bleed-through from the copy beneath) adds visual noise. On a hand-filled COI with average handwriting, expect 70-80% field-level accuracy: usable for speeding up data entry, but every extracted value needs systematic review. With messy handwriting on faded carbon copies, accuracy drops below 70%, and manual re-entry may be faster than verification.
Decades-old typewriter certificates. Typewriter-struck COIs present a different challenge. Typewriter characters are printed, not handwritten — but old typewriter forms have inconsistent strike pressure, misaligned baselines, and ink that has faded unevenly over 15-20 years. The AI can read most characters but misreads individual letters where the strike was light. A "3" with a faint top curve becomes an ambiguous character that the model may guess wrong. These forms also tend to have been scanned on older equipment at low resolution, compounding the legibility problem.
Heavily modified non-standard agency layouts. Some agencies issue certificates on their own letterhead rather than on the standard ACORD 25 form, particularly for niche coverage types like professional liability or pollution liability. These non-standard certificates retain the same data categories (policy number, limits, dates, parties) but rearrange them into agency-specific layouts with custom section headers. AI still reads them, which is the advantage of semantic extraction, but accuracy drops 3-8 percentage points, most likely because models see far more standard ACORD layouts than custom ones. If your subcontractor pool includes many small agencies issuing non-standard certificates, test a sample batch to calibrate expectations.
Photographed paper certificates from job sites. A subcontractor snaps a photo of their COI on the dashboard of a truck and texts it to the project manager. The photo is skewed, poorly lit, taken at an angle, and captured at low resolution. AI can still attempt extraction — and modern vision models correct for moderate skew and lighting variation — but accuracy on field-site photos drops to 60-75%. The fix is procedural, not technical: require subcontractors to submit digital copies or clean flatbed scans as a condition of contract compliance.
AI reads what the certificate says — not what the underlying insurance policy contains. It extracts policy numbers and coverage limits from the ACORD form itself. It does not verify whether the policy actually exists, whether the limits are current, or whether the additional insured endorsement meets your contractual language requirements. Extraction is a data entry accelerant, not a compliance audit replacement.
How to Get the Best Results from AI COI Reading
1. Request digital ACORD 25 forms from subcontractors. The single highest-impact action you can take is procedural: include a clause in your subcontractor agreements requiring insurance certificates to be submitted as digitally generated PDFs, not photographed or hand-filled paper forms. Most insurance agencies have issued digital certificates as standard practice since the mid-2010s. If a sub is still submitting hand-filled paper forms, their agent can generate a digital replacement in under five minutes. This one requirement moves a certificate from the 60-80% accuracy range of handwritten forms and job-site photos to the 95%+ range of digital PDFs.
2. Define specific column names that match the COI's field structure. The AI reads by semantic matching — the column name you type guides what it looks for. "Policy Number" is more precise than "Policy Info." "GL Each Occurrence Limit" is more precise than "Liability Limit." For the dense coverage grid, name each sub-limit as a separate column: "GL Each Occurrence," "GL Damage to Premises," "GL Medical Expense," "GL Personal Injury," "GL General Aggregate." The AI uses the column name as a semantic query against the document — the more specific the query, the more accurate the match.
3. Batch COIs from the same renewal cycle together. COIs arrive in waves — 40 subcontractors renewing after a quarterly push, 15 new subs coming onto a project. Processing them in batches that reflect these natural groupings keeps the extracted data organized: one batch per project or per renewal cycle, one spreadsheet per batch, one row per COI. The AI processes all certificates in parallel within a batch, and the merged output means you are verifying one table rather than 40 individual extractions.
4. Always review coverage limits before they drive compliance decisions. Even at 99% accuracy on digital forms, a single mistyped aggregate limit among 200 certificates is a liability gap. The workflow that makes sense for COI extraction is: AI extracts all fields → you verify coverage limits and additional insured language → data lands in the compliance spreadsheet. The time savings come from eliminating transcription: you spend 10-20 seconds reviewing each certificate instead of 5-10 minutes typing it. COI non-compliance costs in construction are too high to skip verification entirely.
5. Use a tool that outputs directly to your tracking spreadsheet. The fewer steps between extraction and your compliance record, the lower the chance of copy-paste errors. If your compliance tracking lives in Excel or Google Sheets, choose an extraction tool that exports directly to that format — no intermediate CSV download, no re-import step. Each handoff between tools is an opportunity for data to shift columns or lose formatting. For a hands-on walkthrough, try extracting COI data to Excel with your own certificates.
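Tip 4's point about scale can be checked with a back-of-the-envelope calculation. The sketch below makes a strong simplifying assumption, that every field is independently correct with the same probability; real errors tend to cluster on hard documents. The function names are illustrative, and the 15-fields-per-certificate figure is borrowed from the construction example further down.

```python
def expected_field_errors(certificates: int, fields_per_cert: int, accuracy: float) -> float:
    """Expected wrong fields, assuming each field is independently right with probability `accuracy`."""
    return certificates * fields_per_cert * (1 - accuracy)


def chance_of_any_error(values: int, accuracy: float) -> float:
    """Probability that at least one of `values` independent fields is wrong."""
    return 1 - accuracy ** values


# 200 certificates, 15 fields each, 99% field-level accuracy.
print(f"{expected_field_errors(200, 15, 0.99):.0f} wrong fields expected")  # 30 wrong fields expected
# One aggregate limit per certificate, 200 certificates.
print(f"{chance_of_any_error(200, 0.99):.0%} chance at least one aggregate is wrong")  # 87% chance ...
```

Under these assumptions, 99% accuracy still leaves about 30 wrong values in a 200-certificate batch, and the odds that at least one aggregate limit is wrong are roughly 87%, which is why review stays in the workflow.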
Real Examples: Where AI COI Reading Makes the Difference
Construction Subcontractor COI Tracking
A mid-size general contractor managing 50 active subcontractors across three projects receives COIs on a rolling basis — new subs joining, existing subs renewing, coverage changes after claims. Each certificate requires the same 12-15 fields extracted: named insured, policy number, carrier, five GL sub-limits, auto liability, workers' comp, umbrella, effective date, expiration date, certificate holder, additional insured. Manual entry takes 5-10 minutes per certificate. At 50 subs renewing quarterly, that's 200 COIs per year — 16 to 33 hours of pure data entry.
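The annual figure is simple arithmetic, sketched here with a hypothetical `annual_hours` helper. The typing range uses this example's 5-10 minutes per certificate; the review range uses the 10-20 seconds per certificate from tip 4 and covers only the per-certificate check, not follow-up on certificates that fail it.

```python
def annual_hours(certificates_per_year: int, seconds_per_certificate: float) -> float:
    """Hours per year spent handling certificates at a fixed per-certificate time."""
    return certificates_per_year * seconds_per_certificate / 3600


certs = 50 * 4  # 50 subcontractors renewing quarterly -> 200 certificates per year

# Manual entry at 5-10 minutes per certificate.
print(f"typing: {annual_hours(certs, 5 * 60):.1f} to {annual_hours(certs, 10 * 60):.1f} h/yr")  # typing: 16.7 to 33.3 h/yr
# Review-only at 10-20 seconds per certificate.
print(f"review: {annual_hours(certs, 10):.1f} to {annual_hours(certs, 20):.1f} h/yr")  # review: 0.6 to 1.1 h/yr
```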
AI extraction collapses the data entry step to under a minute per certificate. The project manager uploads 50 COIs from 12 different agencies as a single batch, defines the extraction columns once, and receives a structured spreadsheet with all fields populated. Verification replaces transcription: 45 minutes of review instead of 4 to 8 hours of typing. The real gain is not time; it's the elimination of transcription errors in coverage limit values that no one catches until a claim is denied. For the related challenge of extracting data from subcontractor billing paperwork, see construction invoice extraction.
Vendor and Supplier Compliance Verification
A manufacturing facility onboards 200+ vendors annually, each requiring proof of general liability insurance before stepping onto the premises. The COIs arrive in a flood during onboarding season. Two administrative staff spend two weeks manually entering certificate data into a compliance spreadsheet — policy numbers, carrier names, effective dates, expiration dates, coverage limits. Vendors wait for clearance while their COIs sit in the queue.
AI extraction processes the entire onboarding batch in under an hour. The compliance staff's role shifts from data entry to exception handling — reviewing flagged certificates where limits fall below thresholds, following up on missing fields, verifying additional insured endorsements. The workflow becomes: AI reads everything → staff reviews the edge cases → compliant vendors are cleared, non-compliant vendors get a specific request for updated coverage. The bottleneck moves from the whole batch to the 5-10% of certificates that actually need human attention.
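The first cut of that exception handling can be sketched as a completeness check, assuming each certificate comes back as one dict of extracted strings. The required-field list and the names `REQUIRED_FIELDS` and `triage` are illustrative assumptions. A "complete" row is only ready for human review of limits and endorsements; it is not cleared.

```python
from typing import Dict, List, Tuple

# Hypothetical fields a vendor certificate must have before review.
REQUIRED_FIELDS = ["Vendor", "Policy Number", "Carrier", "Effective Date",
                   "Expiration Date", "GL Each Occurrence"]

Row = Dict[str, str]


def triage(rows: List[Row]) -> Tuple[List[Row], List[Tuple[Row, List[str]]]]:
    """Split extracted rows into complete ones and ones missing required fields."""
    complete: List[Row] = []
    follow_up: List[Tuple[Row, List[str]]] = []
    for row in rows:
        missing = [f for f in REQUIRED_FIELDS if not str(row.get(f) or "").strip()]
        if missing:
            follow_up.append((row, missing))  # goes back to the vendor with a specific request
        else:
            complete.append(row)  # still needs the human limit and endorsement review
    return complete, follow_up
```

Returning the list of missing fields with each row is what turns a vague "your COI is incomplete" into the specific request described above.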
Property Management Tenant Insurance Verification
A commercial property manager overseeing 80 tenants across four buildings must verify that every tenant maintains the general liability coverage required by their lease — typically $1,000,000 per occurrence and $2,000,000 aggregate. Each tenant's COI renews on a different date. The property manager's administrative staff spends one week per month typing renewal certificate data into a tracking spreadsheet, then sorting by expiration date to identify upcoming lapses.
AI extraction handles the ongoing stream: as renewals arrive by email throughout the month, each certificate gets extracted immediately. The staff reviews limits and expiration dates against lease requirements in seconds per certificate rather than minutes. The always-current spreadsheet means the property manager can answer "which tenants have expiring coverage this month?" by filtering a column rather than hunting through a shared drive of PDFs.
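The "filter a column" step can be sketched as follows, assuming expiration dates have already been parsed into `datetime.date` objects. The tenant rows and the function name `expiring_in_month` are hypothetical.

```python
from datetime import date
from typing import Dict, List


def expiring_in_month(rows: List[Dict], year: int, month: int) -> List[Dict]:
    """Rows whose 'Expiration Date' falls in the given month, soonest first.

    Rows without a parsed date are skipped here, so list them separately for review.
    """
    hits = [
        r for r in rows
        if isinstance(r.get("Expiration Date"), date)
        and (r["Expiration Date"].year, r["Expiration Date"].month) == (year, month)
    ]
    return sorted(hits, key=lambda r: r["Expiration Date"])


tenants = [
    {"Tenant": "Suite 100", "Expiration Date": date(2025, 3, 28)},
    {"Tenant": "Suite 210", "Expiration Date": date(2025, 3, 4)},
    {"Tenant": "Suite 305", "Expiration Date": date(2025, 4, 1)},
]
for r in expiring_in_month(tenants, 2025, 3):
    print(r["Tenant"], r["Expiration Date"].isoformat())
```

Sorting by date puts the most urgent renewals first, which matches the property manager's original workflow of sorting by expiration to spot upcoming lapses.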
FAQ
Can AI read COI documents from different insurance agencies in one batch?
Yes. This is where semantic extraction outperforms template-based tools. Because AI reads by field meaning rather than fixed position, you can upload COIs from five different agencies — each with its own ACORD 25 formatting — and extract the same fields from all of them in a single batch. The AI locates "Policy Number" whether it appears top-right on one form or mid-left on another.
Does AI understand the difference between "Each Occurrence" and "General Aggregate" limits?
Yes. Modern AI models understand that "EACH OCCURRENCE," "PER OCC," and "EA OCC" all refer to the same coverage sub-limit, and that the dollar value next to each label is the corresponding limit — not some other number on the page. This is the core advantage of semantic extraction over template OCR: the model understands what each field label means, so it can correctly attribute values even when column alignment shifts between agencies.
Can AI read handwritten COI forms?
Partially. On clearly printed block-letter handwriting with dark ink on clean paper, AI extracts at 75-85% accuracy. On messy cursive or faint carbon copies, accuracy drops below 70% — at which point manual entry may be more efficient than verification. For subcontractors who consistently submit handwritten certificates, requesting a digitally generated replacement from their insurance agent is the more reliable path. Most agencies can issue a digital ACORD 25 in under five minutes.
What's the accuracy on digital ACORD 25 COI forms?
On clean, digitally generated ACORD 25 PDFs from major insurance carriers, modern AI extraction achieves 95-99% field-level accuracy for structured fields: policy numbers, dollar amounts, dates, named insured, carrier name. Accuracy on the coverage grid sub-limits runs slightly lower (93-98%) because agencies abbreviate the labels in different ways. Accuracy on free-text fields like description of operations runs 90-95%. No extraction tool achieves 100% across all fields, which is why coverage limits and additional insured language should be reviewed before they drive compliance decisions.
Can AI detect if coverage limits meet my contract requirements?
No. AI extraction reads what the certificate says — "General Liability: $1,000,000 Each Occurrence" — and outputs that value as structured data. It does not compare extracted values against your contractual minimums. The comparison ("does this sub's $500,000 aggregate meet our $2,000,000 requirement?") is a compliance decision, not a data extraction task. Some COI tracking platforms automate this comparison as part of their workflow layer. Standalone extraction tools give you the data; you apply the rules.
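With a standalone extraction tool, "you apply the rules" can be as small as the sketch below. It assumes limit columns have already been parsed into whole-dollar integers; the minimums reuse the lease example from the property management section ($1,000,000 per occurrence, $2,000,000 aggregate), and `MINIMUMS` and `shortfalls` are hypothetical names. It checks numbers only: an empty result says nothing about whether the additional insured endorsement language meets your contract.

```python
from typing import Dict, List, Optional

# Hypothetical contractual minimums, in whole dollars.
MINIMUMS: Dict[str, int] = {"GL Each Occurrence": 1_000_000, "GL General Aggregate": 2_000_000}


def shortfalls(row: Dict[str, Optional[int]], minimums: Dict[str, int] = MINIMUMS) -> List[str]:
    """List every limit that is missing or below its required minimum."""
    problems = []
    for column, required in minimums.items():
        value = row.get(column)
        if value is None:
            problems.append(f"{column}: missing, needs review")
        elif value < required:
            problems.append(f"{column}: ${value:,} below required ${required:,}")
    return problems


print(shortfalls({"GL Each Occurrence": 1_000_000, "GL General Aggregate": 500_000}))
```

Treating a missing value as a problem, rather than skipping it, keeps an unreadable limit from passing as compliant.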
Is COI extraction the same as OCR?
No. OCR converts an image of the certificate into machine-readable characters — it answers "what text is on this page?" COI extraction goes further: it identifies which text is the policy number, which is the general liability aggregate limit, which is the expiration date, and places each value into a labeled column in a spreadsheet. OCR gives you the entire certificate as one undifferentiated text block. Extraction gives you a compliance-ready table with a "Policy Number" column, a "GL Aggregate Limit" column, and an "Expiration Date" column — each containing exactly one value.
What fields can AI extract from a COI document?
The standard set includes: named insured, policy number, insurance carrier, NAIC number, producer/agency, all general liability limits (each occurrence, damage to rented premises, medical expense, personal & advertising injury, general aggregate, products-completed operations aggregate), automobile liability limits, umbrella/excess liability limits, workers' compensation limits, policy effective date, policy expiration date, certificate holder name and address, additional insured name, and description of operations. You define which subset you need; the AI extracts only those columns.