What Is COI Data Extraction?
Turn ACORD Forms into Structured Data
Certificate of Insurance (COI) data extraction is the automated process of reading key insurance fields — like policy number, insured name, coverage types, limits, effective and expiration dates, and additional insured status — from a scanned or PDF COI form and outputting them as structured data for compliance tracking. Instead of a person opening each ACORD 25 certificate and manually typing carrier names and policy limits into a spreadsheet cell by cell, extraction software does the reading and the data entry in seconds.
Key Takeaways
- Most COI extraction tools draw a box around where each field should be — and silently read the wrong text when the next agency's form shifts by half an inch.
- A seven-figure liability gap hides behind a single mistyped coverage limit on one subcontractor's COI — and template-based tools produce these errors whenever an agency changes its form layout.
- Extraction that reads fields by meaning instead of position handles any agency's format with zero setup — and the real transformation is not the speed gain, it is that transcription errors stop being part of your compliance equation.
What COI Data Extraction Actually Is
COI data extraction is not the same as scanning a certificate or running OCR on it. Scanning gives you a picture of the form. OCR converts the picture into readable text — a wall of undifferentiated characters. Extraction goes further: it identifies which text is the policy number, which is the general liability aggregate limit, which is the expiration date, and places each value into a labeled column in a spreadsheet. The output is not a text file. It is structured, filterable, sortable data.
The standard vehicle for this data is the ACORD 25 — "Certificate of Liability Insurance" — the most widely used COI form in the U.S. commercial insurance market. Developed by the Association for Cooperative Operations Research and Development (ACORD), this one-page form condenses coverage details from a policy that can run past 100 pages into a standardized grid. But here is what matters for extraction: nearly every insurance agency modifies the standard ACORD layout. Some add their own headers and footers. Some rearrange the coverage sections. Some use electronic fill-in software that shifts field positions. Some still issue typewriter-filled paper forms that scan with slight misalignment. The field set is standardized; the layout, in practice, is not.
The fields typically extracted from an ACORD 25 COI fall into three groups:
Policy Identity
- Named Insured
- Policy Number
- Insurance Carrier
- NAIC Number
- Producer / Agency
Coverage & Limits
- General Liability (per occurrence / aggregate)
- Automobile Liability
- Workers' Compensation
- Umbrella / Excess Liability
- Professional Liability (where applicable)
Dates & Parties
- Policy Effective Date
- Policy Expiration Date
- Certificate Holder
- Additional Insured
- Description of Operations
Getting the coverage limit rows right is where extraction separates from OCR. The general liability section alone has six sub-limits: each occurrence, damage to rented premises, medical expense, personal & advertising injury, general aggregate, and products-completed operations aggregate. They are usually displayed in a compact grid whose column alignment can shift between agencies. A tool that reads meaning rather than position can identify "$1,000,000" as the per-occurrence limit whether the label next to it reads "EACH OCCURRENCE" on one agency's form or "PER OCC" on another's.
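A minimal, rule-based sketch in Python makes the label problem concrete: it maps several printed variants of each general liability label to one canonical field name. The alias lists and field names below are hypothetical examples, not an exhaustive catalog, and semantic extraction does not work this way. If anything, the sketch shows why the rule-based approach is brittle: every new abbreviation an agency invents means another entry in the table.

```python
import re
from typing import Optional

# Hypothetical alias table: label variants that might appear on different
# agencies' ACORD 25s, mapped to one canonical field name. A hand-maintained
# list like this is a rule-based stand-in, not semantic extraction.
GL_LABEL_ALIASES = {
    "each_occurrence": ["EACH OCCURRENCE", "PER OCC", "EACH OCC"],
    "damage_to_rented_premises": ["DAMAGE TO RENTED PREMISES", "DMG TO RENTED PREM"],
    "medical_expense": ["MED EXP", "MEDICAL EXPENSE"],
    "personal_adv_injury": ["PERSONAL & ADV INJURY", "PERSONAL AND ADVERTISING INJURY"],
    "general_aggregate": ["GENERAL AGGREGATE", "GEN'L AGGREGATE LIMIT", "GEN AGG"],
    "products_comp_op_agg": ["PRODUCTS - COMP/OP AGG", "PRODUCTS-COMPLETED OPS AGG"],
}


def _squash(label: str) -> str:
    """Uppercase and keep only letters and digits, so spacing and punctuation don't matter."""
    return re.sub(r"[^A-Z0-9]", "", label.upper())


# Reverse lookup: squashed alias -> canonical field name.
_LOOKUP = {
    _squash(alias): field
    for field, aliases in GL_LABEL_ALIASES.items()
    for alias in aliases
}


def canonical_gl_field(label: str) -> Optional[str]:
    """Return the canonical field for a printed label, or None if the label is unknown."""
    return _LOOKUP.get(_squash(label))


print(canonical_gl_field("Per Occ."))               # each_occurrence
print(canonical_gl_field("GEN'L AGGREGATE LIMIT"))  # general_aggregate
```

Any label missing from the table, such as a new agency's abbreviation, comes back as `None`; that is the maintenance burden semantic extraction is meant to remove.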
COI Data Extraction vs COI Tracking Software vs Manual Review
These three terms get used interchangeably in construction compliance conversations, but they refer to different layers of the same problem — and conflating them leads to buying a platform that solves a problem you do not have, or worse, missing the piece you actually need.
Manual COI review is the baseline: someone opens each PDF certificate, reads the named insured, policy number, coverage limits, and expiration date, and types these values into a tracking spreadsheet. This is what most small and mid-sized GCs do. The spreadsheet then serves as the compliance record — sorting by expiration date, filtering by project, flagging gaps manually. The data entry step takes 5-10 minutes per certificate. At 40 subcontractors, that is roughly 3-7 hours per renewal cycle. At 200, it is roughly 17-33 hours per cycle, and because renewals arrive on a rolling basis, it becomes a full-time job that never ends.
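The hours figures follow directly from the per-certificate time. A few lines of Python reproduce them and let you substitute your own subcontractor count. The 5-10 minute range is this article's estimate, not a measured benchmark, so adjust the defaults to your own timing.

```python
def manual_entry_hours(certificates: int, min_minutes: float = 5, max_minutes: float = 10) -> tuple[float, float]:
    """Return the (low, high) hours of manual data entry for one renewal cycle."""
    return certificates * min_minutes / 60, certificates * max_minutes / 60


for subs in (40, 200):
    low, high = manual_entry_hours(subs)
    print(f"{subs} certificates: {low:.1f}-{high:.1f} hours")
# 40 certificates: 3.3-6.7 hours
# 200 certificates: 16.7-33.3 hours
```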
COI data extraction automates only the data entry step. It does not send renewal reminders, does not maintain a vendor portal, does not compare extracted limits against contractual requirements — it reads the certificate and outputs structured data. You still decide where that data goes and what to do with it. For a GC who already has a spreadsheet-based compliance process that works, extraction removes the bottleneck without forcing a workflow change.
COI tracking software — platforms like myCOI (rebranded as illumend), TrustLayer, bcs, and Jones — automates the full compliance workflow: automated renewal requests to subcontractors, vendor portals for direct upload, coverage gap flagging against project requirements, audit-ready reporting, and integrations with Procore or CMiC. These platforms start at $200-500 per month and are built for organizations where subcontractor COI compliance is a department-level function. The extraction step is one component inside them — but bundled with workflow management you may or may not need.
The distinction matters because a GC managing 30 subcontractors does not have the same problem as a national contractor managing 300. For 30 subs, the bottleneck is data entry — reading 30 ACORD forms without transcription errors. For 300 subs, the bottleneck is workflow — chasing renewals across hundreds of expiration dates without a system. Extraction solves the first. Tracking platforms solve the second. Understanding which layer fits your scale is the decision that matters.
How COI Data Extraction Works
The mechanism that makes modern COI extraction work is fundamentally different from what powered document processing a decade ago. Understanding this difference explains why extraction accuracy on insurance certificates has jumped from unreliable to production-grade in recent years.
Position-Based (Template OCR)
Draws a bounding box around where "Policy Number" should appear on the page. Extracts whatever text falls inside that box. When the next agency's ACORD 25 shifts the field by half an inch — different font, different margins, different software — the box captures the wrong text or nothing at all. Every format variation requires a new template.
Semantic-Based (AI Extraction)
Reads the entire page and understands what each piece of information means. It knows that "GEN'L AGGREGATE LIMIT" and "GENERAL AGGREGATE" refer to the same thing regardless of abbreviation, font, or position. It finds the policy number by recognizing the pattern of a policy identifier — not by hunting for it at a fixed coordinate. One setup works across every agency's version of the ACORD form.
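The position-based failure mode is easy to reproduce. The sketch below is a toy model of template OCR, not any vendor's implementation: words carry page coordinates in points (72 per inch), a template is one fixed rectangle per field, and the coordinates are hypothetical rather than measured from a real ACORD 25. Shifting one agency's row down half an inch (36 points) is enough for the box to return another field's text, with no error raised.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    x: float  # left edge, in points from the left of the page
    y: float  # top edge, in points from the top of the page


# Template: one fixed rectangle (x0, y0, x1, y1) per field, tuned to one layout.
# These coordinates are hypothetical.
TEMPLATE = {"policy_number": (300.0, 400.0, 420.0, 415.0)}


def extract_by_box(words: list[Word], box: tuple[float, float, float, float]) -> str:
    """Join every word whose top-left corner falls inside the box."""
    x0, y0, x1, y1 = box
    return " ".join(w.text for w in words if x0 <= w.x <= x1 and y0 <= w.y <= y1)


# Agency A prints the policy number exactly where the template expects it.
agency_a = [Word("GL-1234567", 305, 402), Word("01/01/2025", 430, 402)]
# Agency B's form software moves that row down 36 points, and a different
# field's text now sits where the template is looking.
agency_b = [Word("CONTRACTOR", 305, 410), Word("GL-1234567", 305, 438)]

print(extract_by_box(agency_a, TEMPLATE["policy_number"]))  # GL-1234567
print(extract_by_box(agency_b, TEMPLATE["policy_number"]))  # CONTRACTOR (wrong field, no error)
```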
This shift from position-based to semantic-based extraction is the reason a tool can process COIs from different insurance agencies in the same batch without configuration changes. An agency in Texas may place the certificate holder block in the bottom-left with a 10pt font. An agency in California may place it in the bottom-center with 8pt. A template-based tool needs two separate templates — and both break when either agency changes its form software. Semantic extraction handles both with zero setup because it is not looking at coordinates. It is looking for a field called "Certificate Holder" and the name next to it.
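By contrast, a lookup anchored on the label rather than on coordinates returns the same value from both layouts. This Python sketch is a deliberately simple heuristic over OCR'd text lines, included for illustration only: semantic extraction relies on a model that understands the page, not on exact label matching, and the function name and sample lines here are hypothetical.

```python
from typing import Optional


def value_after_label(lines: list[str], label: str) -> Optional[str]:
    """Return the text that follows `label`, wherever the label appears.

    Searches every line for the label (case-insensitive). If text follows the
    label on the same line, that text is returned; otherwise the next non-blank
    line is returned. Returns None if the label is not found.
    """
    target = label.upper()
    for i, line in enumerate(lines):
        pos = line.upper().find(target)
        if pos == -1:
            continue
        rest = line[pos + len(target):].strip(" :")
        if rest:
            return rest
        for following in lines[i + 1:]:
            if following.strip():
                return following.strip()
        return None
    return None


# Two hypothetical layouts: label and value on one line vs. stacked vertically.
texas = ["INSURED: Lone Star Framing LLC", "CERTIFICATE HOLDER: Acme Builders LLC"]
california = ["CERTIFICATE HOLDER", "", "Acme Builders LLC", "123 Main St"]

print(value_after_label(texas, "Certificate Holder"))       # Acme Builders LLC
print(value_after_label(california, "Certificate Holder"))  # Acme Builders LLC
```

Neither call depends on where the block sits on the page, which is the property the paragraph above describes, although a rule this literal would still miss a label worded differently.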
The extraction workflow itself is straightforward, regardless of the underlying technology:
Upload the COI
Drop a PDF or scanned image of the certificate. Standard ACORD 25 forms and most carrier-issued certificates are supported.
Define what you want to extract
Type the column names you need — "Policy Number," "GL Each Occurrence Limit," "Expiration Date." The AI reads the document to find each value by meaning, not by position. This is Custom Column Extraction: you define the output columns, the AI locates matching data wherever it appears on the form.
Review and export
Extracted fields appear in a structured table. Verify the output — especially additional insured language and coverage limit values — then export to Excel, CSV, or directly into a Google Sheet.
Files are processed securely and not stored.
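If you post-process extracted values in your own scripts before they reach a spreadsheet, the export step amounts to writing one row per certificate with a fixed column order. Below is a sketch using Python's standard `csv` module. `COLUMNS` stands in for the names typed in the "Define what you want to extract" step, and the rows are hypothetical sample data. `restval=""` writes a blank cell when a field was not found, so the gap stays visible to the reviewer instead of shifting columns.

```python
import csv
import io

# Stand-in for the column names you typed when defining the extraction.
COLUMNS = ["Policy Number", "GL Each Occurrence Limit", "Expiration Date"]

# Hypothetical extracted rows, one dict per certificate.
rows = [
    {"Policy Number": "GL-1234567", "GL Each Occurrence Limit": "1,000,000", "Expiration Date": "2025-06-30"},
    {"Policy Number": "CGL 88-4410", "Expiration Date": "2025-09-15"},  # limit not found
]


def to_csv(rows: list[dict], columns: list[str]) -> str:
    """Write rows as CSV in a fixed column order; missing fields become blank cells."""
    buf = io.StringIO()
    # extrasaction="ignore" drops any field that is not one of the requested columns.
    writer = csv.DictWriter(buf, fieldnames=columns, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


print(to_csv(rows, COLUMNS))
```

Values that contain commas, such as `1,000,000`, are quoted automatically, so the file opens cleanly in Excel or Google Sheets.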
The AI reads the fields printed on the certificate form — it does not read the full insurance policy behind it, and it does not interpret whether an additional insured endorsement (e.g., CG 20 10 vs CG 20 37) meets your contractual requirements. Extraction tells you what the certificate says. A compliance professional determines whether what it says meets your standards.
When You Need COI Data Extraction
COI extraction is not a universal need. For a company that receives three certificates a year, manual entry is faster than setting up any tool. But there are specific scenarios where the volume and recurrence of COI processing make extraction the difference between a manageable process and a compliance liability.
General Contractor Subcontractor Compliance
The dominant use case. A mid-size GC managing 40-80 subcontractors across multiple active projects receives COIs on a rolling basis — new subs coming onto the job, existing subs renewing policies mid-project, coverage changes after claims. Each certificate needs the same fields extracted and compared against the same project requirements. At 5-10 minutes per manual review, the data entry alone consumes roughly 3-13 hours per renewal cycle. Extraction collapses that to under a minute per certificate. The time saved is not the real win; the elimination of transcription errors in coverage limit values is. A mistyped aggregate limit on one sub's COI can hide a seven-figure liability gap that no one notices until a claim is denied.
Vendor and Supplier Onboarding
Large property managers, healthcare networks, and manufacturing facilities onboard hundreds of vendors annually — each requiring proof of insurance before stepping onto the premises. The COIs arrive in a flood during onboarding season and a trickle year-round. Manual review at this volume creates a backlog where vendors wait days for compliance clearance. Extraction turns the data entry step into seconds, so the reviewer's time goes to the judgment calls — whether the additional insured language is correct, whether the coverage limits match the contract — rather than the transcription step.
Property Management Tenant COIs
Commercial property managers require COIs from every tenant as a lease condition. A single office building with 50 tenants means 50 certificates to track, each renewing on a different anniversary date. The extraction task is repetitive and year-round — same fields, different renewal dates, different carrier names. Administrative staff who process tenant COIs are not insurance experts; extraction removes the data entry burden so they can focus on flagging what looks wrong rather than typing what they see.
Annual Insurance Audits
Whether internal or external, an annual insurance compliance audit requires pulling structured data from every active COI on file. If your COIs live as PDFs in a shared drive with no searchable index, the audit means re-opening every file and re-reading every field. If you have been extracting COI data into a spreadsheet or database all year, the audit-ready record already exists — sortable by expiration date, filterable by project, exportable in one click. In construction, the cost of COI non-compliance compounds when the data needed to prove compliance is scattered across inboxes and network folders.
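With the data already in structured form, the audit question that would otherwise mean re-opening every PDF becomes a filter and a sort. Here is a minimal Python sketch over hypothetical records, assuming expiration dates were exported as ISO `YYYY-MM-DD` strings; the field names are illustrative.

```python
from datetime import date, timedelta

# Hypothetical extracted records, with expiration dates exported as ISO strings.
records = [
    {"Named Insured": "Lone Star Framing", "Expiration Date": "2025-03-01"},
    {"Named Insured": "Bay Area Electric", "Expiration Date": "2025-01-20"},
    {"Named Insured": "Summit Roofing", "Expiration Date": "2025-12-31"},
]


def expiring_within(records: list[dict], as_of: date, days: int) -> list[dict]:
    """Return records expiring on or before as_of + days, soonest first.

    Certificates that have already expired are included, since they are the most urgent.
    """
    cutoff = as_of + timedelta(days=days)
    due = [r for r in records if date.fromisoformat(r["Expiration Date"]) <= cutoff]
    # ISO date strings sort chronologically, so a plain string sort is enough.
    return sorted(due, key=lambda r: r["Expiration Date"])


report = expiring_within(records, as_of=date(2025, 1, 15), days=60)
print([r["Named Insured"] for r in report])
# ['Bay Area Electric', 'Lone Star Framing']
```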
What to Look For in a COI Data Extraction Tool
Not every extraction tool handles ACORD certificates well. The dense coverage grid, the variations in agency formatting, and the compliance-critical nature of the data mean you need specific capabilities — not just any document extraction product with "COI" checked on its supported formats list.
Template-free field recognition. This is the non-negotiable. If the tool requires you to draw zones or create a template for each insurance agency's version of the ACORD 25, pass. The whole point is that you should not need to know how each agency lays out its form. A semantic extraction engine — one that reads by field meaning rather than field position — is the difference between a tool that works on day one and a tool that requires perpetual maintenance. AI document extraction that understands what a policy number looks like, regardless of where it sits on the page, is the mechanism that makes this possible.
Batch processing. With extraction, a single COI is a one-minute task. Fifty COIs from forty subcontractors after a quarterly renewal push is when extraction earns its keep. The tool should let you upload multiple certificates at once and merge the extracted data into a single spreadsheet — one row per COI, columns for every field you named.
Coverage table handling. The general liability section on an ACORD 25 is not a single value. It is a grid of sub-limits: each occurrence, damage to rented premises, medical expense, personal & advertising injury, general aggregate, and products-completed operations aggregate. An extraction tool that pulls "$1,000,000" without labeling which sub-limit it belongs to is producing unusable data. The tool should preserve the relationship between each limit type and its dollar value.
Spreadsheet-native output. Extracted COI data lands where compliance tracking happens — in a spreadsheet. Export to Excel or direct insertion into Google Sheets via an add-on eliminates the intermediate export-then-import step that adds friction and another chance for error.
Handles non-standard certificates. Not every COI is a clean ACORD 25 PDF from a major carrier. Smaller agencies issue certificates on their own letterhead. Subcontractors sometimes submit photographed paper certificates from a job-site trailer. The extraction tool should handle these edge cases — PDFs, images, and non-ACORD layouts — without requiring a different workflow for each format.
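The coverage-table requirement comes down to keeping each dollar figure attached to its label. The sketch below parses general liability rows into a label-to-amount dictionary and discards any amount it cannot attach to a label, rather than guessing. It assumes each sub-limit has already been OCR'd into a single `LABEL $ amount` line, which real grids do not always provide; treat it as an illustration of the requirement, not a parser for actual ACORD 25 output.

```python
import re

# One grid row: a text label, an optional "$", then a whole-dollar amount.
# The label character set covers letters, spaces, and & / ( ) . ' -
_ROW = re.compile(
    r"^(?P<label>[A-Za-z&/().' -]+?)\s*\$?\s*(?P<amount>\d{1,3}(?:,\d{3})+|\d+)$"
)


def parse_gl_grid(lines: list[str]) -> dict[str, int]:
    """Map each labeled sub-limit to its dollar amount; skip rows with no label."""
    limits = {}
    for line in lines:
        match = _ROW.match(line.strip())
        if match:
            label = match.group("label").strip()
            limits[label] = int(match.group("amount").replace(",", ""))
    return limits


grid = [
    "EACH OCCURRENCE $ 1,000,000",
    "DAMAGE TO RENTED PREMISES (Ea occurrence) $ 100,000",
    "MED EXP (Any one person) $ 5,000",
    "PERSONAL & ADV INJURY $ 1,000,000",
    "GENERAL AGGREGATE $ 2,000,000",
    "PRODUCTS - COMP/OP AGG $ 2,000,000",
    "$ 1,000,000",  # an amount with no label: dropped, not guessed
]
limits = parse_gl_grid(grid)
print(limits["EACH OCCURRENCE"], limits["GENERAL AGGREGATE"])  # 1000000 2000000
print(len(limits))  # 6
```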
FAQ
What is the difference between COI extraction and COI tracking?
COI extraction is the data entry step — reading fields from a certificate and outputting them as structured data. COI tracking is the full compliance workflow — automated renewal reminders, coverage gap detection, vendor portals, audit reporting. Extraction answers "what does this certificate say?" Tracking answers "is every subcontractor on this project compliant right now?" Most tracking platforms include extraction, but extraction tools do not include workflow management. If your current spreadsheet process works and the only bottleneck is typing data from PDFs, extraction alone solves your problem.
Does COI extraction work with handwritten certificates?
It depends on the handwriting quality. Modern AI extraction can read clearly printed handwriting on ACORD forms at useful accuracy — particularly for numeric fields like policy numbers and dollar amounts, which tend to be written more carefully than narrative text. Heavily cursive or faint handwritten certificates will produce lower accuracy and may require manual review. The best approach is to test with your actual documents: upload a sample and verify the extracted fields against the original. For subcontractors who consistently submit handwritten certificates, requesting a digitally issued replacement from their insurance agent is the more reliable path.
Can COI extraction detect whether coverage limits meet my requirements?
No. Extraction reads and outputs what the certificate states. It does not compare extracted values against your contractual coverage minimums. That comparison — "does this sub's $500,000 general liability limit meet our $1,000,000 requirement?" — is a compliance judgment, not an extraction task. Some COI tracking platforms automate this comparison. Standalone extraction tools give you the data; you apply the rules.
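Applying the rule yourself is short once the limits are numbers. Here is a minimal Python sketch of the $500,000-versus-$1,000,000 comparison above. The field names and contract minimums are hypothetical, and the check covers dollar amounts only, not endorsement language such as whether the additional insured form is CG 20 10 or CG 20 37.

```python
# Hypothetical contract minimums in dollars; set these from your own subcontract.
REQUIRED = {"each_occurrence": 1_000_000, "general_aggregate": 2_000_000}


def coverage_shortfalls(extracted: dict[str, int], required: dict[str, int]) -> dict[str, tuple]:
    """Return {field: (found, required)} for every limit that is missing or below minimum."""
    gaps = {}
    for field, minimum in required.items():
        found = extracted.get(field)  # None if the certificate did not show this limit
        if found is None or found < minimum:
            gaps[field] = (found, minimum)
    return gaps


sub_limits = {"each_occurrence": 500_000, "general_aggregate": 2_000_000}
print(coverage_shortfalls(sub_limits, REQUIRED))
# {'each_occurrence': (500000, 1000000)}
```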
What is an ACORD 25 form and why does it matter for extraction?
The ACORD 25 — "Certificate of Liability Insurance" — is the standard COI form used across the U.S. commercial insurance industry. It was developed by ACORD (Association for Cooperative Operations Research and Development), a nonprofit standards body serving the insurance industry since the 1970s. The form matters for extraction because it provides a standardized field structure — named insured, policy number, coverage types and limits, effective/expiration dates, certificate holder, additional insured — that every extraction tool targets. However, individual agencies modify the layout, which is why template-based extraction fails and semantic extraction is needed.
How accurate is COI data extraction?
On clean, digitally generated ACORD 25 PDFs, modern AI extraction achieves 95-99% accuracy for structured fields — policy numbers, dollar amounts, dates, named entities. Accuracy drops on photographed paper certificates (skew, lighting, resolution), handwritten forms, and non-standard layouts. No extraction tool achieves 100% accuracy on every certificate, which is why the output should be reviewed before it drives compliance decisions. The value proposition is not zero-review — it is replacing 5-10 minutes of manual transcription with 10-20 seconds of review.
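Review goes faster when simple consistency checks point the reviewer at likely errors first. Below is a Python sketch of a few such checks, assuming the extracted record uses ISO dates and integer limits; the field names are hypothetical. A general aggregate lower than the each-occurrence limit usually signals two swapped values. Passing these checks does not mean a record is correct, only that nothing is obviously inconsistent.

```python
from datetime import date


def review_flags(record: dict) -> list[str]:
    """Cheap consistency checks that point a reviewer at likely extraction errors.

    Assumes ISO dates (YYYY-MM-DD) and limits already converted to integers.
    """
    flags = []
    try:
        effective = date.fromisoformat(record["Effective Date"])
        expiration = date.fromisoformat(record["Expiration Date"])
        if expiration <= effective:
            flags.append("expiration is not after effective date")
    except (KeyError, TypeError, ValueError):
        flags.append("missing or unparseable policy dates")
    occurrence = record.get("GL Each Occurrence")
    aggregate = record.get("GL General Aggregate")
    if occurrence is not None and aggregate is not None and aggregate < occurrence:
        flags.append("general aggregate is below each-occurrence limit")
    if not record.get("Policy Number"):
        flags.append("policy number is blank")
    return flags


good = {
    "Policy Number": "GL-1234567",
    "Effective Date": "2024-07-01",
    "Expiration Date": "2025-07-01",
    "GL Each Occurrence": 1_000_000,
    "GL General Aggregate": 2_000_000,
}
print(review_flags(good))  # []
```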
Can I extract COI data from certificates issued by different insurance agencies in one batch?
Yes — and this is the scenario where semantic extraction outperforms template-based tools. Because semantic extraction reads by field meaning rather than fixed position, you can upload COIs from five different agencies in a single batch and extract the same fields from all of them. The AI locates "Policy Number" whether it appears in the top-right on one agency's form or mid-left on another's. Batch subcontractor COI tracking becomes practical when you can process mixed-agency certificates together.
Is COI extraction the same as OCR?
No. OCR (Optical Character Recognition) converts an image of text into machine-readable characters — it answers "what characters are on this page?" but not "which of these strings is the policy number?" COI extraction is the next step after OCR: it identifies which text corresponds to which insurance field and structures the output into labeled columns. OCR gives you an undifferentiated text dump. Extraction gives you a compliance-ready spreadsheet. An OCR tool pointed at an ACORD 25 produces every word on the form in one block. An extraction tool produces a table with a "Policy Number" column, a "GL Aggregate Limit" column, and an "Expiration Date" column — each containing exactly one value.