How to Extract COI Data for M&A Due Diligence (2026 Guide)
In a recent survey of M&A practitioners, the International Bar Association identified four categories of risk hidden inside a target company's insurance portfolio: unpaid premiums that voided coverage retroactively, missing mandatory policies that triggered regulatory penalties, denied claims that drained balance-sheet reserves, and expired policies nobody noticed until after closing. Every one of those risks was discoverable in the insurance certificates sitting in the data room. What made them invisible was not lack of expertise — it was the sheer volume of certificates that forced the review team to sample rather than inspect.
Key Takeaways
- A 300-certificate deal room costs at least $5,250 in transcription time alone before your team can begin the coverage analysis the deal requires.
- RWI underwriters now expect a complete portfolio review, and sampling 20 certificates out of 300 can lead the underwriter to exclude insurance-related losses from the very policy your client bought to protect the deal.
- Export the entire folder once and get a full coverage matrix in under an hour, so your analysis starts from every certificate, not a sample.
What Insurance Due Diligence Actually Looks Like in a Deal Room
Insurance due diligence in an M&A transaction is not the same exercise as subcontractor compliance tracking. In construction, a general contractor checks whether each sub's COI meets the project's minimum limits before they step on site. The question is binary: compliant or not. The reviewer typically processes two to three certificates per sub, once per renewal cycle.
In M&A, the reviewer opens a virtual data room — a secure online document repository used to share confidential deal information — and finds an Insurance folder containing every policy the target company has ever placed. Property. General liability. Directors and officers. Cyber. Employment practices liability. Auto. Workers' compensation. Umbrella and excess. Environmental. Professional liability. Product liability. Key man life insurance. For a mid-size target company with 300 to 800 employees, that folder routinely holds 300 to 500 certificates of insurance — each a single-page ACORD 25, 27, or 28 form, or a non-standard certificate from a regional agency on its own letterhead. For more on how the standard forms work, read our complete guide to COI extraction.
The review team's job is not to check boxes. It is to answer a set of questions that feed directly into the purchase agreement's representations and warranties: Does the target company carry adequate coverage for the risks inherent in its operations? Are there gaps between policy expiration dates and the projected closing timeline that would require tail coverage? Do the insurers themselves have financial-strength ratings that a prudent buyer would accept? Has the target maintained continuous coverage, or are there lapses that create uncovered exposure windows?
Answering those questions means extracting the same 15 fields from every certificate and populating a coverage matrix — a spreadsheet where rows are policies and columns are insurers, limits, deductibles, effective dates, and carrier ratings. Only once the matrix exists can the real analysis begin: comparing coverage against industry benchmarks, identifying expiration windows that fall inside the deal timeline, and flagging carriers whose AM Best ratings fall below the buyer's risk tolerance.
This is a fundamentally different workflow from the construction COI tracking most automation tools are built for. Our COI data extraction how-to covers the general extraction approach. This article focuses on what changes when the document stack sits inside a deal room and the output feeds into a purchase agreement.
Why Manual COI Review Breaks at M&A Volume
A junior associate opening her 50th certificate of the day is not making the same quality of judgment call she made on her fifth. The architecture of the task works against her. Reading a COI is not like reading a contract paragraph — it is a dense grid of abbreviations, numeric limits, checkbox states, and agency-specific formatting quirks spread across a single page with no narrative structure to guide the eye.
Each certificate requires locating and transcribing roughly 15 fields: named insured, producer or agency, each insurer with its NAIC number, policy type, policy number, effective date, expiration date, per-occurrence limit, aggregate limit, other sub-limits (medical expense, personal and advertising injury, damage to rented premises), deductible or self-insured retention, additional insured status, waiver of subrogation, certificate holder, and notice of cancellation terms. For the umbrella and excess policies, a separate limit block repeats the same structure with different numbers.
At three minutes per certificate for extraction alone — an optimistic pace that assumes the form is a clean ACORD 25 with no layout surprises — 300 certificates consume 15 hours of pure data entry. At a mid-level associate billable rate of $350 to $500 per hour, that is $5,250 to $7,500 in billable time spent on transcription. But the extraction is not the real work. The real work is the coverage analysis that can only begin after the matrix exists — and that analysis, across 300 policies with their endorsements and carrier qualifications, consumes another 25 to 45 hours. The extraction step is a toll booth on the road to the analysis. Every minute spent transcribing is a minute not spent on the risk assessment the client is actually paying for.
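The arithmetic behind those figures can be checked with a short sketch. The function name and parameters are illustrative only; the inputs are the numbers quoted in this section.

```python
def transcription_cost(certificates: int, minutes_each: float,
                       rate_low: float, rate_high: float):
    """Return (hours, low_cost, high_cost) for manual transcription."""
    hours = certificates * minutes_each / 60
    return hours, hours * rate_low, hours * rate_high


hours, low, high = transcription_cost(300, 3, 350, 500)
print(f"{hours:.0f} hours, ${low:,.0f} to ${high:,.0f}")
# 15 hours, $5,250 to $7,500
```

Changing `minutes_each` to a less optimistic pace shows how quickly the transcription bill grows before any analysis starts.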
The error rate compounds with volume. A 2019 academic study on manual data entry accuracy in document-intensive workflows found that after approximately 100 repetitions of the same extraction task, field-level error rates climbed from roughly 2% to over 8%. At 300 certificates and 15 fields each, that is 4,500 individual data points. If the fatigued 8% rate held across the whole stack, roughly 360 fields would be wrong: wrong policy numbers, transposed limits, dates off by a month. One missed expiration date that falls two weeks before closing creates an uninsured gap in the coverage picture. When that gap surfaces post-closing, the cost is not the time it would have taken to catch it. It is the value of the liability it leaves uncovered.
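As a rough check on the error figures, the sketch below multiplies data points by a flat error rate. Treating the rate as constant across the whole stack is a simplifying assumption, so the 8% result is closer to an upper bound and the 2% result to a lower one. The function name is illustrative.

```python
def expected_field_errors(certificates: int, fields_each: int, error_rate: float):
    """Return (data_points, expected_wrong_fields) for a flat field-level error rate."""
    data_points = certificates * fields_each
    return data_points, round(data_points * error_rate)


print(expected_field_errors(300, 15, 0.08))  # (4500, 360)
print(expected_field_errors(300, 15, 0.02))  # (4500, 90)
```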
The Fields That Matter in M&A (That Construction COI Trackers Ignore)
Construction COI tracking software is built to answer one question: does this subcontractor's certificate meet the project's minimum coverage requirements? The fields it cares about are the ones written into a subcontract: general liability per-occurrence and aggregate limits, workers' compensation coverage confirmation, additional insured status, and expiration date.
M&A insurance due diligence asks a fundamentally different set of questions about each policy. The following fields, which rarely appear in construction-oriented COI tools, are essential to the deal-room coverage matrix.
AM Best Financial Strength Rating. The AM Best rating is an independent evaluation of an insurer's ability to meet its ongoing policy obligations, issued by a credit rating agency that was founded in 1899, has rated insurers since 1906, and specializes in the insurance industry. The scale runs from A++ (Superior) through D (Poor), with a total of 13 rating levels. In M&A, the insurer's rating matters because the target company's coverage is only as reliable as the carrier backing it. A policy issued by an A- rated carrier is fundamentally different from one issued by a B++ carrier, and a coverage matrix that shows policy limits without carrier ratings is missing half the risk picture. Most purchase agreement insurance representations require the seller to disclose material changes in insurance coverage, and a carrier downgrade during the deal window is exactly the kind of material change that should trigger a disclosure. AM Best also assigns each rated insurer a Financial Size Category (FSC), from Class I (under $1 million in adjusted policyholders' surplus) to Class XV ($2 billion or more), providing an additional dimension of carrier assessment beyond the letter rating.
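A rating threshold is easy to apply once the matrix has a rating column. The sketch below orders the 13 Financial Strength Rating levels and flags carriers below a chosen minimum. The carrier names are hypothetical, the A- default is only an example threshold, and treating unrated or unrecognized values as failing is an assumption made so that they get a human look. Rating modifiers (such as an under-review suffix) are not handled.

```python
# AM Best Financial Strength Rating levels, strongest first.
FSR_SCALE = ["A++", "A+", "A", "A-", "B++", "B+", "B", "B-",
             "C++", "C+", "C", "C-", "D"]
RANK = {rating: position for position, rating in enumerate(FSR_SCALE)}


def meets_minimum(rating: str, minimum: str = "A-") -> bool:
    """True if `rating` is at least as strong as `minimum`.

    Unrated or unrecognized values (for example "NR") return False.
    """
    rating = rating.strip().upper()
    if rating not in RANK:
        return False
    return RANK[rating] <= RANK[minimum]


# Hypothetical carriers, for illustration only.
carriers = {"Carrier One": "A+", "Carrier Two": "B++", "Carrier Three": "NR"}
flagged = [name for name, rating in carriers.items() if not meets_minimum(rating)]
print(flagged)  # ['Carrier Two', 'Carrier Three']
```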
Deductible and Self-Insured Retention (SIR). A $5 million general liability policy with a $500,000 SIR is not the same insurance as a $5 million policy with a $10,000 deductible. In the first case, the target company pays the first $500,000 of every claim out of its own balance sheet before the carrier pays a dollar. That $500,000 per-claim exposure is effectively an uninsured liability that belongs in the financial due diligence model — not just the insurance review. Construction COI workflows often skip deductibles because the project's contract language makes the subcontractor responsible for amounts within the deductible. In M&A, the buyer inherits that SIR exposure. A coverage matrix that shows limits without deductibles is a liability ledger with half the entries missing.
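To see how much the retention changes the balance-sheet picture, the sketch below sums what the insured pays itself across a set of claims. The claim amounts are hypothetical, and the model is deliberately simple: it applies a flat per-claim retention and ignores aggregate caps, defense costs, and the mechanical differences between a deductible and an SIR.

```python
def retained_loss(claims: list[float], retention: float) -> float:
    """Total paid by the insured when each claim is subject to a per-claim retention."""
    return sum(min(claim, retention) for claim in claims)


# Hypothetical claim amounts, for illustration only.
claims = [750_000, 120_000, 40_000]
print(retained_loss(claims, 500_000))  # 660000
print(retained_loss(claims, 10_000))   # 30000
```

On the same three claims, the policy with the $500,000 SIR leaves $660,000 with the insured, against $30,000 under the $10,000 deductible.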
Claims-Made vs. Occurrence and Tail Coverage Triggers. A claims-made policy covers claims first made (and, under most forms, reported) during the policy period, provided the underlying incident occurred on or after the policy's retroactive date, if it has one. An occurrence policy covers incidents that happened during the policy period, regardless of when the claim is filed. The distinction becomes deal-critical when the target company carries claims-made coverage for D&O, E&O, or employment practices liability. If those policies are not renewed post-closing, claims arising from pre-closing conduct but reported after closing have no coverage unless a tail policy (also called an extended reporting period, or ERP) is purchased. Extracting the claims-made vs. occurrence designation for each policy type in the matrix tells the deal team which lines require tail coverage budgeting before the purchase agreement is finalized.
Notice of Cancellation Terms. Older ACORD certificates contain boilerplate cancellation language stating that the insurer "will endeavor to" notify the certificate holder of cancellation, but "failure to do so shall impose no obligation or liability." Current ACORD 25 editions say only that notice will be delivered in accordance with the policy provisions. Either way, the certificate itself promises little: the actual policy may provide 30, 60, or 90 days' notice, or no notice obligation at all. For an M&A buyer, the period between signing and closing is typically 30 to 90 days. If a key liability policy can be cancelled mid-deal with no notice to the acquiring entity, the coverage picture at signing is not the coverage picture at closing. Extracting the cancellation clause wording from each certificate, even the boilerplate version, and flagging policies whose underlying terms may differ is a risk-identification step that manual review at scale almost never reaches.
Policy Period vs. Deal Timeline Overlap. The most straightforward field on the certificate — the expiration date — becomes the most operationally urgent during due diligence because of the timeline. A policy expiring in week four of a 10-week due diligence period needs a renewal certificate before the coverage matrix is complete. A policy expiring between signing and closing creates a coverage gap that the purchase agreement must address through a pre-closing covenant requiring the seller to maintain insurance. Extraction software can flag every expiration date falling within a user-defined window, turning a calendar-checking task that consumes hours of manual review into an automated filter.
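The window filter described above amounts to a date comparison once expiration dates are in a structured column. A minimal sketch, assuming each extracted row is a dictionary with an `expiration` date; the policy identifiers and deal dates are hypothetical.

```python
from datetime import date


def expiring_within(rows: list[dict], window_start: date, window_end: date) -> list[dict]:
    """Return rows whose expiration date falls inside the window, earliest first."""
    hits = [row for row in rows if window_start <= row["expiration"] <= window_end]
    return sorted(hits, key=lambda row: row["expiration"])


# Hypothetical policies and deal dates, for illustration only.
rows = [
    {"policy": "GL-001", "expiration": date(2026, 3, 15)},
    {"policy": "DO-002", "expiration": date(2026, 9, 1)},
    {"policy": "CY-003", "expiration": date(2026, 2, 20)},
]
signing, closing = date(2026, 2, 1), date(2026, 4, 1)
print([row["policy"] for row in expiring_within(rows, signing, closing)])
# ['CY-003', 'GL-001']
```

Each flagged policy is a candidate for a renewal certificate request or a pre-closing covenant; deciding which is still the reviewer's call.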
For a deeper discussion of what AI can and cannot reliably read from a COI document, see our analysis of AI COI reading capabilities. For the fundamentals of what COI extraction is, start with what is COI data extraction.
How Semantic Extraction Works on Diverse COI Formats
The COI documents in a deal room come from dozens of different insurance agencies. Some use the current ACORD 25 edition with fields in the standard positions. Some use the 2014 revision with slightly different spacing. Some are printed from an agency management system that rearranges the coverage grid into a two-column layout. Some are non-standard certificates on the agency's own letterhead, a growing share as more regional and surplus-lines carriers issue certificates from proprietary platforms. A few are scanned paper certificates, rotated at a slight angle, with handwritten notations in the margins. And some are not certificates at all: they are policy declaration pages with a different field layout entirely, sent by an agent who treats them as equivalent.
Position-based OCR — the technology behind traditional template extraction tools — works by memorizing where each field sits on the page. It expects "Policy Number" to be at coordinates (x=340, y=280), and when a different agency places it at (x=420, y=310), the extraction fails silently, pulling data from the wrong field or returning nothing. The alternative approach, which newer AI-based tools use, is semantic extraction: the system reads the document the way a person would, by understanding what each piece of text means rather than where it sits. A policy number is "a string of alphanumeric characters labeled as a policy number and associated with a specific coverage line." The AI locates it regardless of whether it is on the left, right, or middle of the page.
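The failure mode of coordinate templates can be shown with a toy example. The sketch below is not how an AI extractor works internally: `by_label` is a simple label-anchored lookup used here as a stand-in for "finding a field by what it is rather than where it is." The token format, agency layouts, and values are all hypothetical; the coordinates reuse the ones mentioned above.

```python
from typing import Optional

# A token is (text, x, y), as a hypothetical OCR layer might report it.
Token = tuple[str, int, int]


def by_position(tokens: list[Token], x: int, y: int, tol: int = 5) -> Optional[str]:
    """Template-style lookup: return whatever text sits at the memorized spot."""
    for text, tx, ty in tokens:
        if abs(tx - x) <= tol and abs(ty - y) <= tol:
            return text
    return None


def by_label(tokens: list[Token], label: str, tol: int = 5) -> Optional[str]:
    """Label-anchored lookup: find the label, then the nearest text to its right on the same row."""
    for text, lx, ly in tokens:
        if text.lower() == label.lower():
            right = [(tx, t) for t, tx, ty in tokens if tx > lx and abs(ty - ly) <= tol]
            return min(right)[1] if right else None
    return None


agency_a = [("Policy Number", 250, 280), ("GL-1234567", 340, 280)]
agency_b = [("Producer", 250, 280), ("Acme Agency", 340, 280),
            ("Policy Number", 330, 310), ("XS-7654321", 420, 310)]

print(by_position(agency_a, 340, 280))       # GL-1234567
print(by_position(agency_b, 340, 280))       # Acme Agency (wrong field, no error raised)
print(by_label(agency_b, "Policy Number"))   # XS-7654321
```

The second call is the silent failure: the template returns a plausible-looking string from the wrong field, and nothing signals that the layout changed.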
This distinction is the reason a single extraction tool can process 300 certificates from 40 different agencies without per-agency setup. Each certificate is read from scratch, its fields identified by their semantic role, not by their coordinates. Our analysis of COI tracking at scale — written for the construction context — covers the performance gap between template and semantic approaches in detail. The same gap applies in M&A, amplified by the diversity of agency formats in a deal-room insurance folder.
What makes semantic extraction particularly valuable in the due diligence context is what it does not require. There is no training phase where you upload sample certificates and label fields. There is no template builder where you draw boxes around policy numbers. You upload the stack of certificates, type the column names you want in your output — "Named Insured," "Policy Type," "Policy Number," "GL Per Occurrence," "GL Aggregate," "Expiration Date," "AM Best Rating," "Deductible/SIR" — and the AI reads every document to locate those fields wherever they appear. The output is one spreadsheet with each certificate as a row and each field as a column.
From 300 COIs to a Coverage Matrix: The Extraction Workflow
The workflow from a deal-room Insurance folder to a coverage matrix follows five steps. The extraction tool handles steps two through four; the reviewer's judgment remains central to step five.
Step 1 — Collect the certificates. Export the Insurance folder from the virtual data room. Most VDR platforms — DealRoom, Datasite, Intralinks, Ansarada — allow bulk download by folder. Certificates typically arrive as PDFs, though some are embedded in email chains forwarded by the target's broker as screenshots, which you will need to extract as separate image files.
Step 2 — Define your columns. This is where the M&A context shapes the output differently from a construction compliance tracker. Instead of "Meets Minimum Limit (Y/N)," your columns are the fields that populate a coverage matrix: Named Insured, Producer or Agency, Insurer Name, Insurer NAIC Number, AM Best Rating, Policy Type, Policy Number, Effective Date, Expiration Date, Claims-Made Indicator (Y/N), General Liability Per Occurrence, General Liability Aggregate, Umbrella or Excess Per Occurrence, Umbrella or Excess Aggregate, Deductible or SIR, Additional Insured (Y/N), Waiver of Subrogation (Y/N), Notice of Cancellation, Certificate Holder.
Step 3 — Upload and process. Drag all certificates into the upload area. The tool processes them in parallel, so you do not wait for one to finish before the next begins. An AI-based extractor typically needs 5 to 10 seconds per certificate, so even processed one at a time, 300 certificates take approximately 25 to 50 minutes; parallel processing shortens the run further, depending on the tool's concurrency. Compare that to 15 hours of manual transcription: the tool is not just faster, it frees the reviewer to do the analysis work those 15 hours of data entry were preventing.
Step 4 — Export the coverage matrix. The output is an Excel file with each certificate as a row. Sort by expiration date to see which policies expire before closing. Sort by AM Best rating to identify carriers below the buyer's risk threshold. Pivot by policy type to see whether the target carries every coverage line the buyer's industry expects. The matrix is not the deliverable — it is the input to the coverage analysis in step five.
Step 5 — Analyze coverage gaps. This is where the reviewer's judgment is irreplaceable. The matrix tells you what coverage exists. It does not tell you what coverage is missing. A target company in manufacturing with no product liability policy, or a SaaS company with no cyber coverage, has a gap that the matrix makes visible but does not interpret. The reviewer compares the matrix against industry-standard coverage expectations, examines the underlying policy terms behind each certificate (the certificate is evidence of insurance, not the policy itself), and identifies gaps that belong in the purchase agreement's disclosure schedules.
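Steps three through five can be sketched on a handful of extracted rows. Everything named here is an assumption for illustration: the function names, the `concurrency` parameter, the sample rows, and especially `EXPECTED_LINES`, which stands in for the industry coverage expectations a reviewer would supply. The 25 to 50 minute estimate corresponds to `concurrency=1`.

```python
import math
from collections import defaultdict


def run_minutes(certificates: int, seconds_each: float, concurrency: int = 1) -> float:
    """Step 3: wall-clock minutes when `concurrency` certificates are processed at a time."""
    return math.ceil(certificates / concurrency) * seconds_each / 60


def pivot_by_policy_type(rows: list[dict]) -> dict[str, list[str]]:
    """Step 4: policy numbers grouped by policy type, earliest expiration first."""
    grouped = defaultdict(list)
    # ISO-format date strings sort correctly as plain text.
    for row in sorted(rows, key=lambda r: r["Expiration Date"]):
        grouped[row["Policy Type"]].append(row["Policy Number"])
    return dict(grouped)


# Step 5: hypothetical expectations; a real list comes from the reviewer.
EXPECTED_LINES = {
    "manufacturing": {"General Liability", "Product Liability", "Property"},
    "saas": {"General Liability", "Cyber", "D&O"},
}


def missing_lines(industry: str, rows: list[dict]) -> list[str]:
    """Coverage lines expected for the industry but absent from the matrix."""
    present = {row["Policy Type"] for row in rows}
    return sorted(EXPECTED_LINES[industry] - present)


rows = [
    {"Policy Type": "General Liability", "Policy Number": "GL-002", "Expiration Date": "2026-11-01"},
    {"Policy Type": "General Liability", "Policy Number": "GL-001", "Expiration Date": "2026-03-15"},
    {"Policy Type": "D&O", "Policy Number": "DO-001", "Expiration Date": "2026-06-30"},
]
print(run_minutes(300, 5), run_minutes(300, 10))  # 25.0 50.0
print(pivot_by_policy_type(rows))
# {'General Liability': ['GL-001', 'GL-002'], 'D&O': ['DO-001']}
print(missing_lines("saas", rows))  # ['Cyber']
```

The last call only reports that a line is absent from the matrix. Whether that absence is a real gap, a missing upload, or coverage placed under a different policy type is the judgment step five leaves to the reviewer.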
What Extraction Can't Do (And What Still Needs a Human)
Being clear about what the tool cannot do is as important as explaining what it can. Three things remain firmly in the human reviewer's domain after extraction is complete.
Coverage adequacy assessment. Extraction tells you that a policy exists with a $2 million per-occurrence limit. It does not tell you whether $2 million is enough. A chemical manufacturer with $2 million in GL coverage is underinsured relative to industry norms. A software company with the same limit is not. That judgment requires knowledge of the target's industry, operating risks, and claims history — none of which extraction software has.
Endorsement interpretation. A COI may indicate that the certificate holder is an additional insured, but the specific endorsement form matters. A CG 20 10 endorsement covers the additional insured only for ongoing operations. A CG 20 37 extends coverage to completed operations. A CG 20 33 provides automatic additional insured status for entities the named insured has agreed, in a written construction agreement, to add. Extraction can detect that the checkbox is checked. It cannot read the underlying endorsement language embedded in the policy, because the endorsement is not on the certificate. It is in the policy document that the certificate summarizes.
Gap-to-rep translation. The most valuable skill a deal lawyer brings to insurance due diligence is not data extraction — it is translating a coverage gap into a specific contractual protection. A gap in D&O tail coverage becomes a covenant requiring the seller to purchase a six-year tail policy before closing. A substandard carrier becomes a rep that all policies are placed with carriers rated A- or better by AM Best. A deductible structure that creates uninsured balance-sheet exposure becomes a special indemnity. Extraction produces the data that reveals the gap. The lawyer produces the deal term that protects against it.
Extraction makes the review team faster, not obsolete. Its value is not in replacing human judgment — it is in removing the transcription labor that currently prevents human judgment from having enough time to operate.
The RWI Factor: Why Insurance Due Diligence Matters More Now
Representation and warranty insurance (RWI) — a policy that covers losses from breaches of the seller's representations and warranties in the purchase agreement — is now used in an estimated 75% of private equity M&A transactions and 64% of large strategic acquisitions. RWI changes the insurance due diligence calculus in one critical way: the RWI underwriter reviews the buyer's due diligence process before binding coverage. If the buyer's insurance review was superficial — if the team opened 20 certificates out of 300 and called it done — the underwriter can exclude insurance-related losses from RWI coverage on the grounds that the buyer did not conduct adequate diligence.
This creates a diligence feedback loop. The more thoroughly the certificates are reviewed, the stronger the RWI coverage. Skipping or sampling the insurance review does not just leave coverage gaps undiscovered — it jeopardizes the insurance the buyer purchased to protect against post-closing surprises in the first place. The IBA's Legal Due Diligence Handbook, published in coordination with the IBA Corporate and M&A Law Committee, specifically identifies inadequate insurance due diligence as a source of post-transaction liability to the buyer, noting that "the lack of proper insurance policies purchased by the target may result in post-transaction liabilities" which RWI may not cover if the buyer's diligence was insufficient. (IBA Legal Due Diligence Handbook)
In this environment, the question is not whether to review all the certificates — it is how to review them completely without burning billable hours on data entry that a tool can handle in seconds. The extraction workflow described above produces a reviewable matrix from every certificate in the folder. The RWI underwriter sees that the buyer reviewed the full population, not a sample. The gaps that surface are deliberate findings, not oversights.
FAQ: COI Extraction for M&A Due Diligence
How many COIs does a typical M&A insurance due diligence involve?
For a mid-market target company with 300 to 800 employees and operations in multiple states, the Insurance folder in the data room typically contains 200 to 500 certificates. This includes primary layer policies, umbrella and excess layers, and state-specific filings where the target is qualified in multiple jurisdictions. Larger targets with international operations, multiple subsidiaries, or heavily regulated industries can exceed 1,000 certificates.
What if half my COIs are non-standard formats from regional agencies?
Semantic extraction — which reads by field meaning rather than field position — handles non-standard certificates without per-agency configuration. Whether the certificate is a standard ACORD 25 or a proprietary form from a regional surplus-lines broker, the AI identifies fields by their semantic role (e.g., "this is a policy number," "this is an expiration date") rather than by memorized coordinates. There is no template to build and no training data to supply.
How accurate is AI extraction on COI documents?
On printed text in standard ACORD forms, AI extraction achieves accuracy in the high-90s percent range. Handwritten notations, heavily skewed scans, and water-stained paper certificates degrade accuracy; these are the same factors that make them difficult for a human to read. Our article on whether AI can read COI documents covers the accuracy-by-format breakdown in detail. For due diligence purposes, the practical comparison is not extraction vs. perfection. It is extraction vs. a reviewer on their 187th certificate of the day.
Do RWI underwriters accept AI-extracted insurance data as part of due diligence?
RWI underwriters evaluate the thoroughness of the buyer's due diligence process, not the specific tools used. An extraction matrix that covers every certificate in the folder demonstrates a complete review of the insurance portfolio. What matters to the underwriter is that the buyer reviewed the full population and identified material gaps. The tool that accelerated the data entry does not diminish the quality of the diligence — it increases the scope the review team could cover within the deal timeline.
Why not just use a data room's built-in OCR search to find key terms across COIs?
VDR platforms such as Datasite and Intralinks provide OCR-based full-text search across all documents in the room. That is useful for finding every certificate that mentions "Chubb" or "AIG," but it does not produce a structured matrix. A search result of 47 documents containing "General Liability" tells you where the term appears; it does not populate a column with the per-occurrence limit for each of those 47 policies. Full-text search helps you locate documents. Extraction turns documents into an analyzable dataset.
Can extraction detect whether a claims-made policy needs tail coverage?
Extraction can flag which policies are claims-made (by reading the "Claims-Made" checkbox or designation on the certificate) and which are occurrence-based. It cannot determine whether tail coverage is necessary — that is a judgment call based on the deal structure, the policy's retroactive date, the target's claims history, and the buyer's risk appetite. Extraction gives you the "claims-made" column in your matrix. The deal lawyer reads that column and determines which lines require tail coverage budgeting.