VLM Powered OCR

Convert PDF Quotations to Excel — Extract Fields by Meaning, Not by Layout

A quotation from one vendor looks nothing like a quotation from another. One embeds pricing in a table on page 2. Another writes "your price is $4,750 including delivery" in a narrative paragraph. The numbers are all there — but which one is the quoted price, which is the discount, and which is the tax? Column-name extraction identifies each field by what it means, not where it sits on the page.

What You Can Extract from a Vendor Quotation

Type the column names you need — the AI finds these values across any vendor's quote layout, whether pricing appears in a structured table, a narrative paragraph, or a mix of both.

Header & Summary Fields

Vendor Name

Quote Number

Quote Date

Valid Until

Payment Terms

Delivery Terms

Currency

Contact Email

Line Item & Financial Fields

Item Description

Quantity

Unit Price

Line Total

Subtotal

Discount

Tax

Grand Total

Shipping Cost

Lead Time

This is not a prescriptive list — type any field name your vendor quotes contain. The AI reads the document to find what you ask for.

Why Extracting Vendor Quote Data Is Harder Than Extracting Invoice Data

Invoices follow a loose standard. Every invoice has a total, a bill-to address, and a due date. Quotations have no such convention. A single page can contain a quote number, a valid-until date, perhaps twenty line items, a discount structure tiered by quantity, tax that may or may not apply to shipping, a grand total with freight listed separately, and payment terms that vary from "Net 30" to "50% upfront." The challenge is not reading the document; it is knowing which number means what.

The Problem

01 Multiple monetary numbers on the same page — with no visual standard for which is which

A quotation page typically contains five to eight monetary values: unit prices, line totals, a subtotal, a discount (percentage or absolute), tax, shipping, and a grand total. Vendor A places the discount between subtotal and tax. Vendor B applies the discount to the unit price directly and shows only the discounted line totals, no separate discount line. Vendor C writes "your price is $12,450 including delivery and tax" in a paragraph — one number carrying three layers of meaning. A human reading the quote understands the difference. Template-based OCR, which matches pixel coordinates to a pre-configured template, outputs whatever number lives at those coordinates — right or wrong — without distinguishing between the discount and the shipping charge. Procurement teams on Reddit's r/procurement consistently name "extracting data from non-standard vendor formats" as a primary source of comparison errors.

02 Pricing structure varies by vendor: tabular, narrative, or both

A manufacturing supplier generates a quote from their ERP — every line item in a clean table with Description, Qty, Unit Price, and Amount columns. An IT services vendor writes theirs in a Word document: "Server Rack Cabinet 42U: 2 units at $1,200 each — $2,400" as flowing text. A construction subcontractor fills out a printed form by hand and emails a scan. Template-based tools require a separate template for each format. With three vendors, that is three templates to build and maintain. With ten vendors per RFQ cycle, the template maintenance overhead eclipses the time saved by extraction.

03 Discount and tax structures are applied differently across vendors, making direct comparison dangerous

Vendor A quotes $10,000 with a 5% line-item discount and tax applied to the discounted subtotal. Vendor B quotes $9,500 flat with tax inclusive. Vendor C quotes $9,200 with a 3% early-payment discount and separate freight. These three quotes produce different values for every financial field. If manual extraction accidentally treats Vendor B's tax-inclusive price as a pre-tax subtotal, Vendor B's figure is overstated by the embedded tax (rates run roughly 5-28% depending on jurisdiction), and the comparison is skewed by that amount. The error propagates into every subsequent analysis (scoring, ranking, and the final award recommendation) and becomes invisible once the numbers are in a spreadsheet, because the spreadsheet presents them as fact.
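
To make the size of this error concrete, here is a minimal Python sketch of the arithmetic. The 10% tax rate is a hypothetical assumption (the example above does not state one), and the function name is illustrative only.

```python
from decimal import Decimal, ROUND_HALF_UP

def pre_tax_from_inclusive(tax_inclusive_price: Decimal, tax_rate: Decimal) -> Decimal:
    """Back out the pre-tax amount from a tax-inclusive price."""
    pre_tax = tax_inclusive_price / (Decimal("1") + tax_rate)
    return pre_tax.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# Hypothetical tax rate; the quote amounts come from the example above.
TAX_RATE = Decimal("0.10")

# Vendor A: $10,000 list with a 5% discount, tax added afterwards.
vendor_a_pre_tax = Decimal("10000") * (Decimal("1") - Decimal("0.05"))

# Vendor B: $9,500 flat, tax already included.
vendor_b_pre_tax = pre_tax_from_inclusive(Decimal("9500"), TAX_RATE)

# How much Vendor B is overstated if $9,500 is mistaken for a pre-tax figure.
overstatement = Decimal("9500") - vendor_b_pre_tax

print(vendor_a_pre_tax, vendor_b_pre_tax, overstatement)
```

Under the assumed 10% rate, Vendor B's pre-tax price is $8,636.36. Reading $9,500 as pre-tax would make Vendors A and B look tied, when B is actually $863.64 lower on a like-for-like basis.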

How Custom Column Extraction Solves This

01 Field identification by semantic meaning — not by pixel position

Custom Column Extraction — the core mechanism of ImageToTable.ai — lets you type the field names you want: "Subtotal," "Discount," "Tax," "Grand Total." The AI reads the document and identifies each value by understanding what it means in the document's context. It recognizes that the number next to "Sub Total" on a table-structured quote and the number described as "the total before tax and freight" on a narrative-format quote are the same field — Subtotal. You do not draw rectangles around each value. You do not configure per-vendor templates. You name the fields once, and the same definition works across every vendor's quote format — tabular, narrative, scanned, or any combination of the three.

02 One column definition works for every vendor — no per-supplier setup

Define your extraction columns once: Vendor Name, Quote Number, Quote Date, Valid Until, Item Description, Quantity, Unit Price, Line Total, Subtotal, Discount, Tax, Grand Total, Payment Terms, Delivery Terms. Use this same column list for every supplier quote you process. The AI reads each document independently and finds the corresponding values by meaning — the vendor whose pricing is in a table and the vendor whose pricing is in a paragraph produce output rows with the same column structure. When a new vendor sends a quote in a format you have never seen before, the same columns work without any new setup. For procurement teams that run RFQ cycles across 5-15 suppliers, this eliminates the template maintenance overhead that makes traditional OCR economically nonviable below very large procurement volumes.

03 Batch processing across all vendor quotes in one operation — unified output, ready for comparison

When you process all vendor quotes in a single batch, each line item from each vendor occupies one row in the output spreadsheet — with every requested field populated. The result is a flat table where each row identifies the vendor, the item, and all financial components of that line item. Because the same column definition was applied to every quote, the Subtotal column on Vendor A's row and the Subtotal column on Vendor B's row contain the same field — the pre-tax, pre-discount subtotal as calculated (or printed) by that vendor. This allows you to load the output directly into a comparison spreadsheet and apply analytical logic — scoring, ranking, normalization — starting from structured, verified data rather than hand-typed numbers. For guidance on building the comparison workflow from extracted data, see the approach detailed in our guide to extracting vendor quote data for side-by-side comparison in Excel.
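
As an illustration of what "analytical logic on a flat table" can look like once the rows are exported, the sketch below totals line items per vendor and ranks vendors by that total. The rows, amounts, and function names are hypothetical sample data, not output from the tool.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical extracted rows: one row per line item, same columns for every vendor.
rows = [
    {"Vendor Name": "Vendor A", "Item Description": "Server Rack Cabinet 42U", "Line Total": "2400.00"},
    {"Vendor Name": "Vendor A", "Item Description": "Cable Management Kit", "Line Total": "350.00"},
    {"Vendor Name": "Vendor B", "Item Description": "Server Rack Cabinet 42U", "Line Total": "2300.00"},
    {"Vendor Name": "Vendor B", "Item Description": "Cable Management Kit", "Line Total": "400.00"},
]

def total_by_vendor(rows):
    """Sum the Line Total column for each vendor."""
    totals = defaultdict(Decimal)  # Decimal() defaults to 0
    for row in rows:
        totals[row["Vendor Name"]] += Decimal(row["Line Total"])
    return dict(totals)

def rank_vendors(totals):
    """Return (vendor, total) pairs sorted from lowest to highest total."""
    return sorted(totals.items(), key=lambda pair: pair[1])

totals = total_by_vendor(rows)
ranking = rank_vendors(totals)
```

The same grouping can of course be done with an Excel pivot table; the point is that it only works because every row carries the same columns.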

From PDF Quotations to Comparison-Ready Excel: How It Works

If you receive vendor quotations as PDF attachments and need to compare pricing across suppliers — whether for a single RFQ cycle or ongoing procurement — here is what the extraction workflow looks like.

Upload quotation PDFs — from any vendor, in any format, all at once

Drop in PDF quotations from any source: ERP-generated PDFs with structured tables, Word-to-PDF exports with pricing written in paragraphs, scanned paper quote forms, email attachments, or phone photos of printed quotes. The tool accepts JPG, PNG, WebP, and PDF — including multi-page quotations where the pricing table spans multiple pages. Use batch processing to upload all vendor responses for an RFQ cycle at once and consolidate results into a single file. For gathering quotes from suppliers who are not in your procurement system, generate a Collection Link: a shareable URL where vendors can upload their quotations directly to your processing queue by entering a short verification code — no registration or login required on the vendor's end.

Define the columns you need — once, for all vendors

Type the field names you want extracted: "Vendor Name," "Quote Number," "Quote Date," "Valid Until," "Item Description," "Quantity," "Unit Price," "Line Total," "Subtotal," "Discount," "Tax," "Grand Total," "Payment Terms," "Delivery Terms." These are your column names — the AI locates each value by understanding its semantic role in the document, whether that value appears in a table cell, a sentence, or a footnote. Use an Inferred Column like "Price Type (options: Fixed/Estimated/Not-to-Exceed)" to have the AI classify each quote based on language it reads in the document. Use a Computed Column like "Discount % (Discount / Subtotal)" to calculate the effective discount rate during extraction — useful when one vendor quotes a dollar discount and another quotes a percentage, and you need a common basis for comparison. Save the column configuration as a template after logging in and reuse it for every RFQ cycle.
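
The arithmetic behind a column like "Discount % (Discount / Subtotal)" can be sketched as follows. This is not the tool's internal implementation; the function name and its parameters are assumptions made for illustration. It shows how a dollar discount and a stated percentage end up on one common basis.

```python
from decimal import Decimal
from typing import Optional

def discount_percent(subtotal: Decimal,
                     discount_amount: Optional[Decimal] = None,
                     discount_rate: Optional[Decimal] = None) -> Decimal:
    """Express a discount as a percentage of the subtotal.

    Accepts either a stated rate (0.03 for 3%) or a dollar amount.
    """
    if discount_rate is not None:
        return (discount_rate * 100).quantize(Decimal("0.01"))
    if discount_amount is None:
        return Decimal("0.00")  # no discount shown on the quote
    if subtotal == 0:
        raise ValueError("Subtotal must be non-zero to compute a discount percentage")
    return (discount_amount / subtotal * 100).quantize(Decimal("0.01"))

# One vendor quotes a dollar discount, another quotes a percentage.
dollar_vendor = discount_percent(Decimal("10000"), discount_amount=Decimal("500"))
percent_vendor = discount_percent(Decimal("9200"), discount_rate=Decimal("0.03"))
```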

Download the unified Excel — every line item is one row, every financial field is unpacked

Each line item from each vendor's quote becomes one row in your output. A quote with 15 line items produces 15 rows — each with the vendor name, quote number, item description, and all requested financial fields. A batch of five vendor quotes averaging 12 line items each produces ~60 rows — all in one spreadsheet, all structured identically. Export as XLSX, CSV, or JSON. Because the Subtotal, Discount, Tax, and Grand Total fields are extracted separately — not just the final price — you can compare pricing components across vendors directly in Excel, apply your own normalization logic, and build the comparison analysis from verified source data. The output is structured for import into SAP, Oracle, NetSuite, Coupa, or your procurement comparison template.
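
The one-row-per-line-item shape described above can be sketched in a few lines of Python. The nested `quotes` structure, the `line_items` key, and the sample values are hypothetical; they only illustrate how five quotes of 12 items each flatten to 60 identically structured rows.

```python
import csv
import io

def flatten_quotes(quotes):
    """Turn nested quotes into one row per line item, repeating header fields on each row."""
    rows = []
    for quote in quotes:
        header = {key: value for key, value in quote.items() if key != "line_items"}
        for item in quote["line_items"]:
            rows.append({**header, **item})
    return rows

def to_csv(rows):
    """Serialize rows to CSV text using the first row's keys as the column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical batch: five vendors, 12 line items each.
quotes = [
    {
        "Vendor Name": f"Vendor {i}",
        "Quote Number": f"Q-{i:03d}",
        "line_items": [
            {"Item Description": f"Item {j}", "Quantity": 1,
             "Unit Price": "100.00", "Line Total": "100.00"}
            for j in range(1, 13)
        ],
    }
    for i in range(1, 6)
]

rows = flatten_quotes(quotes)
csv_text = to_csv(rows)
```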

When It Works Best — and When to Review Results

When it works best

ERP-generated vendor quotations with structured pricing tables. Quotes exported from SAP, Oracle, Microsoft Dynamics, or vendor-specific ERP systems extract with high accuracy. Machine-formatted line items, header fields, and summary financials map cleanly to column names. Multi-page quotes with line-item continuation across pages are handled correctly — the AI understands that the table on page 2 is a continuation of the table on page 1.

Mixed-format batch processing across vendors with different quote layouts. When one supplier uses a structured ERP format, another writes pricing in Word paragraphs, and a third sends a scanned form — the same column definition extracts from all three. This is the primary advantage over template-based OCR, which would require three separate templates for this single RFQ cycle.

Discount and tax disambiguation when both are present. The AI distinguishes between a discount applied before tax, a discount applied after tax, tax-exclusive pricing, and tax-inclusive pricing — by reading the surrounding text and numeric context. This semantic disambiguation is what column-name extraction offers that coordinate-based OCR cannot: understanding of financial structure, not just text recognition.

When to review results

Quotes with complex tiered pricing across multiple quantity breakpoints. When a vendor quotes $12/unit for 1-99 units, $10/unit for 100-499, and $8.50/unit for 500+, and presents this in a tier table alongside the line items, verify that each extracted unit price is paired with its correct quantity tier. The AI captures all values but may need a specific column name like "Unit Price (100-499 tier)" for unambiguous mapping in deeply nested tier structures.
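
When verifying tier pairings, a small lookup like the one below can help confirm which unit price should apply at a given quantity. It uses the breakpoints from the example above; the function and constant names are illustrative assumptions.

```python
from decimal import Decimal

# Tiers from the example above as (minimum quantity, unit price), sorted ascending.
PRICE_TIERS = [
    (1, Decimal("12.00")),
    (100, Decimal("10.00")),
    (500, Decimal("8.50")),
]

def unit_price_for_quantity(quantity: int, tiers=PRICE_TIERS) -> Decimal:
    """Return the unit price of the highest tier whose minimum quantity is met."""
    if quantity < tiers[0][0]:
        raise ValueError(f"Quantity {quantity} is below the lowest tier")
    price = tiers[0][1]
    for min_quantity, tier_price in tiers:
        if quantity >= min_quantity:
            price = tier_price
    return price
```

Checking the boundaries (99, 100, 499, 500) is usually enough to catch a unit price that was paired with the wrong tier.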

Handwritten quotations with low contrast or unusual notation. Scanned paper forms filled out by hand — common in construction, specialty manufacturing, and field-service quoting — are readable, but accuracy depends on scan quality and handwriting clarity. A clean 300 DPI scan of a printed form with block-letter entries produces reliable results. A low-resolution phone photo of cursive handwriting at an angle may require manual verification of financial fields.

Quotes where the same number could be interpreted as two different fields. A quotation that states "Subtotal: $5,000, Tax (10%): $500, Grand Total: $5,000" (because the quote is tax-exempt but shows the tax line for reference) can confuse extraction, because the Grand Total equals the Subtotal even though a non-zero Tax line is printed. When the document describes a charge that is listed but not applied, add an Inferred Column such as "Tax Applied (Y/N)" for verification. The AI reads the surrounding context to determine whether the tax line was actually included in the grand total.
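
A simple arithmetic cross-check can flag such rows for review after export. The sketch below is an assumption-laden illustration, not the tool's logic: it ignores discount and shipping lines, and the function name and the "REVIEW" label are hypothetical.

```python
from decimal import Decimal

def tax_applied(subtotal: Decimal, tax: Decimal, grand_total: Decimal,
                tolerance: Decimal = Decimal("0.01")) -> str:
    """Classify whether the printed tax is included in the grand total.

    Assumes no discount or shipping lines; a fuller check would add them.
    """
    if tax == 0:
        return "N"
    if abs(subtotal + tax - grand_total) <= tolerance:
        return "Y"  # grand total = subtotal + tax
    if abs(subtotal - grand_total) <= tolerance:
        return "N"  # tax line is printed but not charged
    return "REVIEW"  # neither reading reconciles; check the source document

# The tax-exempt example from the paragraph above.
example = tax_applied(Decimal("5000"), Decimal("500"), Decimal("5000"))
```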

Frequently Asked Questions

How does the AI distinguish between the unit price, discount, subtotal, tax, and grand total — especially when the layout varies dramatically between vendors?

Column-name extraction solves this by asking the AI to find fields semantically, not geometrically. When you type the column name "Discount," the AI looks for the number in the document that functions as a discount (the amount subtracted from the price), regardless of whether Vendor A places it below the subtotal in a table or Vendor B mentions it in a sentence like "10% discount applied." The same logic applies to Tax, Subtotal, and Grand Total. This semantic approach means the same column definition works across quotes from different vendors, different industries, and different countries without per-vendor configuration. For quotes where the discount is embedded in the unit price (no separate discount line), the AI extracts the unit price as stated, and you can apply a formula in Excel to reverse-calculate the implied discount. The extraction preserves the distinction between "discount shown on the document" and "discount you derive analytically."

Can I extract vendor quotes where pricing is written in narrative paragraphs rather than tables?

Yes — this is one of the primary advantages over template-based OCR. When a vendor writes "the unit price for the server rack is $1,200, with a 5% discount for orders of 3 or more," the AI reads the sentence as natural language, identifies $1,200 as the unit price, and recognizes 5% as a conditional discount. Template-based tools that work only on table-structured PDFs fail on these narrative-format quotes. The same column names produce consistent output whether the pricing is in a table body, a summary sentence, or a mix of both. For IT services quotes, consulting proposals, and custom manufacturing estimates that frequently use paragraph-based pricing, this capability eliminates the step of manually hunting through paragraphs for pricing numbers.

How does the tool handle quotes from multiple vendors with completely different layouts in one batch?

Batch processing accepts all vendor quote files at once — PDFs from ERP systems, scanned paper quotes, email-converted PDFs with pricing in paragraphs — regardless of format and layout. Each file is processed independently using the same column definition. Vendor A's table-based pricing and Vendor B's paragraph-based pricing produce identically structured rows in the same output spreadsheet. For an RFQ cycle with eight suppliers, you upload all eight quote files, define your comparison columns once, and receive one Excel file with every line item from every supplier. The extraction step is fully format-agnostic. For the full workflow of building a vendor comparison from extracted data, see the step-by-step approach to batch-extracting vendor quotes into one comparison table.

What happens when a vendor quote includes conditional discounts — like early-payment terms, volume breaks, or bundled pricing?

The AI extracts each discount-related value as it appears on the document. If the quote lists "Standard Price: $1,200/unit" and "Volume Discount (100+ units): 15% — $1,020/unit," both values are captured. Use multiple column definitions to capture each tier: "Unit Price (Standard)," "Unit Price (100+ tier)," "Volume Discount %." The AI reads the relationship between the tiers and extracts the values — but the decision of which tier to use in comparison is yours. This separation keeps the extraction step mechanical and the analytical step deliberate, rather than embedding assumptions about which tier applies. For quotes with complex discount structures, a Computed Column like "Effective Discount % (1 - Line Total / (Qty × Standard Unit Price))" can calculate the implied discount rate during extraction, giving you a normalized value for comparison across vendors with different discount-presentation conventions.
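
The effective-discount formula above, 1 - Line Total / (Qty × Standard Unit Price), works out as follows for the volume-discount example. The function name is illustrative, and the quantity of 100 units is an assumption chosen to match the "100+ units" tier.

```python
from decimal import Decimal

def effective_discount_percent(line_total: Decimal, quantity: int,
                               standard_unit_price: Decimal) -> Decimal:
    """Implied discount versus list price: 1 - line_total / (quantity * standard price)."""
    list_total = quantity * standard_unit_price
    if list_total == 0:
        raise ValueError("Quantity and standard unit price must be non-zero")
    return ((1 - line_total / list_total) * 100).quantize(Decimal("0.01"))

# 100 units at the discounted $1,020/unit against a $1,200/unit standard price.
line_total = 100 * Decimal("1020")
implied = effective_discount_percent(line_total, 100, Decimal("1200"))
```

Here the implied rate comes back as 15.00, which matches the 15% volume discount printed on the quote. A mismatch between the two would be a signal to recheck the extracted values.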

Does the extraction preserve the original terms language — like payment terms, delivery terms, and validity periods — or does it standardize them?

Fields like Payment Terms and Delivery Terms are extracted as written on the document ("Net 30," "50% upfront, 50% on delivery," "FOB Destination"), preserving the vendor's exact language. This is intentional: standardization of terms is a comparison task, not an extraction task, and should be done by you in Excel where the transformation is visible and auditable. For date fields like Quote Date and Valid Until, the AI standardizes the format to YYYY-MM-DD regardless of the source format (MM/DD/YYYY, DD-MM-YYYY, "October 24, 2024"), so these columns are directly comparable without manual date format normalization. The principle is: extract exactly what the document says, standardize only where mechanical conversion has no interpretive ambiguity.
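
For readers who want to see what this kind of mechanical date conversion involves, here is a minimal sketch using Python's standard `datetime` module. It is not the tool's implementation. It assumes slash-separated dates are month-first and hyphen-separated dates are day-first, as in the formats listed above, and that English month names are in use.

```python
from datetime import datetime

# Formats named in the text above. Slash dates are assumed month-first and
# hyphen dates day-first; that pairing is an assumption of this sketch.
KNOWN_FORMATS = ["%Y-%m-%d", "%m/%d/%Y", "%d-%m-%Y", "%B %d, %Y"]

def normalize_date(text: str) -> str:
    """Return the date as YYYY-MM-DD, trying each known format in order."""
    cleaned = text.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"Unrecognized date format: {text!r}")
```

A string such as "03/04/2024" is where the month-first assumption matters, so dates from vendors in different regions are worth a spot check.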