How to Extract Supplier Invoice Data
for Ecommerce SKU Margins
On r/AmazonSeller, an FBA seller described their COGS workflow: "I use vlookups and whatnot to keep my Excel file updated through the month, then at the end it takes me 3-4 hours to compute COGS." Among ecommerce sellers running 50 to 200 SKUs, that's not an outlier — it's the norm. The bottleneck isn't the math. It's extracting the numbers from supplier invoices in the first place. Alibaba sends a PDF trade assurance invoice with 12 line items in Chinese and English. A 1688 supplier WhatsApp-screenshots a handwritten proforma. A factory emails a multi-page commercial invoice with shipping broken out by carton. Each format deposits the same data — SKU, unit cost, quantity, freight — into a different layout, and someone has to pull it out before the spreadsheet ever opens.
Key Takeaways
- Three to four hours a month is what you can see — the typing, the vlookups, the tab-switching — and it's the cheapest thing manual supplier invoice entry costs your ecommerce business.
- The expensive part stays invisible: when freight, duties, and handling from four separate documents never get allocated per SKU, the gap between your recorded product cost and true landed cost hits 46%, and every pricing decision you've made since carries that error.
- ImageToTable.ai reads all your supplier formats — Alibaba PDF, WhatsApp screenshot, factory proforma — through one column definition and outputs per-SKU landed cost already calculated during extraction, no templates per supplier required.
Why Ecommerce Supplier Invoices Are Different from Regular AP Invoices
Accounts payable teams process invoices to pay them. The fields that matter — invoice number, vendor name, total amount, due date — are universal. But an ecommerce operator looking at a supplier invoice isn't thinking about payment first. They're thinking: what did this product actually cost me?
The data an ecommerce business needs from a supplier invoice goes deeper than payment fields. You need the SKU code — which might be the supplier's internal code, your own SKU, or an ASIN. You need the wholesale cost per unit, which may appear as "FOB price," "EXW unit price," or simply "单价" (Chinese for unit price). You need the MSRP if it's listed, and the MOQ (minimum order quantity) to track whether you bought at a volume tier that actually makes sense. You need lead time — the number of days between order and delivery — because a supplier who quotes 30 days and delivers in 45 is quietly eating your working capital. And you need all of this per SKU, per order, across every supplier you work with.
Accounts payable software is built for the first problem — did we pay the right amount to the right vendor? That question closes at the invoice total. The ecommerce margin question opens at the line item and keeps going.
The Fields Every Ecommerce Product Cost Tracker Needs
Before setting up any extraction workflow, decide what to extract. The column list below is designed for ecommerce cost tracking — not for AP invoice processing. These are the fields that feed into a SKU-level margin sheet:
| Column Name | Where It Appears on Supplier Invoices | Why It Matters for Margin Analysis |
|---|---|---|
| SKU / Product Code | Supplier's item number, ASIN, or your internal SKU — labeled as "Item No," "货号," "Model #," or embedded in product descriptions | The key that connects this invoice line to your inventory and sales data |
| Product Name | May appear in Chinese alone ("不锈钢水瓶500ml"), Chinese+English, or English only | Verifies the SKU maps to the right physical product |
| Wholesale Cost per Unit | "FOB Price," "EXW Unit Price," "单价," or "Unit Price" — may be in USD, RMB, or supplier's local currency | The baseline for all margin calculations; currency must be noted |
| MSRP | Not always listed; when present, labeled "MSRP," "Retail Price," "建议零售价" | Quick sanity check on your pricing — is your selling price competitive against the manufacturer's own suggestion? |
| Quantity Ordered | "Qty," "Order Qty," "数量" | Confirms whether the supplier shipped what you ordered |
| Quantity Received | Not always on the invoice — sometimes only on packing list or delivery receipt | Discrepancy between ordered and received triggers a cost adjustment |
| MOQ | "MOQ," "Minimum Order," "起订量" — often a per-SKU number on the quote, not on every invoice | Tracks whether you're buying at optimal volume; a SKU bought below MOQ typically carries a penalty price |
| Lead Time (Days) | "Lead Time," "交货期," or calculated from Order Date vs Ship Date | Supplier reliability metric — sinking lead times across orders trigger a reorder-point adjustment |
| Supplier Name | Company name on the invoice header; Chinese suppliers often use a different legal entity name than their Alibaba store name | Critical for grouping cost data by supplier and comparing performance across vendors |
| Shipping / Freight Cost | "Freight," "Shipping," "运费" — may be a per-invoice total, a per-carton charge, or embedded in a line item | One component of landed cost; must be allocated to individual SKUs |
| Import Duties & Fees | Rarely on the supplier's invoice — usually on a separate customs broker or freight forwarder invoice | Another landed cost component that must be added to the per-unit cost after extraction |
| Order Date | "Invoice Date," "Order Date," "日期" | Enables trend analysis — are costs rising over time for the same SKU? |
| Payment Terms | "Terms," "Payment," "付款条件" — 30% deposit / 70% before shipment, Net 30, etc. | Cash flow planning — a supplier switching from 30/70 to 50/50 changes your working capital timeline |
You don't need every field on every invoice. The ones that matter are the ones that feed your margin formula. A dropshipper who never touches inventory doesn't need Qty Received. An FBA seller importing containers from Shenzhen needs all of the above plus the customs broker invoice data merged in. Define the columns that match your cost structure, and let the extraction tool fill the rest.
From Extraction to Landed Cost: The Missing Link in Ecommerce Accounting
Here's a scenario that plays out in ecommerce businesses every day: an FBA seller orders 1,000 units of a stainless steel water bottle at $3.50 FOB from a Shenzhen supplier. They record $3,500 as their product cost. Three weeks later, the freight forwarder bills $850 for ocean freight. A customs broker invoice arrives with $262.50 in duties (7.5% on the declared value). The shipment spent two weeks in a 3PL receiving center at $0.30 per unit. The seller's spreadsheet says the product cost $3.50. The product actually cost $5.12 — a 46% gap.
This is landed cost, and per IRS Publication 334, freight-in, import duties, and handling costs must be capitalized into inventory — they're part of COGS, not separate operating expenses. The IRS uniform capitalization rules (Pub 334, Section on Uniform Capitalization Rules) require these indirect costs to be included in the basis of property acquired for resale. You recover them through COGS when the unit sells — if you recorded them at all.
The landed cost formula is straightforward:
Landed Cost per Unit = (Wholesale Cost + Freight + Customs Duties + Insurance + Inspection + Handling + Warehousing) ÷ Units in Shipment
The calculation itself isn't hard. Getting the numbers into one place is. Supplier invoices carry wholesale cost. Freight forwarder invoices carry shipping cost. Customs broker invoices carry duties. 3PL invoices carry warehousing. Four documents, four formats, three different companies — and you need them all in the same spreadsheet row before the formula fires. Most ecommerce sellers handle this by manually keying numbers from four separate PDFs into one Excel row, per shipment, per SKU. At 20 shipments a quarter with an average of 8 SKUs each, that's 160 rows of manual data entry. Every quarter.
An ecommerce seller in the r/FulfillmentByAmazon community described the reality: "Once you have more than a few SKUs, true product cost gets confusing." The confusion isn't conceptual — sellers understand landed cost. The confusion is procedural: the numbers live in different documents, and getting them into one place requires typing.
When you extract supplier invoice data with AI, you can use computed columns to run the landed cost calculation during extraction — the AI reads the wholesale price from the supplier invoice, adds the freight figure you input as a parameter, applies a duty rate, and outputs the per-unit landed cost directly. No separate spreadsheet step. The output column is labeled "Landed Cost per Unit" and the number underneath it is ready for your margin formula. For a complete walkthrough of the extraction fundamentals, see our complete guide to invoice data extraction.
How the Extraction Workflow Works
Here's the step-by-step process, designed for someone with a folder of supplier invoices who needs a margin spreadsheet — not an AP automation deployment. The aim is to go from a pile of documents in mixed formats to a single Excel file, in one session.
Files are processed securely and not stored.
Drag in every format you have — Alibaba PDF trade assurance invoices, 1688 screenshots from WeChat, factory proforma invoices scanned to PDF, freight forwarder bills, customs broker statements. You can upload a mix of PDFs, JPGs, and PNGs in one batch. The AI reads each document independently — it doesn't need all your suppliers to use the same invoice layout. A seller managing 30 SKUs across four suppliers can upload an entire quarter's worth of invoices in one go. If you're testing the workflow, start with invoices from two different suppliers — one Alibaba PDF, one factory proforma — to see how one column list pulls data from both.
This is what separates semantic AI extraction from template-based OCR. Instead of drawing boxes around fields on each supplier's invoice layout, you type the column names you want — "SKU," "Product Name," "Wholesale Cost per Unit," "Quantity Ordered," "Supplier," "Order Date," "Freight Cost." The AI locates each value by understanding what the field means. A SKU code might appear as "Item No" on an Alibaba invoice, "货号" on a 1688 screenshot, or embedded in a product description line on a factory proforma. Three different positions, three different labels — one column definition captures all of them. This is Custom Column Extraction: you enter column names, and the AI fills rows by finding matching values anywhere in any document — no templates, no per-supplier configuration, no layout-dependent rules.
Define a computed column like "Landed Cost per Unit (Wholesale Cost per Unit + (Freight Cost / Quantity Ordered) + (Wholesale Cost per Unit × 0.075))" — substituting your actual duty rate. The AI extracts the wholesale price, reads the freight allocation, and runs the math during processing. When the output file opens, the landed cost column is already populated — no post-processing formulas needed. You can also use an inferred column if your supplier invoices use different currency codes: set the column to output in a single currency with a rule that applies the correct conversion.
Download as XLSX. Each invoice row has the same columns regardless of whether the source was an Alibaba trade assurance PDF, a factory proforma scan, or a WhatsApp screenshot from a 1688 supplier. No merging of supplier-specific files. No copy-pasting between spreadsheets. One output file, ready for margin analysis. For higher volumes, batch invoice extraction handles hundreds of documents with the same column definitions, processing pages in 5-10 seconds each — roughly 18 times faster than manual entry for a single page.
What Makes Supplier Invoice Formats a Different Kind of Problem
Template-based OCR tools work by matching a document against a known layout. You train the tool: "Invoice Number is always in the top-right corner of this PDF." That works when you process invoices from the same supplier every month. But ecommerce supplier relationships don't work that way:
- Alibaba Trade Assurance invoices follow Alibaba's own formatting template with separate sections for product details, shipping, and payment protection — clean, but unlike any other supplier's layout.
- 1688 supplier screenshots are often phone photos of the 1688 order screen, showing product names in Chinese, unit prices in RMB, and order quantities — legible to a human, but no structured PDF exists.
- Factory proforma invoices vary by factory. Some use a formal template with company letterhead, Incoterms, and bank details. Others are a table in a Word document exported to PDF. Still others are handwritten and photographed.
- Freight forwarder invoices bill per shipment, not per SKU, and need to be merged with supplier invoice data — the freight cost must be allocated across the SKUs in that shipment.
- Dropshipping supplier invoices from AliExpress or Shein-connected suppliers often arrive as screenshots with product names in a mix of English and Chinese — "Chinglish" descriptions that a human can parse but a keyword-based OCR tool misreads.
A template-based tool needs a separate template for each layout. Four suppliers means four templates. Add a new supplier — new template. A supplier updates their invoice format — broken template, silently producing incomplete extraction until someone notices. With 8-10 active suppliers — typical for a growing ecommerce store — the template maintenance burden exceeds the time saved by automation.
Semantic extraction sidesteps this entirely. The AI reads an invoice the way a procurement manager does: it sees a document with products, prices, quantities, and dates. It doesn't rely on coordinates or label matching. It understands that "单价 ¥23.50" on a 1688 screenshot and "Unit Price $3.50" on an Alibaba invoice are the same field, just in different languages and different positions. This approach is the reason a single set of column names works across all suppliers, and unified extraction across formats becomes possible without per-supplier configuration.
The difference that matters: template-based extraction multiplies your setup work with every new supplier. AI-based extraction keeps the setup constant. For an ecommerce store adding 2-3 new suppliers per year, that's the difference between a workflow that scales and one that eventually gets abandoned for manual entry.
From Spreadsheet to Margin Dashboard
Once you have supplier invoice data in a structured spreadsheet, the margin analysis that was previously a quarterly guessing game becomes a live dataset. Here's what that looks like in practice:
Per-SKU gross margin is now a formula pulled from cells, not a number you recalculate manually each month. Selling price minus landed cost per unit, divided by selling price. A SKU that looked profitable at a $3.50 wholesale cost may reveal a 12% margin after landed cost lands at $5.12 — below the 15-20% minimum most FBA sellers target to cover Amazon fees. Without extraction, that calculation happens once, manually, when you remember to do it. With extraction, it's live every time you update the spreadsheet. As one Amazon seller's CFO noted on Reddit: "Once you have more than a few SKUs, true product cost gets confusing." Extraction is what un-confuses it.
Amazon FBA sellers get an additional benefit: separating product costs from fulfillment costs. Amazon charges referral fees (typically 15%), FBA fulfillment fees, storage fees, and advertising costs — none of which appear on a supplier invoice. A SKU-level extraction spreadsheet keeps the supplier-side costs (wholesale, freight, duties) in one set of columns, and you can add Amazon-side costs in another set. Now you see margin at two levels: product margin (selling price minus landed cost) and net margin after Amazon fees. A product generating 40% product margin but 8% net margin after FBA fees tells a different story than one at 25% product margin and 18% net margin. Without the extraction step feeding accurate landed cost data, both products look like they cost "$3.50" and the real margin story never emerges.
Supplier performance tracking becomes a byproduct of the same dataset. Lead time trends — is Supplier A's average delivery window creeping from 30 days to 38 days? Cost trends — is Supplier B's per-unit price rising faster than the category average? MOQ compliance — are you accidentally ordering below minimum on a SKU and paying a premium price you didn't notice? These questions are unanswerable from individual invoices. They're easy to answer from a structured spreadsheet updated every order cycle.
And when tariff rates change — as they have repeatedly in 2025-2026, with US Harmonized Tariff Schedule duty rates fluctuating across product categories — having your landed cost data already structured means you can re-run the duty component and see which SKUs are most exposed, immediately. Without that structure, you're pulling every supplier invoice back out of a folder and recalculating by hand.
FAQ
Does this work with Chinese-language supplier invoices?
Yes. The AI reads Chinese, English, and mixed-language documents. A 1688 supplier screenshot with product names in Chinese ("不锈钢保温杯"), prices in RMB, and quantities in Arabic numerals is read the same way as an English-language Alibaba trade assurance invoice. The column names you define in English — "SKU," "Product Name," "Wholesale Cost" — become the headers, and the AI extracts corresponding values regardless of the source language. If a Chinese invoice lists a price as "¥23.50," the AI understands this is the value for your "Wholesale Cost" column and extracts it.
Can I extract data from screenshots, not just PDFs?
Yes. The tool accepts JPG, PNG, and PDF inputs. Phone screenshots from WhatsApp or WeChat — common when dealing with 1688 suppliers who send order confirmations as images — are processed the same way as scanned PDFs. The AI reads the visual content of the image, not the embedded text layer, so a photo of a computer screen showing a supplier's internal system works. Image quality matters: a blurry screenshot at 480p resolution will produce lower confidence extractions than a clear 1080p screenshot. For heavily compressed images, low-resolution extraction strategies can help.
How do I allocate freight costs across SKUs when the invoice only shows a total?
Most freight forwarder invoices show a single shipment total, not a per-SKU breakdown. You have three options: (1) Use a computed column to divide the freight total by total quantity ordered across all SKUs in the shipment, giving an equal per-unit allocation. (2) Allocate by weight — if your freight forwarder invoice specifies total weight, create a computed column that calculates each SKU's share of freight based on its unit weight times quantity. (3) Allocate by value — higher-value SKUs absorb a proportionally larger share of freight. Which method to use depends on your freight structure: weight-based allocation makes sense for ocean freight (priced per container/weight), value-based for air freight (priced per kg with a value surcharge). The extraction tool handles the math once you define the rule. You don't need the freight forwarder to itemize by SKU.
What about customs duties — those come on a separate invoice months later?
This is a real problem. Freight forwarders and customs brokers often send duty invoices 3-6 weeks after the shipment clears — sometimes after you've already sold the inventory and recorded COGS. The correct approach: extract the supplier invoice data now to get wholesale cost and freight into your spreadsheet. When the customs broker invoice arrives, extract it into a separate tab using columns like "Shipment ID," "Duty Paid," and "Date." Use a VLOOKUP or INDEX-MATCH to pull the duty amount into the landed cost column of the original shipment. This keeps your margin data accurate as late-arriving costs come in, rather than waiting weeks to start the extraction process. If you use accrual accounting, IRS Pub 334 expects estimated landed costs accrued at receipt and reconciled when actual invoices arrive.
Does this integrate with QuickBooks or Xero?
The tool outputs Excel (XLSX), CSV, and JSON. You can download the extracted data and import it into QuickBooks, Xero, or any accounting platform that accepts file imports. There is no direct API integration. For ecommerce sellers who use A2X or similar tools to bridge Amazon settlements to QuickBooks, the supplier-side cost data from extraction serves as the COGS input that those tools don't capture — you import the extracted supplier invoice spreadsheet as a journal entry or inventory cost update. For sellers using Google Sheets for cost tracking, the Google Sheets add-on lets you extract supplier invoice data directly into your tracking sheet without leaving Google Sheets.
How many invoices can I process at once?
You can upload multiple invoices in one batch — the tool processes each page in 5-10 seconds. There's no hard limit on batch size, though processing time scales linearly with the number of pages. A batch of 20 invoices averaging 2 pages each (40 pages total) processes in roughly 3-7 minutes. The real constraint isn't the tool capacity — it's whether you've defined your column names to cover all the fields you need across all suppliers in the batch. For large quarterly batch processing, batch extraction handles higher volumes efficiently.
The Spreadsheet Is the Easy Part
Most ecommerce sellers understand margin math. They know landed cost = wholesale + freight + duties + fees. They know they should track it per SKU. The broken link in the chain isn't knowledge — it's that the data lives in PDFs and screenshots from a dozen different suppliers, and getting it into a spreadsheet requires typing. Every article about landed cost calculation skips this step. They give you the formula and assume the numbers are already in a column.
They're not. They're on an Alibaba trade assurance PDF, a 1688 screenshot, and a freight forwarder invoice in three different formats. Extraction is the step that makes the formula usable. Once you can pull supplier invoice data into a structured spreadsheet without building a template for every supplier, margin analysis stops being a quarterly project and becomes something you update in minutes after every shipment.
No sign-up required. Upload a supplier invoice and define your columns.