What every PDF to Excel guide leaves out
PDF to Excel automation, traced past the flat-sheet dump.
The phrase “PDF to Excel automation” is two different problems wearing the same name. One of them is well solved by free tools and a Microsoft import button. The other one is what enterprise ops teams are actually trying to buy when they search for it, and the guides that currently rank for the term do not address it. This page covers both, and shows the file in the open-source repo where the interesting case lives.
Direct answer (verified 2026-05-05)
There are three honest paths for PDF to Excel automation. Pick the one that matches your destination, not your source:
- Native Excel. If the PDF is a flat table and the Excel target is a blank sheet, use Data > Get Data > From File > From PDF. The import sits inside the Power Query stack Microsoft documents in the Office help.
- Document AI tool. If you have a stream of similar PDFs and the Excel target is a row-per-document log, use Docparser, Klippa, or the Power Automate “Extract data from PDF” action.
- Destination-shaped pipeline. If the Excel target is a corporate template with named cells, formulas, validations, or sheet-specific destinations, use Mediar. The template gets recorded once, the recording generates a Zod schema, the vision pass on the PDF is constrained to that schema, and each value is typed into the right cell via Windows UI Automation.
The shape of the problem
Three Excel destinations, three different problems.
Look at where the data is supposed to land. That decides which path you are on. The PDF side is mostly the same problem in all three cases. The Excel side is what splits them.
1. Excel as a sheet: PDF table in, flat sheet out.
2. Excel as a row: PDF in, one row appended per document.
3. Excel as a template: PDF in, specific cells of a template typed.
Path 1: Excel as a sheet
Your PDF is a single-page table of values. Your Excel is a fresh sheet. You want the table out of the PDF and into the workbook so you can sort it. This is the case Microsoft built into Excel itself in 2020. Open Excel, go to Data, Get Data, From File, From PDF, point it at the PDF, choose the table from the navigator, and load.
The built-in import runs through Power Query, is refreshable, and is free. For one-off conversions and small recurring extracts, it is the right answer. The reason it is not the answer for an enterprise workflow is that it produces a fresh sheet, not a write into your existing structure. If your real destination is a reconciliation workbook with eight tabs, named ranges, and ten years of formulas, the import sits on a new tab and you still have to copy values across.
The other limit is that the import handles a flat table well and a form-shaped PDF poorly. An invoice PDF where total, vendor, and PO number are scattered across labelled boxes is not what the importer is for. For that case you need either a document AI tool or the destination-shaped pipeline below.
Path 2: Excel as a row-per-document log
The PDFs arrive on a schedule. Each one is a vendor invoice or a claims form or a bill of lading. The Excel target is a long sheet where each row represents one document. The columns are invoice_number, vendor, total, due_date, and so on. You want every new PDF to become one new row.
This is the lane the document AI vendors own. Docparser, Klippa, Parseur, Nanonets, Affinda, and Rossum will all parse the PDF into JSON, and most of them have a built-in Excel or Google Sheets destination that appends a row. Microsoft Power Automate ships a first-party action called Extract data from PDF that does the same thing inside Office tenants. They have all converged on the same shape, because the row-per-document case is the bulk of the market.
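To make the destination shape concrete, here is what the row-per-document write looks like when you strip the vendor tooling away. This is a minimal sketch, not how any particular product is implemented: it assumes the exceljs library, an existing invoice-log.xlsx, and illustrative field names.

```ts
import ExcelJS from "exceljs";

// One extracted document becomes one appended row. Field names are illustrative.
interface ExtractedInvoice {
  invoiceNumber: string;
  vendor: string;
  total: string;
  dueDate: string;
}

async function appendRow(logPath: string, doc: ExtractedInvoice): Promise<void> {
  const workbook = new ExcelJS.Workbook();
  await workbook.xlsx.readFile(logPath);                       // assumes the log file already exists
  const sheet = workbook.getWorksheet(1) ?? workbook.addWorksheet("Invoices");
  sheet.addRow([doc.invoiceNumber, doc.vendor, doc.total, doc.dueDate]);
  await workbook.xlsx.writeFile(logPath);                      // save with the new row
}
```

The sketch has no notion of formulas, named ranges, or which specific cell a value belongs in, which is exactly where Path 3 starts.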
When this is your destination shape, do not over-think it. The row-append products are mature and the price-per-document is in single-digit cents. Mediar does not have a particularly differentiated answer here, and we will tell you so on a call. The destination-shaped pipeline below is for the case the row-append tools cannot handle.
Path 3: Excel as a corporate template
Your Excel destination is not a sheet that grows by row. It is a template. There are merged cells, named ranges, IF and SUMIF formulas across tabs, data validations on certain inputs, and a structure your finance team built and refuses to redesign. The PDF values do not become a row at the bottom; they become specific cells inside the template. The PO number goes in B5 of the Header tab, the line items go in A12 through E40 of the Detail tab, and the total in B5 of the Summary tab is a formula that references both.
This is the case the row-append products do not address. The destination is structured beyond what a column mapping can describe. A typical workaround is a brittle VBA macro that reads a JSON file and walks the workbook, which works until the template grows a tab or somebody renames a sheet. The destination-shaped pipeline is built for this case, and it leans on a feature of Excel that nobody in the PDF-to-Excel category writes about: every cell is a UI Automation element.
“Excel exposes each visible cell as an Edit control with the Value pattern, inside DataItem and Table parents. Same accessibility tree screen readers walk.”
Microsoft, UI Automation Support for the DataItem Control Type
Why Excel cells are UI Automation elements (and why that matters)
UI Automation is the framework Windows screen readers use to read the screen aloud. It exposes every visible control as a tree of elements, each with a role (Edit, Button, ComboBox, DataItem), an accessible name, an automation id, and one or more control patterns. The Value pattern is the one that says “this control accepts and emits a string,” and an Excel cell that is in edit mode supports it. Microsoft documents this as a property of the DataItem control type and the IValueProvider interface, both shipped before Office 365 had a public name.
The practical consequence is that any UI Automation client (a screen reader, a Mediar runtime, a Rust binary, a Python script using uiautomation) can address an Excel cell by its accessible name and call SetValue on it. Excel reacts the same way it reacts to a human typing: dependents recalc, validations fire, conditional formatting reflows. There is no xlsx parser writing bytes underneath the running Excel process. The state goes through the application itself.
Here is what the relevant slice of the UIA tree looks like when you inspect a workbook with the Inspect tool that ships in the Windows SDK:
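(Reconstructed for illustration; the exact names and nesting vary by Excel build, and a real Inspect dump carries many more properties.)

```text
Window    "Book1 - Excel"
  Table     "Sheet1"        control type: Table
    DataItem  "B5"          control type: DataItem
      Edit      "B5"        control type: Edit, patterns: Value
```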
That tree is what the Mediar runtime walks. The locator for cell B5 on Sheet1 is name:B5|role:DataItem|parent_name:Sheet1, captured at recording time and replayed at run time. When the template grows a column and the same cell is now C5, the locator falls back to the recorded automation id (which is the cell’s underlying Range identifier, not the visible label) and the typing pass still resolves.
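The locator string itself is easy to unpack. The parse below is an illustration of the recorded format as shown above, not the Mediar runtime's resolution code, and any field beyond the three in the example (such as automation_id) is an assumption about what the recorder also captures.

```ts
// Illustrative parse of a recorded locator such as
// "name:B5|role:DataItem|parent_name:Sheet1".
interface Locator {
  name?: string;
  role?: string;
  parentName?: string;
  automationId?: string; // assumed extra field used for fallback resolution
}

function parseLocator(raw: string): Locator {
  const locator: Locator = {};
  for (const part of raw.split("|")) {
    const [key, ...rest] = part.split(":");
    const value = rest.join(":");
    if (key === "name") locator.name = value;
    else if (key === "role") locator.role = value;
    else if (key === "parent_name") locator.parentName = value;
    else if (key === "automation_id") locator.automationId = value;
  }
  return locator;
}
```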
The recorder writes the schema, not you
The reason this pipeline can target a custom Excel template at all is that nobody hand-writes the schema. The recorder watches the operator fill the template once, captures every cell that received a typed value, and emits a Zod input schema with one line per cell. The code that does this lives in one file in the open repo:
```rust
// apps/desktop/src-tauri/src/recording_processor.rs
let mut detected_inputs: HashMap<String, String> = HashMap::new();
for (i, step) in workflow.steps.iter().enumerate() {
    for substep in &step.substeps {
        for input in &substep.inputs {
            if !detected_inputs.contains_key(input) {
                detected_inputs.insert(
                    input.clone(),
                    format!("// Found in step: {}", step.step_name),
                );
            }
        }
    }
}
let mut input_fields = String::new();
for input in detected_inputs.keys() {
    let field_name = to_camel_case(input);
    input_fields.push_str(&format!(
        " {}: z.string().optional().describe(\"{}\"),\n",
        field_name, input,
    ));
}
```

For an Excel reconciliation template with cells labelled invoice_number, vendor, po_number, total, due_date, and currency, the generated terminator.ts inputSchema is six lines of z.string().optional().describe(...). That schema is then handed to the vision pass on the PDF as the structured-output target, which means the JSON returned by the model has those exact keys. There is no manual remap from extractor field names to template cell names, because the template cell names became the extractor schema.
You can verify this by cloning the desktop app, opening the file at the path above, and grepping for detected_inputs. The loop that emits the schema lines is around line 1670.
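Under that format string, the six labelled cells come out looking roughly like this. The field names follow the to_camel_case pass, the describe() strings keep the original cell labels; the surrounding z.object wrapper and export are assumptions about how the generated file packages the lines.

```ts
import { z } from "zod";

// Illustrative reconstruction of the generated inputSchema for a six-cell template.
export const inputSchema = z.object({
  invoiceNumber: z.string().optional().describe("invoice_number"),
  vendor: z.string().optional().describe("vendor"),
  poNumber: z.string().optional().describe("po_number"),
  total: z.string().optional().describe("total"),
  dueDate: z.string().optional().describe("due_date"),
  currency: z.string().optional().describe("currency"),
});
```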
The three paths, side by side
Here is the comparison rolled up. The first column is the row-per-document tooling that owns Path 2; the second column is the destination-shaped pipeline for Path 3. Path 1 is not in the table because the native Excel import does not really compete on these axes; it is for the case where Excel is empty.
| Feature | Document AI to spreadsheet (row-append) | Mediar (destination-shaped) |
|---|---|---|
| Destination shape | Flat sheet, one row per PDF | Named cells in a template with formulas and validations |
| How fields are mapped | Manual mapping: extractor field name to spreadsheet column | The recorded template defines the field names; mapping is implicit |
| When the PDF format changes | Re-train the extractor or rewrite the column mapping | Vision pass returns the same schema; the new layout is irrelevant |
| When the Excel template changes | Update the column mapping by hand | Re-record the template; new cell labels become new schema fields |
| Where values land | Append a row at the bottom of a sheet | Type into the cell the operator typed into during the recording |
| Formulas and validations | Recalculated only if cells are written by Excel itself | Recalculated because values are typed via Windows UI Automation |
| Audit artifact | The CSV row plus the source PDF | The generated terminator.ts workflow file plus per-step UIA trace |
When the Excel destination is a flat sheet that grows by row, the row-append tools are the right answer and the destination-shaped pipeline is overkill. This table is about templates.
One concrete example
A vendor invoice into a 14-cell AP reconciliation template
The customer ships PDF invoices into a Microsoft 365 OneDrive folder. The destination is a workbook called AP-Recon-MASTER.xlsx with a Header tab (vendor, invoice number, PO, dates, currency, six inputs) and a Detail tab (line items, A12 through E40, eight column inputs per row). The Summary tab is a formula sheet with a SUMIF across the Detail range.
Recording produces a 14-field Zod schema. The vision pass returns JSON with those 14 keys. The runtime opens the template, types values into each cell via UI Automation, and saves. The Summary tab recalculates because Excel itself processed every cell write. The audit log is the workflow run id plus the per-cell trace and the saved xlsx file hash.
Per-document runtime is around 70 seconds, so under a dollar each. The previous workflow at this customer was a copy-paste pass by an AP clerk that took 8 to 12 minutes per invoice. We do not publish the customer name on this page; we will share it on a call.
What this still does not solve
A page that only describes the easy cases is a brochure. Three honest limits:
- On PDFs that current vision-capable models cannot read (handwritten cursive, multi-language carbon copies, severe ink bleed) the extraction step still needs human review. The schema constraint keeps the model honest about which cells it could not fill, but it does not improve recognition quality.
- On Excel installs where the workbook is open in a SharePoint web view, an iframe inside Teams, or Excel for Mac, the UI Automation tree is different or absent. The native Windows Excel client is the reliable target, and the runtime is honest about that.
- If your Excel destination is a fundamentally flat sheet that grows by row, the destination-shaped recording is overkill. Use Docparser or Power Automate’s Extract data from PDF action and pay cents per document.
Within those limits, the destination-shaped path is what most finance, ops, and claims teams are actually trying to buy when they search for PDF to Excel automation. They do not want a smarter table extractor. They have an extractor. They want the values in the right cells of the template they already built.
Bring the PDF and the template. We will show the round trip.
On a call, share one PDF you actually receive and one Excel template you actually post into. We will record the template, generate the schema, run the extraction, and show the typed result on a real workflow run.
Frequently asked questions
What is the simplest PDF to Excel automation that actually works?
If your PDF is a single page of clean tabular data and your Excel is a blank sheet, open Excel, go to Data > Get Data > From File > From PDF, point it at the PDF, and pick the table. Microsoft has shipped this for several years and it works. It is the right answer when the document is a one-off table and you do not need the values in any specific cell. It is the wrong answer the moment Excel becomes a template with formulas, named ranges, or sheet-specific destinations, because the import lands rows in a new sheet and ignores everything you built around the data.
When should I reach for a document AI tool like Docparser, Klippa, or Power Automate's Extract from PDF action?
When you have a stream of similar PDFs (vendor invoices, claims forms, bills of lading) and you want each document to become one row in a spreadsheet that grows over time. Those tools are good at PDF to JSON to row-append. They stop being a fit when the destination has structure beyond a flat table: a per-vendor template that calls SUMIF on a named range, a multi-sheet workbook where the header goes on Sheet1 and the line items go on Sheet2, or an existing reconciliation file where specific cells must be written without nuking neighbors. None of the document AI products own the typing pass into a structured destination, because they all assume the destination is a database row.
Why is Excel different from SAP or Epic when typing into specific cells?
It is not. Excel cells are UI Automation elements with the Edit control type and the Value pattern, the same pattern SAP F-02 fields and Epic Welcome fields support. They live inside DataItem and Table parents, addressable by accessible name, and they accept SetValue calls from any UI Automation client. That is why Mediar treats Excel as a destination form rather than a file: the typing pass is the same code path that types into SAP. Microsoft documents this directly in the UI Automation Support for the DataItem Control Type and the IValueProvider interface; both pre-date the modern document-AI wave by about fifteen years.
How does the destination-shaped pipeline cope when a vendor changes the PDF layout?
It does not care, within reason. The schema is fixed by the Excel template's recording, not by the PDF. The vision pass is asked to fill out the same Zod schema regardless of which vendor sent the document. If a new vendor invoice puts the total on the third page instead of the first and renames PO# to Order Reference, the model still returns total and po_number because those are the destination's labels, not the document's. What does require a re-record is the destination side: if the Excel template gets a new cell for tax_jurisdiction, the recording has to be redone so that field shows up in the schema.
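A minimal sketch of that contract, assuming a generic vision-model client: the callModel wiring and the zod-to-json-schema helper are assumptions, and the only fixed piece is that the destination's schema, not the document, defines the keys.

```ts
import { ZodObject, ZodRawShape } from "zod";
import { zodToJsonSchema } from "zod-to-json-schema";

// callModel stands in for whichever vision-capable model API you actually use.
async function extractToTemplateSchema<T extends ZodRawShape>(
  schema: ZodObject<T>,
  pdfPageImages: string[],
  callModel: (req: { pages: string[]; responseSchema: object }) => Promise<string>,
) {
  const responseSchema = zodToJsonSchema(schema);   // same target for every vendor layout
  const raw = await callModel({ pages: pdfPageImages, responseSchema });
  return schema.parse(JSON.parse(raw));             // unknown keys stripped, unread optional fields stay undefined
}
```

The last line is the point: whatever the model returns is forced back through the destination's schema before anything is typed.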
Can I keep the formulas and validations in my Excel template, or do they get clobbered?
They are preserved. Mediar types values into the cells the way a human typist would, via the Value pattern of each Edit control. Excel's calc engine fires after each write, validations evaluate, dependent cells recalc, conditional formatting reflows. There is no xlsx parser writing the file behind Excel's back. The cost is a slightly slower typing pass than a library write, but the gain is that every formula, validation, and named range that lives in your template stays live, because Excel itself processed the input.
Do I have to write Rust to use the typing pass?
No. The execution layer ships as the open-source Terminator SDK at github.com/mediar-ai/terminator, with TypeScript and Python clients on top of the Rust core. A team that wants to wire its own queue can call the type_into_element MCP tool directly from any language that speaks JSON-RPC. The recording UI that generates the terminator.ts schema lives in the desktop app and is the part most teams want, because hand-writing locators for fifty cells of a template is the work the recorder is meant to avoid.
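For the wire-it-yourself path, the call is a standard JSON-RPC tools/call request. The sketch below assumes an HTTP transport and argument names ("selector", "text") for illustration only; check the Terminator repo for the tool's actual parameters and transport.

```ts
// Hypothetical endpoint and argument names; only the tool name type_into_element
// and the JSON-RPC tools/call envelope are taken from the text above.
async function typeIntoCell(endpoint: string, selector: string, text: string) {
  const response = await fetch(endpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "tools/call",
      params: { name: "type_into_element", arguments: { selector, text } },
    }),
  });
  return response.json();
}

// typeIntoCell("http://localhost:3000/mcp", "name:B5|role:DataItem|parent_name:Sheet1", "4500.00");
```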
What does a real run actually cost?
Pricing is $0.75 per minute of runtime, where runtime means the typing pass into the destination plus the inline reads. A four-cell journal-template insert lands at 25 to 60 seconds, so under a dollar per document. A 30-cell claims worksheet lands at 90 to 180 seconds, so a few dollars per document. The vision pass on the PDF itself is included in the same envelope for short documents and itemized for long ones. The $10K turn-key program fee converts to credits with a bonus, so it is effectively prepaid usage.
What about scanned PDFs and bad photocopies?
Same answer as for any document AI pipeline. A born-digital PDF goes straight to a vision-capable model that reads the rendered pages and the embedded text. A scanned or faxed PDF runs through an OCR pre-pass first (Tesseract for cheap, AWS Textract or Azure Document Intelligence when handwriting is involved), and the OCR output plus the page image is fed into the vision pass. Confidence scores from OCR are passed through, so a low-confidence cell can be sent to human review instead of typed into the template.
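The routing described above is a plain threshold check. A sketch, with the field shape and the 0.9 cutoff as placeholders rather than anything the product prescribes:

```ts
// Fields below the confidence threshold go to human review instead of the typing pass.
interface ExtractedField {
  cell: string;        // destination label, e.g. "tax_jurisdiction"
  value: string;
  confidence: number;  // carried through from the OCR pre-pass, 0..1
}

function splitByConfidence(fields: ExtractedField[], threshold = 0.9) {
  const toType: ExtractedField[] = [];
  const toReview: ExtractedField[] = [];
  for (const field of fields) {
    (field.confidence >= threshold ? toType : toReview).push(field);
  }
  return { toType, toReview };
}
```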
Where does this approach actually fall down?
Three places. First, on PDFs that current vision models cannot read reliably (handwritten cursive, multi-language carbon copies, severe ink bleed) the extraction step needs human review. The schema constraint helps the model say 'I could not read tax_jurisdiction' instead of inventing one, but it does not improve OCR quality. Second, on Excel installs where the workbook is open in a SharePoint web view or Excel for Mac, the UI Automation tree is different or absent, and the typing pass falls back to slower image-based locators. The native Windows Excel client is the reliable target. Third, if your Excel destination is a fundamentally flat sheet that grows by row, the destination-shaped recording gives you no marginal value over a regular document AI tool, and Docparser or Power Automate is the right answer.
Is any of this open source?
The execution layer is. The Terminator SDK that performs the UI Automation calls and locator resolution lives at github.com/mediar-ai/terminator under MIT, including the type_into_element, click_element, set_value, and get_text MCP tools. The recording processor that synthesizes the destination schema, plus the orchestration layer that runs the workflow queue, are commercial. A team that wants to script PDF data entry into an Excel template without paying for the cloud product can build directly on Terminator.
Related walkthroughs
Same architecture, different destinations and starting points.
AI data entry from PDF: schema-shaped extraction and what happens after the JSON
The general version of the destination-shaped pipeline. Same architecture, applied across SAP, Epic, Jack Henry, and Oracle EBS rather than Excel.
SAP data entry automation: one journal entry, traced field by field
The same recording-to-schema-to-typing pipeline applied to a four-field SAP F-02 journal entry. File and line numbers a reviewer can open.
RPA agent UI input layer: accessibility tree vs pixels
Why the accessibility-API path resolves Excel cells, SAP fields, and Epic charts the same way, and why the pixel-matching path does not.