How We Cut Client Processing Time by 10x with Agentic Workflows - Dawnovation AI

The Challenge

A mid-sized logistics company came to us with a problem that sounded deceptively simple: their operations team was spending 6–8 hours every day manually processing inbound freight documents — bills of lading, customs declarations, carrier confirmations — and reconciling them against their internal ERP system.

The process involved extracting structured data from unstructured PDFs and emails, cross-referencing against existing shipment records, flagging discrepancies, escalating edge cases to human reviewers, and updating the ERP with confirmed data. It was meticulous, error-prone, and consumed the majority of their operations team’s capacity.

The Original Process

Before our engagement, the process looked like this:

Documents arrived via email and a document portal (average: 200–400 documents/day)
Each document was manually opened and classified by document type
Key fields were extracted by hand and entered into a spreadsheet
An analyst compared entries against the ERP to identify discrepancies
Discrepancies were emailed to the relevant carrier or broker for resolution
Confirmed data was manually entered into the ERP

Error rate: approximately 3–5% of entries contained transcription errors. Average processing time per document: 6–8 minutes. Daily team cost for this single workflow: 6–8 hours of manual processing, the majority of the operations team’s capacity.

Our Approach

We designed a hierarchical multi-agent system with four specialist agent types:

Intake Agent — monitors email and the document portal, classifies inbound documents, and routes them to the appropriate specialist
Extraction Agent — parses each document type using a combination of structured extraction prompts and custom fine-tuned parsing for the client’s most common document formats
Reconciliation Agent — queries the ERP API, compares extracted data against existing records, and computes confidence scores for each field match
Escalation Agent — handles low-confidence or high-discrepancy documents, drafts resolution emails to carriers, and queues cases for human review with a structured summary

A supervisor agent orchestrated the pipeline, maintained run state, and enforced SLAs for each processing stage.
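
The hand-offs described above can be sketched as a small supervisor loop. This is a minimal illustration under stated assumptions, not the production system: the `Document` dataclass, the stage functions, the keyword classifier, the "key: value" parser, and the 0.9 escalation threshold are all hypothetical stand-ins. The real agents call models and the ERP API where this sketch uses stubs, and SLA enforcement is omitted.

```python
from dataclasses import dataclass, field

ESCALATION_THRESHOLD = 0.9  # assumed value, for illustration only


@dataclass
class Document:
    doc_id: str
    text: str
    doc_type: str = "unknown"
    fields: dict = field(default_factory=dict)
    confidence: float = 0.0
    status: str = "received"
    history: list = field(default_factory=list)  # run state: stages completed


def intake(doc):
    # Hypothetical keyword classifier standing in for the Intake Agent.
    lowered = doc.text.lower()
    if "bill of lading" in lowered:
        doc.doc_type = "bill_of_lading"
    elif "customs" in lowered:
        doc.doc_type = "customs_declaration"
    elif "carrier confirmation" in lowered:
        doc.doc_type = "carrier_confirmation"
    return doc


def extract(doc):
    # Stub for the Extraction Agent: reads simple "key: value" lines.
    for line in doc.text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            doc.fields[key.strip().lower()] = value.strip()
    return doc


def make_reconciler(erp_records):
    # Stub for the Reconciliation Agent: an in-memory dict replaces the ERP API.
    def reconcile(doc):
        record = erp_records.get(doc.fields.get("shipment_id", ""), {})
        if not record:
            doc.confidence = 0.0
            return doc
        matches = sum(1 for k, v in record.items() if doc.fields.get(k) == v)
        doc.confidence = matches / len(record)
        return doc
    return reconcile


def supervise(doc, erp_records):
    # Run the stages in order, record state, then confirm or escalate.
    stages = [
        ("intake", intake),
        ("extraction", extract),
        ("reconciliation", make_reconciler(erp_records)),
    ]
    for name, stage in stages:
        doc = stage(doc)
        doc.history.append(name)
    if doc.doc_type == "unknown" or doc.confidence < ESCALATION_THRESHOLD:
        doc.status = "escalated"  # handed to the Escalation Agent and a human
        doc.history.append("escalation")
    else:
        doc.status = "confirmed"
    return doc


erp = {"SH-1001": {"weight_kg": "1200", "carrier": "Acme Freight"}}
clean = supervise(Document(
    "doc-1",
    "Bill of Lading\nshipment_id: SH-1001\nweight_kg: 1200\ncarrier: Acme Freight",
), erp)
flagged = supervise(Document(
    "doc-2",
    "Bill of Lading\nshipment_id: SH-1001\nweight_kg: 1250\ncarrier: Acme Freight",
), erp)
print(clean.status, flagged.status)
```

In this sketch the first document matches the ERP record on every field and is confirmed, while the second has a weight mismatch and is routed to escalation.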

Key design decision: We deliberately kept humans in the loop for escalations rather than trying to automate everything. The roughly 15% of documents that required human judgment reached reviewers faster and as better-prepared cases: the agents did the legwork, humans made the calls.

System Architecture

The technical stack was intentionally pragmatic:

Document ingestion via watched email inbox + API polling, normalised to a common queue
Extraction using GPT-4o with structured output mode, with custom fine-tuning for the three highest-volume document types
ERP integration via the client’s existing REST API (read and write access)
Confidence scoring using a rules-based layer on top of extraction outputs — no additional model calls needed
State persistence in PostgreSQL, with the pgvector extension for semantic deduplication of documents
Human review queue surfaced via a lightweight internal dashboard

Total infrastructure: two small cloud VMs, one managed PostgreSQL instance. No exotic tooling. The complexity was in the agent logic and the evaluation harness, not the infrastructure.
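
To illustrate what a rules-based confidence layer of the kind listed above might look like, here is a minimal sketch. The field names (`weight_kg`, `container_no`), the tolerance bands, and the choice to take the minimum field score as the document score are all hypothetical; the only details taken from the engagement are that scoring was rules-based, ran per field, and needed no extra model calls.

```python
import re


def normalise(value):
    # Collapse whitespace and ignore case for plain text comparisons.
    return " ".join(str(value).split()).lower()


def score_field(name, extracted, erp_value):
    """Return a confidence in [0, 1] that an extracted field matches the ERP."""
    if extracted is None or extracted == "":
        return 0.0
    if name == "weight_kg":
        # Numeric rule: full score within 1%, half score within 5% (assumed bands).
        try:
            got, want = float(extracted), float(erp_value)
        except (TypeError, ValueError):
            return 0.0
        if want == 0:
            return 1.0 if got == 0 else 0.0
        rel = abs(got - want) / abs(want)
        if rel <= 0.01:
            return 1.0
        return 0.5 if rel <= 0.05 else 0.0
    if name == "container_no":
        # Shape rule: four letters then seven digits, ignoring spaces and hyphens.
        # No check-digit validation is attempted here.
        got = re.sub(r"[\s-]", "", str(extracted)).upper()
        want = re.sub(r"[\s-]", "", str(erp_value)).upper()
        if not re.fullmatch(r"[A-Z]{4}\d{7}", got):
            return 0.0
        return 1.0 if got == want else 0.0
    # Default rule: exact match after normalisation.
    return 1.0 if normalise(extracted) == normalise(erp_value) else 0.0


def score_document(extracted, erp_record):
    """Score every ERP field; the document is only as strong as its weakest field."""
    per_field = {
        name: score_field(name, extracted.get(name), erp_value)
        for name, erp_value in erp_record.items()
    }
    overall = min(per_field.values(), default=0.0)
    return overall, per_field


overall, per_field = score_document(
    {"weight_kg": "1230", "container_no": "msku 123456-7", "carrier": "Acme  Freight"},
    {"weight_kg": 1200, "container_no": "MSKU1234567", "carrier": "acme freight"},
)
print(overall, per_field)
```

Using the minimum rather than an average means a single doubtful field is enough to pull a document below an escalation threshold, which suits a workflow where one wrong value in the ERP is costly.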

Results

After a two-week parallel-run validation period (agents and humans processing the same documents independently), we moved to production. The results over the first 90 days:

Processing time: 6–8 hours per day → 35–45 minutes of human review time for escalations
Error rate: 3–5% → 0.4% (almost entirely in the escalated-case subset)
Document throughput: handled a 40% volume spike during peak season without adding headcount
Operations team capacity: freed up >6 hours/day for higher-value analysis work

The 10x headline refers to end-to-end processing time per document — from arrival to ERP update — dropping from ~7 minutes to ~40 seconds for the 85% of documents that didn’t require escalation.
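
As a quick check on the headline figure, the ratio can be worked out from the two approximate per-document times quoted above. The variable names are ours; both inputs are the rounded figures from the text, so the result is indicative rather than precise.

```python
before_seconds = 7 * 60  # ~7 minutes per document, manual process
after_seconds = 40       # ~40 seconds per document, non-escalated path

speedup = before_seconds / after_seconds
print(f"{speedup:.1f}x")  # prints "10.5x"
```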

Lessons Learned

A few things made this engagement work in ways we didn’t fully anticipate:

The evaluation harness was half the project. We spent as much time building the tooling to measure extraction quality as we did building the agents themselves. You cannot improve what you cannot measure.
The escalation path was the difference-maker. Attempting to automate the hard 15% would have added months of complexity and still wouldn’t have matched human judgment. Routing cleanly to humans and making those cases easy to review was the right call.
Fine-tuning on the top three document types dropped extraction errors by 60% on those formats specifically, at a fraction of the cost of fine-tuning across all document types.
Operations team buy-in was non-negotiable. The agents that ship and stay shipped are the ones the end users trust. We involved the ops team in defining the escalation criteria from day one.
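
To make the first lesson concrete, here is a minimal sketch of the core measurement an evaluation harness might perform: a field-level error rate per document type, comparing agent output with human entries from a parallel run. The function name, input shape, and sample data are hypothetical. Treating the human entries as the reference is also a simplification, since the manual process had its own transcription errors.

```python
from collections import defaultdict


def field_error_rates(pairs):
    """Compute the share of mismatched fields per document type.

    pairs: iterable of (doc_type, agent_fields, reference_fields) tuples,
    where both field arguments are dicts of field name to value.
    """
    totals = defaultdict(int)
    errors = defaultdict(int)
    for doc_type, agent_fields, reference_fields in pairs:
        for name, expected in reference_fields.items():
            totals[doc_type] += 1
            # A missing field counts as an error, same as a wrong value.
            if agent_fields.get(name) != expected:
                errors[doc_type] += 1
    return {doc_type: errors[doc_type] / totals[doc_type] for doc_type in totals}


# Made-up sample data, for illustration only.
pairs = [
    ("bill_of_lading",
     {"shipment_id": "SH-1", "weight_kg": "1200"},
     {"shipment_id": "SH-1", "weight_kg": "1200"}),
    ("bill_of_lading",
     {"shipment_id": "SH-2", "weight_kg": "900"},
     {"shipment_id": "SH-2", "weight_kg": "950"}),
    ("customs_declaration",
     {"hs_code": "8471"},
     {"hs_code": "8471"}),
]
rates = field_error_rates(pairs)
print(rates)
```

Breaking the rate down by document type is what makes a decision like fine-tuning only the three highest-volume formats measurable: the effect shows up in those rows and nowhere else.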
