
March 17, 2026

Proof of address verification API: build vs buy

At some point, every engineering team building KYC or onboarding flows has the same conversation: "Proof of address verification? It's just OCR and some string matching. We can build that in a sprint." It sounds reasonable. The documents are mostly text. The matching logic seems straightforward. And you already have an OCR API in your stack.

Then you start building it, and the edge cases multiply. What follows is an honest breakdown of what's actually involved, when building makes sense, and when buying is the smarter call.

The hidden complexity

Proof of address verification looks simple on the surface. A document comes in, you read it, you check if the name and address match. But each step hides real complexity that only shows up in production.

Document parsing

Every utility company, bank, and government agency has a different document layout. A German electricity bill looks nothing like a UK council tax letter or a French bank statement. There are thousands of formats in the wild, and they change without notice. Your parsing logic that worked last month can break when a utility provider updates their template.

OCR gets you raw text, but raw text isn't structured data. You still need to figure out which block of text is the customer name, which is the address, and which is the date, without any consistent labels or positioning to rely on.
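A first pass at structuring raw OCR output often starts with pattern matching. The sketch below pulls candidate UK postcodes and slash-formatted dates out of a text blob with regular expressions; the patterns are illustrative assumptions, not production rules, and real documents need far more robust heuristics.

```python
import re

# Illustrative patterns only: a UK postcode and a numeric date.
UK_POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b")
SLASH_DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")

def extract_candidates(raw_text: str) -> dict:
    """Return candidate postcodes and dates found anywhere in the text."""
    return {
        "postcodes": UK_POSTCODE.findall(raw_text.upper()),
        "dates": SLASH_DATE.findall(raw_text),
    }

ocr_text = "Bill date: 12/01/2026\nMr J Smith\n42 Oak Lane\nLondon SW1A 1AA"
print(extract_candidates(ocr_text))
```

Even this toy version surfaces the core problem: the regex finds *every* postcode and date on the page, and deciding which one is the customer's is a separate, harder task.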

Name extraction

A document contains many names: the customer's name, the utility company's name, possibly a joint account holder, maybe a property management company. Distinguishing which name belongs to the person you're verifying is not a string-search problem; it requires understanding document structure and context.

Then there are variations. "John A. Smith" on the document vs "John Smith" in your database. "Maria Garcia Lopez" vs "Maria Garcia". Married names, transliterated names, names with diacritics. Each of these needs to match correctly.
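A minimal sketch of tolerant name comparison, assuming normalization (lowercasing, diacritic stripping, dropping initials) plus token-set containment. Real systems layer phonetic and fuzzy matching on top of this; the helper names here are hypothetical.

```python
import unicodedata

def normalize_name(name: str) -> set:
    """Lowercase, strip diacritics, drop dots and single-letter initials."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    tokens = stripped.lower().replace(".", "").split()
    return {t for t in tokens if len(t) > 1}  # ignore middle initials

def names_compatible(doc_name: str, db_name: str) -> bool:
    """One name's significant tokens must be a subset of the other's."""
    a, b = normalize_name(doc_name), normalize_name(db_name)
    return a <= b or b <= a

names_compatible("John A. Smith", "John Smith")       # both reduce to {john, smith}
names_compatible("Maria Garcia Lopez", "Maria Garcia")  # subset relation holds
```

The subset rule is deliberately permissive; whether "Maria Garcia" should count as a match for "Maria Garcia Lopez" is a policy decision, not a technical one.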

Address extraction

Most documents contain multiple addresses. A utility bill has the customer's service address, the billing address (sometimes different), and the utility company's headquarters. A bank statement might have the branch address alongside the customer's address. You need to extract the right one, the one that proves where the customer lives.

Even once you've found the correct address, the formatting varies: "Flat 3, 42 Oak Lane" vs "42 Oak Lane, Apt 3" vs "42 Oak Ln Fl 3". These are all the same place, but naive matching treats them as different.
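One common tactic is to expand abbreviations and compare addresses as token sets, so component ordering stops mattering. The abbreviation table below is a tiny hypothetical subset; production systems use country-specific reference data.

```python
import re

# Hypothetical abbreviation table; real systems use per-country data.
ABBREVIATIONS = {"st": "street", "ln": "lane", "rd": "road",
                 "apt": "flat", "fl": "flat", "ave": "avenue"}

def normalize_address(addr: str) -> set:
    """Lowercase, expand abbreviations, and return an order-free token set."""
    tokens = re.split(r"[\s,]+", addr.lower().strip())
    return {ABBREVIATIONS.get(t, t) for t in tokens if t}

def same_address(a: str, b: str) -> bool:
    return normalize_address(a) == normalize_address(b)

same_address("Flat 3, 42 Oak Lane", "42 Oak Ln Fl 3")  # same token set
```

Token sets handle reordering and abbreviation, but not a missing flat number or a typo in the street name; that is where fuzzy scoring takes over.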

Fuzzy matching

Exact string matching fails constantly. "123 Main St" vs "123 Main Street, Apt 4B". "London SW1A 1AA" vs "SW1A1AA". "Müller" vs "Mueller". You need fuzzy matching that's smart enough to handle abbreviations, missing apartment numbers, postcode formatting, and character normalization, but strict enough to reject genuinely different addresses.

Getting this balance right takes significant iteration. Too loose and you approve bad matches. Too strict and you reject legitimate ones, creating manual review queues that defeat the purpose of automation.
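The approve/review/reject split usually looks something like the sketch below, here using Python's standard-library `difflib` similarity as a stand-in for Levenshtein or Jaro-Winkler. The two thresholds are illustrative; tuning them against real documents is where the iteration time goes.

```python
from difflib import SequenceMatcher

def match_score(expected: str, extracted: str) -> float:
    """Similarity in [0, 1] between expected and extracted values."""
    return SequenceMatcher(None, expected.lower(), extracted.lower()).ratio()

# Illustrative thresholds only, not recommendations.
APPROVE, REVIEW = 0.90, 0.75

def verdict(expected: str, extracted: str) -> str:
    score = match_score(expected, extracted)
    if score >= APPROVE:
        return "match"
    if score >= REVIEW:
        return "manual_review"
    return "no_match"
```

Everything between the two thresholds lands in the manual review queue, which is exactly the band you shrink by making extraction and normalization better upstream.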

Multi-language support

If you only process English documents, you can maybe get away with simpler tooling. But the real world sends you documents in Cyrillic, Arabic, Chinese, Georgian, Thai, and dozens of other scripts. Now you need transliteration so you can compare "Иванов" against "Ivanov", or an Arabic address against its Latin equivalent.

Cross-script matching is a research-grade problem. Getting it wrong means rejecting legitimate customers who happen to have documents in a non-Latin language, which is a compliance and user experience failure.
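At its simplest, cross-script comparison means transliterating one script into the other before matching. The table below is a toy Cyrillic subset, just enough to show the shape of the problem; real systems use standards like ISO 9 or ICU transliterators, with per-language rules.

```python
# Toy Cyrillic-to-Latin table (tiny subset, for illustration only).
CYRILLIC_TO_LATIN = {
    "А": "A", "Б": "B", "В": "V", "И": "I", "Н": "N", "О": "O",
    "а": "a", "б": "b", "в": "v", "и": "i", "н": "n", "о": "o",
}

def transliterate(text: str) -> str:
    """Map known Cyrillic characters to Latin; pass others through."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text)

transliterate("Иванов")  # → "Ivanov", which can then be fuzzy-matched
```

The catch is that transliteration is rarely one-to-one: "Иванов" can legitimately appear as "Ivanov" or "Iwanow" depending on the target language, so the fuzzy matcher still has to absorb the residual variation.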

Date parsing

Is "03/04/2026" the 3rd of April or March 4th? It depends on the country. Written dates like "15 Mars 2026" or "17. März 2026" need language-aware parsing. And you need to determine not just the date but whether the document is recent enough to be valid; most compliance rules require documents issued within the last 3 to 6 months.
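Both problems can be sketched with the standard library: the same string parses to two different dates depending on the assumed locale, and validity is a recency window on top of the parsed date. The 90-day window below is an illustrative assumption, not a rule.

```python
from datetime import date, datetime

def parse_ambiguous(text: str, day_first: bool) -> date:
    """Parse a numeric date under an assumed day/month ordering."""
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(text, fmt).date()

def is_recent(issued: date, today: date, max_age_days: int = 90) -> bool:
    """Assumed policy: accept documents issued within the last ~3 months."""
    return 0 <= (today - issued).days <= max_age_days

uk = parse_ambiguous("03/04/2026", day_first=True)   # 3 April 2026
us = parse_ambiguous("03/04/2026", day_first=False)  # 4 March 2026
```

Note that `day_first` itself has to come from somewhere, usually the document's country of origin, which means date parsing depends on document classification done earlier in the pipeline.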

Ongoing maintenance

Even after you've built and shipped all of the above, the work doesn't stop. New document formats appear constantly. OCR models get updated and change behavior. Edge cases trickle in from production: a scanned document that's slightly rotated, a bank statement with a watermark that confuses the parser, a government letter in a language you haven't seen before.

You need monitoring to catch accuracy regressions, a test suite of real documents to validate against, and someone on the team who understands both the ML pipeline and the compliance requirements.

What building looks like in practice

Here's the typical architecture when teams build this themselves:

  1. OCR API: Google Vision, AWS Textract, or Azure Document Intelligence to extract raw text
  2. Custom parsing logic: regular expressions, heuristics, or a fine-tuned model to identify names, addresses, and dates from the raw output
  3. Matching algorithm: Levenshtein distance, Jaro-Winkler, or something custom to compare extracted values against expected ones
  4. Test suite: a growing collection of real documents to catch regressions
  5. Monitoring and review queue: dashboards for match rates, manual review for borderline cases

This works, eventually. But the time from "let's build it" to "it works reliably across diverse document types" is measured in months, not sprints. And the long tail of edge cases keeps the team busy long after launch.

When building makes sense

Building your own POA verification system is the right call in some situations, most notably at very high volume, where per-check API pricing eventually exceeds the cost of an in-house system.

When buying makes sense

For most teams, buying is the pragmatic choice: the edge cases above become someone else's full-time job, and you ship in days instead of months.

The cost comparison usually favors buying too. An API that costs a few hundred euros per month is cheaper than the engineering hours needed to build, test, and maintain a custom system, unless you're at very high volume.

What to look for in a POA verification API

Not all verification APIs are equal. Evaluate them against the complexity covered above: extraction accuracy across diverse document formats, multi-language and cross-script support, configurable match thresholds, and an auditable record of each decision.

How trusqo handles this

trusqo is a proof of address verification API built around exactly these principles. It's a single REST endpoint that handles the full verification flow in one API call: document parsing, data extraction, fuzzy matching, and a verdict.

You send a document with the expected name and address. You get back match scores for name, address, and postcode, a document type classification, date validation, and a downloadable PDF audit report. Thresholds are configurable per request. Documents in any language are supported, with automatic transliteration for cross-script matching.

Plans start at €25/month with 50 checks included. Full API documentation is at trusqo.com/docs. You can sign up and send your first verification at app.trusqo.com, no sales calls, no onboarding meetings.

If you're evaluating whether to build or buy, the honest answer is: try the API first. If it doesn't meet your needs, you'll at least have a clear picture of the problem space before committing to a build.