
March 17, 2026

Proof of address verification API: build vs buy

At some point, every engineering team building KYC or onboarding flows has the same conversation: "Proof of address verification? It's just OCR and some string matching. We can build that in a sprint." It sounds reasonable. The documents are mostly text. The matching logic seems straightforward. And you already have an OCR API in your stack.

Then you start building it, and the edge cases multiply. What follows is an honest breakdown of what's actually involved, when building makes sense, and when buying is the smarter call.

The hidden complexity

Proof of address verification looks simple on the surface. A document comes in, you read it, you check if the name and address match. But each step hides real complexity that only shows up in production.

Document parsing

Every utility company, bank, and government agency has a different document layout. A German electricity bill looks nothing like a UK council tax letter or a French bank statement. There are thousands of formats in the wild, and they change without notice. Your parsing logic that worked last month can break when a utility provider updates their template.

OCR gets you raw text, but raw text isn't structured data. You still need to figure out which block of text is the customer name, which is the address, and which is the date, without any consistent labels or positioning to rely on.
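A first pass at structuring raw OCR output often starts with pattern matching. The sketch below pulls candidate UK postcodes and slash-formatted dates out of a text blob with regular expressions; the patterns are illustrative assumptions, not production rules, and real documents need far more robust heuristics.

```python
import re

# Illustrative patterns only: a UK postcode and a numeric date.
UK_POSTCODE = re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s*\d[A-Z]{2}\b")
SLASH_DATE = re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b")

def extract_candidates(raw_text: str) -> dict:
    """Return candidate postcodes and dates found anywhere in the text."""
    return {
        "postcodes": UK_POSTCODE.findall(raw_text.upper()),
        "dates": SLASH_DATE.findall(raw_text),
    }

ocr_text = "Bill date: 12/01/2026\nMr J Smith\n42 Oak Lane\nLondon SW1A 1AA"
print(extract_candidates(ocr_text))
```

Even this toy version surfaces the core problem: the regex finds *every* postcode and date on the page, and deciding which one is the customer's is a separate, harder task.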

Name extraction

A document contains many names: the customer's name, the utility company's name, possibly a joint account holder, maybe a property management company. Distinguishing which name belongs to the person you're verifying is not a string-search problem; it requires understanding document structure and context.

Then there are variations. "John A. Smith" on the document vs "John Smith" in your database. "Maria Garcia Lopez" vs "Maria Garcia". Married names, transliterated names, names with diacritics. Each of these needs to match correctly.
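A minimal sketch of tolerant name comparison, assuming normalization (lowercasing, diacritic stripping, dropping initials) plus token-set containment. Real systems layer phonetic and fuzzy matching on top of this; the helper names here are hypothetical.

```python
import unicodedata

def normalize_name(name: str) -> set:
    """Lowercase, strip diacritics, drop dots and single-letter initials."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    tokens = stripped.lower().replace(".", "").split()
    return {t for t in tokens if len(t) > 1}  # ignore middle initials

def names_compatible(doc_name: str, db_name: str) -> bool:
    """One name's significant tokens must be a subset of the other's."""
    a, b = normalize_name(doc_name), normalize_name(db_name)
    return a <= b or b <= a

names_compatible("John A. Smith", "John Smith")       # both reduce to {john, smith}
names_compatible("Maria Garcia Lopez", "Maria Garcia")  # subset relation holds
```

The subset rule is deliberately permissive; whether "Maria Garcia" should count as a match for "Maria Garcia Lopez" is a policy decision, not a technical one.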

Address extraction

Most documents contain multiple addresses. A utility bill has the customer's service address, the billing address (sometimes different), and the utility company's headquarters. A bank statement might have the branch address alongside the customer's address. You need to extract the right one, the one that proves where the customer lives.

Even once you've found the correct address, the formatting varies: "Flat 3, 42 Oak Lane" vs "42 Oak Lane, Apt 3" vs "42 Oak Ln Fl 3". These are all the same place, but naive matching treats them as different.
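One common tactic is to expand abbreviations and compare addresses as token sets, so component ordering stops mattering. The abbreviation table below is a tiny hypothetical subset; production systems use country-specific reference data.

```python
import re

# Hypothetical abbreviation table; real systems use per-country data.
ABBREVIATIONS = {"st": "street", "ln": "lane", "rd": "road",
                 "apt": "flat", "fl": "flat", "ave": "avenue"}

def normalize_address(addr: str) -> set:
    """Lowercase, expand abbreviations, and return an order-free token set."""
    tokens = re.split(r"[\s,]+", addr.lower().strip())
    return {ABBREVIATIONS.get(t, t) for t in tokens if t}

def same_address(a: str, b: str) -> bool:
    return normalize_address(a) == normalize_address(b)

same_address("Flat 3, 42 Oak Lane", "42 Oak Ln Fl 3")  # same token set
```

Token sets handle reordering and abbreviation, but not a missing flat number or a typo in the street name; that is where fuzzy scoring takes over.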

Fuzzy matching

Exact string matching fails constantly. "123 Main St" vs "123 Main Street, Apt 4B". "London SW1A 1AA" vs "SW1A1AA". "Müller" vs "Mueller". You need fuzzy matching that's smart enough to handle abbreviations, missing apartment numbers, postcode formatting, and character normalization, but strict enough to reject genuinely different addresses.

Getting this balance right takes significant iteration. Too loose and you approve bad matches. Too strict and you reject legitimate ones, creating manual review queues that defeat the purpose of automation.
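The approve/review/reject split usually looks something like the sketch below, here using Python's standard-library `difflib` similarity as a stand-in for Levenshtein or Jaro-Winkler. The two thresholds are illustrative; tuning them against real documents is where the iteration time goes.

```python
from difflib import SequenceMatcher

def match_score(expected: str, extracted: str) -> float:
    """Similarity in [0, 1] between expected and extracted values."""
    return SequenceMatcher(None, expected.lower(), extracted.lower()).ratio()

# Illustrative thresholds only, not recommendations.
APPROVE, REVIEW = 0.90, 0.75

def verdict(expected: str, extracted: str) -> str:
    score = match_score(expected, extracted)
    if score >= APPROVE:
        return "match"
    if score >= REVIEW:
        return "manual_review"
    return "no_match"
```

Everything between the two thresholds lands in the manual review queue, which is exactly the band you shrink by making extraction and normalization better upstream.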

Multi-language support

If you only process English documents, you can maybe get away with simpler tooling. But the real world sends you documents in Cyrillic, Arabic, Chinese, Georgian, Thai, and dozens of other scripts. Now you need transliteration so you can compare "Иванов" against "Ivanov", or an Arabic address against its Latin equivalent.

Cross-script matching is a research-grade problem. Getting it wrong means rejecting legitimate customers who happen to have documents in a non-Latin language, which is a compliance and user experience failure.
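At its simplest, cross-script comparison means transliterating one script into the other before matching. The table below is a toy Cyrillic subset, just enough to show the shape of the problem; real systems use standards like ISO 9 or ICU transliterators, with per-language rules.

```python
# Toy Cyrillic-to-Latin table (tiny subset, for illustration only).
CYRILLIC_TO_LATIN = {
    "А": "A", "Б": "B", "В": "V", "И": "I", "Н": "N", "О": "O",
    "а": "a", "б": "b", "в": "v", "и": "i", "н": "n", "о": "o",
}

def transliterate(text: str) -> str:
    """Map known Cyrillic characters to Latin; pass others through."""
    return "".join(CYRILLIC_TO_LATIN.get(ch, ch) for ch in text)

transliterate("Иванов")  # → "Ivanov", which can then be fuzzy-matched
```

The catch is that transliteration is rarely one-to-one: "Иванов" can legitimately appear as "Ivanov" or "Iwanow" depending on the target language, so the fuzzy matcher still has to absorb the residual variation.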

Date parsing

Is "03/04/2026" the 3rd of April or March 4th? It depends on the country. Written dates like "15 Mars 2026" or "17. März 2026" need language-aware parsing. And you need to determine not just the date but whether the document is recent enough to be valid; most compliance rules require documents issued within the last 3 to 6 months.
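Both problems can be sketched with the standard library: the same string parses to two different dates depending on the assumed locale, and validity is a recency window on top of the parsed date. The 90-day window below is an illustrative assumption, not a rule.

```python
from datetime import date, datetime

def parse_ambiguous(text: str, day_first: bool) -> date:
    """Parse a numeric date under an assumed day/month ordering."""
    fmt = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    return datetime.strptime(text, fmt).date()

def is_recent(issued: date, today: date, max_age_days: int = 90) -> bool:
    """Assumed policy: accept documents issued within the last ~3 months."""
    return 0 <= (today - issued).days <= max_age_days

uk = parse_ambiguous("03/04/2026", day_first=True)   # 3 April 2026
us = parse_ambiguous("03/04/2026", day_first=False)  # 4 March 2026
```

Note that `day_first` itself has to come from somewhere, usually the document's country of origin, which means date parsing depends on document classification done earlier in the pipeline.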

Ongoing maintenance

Even after you've built and shipped all of the above, the work doesn't stop. New document formats appear constantly. OCR models get updated and change behavior. Edge cases trickle in from production: a scanned document that's slightly rotated, a bank statement with a watermark that confuses the parser, a government letter in a language you haven't seen before.

You need monitoring to catch accuracy regressions, a test suite of real documents to validate against, and someone on the team who understands both the ML pipeline and the compliance requirements.

What building looks like in practice

Here's the typical architecture when teams build this themselves:

  1. OCR API: Google Vision, AWS Textract, or Azure Document Intelligence to extract raw text
  2. Custom parsing logic: regular expressions, heuristics, or a fine-tuned model to identify names, addresses, and dates from the raw output
  3. Matching algorithm: Levenshtein distance, Jaro-Winkler, or something custom to compare extracted values against expected ones
  4. Test suite: a growing collection of real documents to catch regressions
  5. Monitoring and review queue: dashboards for match rates, manual review for borderline cases

This works, eventually. But the time from "let's build it" to "it works reliably across diverse document types" is measured in months, not sprints. And the long tail of edge cases keeps the team busy long after launch.

When building makes sense

Building your own POA verification system is the right call in some situations, most notably at very high volume, where per-check API pricing eventually exceeds the cost of an in-house system.

When buying makes sense

For most teams, buying is the pragmatic choice: the edge cases above become someone else's full-time job, and you ship in days instead of months.

The cost comparison usually favors buying too. An API that costs a few hundred euros per month is cheaper than the engineering hours needed to build, test, and maintain a custom system, unless you're at very high volume.

What to look for in a POA verification API

Not all verification APIs are equal. Evaluate them against the complexity covered above: extraction accuracy across diverse document formats, multi-language and cross-script support, configurable match thresholds, and an auditable record of each decision.

How trusqo handles this

trusqo is a proof of address verification API built around exactly these principles. It's a single REST endpoint that handles the full verification flow in one API call: document parsing, data extraction, fuzzy matching, and a verdict.

You send a document with the expected name and address. You get back match scores for name, address, and postcode, a document type classification, date validation, and a downloadable PDF audit report. Thresholds are configurable per request. Documents in any language are supported, with automatic transliteration for cross-script matching.

Plans start at €25/month with 50 checks included. Full API documentation is at trusqo.com/docs. You can sign up and send your first verification at app.trusqo.com, no sales calls, no onboarding meetings.

If you're evaluating whether to build or buy, the honest answer is: try the API first. If it doesn't meet your needs, you'll at least have a clear picture of the problem space before committing to a build.