How I Built DockParser: AI Invoice Parsing with Gemini — Anass Agdi

The Problem

Manual invoice processing is slow, expensive, and error-prone. Operations teams spend hours on data entry instead of strategic work. I built DockParser to solve this: it is an AI-powered document parser that extracts structured data from invoices and contracts and scores its confidence in each extraction.

Why Gemini Over GPT-4

This was one of the first major architectural decisions. Here's my reasoning:

Cost: Gemini costs roughly one-third as much as GPT-4 for vision tasks
Latency: Faster response times for document processing, including from a cold start
Accuracy: Comparable performance on structured data extraction
API design: Cleaner multimodal interface

I tested both on 100 sample invoices. Gemini reached 94% field-extraction accuracy versus 96% for GPT-4, a gap of 2 percentage points that didn't justify paying roughly 3x as much for my use case.
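
A rough way to sanity-check that tradeoff is to normalize cost by accuracy. The sketch below uses only the figures quoted above (94% vs 96% accuracy and a roughly 3x price gap), with Gemini's cost set to 1 unit. The function name and the per-field cost model are illustrative assumptions, not real API pricing.

```typescript
// Illustrative only: relative cost units, not real API pricing.
// Assumes cost scales per extracted field and that a wrong field has no value.
function costPerCorrectField(relativeCost: number, accuracy: number): number {
  if (accuracy <= 0 || accuracy > 1) {
    throw new RangeError("accuracy must be in (0, 1]");
  }
  return relativeCost / accuracy;
}

const geminiCost = costPerCorrectField(1, 0.94); // Gemini as the 1x baseline
const gpt4Cost = costPerCorrectField(3, 0.96);   // roughly 3x the price
const ratio = gpt4Cost / geminiCost;

console.log(ratio.toFixed(2)); // prints "2.94"
```

Even after crediting GPT-4 for its higher accuracy, each correct field still costs about 2.9x as much under these assumptions.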

Server-Side Only Architecture

Every AI call happens on the server; the browser never calls the Gemini API directly. This was non-negotiable for two reasons:

Security: API keys never touch the browser. No key exposure, no abuse.
Control: Rate limiting, usage tracking, and cost controls all happen server-side.
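
As a minimal sketch of the server-side pattern, the code below builds and sends a Gemini request from server code only. The endpoint and the x-goog-api-key header follow the public Gemini REST API; the function names, the prompt text, and the way the key is passed in are hypothetical and not DockParser's actual code.

```typescript
// Hypothetical helper names; only the endpoint and header come from the Gemini REST API.
const GEMINI_URL =
  "https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro:generateContent";

interface GeminiRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

// Builds the request on the server. The key travels in a header, never in
// client-visible code or in the URL.
function buildGeminiRequest(
  apiKey: string,
  base64Doc: string,
  mimeType: string
): GeminiRequest {
  return {
    url: GEMINI_URL,
    headers: { "Content-Type": "application/json", "x-goog-api-key": apiKey },
    body: JSON.stringify({
      contents: [
        {
          parts: [
            { text: "Extract the invoice fields as JSON." }, // assumed prompt
            { inline_data: { mime_type: mimeType, data: base64Doc } },
          ],
        },
      ],
    }),
  };
}

// Intended to run inside a server route or edge function, where apiKey is
// read from a server-side environment variable (assumption).
async function parseInvoiceOnServer(
  apiKey: string,
  base64Doc: string,
  mimeType: string
): Promise<unknown> {
  const req = buildGeminiRequest(apiKey, base64Doc, mimeType);
  const res = await fetch(req.url, {
    method: "POST",
    headers: req.headers,
    body: req.body,
  });
  if (!res.ok) {
    throw new Error(`Gemini request failed with status ${res.status}`);
  }
  return res.json();
}
```

The browser would upload the document to this server function and receive only the parsed result, so rate limiting and usage tracking can wrap the same call.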

The Stack

Frontend: Next.js 14 with App Router
Backend: Vercel Edge Functions + Supabase Edge Functions
Database: Supabase (PostgreSQL)
AI: Google Gemini 1.5 Pro (vision + text)
Payments: Stripe with webhooks
Storage: Supabase Storage (documents are stored only in Supabase)

Key Tradeoffs

Supabase RLS vs Custom RBAC

I used Supabase Row Level Security instead of building a custom authorization layer. It was faster to ship and relies on battle-tested security, at the cost of less flexibility for complex permission hierarchies. For a document parser, this was the right call.

Rate-Limited Demo Mode

The public demo is limited to 5 documents per day per IP. This prevents abuse and keeps my Gemini bill under control while still letting users try the product.
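
A per-IP daily cap like this can be sketched as a counter keyed by IP and calendar day. The version below is a simplified illustration, not the production limiter: the names are hypothetical, days are counted in UTC, and the in-memory Map would need to be replaced by a shared store (for example a database table) once more than one server instance is running.

```typescript
// Hypothetical sketch: names and the in-memory store are assumptions.
const DAILY_LIMIT = 5; // documents per IP per day, as stated above

// Maps "ip|YYYY-MM-DD" to the number of documents processed that day.
const demoUsage = new Map<string, number>();

// Returns true and records the request if the IP is still under its daily cap.
function allowDemoRequest(ip: string, now: Date = new Date()): boolean {
  const day = now.toISOString().slice(0, 10); // UTC calendar day
  const key = `${ip}|${day}`;
  const used = demoUsage.get(key) ?? 0;
  if (used >= DAILY_LIMIT) {
    return false; // cap reached: reject before any Gemini call is made
  }
  demoUsage.set(key, used + 1);
  return true;
}
```

Checking the limit before the AI call is what keeps rejected requests from costing anything.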

Results

~90% reduction in manual data entry time in test scenarios
Handles 50+ document formats (invoices, contracts, receipts)
Production-ready with Stripe billing integration

What I'd Do Differently

If I rebuilt this today, I'd add a confidence threshold that routes low-confidence extractions to human review. The current system flags them, but doesn't have a review queue. That's the next feature.
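
That routing step could look something like the sketch below. It assumes each extracted field carries a confidence between 0 and 1; the 0.85 threshold, the type names, and the function name are all hypothetical, since the feature doesn't exist yet.

```typescript
// Hypothetical shapes: the real extraction format may differ.
interface ExtractedField {
  name: string;
  value: string;
  confidence: number; // assumed to be in [0, 1]
}

type Route = "auto_accept" | "human_review";

interface RoutingDecision {
  route: Route;
  lowConfidenceFields: string[];
}

const REVIEW_THRESHOLD = 0.85; // assumed value, would need tuning on real data

// Sends a document to human review if any field falls below the threshold.
function routeExtraction(
  fields: ExtractedField[],
  threshold: number = REVIEW_THRESHOLD
): RoutingDecision {
  const lowConfidenceFields = fields
    .filter((f) => f.confidence < threshold)
    .map((f) => f.name);
  return {
    route: lowConfidenceFields.length > 0 ? "human_review" : "auto_accept",
    lowConfidenceFields,
  };
}
```

Returning the names of the low-confidence fields lets a review queue highlight exactly what a person needs to check instead of the whole document.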
