
Machine Learning

Historical Documents AI

Multilingual OCR and RAG over historical document archives — 3rd place at a national AI hackathon.

Overview

A document-intelligence platform for degraded scans from the 1960s to the 1990s, written in Azerbaijani, Russian, and English. A Llama-4-Maverick vision model performs OCR (87.75% character accuracy). An embedding pipeline built on BAAI bge-large feeds Pinecone for semantic search, and Llama-4-Maverick, acting as the LLM, answers questions with citations. The system is packaged as a FastAPI service, containerized with Docker and exposed publicly through ngrok, and comes with a benchmarking framework that drove every model choice. It placed 3rd in the hackathon's AI track.
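The page does not say how the 87.75% character accuracy was computed. A common definition is 1 minus the character error rate (CER), where CER is the Levenshtein edit distance between the OCR output and a ground-truth transcription, divided by the length of the transcription. The sketch below assumes that definition. The function names (`levenshtein`, `character_accuracy`, `corpus_character_accuracy`) are hypothetical and are not taken from the project. Text is NFC-normalized first, so a precomposed Azerbaijani `ş` and `s` followed by a combining cedilla count as the same character.

```python
import unicodedata


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions that turn `a` into `b` (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # edit distance is symmetric; keep the row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def _nfc(text: str) -> str:
    # Assumption: compare text in NFC so composed and decomposed
    # diacritics (Azerbaijani, Cyrillic) are not counted as errors.
    return unicodedata.normalize("NFC", text)


def character_accuracy(reference: str, hypothesis: str) -> float:
    """Hypothetical scorer: 1 - CER, with CER = edits / len(reference).

    Clamped at 0.0 because CER exceeds 1 when the OCR output is much
    longer than the reference."""
    ref, hyp = _nfc(reference), _nfc(hypothesis)
    if not ref:
        return 1.0 if not hyp else 0.0
    return max(0.0, 1.0 - levenshtein(ref, hyp) / len(ref))


def corpus_character_accuracy(pairs) -> float:
    """Micro-average over (reference, hypothesis) pairs:
    1 - total edits / total reference characters."""
    total_edits = total_chars = 0
    for reference, hypothesis in pairs:
        ref, hyp = _nfc(reference), _nfc(hypothesis)
        total_edits += levenshtein(ref, hyp)
        total_chars += len(ref)
    if total_chars == 0:
        return 1.0
    return max(0.0, 1.0 - total_edits / total_chars)


# A single dotless-ı -> i substitution in an 11-character reference.
print(character_accuracy("Bakı şəhəri", "Baki şəhəri"))  # 10/11
```

Scoring a benchmark set with `corpus_character_accuracy` weights long pages more heavily than averaging per-page `character_accuracy` scores. The page does not say which convention the project's benchmarking framework used, so treat this as one reasonable choice rather than the reported method.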


Topics