Greetings!
I recently built pdfocr, a tool designed to perform highly accurate OCR on PDFs. My primary use case has been parsing lecture slides, which are notorious for complex layouts packed with pictures and diagrams.
You can use it as a standalone external tool, or call it via an LLM in an agentic environment (which is exactly how I use it for my own exam preparation).
Under the Hood
The tool is written entirely in Nim and is built on top of a few of my own custom libraries:
For the actual OCR model, I’m using olmOCR 2. Based on my testing, it reads complex lecture slides much better than alternatives like Chandra OCR, and it is significantly cheaper to run.
Performance & Availability
I’ve run extensive comparisons against other approaches, and pdfocr has consistently been faster while using less memory.
I currently offer pre-compiled binary builds for three platforms. You can check it out here: https://github.com/planetis-m/pdfocr
Looking Forward
I believe this has the potential to be integrated into a larger app or turned into a viable commercial product. However, I am a developer first and don't currently have the bandwidth or business expertise to take it to market. If you are interested in partnering up to turn this into a product, please reach out. I'd love to chat!