Link to my OCR web project.
Today, I used Claude Code to write the frontend for my app, an OCR (Optical Character Recognition) tool that uses an LLM. An LLM might not be the best tool for OCR, but it gave me a chance to learn how OpenAI, llama.cpp, and various other LLM backends communicate with their frontends and tools.
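The common thread there is the OpenAI-style chat completions API: llama.cpp's llama-server exposes the same endpoint shape, so a frontend can talk to a local model with the same kind of code it would use against OpenAI. Here's a minimal sketch of the idea, not my actual frontend; the port, file name, and model name are placeholders, and whether images work at all depends on running a vision-capable model behind the server:

```python
# Minimal sketch: a frontend sending an image to an OpenAI-compatible
# endpoint (e.g. llama-server on localhost:8080) and asking it to do OCR.
# Port, model name, and image path are placeholders, not my real setup.
import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

# Encode the scanned page as a base64 data URL, the way the
# OpenAI-style API expects images to be passed in a message.
with open("scan.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="local-ocr-model",  # placeholder; a local server may ignore this
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe all text in this image."},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
)

print(response.choices[0].message.content)
```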
To install and use Claude Code, follow this guide. Very quick and easy. You need Claude Pro, though. With any Claude subscription, you get a 200K-token context window. There is also a usage limit you have to watch out for when using Claude. I got Claude Code to make the frontend and burned through 80K tokens, which was around 35% of the limit. That's plenty if your project is simple.
I gotta admit: this thing did my frontend in 10 minutes, something that would have taken me a few days. I looked through the code. It was good code, though I'm not the best developer to judge it. It just worked. It even wrote the README.md and the bash script for downloading the OCR model and running llama.cpp, for goodness sake. I guess that's what everyone is up in arms about: a machine that does days of work in one sitting (at the low price of $20/month).
Essentially, Claude Code is an LLM command-line program that works much like llama-cli, with the addition of function calling and the ability to compress a conversation when the context window runs low. When I asked it what Claude Code is and how it compresses a conversation, it even came up with an implementation using a local LLM and function calling. I'm not sure if that's how Claude Code is actually built, but it was impressive that it gave me instructions for building something with similar abilities.
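To be clear, this is just a sketch of the compression idea it described to me, not Claude Code's actual implementation: when the history gets too long, have the model summarize the older messages and swap them out for a single summary message. The size threshold, model name, and endpoint are placeholders.

```python
# Sketch of conversation compression (not Claude Code's real implementation):
# summarize the oldest messages once the history grows past a threshold,
# keeping the most recent turns verbatim.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

MAX_CHARS = 12000   # crude stand-in for a real token count
KEEP_RECENT = 4     # always keep the last few messages as-is


def compress_history(messages: list[dict]) -> list[dict]:
    """Replace old messages with a model-written summary when too long.

    Assumes every message has plain-text ``content``.
    """
    total = sum(len(m["content"]) for m in messages)
    if total <= MAX_CHARS or len(messages) <= KEEP_RECENT:
        return messages

    old, recent = messages[:-KEEP_RECENT], messages[-KEEP_RECENT:]
    transcript = "\n".join(f'{m["role"]}: {m["content"]}' for m in old)

    summary = client.chat.completions.create(
        model="local-model",  # placeholder
        messages=[{
            "role": "user",
            "content": "Summarize this conversation so it can replace the "
                       "original messages without losing key facts:\n\n"
                       + transcript,
        }],
    ).choices[0].message.content

    return [{"role": "system",
             "content": f"Summary of earlier conversation: {summary}"}] + recent
```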
Note that Claude only did the frontend portion. The rest was written by me. That part doesn't work, though, because llama-cpp-python turned out to be kinda useless with models outside of its recommendations. I'm thinking of marking the part I wrote as "legacy", but who knows? I might come up with something else.
But watching it build my program made me sad. While it's impressive how far we've come in terms of AI doing our work, I miss just doing it myself. Sure, it takes longer. But using an LLM as a souped-up Google/Stack Overflow to help me code teaches me far more. It makes me feel like I accomplished something, instead of just directing someone else to do it.
"The joy is in the journey, not the destination."