DIY GPT

Nov 5, 2025

I have a dream: that I, one day, shall have my own ChatGPT. I have been an avid user since 2023, but lately it has been slow and laggy. Unlike many people, I hold no emotional attachment to ChatGPT models, only to their usefulness and effectiveness. Despite their overly cheery nature, ChatGPT 4 and 4o represented the best OpenAI had to offer in terms of speed and accuracy. The reasoning models have never appealed to me. They are just models reiterating their own answers over and over again: souped-up chatbots, no real reasoning capability. And sometimes I just need a quick answer. Right or wrong, ChatGPT always gives me a starting point. It makes a great rubber duck to iterate ideas with. Ever since OpenAI "upgraded" to version 5, though, ChatGPT has become sluggish. It is neither more correct nor faster than the previous models. They even disabled your ability to switch models. Like bro, I don't need a chatbot to spend a minute coming up with an answer. Just give me something to start with.

I had looked at ways to run LLMs locally before, but I was turned off by how much VRAM you need to run big models like gpt-oss. And then something clicked yesterday. Pretty sure I just searched something randomly, as always, and came across ollama. There are models much smaller than the big boy gpt-oss, like deepseek-r1:8b or gemma3:4b. The 8b stands for 8 billion parameters: the bigger the number, the more accurate the output, but also the larger the model. These two fit perfectly in the 6GB of my GTX 1660 Ti. And ollama is easy to install and use. I chose the manual installation method: download an archive and run its binary. Its interface is reminiscent of Docker: ollama serve & first to start a server, then ollama pull <model> and ollama run <model>, and you are good.
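
For reference, here's roughly what my setup looked like, as a minimal sketch assuming a Linux x86-64 box. The download URL follows ollama's manual install docs and the model tag is the one above, so double-check the docs if you follow along. (Rough sizing rule of thumb: a 4-bit quantized model takes about half a gigabyte per billion parameters, so an 8b model lands around 4 to 5 GB, which is why it squeezes into 6GB of VRAM.)

```sh
# Manual install: grab the release tarball and unpack the binary
# (URL per ollama's Linux install docs; adjust for your platform)
curl -LO https://ollama.com/download/ollama-linux-amd64.tgz
sudo tar -C /usr -xzf ollama-linux-amd64.tgz

# Start the server in the background (listens on localhost:11434 by default)
ollama serve &

# Pull a small model and chat with it interactively
ollama pull gemma3:4b
ollama run gemma3:4b

# The server also exposes an HTTP API, handy for scripting;
# "stream": false returns one JSON response instead of a token stream
curl http://localhost:11434/api/generate \
  -d '{"model": "gemma3:4b", "prompt": "Why is the sky blue?", "stream": false}'
```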

I read some posts on Reddit, over at r/LocalLLaMA. There seems to be an all-out war between these three applications:

- ollama
- llama.cpp
- vllm

I won't go into details, because honestly I am at a loss myself. I use ollama because it's an easy way to get your feet wet, like Docker before learning Kubernetes. ollama loses out to llama.cpp in terms of performance and features, and there is some beef between the two projects. If I ever decide to get serious about running a local GPT, I will consider llama.cpp. Or vllm, like this guy. A quick taste of what llama.cpp looks like is sketched below.
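
For the curious, the llama.cpp route is only slightly more hands-on. This is a sketch assuming you have already built the project and downloaded a GGUF model file yourself; the model filename here is hypothetical.

```sh
# llama.cpp ships a CLI and an HTTP server
# (binary names per recent llama.cpp builds; older builds used ./main)
./llama-cli -m ./models/gemma-3-4b-it-Q4_K_M.gguf -p "Hello" -n 128

# Or serve the model over HTTP, similar to ollama's daemon
./llama-server -m ./models/gemma-3-4b-it-Q4_K_M.gguf --port 8080
```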

Other than that, I am just happy to have my own DIY GPT.


"This content was not written by, nor written with the help of, ChatGPT, Gemini, DeepSeek, or any other LLM. Viewer discretion is advised."