Local LLM - 101
So, you want to run an LLM locally & set yourself up with some sort of agentic workflow. Maybe you know what you're doing, maybe you don't. I don't know either. This is a note to myself, and maybe to you too.
When it comes to running an LLM locally, the first issue is hardware. LLMs require a lot of computational power. If you are using Apple Silicon, you'll be surprised by its performance (at least I was ¯\_(ツ)_/¯). Otherwise, you'll need a decent GPU to run any decent LLM.
- Check which models you can run on your system with Can you run it? LLM version, or straight from the terminal (see below)
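If you'd rather eyeball it from the terminal, these are the commands I usually reach for (output format varies by driver and OS, so treat this as a rough sketch):

```sh
# NVIDIA: how much VRAM do you have? (needs the NVIDIA driver installed)
nvidia-smi --query-gpu=name,memory.total --format=csv

# Apple Silicon: unified memory is shared between CPU and GPU,
# so total RAM is roughly the budget for model weights + context
system_profiler SPHardwareDataType | grep "Memory"
```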
Now that you've figured out you can run an LLM, it's time to set up some sort of backend. There are plenty of options available. Notably:
- Text generation web UI / oobabooga
- Axolotl - Mostly used for fine-tuning
- LM Studio
- We'll go with Ollama. You can either install it directly or run it through the official Docker image (rough commands below).
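Here's a minimal sketch of both routes, assuming Linux/macOS with Docker already installed; double-check against the Ollama docs if anything has drifted:

```sh
# Option 1: native install (Linux; on macOS, grab the app from ollama.com)
curl -fsSL https://ollama.com/install.sh | sh

# Option 2: official Docker image. Models persist in a named volume and the
# API is exposed on the default port 11434. Add --gpus=all for NVIDIA GPUs
# (needs the NVIDIA Container Toolkit).
docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama

# Sanity check: the server should answer on localhost:11434
curl http://localhost:11434
```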
Now, you need to choose a model to run with Ollama. Find out which model works well for your use case. For coding, I found DeepSeek-Coder pretty good. You can find the list of models in the Ollama library. Choose one with a parameter size appropriate for your system.
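Pulling and running DeepSeek-Coder looks roughly like this (I'm assuming the 6.7b tag here; check the model's page in the library for the tags that actually exist):

```sh
# Download a model from the Ollama library
ollama pull deepseek-coder:6.7b

# Start an interactive chat session in the terminal
ollama run deepseek-coder:6.7b

# Or fire a one-off prompt straight from the shell
ollama run deepseek-coder:6.7b "Write a binary search in Python"

# See what's already downloaded (and how big each model is)
ollama list
```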
You'll see that models come in different parameter sizes and quantization levels; lower-bit quantization (like q4) shrinks the memory footprint at a small cost in quality.
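To see exactly what you pulled (parameter count, quantization, context length), ollama show prints the model's details. The memory math in the comments is a rough rule of thumb, not an exact figure:

```sh
# Inspect a local model: parameters, quantization level, context window, template
ollama show deepseek-coder:6.7b

# Rough rule of thumb (assumption, not exact): at q4, each parameter costs a
# bit over half a byte, so a ~7B model fits in roughly 4-5 GB of memory,
# while the same model at fp16 would need ~13-14 GB.
```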