Getting Started with Large Language Models Using Ollama
Ollama is an effective tool for running open-source large language models (LLMs) locally. It provides a straightforward command-line interface for managing models and offers Python and JavaScript SDKs for building chatbot interfaces. This guide walks through the setup on a cloud GPU instance, though the steps are similar on a local machine.
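Under the hood, both SDKs are thin wrappers around a local REST API that Ollama serves on port 11434 by default. As a sketch of what a client sends, the snippet below assembles the JSON body for a single-turn request to the documented /api/chat endpoint; the model name and question are illustrative, and actually sending the request assumes a running Ollama server with that model pulled.

```python
import json

# Ollama's default local endpoint; the Python/JavaScript SDKs wrap this API.
OLLAMA_URL = "http://localhost:11434/api/chat"

def build_chat_request(model, user_message, system=None):
    """Assemble the JSON body for a single-turn /api/chat call."""
    messages = []
    if system:
        messages.append({"role": "system", "content": system})
    messages.append({"role": "user", "content": user_message})
    # stream=False asks the server for one complete JSON response
    # instead of a stream of partial chunks.
    return {"model": model, "messages": messages, "stream": False}

payload = build_chat_request("llama3:latest", "Why is the sky blue?")
print(json.dumps(payload, indent=2))

# Sending it requires a running Ollama server, e.g.:
#   import urllib.request
#   req = urllib.request.Request(OLLAMA_URL, json.dumps(payload).encode(),
#                                {"Content-Type": "application/json"})
#   reply = json.load(urllib.request.urlopen(req))["message"]["content"]
```

The same payload shape works for any model shown by ollama list, which is why switching models is usually a one-string change in client code.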
After launching a GPU instance and ensuring it's running, install the Ollama application. The platform used here comes with several pre-loaded models: llama2-7b, llama3-8b, llama3-70b, and qwen-4b. Additional models can be downloaded using the ollama pull command. The official model library is available at https://ollama.com/library.
Performance varies by model and hardware. For example, on a 24GB GPU, llama3-8b runs quickly, while llama3-70b is significantly slower but provides concise responses.
Ollama's Modelfile feature allows for custom model creation, similar in concept to a Dockerfile. Below is an example of creating a role-specific chatbot by defining a system message.
Create a file named Modelfile (the name can vary) with the following content:
FROM llama3:latest
SYSTEM """
You are a child development expert who answers questions from children aged 2-6 in the style of a kindergarten teacher. Use a lively, patient, and friendly tone. Provide concrete, easy-to-understand answers, avoiding complex or abstract terms. Frequently use metaphors and examples, drawing from children's cartoons or picture books. Expand on scenarios by explaining both the 'why' and suggesting actionable steps.
"""
In a terminal, use the Ollama CLI to build the new model:
ollama create preschool-teacher -f /path/to/Modelfile
After building, list available models to confirm creation:
ollama list
Output:
NAME                        ID              SIZE      MODIFIED
llama2:latest               78e26419b446    3.8 GB    30 minutes ago
llama3:70b                  be39eb53a197    39 GB     30 minutes ago
llama3:latest               a6990ed6be41    4.7 GB    30 minutes ago
qwen:latest                 d53d04290064    2.3 GB    30 minutes ago
preschool-teacher:latest    480a154551b5    4.7 GB    13 seconds ago
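The ollama list output is a whitespace-aligned table, so it is easy to consume from a script, for example to check whether a custom model was actually created. The parser below relies only on the column layout shown above (NAME and ID contain no spaces, SIZE is a number plus a unit, and the rest of the line is the MODIFIED column); the sample rows are taken from the listing above.

```python
def parse_ollama_list(output):
    """Parse the whitespace-aligned table printed by `ollama list`."""
    lines = [ln for ln in output.strip().splitlines() if ln.strip()]
    models = []
    for line in lines[1:]:  # skip the NAME/ID/SIZE/MODIFIED header row
        # NAME and ID contain no spaces; SIZE is "<number> <unit>";
        # everything after that is the free-form MODIFIED column.
        name, model_id, size_num, size_unit, *modified = line.split()
        models.append({
            "name": name,
            "id": model_id,
            "size": f"{size_num} {size_unit}",
            "modified": " ".join(modified),
        })
    return models

sample = """\
NAME                      ID              SIZE    MODIFIED
llama3:latest             a6990ed6be41    4.7 GB  30 minutes ago
preschool-teacher:latest  480a154551b5    4.7 GB  13 seconds ago
"""
models = parse_ollama_list(sample)
print([m["name"] for m in models])
# → ['llama3:latest', 'preschool-teacher:latest']
```

In a real script the output string would come from running the CLI, e.g. subprocess.run(["ollama", "list"], capture_output=True, text=True).stdout.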
You can then interact with the custom model through Ollama's Web UI. The responses will reflect the defined system prompt, differing notably from the base model's output.
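The same applies to programmatic clients: because the persona is baked into the custom model by its Modelfile, a request needs only the user turn, with no system message repeated on every call. A minimal sketch of such a request body, assuming the preschool-teacher:latest model built above and Ollama's documented /api/chat request shape:

```python
import json

def build_custom_model_request(question):
    """Request body for the custom model; no system message is needed
    because the Modelfile already bakes the teacher persona in."""
    return {
        "model": "preschool-teacher:latest",
        "messages": [{"role": "user", "content": question}],
        "stream": False,
    }

payload = build_custom_model_request("Why do we need to sleep?")
print(json.dumps(payload, ensure_ascii=False))
```

Keeping the persona in the model rather than in client code means every front end (Web UI, CLI, SDK) gets the same behavior for free.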
Note: Some models, like the base Llama 3, may understand Chinese queries but default to English responses. Techniques for building Chinese-optimized models will be covered separately.