Ollama
Ollama is a desktop application that lets you download and run models locally.
Running models locally may require additional GPU resources depending on the model you are using.
Use the `ollama` provider to access Ollama models.
Start the Ollama application or run it from a terminal:

```sh
ollama serve
```

Update your script to use the `ollama:phi3.5` model (or any other model, including one from Hugging Face):

```js
script({
    ...,
    model: "ollama:phi3.5",
})
```

GenAIScript will automatically pull the model, which may take some time depending on the model size. The model is cached locally by Ollama.
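For reference, a complete minimal script using this model might look like the sketch below; the title and prompt text are placeholders, not part of the provider configuration.

```js
// minimal GenAIScript sketch; the title and prompt are placeholders
script({
    title: "hello-ollama",
    model: "ollama:phi3.5",
})

$`Say "hello" in exactly one word.`
```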
If Ollama runs on a remote server, a different computer, or a different port, you have to configure the `OLLAMA_HOST` environment variable so GenAIScript can connect to it, for example in your `.env` file:

```txt
OLLAMA_HOST=https://<IP or domain>:<port>/ # server url
OLLAMA_HOST=0.0.0.0:12345 # different port
```
You can specify the model size by adding the size tag to the model name, like `ollama:llama3.2:3b`.

```js
script({
    ...,
    model: "ollama:llama3.2:3b",
})
```
Ollama with Hugging Face models
You can also use GGUF models from Hugging Face.

```js
script({
    ...,
    model: "ollama:hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF",
})
```
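If you prefer, you can pull the GGUF model ahead of time with the Ollama CLI; the repository name below matches the example above.

```sh
# pre-pull the Hugging Face GGUF model so the first script run starts faster
ollama pull hf.co/bartowski/Llama-3.2-1B-Instruct-GGUF
```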
Ollama with Docker
You can conveniently run Ollama in a Docker container.
- If you are using a devcontainer or a GitHub Codespace, make sure to add the `docker-in-docker` option to your `devcontainer.json` file.

  ```json
  {
      "features": {
          "docker-in-docker": "latest"
      }
  }
  ```
- Start the Ollama container (a note after this list shows how to pull a model inside it):

  ```sh
  docker run -d -v ollama:/root/.ollama -p 11434:11434 --name ollama ollama/ollama
  ```
- Stop and remove the Ollama container:

  ```sh
  docker stop ollama && docker rm ollama
  ```
Aliases
The following model aliases are attempted by default in GenAIScript.
| Alias | Model identifier |
| --- | --- |
| `embeddings` | `nomic-embed-text` |
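If you plan to use the `embeddings` alias with Ollama, you can pull the corresponding model ahead of time so the first embedding call does not have to wait for a download.

```sh
# pre-pull the default embeddings model listed in the table above
ollama pull nomic-embed-text
```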
Limitations
- Uses the OpenAI compatibility layer
- `logit_bias` is ignored
- Prediction of output tokens is ignored