Ollama Integration¶
COSMIC integrates with Ollama for local LLM verification without API costs.
Setup¶
1. Install Ollama¶
Download and install Ollama from ollama.com/download.
2. Pull a Model¶
# Recommended: Fast and efficient
ollama pull gemma3
# Alternatives
ollama pull qwen2.5-coder:7b
ollama pull llama3.2
3. Use with COSMIC¶
# Auto-detect best model
cosmic chunk document.txt --strategy full --ollama auto
# Use specific model
cosmic chunk document.txt --strategy full --ollama gemma3:latest
Model Recommendations¶
| Model | Size | Speed | Quality | Best For |
|---|---|---|---|---|
| gemma3 | 3.3 GB | Fast | Good | Default choice |
| qwen2.5-coder:7b | 4.7 GB | Medium | Good | Technical docs |
| llama3.2 | Various | Medium | Good | General use |
| deepseek-coder-v2 | 8.9 GB | Slow | Better | Code-heavy docs |
| qwen3:30b | 18 GB | Slow | Best | Quality priority |
Auto-Selection Logic¶
When using --ollama auto, COSMIC selects models in this order:
1. gemma3 / gemma2 (smallest, fastest)
2. qwen2.5-coder variants
3. llama3.2 / llama3.1
4. mistral
5. Larger models as fallback
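The priority order above can be sketched as a first-match search over the installed model names. This is only an illustration of the idea, not COSMIC's actual implementation: `pick_model` and `PRIORITY_PREFIXES` are hypothetical names, and matching by name prefix is an assumption.

```python
from typing import Optional, Sequence

# Hypothetical priority list mirroring the order described above.
PRIORITY_PREFIXES = (
    "gemma3", "gemma2", "qwen2.5-coder", "llama3.2", "llama3.1", "mistral",
)


def pick_model(installed: Sequence[str]) -> Optional[str]:
    """Return the first installed model that matches the priority order."""
    for prefix in PRIORITY_PREFIXES:
        for name in installed:
            if name.startswith(prefix):
                return name
    # Fallback: no preferred family is installed, so take the first model listed.
    return installed[0] if installed else None


print(pick_model(["qwen3:30b", "llama3.2:latest", "qwen2.5-coder:7b"]))
# prints: qwen2.5-coder:7b
```

In this sketch a model from a higher-priority family wins even when it appears later in the installed list, which is why `qwen2.5-coder:7b` is chosen over `llama3.2:latest`.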
CLI Commands¶
Check Status¶
Output:
List Models¶
Output:
Available Ollama models:
NAME SIZE
--------------------------------------------------
gemma3:latest 3.3 GB
qwen2.5-coder:7b 4.7 GB
llama3.2:latest 2.0 GB
Start Server¶
Python API¶
Basic Usage¶
from cosmic import COSMICChunker, COSMICConfig, Document
from cosmic.models.ollama import OllamaManager
# Create Ollama manager
ollama = OllamaManager()
if ollama.is_available():
    # List models
    models = ollama.list_models()
    for model in models:
        print(f"{model.name}: {model.size_gb:.1f} GB")
    # Auto-select best model
    model_name = ollama.auto_select_model()
    # Configure COSMIC
    config = COSMICConfig()
    config.llm.enabled = True
    config.llm.base_url = ollama.api_base_url
    config.llm.model_name = model_name
    # Process document (doc is a Document instance created beforehand)
    chunker = COSMICChunker(config)
    chunks = chunker.chunk_document(doc, strategy="full")
Context Manager¶
from cosmic import COSMICChunker, COSMICConfig
from cosmic.models.ollama import OllamaManager
# Automatic server lifecycle management
with OllamaManager() as ollama:
    config = COSMICConfig()
    config.llm.base_url = ollama.api_base_url
    config.llm.model_name = ollama.auto_select_model()
    chunker = COSMICChunker(config)
    # doc is a Document instance created beforehand
    chunks = chunker.chunk_document(doc, strategy="full")
# Server automatically stopped if COSMIC started it
Environment Variables¶
# Ollama server URL
OLLAMA_HOST=http://localhost:11434
# Default model (or "auto")
COSMIC_OLLAMA_MODEL=auto
# Use Ollama as default provider
COSMIC_LLM_PROVIDER=ollama
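For scripts that need to honor the same variables, a minimal sketch of reading them might look like the following. The helper name `load_ollama_settings` is hypothetical, and the fallback values are assumptions taken from the examples above rather than confirmed COSMIC defaults.

```python
import os
from typing import Mapping, Optional


def load_ollama_settings(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the three variables above, with assumed fallback values."""
    env = os.environ if env is None else env
    return {
        "host": env.get("OLLAMA_HOST", "http://localhost:11434"),
        "model": env.get("COSMIC_OLLAMA_MODEL", "auto"),
        "provider": env.get("COSMIC_LLM_PROVIDER"),  # None when unset
    }


# Pass an explicit mapping here so the demo does not depend on the real environment.
settings = load_ollama_settings({"COSMIC_OLLAMA_MODEL": "gemma3:latest"})
print(settings)
# prints: {'host': 'http://localhost:11434', 'model': 'gemma3:latest', 'provider': None}
```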
Server Management¶
Automatic Management¶
When using --ollama, COSMIC:
- Checks if Ollama is installed
- Checks for available models
- Starts server if not running
- Uses the model for verification
- Stops server if COSMIC started it
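The "stop only what you started" rule in the steps above can be sketched as a small context manager. This is an illustration with in-memory stand-ins, not COSMIC's code: `managed_server`, `fake_start`, and `fake_stop` are hypothetical names, and a real version would launch and terminate an actual Ollama process.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def managed_server(
    is_running: Callable[[], bool],
    start: Callable[[], None],
    stop: Callable[[], None],
) -> Iterator[bool]:
    """Start the server only if needed; stop it only if this block started it."""
    started_here = False
    if not is_running():
        start()
        started_here = True
    try:
        yield started_here
    finally:
        # A server that was already running is left untouched.
        if started_here:
            stop()


# Demo with in-memory stand-ins instead of a real Ollama process.
state = {"running": False}
events = []


def fake_start() -> None:
    state["running"] = True
    events.append("start")


def fake_stop() -> None:
    state["running"] = False
    events.append("stop")


with managed_server(lambda: state["running"], fake_start, fake_stop) as started:
    events.append(f"work (started_here={started})")

print(events)
# prints: ['start', 'work (started_here=True)', 'stop']
```

Tracking `started_here` is what keeps a server the user launched themselves from being shut down when processing finishes.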
Manual Management¶
# Start server manually
ollama serve
# Stop server
pkill ollama
# Check if running
curl http://localhost:11434/api/tags
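The same check can be made from Python using only the standard library. The sketch below queries the `/api/tags` endpoint shown in the curl command and reads the `name` field of each entry in the response's `models` list; the helper names `parse_model_names` and `fetch_model_names` are hypothetical.

```python
import json
import urllib.request
from typing import List, Optional


def parse_model_names(payload: str) -> List[str]:
    """Extract model names from the JSON body returned by /api/tags."""
    data = json.loads(payload)
    return [m["name"] for m in data.get("models", []) if "name" in m]


def fetch_model_names(
    base_url: str = "http://localhost:11434", timeout: float = 2.0
) -> Optional[List[str]]:
    """Return installed model names, or None if the server cannot be reached."""
    try:
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            return parse_model_names(resp.read().decode("utf-8"))
    except (OSError, ValueError):
        # OSError covers refused connections and timeouts; ValueError covers bad JSON.
        return None


names = fetch_model_names()
if names is None:
    print("Ollama server is not reachable")
else:
    print("Ollama is running with models:", names)
```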
Troubleshooting¶
Ollama Not Found¶
Solution: Install Ollama from the official website.
No Models Available¶
Solution: Pull at least one model, for example with ollama pull gemma3 (see Pull a Model above).
Server Won't Start¶
Solutions:
- Check if port 11434 is in use
- Try starting manually: ollama serve
- Check Ollama logs
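To check whether port 11434 is already taken, one option is a quick TCP connection attempt. This is a minimal sketch using the standard library; `port_in_use` is a hypothetical helper, and it only tells you that something is listening, not whether that process is Ollama.

```python
import socket


def port_in_use(port: int = 11434, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


print("Port 11434 in use:", port_in_use())
```

If the port is in use but `ollama serve` still fails, another process may be bound to it; either stop that process or point `OLLAMA_HOST` at a different port.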
Model Too Large¶
If a model is too large for your system:
# Use a smaller model
ollama pull gemma3 # 3.3 GB
# Or a smaller variant of a larger model family
ollama pull llama3.2:1b # 1B-parameter variant
Slow Performance¶
- Use smaller models (gemma3, phi3)
- Ensure GPU is being used
- Consider using --no-llm for faster processing
Without Ollama¶
If you don't want to use Ollama:
# Skip LLM verification entirely
cosmic chunk document.txt --strategy full --no-llm
# Or use semantic-only strategy
cosmic chunk document.txt --strategy semantic
LLM verification is optional; COSMIC works well without it for most documents.