Ollama (Local)
Ollama enables running AI models locally on your machine for private, offline security testing.
📸 SCREENSHOT: ollama-model-select.png
Ollama local model selection
Overview
Ollama advantages:
- Complete data privacy
- No API costs
- Offline operation
- No rate limits
- Full control over models
Installation
macOS
```bash
brew install ollama
```
Linux
```bash
curl -fsSL https://ollama.com/install.sh | sh
```
Windows
Download from ollama.com/download
Starting Ollama
Start Server
```bash
ollama serve
```
The server runs on http://localhost:11434 by default.
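To confirm the API is reachable, you can query the version endpoint (the version shown in the output is only illustrative):

```bash
# Should return a small JSON document, e.g. {"version":"0.5.7"}
curl http://localhost:11434/api/version
```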
Verify Installation
```bash
ollama --version
```
Available Models
Recommended Models
| Model | Size | Best For |
|---|---|---|
| llama3.3:70b | 40GB | Best quality, requires high-end GPU |
| llama3.3 | 4.7GB | Good balance of quality and speed |
| codellama:34b | 19GB | Code analysis |
| mistral | 4.1GB | Fast general purpose |
| mixtral:8x7b | 26GB | High quality, efficient |
| deepseek-coder:33b | 19GB | Code security review |
| qwen2.5:72b | 41GB | Excellent reasoning |
Pull Models
```bash
# Recommended for security testing
ollama pull llama3.3

# For code review
ollama pull codellama:34b

# Lightweight option
ollama pull mistral
```
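Once a pull finishes, you can sanity-check what was downloaded; `ollama show` prints the model's parameters, template, and license:

```bash
ollama show llama3.3
```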
List Installed Models
```bash
ollama list
```
Configuration
Basic Setup
{ "provider": { "ollama": { "options": { "baseURL": "http://localhost:11434" } } }, "model": "ollama/llama3.3"}Custom Host
For a remote Ollama server:
{ "provider": { "ollama": { "options": { "baseURL": "http://192.168.1.100:11434" } } }}Usage
Usage

Command Line
```bash
cyberstrike --model ollama/llama3.3
```
In-Session
```bash
/model
# Select Ollama model
```
Model Configuration
Context Length
Increase the context window:
```bash
ollama run llama3.3 --num-ctx 8192
```
Or in a Modelfile:
```
FROM llama3.3
PARAMETER num_ctx 8192
```
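The context size can also be set per request through Ollama's HTTP API `options` field; a minimal sketch (the prompt is only a placeholder):

```bash
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.3",
  "prompt": "Summarize common XSS vectors.",
  "stream": false,
  "options": { "num_ctx": 8192 }
}'
```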
GPU Layers

Control GPU usage:
```bash
OLLAMA_NUM_GPU=999 ollama serve  # Use all GPU layers
OLLAMA_NUM_GPU=0 ollama serve    # CPU only
```
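GPU offload can also be adjusted per request through the API's `num_gpu` option (the number of layers to offload; 0 means CPU only). A sketch, with an illustrative prompt:

```bash
# Force CPU-only inference for a single request
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.3",
  "prompt": "test",
  "stream": false,
  "options": { "num_gpu": 0 }
}'
```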
Memory Management
```bash
# Limit VRAM usage
OLLAMA_MAX_VRAM=8G ollama serve
```
Custom Models
Create Modelfile
```
FROM llama3.3

SYSTEM """You are a security testing assistant specialized in:
- Web application vulnerabilities
- Code security review
- Network penetration testing

Always follow OWASP guidelines and report findings with:
- Vulnerability name
- Severity (Critical/High/Medium/Low)
- Evidence
- Remediation steps"""

PARAMETER temperature 0.7
PARAMETER num_ctx 8192
```
Build Custom Model
```bash
ollama create security-assistant -f Modelfile
```
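Before pointing cyberstrike at it, you can smoke-test the new model directly in Ollama (the prompt is just an example):

```bash
ollama run security-assistant "List the OWASP Top 10 categories with one-line descriptions."
```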
Use Custom Model
```bash
cyberstrike --model ollama/security-assistant
```
Performance Optimization
Hardware Requirements
| Model Size | Min RAM | Recommended GPU |
|---|---|---|
| 7B | 8GB | 8GB VRAM |
| 13B | 16GB | 12GB VRAM |
| 34B | 32GB | 24GB VRAM |
| 70B | 64GB | 48GB VRAM |
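Before committing to a large model, it is worth checking what the machine actually has available; a quick check on Linux with an NVIDIA GPU (adjust for your platform):

```bash
# System RAM
free -h

# GPU VRAM (NVIDIA only)
nvidia-smi --query-gpu=name,memory.total --format=csv
```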
Quantization
Use quantized models to reduce memory usage:
```bash
ollama pull llama3.3:q4_0  # 4-bit quantization
ollama pull llama3.3:q8_0  # 8-bit quantization
```
Parallel Requests
Enable concurrent requests:
```bash
OLLAMA_NUM_PARALLEL=4 ollama serve
```
Offline Usage
Pre-download Models
```bash
ollama pull llama3.3
ollama pull codellama
```
Air-Gapped Systems
- Download models on a connected system
- Copy `~/.ollama/models/` to the air-gapped system (see the sketch below)
- Run Ollama without internet access
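A minimal sketch of the transfer, assuming removable media mounted at `/media/usb` (paths are hypothetical):

```bash
# On the connected machine
cp -r ~/.ollama/models /media/usb/ollama-models

# On the air-gapped machine
mkdir -p ~/.ollama
cp -r /media/usb/ollama-models ~/.ollama/models
```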
Docker Deployment
Run in Container
```bash
docker run -d \
  --gpus all \
  -v ollama:/root/.ollama \
  -p 11434:11434 \
  --name ollama \
  ollama/ollama
```
Pull Models in Container
```bash
docker exec -it ollama ollama pull llama3.3
```
API Compatibility
Ollama exposes an OpenAI-compatible API:
{ "provider": { "openai": { "options": { "baseURL": "http://localhost:11434/v1", "apiKey": "ollama" } } }, "model": "openai/llama3.3"}Troubleshooting
Troubleshooting

Connection Refused
```
Error: Connection refused
```
Ensure Ollama is running:
```bash
ollama serve
```
Out of Memory
```
Error: CUDA out of memory
```
Solutions:
- Use a smaller model
- Use a quantized version
- Reduce the context length
- Set `OLLAMA_NUM_GPU=0` to run on CPU only
Slow Performance
- Enable GPU acceleration (verify with the check below)
- Use quantized models
- Increase `OLLAMA_NUM_PARALLEL`
- Check for thermal throttling
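On recent Ollama versions, `ollama ps` shows whether a loaded model is running on the GPU or has spilled over to the CPU (see the PROCESSOR column):

```bash
ollama ps
```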
Tip
For best results, use `llama3.3:70b` with a high-end GPU. For resource-limited systems, `mistral` provides good quality with lower requirements.
Related Documentation
- Providers Overview - All providers
- Custom Providers - OpenAI-compatible setup
- Configuration - Full options