
Ollama (Local)

Ollama enables running AI models locally on your machine for private, offline security testing.

[Screenshot: Ollama local model selection (ollama-model-select.png)]

Overview

Advantages of running models locally with Ollama:

  • Complete data privacy
  • No API costs
  • Offline operation
  • No rate limits
  • Full control over models

Installation

macOS

Terminal window
brew install ollama

Linux

Terminal window
curl -fsSL https://ollama.com/install.sh | sh

Windows

Download from ollama.com/download

Starting Ollama

Start Server

Terminal window
ollama serve

The server runs on http://localhost:11434 by default.
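
One quick way to confirm the server is reachable is to query it directly; the root endpoint returns a short status message and /api/tags lists the models you have installed:

Terminal window
curl http://localhost:11434          # responds with "Ollama is running"
curl http://localhost:11434/api/tags # JSON list of installed models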

Verify Installation

Terminal window
ollama --version

Available Models

Model                Size     Best For
llama3.3:70b         40GB     Best quality, requires high-end GPU
llama3.3             4.7GB    Good balance of quality and speed
codellama:34b        19GB     Code analysis
mistral              4.1GB    Fast general purpose
mixtral:8x7b         26GB     High quality, efficient
deepseek-coder:33b   19GB     Code security review
qwen2.5:72b          41GB     Excellent reasoning

Pull Models

Terminal window
# Recommended for security testing
ollama pull llama3.3
# For code review
ollama pull codellama:34b
# Lightweight option
ollama pull mistral

List Installed Models

Terminal window
ollama list

Configuration

Basic Setup

~/.cyberstrike/config.json
{
  "provider": {
    "ollama": {
      "options": {
        "baseURL": "http://localhost:11434"
      }
    }
  },
  "model": "ollama/llama3.3"
}

Custom Host

For remote Ollama server:

{
  "provider": {
    "ollama": {
      "options": {
        "baseURL": "http://192.168.1.100:11434"
      }
    }
  }
}
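
Note that Ollama only listens on localhost by default, so the remote machine needs to bind to an externally reachable address before Cyberstrike can connect to it. A minimal sketch on the remote host (address and port shown as examples):

Terminal window
# Listen on all interfaces so other machines on the network can connect
OLLAMA_HOST=0.0.0.0:11434 ollama serve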

Usage

Command Line

Terminal window
cyberstrike --model ollama/llama3.3

In-Session

/model
# Select Ollama model

Model Configuration

Context Length

Increase the context window for an interactive session from inside the ollama run prompt:

Terminal window
ollama run llama3.3
>>> /set parameter num_ctx 8192

Or set it persistently in a Modelfile:

FROM llama3.3
PARAMETER num_ctx 8192
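
If you drive Ollama's REST API directly, the same setting can be passed per request through the options object. A minimal sketch against the generate endpoint (the prompt is just an example):

Terminal window
curl http://localhost:11434/api/generate -d '{
  "model": "llama3.3",
  "prompt": "Summarize common SQL injection patterns",
  "options": { "num_ctx": 8192 }
}'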

GPU Layers

Control GPU usage:

Terminal window
OLLAMA_NUM_GPU=999 ollama serve # Use all GPU layers
OLLAMA_NUM_GPU=0 ollama serve # CPU only

Memory Management

Terminal window
# Limit VRAM usage
OLLAMA_MAX_VRAM=8G ollama serve
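
To check whether a loaded model actually fits in VRAM or is spilling over to system memory, list the running models:

Terminal window
# Shows loaded models and how much of each runs on GPU vs. CPU
ollama ps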

Custom Models

Create Modelfile

Modelfile
FROM llama3.3
SYSTEM """
You are a security testing assistant specialized in:
- Web application vulnerabilities
- Code security review
- Network penetration testing
Always follow OWASP guidelines and report findings with:
- Vulnerability name
- Severity (Critical/High/Medium/Low)
- Evidence
- Remediation steps
"""
PARAMETER temperature 0.7
PARAMETER num_ctx 8192

Build Custom Model

Terminal window
ollama create security-assistant -f Modelfile
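
Before pointing Cyberstrike at the new model, you can verify that the system prompt and parameters were baked in:

Terminal window
# Print the Modelfile the custom model was built from
ollama show security-assistant --modelfile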

Use Custom Model

Terminal window
cyberstrike --model ollama/security-assistant

Performance Optimization

Hardware Requirements

Model Size   Min RAM   Recommended GPU
7B           8GB       8GB VRAM
13B          16GB      12GB VRAM
34B          32GB      24GB VRAM
70B          64GB      48GB VRAM

Quantization

Use quantized models to reduce memory usage. The available quantization tags vary by model; check the model's tag listing on ollama.com:

Terminal window
ollama pull llama3.3:q4_0 # 4-bit quantization
ollama pull llama3.3:q8_0 # 8-bit quantization

Parallel Requests

Enable concurrent requests:

Terminal window
OLLAMA_NUM_PARALLEL=4 ollama serve

Offline Usage

Pre-download Models

Terminal window
ollama pull llama3.3
ollama pull codellama

Air-Gapped Systems

  1. Download the models on a system with internet access
  2. Copy the ~/.ollama/models/ directory to the air-gapped system (see the sketch below)
  3. Run Ollama normally; no internet connection is required
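
A minimal sketch of the transfer, assuming removable media mounted at /mnt/usb (a hypothetical path) on both machines:

Terminal window
# On the connected system: pack the downloaded models
tar -czf /mnt/usb/ollama-models.tar.gz -C ~/.ollama models

# On the air-gapped system: unpack into the same location
tar -xzf /mnt/usb/ollama-models.tar.gz -C ~/.ollama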

Docker Deployment

Run in Container

Terminal window
docker run -d \
--gpus all \
-v ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
ollama/ollama
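
The --gpus all flag requires the NVIDIA Container Toolkit on the host. If you don't have it installed, drop the flag to run the container on CPU only:

Terminal window
docker run -d \
-v ollama:/root/.ollama \
-p 11434:11434 \
--name ollama \
ollama/ollama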

Pull Models in Container

Terminal window
docker exec -it ollama ollama pull llama3.3

API Compatibility

Ollama also exposes an OpenAI-compatible API, so it can be configured through the openai provider instead:

{
  "provider": {
    "openai": {
      "options": {
        "baseURL": "http://localhost:11434/v1",
        "apiKey": "ollama"
      }
    }
  },
  "model": "openai/llama3.3"
}
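
To sanity-check the OpenAI-compatible endpoint outside of Cyberstrike, you can send a chat completion request directly; the API key is only a placeholder and is not validated by Ollama:

Terminal window
curl http://localhost:11434/v1/chat/completions \
-H "Content-Type: application/json" \
-d '{
  "model": "llama3.3",
  "messages": [{"role": "user", "content": "List the OWASP Top 10 categories."}]
}'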

Troubleshooting

Connection Refused

Error: Connection refused

Ensure Ollama is running:

Terminal window
ollama serve
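
On Linux, the install script registers Ollama as a systemd service, so you can also check and restart it that way:

Terminal window
systemctl status ollama
sudo systemctl restart ollama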

Out of Memory

Error: CUDA out of memory

Solutions:

  • Use smaller model
  • Use quantized version
  • Reduce context length
  • Set OLLAMA_NUM_GPU=0 for CPU

Slow Performance

  • Enable GPU acceleration
  • Use quantized models
  • Increase num_parallel
  • Check thermal throttling

Tip

For best results, use llama3.3:70b with a high-end GPU. For resource-limited systems, mistral provides good quality with lower requirements.