
    Secure Ollama with Authentication

    Ollama ships with no authentication by default. Add OIDC/JWT verification, rate limits, and usage tracking in under 5 minutes with Attach Gateway.

    The Problem

    Running Ollama locally or on a server? By default, anyone with network access can make requests. Teams often resort to:

• Ad-hoc nginx configs with basic auth
• Exposing ports without any protection
• Copy-pasting JWT verification code everywhere
• No visibility into who's using what

    What You Get

    OIDC/JWT Authentication

    Verify tokens from Auth0, Okta, or any OIDC provider before requests reach Ollama.

    Rate Limiting & Quotas

    Per-user token limits and request throttling to prevent abuse and manage costs.
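
As a rough illustration, a throttled client sees requests begin to fail once its limit is hit. HTTP 429 is the conventional status for rate limiting; the actual limits and responses depend on how you configure the gateway:

for i in $(seq 1 20); do
  curl -s -o /dev/null -w '%{http_code}\n' \
    -H "Authorization: Bearer $JWT" \
    -d '{"model":"llama3","messages":[{"role":"user","content":"hi"}]}' \
    http://localhost:8080/api/chat
done
# 200 200 200 ... then 429 once the per-user limit is exceeded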

    Usage Metrics

    Prometheus metrics and OpenMeter integration for billing and monitoring.
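
Once the gateway is running (see Quick Start below), the metrics are plain Prometheus text you can scrape or inspect by hand. This assumes the conventional /metrics path; check your deployment for the exact endpoint:

curl -s http://localhost:8080/metrics | head -20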

    Quick Start

1. Install Attach Gateway

    pip install attach-dev

2. Configure your identity provider

    export OIDC_ISSUER=https://your-domain.auth0.com
    export OIDC_AUD=ollama-local
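
To sanity-check these values, fetch your issuer's OIDC discovery document; every compliant provider publishes one, and its jwks_uri field is where signing keys come from (jq is used here only for readability):

curl -s https://your-domain.auth0.com/.well-known/openid-configuration | jq .jwks_uri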

3. Start the gateway

    attach-gateway --port 8080
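
The gateway fronts a running Ollama instance (localhost:11434 by default), so have Ollama up as well; pointing Attach at a non-default Ollama address is a configuration detail not covered here:

ollama serve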

4. Make authenticated requests

curl -H "Authorization: Bearer $JWT" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"hello"}]}' \
  http://localhost:8080/api/chat

Note that Ollama's /api/chat expects a messages array; /api/generate is the endpoint that takes a bare prompt string.
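
Where $JWT comes from depends on your IdP. With Auth0, for example, a machine-to-machine application can mint a test token through the standard client-credentials grant (the client ID and secret below are placeholders for your own application):

export JWT=$(curl -s -X POST https://your-domain.auth0.com/oauth/token \
  -H "Content-Type: application/json" \
  -d '{"client_id":"YOUR_CLIENT_ID","client_secret":"YOUR_CLIENT_SECRET","audience":"ollama-local","grant_type":"client_credentials"}' \
  | jq -r .access_token)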

    How It Works

🔐 Client sends JWT → ✅ Attach verifies token → 📊 Logs usage metrics → 🦙 Forwards to Ollama
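
You can watch the verification step from the command line: a request without a token is rejected before it ever reaches Ollama. A 401 is the conventional response for a missing or invalid bearer token; the exact status code Attach returns may differ:

curl -s -o /dev/null -w '%{http_code}\n' \
  -d '{"model":"llama3","messages":[{"role":"user","content":"hi"}]}' \
  http://localhost:8080/api/chat
# 401 without a token; add -H "Authorization: Bearer $JWT" and it becomes 200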

    Full Feature List

    OIDC/JWT token verification
    Per-user rate limiting
    Token usage quotas
    Prometheus metrics endpoint
    OpenMeter billing integration
X-Attach-User header injection (see the sketch after this list)
    Request/response logging
    Zero code changes to Ollama
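
A quick way to see the injected identity header is to stand a throwaway listener in Ollama's place and send one authenticated request through the gateway. This sketch assumes the gateway forwards to Ollama's default localhost:11434; netcat flags vary between BSD and GNU variants:

nc -l 11434        # GNU netcat: nc -l -p 11434
# in another shell:
curl -H "Authorization: Bearer $JWT" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"hi"}]}' \
  http://localhost:8080/api/chat
# the dumped request should carry a header like: X-Attach-User: <user id>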

    Common Use Cases

    Team Development Server

    Run one Ollama instance for your entire team. Each developer authenticates with their corporate SSO, and you get per-user usage tracking for cost allocation.

    Air-Gapped Environments

    Keep your LLM completely offline while still enforcing authentication. Attach verifies tokens locally without external network calls after initial JWKS fetch.
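
The key material itself is just a standard JWKS document, so it can be fetched once on a connected machine and carried across the gap. With Auth0 the key set lives at /.well-known/jwks.json; how you hand a pre-fetched copy to Attach is configuration-specific:

curl -s https://your-domain.auth0.com/.well-known/jwks.json -o jwks.json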

    GPU Cost Management

    Expensive GPU time shouldn't be unlimited. Set per-user token quotas to prevent runaway costs and ensure fair resource allocation across teams.

    Compliance & Audit

Need to know who asked what? Attach logs every request with user identity, timestamp, and token usage, ready for compliance audits.

    Frequently Asked Questions

    Does Attach add latency to Ollama requests?

    Minimal. Token verification adds ~1-2ms per request. JWKS keys are cached locally, so there's no external network call on each request. For typical LLM inference times of 1-30 seconds, this overhead is negligible.
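
You can measure the overhead yourself with curl's built-in timing, comparing a call through the gateway against a direct one (assuming you can still reach Ollama on its default port; inference time dominates and varies, so run each a few times):

curl -s -o /dev/null -w 'gateway: %{time_total}s\n' \
  -H "Authorization: Bearer $JWT" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"hi"}]}' \
  http://localhost:8080/api/chat
curl -s -o /dev/null -w 'direct:  %{time_total}s\n' \
  -d '{"model":"llama3","messages":[{"role":"user","content":"hi"}]}' \
  http://localhost:11434/api/chat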

    Can I use Attach with Ollama's streaming responses?

    Yes. Attach fully supports streaming. The gateway proxies Server-Sent Events (SSE) and chunked responses transparently. Authentication happens once at the start of the request, not per-chunk.
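
Ollama streams chat responses by default; curl's -N flag disables output buffering so you can watch chunks arrive through the gateway as they are generated:

curl -N -H "Authorization: Bearer $JWT" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"tell me a joke"}],"stream":true}' \
  http://localhost:8080/api/chat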

    What if my identity provider is down?

    Attach caches JWKS (JSON Web Key Sets) locally with configurable TTL. If your IdP becomes unreachable, existing valid tokens continue to work. New tokens can't be issued, but that's your IdP's responsibility, not Attach's.

    Can I run multiple Ollama models behind one gateway?

    Absolutely. Attach proxies all Ollama endpoints including /api/generate, /api/chat, and /api/embeddings. Users can switch between models (llama3, mistral, etc.) while using the same authentication token.
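
The same token works across endpoints and models, e.g.:

# completion with a different model
curl -H "Authorization: Bearer $JWT" \
  -d '{"model":"mistral","prompt":"hello"}' \
  http://localhost:8080/api/generate

# embeddings
curl -H "Authorization: Bearer $JWT" \
  -d '{"model":"llama3","prompt":"hello"}' \
  http://localhost:8080/api/embeddings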

    Ready to Secure Your Ollama Instance?

    Get started in under 5 minutes. No registration required.