Ollama ships with no authentication by default. Add OIDC/JWT verification, rate limits, and usage tracking in under 5 minutes with Attach Gateway.
Running Ollama locally or on a server? By default, anyone with network access can make requests. Teams often resort to ad-hoc workarounds instead of real authentication.
Verify tokens from Auth0, Okta, or any OIDC provider before requests reach Ollama.
Per-user token limits and request throttling to prevent abuse and manage costs.
Prometheus metrics and OpenMeter integration for billing and monitoring.
pip install attach-dev
export OIDC_ISSUER=https://your-domain.auth0.com
export OIDC_AUD=ollama-local
attach-gateway --port 8080
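If you don't already have a JWT, your OIDC provider's token endpoint can issue one. As a sketch, using Auth0's client-credentials flow (the domain, client ID, and secret below are placeholders for your own tenant):

# Request a token from Auth0's /oauth/token endpoint (values are placeholders)
export JWT=$(curl -s -X POST https://your-domain.auth0.com/oauth/token \
  -H "Content-Type: application/json" \
  -d '{"client_id":"YOUR_CLIENT_ID","client_secret":"YOUR_CLIENT_SECRET","audience":"ollama-local","grant_type":"client_credentials"}' \
  | jq -r .access_token)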
curl -H "Authorization: Bearer $JWT" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"hello"}]}' \
  http://localhost:8080/api/chat

Request flow: client sends JWT → Attach verifies the token → logs usage metrics → forwards to Ollama.
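A quick way to confirm enforcement is to send the same request without a token; the gateway should reject it (typically with a 401) instead of forwarding it to Ollama:

# No Authorization header: should never reach Ollama
curl -i -d '{"model":"llama3","messages":[{"role":"user","content":"hello"}]}' \
  http://localhost:8080/api/chat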
Run one Ollama instance for your entire team. Each developer authenticates with their corporate SSO, and you get per-user usage tracking for cost allocation.
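For cost allocation, the Prometheus metrics mentioned above can be scraped per user. A sketch, assuming the gateway exposes metrics at the conventional /metrics path (the path and metric names may differ in your deployment):

# Inspect usage counters for chargeback (endpoint path is an assumption)
curl -s http://localhost:8080/metrics | grep -i token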
Keep your LLM completely offline while still enforcing authentication. Attach verifies tokens locally without external network calls after initial JWKS fetch.
Expensive GPU time shouldn't be unlimited. Set per-user token quotas to prevent runaway costs and ensure fair resource allocation across teams.
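How quotas are configured depends on your Attach Gateway version; purely as an illustration, a per-user budget might be wired up like this (the variable name is hypothetical, not a documented flag):

# Hypothetical quota setting -- check the Attach Gateway docs for the real option
export MAX_TOKENS_PER_USER=100000   # illustrative daily per-user token budget
attach-gateway --port 8080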
Need to know who asked what? Attach logs every request with user identity, timestamp, and token usage, ready for compliance audits.
Minimal. Token verification adds ~1-2ms per request. JWKS keys are cached locally, so there's no external network call on each request. For typical LLM inference times of 1-30 seconds, this overhead is negligible.
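You can measure the overhead yourself with curl's built-in timing. A rough sketch comparing a proxied call with a direct one (assumes Ollama on its default port 11434; single requests are noisy, so average several runs):

# Proxied through the gateway
curl -s -o /dev/null -w 'via gateway: %{time_total}s\n' \
  -H "Authorization: Bearer $JWT" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"hi"}],"stream":false}' \
  http://localhost:8080/api/chat
# Direct to Ollama
curl -s -o /dev/null -w 'direct:      %{time_total}s\n' \
  -d '{"model":"llama3","messages":[{"role":"user","content":"hi"}],"stream":false}' \
  http://localhost:11434/api/chat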
Yes. Attach fully supports streaming. The gateway proxies Server-Sent Events (SSE) and chunked responses transparently. Authentication happens once at the start of the request, not per-chunk.
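For example, a streamed chat through the gateway looks the same as one sent straight to Ollama; curl's -N flag disables output buffering so chunks appear as they arrive:

# Stream the response through the gateway as it is generated
curl -N -H "Authorization: Bearer $JWT" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"write a haiku"}],"stream":true}' \
  http://localhost:8080/api/chat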
Attach caches JWKS (JSON Web Key Sets) locally with configurable TTL. If your IdP becomes unreachable, existing valid tokens continue to work. New tokens can't be issued, but that's your IdP's responsibility, not Attach's.
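If your IdP connectivity is unreliable, a longer cache window helps; the setting below is hypothetical (check the Attach Gateway docs for the actual name):

# Hypothetical: widen the JWKS cache TTL so cached keys outlive IdP outages
export JWKS_CACHE_TTL=3600   # seconds; name is illustrative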
Absolutely. Attach proxies all Ollama endpoints including /api/generate, /api/chat, and /api/embeddings. Users can switch between models (llama3, mistral, etc.) while using the same authentication token.
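The same bearer token works across endpoints and models. For instance, an embeddings call with a different model (mistral must already be pulled into Ollama):

# Same JWT, different endpoint and model
curl -H "Authorization: Bearer $JWT" \
  -d '{"model":"mistral","prompt":"vectorize this sentence"}' \
  http://localhost:8080/api/embeddings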
Get started in under 5 minutes. No registration required.