
    Secure Ollama with Authentication

    Ollama ships with no authentication by default. Add OIDC/JWT verification, rate limits, and usage tracking in under 5 minutes with Attach Gateway.

    The Problem

    Running Ollama locally or on a server? By default, anyone with network access can make requests. Teams often resort to:

• Ad-hoc nginx configs with basic auth
• Exposing ports without any protection
• Copy-pasting JWT verification code everywhere
• No visibility into who's using what

    What You Get

    OIDC/JWT Authentication

    Verify tokens from Auth0, Okta, or any OIDC provider before requests reach Ollama.

    Rate Limiting & Quotas

    Per-user token limits and request throttling to prevent abuse and manage costs.
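
As a rough illustration, a throttled client sees requests begin to fail once its limit is hit. HTTP 429 is the conventional status for rate limiting; the actual limits and responses depend on how you configure the gateway:

for i in $(seq 1 20); do
  curl -s -o /dev/null -w '%{http_code}\n' \
    -H "Authorization: Bearer $JWT" \
    -d '{"model":"llama3","messages":[{"role":"user","content":"hi"}]}' \
    http://localhost:8080/api/chat
done
# 200 200 200 ... then 429 once the per-user limit is exceeded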

    Usage Metrics

    Prometheus metrics and OpenMeter integration for billing and monitoring.
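
Once the gateway is running (see Quick Start below), the metrics are plain Prometheus text you can scrape or inspect by hand. This assumes the conventional /metrics path; check your deployment for the exact endpoint:

curl -s http://localhost:8080/metrics | head -20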

    Quick Start

1. Install Attach Gateway

    pip install attach-dev

2. Configure your identity provider

    export OIDC_ISSUER=https://your-domain.auth0.com
    export OIDC_AUD=ollama-local
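
To sanity-check these values, fetch your issuer's OIDC discovery document; every compliant provider publishes one, and its jwks_uri field is where signing keys come from (jq is used here only for readability):

curl -s https://your-domain.auth0.com/.well-known/openid-configuration | jq .jwks_uri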

3. Start the gateway

    attach-gateway --port 8080
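
The gateway fronts a running Ollama instance (localhost:11434 by default), so have Ollama up as well; pointing Attach at a non-default Ollama address is a configuration detail not covered here:

ollama serve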

4. Make authenticated requests

curl -H "Authorization: Bearer $JWT" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"hello"}]}' \
  http://localhost:8080/api/chat

Note that Ollama's /api/chat expects a messages array; /api/generate is the endpoint that takes a bare prompt string.
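
Where $JWT comes from depends on your IdP. With Auth0, for example, a machine-to-machine application can mint a test token through the standard client-credentials grant (the client ID and secret below are placeholders for your own application):

export JWT=$(curl -s -X POST https://your-domain.auth0.com/oauth/token \
  -H "Content-Type: application/json" \
  -d '{"client_id":"YOUR_CLIENT_ID","client_secret":"YOUR_CLIENT_SECRET","audience":"ollama-local","grant_type":"client_credentials"}' \
  | jq -r .access_token)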

    How It Works

🔐 Client sends JWT → ✅ Attach verifies token → 📊 Logs usage metrics → 🦙 Forwards to Ollama
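
You can watch the verification step from the command line: a request without a token is rejected before it ever reaches Ollama. A 401 is the conventional response for a missing or invalid bearer token; the exact status code Attach returns may differ:

curl -s -o /dev/null -w '%{http_code}\n' \
  -d '{"model":"llama3","messages":[{"role":"user","content":"hi"}]}' \
  http://localhost:8080/api/chat
# 401 without a token; add -H "Authorization: Bearer $JWT" and it becomes 200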

    Full Feature List

    OIDC/JWT token verification
    Per-user rate limiting
    Token usage quotas
    Prometheus metrics endpoint
    OpenMeter billing integration
X-Attach-User header injection (see the sketch after this list)
    Request/response logging
    Zero code changes to Ollama
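
A quick way to see the injected identity header is to stand a throwaway listener in Ollama's place and send one authenticated request through the gateway. This sketch assumes the gateway forwards to Ollama's default localhost:11434; netcat flags vary between BSD and GNU variants:

nc -l 11434        # GNU netcat: nc -l -p 11434
# in another shell:
curl -H "Authorization: Bearer $JWT" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"hi"}]}' \
  http://localhost:8080/api/chat
# the dumped request should carry a header like: X-Attach-User: <user id>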

    Common Use Cases

    Team Development Server

    Run one Ollama instance for your entire team. Each developer authenticates with their corporate SSO, and you get per-user usage tracking for cost allocation.

    Air-Gapped Environments

    Keep your LLM completely offline while still enforcing authentication. Attach verifies tokens locally without external network calls after initial JWKS fetch.
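
The key material itself is just a standard JWKS document, so it can be fetched once on a connected machine and carried across the gap. With Auth0 the key set lives at /.well-known/jwks.json; how you hand a pre-fetched copy to Attach is configuration-specific:

curl -s https://your-domain.auth0.com/.well-known/jwks.json -o jwks.json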

    GPU Cost Management

    Expensive GPU time shouldn't be unlimited. Set per-user token quotas to prevent runaway costs and ensure fair resource allocation across teams.

    Compliance & Audit

Need to know who asked what? Attach logs every request with user identity, timestamp, and token usage, ready for compliance audits.

    Frequently Asked Questions

    Does Attach add latency to Ollama requests?

    Minimal. Token verification adds ~1-2ms per request. JWKS keys are cached locally, so there's no external network call on each request. For typical LLM inference times of 1-30 seconds, this overhead is negligible.
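
You can measure the overhead yourself with curl's built-in timing, comparing a call through the gateway against a direct one (assuming you can still reach Ollama on its default port; inference time dominates and varies, so run each a few times):

curl -s -o /dev/null -w 'gateway: %{time_total}s\n' \
  -H "Authorization: Bearer $JWT" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"hi"}]}' \
  http://localhost:8080/api/chat
curl -s -o /dev/null -w 'direct:  %{time_total}s\n' \
  -d '{"model":"llama3","messages":[{"role":"user","content":"hi"}]}' \
  http://localhost:11434/api/chat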

    Can I use Attach with Ollama's streaming responses?

    Yes. Attach fully supports streaming. The gateway proxies Server-Sent Events (SSE) and chunked responses transparently. Authentication happens once at the start of the request, not per-chunk.
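
Ollama streams chat responses by default; curl's -N flag disables output buffering so you can watch chunks arrive through the gateway as they are generated:

curl -N -H "Authorization: Bearer $JWT" \
  -d '{"model":"llama3","messages":[{"role":"user","content":"tell me a joke"}],"stream":true}' \
  http://localhost:8080/api/chat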

    What if my identity provider is down?

    Attach caches JWKS (JSON Web Key Sets) locally with configurable TTL. If your IdP becomes unreachable, existing valid tokens continue to work. New tokens can't be issued, but that's your IdP's responsibility, not Attach's.

    Can I run multiple Ollama models behind one gateway?

    Absolutely. Attach proxies all Ollama endpoints including /api/generate, /api/chat, and /api/embeddings. Users can switch between models (llama3, mistral, etc.) while using the same authentication token.
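
The same token works across endpoints and models, e.g.:

# completion with a different model
curl -H "Authorization: Bearer $JWT" \
  -d '{"model":"mistral","prompt":"hello"}' \
  http://localhost:8080/api/generate

# embeddings
curl -H "Authorization: Bearer $JWT" \
  -d '{"model":"llama3","prompt":"hello"}' \
  http://localhost:8080/api/embeddings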

    Ready to Secure Your Ollama Instance?

    Get started in under 5 minutes. No registration required.