4. Configuration

All MAC configuration is managed through environment variables in the .env file. The configuration is validated at startup using Pydantic Settings in mac/config.py.

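Validation at startup means a missing or malformed required value (such as MAC_SECRET_KEY) aborts the process with a clear error instead of failing later at runtime. A minimal sketch of what that layer might look like (pydantic-settings v2 is assumed; the class name and the subset of fields shown are illustrative, drawn from the tables below):

from pydantic_settings import BaseSettings, SettingsConfigDict

class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")

    mac_env: str = "development"    # maps to MAC_ENV (matching is case-insensitive)
    mac_secret_key: str             # no default: startup fails if MAC_SECRET_KEY is unset
    mac_dev_mode: bool = False
    app_port: int = 80
    database_url: str = "postgresql+asyncpg://mac:mac@postgres:5432/mac"
    redis_url: str = "redis://redis:6379/0"

settings = Settings()  # raises pydantic.ValidationError on missing or invalid values
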
4.1. Core Settings

| Variable | Default | Description |
|----------|---------|-------------|
| MAC_ENV | development | Environment mode: development or production |
| MAC_SECRET_KEY | (required) | JWT signing key; must be a strong random string |
| MAC_DEV_MODE | false | Enable mock LLM streaming (no GPU needed) |
| APP_PORT | 80 | Port for the web interface |

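A minimal .env for local development might look like this (all values are placeholders; MAC_DEV_MODE=true lets the stack run without a GPU):

# sample .env -- placeholder values only
MAC_ENV=development
# generate a real key, e.g. with: openssl rand -hex 32
MAC_SECRET_KEY=change-me
MAC_DEV_MODE=true
APP_PORT=8080
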
4.2. Database Configuration

| Variable | Default | Description |
|----------|---------|-------------|
| DATABASE_URL | postgresql+asyncpg://mac:mac@postgres:5432/mac | PostgreSQL connection string |
| REDIS_URL | redis://redis:6379/0 | Redis connection string |

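The postgresql+asyncpg scheme implies an async SQLAlchemy engine. A minimal sketch of how the two URLs are typically consumed (object names are illustrative, not necessarily MAC's own):

import redis.asyncio as redis
from sqlalchemy.ext.asyncio import async_sessionmaker, create_async_engine

# async engine + session factory for the PostgreSQL URL above
engine = create_async_engine("postgresql+asyncpg://mac:mac@postgres:5432/mac", pool_pre_ping=True)
SessionLocal = async_sessionmaker(engine, expire_on_commit=False)

# async Redis client for the REDIS_URL above
redis_client = redis.from_url("redis://redis:6379/0", decode_responses=True)
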
4.3. Authentication Settings

| Variable | Default | Description |
|----------|---------|-------------|
| JWT_SECRET_KEY | (from MAC_SECRET_KEY) | JWT signing key |
| JWT_ACCESS_TOKEN_EXPIRE_MINUTES | 1440 | Access token lifetime in minutes (1440 = 24 hours) |

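Because JWT_SECRET_KEY is a single shared key, a symmetric algorithm such as HS256 is implied. A sketch of token issuance driven by these two settings (PyJWT shown for illustration; the project's actual JWT library is not specified here):

import datetime

import jwt  # PyJWT

def create_access_token(sub: str, secret: str, expire_minutes: int = 1440) -> str:
    now = datetime.datetime.now(datetime.timezone.utc)
    claims = {
        "sub": sub,
        "iat": now,                                               # issued at
        "exp": now + datetime.timedelta(minutes=expire_minutes),  # lifetime from the setting
    }
    return jwt.encode(claims, secret, algorithm="HS256")
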
4.4. LLM Inference Settings

| Variable | Default | Description |
|----------|---------|-------------|
| VLLM_BASE_URL | http://vllm-speed:8001 | vLLM API endpoint |
| VLLM_SPEED_MODEL | Qwen/Qwen2.5-7B-Instruct-AWQ | Default chat model |
| OLLAMA_URL | http://ollama:11434 | Ollama fallback endpoint |
| MAC_ENABLED_MODELS | qwen2.5:7b,... | Comma-separated list of enabled models |

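vLLM exposes an OpenAI-compatible API, so any OpenAI client pointed at VLLM_BASE_URL plus /v1 can talk to it. A quick sketch (the api_key value is a placeholder; vLLM only checks it if started with --api-key):

from openai import OpenAI

client = OpenAI(base_url="http://vllm-speed:8001/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="Qwen/Qwen2.5-7B-Instruct-AWQ",  # must match the model vLLM was started with
    messages=[{"role": "user", "content": "Say hello."}],
)
print(resp.choices[0].message.content)
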
4.5. External Services

| Variable | Default | Description |
|----------|---------|-------------|
| QDRANT_URL | http://qdrant:6333 | Qdrant vector database URL |
| SEARXNG_URL | http://searxng:8080 | SearXNG web search URL |
| WHISPER_URL | http://whisper:8000 | Whisper STT URL |

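A quick smoke test that the vector store is reachable (qdrant-client shown for illustration; MAC's own wrapper code may differ):

from qdrant_client import QdrantClient

qdrant = QdrantClient(url="http://qdrant:6333")
print(qdrant.get_collections())  # empty on a fresh install, but proves connectivity
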
4.6. Rate Limiting

| Variable | Default | Description |
|----------|---------|-------------|
| RATE_LIMIT_REQUESTS_PER_HOUR | 100 | Maximum API requests per hour per user |
| RATE_LIMIT_TOKENS_PER_DAY | 50000 | Maximum AI tokens per day per user |
| KERNEL_TIMEOUT | 120 | Code execution timeout (seconds) |

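A common way to enforce the hourly request cap with Redis is a fixed-window counter. A sketch under that assumption (key naming is illustrative, not necessarily how MAC implements it):

import redis.asyncio as redis

async def allow_request(r: redis.Redis, user_id: str, limit: int = 100) -> bool:
    key = f"ratelimit:requests:{user_id}"
    count = await r.incr(key)      # atomic increment; creates the key at 1
    if count == 1:
        await r.expire(key, 3600)  # first hit starts the one-hour window
    return count <= limit
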
4.7. Cluster Settings

| Variable | Default | Description |
|----------|---------|-------------|
| CLUSTER_SECRET | (optional) | Shared secret for worker authentication |
| CLUSTER_HOST | (optional) | Host IP for worker nodes to connect to |

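The worker handshake itself is not documented here. As a generic illustration of how a shared secret is usually verified, hypothetical with respect to MAC's actual protocol:

import hmac
import os

def worker_is_authorized(presented_secret: str) -> bool:
    expected = os.environ.get("CLUSTER_SECRET", "")
    # compare_digest runs in constant time, avoiding timing side channels
    return bool(expected) and hmac.compare_digest(presented_secret, expected)
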
4.8. CORS Settings

| Variable | Default | Description |
|----------|---------|-------------|
| MAC_CORS_ORIGINS | * | Allowed CORS origins (comma-separated) |

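If the web layer is FastAPI (suggested by the Pydantic-based config, though not stated here), the variable is most likely split on commas and handed to the CORS middleware. A sketch under that assumption:

import os

from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware

app = FastAPI()
origins = [o.strip() for o in os.environ.get("MAC_CORS_ORIGINS", "*").split(",")]

app.add_middleware(
    CORSMiddleware,
    allow_origins=origins,
    allow_methods=["*"],
    allow_headers=["*"],
)

For production, set MAC_CORS_ORIGINS to an explicit origin list; browsers reject a wildcard origin on credentialed requests.
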
4.9. GPU Configuration

The vLLM service is configured in docker-compose.yml with these key settings:

mac-vllm-speed:
  image: vllm/vllm-openai:latest
  command: >
    --model Qwen/Qwen2.5-7B-Instruct-AWQ
    --gpu-memory-utilization 0.85
    --max-model-len 8192
  deploy:
    resources:
      reservations:
        devices:
          - driver: nvidia
            count: 1
            capabilities: [gpu]

Tip: For an RTX 3060 (12 GB), the AWQ-quantised Qwen2.5-7B model uses approximately 5 GB of VRAM, leaving headroom for the KV cache with --gpu-memory-utilization 0.85.
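
As a rough check of that arithmetic: 12 GB × 0.85 ≈ 10.2 GB reserved by vLLM; subtracting roughly 5 GB of weights leaves about 5 GB for the KV cache and activations, which comfortably covers the 8192-token context configured above.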