Features
Drop-in Replacement
Fully compatible with the OpenAI v1 API, including streaming. Any client that supports the OpenAI API — Cursor, Continue, Open WebUI — works by simply switching the base URL.
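For example, with the official OpenAI Python SDK only the base URL and key change; the relay URL, API key, and model name below are placeholders, not fixed values:

```python
from openai import OpenAI

# Same client, same code: only the base URL and API key change.
# "https://relay.example.com/v1" and the model name are placeholders.
client = OpenAI(
    base_url="https://relay.example.com/v1",  # your relay endpoint
    api_key="sk-your-group-key",              # key issued from the dashboard
)

stream = client.chat.completions.create(
    model="llama-3.1-8b-instruct",
    messages=[{"role": "user", "content": "Hello from the GPU pool!"}],
    stream=True,  # streaming works exactly as with the upstream API
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
```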
Dual-Layer Load Balancing
Layer 1 filters by group, model, and health status; Layer 2 selects the optimal node via weighted scoring. The hot-model-first strategy gives the highest weight to nodes that already have the model in VRAM, avoiding 10-60 s cold starts.
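A minimal sketch of how such two-layer selection could be expressed (the field names and weights below are illustrative assumptions, not the project's actual scoring formula):

```python
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    healthy: bool
    groups: set[str]
    available_models: set[str]   # models this node can serve at all
    loaded_models: set[str]      # models currently resident in VRAM
    free_vram_gb: float
    gpu_util: float              # 0.0 - 1.0
    latency_ms: float

def pick_node(nodes: list[Node], group: str, model: str) -> Node | None:
    # Layer 1: hard filters (group membership, model availability, health).
    candidates = [
        n for n in nodes
        if n.healthy and group in n.groups and model in n.available_models
    ]
    if not candidates:
        return None

    # Layer 2: weighted scoring. Hot-model-first: a node that already holds
    # the model in VRAM skips the 10-60 s cold start, so it dominates the score.
    def score(n: Node) -> float:
        hot_bonus = 100.0 if model in n.loaded_models else 0.0
        return (hot_bonus
                + 2.0 * n.free_vram_gb
                - 50.0 * n.gpu_util
                - 0.1 * n.latency_ms)

    return max(candidates, key=score)
```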
Multi-Backend Support
Auto-scans ports to detect LM Studio, Ollama, and generic OpenAI backends. The API translator unifies formats; NDJSON is auto-converted to SSE with zero manual configuration.
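A rough sketch of what port-based detection can look like; the probed ports and endpoints (1234 for LM Studio, 11434 for Ollama, /v1/models, /api/tags) are common defaults used here as assumptions, not the agent's actual logic:

```python
import requests

# Common default ports: LM Studio (1234), Ollama (11434), plus generic guesses.
CANDIDATE_PORTS = [1234, 11434, 8000, 8080]

def detect_backend(port: int, timeout: float = 0.5) -> str | None:
    base = f"http://127.0.0.1:{port}"
    try:
        # Ollama answers on its native API.
        if requests.get(f"{base}/api/tags", timeout=timeout).ok:
            return "ollama"
    except requests.RequestException:
        pass
    try:
        # LM Studio and generic backends expose the OpenAI-style /v1/models.
        if requests.get(f"{base}/v1/models", timeout=timeout).ok:
            return "openai-compatible"
    except requests.RequestException:
        pass
    return None

if __name__ == "__main__":
    found = {p: b for p in CANDIDATE_PORTS if (b := detect_backend(p))}
    print(found)  # e.g. {11434: 'ollama', 1234: 'openai-compatible'}
```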
NAT Traversal
Providers behind NAT/firewalls can still connect. A WSS reverse tunnel carries both heartbeat and request forwarding — no frp or public IP needed.
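Conceptually, the provider dials out to the relay over WSS and the same socket carries heartbeats and forwarded requests; the URL and message schema in this sketch are illustrative assumptions:

```python
import asyncio
import json
import websockets  # the provider dials OUT, so no inbound port or public IP is needed

RELAY_WSS = "wss://relay.example.com/agent"  # placeholder relay endpoint

async def run_agent():
    async with websockets.connect(RELAY_WSS) as ws:
        async def heartbeat():
            while True:
                await ws.send(json.dumps({"type": "heartbeat", "vram_free_gb": 18.2}))
                await asyncio.sleep(10)

        hb = asyncio.create_task(heartbeat())
        try:
            # The relay forwards consumer requests down the same socket.
            async for raw in ws:
                msg = json.loads(raw)
                if msg.get("type") == "inference_request":
                    # Hand off to the local backend (LM Studio / Ollama) and stream back.
                    await ws.send(json.dumps({
                        "type": "inference_response",
                        "request_id": msg["request_id"],
                        "chunk": "...model output...",
                    }))
        finally:
            hb.cancel()

asyncio.run(run_agent())
```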
Group-Based Trust
Based on the mutual-aid model — trust comes from existing social relationships. Group isolation keeps strangers off your GPU; no complex reputation system needed.
Real-Time Metrics
Collects real-time GPU utilization, VRAM usage, temperature, and latency via nvidia-smi / rocm-smi. The dashboard shows the status of every node in the group at a glance.
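On NVIDIA GPUs the raw numbers can be pulled with a single nvidia-smi query; this standalone sketch shows the idea (the agent's actual sampling code may differ):

```python
import subprocess

def read_gpu_metrics() -> list[dict]:
    """Query utilization, VRAM, and temperature for every NVIDIA GPU."""
    out = subprocess.check_output([
        "nvidia-smi",
        "--query-gpu=utilization.gpu,memory.used,memory.total,temperature.gpu",
        "--format=csv,noheader,nounits",
    ], text=True)
    metrics = []
    for line in out.strip().splitlines():
        util, mem_used, mem_total, temp = [v.strip() for v in line.split(",")]
        metrics.append({
            "gpu_util_pct": float(util),
            "vram_used_mib": float(mem_used),
            "vram_total_mib": float(mem_total),
            "temperature_c": float(temp),
        })
    return metrics

print(read_gpu_metrics())
```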
Architecture
Consumer, Relay, Provider — three cleanly separated layers. The cloud relay handles auth and scheduling; the Desktop Agent handles local inference.
Relay: FastAPI async architecture handling the API gateway, authentication, scheduling, and WebSocket management
FastAPI · PostgreSQL · Redis
Desktop Agent: Python core + Tauri shell; auto-detects backends, GPU monitoring, API translation
Python · Tauri · WebSocket
Dashboard: group management, node monitoring, API key management, usage statistics
SvelteKit · TailwindCSS · shadcn
Supported Backends
Supports mainstream local inference engines with auto-detection and auto-translation, fully transparent to consumers.
LM Studio: native OpenAI v1 compatible
Ollama: NDJSON → SSE auto-conversion (see the sketch below)
Generic OpenAI: any OpenAI-compatible endpoint
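To illustrate the Ollama case, here is a rough sketch of mapping one NDJSON streaming line to an OpenAI-style SSE event; the exact field mapping in the real translator is assumed, not documented here:

```python
import json

def ndjson_line_to_sse(line: str, model: str) -> str:
    """Map one Ollama /api/chat NDJSON line to an OpenAI-style SSE chunk."""
    msg = json.loads(line)
    if msg.get("done"):
        return "data: [DONE]\n\n"
    chunk = {
        "object": "chat.completion.chunk",
        "model": model,
        "choices": [{
            "index": 0,
            "delta": {"content": msg.get("message", {}).get("content", "")},
            "finish_reason": None,
        }],
    }
    return f"data: {json.dumps(chunk)}\n\n"

# Example: one line as streamed by Ollama, re-emitted as SSE.
print(ndjson_line_to_sse('{"message": {"content": "Hel"}, "done": false}', "llama3"))
```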
Security
Who is it for
Pool idle GPUs, unified API for development
80% lower inference cost
Switch between models, A/B test different backends
Zero-wait experiments
Pool compute with friends to run larger models
4090 × N compute pool
Data stays in group, local inference with zero uploads
Fully offline deployable
Build your compute mutual-aid circle. Let idle GPUs reach their full potential.