Decentralized GPU Compute Network

LLM Cluster: Decentralized GPU Compute

More than GPU sharing: a mutual-aid compute platform.
Share local GPUs among friends, expose them through a unified OpenAI-compatible API, and let smart scheduling avoid cold-start latency.

How it works

Connect → Schedule → Infer

01

Node Online

Desktop Agent auto-detects local LM Studio / Ollama and establishes WSS tunnel

02

Smart Scheduling

Dual-layer load balancing, prioritizing nodes with models already loaded to avoid cold starts

03

Unified Inference

Call local compute power like a cloud service through OpenAI v1 compatible API
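For a feel of what "like a cloud service" means, here is a minimal consumer-side call; the relay URL, API key, and model name are placeholders, not the project's actual values:

```python
# Minimal consumer-side call; relay URL, key, and model name are placeholders.
import requests

resp = requests.post(
    "https://relay.example.com/v1/chat/completions",    # your group's relay
    headers={"Authorization": "Bearer YOUR_API_KEY"},
    json={
        "model": "qwen2.5-7b-instruct",                 # any model a group node serves
        "messages": [{"role": "user", "content": "Hello from the cluster!"}],
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```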

Features

A GPU Network Built for Your Circle

OpenAI Compatible API

Drop-in Replacement

Fully compatible with the OpenAI v1 format, streaming included. Any client that speaks the OpenAI API (Cursor, Continue, Open WebUI) works: just switch the base URL.
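A sketch of the drop-in swap with the official OpenAI Python SDK; only the base URL and key change (the relay address below is a placeholder):

```python
# Point any OpenAI SDK client at the relay; base_url here is a placeholder.
from openai import OpenAI

client = OpenAI(
    base_url="https://relay.example.com/v1",   # swap in your relay's address
    api_key="YOUR_API_KEY",                    # key issued for your group
)

stream = client.chat.completions.create(
    model="qwen2.5-7b-instruct",               # any model exposed by your group's nodes
    messages=[{"role": "user", "content": "Summarize this project in one sentence."}],
    stream=True,                               # SSE streaming is supported end to end
)
for chunk in stream:
    print(chunk.choices[0].delta.content or "", end="")
```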

Dual-Layer Load Balancing

Smart Scheduling

Layer 1 filters by group, model, health status; Layer 2 selects optimal node via weighted scoring. Hot-model-first strategy gives highest weight to models already in VRAM, avoiding 10-60s cold starts.
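A rough sketch of what the Layer-2 weighted scoring can look like; the field names and weights below are illustrative, not the scheduler's real values:

```python
# Illustrative Layer-2 scoring; weights and fields are assumptions, not the real scheduler.
from dataclasses import dataclass

@dataclass
class Node:
    name: str
    loaded_models: set        # models currently sitting in VRAM
    gpu_util: float           # 0.0 - 1.0, from the node's last heartbeat
    latency_ms: float         # round-trip time to the provider

def score(node: Node, model: str) -> float:
    s = 0.0
    if model in node.loaded_models:
        s += 100.0                        # hot-model-first: skip the 10-60 s cold start
    s += (1.0 - node.gpu_util) * 20.0     # prefer idle GPUs
    s -= node.latency_ms * 0.1            # penalize slow links
    return s

def pick_node(candidates: list, model: str) -> Node:
    return max(candidates, key=lambda n: score(n, model))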

Multi-Backend Auto-Detection

Multi-Backend Support

Auto-scans ports to detect LM Studio, Ollama, and generic OpenAI backends. The API Translator unifies formats and auto-converts NDJSON to SSE, with zero manual configuration.
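A detection sketch using the backends' common default ports; this is not the Agent's exact scan logic:

```python
# Probe common default ports for local backends; endpoints shown are the public defaults.
import requests

CANDIDATES = {
    "lmstudio": ("http://127.0.0.1:1234", "/v1/models"),    # LM Studio local server default
    "ollama":   ("http://127.0.0.1:11434", "/api/tags"),    # Ollama default
}

def detect_backends(timeout: float = 0.5) -> dict:
    found = {}
    for name, (base, probe) in CANDIDATES.items():
        try:
            if requests.get(base + probe, timeout=timeout).ok:
                found[name] = base
        except requests.RequestException:
            pass    # nothing answering on this port
    return found

print(detect_backends())    # e.g. {'lmstudio': 'http://127.0.0.1:1234'}
```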

WebSocket Reverse Tunnel

NAT Traversal

Providers behind NAT/firewalls can still connect. WSS reverse tunnel carries both Heartbeat and Request Forwarding — no frp or public IP needed.
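A simplified provider-side loop, assuming the `websockets` package and a JSON message protocol; the relay URL, token handling, and message fields are invented for illustration:

```python
# Provider-side tunnel sketch; URL, token handling, and message shapes are assumptions.
import asyncio, json
import websockets   # pip install websockets

RELAY_WS = "wss://relay.example.com/ws/provider"   # placeholder relay endpoint

async def handle_inference(payload: dict) -> dict:
    # Placeholder: forward to the local LM Studio / Ollama HTTP API and return its JSON.
    return {"error": "not implemented in this sketch"}

async def run_tunnel(node_token: str):
    # The Agent dials out, so NAT and firewalls never need an inbound port.
    async with websockets.connect(f"{RELAY_WS}?token={node_token}") as ws:
        async for raw in ws:
            msg = json.loads(raw)
            if msg["type"] == "ping":              # heartbeat from the relay
                await ws.send(json.dumps({"type": "pong"}))
            elif msg["type"] == "request":         # forwarded inference request
                result = await handle_inference(msg["payload"])
                await ws.send(json.dumps({"type": "response",
                                          "id": msg["id"], "payload": result}))

asyncio.run(run_tunnel("YOUR_NODE_TOKEN"))
```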

Group-Based Trust Isolation

Group-Based Trust

Built on the mutual-aid model: trust comes from social relationships. Group isolation keeps strangers off your GPU, with no complex reputation system needed.
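In scheduling terms, the group is simply a hard filter applied before any node is considered; the attribute names here are illustrative:

```python
# Group isolation as a scheduling filter; attribute names are illustrative.
def eligible_nodes(nodes, group_id, model):
    return [
        n for n in nodes
        if n.group_id == group_id          # only nodes shared inside your group
        and model in n.available_models    # node actually serves the requested model
        and n.healthy                      # recent heartbeat over the tunnel
    ]
```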

Real-Time GPU Monitoring

Real-Time Metrics

Collects GPU utilization, VRAM usage, temperature, and latency in real time via nvidia-smi / rocm-smi. The Dashboard shows every group node's status at a glance.
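A minimal collection sketch using nvidia-smi's CSV query mode (a rocm-smi path would be analogous); the dict field names are our own:

```python
# Metrics sketch via nvidia-smi CSV output; the dict keys are illustrative.
import subprocess

def read_gpu_metrics() -> list:
    out = subprocess.run(
        ["nvidia-smi",
         "--query-gpu=utilization.gpu,memory.used,memory.total,temperature.gpu",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    metrics = []
    for line in out.strip().splitlines():
        util, mem_used, mem_total, temp = (float(x) for x in line.split(","))
        metrics.append({"util_pct": util, "vram_used_mb": mem_used,
                        "vram_total_mb": mem_total, "temp_c": temp})
    return metrics
```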

Architecture

Three-Layer Architecture

Consumer, Relay, and Provider are kept as separate layers: the cloud relay handles auth and scheduling, while the Desktop Agent handles local inference.

Relay Service

FastAPI async architecture handling API Gateway, authentication, scheduling and WebSocket management

FastAPI · PostgreSQL · Redis
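The relay's hot path reduces to roughly this shape; the route is the standard OpenAI one, but the helpers below are placeholders rather than the real service code:

```python
# Skeleton of the relay's hot path; helper functions are placeholders.
from fastapi import FastAPI, Header, HTTPException

app = FastAPI()

# Stand-ins for the real auth, scheduler, and tunnel layers.
async def verify_api_key(key: str) -> bool: ...
async def pick_node(model: str): ...
async def forward_over_tunnel(node, body: dict): ...

@app.post("/v1/chat/completions")
async def chat_completions(body: dict, authorization: str = Header(...)):
    key = authorization.removeprefix("Bearer ").strip()
    if not await verify_api_key(key):             # API key stored server-side as SHA-256
        raise HTTPException(status_code=401, detail="invalid API key")
    node = await pick_node(body["model"])         # dual-layer scheduler picks a provider
    return await forward_over_tunnel(node, body)  # relay via the node's WSS tunnel
```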

Desktop Agent

Python core + Tauri Shell, auto-detects backends, GPU monitoring, API translation

Python · Tauri · WebSocket

Web Dashboard

Group management, node monitoring, API Key management, usage statistics

SvelteKit · TailwindCSS · shadcn

Supported Backends

Your Inference Backend, Your Choice

Supports mainstream local inference engines with auto-detection and automatic format translation, fully transparent to consumers; a short translation sketch follows the list.

LM Studio

Native OpenAI v1 compatible

Ollama

NDJSON → SSE auto-conversion

Generic OpenAI

Any OpenAI-compatible endpoint
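The Ollama row above is where most of the translation work lives. A minimal sketch, assuming the `response` / `done` fields of Ollama's NDJSON stream and a simplified OpenAI chunk shape:

```python
# NDJSON -> SSE translation sketch; the OpenAI chunk shape is simplified.
import json

async def ndjson_to_sse(ndjson_lines):
    """Rewrite an Ollama NDJSON stream as OpenAI-style SSE events."""
    async for line in ndjson_lines:
        chunk = json.loads(line)
        delta = {"choices": [{"delta": {"content": chunk.get("response", "")}}]}
        yield f"data: {json.dumps(delta)}\n\n"
        if chunk.get("done"):
            yield "data: [DONE]\n\n"
```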

Security

Three-Layer Authentication

Consumer → Relay
API Key (Bearer Token)
Frontend → Relay
JWT (Google OAuth)
Provider → Relay
JWT + Node Token (WSS Dual Verification)
All external communication over TLS
API Keys stored as SHA-256 hashes only
Three-tier rate limiting: Key / User / Group
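Hash-only key storage can be as simple as the following sketch; the "llmc_" prefix and function names are illustrative:

```python
# SHA-256-only key storage sketch; the "llmc_" prefix is hypothetical.
import hashlib, secrets

def issue_key() -> tuple:
    """Return (plaintext shown to the user once, hash kept in the database)."""
    key = "llmc_" + secrets.token_urlsafe(32)
    return key, hashlib.sha256(key.encode()).hexdigest()

def verify_key(presented: str, stored_hash: str) -> bool:
    digest = hashlib.sha256(presented.encode()).hexdigest()
    return secrets.compare_digest(digest, stored_hash)   # constant-time compare
```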

Who is it for

Who Is LLM Cluster For?

👨‍💻

Small Dev Teams

Pool idle GPUs, unified API for development

80% lower inference cost
🔬

AI Researchers

Switch between models, A/B test different backends

Zero-wait experiments
🎮

Hobbyists

Pool compute with friends to run larger models

4090 × N compute pool
🔒

Privacy-First Teams

Data stays in group, local inference with zero uploads

Fully offline deployable

Compute Sharing Starts Here.

Build your compute mutual-aid circle. Let idle GPUs reach their full potential.