OllamaCloud
Load Balancer

Self-hosted, intelligent proxy for distributing inference requests across multiple OllamaCloud accounts, with predictive load balancing, quota management, and real-time monitoring.


Powerful Features

Smart Load Balancing

Intelligent routing with virtual credit tracking: requests go to accounts with available quota, so rate limits are avoided before they are ever hit.
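The routing idea can be sketched as follows. The account fields and credit numbers here are hypothetical illustrations, not the project's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Account:
    name: str
    quota: int      # virtual credits granted per period
    used: int = 0   # credits consumed so far

    @property
    def remaining(self) -> int:
        return self.quota - self.used

def pick_account(accounts: list[Account]) -> Account:
    """Route to the account with the most virtual credits left,
    skipping any that are exhausted, so a rate limit is never hit
    in the first place."""
    candidates = [a for a in accounts if a.remaining > 0]
    if not candidates:
        raise RuntimeError("all accounts exhausted")
    return max(candidates, key=lambda a: a.remaining)
```

Picking the least-loaded account (rather than round-robin) keeps headroom spread evenly as quotas drain at different rates.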

Real-time Monitoring

Beautiful dashboard with live analytics. Track token usage, model performance, quota status, and account health at a glance.

Auto-Healing Retries

A five-attempt failover chain across accounts: if one account hits its limits, the balancer automatically tries the next, with no user intervention.
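A minimal sketch of that failover chain, assuming a hypothetical send() callable that raises RateLimited when an upstream account is throttled (both names invented for illustration):

```python
class RateLimited(Exception):
    """Raised when an upstream account rejects a request over quota."""

MAX_ATTEMPTS = 5  # length of the failover chain

def forward(request, accounts, send):
    """Try up to MAX_ATTEMPTS accounts in order; first success wins.
    `send(account, request)` stands in for the real upstream call."""
    last_error = None
    for account in accounts[:MAX_ATTEMPTS]:
        try:
            return send(account, request)
        except RateLimited as err:
            last_error = err  # this account is throttled; try the next
    raise RuntimeError("all failover attempts exhausted") from last_error
```

The caller never sees a rate-limit error unless every account in the chain is throttled at once.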

Single Binary

Frontend embedded in the Rust binary. Zero external dependencies, lightning-fast startup, and bulletproof deployment.

Quota Auto-Detection

Automatically detects when accounts reset quotas (hourly/daily/weekly). Real-time recovery without manual intervention.
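The reset bookkeeping might look like this sketch; the period names match the feature description above, but the function and its state handling are assumptions, not the project's code:

```python
from datetime import datetime, timedelta

PERIODS = {
    "hourly": timedelta(hours=1),
    "daily": timedelta(days=1),
    "weekly": timedelta(weeks=1),
}

def maybe_reset(used: int, period: str, last_reset: datetime,
                now: datetime) -> tuple[int, datetime]:
    """If a full quota period has elapsed, zero the usage counter and
    advance the reset timestamp; otherwise return state unchanged."""
    span = PERIODS[period]
    if now - last_reset >= span:
        # snap to the most recent period boundary, not just `now`
        elapsed = (now - last_reset) // span
        return 0, last_reset + elapsed * span
    return used, last_reset
```

Snapping to the period boundary keeps reset times stable even when the check itself runs at irregular intervals.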

Docker Ready

Single-command deployment: just run docker-compose up. You keep complete control over your infrastructure and data.

How It Works

1. Add Accounts

Connect multiple OllamaCloud API keys through the dashboard.

2. Point Your Client

Replace your Ollama endpoint with the load balancer URL.

3. Smart Routing

The load balancer intelligently distributes requests across accounts.

4. Monitor & Scale

Watch real-time analytics and quota status in the dashboard.
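Step 2 amounts to swapping the base URL your client talks to. A hedged sketch using only the standard library; the /api/chat path mirrors Ollama's chat endpoint, the model name is a placeholder, and the port comes from the server output shown in the Quick Start:

```python
import json
import urllib.request

def chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a chat request aimed at the load balancer instead of
    talking to OllamaCloud directly."""
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode()
    return urllib.request.Request(
        f"{base_url}/api/chat",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# Point at the balancer, not the upstream API:
req = chat_request("http://localhost:3000", "llama3", "hello")
```

Because only the base URL changes, existing Ollama-compatible clients need no other modification.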

Quick Start

install.sh
$ git clone https://github.com/lazarcloud/ollamacloud-load-balancer.git
$ cd ollamacloud-load-balancer

$ docker-compose up --build
[+] Running 1/1
 ✓ app  Creating                         0.2s

[+] Building 0.0s (0/0) DOCKER
[+] Building 1.3s (18/22)
...
app  | Server running at http://0.0.0.0:3000

That's it!

Your load balancer is running. Visit the dashboard to add your OllamaCloud accounts and start distributing load across multiple accounts immediately.

No complex configuration needed. No external services required. Just Docker and your OllamaCloud API keys.

Built with Modern Tech

Rust + Axum

High-performance backend with zero-copy routing and async I/O

Svelte 5

Reactive frontend with minimal JavaScript payload

SQLite

Reliable embedded database, no external services

Docker

Containerized deployment with multi-stage builds

Ready to Scale?

Start distributing Ollama load across multiple accounts in minutes.