
Benchmarks

Benchmarks for LiteLLM Gateway (Proxy Server) tested against a fake OpenAI endpoint.

Use this config for testing:

Note: we're currently migrating to aiohttp, which has 10x higher throughput. We recommend using the aiohttp_openai/ provider for load testing.

```yaml
model_list:
  - model_name: "fake-openai-endpoint"
    litellm_params:
      model: aiohttp_openai/any
      api_base: https://your-fake-openai-endpoint.com/chat/completions
      api_key: "test"
```

1 Instance LiteLLM Proxy

| Metric               | LiteLLM Proxy (1 Instance) |
|----------------------|----------------------------|
| Median Latency (ms)  | 110                        |
| RPS                  | 250                        |

Key Findings

  • Single instance: 250 RPS @ 100ms latency
  • 4 LiteLLM instances: 1000 RPS @ 100ms latency

2 Instances

Adding a second instance doubles the RPS while maintaining the 100ms-110ms median latency.

| Metric               | LiteLLM Proxy (2 Instances) |
|----------------------|-----------------------------|
| Median Latency (ms)  | 100                         |
| RPS                  | 500                         |

Logging Callbacks

GCS Bucket Logging

Using GCS Bucket logging has no impact on latency or RPS compared to the basic LiteLLM Proxy.

| Metric               | Basic LiteLLM Proxy | LiteLLM Proxy with GCS Bucket Logging |
|----------------------|---------------------|---------------------------------------|
| RPS                  | 1133.2              | 1137.3                                |
| Median Latency (ms)  | 140                 | 138                                   |
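
For this variant, only the logging callback changes relative to the base config. A minimal sketch of what that might look like, assuming the gcs_bucket callback name and GCS credentials supplied via environment variables (check the LiteLLM logging docs for the exact keys):

```yaml
model_list:
  - model_name: "fake-openai-endpoint"
    litellm_params:
      model: aiohttp_openai/any
      api_base: https://your-fake-openai-endpoint.com/chat/completions
      api_key: "test"

litellm_settings:
  # Assumed callback name; this sketch also expects GCS_BUCKET_NAME and
  # GCS_PATH_SERVICE_ACCOUNT to be set as environment variables.
  callbacks: ["gcs_bucket"]
```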

LangSmith Logging

Using LangSmith has no impact on latency or RPS compared to the basic LiteLLM Proxy.

| Metric               | Basic LiteLLM Proxy | LiteLLM Proxy with LangSmith |
|----------------------|---------------------|------------------------------|
| RPS                  | 1133.2              | 1135                         |
| Median Latency (ms)  | 140                 | 132                          |
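
As with GCS, only the callback changes. A minimal sketch, assuming the langsmith success callback name and a LANGSMITH_API_KEY environment variable (see the LiteLLM LangSmith docs for the exact keys):

```yaml
model_list:
  - model_name: "fake-openai-endpoint"
    litellm_params:
      model: aiohttp_openai/any
      api_base: https://your-fake-openai-endpoint.com/chat/completions
      api_key: "test"

litellm_settings:
  # Assumed callback name; this sketch also expects LANGSMITH_API_KEY
  # to be set as an environment variable.
  success_callback: ["langsmith"]
```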

Locust Settings

  • 2500 users
  • 100 user ramp up
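
Locust load-test scripts are written in Python. The following is a minimal, hypothetical locustfile sketch for driving the proxy's /chat/completions route with the settings above; the host, port (4000 is the proxy's default), and the placeholder API key are illustrative assumptions, not the exact setup used for these benchmarks.

```python
# locustfile.py - hypothetical sketch, not the exact file used for these benchmarks
from locust import HttpUser, task, between


class LiteLLMProxyUser(HttpUser):
    # Small per-user wait between requests
    wait_time = between(0.5, 1)

    @task
    def chat_completion(self):
        # Call the proxy's OpenAI-compatible chat completions route.
        # "sk-1234" is a placeholder key for illustration only.
        self.client.post(
            "/chat/completions",
            json={
                "model": "fake-openai-endpoint",
                "messages": [{"role": "user", "content": "ping"}],
            },
            headers={"Authorization": "Bearer sk-1234"},
        )
```

Run it with the settings above, for example: locust -f locustfile.py --host http://localhost:4000 --users 2500 --spawn-rate 100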