Resiliency
Glide does fallbacks, provider heath tracking seamlessly for you. Other than that, Glide exposes a various of configurations to let you control that resiliency functionality.
Adaptive Health Tracking
Every time your models fail to serve a request, Glide tracks that and uses that information to make decisions about which model to use next. If a model fails below a certain threshold in a period of time (known as the error budget), Glide will consider that model unhealthy and will not use it to serve requests until its error budget recovers.
Here is a list of conditions that are considered as a failure:
- a model returns an error response (e.g. 5xx HTTP status code). This normally implies some temporary issues on the provide side.
- a model request is rate limited (e.g. 429 HTTP status code). Glide will mark the model as unhealthy for a specific time.
- a model request times out. This usually means the provider is overloaded or has a temporary issues.
- a model request fails with an authentication error (e.g. 401 HTTP status code). Glide will mark the model as unhealthy forever.
- a model returns no response (e.g. in case of OpenAI, the
choices
array is empty).
Health Tracking configuration (located on the model item level):
routers.language[].models[].error_budget
(default: "10/m") - the number of errors per second to tolerate before considering the model unhealthy. Supported time units:ms
,s
,m
(minutes),h
.routers.language[].models[].client.timeout
(default: "10s") - the timeout for model requests
routers:
language:
- id: default
models:
- id: openai
error_budget: "10/m" # tolerate not more than 10 failures per minute
client:
timeout: "10s" # wait not longer than 10 seconds to receive a response from the model
openai:
api_key: "${env:OPENAI_API_KEY}"
Fallbacks
Falling back is a part of every routing strategy Glide provides.
In order to leverage automatic fallbacks you need to configure a router with a model pool with more than one model. It may be two and more different providers or the same provider deployed into a different regions (e.g. AWS and Azure).
What model to fall back to in any specific case is defined by various factors like:
- health of each model in a pool (e.g. Glide is not going to fall back to models considered as unhealthy)
- routing strategy (e.g. priority, least latency, etc.)
To minimize latency, Glide falls back right on the first model error to serve a given request with a healthy model as soon as possible.
Retries
Finally, if the whole model pool is considered unhealthy, Glide resorts to retries with exponential backoff optimistically trying to wait a bit to do its best to serve the request.
Retry configuration (located on the router level):
routers.language[].models[].retry.max_retries
(default: 3) - maximum number of retriesrouters.language[].models[].retry.base_multiplier
(default: 2) - base multiplier for exponential backoffrouters.language[].models[].retry.min_delay
(default: 2s) - minimum delay between retriesrouters.language[].models[].retry.max_delay
(default: 5s) - maximum delay between retries
Here is a sample retry configuration:
routers:
language:
- id: default
models:
- id: openai
retry:
max_retries: 3
base_multiplier: 2
min_delay: "2s"
max_delay: "5s"
openai:
api_key: "${env:OPENAI_API_KEY}"