Resiliency

Glide handles fallbacks and provider health tracking for you automatically. It also exposes a set of configuration options that let you control this resiliency functionality.

Adaptive Health Tracking

Every time a model fails to serve a request, Glide records the failure and uses that information to decide which model to use next. If a model fails more often than a configured threshold within a period of time (known as the error budget), Glide considers that model unhealthy and stops using it to serve requests until its error budget recovers.
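The idea of an error budget can be illustrated with a small sliding-window counter. This is only a sketch under stated assumptions: the `ErrorBudget` class, its method names, and the sliding-window approach are hypothetical and are not taken from Glide's implementation, which may track the budget differently.

```python
from collections import deque


class ErrorBudget:
    """Hypothetical sliding-window error budget (not Glide's actual code)."""

    def __init__(self, max_errors: int, window_seconds: float) -> None:
        self.max_errors = max_errors          # e.g. 10 for "10/m"
        self.window_seconds = window_seconds  # e.g. 60.0 for "10/m"
        self._failures = deque()              # timestamps of recent failures

    def _evict(self, now: float) -> None:
        # Drop failures that are older than the window.
        while self._failures and now - self._failures[0] >= self.window_seconds:
            self._failures.popleft()

    def record_failure(self, now: float) -> None:
        self._evict(now)
        self._failures.append(now)

    def is_healthy(self, now: float) -> bool:
        # Healthy while the number of recent failures stays within the budget.
        self._evict(now)
        return len(self._failures) <= self.max_errors


budget = ErrorBudget(max_errors=2, window_seconds=60.0)
for t in (0.0, 1.0, 2.0):
    budget.record_failure(now=t)

print(budget.is_healthy(now=3.0))   # False: 3 failures exceed a budget of 2
print(budget.is_healthy(now=61.0))  # True: the two oldest failures have expired
```

Timestamps are passed in explicitly so the behaviour is easy to follow; a real tracker would read a clock instead.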

The following conditions are considered failures:

A model returns an error response (e.g. 5xx HTTP status code). This usually indicates a temporary issue on the provider side.
A model request is rate limited (e.g. 429 HTTP status code). Glide marks the model as unhealthy for a specific time.
A model request times out. This usually means the provider is overloaded or has a temporary issue.
A model request fails with an authentication error (e.g. 401 HTTP status code). Glide marks the model as unhealthy permanently.
A model returns no response (e.g. in the case of OpenAI, the choices array is empty).
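The failure conditions above can be summarized as a small classification function. This is a minimal sketch: the `Outcome` enum and the `classify` function are hypothetical names introduced for illustration, and the order of the checks is an assumption rather than Glide's documented behaviour.

```python
from enum import Enum
from typing import Optional, Sequence


class Outcome(Enum):
    OK = "ok"
    TRANSIENT_FAILURE = "transient_failure"  # counts against the error budget
    RATE_LIMITED = "rate_limited"            # unhealthy for a specific time
    AUTH_FAILURE = "auth_failure"            # unhealthy permanently


def classify(status: Optional[int], timed_out: bool,
             choices: Sequence[object]) -> Outcome:
    """Map a model response to one of the failure kinds listed above."""
    if timed_out:
        return Outcome.TRANSIENT_FAILURE
    if status == 401:
        return Outcome.AUTH_FAILURE
    if status == 429:
        return Outcome.RATE_LIMITED
    if status is not None and status >= 500:
        return Outcome.TRANSIENT_FAILURE
    if not choices:
        # A successful status with an empty choices array is still a failure.
        return Outcome.TRANSIENT_FAILURE
    return Outcome.OK


print(classify(status=503, timed_out=False, choices=[]))      # Outcome.TRANSIENT_FAILURE
print(classify(status=429, timed_out=False, choices=[]))      # Outcome.RATE_LIMITED
print(classify(status=200, timed_out=False, choices=["hi"]))  # Outcome.OK
```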

Health Tracking configuration (located on the model item level):

routers.language[].models[].error_budget (default: "10/m") - the number of errors to tolerate per time unit before considering the model unhealthy. Supported time units: ms, s, m (minutes), h.
routers.language[].models[].client.timeout (default: "10s") - the timeout for model requests.

routers:
  language:
    - id: default
      models:
        - id: openai
          error_budget: "10/m" # tolerate no more than 10 failures per minute
          client:
            timeout: "10s" # wait no longer than 10 seconds for a response from the model
          openai:
            api_key: "${env:OPENAI_API_KEY}"
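To make the error_budget format concrete, the sketch below splits a value such as "10/m" into an error count and a window length in seconds. The `parse_error_budget` function is a hypothetical helper, not part of Glide, and it assumes the value always has the form count/unit with one of the units listed above.

```python
import re
from typing import Tuple

# Seconds per supported time unit: ms, s, m (minutes), h.
_UNIT_SECONDS = {"ms": 0.001, "s": 1.0, "m": 60.0, "h": 3600.0}
_PATTERN = re.compile(r"^(\d+)/(ms|s|m|h)$")


def parse_error_budget(value: str) -> Tuple[int, float]:
    """Return (max_errors, window_seconds) for a value like "10/m"."""
    match = _PATTERN.match(value.strip())
    if match is None:
        raise ValueError(f"invalid error budget: {value!r}")
    return int(match.group(1)), _UNIT_SECONDS[match.group(2)]


print(parse_error_budget("10/m"))  # (10, 60.0)
print(parse_error_budget("5/s"))   # (5, 1.0)
```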

Fallbacks

Falling back is part of every routing strategy Glide provides.

To use automatic fallbacks, configure a router whose model pool contains more than one model. The pool may contain two or more different providers, or the same provider deployed in different regions or clouds (e.g. AWS and Azure).

Which model Glide falls back to in a given case depends on factors such as:

the health of each model in the pool (Glide does not fall back to models it considers unhealthy)
the routing strategy (e.g. priority, least latency, etc.)

To minimize latency, Glide falls back on the first model error, so that the request is served by a healthy model as soon as possible.
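The fallback behaviour described above can be sketched as a loop over the model pool. This is an illustration only: `Model`, `AllModelsFailed`, and `route_with_fallback` are hypothetical names, the pool order stands in for a priority strategy, and Glide's actual router is more involved.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Model:
    model_id: str
    healthy: bool
    call: Callable[[str], str]  # sends the prompt, returns the reply or raises


class AllModelsFailed(Exception):
    """Raised when no healthy model could serve the request."""


def route_with_fallback(pool: List[Model], prompt: str) -> str:
    for model in pool:           # pool order plays the role of priority
        if not model.healthy:
            continue             # never fall back to an unhealthy model
        try:
            return model.call(prompt)
        except Exception:
            continue             # first error: move on to the next model
    raise AllModelsFailed("no healthy model could serve the request")


def failing(prompt: str) -> str:
    raise TimeoutError("provider timed out")


pool = [
    Model("primary", healthy=True, call=failing),
    Model("unhealthy", healthy=False, call=lambda p: "never used"),
    Model("secondary", healthy=True, call=lambda p: f"secondary: {p}"),
]
print(route_with_fallback(pool, "hello"))  # secondary: hello
```

The request is not retried against the failing model; the loop moves straight to the next healthy candidate, which is what keeps the added latency low.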

Retries

Finally, if the whole model pool is considered unhealthy, Glide resorts to retries with exponential backoff: it waits between attempts in the hope that a model recovers in time to serve the request.

Retry configuration (located on the router level):

routers.language[].retry.max_retries (default: 3) - maximum number of retries
routers.language[].retry.base_multiplier (default: 2) - base multiplier for exponential backoff
routers.language[].retry.min_delay (default: 2s) - minimum delay between retries
routers.language[].retry.max_delay (default: 5s) - maximum delay between retries

Here is a sample retry configuration:

routers:
  language:
    - id: default
      retry: # retry settings live on the router, next to the model pool
        max_retries: 3
        base_multiplier: 2
        min_delay: "2s"
        max_delay: "5s"
      models:
        - id: openai
          openai:
            api_key: "${env:OPENAI_API_KEY}"
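To see how the retry parameters listed above interact, the sketch below computes the delay before each retry. The formula (min_delay multiplied by base_multiplier raised to the attempt number, capped at max_delay) is an assumption about how exponential backoff is commonly implemented, not a statement of Glide's exact algorithm, and `backoff_delays` is a hypothetical helper.

```python
from typing import List


def backoff_delays(max_retries: int = 3, base_multiplier: float = 2,
                   min_delay: float = 2.0, max_delay: float = 5.0) -> List[float]:
    """Delay in seconds before each retry, assuming capped exponential growth."""
    return [
        min(max_delay, min_delay * base_multiplier ** attempt)
        for attempt in range(max_retries)
    ]


print(backoff_delays())  # [2.0, 4.0, 5.0]
```

With the default values, the third delay would grow to 8 seconds but is capped by max_delay at 5 seconds.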
