Providers
Here is the list of supported language providers: OpenAI, Azure OpenAI, Anthropic, Cohere, AWS Bedrock, OctoML, and Ollama.
OpenAI
OpenAI is one of the most popular LLM providers and largely kick-started the current GenAI movement.
Below is an example OpenAI provider config:
routers:
  language:
    - id: default
      models:
        - id: openai
          openai:
            base_url: https://api.openai.com/v1
            chat_endpoint: /chat/completions
            model: gpt-3.5-turbo
            api_key: <YOUR API KEY>
            default_params:
              temperature: 0.8
              top_p: 1
              max_tokens: 100
              n: 1
              frequency_penalty: 0
              presence_penalty: 0
              seed: 42
Here is a list of all supported provider model params:
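A router is not limited to a single model: because models is a list, several entries can be registered under one router. The sketch below pairs two OpenAI models in the default router; it assumes the router's default strategy prefers models in the order they are listed, so the second entry acts as a fallback. The ids, the gpt-4 model name, and the omission of default_params are illustrative assumptions, not values from this page.
routers:
  language:
    - id: default
      models:
        - id: openai-primary
          openai:
            base_url: https://api.openai.com/v1
            chat_endpoint: /chat/completions
            model: gpt-3.5-turbo
            api_key: <YOUR API KEY>
            # default_params omitted for brevity (assumed optional)
        - id: openai-fallback
          openai:
            base_url: https://api.openai.com/v1
            chat_endpoint: /chat/completions
            model: gpt-4
            api_key: <YOUR API KEY>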
Azure OpenAI
Azure OpenAI is an Azure-hosted version of OpenAI models.
Below is an example Azure OpenAI provider config:
routers:
  language:
    - id: default
      models:
        - id: azureopenai
          azureopenai:
            base_url: <YOUR AZURE ENDPOINT>
            chat_endpoint: /chat/completions
            api_version: "2023-05-15"
            model: gpt-3.5-turbo
            api_key: <YOUR API KEY>
            default_params:
              temperature: 0.8
              top_p: 1
              max_tokens: 100
              n: 1
              frequency_penalty: 0
              presence_penalty: 0
              seed: 42
Here is a list of all supported provider model params:
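The language section is itself a list, so a single config file can define several routers, each with its own id, models, and defaults. Below is a minimal sketch of that layout, assuming router ids are free-form strings of your choosing; the creative id and the temperature values are illustrative only.
routers:
  language:
    - id: default
      models:
        - id: openai
          openai:
            base_url: https://api.openai.com/v1
            chat_endpoint: /chat/completions
            model: gpt-3.5-turbo
            api_key: <YOUR API KEY>
            default_params:
              temperature: 0.2
    - id: creative
      models:
        - id: openai
          openai:
            base_url: https://api.openai.com/v1
            chat_endpoint: /chat/completions
            model: gpt-3.5-turbo
            api_key: <YOUR API KEY>
            default_params:
              temperature: 1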
Anthropic
The Anthropic provider does not yet support the streaming chat API.
Anthropic is the company behind the Claude family of models.
Below is an example Anthropic provider config:
routers:
  language:
    - id: default
      models:
        - id: anthropic
          anthropic:
            base_url: https://api.anthropic.com/v1
            api_version: "2023-06-01"
            chat_endpoint: /messages
            model: claude-instant-1.2
            api_key: <YOUR API KEY>
            default_params:
              system: You are a helpful assistant.
              temperature: 1
              max_tokens: 250
Here is a list of all supported provider model params:
Cohere
Cohere is another popular LLM provider that offers great low-latency models.
Here is an example Cohere configuration:
routers:
  language:
    - id: default
      models:
        - id: cohere
          cohere:
            base_url: https://api.cohere.ai/v1
            chat_endpoint: /chat
            model: command-light
            api_key: <YOUR API KEY>
            default_params:
              temperature: 0.3
              p: 0.75
Here is a list of all supported provider model params:
AWS Bedrock
The AWS Bedrock provider does not yet support the streaming chat API.
Below is an example AWS Bedrock provider config:
routers:
  language:
    - id: default
      models:
        - id: bedrock
          bedrock:
            base_url: <YOUR AWS ENDPOINT>
            chat_endpoint: /model
            model: amazon.titan-text-express-v1
            api_key: <YOUR API KEY>
            access_key: <YOUR ACCESS KEY>
            secret_key: <YOUR SECRET KEY>
            aws_region: <YOUR REGION>
            default_params:
              temperature: 0
              top_p: 1
              max_tokens: 512
              stop_sequences: []
Here is a list of all supported provider model params:
OctoML
The OctoML provider does not yet support the streaming chat API.
See the OctoML documentation for the default_params and model names available for OctoML. Specify override values in the config.yaml
file.
Below is an example OctoML provider config:
routers:
  language:
    - id: default
      models:
        - id: octoml
          octoml:
            base_url: https://text.octoai.run/v1
            chat_endpoint: /chat/completions
            model: mistral-7b-instruct-fp16
            api_key: <YOUR API KEY>
            default_params:
              temperature: 1
              top_p: 1
              max_tokens: 100
Here is a list of all supported provider model params:
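Because override values go in config.yaml, a config only needs to list the parameters you actually want to change. The minimal sketch below overrides just the temperature; it assumes any default_params left unspecified keep their default values, which is an assumption rather than something stated on this page.
routers:
  language:
    - id: default
      models:
        - id: octoml
          octoml:
            base_url: https://text.octoai.run/v1
            chat_endpoint: /chat/completions
            model: mistral-7b-instruct-fp16
            api_key: <YOUR API KEY>
            default_params:
              temperature: 0.5  # only this value is overridden; other params assumed to keep their defaults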
Ollama
The Ollama provider does not yet support the streaming chat API.
Ollama is a great way to serve open-source LLMs locally and beyond.
Here is an example Ollama configuration:
routers:
  language:
    - id: default
      models:
        - id: ollama
          ollama:
            base_url: http://localhost:11434
            chat_endpoint: /api/chat
            model: llama3
            default_params:
              temperature: 0.8
              top_p: 0.9
              num_ctx: 2048
              top_k: 40
Here is a list of all supported provider model params:
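Since Ollama serves models on your own machine, one pattern worth sketching is a router that prefers the local model and keeps a hosted provider behind it. This is a hedged sketch, assuming the router's default strategy tries models in listed order and that default_params may be omitted; the model ids are illustrative.
routers:
  language:
    - id: default
      models:
        - id: local-llama
          ollama:
            base_url: http://localhost:11434
            chat_endpoint: /api/chat
            model: llama3
        - id: hosted-fallback
          openai:
            base_url: https://api.openai.com/v1
            chat_endpoint: /chat/completions
            model: gpt-3.5-turbo
            api_key: <YOUR API KEY>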