
Providers Overview

Iris provides a unified interface across multiple LLM and embedding providers. Each provider implements the core.Provider interface, enabling you to switch between services without changing your application code. This abstraction layer handles authentication, request formatting, response parsing, and provider-specific quirks.

Iris’s provider abstraction solves several real-world challenges:

  • Vendor flexibility: Avoid lock-in by coding to an interface, not an implementation
  • Cost optimization: Route requests to different providers based on task complexity
  • Reliability: Implement fallback chains across multiple providers
  • Experimentation: Test new models without code changes
  • Compliance: Meet data residency requirements by routing to specific providers

Supported providers (capabilities covered: chat, streaming, tool calling, reasoning, vision, image generation, embeddings, and reranking):

  • OpenAI
  • Anthropic
  • Gemini
  • xAI
  • Z.ai
  • Perplexity
  • Ollama
  • Hugging Face
  • Voyage AI

Each provider reads its API key from an environment variable:

| Provider | Environment Variable | Format |
| --- | --- | --- |
| OpenAI | OPENAI_API_KEY | sk-... |
| Anthropic | ANTHROPIC_API_KEY | sk-ant-... |
| Gemini | GEMINI_API_KEY or GOOGLE_API_KEY | AI... |
| xAI | XAI_API_KEY | xai-... |
| Z.ai | ZAI_API_KEY | Varies |
| Perplexity | PERPLEXITY_API_KEY | pplx-... |
| Ollama | OLLAMA_HOST (local) or OLLAMA_API_KEY (cloud) | URL or key |
| Hugging Face | HF_TOKEN or HUGGINGFACE_TOKEN | hf_... |
| Voyage AI | VOYAGE_API_KEY | pa-... |

Every provider implements the core.Provider interface:

type Provider interface {
    // ID returns the provider identifier (e.g., "openai", "anthropic")
    ID() string
    // Models returns available models for this provider
    Models() []ModelInfo
    // Supports checks if the provider supports a specific feature
    Supports(feature Feature) bool
    // Chat sends a chat completion request and returns the response
    Chat(ctx context.Context, req *ChatRequest) (*ChatResponse, error)
    // StreamChat sends a streaming chat request and returns a stream
    StreamChat(ctx context.Context, req *ChatRequest) (*ChatStream, error)
}
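
Because application code depends only on this interface, a helper that issues a request can accept any provider. The following is a minimal sketch: the helper name ask is hypothetical, and the ChatRequest, Message, and ChatResponse.Output field names are taken from the examples later on this page.

import (
    "context"

    "github.com/petal-labs/iris/core"
)

// ask sends a single-turn prompt to whichever provider is passed in.
func ask(ctx context.Context, p core.Provider, model, prompt string) (string, error) {
    resp, err := p.Chat(ctx, &core.ChatRequest{
        Model: model,
        Messages: []core.Message{
            {Role: "user", Content: prompt},
        },
    })
    if err != nil {
        return "", err
    }
    return resp.Output, nil
}

Swapping OpenAI for Anthropic or Gemini then changes only how p is constructed, not the call site.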

All providers follow consistent initialization patterns:

import (
    "log"

    "github.com/petal-labs/iris/providers/anthropic"
    "github.com/petal-labs/iris/providers/gemini"
    "github.com/petal-labs/iris/providers/openai"
)

// Create providers with explicit API keys
openaiProvider := openai.New("sk-...")
anthropicProvider := anthropic.New("sk-ant-...")
geminiProvider := gemini.New("AI...")

// Or load keys from environment variables
openaiProvider, err := openai.NewFromEnv()
if err != nil {
    log.Fatal("OPENAI_API_KEY not set:", err)
}
anthropicProvider, err := anthropic.NewFromEnv()
if err != nil {
    log.Fatal("ANTHROPIC_API_KEY not set:", err)
}
geminiProvider, err := gemini.NewFromEnv()
if err != nil {
    log.Fatal("GEMINI_API_KEY or GOOGLE_API_KEY not set:", err)
}

The Iris CLI provides secure API key management:

# Store keys in the encrypted keystore
iris keys set openai
iris keys set anthropic
iris keys set gemini

# List stored keys
iris keys list

# Remove a key
iris keys delete openai

In Go, load a key from the keystore (falling back to environment variables):

provider, err := openai.NewFromKeystore()

All providers support these standard configuration options:

| Option | Description |
| --- | --- |
| WithBaseURL(url) | Override the API endpoint |
| WithHTTPClient(client) | Use a custom *http.Client |
| WithHeader(key, value) | Add custom HTTP headers |
| WithTimeout(duration) | Set request timeout |

provider := openai.New("sk-...",
    openai.WithBaseURL("https://custom-endpoint.example.com/v1"),
    openai.WithTimeout(60 * time.Second),
    openai.WithHeader("X-Custom-Header", "value"),
)

Wrap any provider with core.NewClient to add retry logic, telemetry, and middleware:

import "github.com/petal-labs/iris/core"
provider := openai.New("sk-...")
client := core.NewClient(provider,
core.WithRetryPolicy(core.DefaultRetryPolicy()),
core.WithTelemetry(myTelemetryHook),
core.WithMaxTokens(4096),
)
OptionDescription
WithRetryPolicy(policy)Configure retry behavior
WithTelemetry(hook)Add telemetry/observability
WithMaxTokens(n)Set default max tokens
WithTemperature(t)Set default temperature
WithMiddleware(mw)Add request/response middleware

Before using provider-specific features, verify support:

if provider.Supports(core.FeatureToolCalling) {
    // Safe to use tools
    builder.Tools(myTools...)
}
if provider.Supports(core.FeatureEmbeddings) {
    // Safe to generate embeddings
    resp, err := provider.Embeddings(ctx, req)
}
if provider.Supports(core.FeatureReasoning) {
    // Safe to use reasoning/thinking
    builder.ReasoningEffort(core.ReasoningMedium)
}
if provider.Supports(core.FeatureBuiltInTools) {
    // Safe to use built-in tools like web search
    builder.WebSearch()
}

The available feature constants are:

const (
    FeatureChat                     Feature = "chat"
    FeatureChatStreaming            Feature = "chat_streaming"
    FeatureToolCalling              Feature = "tool_calling"
    FeatureReasoning                Feature = "reasoning"
    FeatureBuiltInTools             Feature = "built_in_tools"
    FeatureEmbeddings               Feature = "embeddings"
    FeatureContextualizedEmbeddings Feature = "contextualized_embeddings"
    FeatureReranking                Feature = "reranking"
)

Switch providers without changing application code:

type ProviderRegistry struct {
    mu        sync.RWMutex
    providers map[string]core.Provider
    current   string
}

func NewProviderRegistry() *ProviderRegistry {
    return &ProviderRegistry{
        providers: make(map[string]core.Provider),
    }
}

func (r *ProviderRegistry) Register(name string, p core.Provider) {
    r.mu.Lock()
    defer r.mu.Unlock()
    r.providers[name] = p
}

func (r *ProviderRegistry) SetCurrent(name string) error {
    r.mu.Lock()
    defer r.mu.Unlock()
    if _, ok := r.providers[name]; !ok {
        return fmt.Errorf("provider %s not registered", name)
    }
    r.current = name
    return nil
}

func (r *ProviderRegistry) Current() core.Provider {
    r.mu.RLock()
    defer r.mu.RUnlock()
    return r.providers[r.current]
}

func ProviderFromEnv() (core.Provider, error) {
    providerName := os.Getenv("LLM_PROVIDER")
    switch providerName {
    case "openai":
        return openai.NewFromEnv()
    case "anthropic":
        return anthropic.NewFromEnv()
    case "gemini":
        return gemini.NewFromEnv()
    case "ollama":
        return ollama.NewLocal(), nil
    default:
        return nil, fmt.Errorf("unknown provider: %s", providerName)
    }
}
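
A brief usage sketch of the registry, reusing the openaiProvider and anthropicProvider variables from the initialization examples above:

// Register providers once at startup, then switch at runtime
// without touching any call sites.
registry := NewProviderRegistry()
registry.Register("openai", openaiProvider)
registry.Register("anthropic", anthropicProvider)

if err := registry.SetCurrent("anthropic"); err != nil {
    log.Fatal(err)
}
client := core.NewClient(registry.Current())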

Implement automatic failover across providers:

type FallbackProvider struct {
    primary   core.Provider
    fallbacks []core.Provider
}

func (f *FallbackProvider) Chat(ctx context.Context, req *core.ChatRequest) (*core.ChatResponse, error) {
    // Try the primary provider first
    resp, err := f.primary.Chat(ctx, req)
    if err == nil {
        return resp, nil
    }
    // Log primary failure
    log.Printf("Primary provider %s failed: %v", f.primary.ID(), err)
    // Try fallbacks in order
    for _, fb := range f.fallbacks {
        resp, err = fb.Chat(ctx, req)
        if err == nil {
            log.Printf("Fallback provider %s succeeded", fb.ID())
            return resp, nil
        }
        log.Printf("Fallback provider %s failed: %v", fb.ID(), err)
    }
    return nil, fmt.Errorf("all providers failed, last error: %w", err)
}

// Usage
fallback := &FallbackProvider{
    primary:   openaiProvider,
    fallbacks: []core.Provider{anthropicProvider, geminiProvider},
}

Route requests to cost-effective providers based on task complexity:

type CostAwareRouter struct {
    cheap    core.Provider // e.g., Ollama, GPT-4o-mini
    standard core.Provider // e.g., GPT-4o, Claude Sonnet
    premium  core.Provider // e.g., GPT-4, Claude Opus
}

func (r *CostAwareRouter) Route(complexity string) core.Provider {
    switch complexity {
    case "simple":
        return r.cheap // Simple classification, short responses
    case "standard":
        return r.standard // Most tasks
    case "complex":
        return r.premium // Complex reasoning, long context
    default:
        return r.standard
    }
}

// Usage
router := &CostAwareRouter{
    cheap:    ollama.NewLocal(),
    standard: openai.New("sk-..."),
    premium:  anthropic.New("sk-ant-..."),
}

// Route based on task
provider := router.Route("simple")
client := core.NewClient(provider)

Query multiple providers simultaneously for comparison or consensus:

// ProviderResult pairs a provider's ID with its response or error.
type ProviderResult struct {
    Provider string
    Response *core.ChatResponse
    Error    error
}

func QueryAll(ctx context.Context, providers []core.Provider, req *core.ChatRequest) []ProviderResult {
    results := make(chan ProviderResult, len(providers))
    for _, p := range providers {
        go func(provider core.Provider) {
            resp, err := provider.Chat(ctx, req)
            results <- ProviderResult{
                Provider: provider.ID(),
                Response: resp,
                Error:    err,
            }
        }(p)
    }
    var all []ProviderResult
    for range providers {
        all = append(all, <-results)
    }
    return all
}

Combine responses from multiple providers:

func Ensemble(ctx context.Context, providers []core.Provider, req *core.ChatRequest) (string, error) {
    results := QueryAll(ctx, providers, req)
    var responses []string
    for _, r := range results {
        if r.Error == nil {
            responses = append(responses, r.Response.Output)
        }
    }
    if len(responses) == 0 {
        return "", fmt.Errorf("all providers failed")
    }
    // Use another LLM to synthesize the responses
    synthesisReq := &core.ChatRequest{
        Model: "gpt-4o",
        Messages: []core.Message{
            {Role: "system", Content: "Synthesize these responses into a single coherent answer."},
            {Role: "user", Content: strings.Join(responses, "\n\n---\n\n")},
        },
    }
    resp, err := providers[0].Chat(ctx, synthesisReq)
    if err != nil {
        return responses[0], nil // Fall back to the first response
    }
    return resp.Output, nil
}

All providers are safe for concurrent use after construction:

provider := openai.New("sk-...")
client := core.NewClient(provider)
// Safe to use from multiple goroutines
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
wg.Add(1)
go func(id int) {
defer wg.Done()
resp, err := client.Chat("gpt-4o").
User(fmt.Sprintf("Request %d", id)).
GetResponse(ctx)
// Handle response
}(i)
}
wg.Wait()

Never hardcode API keys:

// Good
provider, err := openai.NewFromEnv()
// Bad - don't commit API keys
provider := openai.New("sk-abc123...")

Always set appropriate timeouts:

provider := openai.New(key,
    openai.WithTimeout(30 * time.Second),
)

// Or use a context deadline
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
resp, err := client.Chat("gpt-4o").GetResponse(ctx)

Inspect typed errors to decide how to recover:

resp, err := client.Chat(model).User(prompt).GetResponse(ctx)
if err != nil {
    var apiErr *core.APIError
    if errors.As(err, &apiErr) {
        switch apiErr.StatusCode {
        case 429:
            // Rate limited - back off and retry
        case 401:
            // Invalid API key
        case 503:
            // Service unavailable - try fallback
        }
    }
    return err
}

And check feature support before relying on provider-specific capabilities:

func ProcessWithReasoning(p core.Provider) error {
    if !p.Supports(core.FeatureReasoning) {
        return fmt.Errorf("provider %s does not support reasoning", p.ID())
    }
    // Proceed with the reasoning request
    return nil
}

Match model capabilities to task requirements:

| Use Case | Recommended Models |
| --- | --- |
| Simple chat | gpt-4o-mini, claude-3-haiku, gemini-1.5-flash |
| Complex reasoning | gpt-4o, claude-sonnet-4, gemini-1.5-pro |
| Code generation | claude-sonnet-4, gpt-4o |
| Long context | gemini-1.5-pro (1M tokens), claude-3 (200K tokens) |
| Real-time info | Perplexity sonar models |
| Local/private | Ollama with llama3, mistral |
| Embeddings | voyage-3-large, text-embedding-3-large |
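
One way to apply the table in code is a small mapping helper. This is only a sketch: the function name modelFor and the task labels are hypothetical, while the model identifiers come from the table above.

// modelFor maps a coarse task category to a default model from the table above.
// The task labels here are illustrative, not part of the Iris API.
func modelFor(task string) string {
    switch task {
    case "simple-chat":
        return "gpt-4o-mini"
    case "reasoning":
        return "claude-sonnet-4"
    case "long-context":
        return "gemini-1.5-pro"
    case "embeddings":
        return "voyage-3-large"
    default:
        return "gpt-4o"
    }
}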

OpenAI

GPT-4o, DALL-E, embeddings, and the Responses API. OpenAI →

Anthropic

Claude models with extended thinking and vision. Anthropic →

Gemini

Google’s multimodal models with long context. Gemini →

xAI

Grok models with real-time knowledge. xAI →

Z.ai

GLM models with multilingual support. Z.ai →

Perplexity

Search-augmented models for real-time information. Perplexity →

Ollama

Local models for privacy and offline use. Ollama →