Iris provides a unified interface across multiple LLM and embedding providers. Each provider implements
the core.Provider interface, enabling you to switch between services without changing your application
code. This abstraction layer handles authentication, request formatting, response parsing, and
provider-specific quirks.
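In practice, switching services means changing only the constructor. A minimal sketch, using the client and builder APIs introduced later in this guide:

```go
// Same call site, different provider: only the constructor changes.
provider := openai.New("sk-...") // or anthropic.New("sk-ant-...")
client := core.NewClient(provider)

resp, err := client.Chat("gpt-4o"). // model ID varies by provider
    User("Summarize this document.").
    GetResponse(ctx)
```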
Iris’s provider abstraction solves several real-world challenges covered later in this guide, including hot-swapping providers, automatic failover, and cost-aware routing. The matrix below shows which features each provider supports:
| Provider | Chat | Streaming | Tools | Reasoning | Vision | Image Gen | Embeddings | Reranking |
|---|---|---|---|---|---|---|---|---|
| OpenAI | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| Anthropic | ✓ | ✓ | ✓ | ✓ | ✓ | | | |
| Gemini | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| xAI | ✓ | ✓ | ✓ | ✓ | ✓ | | | |
| Z.ai | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | |
| Perplexity | ✓ | ✓ | ✓ | ✓ | | | | |
| Ollama | ✓ | ✓ | ✓ | ✓ | ✓ | | ✓ | |
| Hugging Face | ✓ | ✓ | ✓ | ✓ | | | | |
| Voyage AI | | | | | | | ✓ | ✓ |
Each provider reads its credentials from an environment variable:

| Provider | Environment Variable | Key Format |
|---|---|---|
| OpenAI | OPENAI_API_KEY | sk-... |
| Anthropic | ANTHROPIC_API_KEY | sk-ant-... |
| Gemini | GEMINI_API_KEY or GOOGLE_API_KEY | AI... |
| xAI | XAI_API_KEY | xai-... |
| Z.ai | ZAI_API_KEY | Varies |
| Perplexity | PERPLEXITY_API_KEY | pplx-... |
| Ollama | OLLAMA_HOST (local) or OLLAMA_API_KEY (cloud) | URL or key |
| Hugging Face | HF_TOKEN or HUGGINGFACE_TOKEN | hf_... |
| Voyage AI | VOYAGE_API_KEY | pa-... |
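Keys are typically exported in the shell or injected by your deployment environment, for example:

```bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="AI..."
```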
Every provider implements the core.Provider interface:
```go
type Provider interface {
    // ID returns the provider identifier (e.g., "openai", "anthropic")
    ID() string

    // Models returns available models for this provider
    Models() []ModelInfo

    // Supports checks if the provider supports a specific feature
    Supports(feature Feature) bool

    // Chat sends a chat completion request and returns the response
    Chat(ctx context.Context, req *ChatRequest) (*ChatResponse, error)

    // StreamChat sends a streaming chat request and returns a stream
    StreamChat(ctx context.Context, req *ChatRequest) (*ChatStream, error)
}
```
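Anything that satisfies these five methods can be plugged into Iris. For instance, a stub provider for tests might look like the sketch below; only the ChatResponse.Output field is taken from usage shown later in this guide, and the rest is assumed:

```go
// fakeProvider is a hypothetical stub that satisfies core.Provider.
type fakeProvider struct{}

func (fakeProvider) ID() string                   { return "fake" }
func (fakeProvider) Models() []core.ModelInfo     { return nil }
func (fakeProvider) Supports(f core.Feature) bool { return f == core.FeatureChat }

// Chat returns a canned reply for deterministic tests.
func (fakeProvider) Chat(ctx context.Context, req *core.ChatRequest) (*core.ChatResponse, error) {
    return &core.ChatResponse{Output: "canned reply"}, nil
}

// StreamChat is unsupported in this stub.
func (fakeProvider) StreamChat(ctx context.Context, req *core.ChatRequest) (*core.ChatStream, error) {
    return nil, fmt.Errorf("fake provider does not support streaming")
}
```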
All providers follow consistent initialization patterns:

```go
import (
    "github.com/petal-labs/iris/providers/openai"
    "github.com/petal-labs/iris/providers/anthropic"
    "github.com/petal-labs/iris/providers/gemini"
)

// Create providers with API keys
openaiProvider := openai.New("sk-...")
anthropicProvider := anthropic.New("sk-ant-...")
geminiProvider := gemini.New("AI...")

// Load from environment variables
openaiProvider, err := openai.NewFromEnv()
if err != nil {
    log.Fatal("OPENAI_API_KEY not set:", err)
}

anthropicProvider, err := anthropic.NewFromEnv()
if err != nil {
    log.Fatal("ANTHROPIC_API_KEY not set:", err)
}

geminiProvider, err := gemini.NewFromEnv()
if err != nil {
    log.Fatal("GEMINI_API_KEY or GOOGLE_API_KEY not set:", err)
}
```

The Iris CLI provides secure API key management:
```bash
# Store keys in the encrypted keystore
iris keys set openai
iris keys set anthropic
iris keys set gemini

# List stored keys
iris keys list

# Remove a key
iris keys delete openai
```

```go
// Load from keystore (falls back to environment)
provider, err := openai.NewFromKeystore()
```

All providers support these standard configuration options:
| Option | Description |
|---|---|
| WithBaseURL(url) | Override the API endpoint |
| WithHTTPClient(client) | Use a custom *http.Client |
| WithHeader(key, value) | Add custom HTTP headers |
| WithTimeout(duration) | Set request timeout |
```go
provider := openai.New("sk-...",
    openai.WithBaseURL("https://custom-endpoint.example.com/v1"),
    openai.WithTimeout(60*time.Second),
    openai.WithHeader("X-Custom-Header", "value"),
)
```

Wrap any provider with core.NewClient to add retry logic, telemetry, and middleware:
import "github.com/petal-labs/iris/core"
provider := openai.New("sk-...")client := core.NewClient(provider, core.WithRetryPolicy(core.DefaultRetryPolicy()), core.WithTelemetry(myTelemetryHook), core.WithMaxTokens(4096),)| Option | Description |
|---|---|
| WithRetryPolicy(policy) | Configure retry behavior |
| WithTelemetry(hook) | Add telemetry/observability |
| WithMaxTokens(n) | Set default max tokens |
| WithTemperature(t) | Set default temperature |
| WithMiddleware(mw) | Add request/response middleware |
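As a usage sketch, defaults set on the client apply to every request made through it (options taken from the table above; per-request precedence is an assumption):

```go
// Client-level defaults for all requests made through this client.
client := core.NewClient(provider,
    core.WithMaxTokens(1024),
    core.WithTemperature(0.2),
)
```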
Before using provider-specific features, verify support:
```go
if provider.Supports(core.FeatureToolCalling) {
    // Safe to use tools
    builder.Tools(myTools...)
}

if provider.Supports(core.FeatureEmbeddings) {
    // Safe to generate embeddings
    resp, err := provider.Embeddings(ctx, req)
}

if provider.Supports(core.FeatureReasoning) {
    // Safe to use reasoning/thinking
    builder.ReasoningEffort(core.ReasoningMedium)
}

if provider.Supports(core.FeatureBuiltInTools) {
    // Safe to use built-in tools like web search
    builder.WebSearch()
}
```

The full set of feature constants:

```go
const (
    FeatureChat                     Feature = "chat"
    FeatureChatStreaming            Feature = "chat_streaming"
    FeatureToolCalling              Feature = "tool_calling"
    FeatureReasoning                Feature = "reasoning"
    FeatureBuiltInTools             Feature = "built_in_tools"
    FeatureEmbeddings               Feature = "embeddings"
    FeatureContextualizedEmbeddings Feature = "contextualized_embeddings"
    FeatureReranking                Feature = "reranking"
)
```
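Feature checks compose naturally with the builder API. Here is a hedged sketch of a helper that enables reasoning only where supported; the *core.Client type name and the builder's chaining behavior are assumed from the examples in this guide:

```go
// ask enables reasoning only on providers that support it,
// falling back to a plain chat request otherwise (sketch).
func ask(ctx context.Context, client *core.Client, p core.Provider, model, prompt string) (string, error) {
    b := client.Chat(model).User(prompt)
    if p.Supports(core.FeatureReasoning) {
        b = b.ReasoningEffort(core.ReasoningMedium)
    }
    resp, err := b.GetResponse(ctx)
    if err != nil {
        return "", err
    }
    return resp.Output, nil
}
```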
Switch providers without changing application code:

```go
type ProviderRegistry struct {
    mu        sync.RWMutex
    providers map[string]core.Provider
    current   string
}

func NewProviderRegistry() *ProviderRegistry {
    return &ProviderRegistry{
        providers: make(map[string]core.Provider),
    }
}

func (r *ProviderRegistry) Register(name string, p core.Provider) {
    r.mu.Lock()
    defer r.mu.Unlock()
    r.providers[name] = p
}

func (r *ProviderRegistry) SetCurrent(name string) error {
    r.mu.Lock()
    defer r.mu.Unlock()
    if _, ok := r.providers[name]; !ok {
        return fmt.Errorf("provider %s not registered", name)
    }
    r.current = name
    return nil
}

func (r *ProviderRegistry) Current() core.Provider {
    r.mu.RLock()
    defer r.mu.RUnlock()
    return r.providers[r.current]
}

func ProviderFromEnv() (core.Provider, error) {
    providerName := os.Getenv("LLM_PROVIDER")

    switch providerName {
    case "openai":
        return openai.NewFromEnv()
    case "anthropic":
        return anthropic.NewFromEnv()
    case "gemini":
        return gemini.NewFromEnv()
    case "ollama":
        return ollama.NewLocal(), nil
    default:
        return nil, fmt.Errorf("unknown provider: %s", providerName)
    }
}
```
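A brief usage sketch tying these together, reusing the provider variables constructed earlier:

```go
// Register providers once at startup, then swap at runtime.
registry := NewProviderRegistry()
registry.Register("openai", openaiProvider)
registry.Register("anthropic", anthropicProvider)
if err := registry.SetCurrent("openai"); err != nil {
    log.Fatal(err)
}
client := core.NewClient(registry.Current())

// Or pick the provider from LLM_PROVIDER.
provider, err := ProviderFromEnv()
if err != nil {
    log.Fatal(err)
}
client = core.NewClient(provider)
```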
Implement automatic failover across providers:

```go
type FallbackProvider struct {
    primary   core.Provider
    fallbacks []core.Provider
}

func (f *FallbackProvider) Chat(ctx context.Context, req *core.ChatRequest) (*core.ChatResponse, error) {
    // Try primary provider
    resp, err := f.primary.Chat(ctx, req)
    if err == nil {
        return resp, nil
    }

    // Log primary failure
    log.Printf("Primary provider %s failed: %v", f.primary.ID(), err)

    // Try fallbacks in order
    for _, fb := range f.fallbacks {
        resp, err = fb.Chat(ctx, req)
        if err == nil {
            log.Printf("Fallback provider %s succeeded", fb.ID())
            return resp, nil
        }
        log.Printf("Fallback provider %s failed: %v", fb.ID(), err)
    }

    return nil, fmt.Errorf("all providers failed, last error: %w", err)
}
```
```go
// Usage
fallback := &FallbackProvider{
    primary:   openaiProvider,
    fallbacks: []core.Provider{anthropicProvider, geminiProvider},
}
```
Route requests to cost-effective providers based on task complexity:

```go
type CostAwareRouter struct {
    cheap    core.Provider // e.g., Ollama, GPT-4o-mini
    standard core.Provider // e.g., GPT-4o, Claude Sonnet
    premium  core.Provider // e.g., GPT-4, Claude Opus
}

func (r *CostAwareRouter) Route(complexity string) core.Provider {
    switch complexity {
    case "simple":
        return r.cheap // Simple classification, short responses
    case "standard":
        return r.standard // Most tasks
    case "complex":
        return r.premium // Complex reasoning, long context
    default:
        return r.standard
    }
}

// Usage
router := &CostAwareRouter{
    cheap:    ollama.NewLocal(),
    standard: openai.New("sk-..."),
    premium:  anthropic.New("sk-ant-..."),
}

// Route based on task
provider := router.Route("simple")
client := core.NewClient(provider)
```
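How you score complexity is up to you; one naive, purely illustrative heuristic is prompt length:

```go
// classify is a hypothetical complexity scorer based on prompt length.
func classify(prompt string) string {
    switch {
    case len(prompt) < 200:
        return "simple"
    case len(prompt) < 2000:
        return "standard"
    default:
        return "complex"
    }
}

provider := router.Route(classify(userPrompt))
```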
Query multiple providers simultaneously for comparison or consensus:

```go
// ProviderResult pairs a provider's ID with its response or error
// (minimal definition inferred from its usage below).
type ProviderResult struct {
    Provider string
    Response *core.ChatResponse
    Error    error
}

func QueryAll(ctx context.Context, providers []core.Provider, req *core.ChatRequest) []ProviderResult {
    results := make(chan ProviderResult, len(providers))

    for _, p := range providers {
        go func(provider core.Provider) {
            resp, err := provider.Chat(ctx, req)
            results <- ProviderResult{
                Provider: provider.ID(),
                Response: resp,
                Error:    err,
            }
        }(p)
    }

    var all []ProviderResult
    for range providers {
        all = append(all, <-results)
    }
    return all
}
```
Combine responses from multiple providers:

```go
func Ensemble(ctx context.Context, providers []core.Provider, req *core.ChatRequest) (string, error) {
    results := QueryAll(ctx, providers, req)

    var responses []string
    for _, r := range results {
        if r.Error == nil {
            responses = append(responses, r.Response.Output)
        }
    }

    if len(responses) == 0 {
        return "", fmt.Errorf("all providers failed")
    }

    // Use another LLM to synthesize responses
    synthesisReq := &core.ChatRequest{
        Model: "gpt-4o",
        Messages: []core.Message{
            {Role: "system", Content: "Synthesize these responses into a single coherent answer."},
            {Role: "user", Content: strings.Join(responses, "\n\n---\n\n")},
        },
    }

    resp, err := providers[0].Chat(ctx, synthesisReq)
    if err != nil {
        return responses[0], nil // Fall back to first response
    }
    return resp.Output, nil
}
```
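Usage mirrors QueryAll. Note that the synthesis step above sends a gpt-4o request to providers[0], so place an OpenAI provider first if you keep that default:

```go
answer, err := Ensemble(ctx,
    []core.Provider{openaiProvider, anthropicProvider, geminiProvider},
    req,
)
if err != nil {
    log.Fatal(err)
}
fmt.Println(answer)
```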
All providers are safe for concurrent use after construction:

```go
provider := openai.New("sk-...")
client := core.NewClient(provider)

// Safe to use from multiple goroutines
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
    wg.Add(1)
    go func(id int) {
        defer wg.Done()
        resp, err := client.Chat("gpt-4o").
            User(fmt.Sprintf("Request %d", id)).
            GetResponse(ctx)
        if err != nil {
            log.Printf("request %d failed: %v", id, err)
            return
        }
        _ = resp // Handle response
    }(i)
}
wg.Wait()
```

Never hardcode API keys:
```go
// Good
provider, err := openai.NewFromEnv()

// Bad - don't commit API keys
provider := openai.New("sk-abc123...")
```

Always set appropriate timeouts:
```go
provider := openai.New(key,
    openai.WithTimeout(30*time.Second),
)

// Or use context
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
resp, err := client.Chat("gpt-4o").GetResponse(ctx)
```

Inspect typed API errors to distinguish rate limits, bad credentials, and outages:

```go
resp, err := client.Chat(model).User(prompt).GetResponse(ctx)
if err != nil {
    var apiErr *core.APIError
    if errors.As(err, &apiErr) {
        switch apiErr.StatusCode {
        case 429:
            // Rate limited - back off and retry
        case 401:
            // Invalid API key
        case 503:
            // Service unavailable - try fallback
        }
    }
    return err
}
```

Verify feature support before relying on provider-specific capabilities:

```go
func ProcessWithReasoning(p core.Provider) error {
    if !p.Supports(core.FeatureReasoning) {
        return fmt.Errorf("provider %s does not support reasoning", p.ID())
    }
    // Proceed with reasoning request
    return nil
}
```

Match model capabilities to task requirements:
| Use Case | Recommended |
|---|---|
| Simple chat | gpt-4o-mini, claude-3-haiku, gemini-1.5-flash |
| Complex reasoning | gpt-4o, claude-sonnet-4, gemini-1.5-pro |
| Code generation | claude-sonnet-4, gpt-4o |
| Long context | gemini-1.5-pro (1M), claude-3 (200K) |
| Real-time info | Perplexity sonar models |
| Local/private | Ollama with llama3, mistral |
| Embeddings | voyage-3-large, text-embedding-3-large |
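A quick sketch applying these recommendations in code; the model IDs come from the table, while the routing helper itself is illustrative:

```go
// modelFor maps a task type to a recommended model from the table above.
func modelFor(task string) string {
    switch task {
    case "simple-chat":
        return "gpt-4o-mini"
    case "code":
        return "claude-sonnet-4"
    case "long-context":
        return "gemini-1.5-pro"
    default:
        return "gpt-4o"
    }
}
```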
OpenAI
GPT-4o, DALL-E, embeddings, and the Responses API. OpenAI →
Anthropic
Claude models with extended thinking and vision. Anthropic →
Gemini
Google’s multimodal models with long context. Gemini →
xAI
Grok models with real-time knowledge. xAI →
Z.ai
GLM models with multilingual support. Z.ai →
Perplexity
Search-augmented models for real-time information. Perplexity →
Ollama
Local models for privacy and offline use. Ollama →
Hugging Face
Thousands of open-source models. Hugging Face →
Voyage AI
Specialized embeddings and reranking. Voyage AI →