Iris provides a unified interface across multiple LLM and embedding providers. Each provider implements
the core.Provider interface, enabling you to switch between services without changing your application
code. This abstraction layer handles authentication, request formatting, response parsing, and
provider-specific quirks.
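In practice, switching services means changing only the constructor. A minimal sketch, using the client and builder APIs introduced later in this guide:

```go
// Same call site, different provider: only the constructor changes.
provider := openai.New("sk-...") // or anthropic.New("sk-ant-...")
client := core.NewClient(provider)

resp, err := client.Chat("gpt-4o"). // model ID varies by provider
    User("Summarize this document.").
    GetResponse(ctx)
```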
Iris’s provider abstraction solves several real-world challenges covered later in this guide, including hot-swapping providers, automatic failover, and cost-aware routing. The matrix below shows which features each provider supports:
| Provider | Chat | Streaming | Tools | Reasoning | Vision | Image Gen | Embeddings | Reranking |
|---|---|---|---|---|---|---|---|---|
| OpenAI | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| Anthropic | ✓ | ✓ | ✓ | ✓ | ✓ | | | |
| Gemini | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | |
| xAI | ✓ | ✓ | ✓ | ✓ | ✓ | | | |
| Z.ai | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | | |
| Perplexity | ✓ | ✓ | ✓ | ✓ | | | | |
| Ollama | ✓ | ✓ | ✓ | ✓ | ✓ | | ✓ | |
| Hugging Face | ✓ | ✓ | ✓ | ✓ | | | | |
| Voyage AI | | | | | | | ✓ | ✓ |
Each provider reads its credentials from an environment variable:

| Provider | Environment Variable | Key Format |
|---|---|---|
| OpenAI | OPENAI_API_KEY | sk-... |
| Anthropic | ANTHROPIC_API_KEY | sk-ant-... |
| Gemini | GEMINI_API_KEY or GOOGLE_API_KEY | AI... |
| xAI | XAI_API_KEY | xai-... |
| Z.ai | ZAI_API_KEY | Varies |
| Perplexity | PERPLEXITY_API_KEY | pplx-... |
| Ollama | OLLAMA_HOST (local) or OLLAMA_API_KEY (cloud) | URL or key |
| Hugging Face | HF_TOKEN or HUGGINGFACE_TOKEN | hf_... |
| Voyage AI | VOYAGE_API_KEY | pa-... |
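Keys are typically exported in the shell or injected by your deployment environment, for example:

```bash
export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export GEMINI_API_KEY="AI..."
```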
Every provider implements the core.Provider interface:
```go
type Provider interface {
    // ID returns the provider identifier (e.g., "openai", "anthropic")
    ID() string

    // Models returns available models for this provider
    Models() []ModelInfo

    // Supports checks if the provider supports a specific feature
    Supports(feature Feature) bool

    // Chat sends a chat completion request and returns the response
    Chat(ctx context.Context, req *ChatRequest) (*ChatResponse, error)

    // StreamChat sends a streaming chat request and returns a stream
    StreamChat(ctx context.Context, req *ChatRequest) (*ChatStream, error)
}
```
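Anything that satisfies these five methods can be plugged into Iris. For instance, a stub provider for tests might look like the sketch below; only the ChatResponse.Output field is taken from usage shown later in this guide, and the rest is assumed:

```go
// fakeProvider is a hypothetical stub that satisfies core.Provider.
type fakeProvider struct{}

func (fakeProvider) ID() string                   { return "fake" }
func (fakeProvider) Models() []core.ModelInfo     { return nil }
func (fakeProvider) Supports(f core.Feature) bool { return f == core.FeatureChat }

// Chat returns a canned reply for deterministic tests.
func (fakeProvider) Chat(ctx context.Context, req *core.ChatRequest) (*core.ChatResponse, error) {
    return &core.ChatResponse{Output: "canned reply"}, nil
}

// StreamChat is unsupported in this stub.
func (fakeProvider) StreamChat(ctx context.Context, req *core.ChatRequest) (*core.ChatStream, error) {
    return nil, fmt.Errorf("fake provider does not support streaming")
}
```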
All providers follow consistent initialization patterns:

```go
import (
    "github.com/petal-labs/iris/providers/openai"
    "github.com/petal-labs/iris/providers/anthropic"
    "github.com/petal-labs/iris/providers/gemini"
)

// Create providers with API keys
openaiProvider := openai.New("sk-...")
anthropicProvider := anthropic.New("sk-ant-...")
geminiProvider := gemini.New("AI...")

// Load from environment variables
openaiProvider, err := openai.NewFromEnv()
if err != nil {
    log.Fatal("OPENAI_API_KEY not set:", err)
}

anthropicProvider, err := anthropic.NewFromEnv()
if err != nil {
    log.Fatal("ANTHROPIC_API_KEY not set:", err)
}

geminiProvider, err := gemini.NewFromEnv()
if err != nil {
    log.Fatal("GEMINI_API_KEY or GOOGLE_API_KEY not set:", err)
}
```

The Iris CLI provides secure API key management:
```bash
# Store keys in the encrypted keystore
iris keys set openai
iris keys set anthropic
iris keys set gemini

# List stored keys
iris keys list

# Remove a key
iris keys delete openai
```

```go
// Load from keystore (falls back to environment)
provider, err := openai.NewFromKeystore()
```

All providers support these standard configuration options:
| Option | Description |
|---|---|
| WithBaseURL(url) | Override the API endpoint |
| WithHTTPClient(client) | Use a custom *http.Client |
| WithHeader(key, value) | Add custom HTTP headers |
| WithTimeout(duration) | Set request timeout |
```go
provider := openai.New("sk-...",
    openai.WithBaseURL("https://custom-endpoint.example.com/v1"),
    openai.WithTimeout(60*time.Second),
    openai.WithHeader("X-Custom-Header", "value"),
)
```

Wrap any provider with core.NewClient to add retry logic, telemetry, and middleware:
import "github.com/petal-labs/iris/core"
provider := openai.New("sk-...")client := core.NewClient(provider, core.WithRetryPolicy(core.DefaultRetryPolicy()), core.WithTelemetry(myTelemetryHook), core.WithMaxTokens(4096),)| Option | Description |
|---|---|
| WithRetryPolicy(policy) | Configure retry behavior |
| WithTelemetry(hook) | Add telemetry/observability |
| WithMaxTokens(n) | Set default max tokens |
| WithTemperature(t) | Set default temperature |
| WithMiddleware(mw) | Add request/response middleware |
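As a usage sketch, defaults set on the client apply to every request made through it (options taken from the table above; per-request precedence is an assumption):

```go
// Client-level defaults for all requests made through this client.
client := core.NewClient(provider,
    core.WithMaxTokens(1024),
    core.WithTemperature(0.2),
)
```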
Before using provider-specific features, verify support:
```go
if provider.Supports(core.FeatureToolCalling) {
    // Safe to use tools
    builder.Tools(myTools...)
}

if provider.Supports(core.FeatureEmbeddings) {
    // Safe to generate embeddings
    resp, err := provider.Embeddings(ctx, req)
}

if provider.Supports(core.FeatureReasoning) {
    // Safe to use reasoning/thinking
    builder.ReasoningEffort(core.ReasoningMedium)
}

if provider.Supports(core.FeatureBuiltInTools) {
    // Safe to use built-in tools like web search
    builder.WebSearch()
}
```

The full set of feature constants:

```go
const (
    FeatureChat                     Feature = "chat"
    FeatureChatStreaming            Feature = "chat_streaming"
    FeatureToolCalling              Feature = "tool_calling"
    FeatureReasoning                Feature = "reasoning"
    FeatureBuiltInTools             Feature = "built_in_tools"
    FeatureEmbeddings               Feature = "embeddings"
    FeatureContextualizedEmbeddings Feature = "contextualized_embeddings"
    FeatureReranking                Feature = "reranking"
)
```
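Feature checks compose naturally with the builder API. Here is a hedged sketch of a helper that enables reasoning only where supported; the *core.Client type name and the builder's chaining behavior are assumed from the examples in this guide:

```go
// ask enables reasoning only on providers that support it,
// falling back to a plain chat request otherwise (sketch).
func ask(ctx context.Context, client *core.Client, p core.Provider, model, prompt string) (string, error) {
    b := client.Chat(model).User(prompt)
    if p.Supports(core.FeatureReasoning) {
        b = b.ReasoningEffort(core.ReasoningMedium)
    }
    resp, err := b.GetResponse(ctx)
    if err != nil {
        return "", err
    }
    return resp.Output, nil
}
```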
Switch providers without changing application code:

```go
type ProviderRegistry struct {
    mu        sync.RWMutex
    providers map[string]core.Provider
    current   string
}

func NewProviderRegistry() *ProviderRegistry {
    return &ProviderRegistry{
        providers: make(map[string]core.Provider),
    }
}

func (r *ProviderRegistry) Register(name string, p core.Provider) {
    r.mu.Lock()
    defer r.mu.Unlock()
    r.providers[name] = p
}

func (r *ProviderRegistry) SetCurrent(name string) error {
    r.mu.Lock()
    defer r.mu.Unlock()
    if _, ok := r.providers[name]; !ok {
        return fmt.Errorf("provider %s not registered", name)
    }
    r.current = name
    return nil
}

func (r *ProviderRegistry) Current() core.Provider {
    r.mu.RLock()
    defer r.mu.RUnlock()
    return r.providers[r.current]
}

func ProviderFromEnv() (core.Provider, error) {
    providerName := os.Getenv("LLM_PROVIDER")

    switch providerName {
    case "openai":
        return openai.NewFromEnv()
    case "anthropic":
        return anthropic.NewFromEnv()
    case "gemini":
        return gemini.NewFromEnv()
    case "ollama":
        return ollama.NewLocal(), nil
    default:
        return nil, fmt.Errorf("unknown provider: %s", providerName)
    }
}
```
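A brief usage sketch tying these together, reusing the provider variables constructed earlier:

```go
// Register providers once at startup, then swap at runtime.
registry := NewProviderRegistry()
registry.Register("openai", openaiProvider)
registry.Register("anthropic", anthropicProvider)
if err := registry.SetCurrent("openai"); err != nil {
    log.Fatal(err)
}
client := core.NewClient(registry.Current())

// Or pick the provider from LLM_PROVIDER.
provider, err := ProviderFromEnv()
if err != nil {
    log.Fatal(err)
}
client = core.NewClient(provider)
```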
Implement automatic failover across providers:

```go
type FallbackProvider struct {
    primary   core.Provider
    fallbacks []core.Provider
}

func (f *FallbackProvider) Chat(ctx context.Context, req *core.ChatRequest) (*core.ChatResponse, error) {
    // Try primary provider
    resp, err := f.primary.Chat(ctx, req)
    if err == nil {
        return resp, nil
    }

    // Log primary failure
    log.Printf("Primary provider %s failed: %v", f.primary.ID(), err)

    // Try fallbacks in order
    for _, fb := range f.fallbacks {
        resp, err = fb.Chat(ctx, req)
        if err == nil {
            log.Printf("Fallback provider %s succeeded", fb.ID())
            return resp, nil
        }
        log.Printf("Fallback provider %s failed: %v", fb.ID(), err)
    }

    return nil, fmt.Errorf("all providers failed, last error: %w", err)
}
```
```go
// Usage
fallback := &FallbackProvider{
    primary:   openaiProvider,
    fallbacks: []core.Provider{anthropicProvider, geminiProvider},
}
```
Route requests to cost-effective providers based on task complexity:

```go
type CostAwareRouter struct {
    cheap    core.Provider // e.g., Ollama, GPT-4o-mini
    standard core.Provider // e.g., GPT-4o, Claude Sonnet
    premium  core.Provider // e.g., GPT-4, Claude Opus
}

func (r *CostAwareRouter) Route(complexity string) core.Provider {
    switch complexity {
    case "simple":
        return r.cheap // Simple classification, short responses
    case "standard":
        return r.standard // Most tasks
    case "complex":
        return r.premium // Complex reasoning, long context
    default:
        return r.standard
    }
}

// Usage
router := &CostAwareRouter{
    cheap:    ollama.NewLocal(),
    standard: openai.New("sk-..."),
    premium:  anthropic.New("sk-ant-..."),
}

// Route based on task
provider := router.Route("simple")
client := core.NewClient(provider)
```
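How you score complexity is up to you; one naive, purely illustrative heuristic is prompt length:

```go
// classify is a hypothetical complexity scorer based on prompt length.
func classify(prompt string) string {
    switch {
    case len(prompt) < 200:
        return "simple"
    case len(prompt) < 2000:
        return "standard"
    default:
        return "complex"
    }
}

provider := router.Route(classify(userPrompt))
```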
Query multiple providers simultaneously for comparison or consensus:

```go
// ProviderResult pairs a provider's ID with its response or error
// (minimal definition inferred from its usage below).
type ProviderResult struct {
    Provider string
    Response *core.ChatResponse
    Error    error
}

func QueryAll(ctx context.Context, providers []core.Provider, req *core.ChatRequest) []ProviderResult {
    results := make(chan ProviderResult, len(providers))

    for _, p := range providers {
        go func(provider core.Provider) {
            resp, err := provider.Chat(ctx, req)
            results <- ProviderResult{
                Provider: provider.ID(),
                Response: resp,
                Error:    err,
            }
        }(p)
    }

    var all []ProviderResult
    for range providers {
        all = append(all, <-results)
    }
    return all
}
```
Combine responses from multiple providers:

```go
func Ensemble(ctx context.Context, providers []core.Provider, req *core.ChatRequest) (string, error) {
    results := QueryAll(ctx, providers, req)

    var responses []string
    for _, r := range results {
        if r.Error == nil {
            responses = append(responses, r.Response.Output)
        }
    }

    if len(responses) == 0 {
        return "", fmt.Errorf("all providers failed")
    }

    // Use another LLM to synthesize responses
    synthesisReq := &core.ChatRequest{
        Model: "gpt-4o",
        Messages: []core.Message{
            {Role: "system", Content: "Synthesize these responses into a single coherent answer."},
            {Role: "user", Content: strings.Join(responses, "\n\n---\n\n")},
        },
    }

    resp, err := providers[0].Chat(ctx, synthesisReq)
    if err != nil {
        return responses[0], nil // Fall back to first response
    }
    return resp.Output, nil
}
```
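Usage mirrors QueryAll. Note that the synthesis step above sends a gpt-4o request to providers[0], so place an OpenAI provider first if you keep that default:

```go
answer, err := Ensemble(ctx,
    []core.Provider{openaiProvider, anthropicProvider, geminiProvider},
    req,
)
if err != nil {
    log.Fatal(err)
}
fmt.Println(answer)
```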
All providers are safe for concurrent use after construction:

```go
provider := openai.New("sk-...")
client := core.NewClient(provider)

// Safe to use from multiple goroutines
var wg sync.WaitGroup
for i := 0; i < 10; i++ {
    wg.Add(1)
    go func(id int) {
        defer wg.Done()
        resp, err := client.Chat("gpt-4o").
            User(fmt.Sprintf("Request %d", id)).
            GetResponse(ctx)
        if err != nil {
            log.Printf("request %d failed: %v", id, err)
            return
        }
        _ = resp // Handle response
    }(i)
}
wg.Wait()
```

Never hardcode API keys:
```go
// Good
provider, err := openai.NewFromEnv()

// Bad - don't commit API keys
provider := openai.New("sk-abc123...")
```

Always set appropriate timeouts:
```go
provider := openai.New(key,
    openai.WithTimeout(30*time.Second),
)

// Or use context
ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
defer cancel()
resp, err := client.Chat("gpt-4o").GetResponse(ctx)
```

Inspect typed API errors to distinguish rate limits, bad credentials, and outages:

```go
resp, err := client.Chat(model).User(prompt).GetResponse(ctx)
if err != nil {
    var apiErr *core.APIError
    if errors.As(err, &apiErr) {
        switch apiErr.StatusCode {
        case 429:
            // Rate limited - back off and retry
        case 401:
            // Invalid API key
        case 503:
            // Service unavailable - try fallback
        }
    }
    return err
}
```

Verify feature support before relying on provider-specific capabilities:

```go
func ProcessWithReasoning(p core.Provider) error {
    if !p.Supports(core.FeatureReasoning) {
        return fmt.Errorf("provider %s does not support reasoning", p.ID())
    }
    // Proceed with reasoning request
    return nil
}
```

Match model capabilities to task requirements:
| Use Case | Recommended |
|---|---|
| Simple chat | gpt-4o-mini, claude-3-haiku, gemini-1.5-flash |
| Complex reasoning | gpt-4o, claude-sonnet-4, gemini-1.5-pro |
| Code generation | claude-sonnet-4, gpt-4o |
| Long context | gemini-1.5-pro (1M), claude-3 (200K) |
| Real-time info | Perplexity sonar models |
| Local/private | Ollama with llama3, mistral |
| Embeddings | voyage-3-large, text-embedding-3-large |
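A quick sketch applying these recommendations in code; the model IDs come from the table, while the routing helper itself is illustrative:

```go
// modelFor maps a task type to a recommended model from the table above.
func modelFor(task string) string {
    switch task {
    case "simple-chat":
        return "gpt-4o-mini"
    case "code":
        return "claude-sonnet-4"
    case "long-context":
        return "gemini-1.5-pro"
    default:
        return "gpt-4o"
    }
}
```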
OpenAI
GPT-4o, DALL-E, embeddings, and the Responses API. OpenAI →
Anthropic
Claude models with extended thinking and vision. Anthropic →
Gemini
Google’s multimodal models with long context. Gemini →
xAI
Grok models with real-time knowledge. xAI →
Z.ai
GLM models with multilingual support. Z.ai →
Perplexity
Search-augmented models for real-time information. Perplexity →
Ollama
Local models for privacy and offline use. Ollama →
Hugging Face
Thousands of open-source models. Hugging Face →
Voyage AI
Specialized embeddings and reranking. Voyage AI →