
Gemini

The Gemini provider connects Iris to Google's Gemini model family. Gemini excels at multimodal tasks, long-context processing (up to 2M tokens on Gemini 1.5 Pro), and reasoning, and it offers competitive performance at attractive price points.

package main

import (
    "context"
    "fmt"
    "os"

    "github.com/petal-labs/iris/core"
    "github.com/petal-labs/iris/providers/gemini"
)

func main() {
    provider := gemini.New(os.Getenv("GEMINI_API_KEY"))
    client := core.NewClient(provider)

    resp, err := client.Chat("gemini-2.5-pro").
        System("You are a helpful assistant.").
        User("What are Go's strengths for backend services?").
        GetResponse(context.Background())
    if err != nil {
        panic(err)
    }

    fmt.Println(resp.Output)
}
Authentication requires a Gemini API key:
# Store in the encrypted keystore (recommended)
iris keys set gemini
# Prompts for: Enter API key for gemini: AI...
import "github.com/petal-labs/iris/providers/gemini"
// From an API key string
provider := gemini.New("AI...")
// From GEMINI_API_KEY or GOOGLE_API_KEY environment variable
provider, err := gemini.NewFromEnv()
if err != nil {
log.Fatal("GEMINI_API_KEY or GOOGLE_API_KEY not set:", err)
}
// From the Iris keystore (falls back to environment)
provider, err := gemini.NewFromKeystore()
Option | Description | Default
WithBaseURL(url) | Override the API base URL | https://generativelanguage.googleapis.com/v1beta
WithHTTPClient(client) | Use a custom *http.Client | Default client
WithHeader(key, value) | Add a custom HTTP header | None
WithTimeout(duration) | Set the request timeout | 60 seconds
provider := gemini.New("AI...",
    gemini.WithTimeout(90 * time.Second),
    gemini.WithHeader("X-Custom-Header", "value"),
)
Feature | Supported | Notes
Chat | ✅ | All Gemini models
Streaming | ✅ | Real-time token streaming
Tool calling | ✅ | Function calling with JSON Schema
Vision | ✅ | Image and video analysis
Reasoning | ✅ | Thinking mode with Gemini 2.0+
Image generation | ✅ | Imagen integration
Embeddings | ✅ | text-embedding-004
Code execution | ✅ | Built-in code interpreter
Grounding | ✅ | Google Search grounding
Gemini 2.5
Model | Context | Best For
gemini-2.5-pro | 1M | Complex reasoning, long documents
gemini-2.5-flash | 1M | Fast, cost-effective

Gemini 2.0
Model | Context | Best For
gemini-2.0-pro | 1M | Production workloads
gemini-2.0-flash | 1M | Speed and efficiency
gemini-2.0-flash-thinking | 32K | Explicit reasoning

Gemini 1.5
Model | Context | Best For
gemini-1.5-pro | 2M | Maximum context window
gemini-1.5-flash | 1M | Balanced performance
gemini-1.5-flash-8b | 1M | Most cost-effective

Embedding models
Model | Dimensions | Best For
text-embedding-004 | 768 | General purpose embeddings
embedding-001 | 768 | Legacy embeddings
Send a chat request with sampling parameters:

resp, err := client.Chat("gemini-2.5-pro").
    System("You are a helpful coding assistant.").
    User("Explain Go's interface composition pattern.").
    Temperature(0.3).
    MaxTokens(1000).
    GetResponse(ctx)
if err != nil {
    log.Fatal(err)
}

fmt.Println(resp.Output)
fmt.Printf("Tokens: %d input, %d output\n", resp.Usage.InputTokens, resp.Usage.OutputTokens)

Stream responses for real-time output:

stream, err := client.Chat("gemini-2.5-flash").
    System("You are a helpful assistant.").
    User("Explain Go's memory model.").
    GetStream(ctx)
if err != nil {
    log.Fatal(err)
}

for chunk := range stream.Ch {
    fmt.Print(chunk.Content)
}
fmt.Println()

// Check for streaming errors
if err := <-stream.Err; err != nil {
    log.Fatal(err)
}

// Get final response with usage stats
final := <-stream.Final
fmt.Printf("Total tokens: %d\n", final.Usage.TotalTokens)

Gemini has strong multimodal capabilities for analyzing images and videos:

// Image from URL
resp, err := client.Chat("gemini-2.5-pro").
    System("You are a helpful image analyst.").
    UserMultimodal().
    Text("What's in this image? Describe it in detail.").
    ImageURL("https://example.com/photo.jpg").
    Done().
    GetResponse(ctx)

// Image from a local file, base64-encoded
imageData, err := os.ReadFile("diagram.png")
if err != nil {
    log.Fatal(err)
}
base64Data := base64.StdEncoding.EncodeToString(imageData)

resp, err := client.Chat("gemini-2.5-pro").
    UserMultimodal().
    Text("Explain this architecture diagram.").
    ImageBase64(base64Data, "image/png").
    Done().
    GetResponse(ctx)

// Multiple images in one request
resp, err := client.Chat("gemini-2.5-pro").
    UserMultimodal().
    Text("Compare these UI mockups and suggest improvements.").
    ImageURL("https://example.com/mockup-v1.png").
    ImageURL("https://example.com/mockup-v2.png").
    Done().
    GetResponse(ctx)

Gemini can analyze video content:

// Video from URL (YouTube, Google Drive, or direct link)
resp, err := client.Chat("gemini-2.5-pro").
    UserMultimodal().
    Text("Summarize the key points in this video.").
    VideoURL("https://www.youtube.com/watch?v=...").
    Done().
    GetResponse(ctx)
Type | Formats | Max Size
Images | PNG, JPEG, GIF, WebP, HEIC, HEIF | 20 MB
Video | MP4, MOV, MPEG, AVI, MKV, WEBM | 2 GB
Audio | MP3, WAV, AIFF, AAC, OGG, FLAC | 40 MB
Documents | PDF, TXT, HTML, CSS, JS, PY, etc. | 50 MB
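
When loading media from disk, it can help to check the size limit and derive the MIME type before encoding. The helper below is a small convenience sketch, not part of the Iris API; it uses only the standard library (encoding/base64, mime, os, path/filepath) and hard-codes the 20 MB image limit from the table above.

// encodeImage reads a local image, enforces the 20 MB image limit,
// and returns base64 data plus a MIME type suitable for
// ImageBase64(data, mimeType).
func encodeImage(path string) (data, mimeType string, err error) {
    const maxImageBytes = 20 << 20 // 20 MB

    raw, err := os.ReadFile(path)
    if err != nil {
        return "", "", err
    }
    if len(raw) > maxImageBytes {
        return "", "", fmt.Errorf("%s is %d bytes, over the 20 MB image limit", path, len(raw))
    }

    mimeType = mime.TypeByExtension(filepath.Ext(path))
    if mimeType == "" {
        mimeType = "application/octet-stream"
    }
    return base64.StdEncoding.EncodeToString(raw), mimeType, nil
}

// Usage: pass the results to ImageBase64(data, mimeType) as shown above.
data, mimeType, err := encodeImage("diagram.png")
if err != nil {
    log.Fatal(err)
}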

Gemini 2.0+ models support thinking mode for explicit reasoning:

resp, err := client.Chat("gemini-2.0-flash-thinking").
System("You are a world-class problem solver.").
User("Solve this logic puzzle step by step...").
GetResponse(ctx)
// Access the reasoning process
if resp.Thinking != "" {
fmt.Println("=== Reasoning Process ===")
fmt.Println(resp.Thinking)
fmt.Println()
}
fmt.Println("=== Final Answer ===")
fmt.Println(resp.Output)
resp, err := client.Chat("gemini-2.5-pro").
User("Analyze this complex algorithm...").
Thinking(true).
ThinkingBudget(5000). // Max tokens for reasoning
GetResponse(ctx)

Define and use tools with Gemini:

// Define a tool
searchTool := core.Tool{
    Name:        "search_products",
    Description: "Search the product catalog for matching items",
    Parameters: map[string]interface{}{
        "type": "object",
        "properties": map[string]interface{}{
            "query": map[string]interface{}{
                "type":        "string",
                "description": "Search query for products",
            },
            "category": map[string]interface{}{
                "type":        "string",
                "enum":        []string{"electronics", "clothing", "home"},
                "description": "Product category filter",
            },
            "max_price": map[string]interface{}{
                "type":        "number",
                "description": "Maximum price in USD",
            },
        },
        "required": []string{"query"},
    },
}

// First request - Gemini decides to call the tool
resp, err := client.Chat("gemini-2.5-pro").
    System("You are a helpful shopping assistant.").
    User("Find me a wireless keyboard under $100").
    Tools(searchTool).
    GetResponse(ctx)

if len(resp.ToolCalls) > 0 {
    call := resp.ToolCalls[0]

    // Execute the search (your implementation; see the sketch below)
    results := searchProducts(call.Arguments)

    // Continue with the tool result
    finalResp, err := client.Chat("gemini-2.5-pro").
        System("You are a helpful shopping assistant.").
        User("Find me a wireless keyboard under $100").
        Tools(searchTool).
        Assistant(resp.Output).
        ToolCall(call.ID, call.Name, call.Arguments).
        ToolResult(call.ID, results).
        GetResponse(ctx)
    if err != nil {
        log.Fatal(err)
    }
    fmt.Println(finalResp.Output)
}
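
The searchProducts helper above is yours to provide. A minimal sketch, assuming call.Arguments arrives as a JSON-encoded string and that a JSON string is an acceptable value for ToolResult; adapt the types to how Iris actually exposes them:

// searchProducts is a stand-in for your catalog lookup. The argument and
// return types are assumptions for illustration only.
func searchProducts(arguments string) string {
    var args struct {
        Query    string  `json:"query"`
        Category string  `json:"category"`
        MaxPrice float64 `json:"max_price"`
    }
    if err := json.Unmarshal([]byte(arguments), &args); err != nil {
        return `{"error": "invalid arguments"}`
    }

    // Query your real product database here; this returns a canned result.
    results := []map[string]interface{}{
        {"name": "Example Wireless Keyboard", "price": 79.99, "category": args.Category},
    }
    out, _ := json.Marshal(results)
    return string(out)
}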

Gemini can call multiple tools simultaneously:

resp, err := client.Chat("gemini-2.5-pro").
    User("Check the weather in Tokyo and find flights from NYC").
    Tools(weatherTool, flightTool).
    ToolChoice(core.ToolChoiceAuto).
    GetResponse(ctx)

// Process all tool calls
for _, call := range resp.ToolCalls {
    switch call.Name {
    case "get_weather":
        // Handle weather
    case "search_flights":
        // Handle flights
    }
}
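
When Gemini returns several tool calls, you can execute them concurrently before sending the results back. A minimal sketch using sync.WaitGroup; handleWeather and handleFlights are hypothetical stand-ins for your own handlers, and the string result type is an assumption:

var wg sync.WaitGroup
results := make([]string, len(resp.ToolCalls))

for i := range resp.ToolCalls {
    wg.Add(1)
    go func(i int) {
        defer wg.Done()
        call := resp.ToolCalls[i]
        switch call.Name {
        case "get_weather":
            results[i] = handleWeather(call.Arguments) // hypothetical helper
        case "search_flights":
            results[i] = handleFlights(call.Arguments) // hypothetical helper
        }
    }(i)
}
wg.Wait()

// Send each result back with ToolCall/ToolResult on the follow-up request,
// as shown in the single-tool example above.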

Use Gemini’s built-in code interpreter:

resp, err := client.Chat("gemini-2.5-pro").
    User("Calculate the compound interest on $10,000 at 5% for 10 years").
    CodeExecution(true).
    GetResponse(ctx)

// Access executed code and output
if resp.CodeExecution != nil {
    fmt.Println("Code:", resp.CodeExecution.Code)
    fmt.Println("Output:", resp.CodeExecution.Output)
}
fmt.Println("Answer:", resp.Output)

Ground responses in real-time search results:

resp, err := client.Chat("gemini-2.5-pro").
    User("What are the latest Go releases and their features?").
    Grounding(core.GroundingGoogleSearch).
    GetResponse(ctx)

// Access grounding sources
for _, source := range resp.GroundingSources {
    fmt.Printf("Source: %s - %s\n", source.Title, source.URL)
}
fmt.Println("Answer:", resp.Output)

Generate images using Imagen:

resp, err := provider.GenerateImage(ctx, &core.ImageGenerateRequest{
    Model:  "imagen-3.0-generate-001",
    Prompt: "A futuristic city skyline at sunset with flying cars",
    Size:   core.ImageSize1024x1024,
    N:      1,
})
if err != nil {
    log.Fatal(err)
}
fmt.Printf("Image URL: %s\n", resp.Images[0].URL)
Size | Aspect Ratio
1024x1024 | 1:1 (square)
1536x1024 | 3:2 (landscape)
1024x1536 | 2:3 (portrait)
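
The GenerateImage response above returns a URL for each image. A minimal sketch for saving one to disk, assuming the URL is directly fetchable over HTTP (if your deployment returns base64 image data instead, decode that and skip the download); it uses only net/http, io, and os:

// saveImage downloads a generated image URL and writes it to path.
func saveImage(ctx context.Context, url, path string) error {
    req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
    if err != nil {
        return err
    }
    httpResp, err := http.DefaultClient.Do(req)
    if err != nil {
        return err
    }
    defer httpResp.Body.Close()
    if httpResp.StatusCode != http.StatusOK {
        return fmt.Errorf("unexpected status %s", httpResp.Status)
    }

    data, err := io.ReadAll(httpResp.Body)
    if err != nil {
        return err
    }
    return os.WriteFile(path, data, 0o644)
}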

Generate embeddings for semantic search:

resp, err := provider.Embeddings(ctx, &core.EmbeddingRequest{
    Model: "text-embedding-004",
    Input: []core.EmbeddingInput{
        {Text: "Go is a statically typed language."},
        {Text: "Python is dynamically typed."},
    },
})
if err != nil {
    log.Fatal(err)
}
for i, emb := range resp.Embeddings {
    fmt.Printf("Embedding %d: %d dimensions\n", i, len(emb.Values))
}
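
For semantic search, compare embedding vectors with cosine similarity. A small helper using the standard math package, assuming emb.Values is a []float64 as read above (adjust the element type if the SDK uses float32):

// cosineSimilarity returns a value in [-1, 1]; higher means more similar.
func cosineSimilarity(a, b []float64) float64 {
    var dot, normA, normB float64
    for i := range a {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    if normA == 0 || normB == 0 {
        return 0
    }
    return dot / (math.Sqrt(normA) * math.Sqrt(normB))
}

// Example: similarity between the two embeddings from the request above.
score := cosineSimilarity(resp.Embeddings[0].Values, resp.Embeddings[1].Values)
fmt.Printf("Similarity: %.4f\n", score)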

Optimize embeddings for specific tasks:

resp, err := provider.Embeddings(ctx, &core.EmbeddingRequest{
    Model:    "text-embedding-004",
    Input:    inputs,
    TaskType: core.TaskTypeRetrieval, // Optimized for RAG
})
// Task types: RETRIEVAL_QUERY, RETRIEVAL_DOCUMENT, SEMANTIC_SIMILARITY,
// CLASSIFICATION, CLUSTERING

Leverage Gemini’s massive context window:

// Process an entire codebase (see the loader sketch below)
codebase := loadEntireCodebase() // Could be 500K+ tokens
resp, err := client.Chat("gemini-1.5-pro"). // 2M context
    System("You are a senior software architect.").
    User(fmt.Sprintf(`Analyze this entire codebase and provide:
1. Architecture overview
2. Design patterns used
3. Potential improvements
4. Security concerns
Codebase:
%s`, codebase)).
    GetResponse(ctx)

// Analyze a long document
document := loadLegalContract() // 100K+ tokens
resp, err := client.Chat("gemini-2.5-pro").
    System("You are a legal analyst.").
    User(fmt.Sprintf(`Analyze this contract and identify:
1. Key terms and conditions
2. Potential risks
3. Unusual clauses
4. Missing protections
Contract:
%s`, document)).
    GetResponse(ctx)
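
loadEntireCodebase and loadLegalContract above are placeholders for your own loaders. A rough sketch of the codebase loader, walking a directory with the standard library (path/filepath, io/fs, os, strings) and concatenating Go source files; the root path and the .go filter are assumptions to adapt:

func loadEntireCodebase() string {
    var sb strings.Builder
    root := "." // hypothetical: point this at your repository root

    _ = filepath.WalkDir(root, func(path string, d fs.DirEntry, err error) error {
        if err != nil || d.IsDir() || filepath.Ext(path) != ".go" {
            return err
        }
        src, readErr := os.ReadFile(path)
        if readErr != nil {
            return readErr
        }
        // Label each file so the model can reference it by path.
        fmt.Fprintf(&sb, "// File: %s\n%s\n\n", path, src)
        return nil
    })
    return sb.String()
}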

Force JSON output with schema:

type ProductAnalysis struct {
    Name     string   `json:"name"`
    Category string   `json:"category"`
    Features []string `json:"features"`
    Pros     []string `json:"pros"`
    Cons     []string `json:"cons"`
    Rating   float64  `json:"rating"`
}

resp, err := client.Chat("gemini-2.5-pro").
    System("Analyze products and respond in JSON format.").
    User("Analyze the iPhone 15 Pro").
    ResponseFormat(core.ResponseFormatJSON).
    GetResponse(ctx)
if err != nil {
    log.Fatal(err)
}

var analysis ProductAnalysis
if err := json.Unmarshal([]byte(resp.Output), &analysis); err != nil {
    log.Fatal(err)
}
Carry context across turns by replaying the conversation history:

// First turn
resp1, _ := client.Chat("gemini-2.5-pro").
    System("You are a helpful Go programming tutor.").
    User("What are channels in Go?").
    GetResponse(ctx)

// Second turn with history
resp2, _ := client.Chat("gemini-2.5-pro").
    System("You are a helpful Go programming tutor.").
    User("What are channels in Go?").
    Assistant(resp1.Output).
    User("How do buffered channels differ?").
    GetResponse(ctx)

Configure content safety filters:

resp, err := client.Chat("gemini-2.5-pro").
    User(prompt).
    SafetySettings([]core.SafetySetting{
        {Category: core.HarmCategoryHarassment, Threshold: core.BlockMediumAndAbove},
        {Category: core.HarmCategoryDangerousContent, Threshold: core.BlockOnlyHigh},
    }).
    GetResponse(ctx)
resp, err := client.Chat("gemini-2.5-pro").User(prompt).GetResponse(ctx)
if err != nil {
var apiErr *core.APIError
if errors.As(err, &apiErr) {
switch apiErr.StatusCode {
case 400:
log.Printf("Bad request: %s", apiErr.Message)
case 401:
log.Fatal("Invalid API key")
case 403:
log.Printf("API key doesn't have permission: %s", apiErr.Message)
case 429:
log.Printf("Rate limited. Retry after: %s", apiErr.RetryAfter)
case 500, 503:
log.Printf("Gemini service error: %s", apiErr.Message)
default:
log.Printf("API error %d: %s", apiErr.StatusCode, apiErr.Message)
}
return
}
if errors.Is(err, context.DeadlineExceeded) {
log.Println("Request timed out")
} else if errors.Is(err, context.Canceled) {
log.Println("Request canceled")
}
}
Task | Recommended Model
General chat | gemini-2.5-flash
Complex reasoning | gemini-2.5-pro
Long documents | gemini-1.5-pro (2M context)
Fast responses | gemini-2.5-flash
Code analysis | gemini-2.5-pro
Explicit reasoning | gemini-2.0-flash-thinking
// Use smaller context models for simple tasks
resp, err := client.Chat("gemini-2.5-flash"). // Cheaper for short contexts
    User("What is 2+2?").
    GetResponse(ctx)

// Use large context models when needed
resp, err := client.Chat("gemini-1.5-pro"). // 2M context
    User(veryLongDocument).
    GetResponse(ctx)
Configure automatic retries for transient failures:

client := core.NewClient(provider,
    core.WithRetryPolicy(&core.RetryPolicy{
        MaxRetries:        3,
        InitialInterval:   1 * time.Second,
        MaxInterval:       30 * time.Second,
        BackoffMultiplier: 2.0,
        RetryOn:           []int{429, 500, 503},
    }),
)
// For questions about recent events
resp, err := client.Chat("gemini-2.5-pro").
    User("What happened in tech news this week?").
    Grounding(core.GroundingGoogleSearch).
    GetResponse(ctx)
  • NewFromEnv() checks GEMINI_API_KEY first, then falls back to GOOGLE_API_KEY
  • Authentication uses the x-goog-api-key header
  • Gemini 1.5 Pro supports up to 2 million tokens of context
  • Video analysis processes frames and counts against the context window
  • The provider is safe for concurrent use after construction

Tools Guide

Learn advanced tool calling patterns. Tools →

Images Guide

Work with vision and image generation. Images →

Providers Overview

Compare all available providers. Providers →