
README

genai

The opinionated high performance professional-grade AI package for Go.

genai is intentional. Curious why it was created? See the release announcement at maruel.ca/post/genai-v0.1.0.


Features

  • Full functionality: Full access to each backend's specific functionality. Access the raw API if needed, with the full message schema as Go structs.
  • Tool calling via reflection: Tell the LLM to call a tool directly, described as a Go struct. No need to manually fiddle with JSON.
  • Native JSON struct serialization: Pass a struct to tell the LLM what to generate, decode the reply into your struct. No need to manually fiddle with JSON. Supports required fields, enums, descriptions, etc. You can still fiddle if you want to. :)
  • Streaming: Streams the completion reply as the output is generated, including thinking and tool calls, via Go 1.23 iterators.
  • Multi-modal: Process images, PDFs and videos (!) as input or output.
  • Web Search: Search the web to answer your question and cite documents passed in.
  • Smoke testing friendly: Record and play back API calls at the HTTP level to save 💰 and keep tests fast and reproducible, via the exposed HTTP transport (a transport-wrapping sketch follows this list). See example.
  • Rate limits and usage: Parse the provider-specific HTTP headers and JSON response to get the token usage and remaining quota.
  • HTTP header access: Provide access to HTTP headers to enable beta features.
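
Every provider constructor accepts an optional func(http.RoundTripper) http.RoundTripper wrapper (see the custom HTTP header example in the documentation below). As a minimal sketch, assuming a hypothetical loggingTransport type, intercepting calls at the HTTP level looks like this:

// loggingTransport is a hypothetical interceptor; a test recorder would
// persist the request and response instead of logging them.
type loggingTransport struct {
	base http.RoundTripper
}

func (l *loggingTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	log.Printf("%s %s", req.Method, req.URL)
	return l.base.RoundTrip(req)
}

func main() {
	ctx := context.Background()
	wrapper := func(h http.RoundTripper) http.RoundTripper {
		return &loggingTransport{base: h}
	}
	c, err := anthropic.New(ctx, &genai.ProviderOptions{}, wrapper)
	if err != nil {
		log.Fatal(err)
	}
	// Use c as usual; every request now goes through loggingTransport.
	_ = c
}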

Design

  • Safe and strict API implementation. All you love from a statically typed language. The library's smoke tests immediately fail on unknown RPC fields. Error code paths are properly implemented.
  • Stateless. No global state, it is safe to use clients concurrently.
  • Professional grade. Smoke tested on live services with recorded traces located in testdata/ directories, e.g. providers/anthropic/testdata/TestClient/Scoreboard/.
  • Trust, but verify. It generates a scoreboard based on actual behavior from each provider.
  • Optimized for speed. Minimize memory allocations, compress data at the transport layer when possible. Groq, Mistral and OpenAI use brotli for HTTP compression instead of gzip, and POST bodies to Google are gzip-compressed.
  • Lean: Few dependencies. No unnecessary abstraction layer.

Scoreboard

Provider 🌐 Mode ➛In Out➛ Tool JSON Batch File Cite Text Probs Limits Usage Finish
anthropic 🇺🇸 Sync, Stream🧠 💬📄📸 💬 ✅🪨🕸️ 🛑📏
bfl 🇩🇪 Sync 💬 📸 🌱
cerebras 🇺🇸 Sync, Stream🧠 💬 💬 ✅🪨 🌱📏🛑
cloudflare 🇺🇸 Sync, Stream🧠 💬 💬 💨 🌱📏 💨
cohere 🇨🇦 Sync, Stream🧠 💬📸 💬 ✅🪨 🌱📏🛑
deepseek 🇨🇳 Sync, Stream🧠 💬 💬 ✅🪨 ☁️ 📏🛑
gemini 🇺🇸 Sync, Stream🧠 🎤🎥💬📄📸 💬📸 ✅🪨🕸️ 🌱📏🛑
groq 🇺🇸 Sync, Stream🧠 💬📸 💬 ✅🪨🕸️ ☁️ 🌱📏🛑
huggingface 🇺🇸 Sync, Stream🧠 💬 💬 ☁️ 🌱📏🛑
llamacpp 🏠 Sync, Stream 💬📸 💬 ✅🪨 🌱📏🛑
mistral 🇫🇷 Sync, Stream 🎤💬📄📸 💬 ✅🪨 🌱📏🛑
ollama 🏠 Sync, Stream🧠 💬📸 💬 💨 🌱📏🛑
openaichat 🇺🇸 Sync, Stream🧠 🎤💬📄📸 💬📸 ✅🪨🕸️ 🌱📏🛑
openairesponses 🇺🇸 Sync, Stream🧠 💬📄📸 💬📸 ✅🪨🕸️ 🌱
perplexity 🇺🇸 Sync, Stream🧠 💬📸 💬 🕸️ 📐 📏
pollinations 🇩🇪 Sync, Stream 💬📸 💬📸 ✅🪨 ☁️ 🌱
togetherai 🇺🇸 Sync, Stream🧠 🎥💬📸 💬📸 ✅🪨 🌱📏🛑
openaicompatible N/A Sync, Stream 💬 💬 📏🛑
Legend of columns and symbols:
  • 🏠: Runs locally.
  • Sync: Runs synchronously; the reply is only returned once completely generated
  • Stream: Streams the reply as it is generated. Occasionally fewer features are supported in this mode
  • 🧠: Has a chain-of-thought thinking process
    • Both redacted (Anthropic, Gemini, OpenAI) and explicit (DeepSeek R1, Qwen3, etc.)
    • Many models can be used in both modes. In this case they have two rows, one with thinking and one without. Certain functionalities, like tool calling, are frequently limited in thinking mode.
  • ✅: Implemented and works great
  • ❌: Not supported by genai. The provider may support it, but genai does not (yet). Please send a PR to add it!
  • 💬: Text
  • 📄: PDF: process a PDF as input, possibly with OCR
  • 📸: Image: process an image as input; most providers support PNG, JPG, WEBP and non-animated GIF, or generate images
  • 🎤: Audio: process an audio file (e.g. MP3, WAV, Flac, Opus) as input, or generate audio
  • 🎥: Video: process a video (e.g. MP4) as input, or generate a video (e.g. Veo 3)
  • 💨: Feature is flaky (Tool calling) or inconsistent (Usage is not always reported)
  • 🌐: Country where the company is located
  • Tool: Tool calling, using genai.ToolDef; best is ✅🪨🕸️
    • 🪨: Tool calling can be forced; i.e. you can force the model to call a tool. This is great.
    • 🕸️: Web search
  • JSON: ability to output JSON in free form, or with a forced schema specified as a Go struct
    • ✅: Supports both free form and with a schema
    • ☁️: Supports only free form
    • 📐: Supports only a schema
  • Batch: Process batches asynchronously during off-peak hours at a discount
  • Text: Text features
    • 🌱: Seed option for deterministic output
    • 📏: MaxTokens option to cap the number of returned tokens
    • 🛑: Stop sequences to stop generation when a given token is generated
  • File: Upload and store large files via a separate API
  • Cite: Citation generation from a provided document, especially useful for RAG
  • Probs: Return logprobs to analyze each token's probabilities
  • Limits: Returns the rate limits, including the remaining quota

Examples

The following examples intentionally use a variety of providers to show the extent to which you can pick and choose.

Text Basic ✅

examples/txt_to_txt_sync/main.go: This selects a good default model based on Anthropic's currently published models, sends a prompt and prints the response as a string. 💡 Set ANTHROPIC_API_KEY.

func main() {
	ctx := context.Background()
	c, err := anthropic.New(ctx, &genai.ProviderOptions{}, nil)
	if err != nil {
		log.Fatal(err)
	}
	msgs := genai.Messages{
		genai.NewTextMessage("Give me a life advice that sounds good but is a bad idea in practice. Answer succinctly."),
	}
	result, err := c.GenSync(ctx, msgs)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(result.String())
}

This may print:

"Follow your passion and the money will follow."

This ignores market realities, financial responsibilities, and the fact that passion alone doesn't guarantee income or career viability.

Multiple Text Completions

examples/txt_to_txt_sync_multi/main.go: This shows how to do multiple message round trips, adding follow-up messages from the user. 💡 Set ANTHROPIC_API_KEY.

func main() {
	ctx := context.Background()
	c, err := anthropic.New(ctx, &genai.ProviderOptions{}, nil)
	if err != nil {
		panic(err)
	}
	msgs := genai.Messages{
		genai.NewTextMessage("Let's play a word association game. You pick a single word, then I pick the first word I think of, then you respond with a word, and so on."),
	}
	result, err := c.GenSync(ctx, msgs)
	if err != nil {
		panic(err)
	}
	// Show the assistant's reply.
	fmt.Println(result.String())
	// Save the message in the collection of messages to build up context.
	msgs = append(msgs, result.Message)
	// Add another user message.
	msgs = append(msgs, genai.NewTextMessage("nightwish"))
	// Get another completion.
	result, err = c.GenSync(ctx, msgs)
	// ...and so on.
}
Text Streaming 🏎

examples/txt_to_txt_stream/main.go: This is the same example as above, with the output streamed as the model replies. This leverages Go 1.23 iterators. Notice how little difference there is between the two.

func main() {
	ctx := context.Background()
	c, err := anthropic.New(ctx, &genai.ProviderOptions{}, nil)
	if err != nil {
		log.Fatal(err)
	}
	msgs := genai.Messages{
		genai.NewTextMessage("Give me a life advice that sounds good but is a bad idea in practice."),
	}
	fragments, finish := c.GenStream(ctx, msgs)
	for f := range fragments {
		os.Stdout.WriteString(f.Text)
	}
	if _, err = finish(); err != nil {
		log.Fatal(err)
	}
}
Text Thinking 🧠

examples/txt_to_txt_thinking/main.go: genai supports implicit reasoning (e.g. Anthropic) and explicit reasoning (e.g. DeepSeek). The adapters package provides logic to automatically handle explicit chain-of-thought models, which generally use <think> and </think> tokens. 💡 Set DEEPSEEK_API_KEY.

Snippet:

	c, _ := deepseek.New(ctx, &genai.ProviderOptions{Model: "deepseek-reasoner"}, nil)
	msgs := genai.Messages{
		genai.NewTextMessage("Give me a life advice that sounds good but is a bad idea in practice."),
	}
	fragments, finish := c.GenStream(ctx, msgs)
	for f := range fragments {
		if f.Reasoning != "" {
			// ...
		} else if f.Text != "" {
			// ...
		}
	}
Text Citations ✍

examples/txt_to_txt_citations/main.go: Send entire documents to providers that support automatic citations (Cohere, Anthropic) for a supercharged RAG. 💡 Set COHERE_API_KEY.

Snippet:

	const context = `...` // Introduction of On the Origin of Species by Charles Darwin...
	msgs := genai.Messages{{
		Requests: []genai.Request{
			{
				Doc: genai.Doc{
					Filename: "On-the-Origin-of-Species-by-Charles-Darwin.txt",
					Src:      strings.NewReader(context),
				},
			},
			{Text: "When did Darwin arrive home?"},
		},
	}}
	res, _ := c.GenSync(ctx, msgs)
	for _, r := range res.Replies {
		if !r.Citation.IsZero() {
			fmt.Printf("Citation:\n")
			for _, src := range r.Citation.Sources {
				fmt.Printf("- %q\n", src.Snippet)
			}
		}
	}
	fmt.Printf("\nAnswer: %s\n", res.String())

When asked When did Darwin arrive home? with the introduction of On the Origin of Species by Charles Darwin passed in as a document, this may print:

Citation:

  • "excerpt from Charles Darwin's work 'On the Origin of Species'"
  • "returned home in 1837."

Answer: 1837 was when Darwin returned home and began to reflect on the facts he had gathered during his time on H.M.S. Beagle.

Text Websearch 🕸️

examples/txt_to_txt_websearch-sync/main.go: Searches the web to answer your question. 💡 Set PERPLEXITY_API_KEY.

Snippet:

	c, _ := perplexity.New(ctx, &genai.ProviderOptions{Model: genai.ModelCheap}, nil)
	msgs := genai.Messages{{
		Requests: []genai.Request{
			{Text: "Who holds ultimate power of Canada? Answer succinctly."},
		},
	}}

	// Perplexity has websearch enabled by default, so this is a no-op.
	// It is needed to enable websearch for anthropic, gemini and openai.
	opts := genai.OptionsTools{WebSearch: true}
	res, _ := c.GenSync(ctx, msgs, &opts)
	for _, r := range res.Replies {
		if !r.Citation.IsZero() {
			fmt.Printf("Sources:\n")
			for _, src := range r.Citation.Sources {
				switch src.Type {
				case genai.CitationWeb:
					fmt.Printf("- %s / %s\n", src.Title, src.URL)
				case genai.CitationWebImage:
					fmt.Printf("- image: %s\n", src.URL)
				}
			}
		}
	}
	fmt.Printf("\nAnswer: %s\n", res.String())

Try it locally:

go run github.com/maruel/genai/examples/txt_to_txt_websearch-sync@latest

When asked Who holds ultimate power of Canada?, this may print:

Sources:

(...)

Image: https://learn.parl.ca/understanding-comprendre/images/articles/monarch-and-governor-general/house-of-commons.jpg

(...)

Answer: The ultimate power in Canada constitutionally resides with the monarch (King Charles III) as the head of state, with executive authority formally vested in him. However, (...)

Text Websearch (streaming) 🔍️

examples/txt_to_txt_websearch-stream/main.go: Searches the web to answer your question and streams the output to the console. 💡 Set PERPLEXITY_API_KEY.

go run github.com/maruel/genai/examples/txt_to_txt_websearch-stream@latest

Same as above, but streaming.

Log probabilities

examples/txt_to_txt_logprobs/main.go: List the alternative tokens that were considered during generation. This helps tune Temperature, TopP or TopK.
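
A minimal sketch, assuming the huggingface provider constructor follows the same pattern as the other providers, using OptionsText.TopLogprobs to request candidates and Result.Logprobs to read them back:

	c, _ := huggingface.New(ctx, &genai.ProviderOptions{}, nil)
	msgs := genai.Messages{genai.NewTextMessage("Tell a joke.")}
	// Request the top 5 candidates for each generated token.
	opts := genai.OptionsText{TopLogprobs: 5}
	res, _ := c.GenSync(ctx, msgs, &opts)
	for _, candidates := range res.Logprobs {
		// The first item of each subslice is the chosen token.
		for _, lp := range candidates {
			fmt.Printf("%12f: %q\n", lp.Logprob, lp.Text)
		}
	}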

Try it locally:

go run github.com/maruel/genai/examples/txt_to_txt_logprobs@latest

When asked Tell a joke, this may print:

Provider huggingface
  Reply:
    Why don't scientists trust atoms?

    Because they make up everything!
  Logprobs:
    *    -0.000082: "Why"
         -9.625082: "Here"
        -11.250082: "What"
        -13.875082: "A"
        -14.500082: "How"
    *    -0.000003: " don"
        -14.125003: " do"
        -14.625003: " did"
        -14.625003: " dont"
        -14.875003: " didn"
    *    -0.000001: "'t"
        -14.000001: "’t"
        -18.062500: "'"
        -18.875000: "'T"
        -19.812500: "'s"
    *    -0.000002: " scientists"
        -14.250002: " Scientists"
        -14.250002: " eggs"
        -15.125002: " skeletons"
        -16.125002: " programmers"
    *    -0.000000: " trust"
        -16.250000: " trusts"
        -16.250000: " Trust"
        -17.250000: " like"
        -18.000000: " trusted"
    *    -0.000006: " atoms"
        -13.250006: "atoms"
        -13.500006: " stairs"
        -14.625006: " their"
        -15.000006: " electrons"
    *    -0.000011: "?\n\n"
        -12.125011: "?\n"
        -12.125011: "?"
        -14.750011: "?\n\n"
        -16.500011: " anymore"
(...)
Text Tools 🧰

examples/txt_to_txt_tool-sync/main.go: An LLM can both retrieve information and act on its environment through tool calling. This unlocks a whole realm of possibilities. Our design enables dense, strongly typed code that compares favorably to Python. 💡 Set CEREBRAS_API_KEY.

Snippet:

	type numbers struct {
		A int `json:"a"`
		B int `json:"b"`
	}
	msgs := genai.Messages{
		genai.NewTextMessage("What is 3214 + 5632? Call the tool \"add\" to tell me the answer. Do not explain. Be terse. Include only the answer."),
	}
	opts := genai.OptionsTools{
		Tools: []genai.ToolDef{
			{
				Name:        "add",
				Description: "Add two numbers together and provides the result",
				Callback: func(ctx context.Context, input *numbers) (string, error) {
					return fmt.Sprintf("%d", input.A+input.B), nil
				},
			},
		},
		// Force the LLM to do a tool call.
		Force: genai.ToolCallRequired,
	}

	// Run the loop.
	res, _, _ := adapters.GenSyncWithToolCallLoop(ctx, c, msgs, &opts)
	// Print the answer which is the last message generated.
	fmt.Println(res[len(res)-1].String())

When asked What is 3214 + 5632?, this may print:

8846

Text Tools (streaming) 🐝

examples/txt_to_txt_tool-stream/main.go: Leverage a thinking model to see the thinking process while it uses tool calls to answer the user's question. This keeps the user updated on the progress. 💡 Set GROQ_API_KEY.

Snippet:

	fragments, finish := adapters.GenStreamWithToolCallLoop(ctx, p, msgs, &opts)
	for f := range fragments {
		if f.Reasoning != "" {
			// ...
		} else if f.Text != "" {
			// ...
		} else if !f.ToolCall.IsZero() {
			// ...
		}
	}


When asked What is 3214 + 5632?, this may print:

# Reasoning

User wants result of 3214+5632 using tool "add". Must be terse, only answer, no explanation. Need to call add function with a=3214, b=5632.

# Tool call

{fc_e9b9677b-898c-46df-9deb-39122bd6c69a add {"a":3214,"b":5632} map[] {}}

# Answer

8846

Text Tools (manual)

examples/txt_to_txt_tool-manual/main.go: Runs the tool-calling loop manually, executing tool calls directly. 💡 Set CEREBRAS_API_KEY.

Snippet:

	res, _ := c.GenSync(ctx, msgs, &opts)
	// Add the assistant's message to the messages list.
	msgs = append(msgs, res.Message)
	// Process the tool call from the assistant.
	msg, _ := res.DoToolCalls(ctx, opts.Tools)
	// Add the tool call response to the messages list.
	msgs = append(msgs, msg)
	// Follow up so the LLM can interpret the tool call response.
	res, _ = c.GenSync(ctx, msgs, &opts)
Text Decode reply as a struct ⚙

examples/txt_to_txt_decode-json/main.go: Tell the LLM to generate a response conforming to the JSON schema derived from a Go struct, then decode the reply into that struct. This is much more lightweight than tool calling!

It is very useful when we want the LLM to make a choice between values, to return a number or a boolean (true/false). Enums are supported. 💡 Set OPENAI_API_KEY.

Snippet:

	msgs := genai.Messages{
		genai.NewTextMessage("Is a circle round? Reply as JSON."),
	}
	var circle struct {
		Round bool `json:"round"`
	}
	opts := genai.OptionsText{DecodeAs: &circle}
	res, _ := c.GenSync(ctx, msgs, &opts)
	res.Decode(&circle)
	fmt.Printf("Round: %v\n", circle.Round)

This will print:

Round: true

Text to Image 📸

examples/txt_to_img/main.go: Use Together.AI's free (!) image generation, albeit with a low rate limit.

Some providers return a URL that must be fetched within a few minutes or hours, while others return the data inline. This example handles both cases. 💡 Set TOGETHER_API_KEY.

Snippet:

	msgs := genai.Messages{
		genai.NewTextMessage("Carton drawing of a husky playing on the beach."),
	}
	result, _ := c.GenSync(ctx, msgs)
	for _, r := range result.Replies {
		if r.Doc.IsZero() {
			continue
		}
		// The image can be returned as a URL or inline, depending on the provider.
		var src io.Reader
		if r.Doc.URL != "" {
			resp, _ := c.HTTPClient().Get(r.Doc.URL)
			src = resp.Body
			defer resp.Body.Close()
		} else {
			src = r.Doc.Src
		}
		b, _ := io.ReadAll(src)
		os.WriteFile(r.Doc.GetFilename(), b, 0o644)
	}

Try it locally:

go run github.com/maruel/genai/examples/txt_to_img@latest

This may generate:

content.jpg

This generated picture shows a fake signature. I decided to keep this example as a reminder that the result comes from harvested data created by real humans.

Image-Text to Video 🎥

examples/img-txt_to_vid/main.go: Leverage the content.jpg file generated in txt_to_img example to ask Veo 3 from Google to generate a video based on the image. 💡 Set GEMINI_API_KEY.

Snippet:

	// Warning: this is expensive.
	c, _ := gemini.New(ctx, &genai.ProviderOptions{Model: "veo-3.0-fast-generate-preview"}, nil)
	f, _ := os.Open("content.jpg")
	defer f.Close()
	msgs := genai.Messages{
		genai.Message{Requests: []genai.Request{
			{Text: "Carton drawing of a husky playing on the beach."},
			{Doc: genai.Doc{Src: f}},
		}},
	}
	res, _ := c.GenSync(ctx, msgs)
	// Save the file in Replies like in the previous example ...

Try it locally:

go run github.com/maruel/genai/examples/img-txt_to_vid@latest

This may generate:

content.avif

⚠ The MP4 has been recompressed to AVIF via compress.sh so GitHub can render it. The drawback is that audio is lost. View the original MP4 with audio (!) at content.mp4. May not work on Safari.

This is very impressive, but also very expensive.

Image-Text to Image 🖌

examples/img-txt_to_img/main.go: Edit an image with a prompt. Leverage the content.jpg file generated in txt_to_img example. 💡 Set BFL_API_KEY.

go run github.com/maruel/genai/examples/img-txt_to_img@latest

This may generate:

content2.jpg

Image-Text to Image-Text 🍌

examples/img-txt_to_img-txt/main.go: Leverage the content.jpg file generated in txt_to_img example to ask gemini-2.5-flash-image-preview to change the image with a prompt and ask the model to explain what it did. 💡 Set GEMINI_API_KEY.

Snippet:

	// Warning: This is a bit expensive.
	opts := genai.ProviderOptions{
		Model:            "gemini-2.5-flash-image-preview",
		OutputModalities: genai.Modalities{genai.ModalityImage, genai.ModalityText},
	}
	c, _ := gemini.New(ctx, &opts, nil)
	// ...
	res, _ := c.GenSync(ctx, msgs, &gemini.Options{ReasoningBudget: 0})

Try it locally:

go run github.com/maruel/genai/examples/img-txt_to_img-txt@latest

This may generate:

Of course! Here's an updated image with more animals. I added a playful dolphin jumping out of the water and a flock of seagulls flying overhead. I chose these animals to enhance the beach scene and create a more dynamic and lively atmosphere.

Wrote: content.png

content.png

This is quite impressive, but also quite expensive.

Image-Text to Text 👁

examples/img-txt_to_txt/main.go: Run vision to analyze a picture provided as a URL (source: Wikipedia). The response is streamed to the console as the reply is generated. 💡 Set MISTRAL_API_KEY.

go run github.com/maruel/genai/examples/img-txt_to_txt@latest

This may generate:

The image depicts a single ripe banana. It has a bright yellow peel with a few small brown spots, indicating ripeness. The banana is curved, which is typical of its natural shape, and it has a stem at the top. The overall appearance suggests that it is ready to be eaten.

Image-Text to Text (local) 🏠

examples/img-txt_to_txt_local/main.go: This is very similar to the previous example!

Use cmd/llama-serve to run an LLM locally, including tool calling and vision!

Start llama-server locally either by yourself or with this utility:

go run github.com/maruel/genai/cmd/llama-serve@latest \
  -model ggml-org/gemma-3-4b-it-GGUF/gemma-3-4b-it-Q8_0.gguf#mmproj-model-f16.gguf -- \
  --temp 1.0 --top-p 0.95 --top-k 64 \
  --jinja -fa -c 0 --no-warmup

Run vision 100% locally on CPU with only 8GB of RAM. No GPU required!

go run github.com/maruel/genai/examples/img-txt_to_txt_local@latest
Video-Text to Text 🎞️

examples/vid-txt_to_txt/main.go: Run vision to analyze a video. 💡 Set GEMINI_API_KEY.

Using this video:

video.avif

Try it locally:

go run github.com/maruel/genai/examples/vid-txt_to_txt@latest

When asked What is the word, this generates:

Banana

Audio-Text to Text 🎤

examples/aud-txt_to_txt/main.go: Analyze an audio file. 💡 Set OPENAI_API_KEY.

Try it locally:

go run github.com/maruel/genai/examples/aud-txt_to_txt@latest

When asked What was the word?, this generates:

The word was "orange."

Usage and Quota 🍟🧀🥣

examples/txt_to_txt_quota/main.go: Prints the tokens processed and generated for the request and the remaining quota if the provider supports it. 💡 Set GROQ_API_KEY.

Snippet:

	msgs := genai.Messages{
		genai.NewTextMessage("Describe poutine as a French person who just arrived in Québec"),
	}
	res, _ := c.GenSync(ctx, msgs)
	fmt.Println(res.String())
	fmt.Printf("\nTokens usage: %s\n", res.Usage.String())

This may generate:

« Je viens tout juste d’arriver au Québec et, pour être honnête, je n’avais jamais entendu parler du fameux « poutine » avant de mettre le pied dans un petit resto du coin. »

(...)

Tokens usage: in: 83 (cached 0), reasoning: 0, out: 818, total: 901, requests/2025-08-29 15:58:13: 499999/500000, tokens/2025-08-29 15:58:12: 249916/250000

In addition to the token usage, remaining quota is printed.

Text with any provider ⁉

examples/txt_to_txt_any/main.go: Let the user choose the provider by name.

The relevant environment variable (e.g. ANTHROPIC_API_KEY, OPENAI_API_KEY, etc) is used automatically for authentication.

Automatically selects a model on behalf of the user. Wraps the explicit thinking tokens if needed.

Supports ollama and llama-server even if they run on a remote host or non-default port.

Snippet:

	names := strings.Join(slices.Sorted(maps.Keys(providers.Available(ctx))), ", ")
	provider := flag.String("provider", "", "provider to use, "+names)
	flag.Parse()

	cfg := providers.All[*provider]
	c, _ := cfg.Factory(ctx, &genai.ProviderOptions{}, nil)
	p := adapters.WrapReasoning(c)
	res, _ := p.GenSync(...)

Try it locally:

go run github.com/maruel/genai/examples/txt_to_txt_any@latest \
    -provider cerebras \
    "Tell a good sounding advice that is a bad idea in practice."

Models 🗒

Snapshot of all the supported models at docs/MODELS.md is updated weekly.

Try it locally:

go install github.com/maruel/genai/cmd/...@latest

list-models -provider huggingface

Providers with free tier 💸

As of August 2025, the following services offer a free tier (other limits apply):

TODO

PRs are appreciated for any of the following. No need to ask! Just send a PR and make it pass CI checks. ❤️

Features
Providers

I'd be delighted if you want to contribute any missing provider; I'm particularly looking forward to these:

I'm also looking to further decouple the scoreboard from the Go code. I believe the scoreboard is useful in itself and is not Go-specific. I appreciate ideas towards achieving this; send them my way!

Thanks in advance! 🙏

Made with ❤️ by Marc-Antoine Ruel

Documentation

Overview

Package genai is the opinionated high performance professional-grade AI package for Go.

It provides a generic interface to interact with various LLM providers, while allowing access to each provider's full capabilities.

Check out the examples for a quick start.

Example (GenSyncWithToolCallLoop_with_custom_HTTP_Header)
package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"time"

	"github.com/maruel/genai"
	"github.com/maruel/genai/adapters"
	"github.com/maruel/genai/providers/anthropic"
	"github.com/maruel/roundtrippers"
)

func main() {
	// Modified version of the example in package adapters, with a custom header.
	//
	// As of June 2025, interleaved thinking can be enabled with a custom header.
	// https://docs.anthropic.com/en/docs/build-with-claude/extended-thinking#interleaved-thinking
	wrapper := func(h http.RoundTripper) http.RoundTripper {
		return &roundtrippers.Header{
			Transport: h,
			Header:    http.Header{"anthropic-beta": []string{"interleaved-thinking-2025-05-14"}},
		}
	}
	ctx := context.Background()
	c, err := anthropic.New(ctx, &genai.ProviderOptions{Model: "claude-sonnet-4-20250514"}, wrapper)
	if err != nil {
		log.Fatal(err)
	}
	msgs := genai.Messages{genai.NewTextMessage("What season is Montréal currently in?")}
	opts := genai.OptionsTools{
		Tools: []genai.ToolDef{locationClockTime},
		// Force the LLM to do a tool call first.
		Force: genai.ToolCallRequired,
	}
	newMsgs, _, err := adapters.GenSyncWithToolCallLoop(ctx, c, msgs, &opts)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s\n", newMsgs[len(newMsgs)-1].String())
}

var locationClockTime = genai.ToolDef{
	Name:        "get_today_date_current_clock_time",
	Description: "Get the current clock time and today's date.",
	Callback: func(ctx context.Context, e *location) (string, error) {
		if e.Location != "Montréal" {
			return "ask again with Montréal", nil
		}
		return time.Now().Format("Monday 2006-01-02 15:04:05"), nil
	},
}

type location struct {
	Location string `json:"location" json_description:"Location to ask the current time in"`
}


Constants

const (
	// ModelNone explicitly tells the provider to not automatically select a model. The use case is when the
	// only intended call is ListModels(), thus there's no point in selecting a model automatically.
	ModelNone = "NONE"
	// ModelCheap requests the provider to automatically select the cheapest model it can find.
	ModelCheap = "CHEAP"
	// ModelGood requests the provider to automatically select a good every day model that has a good
	// performance/cost trade-off.
	ModelGood = "GOOD"
	// ModelSOTA requests the provider to automatically select the best state-of-the-art model
	// it can find.
	ModelSOTA = "SOTA"
)

Model markers to pass to ProviderOptions.Model.
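
For example, a minimal sketch that skips automatic model selection because the only intended call is ListModels() (assuming the gemini provider and its API key):

c, err := gemini.New(ctx, &genai.ProviderOptions{Model: genai.ModelNone}, nil)
if err != nil {
	log.Fatal(err)
}
models, err := c.ListModels(ctx)
if err != nil {
	log.Fatal(err)
}
for _, m := range models {
	fmt.Println(m.GetID())
}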

const (
	// ModalityAudio is support for audio formats like MP3, WAV, Opus, Flac, etc.
	ModalityAudio = scoreboard.ModalityAudio
	// ModalityDocument is support for PDF with multi-modal comprehension, both images and text. This includes
	// code blocks.
	ModalityDocument = scoreboard.ModalityDocument
	// ModalityImage is support for image formats like PNG, JPEG, often single frame GIF, and WEBP.
	ModalityImage = scoreboard.ModalityImage
	// ModalityText is for raw text.
	ModalityText = scoreboard.ModalityText
	// ModalityVideo is support for video formats like MP4 or MKV.
	ModalityVideo = scoreboard.ModalityVideo
)


Types

type CacheEntry

type CacheEntry interface {
	GetID() string
	GetDisplayName() string
	GetExpiry() time.Time
}

CacheEntry is one file (or GenSync request) cached on the provider for reuse.

type Citation

type Citation struct {
	// CitedText is the text that was cited.
	CitedText string `json:"cited_text,omitzero"`
	// StartIndex is the starting character position of the citation in the answer (0-based).
	StartIndex int64 `json:"start_index,omitzero"`
	// EndIndex is the ending character position of the citation in the answer (0-based, exclusive).
	EndIndex int64 `json:"end_index,omitzero"`

	// Sources contains information about the source documents or tools that support this citation.
	Sources []CitationSource `json:"sources,omitzero"`
	// contains filtered or unexported fields
}

Citation represents a reference to source material that supports content. It provides a unified interface for different provider citation formats.

Normally one of CitedText or StartIndex/EndIndex is set.

func (*Citation) IsZero

func (c *Citation) IsZero() bool

func (*Citation) Validate

func (c *Citation) Validate() error

Validate ensures the citation is valid.

type CitationSource

type CitationSource struct {
	// Type indicates the source type.
	Type CitationType `json:"type,omitzero"`
	// ID is a unique identifier for the source (e.g., document ID, tool call ID).
	ID string `json:"id,omitzero"`
	// Title is the human-readable title of the source.
	Title string `json:"title,omitzero"`
	// URL is the web URL for the source, if applicable.
	URL string `json:"url,omitzero"`
	// Snippet is a snippet from the source, if applicable. It is the web search query for CitationWebQuery.
	Snippet string `json:"snippet,omitzero"`
	// StartCharIndex is the starting character position of the citation in the sourced document (0-based).
	StartCharIndex int64 `json:"start_index,omitzero"`
	// EndCharIndex is the ending character position of the citation in the sourced document (0-based, exclusive).
	EndCharIndex int64 `json:"end_index,omitzero"`
	// StartPageNumber is the starting page number of the citation in the sourced document (1-based).
	StartPageNumber int64 `json:"start_page_number,omitzero"`
	EndPageNumber   int64 `json:"end_page_number,omitzero"`
	// StartBlockIndex is the starting block index of the citation in the sourced document (0-based).
	StartBlockIndex int64 `json:"start_block_index,omitzero"`
	EndBlockIndex   int64 `json:"end_block_index,omitzero"`

	// Date is the date of the source, if applicable.
	Date string `json:"date,omitzero"`
	// Metadata contains additional source-specific information.
	// For document sources: document index, page numbers, etc.
	// For tool sources: tool output, function name, etc.
	// For web sources: encrypted index, search result info, etc.
	Metadata map[string]any `json:"metadata,omitzero"`
	// contains filtered or unexported fields
}

CitationSource represents a source that supports a citation.

func (*CitationSource) IsZero

func (cs *CitationSource) IsZero() bool

func (*CitationSource) Validate

func (cs *CitationSource) Validate() error

Validate ensures the citation source is valid.

type CitationType

type CitationType int32

CitationType is a citation that a model returned as part of its reply.

const (
	// CitationWebQuery is a query used as part of a web search.
	CitationWebQuery CitationType = iota + 1
	// CitationWeb is a URL from a web search.
	CitationWeb
	// CitationWebImage is a URL to an image from a web search.
	CitationWebImage
	// CitationDocument is from a document provided as input or explicitly referenced.
	CitationDocument
	// CitationTool is when the provider refers to the result of a tool call in its answer.
	CitationTool
)

type Doc

type Doc struct {
	// Filename is the name of the file. For many providers, only the extension
	// is relevant. They only use mime-type, which is derived from the filename's
	// extension. When an URL is provided or when the object provided to Document
	// implements a method with the signature `Name() string`, like an
	// `*os.File`, Filename is optional.
	Filename string `json:"filename,omitzero"`
	// Src is raw document data. It is perfectly fine to use a bytes.NewReader() or *os.File.
	Src io.ReadSeeker `json:"bytes,omitzero"`
	// URL is the reference to the raw data. When set, the mime-type is derived from the URL.
	URL string `json:"url,omitzero"`
	// contains filtered or unexported fields
}

Doc is a document.

func (*Doc) GetFilename

func (d *Doc) GetFilename() string

GetFilename returns the filename to use for the document, querying the Document's name if available.

func (*Doc) IsZero

func (d *Doc) IsZero() bool

func (*Doc) MarshalJSON

func (d *Doc) MarshalJSON() ([]byte, error)

func (*Doc) Read

func (d *Doc) Read(maxSize int64) (string, []byte, error)

Read reads the document content into memory.

It returns the mime type, the raw bytes and an error if any.

func (*Doc) UnmarshalJSON

func (d *Doc) UnmarshalJSON(b []byte) error

func (*Doc) Validate

func (d *Doc) Validate() error

Validate ensures the block is valid.

type FinishReason

type FinishReason string

FinishReason is the reason why the model stopped generating tokens.

It can be one of the well known below or a custom value.

const (
	// FinishedStop means the LLM was done for the turn. Some providers confuse it with
	// FinishedStopSequence.
	FinishedStop FinishReason = "stop"
	// FinishedLength means the model reached the maximum number of tokens allowed as set in
	// OptionsText.MaxTokens or as limited by the provider.
	FinishedLength FinishReason = "length"
	// FinishedToolCalls means the model called one or multiple tools and needs the replies to continue the turn.
	FinishedToolCalls FinishReason = "tool_calls"
	// FinishedStopSequence means the model stopped because it saw a stop word as listed in OptionsText.Stop.
	FinishedStopSequence FinishReason = "stop_sequence"
	// FinishedContentFilter means the model stopped because the reply got caught by a content filter.
	FinishedContentFilter FinishReason = "content_filter"
	// Pending means that it's not finished yet. For use with ProviderGenAsync.
	Pending FinishReason = "pending"
)
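
A minimal sketch of acting on the finish reason, assuming a client c and messages msgs as in the examples above:

res, err := c.GenSync(ctx, msgs, &genai.OptionsText{MaxTokens: 128})
if err != nil {
	log.Fatal(err)
}
switch res.Usage.FinishReason {
case genai.FinishedStop:
	fmt.Println("completed normally")
case genai.FinishedLength:
	fmt.Println("truncated by MaxTokens")
case genai.FinishedToolCalls:
	fmt.Println("the model requested tool calls")
}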

type Job

type Job string

Job is a pending job.

type Logprob

type Logprob struct {
	ID      int64   `json:"id,omitempty"`   // Input token ID.
	Text    string  `json:"text,omitempty"` // Text in UTF-8.
	Logprob float64 `json:"logprob"`        // Log probability of the token. It should normally be non-zero but sometimes it is zero.
}

Logprob represents a single log probability information for a token.

One of ID or Text must be set.

func (*Logprob) GoString

func (l *Logprob) GoString() string

GoString returns a JSON representation of the reply for debugging purposes.

type Message

type Message struct {
	// Requests is the content from the user.
	//
	// It is normally a single message. It is more frequently multiple items when using multi-modal content.
	Requests []Request `json:"request,omitzero"`
	// User must only be used when sent by the user. Only some providers (e.g. OpenAI, Groq, DeepSeek) support it.
	User string `json:"user,omitzero"`

	// Replies is the message from the LLM.
	//
	// It is generally a single reply with text or a tool call. Some models can emit multiple content blocks,
	// either multi-modal or multiple text blocks: a code block and a different block with an explanation. Some
	// models can emit multiple tool calls at once.
	Replies []Reply `json:"reply,omitzero"`

	// ToolCallResult is the result for a tool call that the LLM requested to make.
	//
	// These messages are generated by the "computer".
	ToolCallResults []ToolCallResult `json:"tool_call_results,omitzero"`
	// contains filtered or unexported fields
}

Message is a part of an exchange with a LLM.

It is effectively a union, with the exception of the User field, which can be set alongside Requests.

func NewTextMessage

func NewTextMessage(text string) Message

NewTextMessage is a shorthand function to create a Message with a single text block.

func (*Message) Accumulate

func (m *Message) Accumulate(mf Reply) error

Accumulate adds a Reply to the message being streamed.

It is used by GenStream. End users generally don't need to call it.

func (*Message) Decode

func (m *Message) Decode(x any) error

Decode decodes the JSON message into the struct.

Requires using either ReplyAsJSON or DecodeAs in the OptionsText.

Note: this doesn't verify the type is the same as specified in OptionsText.DecodeAs.

func (*Message) DoToolCalls

func (m *Message) DoToolCalls(ctx context.Context, tools []ToolDef) (Message, error)

DoToolCalls processes all the ToolCall entries in the Replies, if any.

Returns a Message to be added back to the list of messages. The returned Message is zero (IsZero() returns true) if there was no tool call to process.

func (*Message) GoString

func (m *Message) GoString() string

GoString returns a JSON representation of the reply for debugging purposes.

func (*Message) IsZero

func (m *Message) IsZero() bool

func (*Message) Reasoning

func (m *Message) Reasoning() string

Reasoning returns all the reasoning concatenated, if any.

func (*Message) Role

func (m *Message) Role() string

Role returns one of "user", "assistant" or "computer".

func (*Message) String

func (m *Message) String() string

String is a shorthand to get the request or reply content as text.

It ignores reasoning or multi-modal content.

func (*Message) UnmarshalJSON

func (m *Message) UnmarshalJSON(b []byte) error

UnmarshalJSON adds validation during decoding.

func (*Message) Validate

func (m *Message) Validate() error

Validate ensures the message is valid.

type Messages

type Messages []Message

Messages is a list of valid messages in an exchange with a LLM.

The messages should alternate between user input, assistant replies, tool call requests and computer tool call results. The exception is the case of a multi-user discussion, with different Users.

func (Messages) Validate

func (m Messages) Validate() error

Validate ensures the messages are valid.

type Modalities

type Modalities []scoreboard.Modality

Modalities represents the modalities supported by the provider in a specific scenario. It can contain multiple modalities in multi-modal scenarios.

func (Modalities) String

func (m Modalities) String() string

func (Modalities) Validate

func (m Modalities) Validate() error

type Modality

type Modality = scoreboard.Modality

Modality is a modality supported by the provider.

It is aliased from scoreboard.Modality to avoid a circular dependency.

type Model

type Model interface {
	GetID() string
	String() string
	// Context returns the number of tokens the model can process as input.
	Context() int64
}

Model represents a served model by the provider.

Use Provider.ListModels() to get a list of models.

type Options

type Options interface {
	// Validate ensures the options object is valid.
	Validate() error
}

Options is options that can be provided to a Provider interface.

type OptionsAudio

type OptionsAudio struct {
	// Seed for the random number generator. Default is 0 which means
	// non-deterministic.
	Seed int64
	// contains filtered or unexported fields
}

func (*OptionsAudio) Validate

func (o *OptionsAudio) Validate() error

type OptionsImage

type OptionsImage struct {
	// Seed for the random number generator. Default is 0 which means
	// non-deterministic.
	Seed   int64
	Width  int
	Height int

	// PollInterval is the time interval to poll the image generation progress when using GenSync.
	PollInterval time.Duration
	// contains filtered or unexported fields
}

OptionsImage is a list of frequent options supported by most ProviderDoc. Each provider is free to support more options through a specialized struct.

func (*OptionsImage) Validate

func (o *OptionsImage) Validate() error

Validate ensures the completion options are valid.

type OptionsText

type OptionsText struct {
	// Temperature adjust the creativity of the sampling. Generally between 0 and 2.
	Temperature float64
	// TopP adjusts correctness sampling between 0 and 1. The higher the value, the more diverse the output.
	TopP float64
	// MaxTokens is the maximum number of tokens to generate. Used to limit it
	// lower than the default maximum, for budget reasons.
	MaxTokens int64
	// TopLogprobs requests to return the top logprobs in the reply.
	TopLogprobs int64
	// SystemPrompt is the prompt to use for the system role.
	SystemPrompt string

	// Seed for the random number generator. Default is 0 which means
	// non-deterministic.
	Seed int64
	// TopK adjusts sampling where only the N first candidates are considered.
	TopK int64
	// Stop is the list of tokens to stop generation.
	Stop []string

	// ReplyAsJSON enforces the output to be valid JSON, any JSON. It is
	// important to tell the model to reply in JSON in the prompt itself.
	ReplyAsJSON bool
	// DecodeAs enforces a reply with a specific JSON structure. It must be a pointer to a struct that can be
	// decoded by encoding/json and can have jsonschema tags.
	//
	// It is important to request the model to "reply in JSON" in the prompt itself.
	//
	// It is recommended to use jsonschema_description tags to describe each
	// field or argument.
	//
	// Use jsonschema:"enum=..." to enforce a specific value within a set.
	//
	// Use omitempty to make the field optional.
	//
	// See https://github.com/invopop/jsonschema#example for more examples.
	DecodeAs any
	// contains filtered or unexported fields
}

OptionsText is a list of frequent options supported by most Provider with text output modality. Each provider is free to support more options through a specialized struct.

The first group are options supported by (nearly) all providers.

The second group are options supported only by some providers. Using them may cause the chat operation to return a base.ErrNotSupported.

The third group are options supported by a few providers and a few models on each, that will slow down generation (increase latency) and will increase token use (cost).
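
A minimal sketch of a DecodeAs struct using the tags described above (the struct and field names are illustrative):

// review is an illustrative target for OptionsText.DecodeAs.
type review struct {
	// jsonschema_description describes the field to the model.
	Summary string `json:"summary" jsonschema_description:"One sentence summary"`
	// enum restricts the value to a fixed set.
	Verdict string `json:"verdict" jsonschema:"enum=positive,enum=negative"`
	// omitempty makes the field optional.
	Score int `json:"score,omitempty"`
}

opts := genai.OptionsText{DecodeAs: &review{}}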

func (*OptionsText) Validate

func (o *OptionsText) Validate() error

Validate ensures the completion options are valid.

type OptionsTools

type OptionsTools struct {
	// Tools is the list of tools that the LLM can request to call.
	Tools []ToolDef
	// Force tells the LLM a tool call must be done, or not.
	Force ToolCallRequest
	// WebSearch specifies if websearch should be enabled. It is generally disabled by default except for
	// perplexity.
	//
	// # Warning
	//
	// This will become a structure to provide information about included and excluded domains, and the user's
	// location.
	WebSearch bool
}

func (*OptionsTools) Validate

func (o *OptionsTools) Validate() error

Validate ensures the completion options are valid.

type OptionsVideo

type OptionsVideo struct {
	// Duration of the video to generate, if supported.
	//
	// Veo 2 supports only between 5 and 8 seconds and Veo 3 only supports 8 seconds.
	Duration time.Duration

	// PollInterval is the time interval to poll the image generation progress when using GenSync.
	PollInterval time.Duration
	// contains filtered or unexported fields
}

func (*OptionsVideo) Validate

func (o *OptionsVideo) Validate() error

type Provider

type Provider interface {
	// Name returns the name of the provider.
	Name() string
	// ModelID returns the model currently used by the provider. It can be an empty string.
	ModelID() string
	// OutputModalities returns the output modalities supported by this specific client configuration.
	//
	// This states what kind of output the model will generate (text, audio, image, video). It varies per
	// provider and model. The vast majority of providers and models support only one output modality, like
	// text-only, image-only, etc.
	OutputModalities() Modalities
	// Capabilities returns the optional capabilities this provider supports.
	Capabilities() ProviderCapabilities
	// Scoreboard returns what the provider supports.
	//
	// Some models have more features than others, e.g. some models may be text-only while others have vision or
	// audio support.
	//
	// The client code may be the limiting factor for some models, and not the provider itself.
	//
	// The values returned here should have gone through a smoke test to make sure they are valid.
	Scoreboard() scoreboard.Score
	// HTTPClient returns the underlying http client. It may be necessary to use it to fetch the results from
	// the provider. An example is retrieving Veo 3 generated videos from Gemini requires the authentication
	// headers to be set.
	HTTPClient() *http.Client

	// GenSync runs generation synchronously.
	//
	// Multiple options can be mixed together, both standard ones like *OptionsImage, *OptionsText,
	// *OptionsTools and provider-specialized options struct, e.g. *anthropic.Options, *gemini.Options.
	GenSync(ctx context.Context, msgs Messages, opts ...Options) (Result, error)
	// GenStream runs generation synchronously, yielding fragments of the reply as the server sends them.
	//
	// No need to accumulate the fragments into a Message since the Result contains the accumulated message.
	GenStream(ctx context.Context, msgs Messages, opts ...Options) (iter.Seq[Reply], func() (Result, error))
	// ListModels returns the list of models the provider supports. Not all providers support it; some will
	// return base.ErrNotSupported. For local providers like llamacpp and ollama, they may return only the
	// model currently loaded.
	ListModels(ctx context.Context) ([]Model, error)

	// GenAsync requests a generation and returns a pending job that can be polled.
	//
	// Requires ProviderCapabilities.GenAsync to be set. Returns base.ErrNotSupported otherwise.
	GenAsync(ctx context.Context, msgs Messages, opts ...Options) (Job, error)
	// PokeResult requests the state of the job.
	//
	// When the job is still pending, Result.Usage.FinishReason is Pending.
	//
	// Requires ProviderCapabilities.GenAsync to be set. Returns base.ErrNotSupported otherwise.
	PokeResult(ctx context.Context, job Job) (Result, error)
	// CacheAddRequest caches a request.
	//
	// Requires ProviderCapabilities.Caching to be set. Returns base.ErrNotSupported otherwise.
	//
	// # Warning
	//
	// May be changed in the future.
	CacheAddRequest(ctx context.Context, msgs Messages, name, displayName string, ttl time.Duration, opts ...Options) (string, error)
	// CacheList lists the caches entries.
	//
	// Requires ProviderCapabilities.Caching to be set. Returns base.ErrNotSupported otherwise.
	//
	// # Warning
	//
	// May be changed in the future.
	CacheList(ctx context.Context) ([]CacheEntry, error)
	// CacheDelete deletes a cache entry.
	//
	// Requires ProviderCapabilities.Caching to be set. Returns base.ErrNotSupported otherwise.
	//
	// # Warning
	//
	// May be changed in the future.
	CacheDelete(ctx context.Context, name string) error
}

Provider is the base interface that all provider interfaces embed.

The first group contains local methods. Calling these methods will not make an HTTP request.

The second group is supported by the majority of providers.

The rest is supported by a limited number of providers.
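
For instance, a minimal polling sketch for the asynchronous generation methods, usable with providers where Capabilities().GenAsync is true (the poll interval is arbitrary):

job, err := c.GenAsync(ctx, msgs)
if err != nil {
	log.Fatal(err)
}
res, err := c.PokeResult(ctx, job)
for err == nil && res.Usage.FinishReason == genai.Pending {
	time.Sleep(time.Minute)
	res, err = c.PokeResult(ctx, job)
}
if err != nil {
	log.Fatal(err)
}
fmt.Println(res.String())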

Example (GenSync_audio)
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"strings"

	"github.com/maruel/genai"
	"github.com/maruel/genai/providers/gemini"
)

func main() {
	// Supported by Gemini, OpenAI.

	// Using a free small model for testing.
	// See https://ai.google.dev/gemini-api/docs/models/gemini?hl=en
	ctx := context.Background()
	c, err := gemini.New(ctx, &genai.ProviderOptions{Model: "gemini-2.5-flash-lite"}, nil)
	if err != nil {
		log.Fatal(err)
	}
	f, err := os.Open("internal/internaltest/testdata/mystery_word.mp3")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	msgs := genai.Messages{
		{
			Requests: []genai.Request{
				{Text: "What is the word said? Reply with only the word."},
				{Doc: genai.Doc{Src: f}},
			},
		},
	}
	resp, err := c.GenSync(ctx, msgs)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Heard: %v\n", strings.TrimRight(strings.ToLower(resp.String()), "."))
	// This would Output: Heard: orange
}
Example (GenSync_pdf)
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"strings"

	"github.com/maruel/genai"
	"github.com/maruel/genai/providers/gemini"
)

func main() {
	// Supported by Anthropic, Gemini, Mistral, OpenAI.

	// Using a free small model for testing.
	// See https://ai.google.dev/gemini-api/docs/models/gemini?hl=en
	ctx := context.Background()
	c, err := gemini.New(ctx, &genai.ProviderOptions{Model: "gemini-2.5-flash-lite"}, nil)
	if err != nil {
		log.Fatal(err)
	}
	f, err := os.Open("internal/internaltest/testdata/hidden_word.pdf")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	msgs := genai.Messages{
		{
			Requests: []genai.Request{
				{Text: "What is the word? Reply with only the word."},
				{Doc: genai.Doc{Src: f}},
			},
		},
	}
	resp, err := c.GenSync(ctx, msgs)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Hidden word in PDF: %v\n", strings.ToLower(resp.String()))
	// This would Output: Hidden word in PDF: orange
}
Example (GenSync_video)
package main

import (
	"context"
	"fmt"
	"log"
	"os"
	"strings"

	"github.com/maruel/genai"
	"github.com/maruel/genai/providers/gemini"
)

func main() {
	// Supported by Gemini, TogetherAI.

	// Using a free small model for testing.
	// See https://ai.google.dev/gemini-api/docs/models/gemini?hl=en
	ctx := context.Background()
	c, err := gemini.New(ctx, &genai.ProviderOptions{Model: "gemini-2.5-flash"}, nil)
	if err != nil {
		log.Fatal(err)
	}
	f, err := os.Open("internal/internaltest/testdata/animation.mp4")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()
	// TogetherAI seems to require separate messages for text and images.
	msgs := genai.Messages{
		genai.NewTextMessage("What is the word? Reply with exactly and only one word."),
		{Requests: []genai.Request{{Doc: genai.Doc{Src: f}}}},
	}
	resp, err := c.GenSync(ctx, msgs)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Saw: %v\n", strings.ToLower(resp.String()))
	// This would Output: Saw: banana
}
Example (GenSync_vision)
package main

import (
	"bytes"
	"context"
	"fmt"
	"log"
	"os"

	"github.com/maruel/genai"
	"github.com/maruel/genai/providers/gemini"
)

func main() {
	// Supported by Anthropic, Gemini, Groq, Mistral, Ollama, OpenAI, TogetherAI.

	// Using a free small model for testing.
	// See https://ai.google.dev/gemini-api/docs/models/gemini?hl=en
	ctx := context.Background()
	c, err := gemini.New(ctx, &genai.ProviderOptions{Model: "gemini-2.5-flash-lite"}, nil)
	if err != nil {
		log.Fatal(err)
	}
	bananaJpg, err := os.ReadFile("internal/internaltest/testdata/banana.jpg")
	if err != nil {
		log.Fatal(err)
	}
	msgs := genai.Messages{
		{
			Requests: []genai.Request{
				{Text: "Is it a banana? Reply with only the word yes or no."},
				{Doc: genai.Doc{Filename: "banana.jpg", Src: bytes.NewReader(bananaJpg)}},
			},
		},
	}
	resp, err := c.GenSync(ctx, msgs)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("Banana: %v\n", resp.String())
	// This would Output: Banana: yes
}

type ProviderCapabilities

type ProviderCapabilities struct {
	// GenAsync indicates the provider supports GenAsync and PokeResult for batch operations.
	GenAsync bool
	// Caching indicates the provider supports CacheAddRequest, CacheList, and CacheDelete.
	Caching bool
	// contains filtered or unexported fields
}

ProviderCapabilities describes optional capabilities a provider supports.

type ProviderOptions

type ProviderOptions struct {
	// APIKey provides an API key to authenticate to the server.
	//
	// Most providers require an API key, and the client will look at an environment variable
	// "<PROVIDER>_API_KEY" to use as a default value if unspecified.
	APIKey string `json:"apikey,omitzero" yaml:"apikey,omitzero"`
	// AccountID provides an account ID key. Rarely used (only Cloudflare).
	AccountID string `json:"accountid,omitzero" yaml:"accountid,omitzero"`
	// Remote is the remote address to access the service.
	//
	// It is mostly used by locally hosted services (llamacpp, ollama) or for generic client (openaicompatible).
	Remote string `json:"remote,omitzero" yaml:"remote,omitzero"`
	// Model either specifies the exact model ID to use, requests the provider to select a model on your
	// behalf, or explicitly asks for no model.
	//
	// To use automatic model selection, pass ModelCheap to use a cheap model, ModelGood for a good
	// everyday model or ModelSOTA to use its state of the art (SOTA) model. In this case, the provider
	// internally calls ListModels() to discover models and selects the right one based on its heuristics.
	// Providers that do not support ListModels, e.g. bfl or perplexity, use a hardcoded list.
	//
	// When unspecified, i.e. with "", it defaults to automatic model selection with ModelGood.
	//
	// There are two ways to disable automatic model discovery and selection: specify a model ID or use
	// ModelNone.
	//
	// Keep in mind that as providers cycle through new models, it's possible a specific model ID is not
	// available anymore or that the default model changes.
	Model string `json:"model,omitzero" yaml:"model,omitzero"`
	// OutputModalities is the list of output modalities you request the model to support.
	//
	// Most providers support text output only. Most models support output of only one modality, either text,
	// image, audio or video. But a few models do support both text and images.
	//
	// When unspecified, it defaults to all modalities supported by the provider and the selected model.
	//
	// Even when Model is set to a specific model ID, a ListModels call may be made to discover its supported
	// output modalities for providers that support multiple output modalities.
	//
	// OutputModalities can be set when Model is set to ModelNone to test if a provider supports this modality without
	// causing a ListModels call.
	OutputModalities Modalities `json:"modalities,omitzero" yaml:"modalities,omitzero"`
	// PreloadedModels is a list of models that are preloaded into the provider, to replace the call to
	// ListModels, for example with automatic model selection and modality detection.
	//
	// This is mostly used for unit tests or repeated client creation to save on HTTP requests.
	PreloadedModels []Model
	// contains filtered or unexported fields
}

ProviderOptions contains all the options to connect to a model provider.

All fields are optional, but some providers do require some of the fields.

func (*ProviderOptions) Validate

func (p *ProviderOptions) Validate() error

type ProviderPing

type ProviderPing interface {
	Provider
	// Ping enables confirming that the provider is accessible, without incurring cost. This is useful for local
	// providers to detect if they are accessible or not.
	Ping(ctx context.Context) error
}

ProviderPing represents a provider that you can ping.
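
A minimal sketch, assuming c is held as a genai.Provider:

if p, ok := c.(genai.ProviderPing); ok {
	if err := p.Ping(ctx); err != nil {
		log.Fatalf("provider is not reachable: %v", err)
	}
}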

type ProviderUnwrap

type ProviderUnwrap interface {
	Unwrap() Provider
}

ProviderUnwrap is exposed when the Provider is actually a wrapper around another one, like adapters.ProviderReasoning or ProviderUsage. This is useful when looking for other interfaces.
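
A minimal sketch of unwrapping until an optional interface is found:

var p genai.Provider = c
for {
	if pinger, ok := p.(genai.ProviderPing); ok {
		// Found the optional interface on the underlying provider.
		_ = pinger.Ping(ctx)
		break
	}
	u, ok := p.(genai.ProviderUnwrap)
	if !ok {
		break
	}
	p = u.Unwrap()
}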

type RateLimit

type RateLimit struct {
	Type      RateLimitType
	Period    RateLimitPeriod
	Limit     int64
	Remaining int64
	Reset     time.Time
}

RateLimit contains the limit, remaining, and reset values for a metric.

func (*RateLimit) String

func (r *RateLimit) String() string

func (*RateLimit) Validate

func (r *RateLimit) Validate() error

type RateLimitPeriod

type RateLimitPeriod int32

RateLimitPeriod defines the time period for a rate limit.

const (
	PerOther RateLimitPeriod = iota // For non-standard periods
	PerMinute
	PerDay
	PerMonth
)

type RateLimitType

type RateLimitType int32

RateLimitType defines the type of rate limit.

const (
	Requests RateLimitType = iota + 1
	Tokens
)

type Reply

type Reply struct {
	// Text is the content of the text message.
	Text string `json:"text,omitzero"`

	// Doc can be audio, video, image, PDF or any other format, including reference text.
	Doc Doc `json:"doc,omitzero"`

	// Citation contains references to source material that support the content.
	Citation Citation `json:"citation,omitzero"`

	// Reasoning is the reasoning done by the LLM.
	Reasoning string `json:"reasoning,omitzero"`

	// ToolCall is a tool call that the LLM requested to make.
	ToolCall ToolCall `json:"tool_call,omitzero"`

	// Opaque is added to keep continuity on the processing. A good example is Anthropic's extended thinking, or
	// server-side tool calling. It must be kept during an exchange.
	//
	// A message with only Opaque set is valid. It can be used in combination with other fields. This field is
	// specific to both the provider and the model.
	//
	// The data must be JSON-serializable.
	Opaque map[string]any `json:"opaque,omitzero"`
	// contains filtered or unexported fields
}

Reply is a block of information returned by the provider.

Normally only one of the fields must be set. The exception is the Opaque field.

Reply generally represents content returned by the provider, like a block of text or a document returned by the model. It can be a silent tool call request. It can also be an opaque block. A good example is traces of server side tool calling like WebSearch or MCP tool calling.

func (*Reply) GoString

func (r *Reply) GoString() string

GoString returns a JSON representation of the reply for debugging purposes.

func (*Reply) IsZero

func (r *Reply) IsZero() bool

IsZero returns true if the Reply is empty.

An empty reply is not valid.

func (*Reply) UnmarshalJSON

func (r *Reply) UnmarshalJSON(b []byte) error

func (*Reply) Validate

func (r *Reply) Validate() error

Validate ensures the block is valid.

type Request

type Request struct {
	// Text is the content of the text message.
	Text string `json:"text,omitzero"`

	// Doc can be audio, video, image, PDF or any other format, including reference text.
	Doc Doc `json:"doc,omitzero"`
	// contains filtered or unexported fields
}

Request is a block of content in the message meant to be visible in a chat setting.

It is effectively a union; only one of the two related field groups can be set.

func (*Request) UnmarshalJSON

func (r *Request) UnmarshalJSON(b []byte) error

func (*Request) Validate

func (r *Request) Validate() error

Validate ensures the block is valid.

type Result

type Result struct {
	Message
	Usage Usage
	// Logprobs is a list of multiple log probabilities, each for a token.
	//
	// The first item of each subslice is the chosen token. The next items are the candidates not chosen.
	//
	// Some providers only return the probability for the chosen tokens and not for the candidates.
	Logprobs [][]Logprob
}

Result is the result of a completion.

It is a Message along with Usage metadata about the operation. It optionally includes Logprobs if requested and if the provider supports it.

type ToolCall

type ToolCall struct {
	ID        string `json:"id,omitzero"`        // Unique identifier for the tool call. Necessary for parallel tool calling.
	Name      string `json:"name,omitzero"`      // Tool being called.
	Arguments string `json:"arguments,omitzero"` // encoded as JSON

	// Opaque is added to keep continuity across the exchange. A good example is Anthropic's extended
	// thinking. It must be kept during an exchange.
	//
	// A message with only Opaque set is valid.
	Opaque map[string]any `json:"opaque,omitzero"`
	// contains filtered or unexported fields
}

ToolCall is a tool call that the LLM requested to make.

func (*ToolCall) Call

func (t *ToolCall) Call(ctx context.Context, tools []ToolDef) (string, error)

Call invokes the ToolDef.Callback with arguments from the ToolCall, returning the result string.

It decodes the ToolCall.Arguments and passes it to the ToolDef.Callback.
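
A sketch of running a requested call against the declared tools, then packaging the output as a ToolCallResult (see below) for the next turn; the error handling is illustrative:

out, err := tc.Call(ctx, tools)
if err != nil {
	log.Fatal(err)
}
reply := genai.ToolCallResult{ID: tc.ID, Name: tc.Name, Result: out}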

func (*ToolCall) IsZero

func (t *ToolCall) IsZero() bool

func (*ToolCall) UnmarshalJSON

func (t *ToolCall) UnmarshalJSON(b []byte) error

func (*ToolCall) Validate

func (t *ToolCall) Validate() error

Validate ensures the tool call request from the LLM is valid.

type ToolCallRequest

type ToolCallRequest int

ToolCallRequest determines if we want the LLM to request a tool call.

const (
	// ToolCallAny is the default: the model is free to choose whether a tool is called or not. For some
	// models (like the llama family), it may be a bit too "tool call happy".
	ToolCallAny ToolCallRequest = iota
	// ToolCallRequired means a tool call is required. Don't forget to change the value after sending the
	// response!
	ToolCallRequired
	// ToolCallNone means that while tools are described, they should not be called. It is useful when an LLM
	// did tool calls, got the responses, and it's now time to generate text to present to the end user.
	ToolCallNone
)
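
A sketch of the flip described under ToolCallRequired: force a call on the first turn, then switch to ToolCallNone once the tool result has been sent back. How the value is plugged into the request options is documented earlier in this package; the variable here is a placeholder:

mode := genai.ToolCallRequired // first turn: the model must call a tool
// ... run the tool, append the ToolCallResult to the conversation ...
mode = genai.ToolCallNone // follow-up turn: generate text for the end user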

type ToolCallResult

type ToolCallResult struct {
	ID     string `json:"id,omitzero"`
	Name   string `json:"name,omitzero"`
	Result string `json:"result,omitzero"`
	// contains filtered or unexported fields
}

ToolCallResult is the result for a tool call that the LLM requested to make.

func (*ToolCallResult) UnmarshalJSON

func (t *ToolCallResult) UnmarshalJSON(b []byte) error

func (*ToolCallResult) Validate

func (t *ToolCallResult) Validate() error

Validate ensures the tool result is valid.

type ToolDef

type ToolDef struct {
	// Name must be unique among all tools.
	Name string
	// Description must be a LLM-friendly short description of the tool.
	Description string
	// Callback is the function to call with the inputs.
	// It must accept a context.Context and one struct pointer as input: (ctx context.Context, input *struct{}).
	// The struct must be serializable as JSON; its JSON schema is deduced via reflection and struct tags.
	// It must return the result and an error: (string, error).
	Callback any
	// InputSchemaOverride overrides the schema deduced from the Callback's second argument. It's meant to be
	// used when an enum or a description is set dynamically, or with complex if/then/else that would be tedious
	// to describe as struct tags.
	//
	// It is okay to initialize Callback, then take the return value of GetInputSchema() to initialize InputSchemaOverride, then mutate it.
	InputSchemaOverride *jsonschema.Schema
	// contains filtered or unexported fields
}

ToolDef describes a tool that the LLM can request to use.
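
A minimal definition sketch following the Callback contract above; the weather tool and its input struct are invented for illustration:

type weatherInput struct {
	City string `json:"city"`
}

tool := genai.ToolDef{
	Name:        "get_weather",
	Description: "Get the current weather for a city.",
	Callback: func(ctx context.Context, in *weatherInput) (string, error) {
		return "sunny in " + in.City, nil
	},
}
if err := tool.Validate(); err != nil {
	log.Fatal(err)
}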

func (*ToolDef) GetInputSchema

func (t *ToolDef) GetInputSchema() *jsonschema.Schema

GetInputSchema returns the JSON schema for the input argument of the callback.
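
A sketch of the pattern mentioned under InputSchemaOverride: derive the schema from the Callback, then mutate it, e.g. to inject an enum only known at runtime. The mutation itself depends on the jsonschema package's API, so it is left as a comment:

tool.InputSchemaOverride = tool.GetInputSchema()
// ... mutate tool.InputSchemaOverride here, e.g. set a dynamic enum or
// description on one of its properties ...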

func (*ToolDef) Validate

func (t *ToolDef) Validate() error

Validate ensures the tool definition is valid.

For the Name field, it follows the rule documented at https://docs.anthropic.com/en/docs/agents-and-tools/tool-use/implement-tool-use#example-simple-tool-definition

type Usage

type Usage struct {
	// Token usage for the current request.
	InputTokens       int64
	InputCachedTokens int64
	ReasoningTokens   int64
	OutputTokens      int64
	TotalTokens       int64
	// FinishReason indicates why the model stopped generating tokens.
	FinishReason FinishReason
	// Limits contains a list of rate limit details from the provider.
	Limits []RateLimit
}

Usage from the LLM provider.

func (*Usage) Add

func (u *Usage) Add(r Usage)

Add accumulates the usage from another result.

func (*Usage) String

func (u *Usage) String() string
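
A sketch of totalling usage across several completions with Add, then printing it with String; the results slice is a placeholder:

var total genai.Usage
for _, res := range results {
	total.Add(res.Usage)
}
fmt.Println(total.String())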

Directories

Path Synopsis
adapters
Package adapters includes multiple adapters to convert one ProviderFoo interface into another one.
base
Package base is awesome sauce to reduce code duplication across most providers.
cmd
cache-mgr command
Command cache-mgr fetches and prints out the list of files stored on the selected provider.
list-models command
Command list-models fetches and prints out the list of models from the selected providers.
llama-serve command
Command llama-serve fetches a model from HuggingFace and runs llama-server.
scoreboard command
Command scoreboard generates a scoreboard for every provider supported.
examples
aud-txt_to_txt command
img-txt_to_img command
img-txt_to_txt command
img-txt_to_vid command
txt_to_img command
txt_to_txt_any command
txt_to_txt_sync command
vid-txt_to_txt command
httprecord
Package httprecord provides safe HTTP recording logic for users that want to understand the API and do smoke tests.
internal
Package internal is awesome sauce.
bb
Package bb is a separate package so it can be imported by genai while being internal and exported so cmp.Diff() isn't unhappy.
internaltest
Package internaltest is awesome sauce for unit testing.
myrecorder
Package myrecorder has HTTP recording logic.
sse
Package sse provides Server-Sent Events (SSE) processing utilities.
providers
Package providers is the root of all standard providers.
anthropic
Package anthropic implements a client for the Anthropic API, to use Claude.
bfl
Package bfl implements a client for the Black Forest Labs API.
cerebras
Package cerebras implements a client for the Cerebras API.
cloudflare
Package cloudflare implements a client for the Cloudflare AI API.
cohere
Package cohere implements a client for the Cohere API.
deepseek
Package deepseek implements a client for the DeepSeek API.
gemini
Package gemini implements a client for Google's Gemini API.
groq
Package groq implements a client for the Groq API.
huggingface
Package huggingface implements a client for the HuggingFace serverless inference API.
llamacpp
Package llamacpp implements a client for the llama-server native API, not the OpenAI compatible one.
llamacpp/llamacppsrv
Package llamacppsrv downloads and starts llama-server from llama.cpp, directly from GitHub releases.
mistral
Package mistral implements a client for the Mistral API.
ollama
Package ollama implements a client for the Ollama API.
ollama/ollamasrv
Package ollamasrv downloads and starts ollama directly from GitHub releases.
openaichat
Package openaichat implements a client for the OpenAI Chat Completion API.
openaicompatible
Package openaicompatible implements a minimal client for "OpenAI-compatible" providers.
openairesponses
Package openairesponses implements a client for the OpenAI Responses API.
perplexity
Package perplexity implements a client for the Perplexity API.
pollinations
Package pollinations implements a client for the Pollinations API.
togetherai
Package togetherai implements a client for the Together.ai API.
scoreboard
Package scoreboard declares the structures to define a scoreboard.
smoke
Package smoke runs a smoke test to generate a scoreboard.Scenario.
smoketest
Package smoketest runs a scoreboard in test mode.
