
Conversation

@rrrodzilla commented Dec 8, 2025

Summary

This PR adds native Ollama provider support for local LLM inference via Ollama's /api/chat endpoint.

Changes

  • Add Ollama variant to ProviderType enum with model_name and api_base configuration (see the sketch after this list)
  • Implement InferenceProvider trait for Ollama with full feature support:
    • Streaming and non-streaming inference
    • Tool calling with function definitions
    • JSON mode with schema enforcement
    • All standard inference parameters (temperature, max_tokens, etc.)
  • Add ollama::model_name shorthand syntax for inline model configuration
  • Integrate Ollama into model routing table
  • Add Docker Compose services for E2E testing with the qwen2:0.5b model
  • Add E2E test configuration and test suite
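
For context, here is a minimal sketch of what the new provider variant looks like conceptually; the type and field names below are illustrative, not the gateway's exact definitions:

```rust
// Illustrative sketch only: names and fields are assumptions, not the
// gateway's real types.

/// Providers the gateway can route inference requests to.
pub enum ProviderType {
    // ... existing providers ...
    Ollama {
        /// Model tag as known to the local Ollama server, e.g. "qwen2:0.5b".
        model_name: String,
        /// Base URL of the Ollama server; falls back to the default
        /// http://localhost:11434 when not set.
        api_base: Option<String>,
    },
}
```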

Known Limitations

  • Image/file inputs: Not yet supported. Ollama supports images via base64, but resolving LazyFile asynchronously requires additional work. File blocks are skipped with a warning.
  • Unsupported inference parameters: The following parameters are not supported by Ollama; the provider logs a warning and ignores them if set (see the sketch after this list):
    • reasoning_effort
    • service_tier
    • thinking_budget_tokens
    • verbosity
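
The warning behavior is just best-effort logging before the request is sent; roughly like this (the struct and function names here are hypothetical, not the gateway's real request types):

```rust
// Hypothetical options type: field names mirror the parameters listed
// above, not the gateway's actual structs.
#[derive(Default)]
pub struct UnsupportedParams {
    pub reasoning_effort: Option<String>,
    pub service_tier: Option<String>,
    pub thinking_budget_tokens: Option<u32>,
    pub verbosity: Option<String>,
}

/// Log one warning per parameter that Ollama cannot honor; the request is
/// still sent, just without them.
pub fn warn_unsupported(params: &UnsupportedParams) {
    let ignored = [
        ("reasoning_effort", params.reasoning_effort.is_some()),
        ("service_tier", params.service_tier.is_some()),
        ("thinking_budget_tokens", params.thinking_budget_tokens.is_some()),
        ("verbosity", params.verbosity.is_some()),
    ];
    for (name, set) in ignored {
        if set {
            eprintln!("warning: Ollama provider ignores `{name}`");
        }
    }
}
```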

Local Test Results

  • 97 tests passed (streaming, tool use, JSON mode, inference params, shorthand, embedded gateway)

Notes

  • Uses Ollama's native /api/chat endpoint, which streams NDJSON rather than SSE (see the sketch after this list)
  • Tests skip gracefully when Ollama is unavailable (empty provider vectors are returned)
  • Docker Compose includes an ollama-model-pull service that pulls the model on startup
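
Since /api/chat streams newline-delimited JSON, the streaming path boils down to buffering bytes and deserializing each complete line. A rough sketch, assuming reqwest (with the stream feature), serde, and futures-util; this is not the PR's actual implementation:

```rust
use futures_util::StreamExt;
use serde::Deserialize;

// Minimal shape of one NDJSON chunk from Ollama's /api/chat stream.
#[derive(Deserialize)]
struct OllamaChunk {
    message: Option<OllamaMessage>,
    done: bool,
}

#[derive(Deserialize)]
struct OllamaMessage {
    content: String,
}

async fn stream_chat(
    api_base: &str,
    body: &serde_json::Value,
) -> Result<(), Box<dyn std::error::Error>> {
    let resp = reqwest::Client::new()
        .post(format!("{api_base}/api/chat"))
        .json(body)
        .send()
        .await?;

    let mut buf = String::new();
    let mut chunks = resp.bytes_stream();
    while let Some(bytes) = chunks.next().await {
        // Simplification: assumes chunk boundaries fall on UTF-8 character
        // boundaries; a real implementation would buffer raw bytes instead.
        buf.push_str(std::str::from_utf8(&bytes?)?);
        // NDJSON: every complete line is a standalone JSON object.
        while let Some(pos) = buf.find('\n') {
            let line: String = buf.drain(..=pos).collect();
            let line = line.trim();
            if line.is_empty() {
                continue;
            }
            let chunk: OllamaChunk = serde_json::from_str(line)?;
            if let Some(msg) = chunk.message {
                print!("{}", msg.content);
            }
            if chunk.done {
                return Ok(());
            }
        }
    }
    Ok(())
}
```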

Test plan

  • E2E tests with the Ollama container pass (97/97)
  • CI runs E2E tests successfully
  • Ollama inference works with streaming and non-streaming modes
  • Tool calling and JSON mode function correctly


Commits

Register Ollama as a new provider type with configuration options for model_name and api_base.

Implement InferenceProvider trait for Ollama using the native /api/chat endpoint:
- Streaming and non-streaming inference
- Tool calling with function definitions
- JSON mode with schema enforcement
- Support for all standard inference parameters

Add ollama::model_name shorthand syntax for inline model configuration and integrate Ollama into the model routing table.

Add Ollama container with the qwen2:0.5b model for E2E testing:
- ollama service with healthcheck
- ollama-model-pull service to pull the model on startup
- Volume for model persistence

Add qwen2-0.5b-ollama model configuration and function variants for E2E testing.

Add E2E tests for the Ollama provider covering:
- Simple inference
- Streaming inference
- JSON mode
- Inference parameters
- Shorthand syntax
- Credential fallbacks

- Add "ollama::" to SHORTHAND_MODEL_PREFIXES for shorthand model syntax
- Add an ollama case in the from_shorthand match for shorthand resolution
- Quote variant names containing dots in TOML configs to prevent parsing issues
- Use the correct model name (qwen2:0.5b) for the shorthand test variant
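
The shorthand plumbing described in the last commit is conceptually a prefix check; a minimal sketch follows (the constant and function names track the commit message, but the bodies, return type, and the other prefixes listed are illustrative):

```rust
// Illustrative: the real prefix table covers every supported provider.
const SHORTHAND_MODEL_PREFIXES: &[&str] = &["openai::", "anthropic::", "ollama::"];

/// Resolve an inline spec such as "ollama::qwen2:0.5b" into a provider name
/// plus model name (the real code builds a full provider config instead).
fn from_shorthand(spec: &str) -> Option<(&'static str, String)> {
    let prefix = SHORTHAND_MODEL_PREFIXES
        .iter()
        .copied()
        .find(|p| spec.starts_with(*p))?;
    let model_name = spec[prefix.len()..].to_string();
    match prefix {
        // Everything after "ollama::" is the Ollama model tag (it may itself
        // contain a colon, e.g. "qwen2:0.5b"); api_base falls back to the
        // default local server.
        "ollama::" => Some(("ollama", model_name)),
        _ => None, // other providers get their own arms in the real match
    }
}
```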

@virajmehta (Member) commented:

Hi @rrrodzilla, thanks for the PR. I am curious what was wrong with using the OpenAI client to call Ollama that warrants a separate provider here. Do you mind sharing?

@rrrodzilla (Author) commented:

Oh haha. I just saw there'd been an open issue for it that didn't have any movement and I went for it because I was bored! No worries. It was a good exercise. Can disregard. 😬
