
Conversation

@rrrodzilla commented Dec 8, 2025

Summary

This PR adds native Ollama provider support for local LLM inference via Ollama's /api/chat endpoint.

Changes

  • Add Ollama variant to ProviderType enum with model_name and api_base configuration (see the sketch after this list)
  • Implement InferenceProvider trait for Ollama with full feature support:
    • Streaming and non-streaming inference
    • Tool calling with function definitions
    • JSON mode with schema enforcement
    • All standard inference parameters (temperature, max_tokens, etc.)
  • Add ollama::model_name shorthand syntax for inline model configuration
  • Integrate Ollama into model routing table
  • Add Docker Compose services for E2E testing with the qwen2:0.5b model
  • Add E2E test configuration and test suite
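
For context, here is a minimal sketch of what the new provider variant looks like conceptually; the type and field names below are illustrative, not the gateway's exact definitions:

```rust
// Illustrative sketch only: names and fields are assumptions, not the
// gateway's real types.

/// Providers the gateway can route inference requests to.
pub enum ProviderType {
    // ... existing providers ...
    Ollama {
        /// Model tag as known to the local Ollama server, e.g. "qwen2:0.5b".
        model_name: String,
        /// Base URL of the Ollama server; falls back to the default
        /// http://localhost:11434 when not set.
        api_base: Option<String>,
    },
}
```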

Known Limitations

  • Image/file inputs: Not yet supported. Ollama supports images via base64, but resolving LazyFile asynchronously requires additional work. File blocks are skipped with a warning.
  • Unsupported inference parameters: The following parameters are not supported by Ollama; the provider logs a warning and ignores them if set (see the sketch after this list):
    • reasoning_effort
    • service_tier
    • thinking_budget_tokens
    • verbosity
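
The warning behavior is just best-effort logging before the request is sent; roughly like this (the struct and function names here are hypothetical, not the gateway's real request types):

```rust
// Hypothetical options type: field names mirror the parameters listed
// above, not the gateway's actual structs.
#[derive(Default)]
pub struct UnsupportedParams {
    pub reasoning_effort: Option<String>,
    pub service_tier: Option<String>,
    pub thinking_budget_tokens: Option<u32>,
    pub verbosity: Option<String>,
}

/// Log one warning per parameter that Ollama cannot honor; the request is
/// still sent, just without them.
pub fn warn_unsupported(params: &UnsupportedParams) {
    let ignored = [
        ("reasoning_effort", params.reasoning_effort.is_some()),
        ("service_tier", params.service_tier.is_some()),
        ("thinking_budget_tokens", params.thinking_budget_tokens.is_some()),
        ("verbosity", params.verbosity.is_some()),
    ];
    for (name, set) in ignored {
        if set {
            eprintln!("warning: Ollama provider ignores `{name}`");
        }
    }
}
```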

Local Test Results

  • 97 tests passed (streaming, tool use, JSON mode, inference params, shorthand, embedded gateway)

Notes

  • Uses Ollama's native /api/chat endpoint, which streams NDJSON rather than SSE (see the sketch after this list)
  • Tests skip gracefully when Ollama is unavailable (empty provider vectors are returned)
  • Docker Compose includes an ollama-model-pull service that pulls the model on startup
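
Since /api/chat streams newline-delimited JSON, the streaming path boils down to buffering bytes and deserializing each complete line. A rough sketch, assuming reqwest (with the stream feature), serde, and futures-util; this is not the PR's actual implementation:

```rust
use futures_util::StreamExt;
use serde::Deserialize;

// Minimal shape of one NDJSON chunk from Ollama's /api/chat stream.
#[derive(Deserialize)]
struct OllamaChunk {
    message: Option<OllamaMessage>,
    done: bool,
}

#[derive(Deserialize)]
struct OllamaMessage {
    content: String,
}

async fn stream_chat(
    api_base: &str,
    body: &serde_json::Value,
) -> Result<(), Box<dyn std::error::Error>> {
    let resp = reqwest::Client::new()
        .post(format!("{api_base}/api/chat"))
        .json(body)
        .send()
        .await?;

    let mut buf = String::new();
    let mut chunks = resp.bytes_stream();
    while let Some(bytes) = chunks.next().await {
        // Simplification: assumes chunk boundaries fall on UTF-8 character
        // boundaries; a real implementation would buffer raw bytes instead.
        buf.push_str(std::str::from_utf8(&bytes?)?);
        // NDJSON: every complete line is a standalone JSON object.
        while let Some(pos) = buf.find('\n') {
            let line: String = buf.drain(..=pos).collect();
            let line = line.trim();
            if line.is_empty() {
                continue;
            }
            let chunk: OllamaChunk = serde_json::from_str(line)?;
            if let Some(msg) = chunk.message {
                print!("{}", msg.content);
            }
            if chunk.done {
                return Ok(());
            }
        }
    }
    Ok(())
}
```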

Test plan

  • E2E tests with the Ollama container pass (97/97)
  • CI runs E2E tests successfully
  • Ollama inference works with streaming and non-streaming modes
  • Tool calling and JSON mode function correctly


Commits

Register Ollama as a new provider type with configuration options for model_name and api_base.

Implement InferenceProvider trait for Ollama using the native /api/chat endpoint:
- Streaming and non-streaming inference
- Tool calling with function definitions
- JSON mode with schema enforcement
- Support for all standard inference parameters

Add ollama::model_name shorthand syntax for inline model configuration and integrate Ollama into the model routing table.

Add Ollama container with the qwen2:0.5b model for E2E testing:
- ollama service with healthcheck
- ollama-model-pull service to pull the model on startup
- Volume for model persistence

Add qwen2-0.5b-ollama model configuration and function variants for E2E testing.

Add E2E tests for the Ollama provider covering:
- Simple inference
- Streaming inference
- JSON mode
- Inference parameters
- Shorthand syntax
- Credential fallbacks

- Add "ollama::" to SHORTHAND_MODEL_PREFIXES for shorthand model syntax
- Add an ollama case in the from_shorthand match for shorthand resolution
- Quote variant names containing dots in TOML configs to prevent parsing issues
- Use the correct model name (qwen2:0.5b) for the shorthand test variant
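
The shorthand plumbing described in the last commit is conceptually a prefix check; a minimal sketch follows (the constant and function names track the commit message, but the bodies, return type, and the other prefixes listed are illustrative):

```rust
// Illustrative: the real prefix table covers every supported provider.
const SHORTHAND_MODEL_PREFIXES: &[&str] = &["openai::", "anthropic::", "ollama::"];

/// Resolve an inline spec such as "ollama::qwen2:0.5b" into a provider name
/// plus model name (the real code builds a full provider config instead).
fn from_shorthand(spec: &str) -> Option<(&'static str, String)> {
    let prefix = SHORTHAND_MODEL_PREFIXES
        .iter()
        .copied()
        .find(|p| spec.starts_with(*p))?;
    let model_name = spec[prefix.len()..].to_string();
    match prefix {
        // Everything after "ollama::" is the Ollama model tag (it may itself
        // contain a colon, e.g. "qwen2:0.5b"); api_base falls back to the
        // default local server.
        "ollama::" => Some(("ollama", model_name)),
        _ => None, // other providers get their own arms in the real match
    }
}
```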

@virajmehta (Member) commented:

Hi @rrrodzilla, thanks for the PR. I am curious what was wrong with using the OpenAI client to call Ollama that warrants a separate provider here. Do you mind sharing?

@rrrodzilla (Author) commented:

Oh haha. I just saw there'd been an open issue for it that didn't have any movement and I went for it because I was bored! No worries. It was a good exercise. Can disregard. 😬
