Modelito is a compact, dependency-light Python library that provides provider- agnostic abstractions and connectors for large language models (LLMs). It offers lightweight shims for OpenAI, Claude, Gemini, oMLX, local Ollama deployments, and local OpenAI-compatible servers (llama.cpp, vLLM, LM Studio), plus utilities for token counting, timeout estimation, and small helpers to manage Ollama servers when needed. The library is designed for easy integration into applications and CI pipelines.
To install the latest released version from PyPI:
pip install modelitopip install modelito does not install FastAPI/Uvicorn. Those are optional
and only needed for modelito-serve.
Release publishing uses PyPI trusted publishing through
.github/workflows/publish.yml. The PyPI project must have a matching trusted
publisher configured for repository krahd/modelito, workflow publish.yml,
and environment pypi. The workflow also checks that the tag version matches
pyproject.toml before building or publishing.
For development / contributor setup (editable install and dev dependencies):
pip install -e .[dev]
# Optional: add runtime extras for full functionality
pip install -e .[ollama,tokenization,openai,anthropic,gemini,grok]Run tests (for contributors):
pytest -qIf you need to test a preview build published to TestPyPI, use the TestPyPI index. TestPyPI packages are for testing only and may not be stable.
python -m pip install --index-url https://test.pypi.org/simple/ \
--extra-index-url https://pypi.org/simple modelito==<version>If installation from the index fails, download the wheel from the TestPyPI "Files" page and install it directly.
To build a source distribution and wheel locally:
python -m pip install --upgrade build
python -m buildInstall from the built wheel:
pip install dist/*.whlSee the docs/ folder for more details:
- ARCHITECTURE.md — Core design, Provider Protocol, and SDK hierarchy
- USAGE.md — Usage guide and examples
- local-openai-compatible.md — Using local OpenAI-compatible servers
- INSTALL.md, API.md — Installation and API reference
- RELEASE.md — Release checklist and publication steps
The diagrams below are intentionally compact and current-state oriented. Detailed architecture policy lives in docs/ARCHITECTURE.md.
This package provides compatibility shims and small, dependency-light implementations for common provider interfaces. When optional extras are installed the package will attempt to use real SDK clients; otherwise the shims provide safe offline-friendly fallbacks suitable for testing.
Recommended entry point:
from modelito import Client
from modelito.messages import Message
client = Client(provider="auto", prefer=["omlx", "ollama"], model="omlx")
response = client.chat([Message(role="user", content="Hello")])
print(response.text)Provided shims and utilities:
OpenAIProvider— hosted OpenAI / SDK-backed provider that can also target OpenAI-compatible APIs viabase_url.OpenAICompatibleHTTPProvider— shared HTTP base class for local or OpenAI-compatible runtimes.RawChatProvider— protocol for preserving raw OpenAI chat completion payloads without collapsing them into text.ClaudeProvider— will use the official Anthropic SDK when installed, falling back to deterministic behavior otherwise.GeminiProvider,GrokProvider— lightweight shims.OMLXProvider— thin oMLX preset built onOpenAICompatibleHTTPProvider.OllamaProvider— HTTP-aware provider that can call a local Ollama HTTP API through stdlib helpers and can fall back to the local Ollama CLI or deterministic test behavior when needed. Themodelito[ollama]extra installs optional support dependencies used by the broader Ollama service-management helpers.
The client layer recognises the same provider stack through ChatProvider,
MessageInput, and structured response helpers such as Client.chat() and
Client.chat_json().
OpenAICompatibleHTTPProvider, OMLXProvider, and OpenAIProvider also
expose raw_complete() and raw_stream() for OpenAI-compatible passthrough.
Client.chat_parsed() remains the structured JSON convenience path for Python
applications.
For quick diagnostics, use the provider readiness API or CLI:
from modelito import check_provider_ready
status = check_provider_ready("omlx", model="omlx")
print(status.ready, status.reason)The shared message normaliser is exported as flatten_message_inputs from
both modelito and modelito.messages for callers that need OpenAI-style
dict conversion.
python -m modelito doctor --provider omlx --model omlxServer mode for non-Python clients:
pip install "modelito[serve]"
modelito-serve --provider omlx --port 11436 --host 127.0.0.1 --strict--profile and --profile-path are currently treated as profile file paths;
--profile-path takes precedence when both are provided.
modelito-serve exposes OpenAI-compatible /v1/models,
/v1/chat/completions, and /v1/embeddings endpoints.
Pi integration uses HTTP only. Pi is a TypeScript/Node harness and does not
import Modelito directly; point Pi at the modelito-serve base URL via its
OpenAI-compatible custom provider configuration.
Example ~/.pi/agent/models.json provider entry:
{
"providers": {
"modelito": {
"baseUrl": "http://localhost:11436/v1",
"api": "openai-completions",
"apiKey": "modelito",
"authHeader": false,
"compat": {
"supportsDeveloperRole": false,
"supportsReasoningEffort": false
},
"models": [
{
"id": "omlx",
"name": "oMLX via Modelito",
"reasoning": false,
"input": ["text"],
"contextWindow": 8192,
"maxTokens": 4096,
"cost": {
"input": 0,
"output": 0,
"cacheRead": 0,
"cacheWrite": 0
}
}
]
}
}
}Tool-calling workflows require raw passthrough support. Modelito currently
implements that on OpenAICompatibleHTTPProvider and the hosted
OpenAIProvider; OMLXProvider inherits it automatically. OllamaProvider
raw passthrough is deferred. For tool-calling integrations today, prefer
oMLX/OpenAI-compatible raw providers or hosted OpenAI.
The package also exposes a small Ollama administration layer for local model
operations, including install backend detection, remote catalog metadata,
download lifecycle tracking, and explicit model readiness confirmation through
helpers such as detect_install_method, list_remote_model_catalog,
download_model_progress, and ensure_model_ready.
This software is provided "AS IS" and without warranties of any kind. See
the included LICENSE file for the full MIT license text.
CI / Integration Tests
This repository includes a consolidated GitHub Actions workflow at
.github/workflows/ci.yml. It runs linting/type checks and unit tests for pull
requests and pushes to main, and builds docs on non-PR runs.
Release publishing uses PyPI trusted publishing through
.github/workflows/publish.yml. The PyPI project must have a matching trusted
publisher configured for repository krahd/modelito, workflow publish.yml,
and environment pypi. The workflow also checks that the tag version matches
pyproject.toml before building or publishing.
Ollama integration tests are intentionally gated and will only run when you
explicitly enable them. To run integration tests locally or in CI set the
environment variable RUN_OLLAMA_INTEGRATION=1. Additional optional flags:
ALLOW_OLLAMA_INSTALL=1— permit the integration tests to attempt installing Ollama when missing.ALLOW_OLLAMA_DOWNLOAD=1— permit downloading remote models during integration tests.ALLOW_OLLAMA_UPDATE=1— permit running update flows during integration tests.
Example (local):
RUN_OLLAMA_INTEGRATION=1 pytest tests/test_ollama_integration.py -qProvider integration tests for external services (OpenAI, Anthropic, etc.) are intentionally not part of default hosted CI to keep pull requests fast and low noise. Use local/manual execution for those checks when needed.
There is a dedicated self-hosted Ollama workflow at
.github/workflows/integration-ollama.yml for maintainers who want broader
integration checks on controlled infrastructure.
modelito exposes a minimal structural Provider Protocol for the legacy
synchronous surface, but the recommended application entry point is Client
and its richer chat API. The Protocol is intentionally small to remain
compatible with existing duck-typed providers — it requires only:
list_models()->list[str]summarize(messages, settings=None)->str
All built-in providers shipped with the package (OpenAIProvider,
ClaudeProvider, GeminiProvider, OMLXProvider, OllamaProvider, GrokProvider) satisfy
the Provider protocol structurally. The Provider Protocol is decorated with
@runtime_checkable, so you can use isinstance() checks at runtime when
you need to enforce the contract in application code.
Example usage:
from modelito import Provider, OllamaProvider
p: Provider = OllamaProvider()
if isinstance(p, Provider):
from modelito.messages import Message
resp = p.summarize([Message(role="user", content="hello")])
print(resp)The package provides typed Message/Response dataclasses and exposes a
small set of optional Protocols for provider surfaces:
SyncProvider(alias:Provider) — existing synchronoussummarize()/list_models()surface.AsyncProvider— asyncacomplete()surface for providers that support awaitable calls.StreamingProvider— streamingstream()generator surface.EmbeddingProvider—embed()surface for vector embeddings.
Embeddings can also be selected at runtime through the dedicated Embedder
wrapper when you only need the embedding surface instead of the full text
generation client:
from modelito import Embedder
embedder = Embedder(provider="openai")
vectors = embedder.embed(["hello", "world"])
print(len(vectors), len(vectors[0]))
print(Embedder.available_embedders())modelito exposes Message and Response dataclasses; client and provider
helpers accept Message, plain strings, and OpenAI-style dict inputs.
from modelito import Provider, Message, OllamaProvider
# Create a provider
provider: Provider = OllamaProvider()
# Single request
resp = provider.summarize([Message(role="user", content="hello")])
print(resp)
# Streaming
for chunk in provider.stream([Message(role="user", content="tell me a story")]):
print(chunk, end="", flush=True)Use a bare Provider when:
- You manage conversation state yourself
- You're doing single-shot or stateless inference
- You need minimal abstraction
- You're building a custom application architecture
from modelito import Message, OllamaConnector, OllamaProvider
# Create a connector
conn = OllamaConnector(provider=OllamaProvider())
# Multi-turn conversation (state is tracked automatically)
res = conn.complete(
conv_id="chat_session_1",
new_messages=[Message(role="user", content="what's 2+2?")]
)
print(res.text)
# Second turn (history is remembered)
res = conn.complete(
conv_id="chat_session_1",
new_messages=[Message(role="user", content="and 3+3?")]
)
print(res.text)Use OllamaConnector when:
- You need automatic conversation history tracking
- You're building a multi-turn chatbot
- You want to manage per-conversation state without writing it yourself
For more details, see ARCHITECTURE.md
Modelito normalizes provider streaming into a simple incremental text stream. Providers may emit data at different granularities; the connector/streaming helpers attempt to normalize these into a sequence of text chunks that are safe to concatenate to form the final output. Common shapes you will encounter:
- Token-level: Backends (e.g., OpenAI SDK) may stream individual token deltas. These are emitted as short text fragments; consumers should append fragments in order to reconstruct the full output.
- Chunk-level: Some providers deliver logical chunks or events (for example, chunked JSON payloads). Modelito extracts the textual portion and yields it as incremental chunks.
- Line-delimited / SSE: HTTP services (like Ollama's
/api/generate) may send newline-delimited JSON or SSE frames. Modelito reads and normalizes the frames and yields textual content as it becomes available.
Behavioral notes:
- The
stream()generator yieldsstrpieces; each yielded item is intended to be appended to reconstruct the response incrementally. - When you need token-level control (e.g., streaming token-by-token), prefer providers that expose token deltas (OpenAI SDK). Modelito will still yield those token deltas as text fragments.
- Offline/deterministic fallbacks yield the full text in a single chunk.
Modelito includes internal/helper modules such as local model management,
API key helpers, mock providers, cache helpers, and batching utilities.
These modules are not currently presented as stable top-level package exports
in this README. Prefer the documented Client, provider adapters, connector,
and server entrypoints for application integrations.
See the tests/ directory for comprehensive coverage and usage examples.