// LLM-powered data processing engine
Ondine is a batch processing engine for tabular data. Feed it a pandas or Polars DataFrame, pick any LLM, get structured columns back. Cost control, checkpointing, and anti-hallucination are built in.
Capabilities
Not a toy wrapper. Ondine handles checkpointing, budget limits, structured output, and anti-hallucination out of the box.
Usage
Drop into any pandas or Polars workflow. No infrastructure, no boilerplate, and output schemas are optional.
```python
import polars as pl  # or: import pandas as pd
from ondine import Ondine

df = pl.read_csv("reviews.csv")  # 10,000 rows — pandas works too

engine = Ondine(
    model="gpt-5.4-mini",
    prompt="Classify sentiment: positive, neutral, or negative.",
    batch_size=50,  # 50 rows per API call → 200 calls, not 10,000
)

results = engine.run(df)
print(results[["review", "sentiment"]])
```
```python
from pydantic import BaseModel
from ondine import Ondine

class ReviewAnalysis(BaseModel):
    sentiment: str  # "positive" | "neutral" | "negative"
    score: int      # 1–10
    key_topic: str

engine = Ondine(
    model="gpt-5.4-mini",
    prompt="Analyze this customer review.",
    output_model=ReviewAnalysis,  # fully typed, validated
    batch_size=50,
)

results = engine.run(df)
# results.sentiment, results.score, results.key_topic — all typed columns
```
```python
engine = Ondine(
    model="gpt-5.4",
    prompt="Summarize this support ticket.",
    batch_size=20,
    max_cost=5.00,  # hard budget limit in USD
)

# Always estimate first
est = engine.estimate(df)
print(f"Estimated: ${est.total_cost:.4f}")
print(f"Batches: {est.total_batches}")

# Run — stops at $5.00, checkpoints progress
results = engine.run(df, checkpoint="tickets.ckpt.json")
```
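The arithmetic behind an estimate like this is simple to reason about: one shared instruction prompt per batch plus the per-row payload, priced per token. The sketch below is illustrative only, with hypothetical token counts and prices; it is not Ondine's actual estimator.

```python
import math

def estimate_cost(n_rows: int, batch_size: int,
                  tokens_per_row: int, instruction_tokens: int,
                  out_tokens_per_row: int,
                  price_in_per_1k: float, price_out_per_1k: float):
    """Rough run cost: one instruction prompt per batch plus row payloads."""
    batches = math.ceil(n_rows / batch_size)
    input_tokens = batches * instruction_tokens + n_rows * tokens_per_row
    output_tokens = n_rows * out_tokens_per_row
    cost = (input_tokens / 1000) * price_in_per_1k \
         + (output_tokens / 1000) * price_out_per_1k
    return batches, round(cost, 4)

# Hypothetical numbers: 10,000 tickets, ~200 tokens each, cheap model prices
batches, cost = estimate_cost(
    n_rows=10_000, batch_size=20,
    tokens_per_row=200, instruction_tokens=50,
    out_tokens_per_row=30,
    price_in_per_1k=0.0005, price_out_per_1k=0.0015,
)
# 500 batches; cost dominated by the 2M input tokens for row payloads
```

Checking the estimate against `max_cost` before running is what lets the engine refuse a job that would blow the budget rather than discovering it mid-run.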
```python
from ondine import Ondine, ContextStore

# Ground LLM outputs against known facts
store = ContextStore.from_dataframe(
    df,
    key_columns=["employee_id", "department"],
)

engine = Ondine(
    model="gpt-5.4-mini",
    prompt="Score employee performance 1–10.",
    context_store=store,  # validates outputs vs. known data
    batch_size=50,
)

results = engine.run(df)
# Hallucinated scores flagged — not silently written
```
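The core idea of grounding is checkable with a few lines: build a set of key tuples from the source data, then flag any output whose keys were never in that set. This is a hypothetical sketch of the concept, not Ondine's `ContextStore` implementation.

```python
def flag_ungrounded(outputs: list[dict], known_keys: set[tuple]) -> list[dict]:
    """Mark outputs whose (employee_id, department) pair is not in known data."""
    for out in outputs:
        key = (out["employee_id"], out["department"])
        out["flagged"] = key not in known_keys  # True means likely hallucinated
    return outputs

# Known facts, as a ContextStore might hold them
known = {("E001", "sales"), ("E002", "engineering")}

outputs = [
    {"employee_id": "E001", "department": "sales", "score": 7},
    {"employee_id": "E999", "department": "sales", "score": 9},  # invented ID
]
checked = flag_ungrounded(outputs, known)
```

Flagging rather than dropping keeps the decision with the user: a flagged row is visible in the results, not silently discarded or silently trusted.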
How it works
pandas or Polars DataFrame in, AI-enriched DataFrame out. Ondine handles everything in between.
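Conceptually, that pipeline is a batching loop: split the rows into batches, make one model call per batch, and merge each structured output back onto its source row. A minimal sketch of the idea, with a stub in place of a real model call; this is not Ondine's internals.

```python
from typing import Callable

def run_batched(rows: list[dict], batch_size: int,
                call_llm: Callable[[list[dict]], list[dict]]) -> list[dict]:
    """Split rows into batches, get one structured output per row, merge back."""
    enriched = []
    for start in range(0, len(rows), batch_size):
        batch = rows[start:start + batch_size]
        outputs = call_llm(batch)  # one API call per batch, not per row
        enriched.extend({**row, **out} for row, out in zip(batch, outputs))
    return enriched

calls = 0
def stub_llm(batch):  # stand-in for a real model call
    global calls
    calls += 1
    return [{"sentiment": "positive"}] * len(batch)

rows = [{"review": f"review {i}"} for i in range(120)]
result = run_batched(rows, batch_size=50, call_llm=stub_llm)
# 120 rows at batch_size=50 → 3 API calls
```

Batching is also what makes checkpointing natural: each completed batch is a unit of progress that can be written to disk and skipped on resume.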
Get started
Works with pandas and Polars, any LLM provider, and your existing data workflow.