Integrate with an Existing LLM App
This tutorial walks you through adding Kaizen to a Python application that already uses an LLM. You’ll install the SDK, create a task, log feedback after each LLM call, and retrieve the optimized prompt once enough feedback has accumulated.
Prerequisites:

- A running CT server (`podman-compose up` or Docker Compose)
- `KAIZEN_API_KEY` and `KAIZEN_BASE_URL` available
- An existing Python app making LLM calls (we'll use `litellm` as the example)
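For example, the two variables can be exported in your shell before starting the app (the values below are placeholders, not real credentials):

```shell
# Placeholder values; substitute your real API key and server URL.
export KAIZEN_API_KEY="your-api-key"
export KAIZEN_BASE_URL="http://localhost:8000"
```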
Scenario
You have a support-ticket summarization function:
```python
import litellm

def summarize_ticket(text: str) -> str:
    response = litellm.completion(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Summarize the following support ticket concisely."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content
```

You want CT to automatically optimize the system prompt based on user feedback.
Install the SDK

```shell
pip install kaizen-sdk
```

Initialize the client
Create a client once (at app startup) using environment variables:
Sync

```python
import os

from kaizen_sdk import CTClient

client = CTClient(
    api_key=os.environ["KAIZEN_API_KEY"],
    base_url=os.environ.get("KAIZEN_BASE_URL", "http://localhost:8000"),
)
```

If `KAIZEN_API_KEY` and `KAIZEN_BASE_URL` are set as environment variables, the constructor arguments are optional.
Create a task
A task represents a specific LLM job in your application. Create it once during setup:
Sync

```python
task = client.create_task(
    name="summarize_ticket",
    description="Summarize support tickets into one or two sentences",
    feedback_threshold=50,  # trigger optimization after 50 feedback entries
)
task_id = str(task.id)
print(f"Created task: {task.name} ({task_id})")
```

Store `task.id` in your configuration or database; you'll reference it every time you log feedback or retrieve prompts.
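If your app restarts often, you may want to reuse an existing task instead of creating a duplicate on every startup. A minimal get-or-create sketch, assuming `list_tasks()` returns objects with a `name` attribute as shown later in this tutorial (`get_or_create_task` is a hypothetical helper, not part of the SDK):

```python
def get_or_create_task(client, name: str, **kwargs):
    # Reuse an existing task with this name if one exists...
    for task in client.list_tasks():
        if task.name == name:
            return task
    # ...otherwise create it with the given settings.
    return client.create_task(name=name, **kwargs)
```

Call it in place of `create_task` during setup so repeated startups resolve to the same task.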
Log feedback after each LLM call
Before CT, the function is a plain LLM call with no feedback:

```python
def summarize_ticket(text: str) -> str:
    response = litellm.completion(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Summarize the following support ticket concisely."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content
```

After CT, log feedback with a score from your rating system.
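The score passed to `log_feedback` is a float from 0.0 to 1.0. If your UI collects something else, such as 1-5 star ratings, normalize before logging (`star_to_score` is a hypothetical helper for illustration):

```python
def star_to_score(stars: int) -> float:
    """Map a 1-5 star rating onto the 0.0-1.0 range CT expects."""
    if not 1 <= stars <= 5:
        raise ValueError("stars must be between 1 and 5")
    return (stars - 1) / 4
```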
Sync

```python
def summarize_ticket(text: str, user_rating: float | None = None) -> str:
    response = litellm.completion(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Summarize the following support ticket concisely."},
            {"role": "user", "content": text},
        ],
    )
    summary = response.choices[0].message.content

    # Log feedback; the score is optional, so add it when users rate the output
    client.log_feedback(
        task_id=task_id,
        inputs={"text": text},
        output=summary,
        score=user_rating,  # 0.0-1.0, or None for unlabeled
        source="app",
    )
    return summary
```

Retrieve the optimized prompt
Once optimization runs, replace the hardcoded system prompt with the one CT generated:
Sync

```python
def summarize_ticket(text: str, user_rating: float | None = None) -> str:
    # Retrieve the active (optimized) prompt; cached in memory for 5 minutes
    prompt_obj = client.get_prompt(task_id)
    system_prompt = prompt_obj.prompt_text or "Summarize the following support ticket concisely."

    response = litellm.completion(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": text},
        ],
    )
    summary = response.choices[0].message.content

    client.log_feedback(
        task_id=task_id,
        inputs={"text": text},
        output=summary,
        score=user_rating,
    )
    return summary
```

`get_prompt()` returns the active prompt version from a local TTL cache (default: 5 minutes). On the first call or a cache miss, it fetches from the CT API. If no optimization has run yet, `prompt_text` may be None, so always provide a fallback.
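The cache behavior described above can be pictured with a small sketch (this illustrates the idea, not the SDK's actual implementation):

```python
import time

class TTLCache:
    """Return a cached value until it is older than ttl_seconds."""

    def __init__(self, fetch, ttl_seconds: float = 300.0):
        self.fetch = fetch           # called on a cache miss
        self.ttl = ttl_seconds
        self._value = None
        self._fetched_at = None      # None means never fetched

    def get(self):
        now = time.monotonic()
        if self._fetched_at is None or now - self._fetched_at > self.ttl:
            self._value = self.fetch()   # cache miss: hit the API
            self._fetched_at = now
        return self._value
```

With `ttl_seconds=300.0` this mirrors the default five-minute window: repeated calls inside the window return the cached prompt without a network round trip.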
Monitor progress
Check how close you are to the optimization threshold:
Sync

```python
tasks = client.list_tasks()
for t in tasks:
    print(f"{t.name}: {t.threshold_progress} feedback entries")
    if t.last_optimization:
        print(f"  Last optimized: {t.last_optimization}")
        print(f"  Active prompt score: {t.active_prompt_score:.2%}")
```

Example output:

```
summarize_ticket: 37/50 feedback entries
```

Once the count reaches the threshold (50 in this example), CT automatically triggers an optimization job.
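Assuming `threshold_progress` renders as a `current/threshold` pair like the `37/50` in the example output, a small helper (hypothetical, for illustration) can turn it into a remaining-entries count, e.g. to surface "13 more ratings needed" in a dashboard:

```python
def remaining_feedback(progress: str) -> int:
    """Parse a 'current/threshold' string such as '37/50' and
    return how many feedback entries remain before optimization."""
    current, threshold = (int(part) for part in progress.split("/"))
    return max(threshold - current, 0)
```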
Next Steps
- Speed up the first optimization: Seed historical data to bootstrap without waiting for production traffic
- Set up auto-PR: Have optimized prompts delivered as a GitHub or Bitbucket PR — see Auto-PR with GitHub or Auto-PR with Bitbucket Server
- Custom evaluators: Tune how CT judges prompt quality — see Custom Evaluators