
Integrate with an Existing LLM App

This tutorial walks you through adding Kaizen to a Python application that already uses an LLM. You’ll install the SDK, create a task, log feedback after each LLM call, and retrieve the optimized prompt once enough feedback has accumulated.

Prerequisites:

  • A running CT server (started via podman-compose up or Docker Compose)
  • KAIZEN_API_KEY and KAIZEN_BASE_URL set in your environment
  • An existing Python app making LLM calls (we’ll use litellm as the example)

Scenario

You have a support-ticket summarization function:

import litellm

def summarize_ticket(text: str) -> str:
    response = litellm.completion(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Summarize the following support ticket concisely."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

You want CT to automatically optimize the system prompt based on user feedback.


Install the SDK

pip install kaizen-sdk

Initialize the client

Create a client once (at app startup) using environment variables:

import os

from kaizen_sdk import CTClient

client = CTClient(
    api_key=os.environ["KAIZEN_API_KEY"],
    base_url=os.environ.get("KAIZEN_BASE_URL", "http://localhost:8000"),
)

If the KAIZEN_API_KEY and KAIZEN_BASE_URL environment variables are set, the constructor arguments are optional; the client falls back to them automatically.
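If you prefer to fail fast at startup rather than inside the first API call, a minimal sketch of an explicit check might look like this (the kaizen_settings helper is illustrative, not part of the SDK):

```python
import os

def kaizen_settings() -> tuple[str, str]:
    """Read Kaizen settings from the environment, failing fast if the key is absent."""
    api_key = os.environ.get("KAIZEN_API_KEY")
    if not api_key:
        raise RuntimeError("KAIZEN_API_KEY is not set; export it before starting the app")
    # Fall back to the local default used throughout this tutorial
    base_url = os.environ.get("KAIZEN_BASE_URL", "http://localhost:8000")
    return api_key, base_url
```

Calling this once at startup gives a clear error message instead of a KeyError deep in request handling.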

Create a task

A task represents a specific LLM job in your application. Create it once during setup:

task = client.create_task(
    name="summarize_ticket",
    description="Summarize support tickets into one or two sentences",
    feedback_threshold=50,  # trigger optimization after 50 feedback entries
)
task_id = str(task.id)
print(f"Created task: {task.name} ({task_id})")

Store task.id in your configuration or database — you’ll reference it every time you log feedback or retrieve prompts.

Log feedback after each LLM call

Before CT — a plain LLM call with no feedback:

def summarize_ticket(text: str) -> str:
    response = litellm.completion(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Summarize the following support ticket concisely."},
            {"role": "user", "content": text},
        ],
    )
    return response.choices[0].message.content

After CT — log feedback with a score from your rating system:

def summarize_ticket(text: str, user_rating: float | None = None) -> str:
    response = litellm.completion(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Summarize the following support ticket concisely."},
            {"role": "user", "content": text},
        ],
    )
    summary = response.choices[0].message.content

    # Log feedback — score is optional, add it when users rate the output
    client.log_feedback(
        task_id=task_id,
        inputs={"text": text},
        output=summary,
        score=user_rating,  # 0.0–1.0, or None for unlabeled
        source="app",
    )
    return summary
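If your UI collects something other than a 0.0–1.0 score (say, a 1–5 star rating), normalize it before passing it as user_rating. A minimal sketch, assuming a five-star widget (the helper name is hypothetical):

```python
def stars_to_score(stars: int) -> float:
    """Map a 1-5 star rating onto the 0.0-1.0 score range used for feedback."""
    if not 1 <= stars <= 5:
        raise ValueError("stars must be between 1 and 5")
    # Linear mapping: 1 star -> 0.0, 3 stars -> 0.5, 5 stars -> 1.0
    return (stars - 1) / 4
```

Then call summarize_ticket(text, user_rating=stars_to_score(stars)) when a rating is available, and omit user_rating otherwise.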

Retrieve the optimized prompt

Once optimization runs, replace the hardcoded system prompt with the one CT generated:

def summarize_ticket(text: str, user_rating: float | None = None) -> str:
    # Retrieve the active (optimized) prompt — cached in memory for 5 minutes
    prompt_obj = client.get_prompt(task_id)
    system_prompt = prompt_obj.prompt_text or "Summarize the following support ticket concisely."

    response = litellm.completion(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": text},
        ],
    )
    summary = response.choices[0].message.content

    client.log_feedback(
        task_id=task_id,
        inputs={"text": text},
        output=summary,
        score=user_rating,
    )
    return summary

get_prompt() returns the active prompt version from a local TTL cache (default: 5 minutes). On first call or cache miss, it fetches from the CT API. If no optimization has run yet, prompt_text may be None — always provide a fallback.
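To make the cache behavior concrete, here is a stdlib-only sketch of the general TTL-cache pattern described above. It is not the SDK's implementation, just an illustration of why repeated calls within the TTL window don't hit the API:

```python
import time

class TTLCache:
    """Minimal time-based cache illustrating the pattern described for get_prompt()."""

    def __init__(self, ttl_seconds: float = 300.0, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable for testing
        self._entries: dict[str, tuple[float, object]] = {}

    def get(self, key: str, fetch):
        """Return the cached value for key, calling fetch() on a miss or after expiry."""
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # still fresh: no network round-trip
        value = fetch()
        self._entries[key] = (now, value)
        return value
```

A stale entry is simply refetched on the next read, which is why a newly optimized prompt shows up within one TTL window at most.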

Monitor progress

Check how close you are to the optimization threshold:

tasks = client.list_tasks()
for t in tasks:
    print(f"{t.name}: {t.threshold_progress} feedback entries")
    if t.last_optimization:
        print(f"  Last optimized: {t.last_optimization}")
        print(f"  Active prompt score: {t.active_prompt_score:.2%}")

Example output:

summarize_ticket: 37/50 feedback entries

Once the count reaches the threshold (50 in this example), CT automatically triggers an optimization job.
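If you want to alert before the threshold is reached, you can derive the remaining count from the progress value. This sketch assumes threshold_progress is the "current/threshold" string shown in the example output above:

```python
def remaining_feedback(progress: str) -> int:
    """Parse a 'current/threshold' progress string and return entries still needed."""
    current, threshold = (int(part) for part in progress.split("/"))
    return max(threshold - current, 0)  # clamp at 0 once the threshold is met
```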

