Large Language Models

Khazad: Open-Source LLM Cache Cuts API Costs by ~50%

Hacker News · 10h ago · 5 min read

Key takeaway

Khazad is an open-source semantic cache that sits between Python applications and LLM APIs, replaying semantically similar cached responses instead of making redundant API calls. At a 0.50 hit rate, it reduces API call volume by ~50%, speeds up responses by ~96% on cache hits, and lowers costs by ~50%. It works transparently with OpenAI, Anthropic, Azure OpenAI, and compatible proxies, making it useful for teams running high-traffic FAQ bots, support tools, and development environments.
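The headline figures follow from simple arithmetic. Every cache hit is an upstream call that never happens, so a 0.50 hit rate halves both call volume and per-call spend. The ~96% speedup, however, applies only to hits. Averaged over all traffic, latency drops by roughly half that. The sketch below works through the numbers. The 1,000 ms API latency, 40 ms hit latency (chosen so a hit is 96% faster), and per-call price are illustrative assumptions, not Khazad measurements. It also ignores the cost of embedding prompts and querying Redis.

```python
from dataclasses import dataclass


@dataclass
class TrafficEstimate:
    api_calls: float        # upstream requests actually sent (misses)
    cost: float             # spend, in the same unit as cost_per_call
    mean_latency_ms: float  # latency blended across hits and misses


def estimate(requests: int, hit_rate: float, cost_per_call: float,
             api_latency_ms: float, hit_latency_ms: float) -> TrafficEstimate:
    """Expected effect of a cache with the given hit rate.

    Assumption: hits cost nothing upstream (embedding/Redis overhead ignored)
    and every miss is one full API call.
    """
    misses = requests * (1 - hit_rate)
    mean_latency = (1 - hit_rate) * api_latency_ms + hit_rate * hit_latency_ms
    return TrafficEstimate(misses, misses * cost_per_call, mean_latency)


# Illustrative inputs only: 1,000 requests, $0.002/call, 1,000 ms API vs 40 ms hit.
baseline = estimate(1000, 0.0, 0.002, 1000.0, 40.0)
cached = estimate(1000, 0.5, 0.002, 1000.0, 40.0)

print(cached.api_calls)                            # 500.0
print(round(1 - cached.cost / baseline.cost, 3))   # 0.5
print(cached.mean_latency_ms)                      # 520.0
```

With these inputs, mean latency falls from 1,000 ms to 520 ms, about 48%. That figure is the one to compare against end-to-end dashboards. The 96% figure describes individual hits.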

3 Key Points

What happened
Khazad, an open-source semantic cache for LLM API calls, intercepts HTTP traffic at the transport layer and serves semantically equivalent cached responses via Redis Vector Sets. At a 0.50 hit rate, it delivers ~50% fewer API calls, ~96% faster responses on cache hits, and ~50% lower spend; it works transparently with zero changes to application code.
Why it matters
For teams running high-volume, repetitive LLM traffic—FAQ bots, support assistants, RAG systems, dev/test environments—Khazad offers cost and latency savings without rewriting application code. It supports multiple LLM providers (OpenAI, Anthropic, Azure OpenAI, and OpenAI-compatible proxies like Ollama and vLLM) through a single Python init() call.
What to watch
Khazad requires Python ≥3.10 and Redis 8 with Vector Sets support. It is httpx-only, so SDKs built on requests, aiohttp, or boto3 (AWS Bedrock) are not intercepted. Start with threshold=0.90 to control false positives, and treat the Redis instance with the same security care as application logs, since prompts are embedded and responses are stored in clear text.
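To make the threshold concrete, here is a minimal in-memory sketch of the decision a semantic cache makes for each request. It embeds the prompt and finds the most similar stored prompt. It replays that stored response only if cosine similarity reaches the threshold, and otherwise forwards the request and stores the result. This is not Khazad's implementation. Khazad stores embeddings in Redis Vector Sets and hooks httpx's transport layer. Here a plain list and a caller-supplied `call_api` function stand in for both. `SemanticCache`, `call_api`, and the toy three-dimensional vectors are hypothetical stand-ins for a real embedding model and SDK.

```python
import math
from typing import Callable, Optional


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SemanticCache:
    """Hypothetical in-memory stand-in for a vector store such as Redis Vector Sets."""

    def __init__(self, embed: Callable[[str], list[float]], threshold: float = 0.90):
        self.embed = embed
        self.threshold = threshold
        self.entries: list[tuple[list[float], str]] = []

    def lookup(self, prompt: str) -> tuple[Optional[str], float]:
        """Return (response, similarity) of the nearest entry if it clears the threshold."""
        query = self.embed(prompt)
        best_resp, best_sim = None, -1.0
        for vec, resp in self.entries:  # brute force; a real store uses an ANN index
            sim = cosine(query, vec)
            if sim > best_sim:
                best_resp, best_sim = resp, sim
        if best_sim >= self.threshold:
            return best_resp, best_sim
        return None, best_sim

    def complete(self, prompt: str, call_api: Callable[[str], str]) -> str:
        cached, _ = self.lookup(prompt)
        if cached is not None:
            return cached               # hit: replay the stored response, no API call
        response = call_api(prompt)     # miss: forward to the upstream API
        self.entries.append((self.embed(prompt), response))
        return response


# Toy "embeddings" (hypothetical): the two password questions point the same way.
TOY_VECTORS = {
    "How do I reset my password?": [1.0, 0.1, 0.0],
    "How can I reset my password?": [0.98, 0.12, 0.01],
    "What are your opening hours?": [0.0, 0.2, 1.0],
}

calls: list[str] = []


def fake_api(prompt: str) -> str:
    calls.append(prompt)
    return f"answer to: {prompt}"


cache = SemanticCache(TOY_VECTORS.__getitem__, threshold=0.90)
cache.complete("How do I reset my password?", fake_api)   # miss
cache.complete("How can I reset my password?", fake_api)  # hit (similarity ~0.9997)
cache.complete("What are your opening hours?", fake_api)  # miss
print(len(calls))  # 2
```

Raising `threshold` shrinks the neighborhood that counts as "the same question". That trades hit rate for fewer wrong replays, which is why a conservative starting value such as 0.90 is suggested.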

FAQ

Which LLM providers and SDKs does Khazad support?

Khazad covers SDKs built on httpx, including OpenAI, Anthropic, Gemini via google-genai, Mistral, and most OpenAI-compatible proxies (vLLM, Ollama, LiteLLM). SDKs using requests, aiohttp, or boto3 (AWS Bedrock) are not intercepted.
What are the privacy and security considerations?

Prompts are embedded and responses are stored in clear text in Redis. If prompts may contain PII or secrets, set a ttl, enable Redis AUTH/TLS, and treat the Redis instance with the same care as application logs.
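A ttl bounds how long a cleartext response can sit in the cache. Redis implements expiry natively. The article does not say how Khazad's ttl setting maps onto Redis, so the sketch below only models the intended effect in plain Python. An entry becomes unreadable, and is purged, once `ttl` seconds have passed since it was written. `TTLStore` is a hypothetical helper. The injectable clock exists only so the example can be tested without waiting.

```python
import time
from typing import Callable, Optional


class TTLStore:
    """Hypothetical store whose entries expire `ttl` seconds after being written."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._data: dict[str, tuple[float, str]] = {}

    def set(self, key: str, value: str) -> None:
        # Record the absolute expiry time alongside the value.
        self._data[key] = (self.clock() + self.ttl, value)

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        expires_at, value = item
        if self.clock() >= expires_at:
            del self._data[key]  # purge so the cleartext response is no longer held
            return None
        return value


now = [0.0]  # fake clock for the example
store = TTLStore(ttl=3600, clock=lambda: now[0])
store.set("prompt", "cached response")
now[0] = 3599.0
print(store.get("prompt"))  # cached response
now[0] = 3600.0
print(store.get("prompt"))  # None
```

A shorter ttl shrinks the exposure window but also lowers the hit rate, since older answers stop being reusable.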
What configuration should I start with?

Start at threshold=0.90 and raise it if you see incorrect cache hits (responses replayed for questions that only look similar). Watch avg_hit_similarity in get_stats(): if it sits near your threshold, most hits are borderline matches, and your traffic may be too diverse to cache safely.
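The avg_hit_similarity check amounts to measuring how far accepted hits sit above the threshold on average. The sketch below computes that margin from a list of hit similarities. The shape of Khazad's get_stats() output is not described here, so the list is an assumed input rather than a real API call. The function names and the 0.03 cutoff are hypothetical.

```python
from statistics import mean


def similarity_margin(hit_similarities: list[float], threshold: float) -> float:
    """Average similarity of accepted cache hits minus the threshold."""
    return mean(hit_similarities) - threshold


def looks_too_diverse(hit_similarities: list[float], threshold: float,
                      min_margin: float = 0.03) -> bool:
    """Flag traffic whose hits cluster just above the threshold (hypothetical heuristic)."""
    if not hit_similarities:
        return False  # no hits yet, nothing to judge
    return similarity_margin(hit_similarities, threshold) < min_margin


print(looks_too_diverse([0.99, 0.98, 0.97], 0.90))   # False: hits are near-duplicates
print(looks_too_diverse([0.91, 0.905, 0.92], 0.90))  # True: hits are borderline
```

A small margin means many replays are decided by a few hundredths of similarity. Raising the threshold would discard most of them, which suggests the hit rate rests on loose matches.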
