Show HN: A context-aware semantic cache for reducing LLM app latency and cost

We're Tom and Adrian, the cofounders of Canonical AI. We were building a conversational AI product and wanted to use semantic caching. We tried a few existing projects, but none were accurate enough: they had no sense of the context of the user query. The same query can mean two different things depending on what it's referencing.

So we changed course and started working on a semantic cache that understands the context of the user query. We've developed several methods to make the cache context-aware, including multi-tenancy (i.e., user-defined cache scopes), multi-turn cache keys, and metadata tagging.
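To make the idea concrete, here's a minimal sketch (not our actual implementation) of what scoping and multi-turn cache keys look like. The names (ContextAwareCache, embed) are placeholders, and the embed function is a stand-in you'd replace with a real embedding model or API:

    import hashlib
    import numpy as np

    def embed(text: str) -> np.ndarray:
        # Placeholder: deterministic pseudo-embedding for illustration only.
        # Swap in a real embedding model to get actual semantic similarity.
        seed = int(hashlib.sha256(text.encode()).hexdigest(), 16) % (2**32)
        v = np.random.default_rng(seed).standard_normal(384)
        return v / np.linalg.norm(v)

    class ContextAwareCache:
        def __init__(self, threshold: float = 0.9):
            self.threshold = threshold
            # Entries are partitioned by scope (tenant/user), so identical
            # queries from different scopes never collide.
            self.entries: dict[str, list[tuple[np.ndarray, str]]] = {}

        def _key_vector(self, query: str, recent_turns: list[str]) -> np.ndarray:
            # Multi-turn cache key: embed the query together with recent
            # conversation turns, so "How much does it cost?" after a
            # question about Plan A keys differently than after Plan B.
            context = " ".join(recent_turns[-3:] + [query])
            return embed(context)

        def get(self, scope: str, query: str, recent_turns: list[str]) -> str | None:
            vec = self._key_vector(query, recent_turns)
            best_sim, best_resp = -1.0, None
            for cached_vec, response in self.entries.get(scope, []):
                sim = float(np.dot(vec, cached_vec))  # vectors are unit-normalized
                if sim > best_sim:
                    best_sim, best_resp = sim, response
            return best_resp if best_sim >= self.threshold else None

        def put(self, scope: str, query: str, recent_turns: list[str], response: str):
            key = self._key_vector(query, recent_turns)
            self.entries.setdefault(scope, []).append((key, response))

The point of the sketch is the cache key: instead of embedding the query alone, it folds in the caller's scope and recent turns, which is what lets two textually identical queries resolve to different cached answers.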

We'd love to hear your thoughts on it!
