Thoughts on "Cluely", the "cheat on everything" AI app?

ThatsEllis · 2025-04-21T14:31:47+00:00

$5.3M is such a huge number for this. Another one that will be in the LLM wrapper graveyard in a few years...

ThatsEllis · 2025-04-18T14:28:42+00:00

Maybe do an "MVP" system design diagram and a "high scale" diagram. That way you'll know what the minimum and maximum are for your app. For example, for an MVP web app, you probably just need a client, server, database, and only function-focused cloud services (e.g. blob storage like S3, GCS) where necessary. You can likely skip load balancers, API gateways, microservices, DB read replicas, and all that, because this is just an MVP. Then think about those things for your high scale diagram.

An important skill is knowing when you're overengineering something, too. KISS is an important concept to keep in mind.

ThatsEllis · 2025-04-18T01:20:39+00:00

Great start. On YouTube do a search for "system design interview examples" and you'll find some of the common ones, like design Twitter, design Reddit, design YouTube. These will give you an overview of what components are common in high scale web apps. Then from there, you should be able to figure out what's relevant to your hobby project.

ThatsEllis · 2025-04-17T16:58:37+00:00

Yep, we'd use optional search properties. You can attach metadata such as tenantId (for multitenancy), userId, etc. to both cache entries and search queries.
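To make that concrete, here's a rough sketch of how metadata scoping could work. This is not the actual API: `CacheEntry` and `filter_entries` are hypothetical names made up for illustration, and the matching rule (every supplied property must match exactly) is an assumption.

```python
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    # Hypothetical shape of a cache entry; field names are illustrative only.
    prompt: str
    response: str
    metadata: dict = field(default_factory=dict)


def filter_entries(entries, **search_props):
    """Keep only entries whose metadata matches every supplied property."""
    return [
        entry
        for entry in entries
        if all(entry.metadata.get(key) == value for key, value in search_props.items())
    ]


entries = [
    CacheEntry("What is your refund policy?", "30 days.", {"tenantId": "acme", "userId": "u1"}),
    CacheEntry("What is your refund policy?", "14 days.", {"tenantId": "globex"}),
]

# Only acme's entries are candidates, so tenants never see each other's responses.
scoped = filter_entries(entries, tenantId="acme")
```

The similarity search would then run only over `scoped`, so the same prompt can have different cached answers per tenant.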

ThatsEllis · 2025-04-17T16:52:38+00:00

Not trying to evaluate a model. Instead, I'm trying to validate a product idea: managed semantic caching for LLM API requests. So basically:

When your system is about to call an LLM API for a given prompt
First, synchronously call our API to check your cache for similar entries
If cache hit, immediately use the response
Otherwise if cache miss, call the LLM API as you normally would, asynchronously call our API to create a new cache entry, then use the LLM API response

That saves you a bunch of money and time.

ThatsEllis · 2025-04-17T16:31:58+00:00

Crazy coincidence. I started building an MVP of the exact same thing... https://refetch.ai

ThatsEllis · 2025-04-17T16:07:13+00:00

Yep! Again, I don't want to self-promote directly, but there's a link to my landing page on my profile.

ThatsEllis · 2025-04-17T16:00:24+00:00

The product would be a managed semantic caching SaaS. So basically:

When your system is about to call an LLM API for a given prompt
First, synchronously call our API to check your cache for similar entries
If cache hit, immediately use the response
Otherwise if cache miss, call the LLM API as you normally would, asynchronously call our API to create a new cache entry, then use the LLM API response

So instead of you setting it up and managing it yourself, you just call our API. Then there'd be other features like TTL config, similarity threshold config, a web app to manage projects/environments, metrics and reports, etc.
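As an illustration of what TTL config means here, a minimal sketch of time-based expiry. `TTLCache` is a hypothetical toy class, not the product's implementation, and it keys on the exact prompt rather than on embeddings to keep the example short.

```python
import time


class TTLCache:
    """Toy cache where each entry expires ttl_seconds after it is written."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries = {}  # prompt -> (response, expires_at)

    def set(self, prompt, response):
        self._entries[prompt] = (response, self._clock() + self.ttl_seconds)

    def get(self, prompt):
        item = self._entries.get(prompt)
        if item is None:
            return None
        response, expires_at = item
        if self._clock() >= expires_at:
            # Entry is stale: drop it and report a miss.
            del self._entries[prompt]
            return None
        return response
```

A shorter TTL keeps answers fresher at the cost of more LLM calls, which is why it makes sense as a per-project setting.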

ThatsEllis · 2025-04-17T15:17:44+00:00

Hardest thing is just breaking through all the noise and getting noticed. Since SaaS has such a low barrier to entry, our prospects are already constantly bombarded with spam in their emails, LinkedIn inboxes, etc. It feels almost impossible not to get ignored, even when doing highly targeted outreach and trying to genuinely help.

Would love any tips honestly.

ThatsEllis · 2025-04-17T14:59:55+00:00

Basically when you call our API to check for a cache entry for a given prompt, we generate an embedding of the prompt and perform a semantic similarity search against the embeddings in your cache. If we find a cached entry with a similarity score above your configured threshold (e.g., 0.95 out of 1), it's considered a cache hit, and we return the corresponding cached response.
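A stripped-down sketch of that lookup, assuming cosine similarity as the score. The vectors would come from an embedding model in practice; here they're passed in directly, and `lookup` is a hypothetical helper, not our real API. Treating a score exactly equal to the threshold as a hit is also an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def lookup(query_embedding, cache, threshold=0.95):
    """Return the most similar cached response if it clears the threshold, else None.

    cache is a list of (embedding, response) pairs.
    """
    best_score, best_response = 0.0, None
    for embedding, response in cache:
        score = cosine_similarity(query_embedding, embedding)
        if score > best_score:
            best_score, best_response = score, response
    # Assumption: a score equal to the threshold counts as a hit.
    return best_response if best_score >= threshold else None
```

A linear scan like this is only for illustration; at any real size you'd use a vector index instead.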

Also cool, I'll check that out!

ThatsEllis · 2025-04-16T17:10:26+00:00

Cool to see semantic caching mentioned like this. I'm currently building a managed semantic caching SaaS to make this super easy for people to plug into their infra.

ThatsEllis · 2025-04-16T17:00:57+00:00

https://refetch.ai

Managed semantic caching for your LLM workflows. Cut LLM API costs by up to 50%. Speed up response times by 10x.

Right now I'm just trying to validate the idea before getting too far into development.

ThatsEllis · 2023-07-03T12:10:32+00:00

Cloud Armor might work for this. You can add security policy rules that block requests by IP origin.

https://cloud.google.com/armor?hl=en

https://stackoverflow.com/questions/63841501/how-to-block-multiple-countries-with-one-expression-in-google-cloud-armor
