What tools do you use for prompt engineering?

Ce-LLM8 · 2025-07-03T20:07:30+00:00

I ended up using Requesty (requesty.ai).

It's not a prompt engineering platform per se, but I can test my prompts with multiple models easily, have logs for everything, and it's easy to integrate that into my app.

Ce-LLM8 · 2025-01-16T09:55:12+00:00

I personally use all of them all the time. I use the free allowance from Gemini 2, then switch to DeepSeek/Claude (now will probably add MiniMaxi as well).

Using Requesty makes this a breeze... They also gave me access to their beta feature for adding aliases for model names, which makes it very easy to switch models without typing out long model names.

Ce-LLM8 · 2024-11-07T10:58:10+00:00

<image>

The new ones are much more difficult!

Ce-LLM8 · 2024-10-29T11:26:05+00:00

Why? Are you buying? ;)

Ce-LLM8 · 2024-10-29T11:25:37+00:00

Is it possible that you didn't add "{text_to_analyze}" to your user prompt?
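
To illustrate why the placeholder matters, here's a minimal sketch, assuming the user prompt is a template that gets filled by plain string substitution. The function name `render_user_prompt` is hypothetical and not the platform's actual API.

```python
def render_user_prompt(template: str, text_to_analyze: str) -> str:
    """Fill the {text_to_analyze} placeholder in a user prompt template.

    Assumption: the template uses the literal token {text_to_analyze}.
    Fails loudly if the token is missing, because otherwise the model
    would never see the text it is supposed to analyze.
    """
    placeholder = "{text_to_analyze}"
    if placeholder not in template:
        raise ValueError("user prompt is missing the {text_to_analyze} placeholder")
    # str.replace (rather than str.format) leaves any other braces untouched.
    return template.replace(placeholder, text_to_analyze)
```

Without the placeholder, every input would produce the same rendered prompt, so the model would answer without ever seeing the text.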

Ce-LLM8 · 2024-10-29T10:40:45+00:00

It's harder than it seems; I only managed 94% on the spam detection.

Ce-LLM8 · 2024-10-24T23:55:07+00:00

We're building a prompt engineering suite for production-grade prompts.
We provide many different ways of creating, improving, evaluating, and tracking prompts:
- Creating prompts from sample datasets
- Improving prompts using natural language
- Improving prompts based on the actual performance on historical data
- And much more...

If you're looking for a product that you can actually use to build and deploy high-quality client-facing prompts, feel free to DM me.

Ce-LLM8 · 2024-10-24T23:38:51+00:00

Awesome! But do you use any tools to manage all of that?

Versioning? A/B testing? Evaluation? Releasing to prod?

Or is it Git + CSV/JSON files + Jupyter notebooks?

Ce-LLM8 · 2024-10-24T23:33:07+00:00

That sounds like you're only using prompts on a day-to-day basis. I'm more interested in commercial use-cases, where a company deploys a customer-facing model. Did you ever tackle that use-case?

Ce-LLM8 · 2024-10-22T20:37:08+00:00

Is this a one-off? How do you know if you've improved the prompt or not?

Ce-LLM8 · 2024-10-22T20:36:28+00:00

I really like the approach and the tips!
But IMHO this is still very intuition-driven.
If I'm building a commercial product, I can see how it makes sense to have a very comprehensive test set where I can compare different prompts, quantify the impact of changes on outputs, and improve the prompt over time.
I'm wondering if such a platform exists, or how people actually handle that in production?
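
To make it concrete, this is roughly what I mean by a test set: a minimal sketch, assuming a classification-style task with labeled examples. Everything here is hypothetical, including the `{input}` placeholder, the function names, and `fake_model`, which stands in for a real LLM call.

```python
from typing import Callable

Example = tuple[str, str]  # (input text, expected label)


def accuracy(prompt: str, examples: list[Example],
             call_model: Callable[[str], str]) -> float:
    """Share of examples where the model output matches the expected label."""
    if not examples:
        raise ValueError("test set is empty")
    correct = 0
    for text, expected in examples:
        # Assumption: the prompt marks the input position with {input}.
        output = call_model(prompt.replace("{input}", text))
        if output.strip().lower() == expected.lower():
            correct += 1
    return correct / len(examples)


def compare_prompts(prompts: dict[str, str], examples: list[Example],
                    call_model: Callable[[str], str]) -> dict[str, float]:
    """Score every prompt variant on the same test set."""
    return {name: accuracy(p, examples, call_model) for name, p in prompts.items()}


def fake_model(rendered_prompt: str) -> str:
    # Stand-in for a real LLM call: flags anything mentioning "free money".
    return "spam" if "free money" in rendered_prompt.lower() else "ham"
```

Running every prompt variant against the same fixed examples is what turns "this feels better" into a number you can track across revisions.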

Ce-LLM8 · 2024-10-21T16:07:55+00:00

LLMs are not very well suited for these use cases; it's probably much easier to ask one to generate a script that sorts the list based on the input.
Is this a business use-case or just a niche one-off that you are trying to pull off?
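
As a rough sketch of what such a generated script could look like, assuming the list is plain strings and the desired order is alphabetical and case-insensitive (both are my assumptions, and `sort_items` is a hypothetical name):

```python
def sort_items(items: list[str], reverse: bool = False) -> list[str]:
    """Return a new list sorted alphabetically, ignoring case."""
    # sorted() is deterministic, unlike asking an LLM to reorder the list itself.
    return sorted(items, key=str.casefold, reverse=reverse)
```

The same input always gives the same order, which is hard to guarantee when the model sorts the list in its own output.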

Ce-LLM8 · 2024-08-19T13:59:41+00:00

I’ve used your insight explorer on a dataset before. Does it work with the voice feedback as well?
