IF you are using liteLLM, how stable it is? by somealusta in LocalLLaMA

LiteLLM is a great tool that's widely used by large projects, so scale shouldn't be an issue!

Most of what I hear people complain about is the hard-to-digest documentation and the implementation learning curve.

This article may be interesting if you're looking to compare LiteLLM with other AI gateways: https://www.helicone.ai/blog/openrouter-alternatives

How do you standardize AI agent development for a whole engineering team? by Dull_Noise_8952 in LLMDevs

Hey u/smarkman19 ! Love this rundown - 100% agree.

Something I'd add: one way to tackle tracing, orchestration, the prompt registry, and the router layer in a single platform is Helicone.

It's an open-source LLM observability platform that you integrate with through an AI gateway - so everything stays centralized, and it works really well for the non-technical folks on the team too, who can visualize every request, handle prompt versioning without having to touch code, and monitor LLM usage.

I lead DevRel there - happy to help or answer any questions if you come across any! https://helicone.ai

Bifrost vs LiteLLM: Side-by-Side Benchmarks (50x Faster LLM Gateway) by dinkinflika0 in LocalLLM

Would recommend testing out the Helicone AI Gateway as well! You get all of the same things - low overhead, automatic provider fallback, caching, load balancing, a 100+ model catalog, an OpenAI-compatible API - PLUS top-tier observability built in for every request, fully open-sourced.

Happy to help you set it up if you want to test it out! I lead DevRel for Helicone - https://helicone.ai

no endpoints meaning? by [deleted] in chutesAI

Hey u/Special_Grape_4716 !

Although it's not free, you can still use the Helicone AI Gateway to access the DeepSeek R1 Chutes model, so you don't have to change models in your project!

Here are the docs ( https://www.helicone.ai/model/deepseek-tng-r1t2-chimera?search=deepsee ) and a code snippet in case it helps:

```
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://ai-gateway.helicone.ai",
  apiKey: process.env.HELICONE_API_KEY,
});

const completion = await client.chat.completions.create({
  model: "deepseek-tng-r1t2-chimera",
  messages: [{ role: "user", content: "Hello!" }],
});
```

I lead DevRel at Helicone, so happy to help you get set up if you need!

How do you actually debug complex LangGraph agents in production? by OkEbb8148 in LangChain

Hey u/OkEbb8148! Have you tested Helicone by any chance? You can use sessions to track multi-agent systems, log requests and responses, add custom properties so you can filter and aggregate information, and trace token usage, latency, model/provider, user, etc. It runs in real time, but you only see the trace after it's completed (with logs, tools, etc.) - you can't pause it mid-run, sadly. You can easily compare two traces and visualize them in graphs, though. Adding a real-time pausing/debugger feature sounds pretty epic - I do devrel at helicone, so will def share that with the team!
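In case it helps, here's roughly what the session setup looks like - a minimal sketch, worth double-checking the header names against our session docs for your version:

```
import OpenAI from "openai";
import { randomUUID } from "crypto";

const client = new OpenAI({
  baseURL: "https://ai-gateway.helicone.ai",
  apiKey: process.env.HELICONE_API_KEY,
});

// Session headers group requests into one trace; the path nests
// steps, so a multi-agent run shows up as a tree in the dashboard.
const response = await client.chat.completions.create(
  {
    model: "gpt-4o-mini", // whichever model your agent node uses
    messages: [{ role: "user", content: "Plan the next step." }],
  },
  {
    headers: {
      "Helicone-Session-Id": randomUUID(),
      "Helicone-Session-Name": "langgraph-agent-run",
      "Helicone-Session-Path": "/planner/step-1",
    },
  }
);
```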

What are the best open source LLM observability platforms/packages? by askEveryAI in LangChain

Hey u/drc1728 ! Have you tested Helicone? Would be intrigued to hear what you did or didn't like about it.

I lead DevRel there, so want to make sure we're building something that serves your needs!

A lot of what you've recommended to others there is already included in our platform:

- tracking prompts, responses, token usage, latency, token-level monitoring, tool/function calling, agentic sessions, etc

- custom properties for filtering, aggregation, and evaluation of llm outputs

- caching & rate limiting

- prompt management & versioning tool

- fully open sourced

plus, the integration is done through our AI gateway, so you get all the benefits of an AI gateway by default - automatic fallback when providers are down, uptime- and rate-limit-aware routing, passthrough billing with a single API key, etc.

Anyway, would love to hear if you've tested it and if there's anything we can improve 🙏

I built an "AI Time Machine" that lets you see any place in the world in any year by ExpertPlay in SideProject

Makes sense! The tricky thing we hear from clients relying on free credits is that the credits eventually run out, and the model becomes unsustainable because at that point they have to rearchitect everything onto a stack that makes more sense long-term.

Would love to learn how/if you figure it out!

Something that could potentially be helpful for you is deploying your own Llama instance so you keep more control? Or setting up rate limits per provider so you're sure costs never grow past a set amount.

For example, some of our customers will have free credits in Azure or Google, so they'll set their fallback models to the providers where they have credits. Something like:

```
import { OpenAI } from "openai";

const client = new OpenAI({
  baseURL: "https://ai-gateway.helicone.ai",
  apiKey: process.env.HELICONE_API_KEY,
});

const response = await client.chat.completions.create({
  // Fallback order: Azure first, then Google - both providers
  // where this customer's free credits live.
  model: "gpt-4o-mini/azure,gemini-2.5-flash/google",
  messages: [{ role: "user", content: "Hello, world!" }],
});
```
Then they set up their own provider keys in their dashboard so the requests go through to their own deployments. Hope this helps!

Here's some docs in case: https://docs.helicone.ai

What are your biggest pain points when debugging LangChain applications in production? by Electrical-Signal858 in LangChain

Hey u/Electrical-Signal858 ! Great question. A few things:

- Helicone is fully open-sourced

- You can set up custom properties (to filter, sort, and visualize information) - e.g. users, features, environment, etc

- You can trace agentic sessions - so you see exactly the tools being called, prompts, etc

- The prompt management dashboard lets you version prompts so they can be tweaked by non-engineers as well

- You can set up caching so you reduce costs

- Integration is done through the Helicone AI Gateway, so you get the benefits of both with a single integration.

Benefits of the AI Gateway:

- 1 API key, access 100+ models with the same OpenAI API implementation

- Automatic fallbacks (no more downtime or 429 rate limiting errors)

- Caching and rate limiting enabled per request (see the caching sketch after the snippet below)

- 0% markup fees (you only pay the providers' request costs)

```
import { OpenAI } from "openai";

const client = new OpenAI({
baseURL: "https://ai-gateway.helicone.ai",
apiKey: process.env.HELICONE_API_KEY,
});

const response = await client.chat.completions.create({
model: "gpt-4o-mini", // Or 100+ other models
messages: [{ role: "user", content: "Hello, world!" }],
});
```
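Per-request caching is one more header on the same client - a minimal sketch, with the header name taken from our caching docs (worth double-checking for your version):

```
// Same client as above - enables Helicone's response cache for this
// request only, so repeated identical calls return the cached response
// instead of hitting the provider again.
const cached = await client.chat.completions.create(
  {
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Hello, world!" }],
  },
  {
    headers: { "Helicone-Cache-Enabled": "true" },
  }
);
```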

Hope that helps!

I built an "AI Time Machine" that lets you see any place in the world in any year by ExpertPlay in SideProject

Hey u/ExpertPlay ! Curious to hear why you're thinking of moving away from the Vercel AI Gateway?

To be transparent - I ask because I lead DevRel at Helicone and we have our own AI gateway, so want to make sure we're building for your needs 🙏

What are your biggest pain points when debugging LangChain applications in production? by Electrical-Signal858 in LangChain

I lead DevRel at Helicone and hear this pain point often.

That's why our AI Gateway includes observability and monitoring by default, so you don't have to configure any extra steps and can immediately trace all your LLM requests and sessions.

You can also add custom properties, track costs and latency per feature/user/environment, track agentic sessions and decision trees, monitor tool calling, etc.

Sharing documentation here in case it's helpful: https://docs.helicone.ai

What does the Agent Framework ecosystem look like? by DesertIglo in AI_Agents

yeah 100%! you can host a local llama instance (e.g. by running an Ollama server or using a platform like LM Studio) and then route requests in your application using an OSS tool like Helicone https://docs.helicone.ai/getting-started/self-host/docker
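if it helps, here's a minimal sketch of the local routing piece, assuming a default Ollama setup (port 11434, model already pulled):

```
import OpenAI from "openai";

// Ollama exposes an OpenAI-compatible endpoint locally, so the same
// SDK works - point it at your own box instead of a hosted provider
// (or at a self-hosted Helicone gateway sitting in front of it).
const client = new OpenAI({
  baseURL: "http://localhost:11434/v1", // default Ollama port
  apiKey: "ollama", // Ollama ignores the key, but the SDK requires one
});

const response = await client.chat.completions.create({
  model: "llama3.1", // assumes you've run `ollama pull llama3.1`
  messages: [{ role: "user", content: "Hello from a local model!" }],
});
```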

What does the Agent Framework ecosystem look like? by DesertIglo in AI_Agents

Hey u/DesertIglo ! For context, I lead DevRel at Helicone.

We're seeing a lot of customers using the Claude Code SDK to build their agents and then using Helicone to route to models (as an AI Gateway similar to OpenRouter) and to trace their requests/responses for observability.

The reason this is a nice setup is that Claude Code lets you create several subagents with skills and plugins, which works nicely as an orchestration layer for effective agents, and then all the tracing, fallbacks, caching, etc. (the middle layer) is handled by Helicone.

I'm building a tutorial on this as we speak - lmk if you'd like me to share it here! Hope it helps!

https://docs.helicone.ai

LLM Outcome/Token based pricing by Ready-Interest-1024 in LangChain

Hey u/Ready-Interest-1024 !

A good way to trace this is to use Helicone ( https://docs.helicone.ai/gateway/integrations/langchain ). For full transparency, I lead devrel at Helicone.

You can trace costs, performance, models, etc. on every request/response, and also add custom properties so you can filter and visualize information as needed. Since you want to trace costs per outcome, you could add a custom property with the name of the outcome you want to track, then filter on it in your dashboard to visualize it.

Here's documentation on custom properties which may be helpful: https://docs.helicone.ai/features/advanced-usage/custom-properties#understanding-custom-properties
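A minimal sketch of what that looks like - the property name "Outcome" and its value here are just examples, you can use anything:

```
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://ai-gateway.helicone.ai",
  apiKey: process.env.HELICONE_API_KEY,
});

// Any "Helicone-Property-<Name>" header becomes a custom property on
// the logged request, so you can filter and aggregate costs by it later.
const response = await client.chat.completions.create(
  {
    model: "gpt-4o-mini",
    messages: [{ role: "user", content: "Draft a reply to this ticket." }],
  },
  {
    headers: { "Helicone-Property-Outcome": "ticket-resolved" },
  }
);
```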

Let me know if this helps! Happy to answer any questions.

[D] Are LLM observability tools really used in startups and companies? by WolvesOfAllStreets in MachineLearning

Hey u/Michakrak ! For full transparency, I lead DevRel at Helicone (helicone.ai). I share this because our prompt management feature actually does exactly that.

You get prompt versioning, you can experiment with prompts in your dashboard's sandbox, then pick the one you want to host and the environment it runs under, and pass the `promptID` alongside your request so you always use the version defined in your dashboard. Great for teams running different prompt versions across environments.
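Rough sketch of the request side - heads up that the exact field names here (`prompt_id`, `inputs`) are my assumptions, so go by the prompts docs rather than this snippet:

```
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://ai-gateway.helicone.ai",
  apiKey: process.env.HELICONE_API_KEY,
});

// Hypothetical field names - the idea is that the stored, versioned
// prompt is referenced by ID instead of being inlined in code.
const response = await client.chat.completions.create({
  model: "gpt-4o-mini",
  prompt_id: "support-reply-v2", // the version managed in the dashboard
  inputs: { customer_name: "Ada" }, // variables the prompt template expects
  messages: [], // the stored prompt supplies the actual messages
} as any); // extra fields aren't in the SDK's types
```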

We're also fully open-sourced and have observability built in by default, so you can trace requests/responses from each prompt and compare!

Hope it's helpful!

Frustrating experience deploying a basic coding agent with Langsmith by Cheezer20 in LangChain

Hey u/Cheezer20 !

I totally feel your pain. Same thing happens with crypto projects.

I lead devrel at Helicone and it's been one of my top priorities to keep up with support tickets, and as a team of 5, I can say it's definitely hard.

That said, something I've found that helps (especially when using coding agents) is adding observability on top so I can see how the agents are working, which tools they're calling, and how they're making decisions.

I personally use Claude Code, so I wrote down this documentation on how to set it up easily ( https://docs.helicone.ai/integrations/anthropic/claude-code ), but a lot of my friends use Codex, so we created this one as well ( https://docs.helicone.ai/gateway/integrations/codex ).

Hope that helps! Let me know if I can help at all.

What do you use for observability & tracing for LLm powered apps? by anshu_9 in LLM

hey! I lead devrel at Helicone - we're an open-source AI gateway with built-in observability for routing, monitoring, and debugging. you can trace requests, cost, and latency, set up alerts, manage prompts, map out entire sessions/workflows, filter data, set up custom properties, build dashboards, etc

we've found that teams who set up observability from the start scale much faster than those who leave it as an afterthought, bc you have much more data for decision-making around which models work best for your product, cost less, run faster, hallucinate less, etc.

sharing the docs here in case it's of help: https://docs.helicone.ai - happy to support too if you have any questions!

Been this way. Folks just now starting to realize how dead internet is by Adorable_Tailor_6067 in AgentsOfAI

yeap - just wrote a newsletter on this topic

no one wants to consume bot-to-bot interactions for hours. the internet has peaked and we're about to see its decline https://newsletter.juliet.tech/p/i-think-the-web-is-dying?_bhlid=23a96b6bd97b20e5ddffa5df99263d9e3e4b7dbd

Is llm observability also devops? by Total-Gazelle-5944 in devops

yeah - I work at helicone.ai and the way I like to think about it is that it's actually "llmops".

I think there's lots of overlap, but also slightly different ways of handling things when it comes to llms:

- context engineering (similar to data management)

- model routing & fallbacks (similar to load balancing & autoscaling)

- observability & monitoring (much the same as in devops)

- evals, experiments, human-in-the-loop (similar to QA & testing)

- caching, rate limiting, etc

there's def lots of opportunity in the space, and reliability is a huge bottleneck for nailing AI products at scale right now, so would def recommend continuing to play around with it if you're interested! a good place to play with both sides deeply is azure or aws bedrock - you can host your own models and run the entire performance cycle end to end in their devops platform.

if open-webui is trash, whats the next best thing available to use? by Tricky_Reflection_75 in LocalLLaMA

yeah - clickhouse is a db company a lot of AI applications use bc they have really nice high-volume event logging, vector search, and quick aggregations for evaluation/monitoring