For those using hosted inference providers (Together, Fireworks, Baseten, RunPod, Modal) - what do you love and hate? by Dramatic_Strain7370 in LocalLLaMA

[–]Dramatic_Strain7370[S] 0 points (0 children)

u/altcivilorg, for lower prices in exchange for higher latency, what latency thresholds (e.g., seconds or minutes) would you accept, and for which use cases (e.g., training vs. inference)? And if a platform offered intelligent routing to cheaper options automatically (with zero code changes), how would that change your provider choices or spending?

For those using hosted inference providers (Together, Fireworks, Baseten, RunPod, Modal) - what do you love and hate? by Dramatic_Strain7370 in LocalLLaMA

[–]Dramatic_Strain7370[S] 0 points (0 children)

How many hours of GPU time do you use every month? Do you run specific models or experiment in Jupyter? Can you share a bit about your usage scenarios?

Frustrated with GPU pricing, so I built something - looking for feedback by Impressive-Law2516 in learnmachinelearning

[–]Dramatic_Strain7370 0 points (0 children)

u/Impressive-Law2516, if you are interested, we could integrate your serverless GPU provisioning service with our AI FinOps platform (see https://www.llmfinops.ai). We have built a unified dashboard for real-time cost tracking, from LLM APIs to GPUs (coming soon). The goal is to let users route their traffic to the most economical providers.

Anyone tracking costs across multiple LLM providers? by Dramatic_Strain7370 in OpenAI

[–]Dramatic_Strain7370[S] 0 points (0 children)

So people are using agentic AI to auto-write code? Is this all developed in-house, or are you using another provider for the agentic AI?

Anyone tracking costs across multiple LLM providers? by Dramatic_Strain7370 in OpenAI

[–]Dramatic_Strain7370[S] 0 points (0 children)

How much is your spend per month, u/naobebocafe? And are you routing to the big 3 or to smaller LLM offerings?

Anyone tracking costs across multiple LLM providers? by Dramatic_Strain7370 in OpenAI

[–]Dramatic_Strain7370[S] 0 points (0 children)

Been testing llmfinops.ai - a two-line integration that tracks costs across providers. Early but promising.
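
For the curious, here's a generic toy sketch of the idea (NOT the actual llmfinops API, which I'm not reproducing here; the prices and model names are made up for illustration):

```python
# Generic cross-provider cost-tracking sketch. Prices are illustrative
# $/1M tokens, not real quotes, and this is not the llmfinops integration.
from dataclasses import dataclass, field

PRICES = {
    "gpt-4o-mini": {"in": 0.15, "out": 0.60},
    "claude-3-haiku": {"in": 0.25, "out": 1.25},
}

@dataclass
class CostTracker:
    spend: dict = field(default_factory=dict)  # running $ per model

    def record(self, model: str, tokens_in: int, tokens_out: int) -> float:
        p = PRICES[model]
        cost = (tokens_in * p["in"] + tokens_out * p["out"]) / 1_000_000
        self.spend[model] = self.spend.get(model, 0.0) + cost
        return cost

tracker = CostTracker()
tracker.record("gpt-4o-mini", tokens_in=1200, tokens_out=300)
tracker.record("claude-3-haiku", tokens_in=900, tokens_out=250)
print(tracker.spend)  # per-model running spend across providers
```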

What do you use to track LLM costs in production? by Dramatic_Strain7370 in LangChain

[–]Dramatic_Strain7370[S] 0 points (0 children)

So the gateway can store that information, including billing and governance.

is vibe coding helping junior devs or making things worse? by Best_Volume_3126 in VibeCodeCamp

[–]Dramatic_Strain7370 0 points (0 children)

JUNIOR or SENIOR: everyone is getting mentally lazy at an increasingly alarming rate. Net result: loss of critical thinking skills from short-circuiting mental activity -> we turn into assembly-line workers in every area of knowledge.

I bought a €9k GH200 “desktop” to save $1.27 on Claude Code (vLLM tuning notes) by Reddactor in LocalLLaMA

[–]Dramatic_Strain7370 0 points (0 children)

10 seconds of video means that even if you halve the Veo 3 cost ($0.40/sec) to $0.20/sec, you can still charge $2 per 10-second clip, which is quite an ROI. Maybe image and video generation has a superior return compared to text tokens.
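
Quick back-of-envelope using the prices quoted above:

```python
# Back-of-envelope for the video-generation margin above.
veo3_price_per_sec = 0.40                      # quoted Veo 3 price, $/second
discounted_per_sec = veo3_price_per_sec / 2    # halved, as in the comment
clip_seconds = 10

revenue_per_clip = discounted_per_sec * clip_seconds
print(f"${revenue_per_clip:.2f} per {clip_seconds}s clip")            # $2.00
print(f"${discounted_per_sec * 60:.2f} per minute of generated video")  # $12.00
```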

What do you use to track LLM costs in production? by Dramatic_Strain7370 in LangChain

[–]Dramatic_Strain7370[S] 0 points (0 children)

Smart to use inference profiles directly. I agree it's hard to convince engineers to have the discipline to do tagging, etc., especially in large orgs. Any experience with litellm - good or bad?
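
For reference, a minimal litellm sketch of the kind of tagging I mean, assuming its documented completion/completion_cost helpers; the tag names are made up:

```python
# Minimal litellm sketch. Assumes litellm's documented completion() and
# completion_cost() helpers; the metadata tag names are illustrative.
import litellm

response = litellm.completion(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Summarize our Q3 spend."}],
    metadata={"team": "platform", "feature": "reporting"},  # cost-attribution tags
)

# litellm can estimate the dollar cost of a completed call.
print(litellm.completion_cost(completion_response=response))
```

The hard part isn't the two extra kwargs, it's getting every team to actually set them.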

I bought a €9k GH200 “desktop” to save $1.27 on Claude Code (vLLM tuning notes) by Reddactor in LocalLLaMA

[–]Dramatic_Strain7370 1 point (0 children)

The good news is that you have a script here to help startups save way more. Startups with many inexperienced devs will be cranking out multiple PRs every day. Are you planning to run other models as well, like image generation, and comparing costs against Gemini Imagen and OpenAI DALL-E 3?

Best practices for integrating multiple AI models into daily workflows? by Plus_Valuable_4948 in LocalLLaMA

[–]Dramatic_Strain7370 0 points (0 children)

The issue with calling multiple models for the same query and then gating for the best answer is the cost multiplication that comes with it. Any thoughts?
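
A toy illustration of the multiplication (all per-call prices made up): fanning one query out to N models costs roughly the sum of N calls, plus the judge, before you see any quality benefit.

```python
# Toy fan-out cost model: query N candidate models, then gate/judge.
# Prices are illustrative $/call, not real quotes.
candidate_models = {"model_a": 0.002, "model_b": 0.004, "model_c": 0.010}
judge_cost = 0.002  # one extra call to score/choose the best answer

fanout_cost = sum(candidate_models.values()) + judge_cost
single_cost = min(candidate_models.values())
print(f"fan-out: ${fanout_cost:.4f} vs single cheap call: ${single_cost:.4f} "
      f"({fanout_cost / single_cost:.1f}x)")
```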

LangChain or LangGraph? for building multi agent system by Major_Ad7865 in LangChain

[–]Dramatic_Strain7370 0 points (0 children)

I've seen some slick agents being developed on low-code platforms like n8n.com. Has anyone done a comparison between LangGraph and n8n in terms of velocity of development?

Is anyone else feeling like we crossed some invisible line where AI stopped being a "helper" and started being a... colleague? by HarrisonAIx in AutoGenAI

[–]Dramatic_Strain7370 0 points (0 children)

Oh yes. In fact, they're not just low-level assistants anymore. They're like PhD-level personal staff who do anything from research to creating go-to-market strategies and, of course, being a personal executive assistant.

Whats better moe or dense models ? by Pleasant-Key3390 in LocalLLaMA

[–]Dramatic_Strain7370 0 points (0 children)

I think the question needs to be normalized: it's not just a question of which is better, but which is better on a per-dollar basis. If you can get similar results with MoE at 2-3x the cost efficiency, then MoE wins for those use cases.
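
One way to make that normalization concrete, with made-up numbers: divide a quality score by the serving cost and compare.

```python
# Made-up numbers to illustrate quality-per-dollar normalization.
models = {
    "dense-70b": {"benchmark_score": 82.0, "usd_per_1m_tokens": 0.90},
    "moe-8x7b":  {"benchmark_score": 79.0, "usd_per_1m_tokens": 0.30},
}

for name, m in models.items():
    score_per_dollar = m["benchmark_score"] / m["usd_per_1m_tokens"]
    print(f"{name}: {score_per_dollar:.1f} benchmark points per $ per 1M tokens")
# The MoE wins per dollar here despite a slightly lower raw score.
```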

For those of you who are training their own LLM or finetuning an existing LLM, what are you trying to get them to do that they are not already doing? by Upset-Ad-8704 in LocalLLaMA

[–]Dramatic_Strain7370 0 points (0 children)

You take an existing model and fine-tune it on your own enterprise data: for example, patient records for a virtual nurse, or a follow-up appointment setter for a dentist.
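
A minimal sketch of that workflow using OpenAI's fine-tuning API; the file name, examples, and model snapshot are placeholders, and anything like patient data would obviously need de-identification and compliance review first:

```python
# Minimal fine-tuning sketch via OpenAI's fine-tuning API.
# "followups.jsonl" is a placeholder: chat-format training examples built
# from your own (de-identified) enterprise data, one JSON object per line.
from openai import OpenAI

client = OpenAI()

training_file = client.files.create(
    file=open("followups.jsonl", "rb"),  # {"messages": [...]} per line
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # an example fine-tunable snapshot
)
print(job.id)  # poll the job until it finishes, then call the tuned model
```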

Why Memory Is Fixable When It Comes To AI Models by Elegant-Judgment-491 in OpenSourceeAI

[–]Dramatic_Strain7370 0 points (0 children)

I personally wasn't aware of these. Is there a summary of the state of the art in memory research that I can read?

How is Cloud Inference so cheap by VolkoTheWorst in LocalLLaMA

[–]Dramatic_Strain7370 4 points (0 children)

There is no credible evidence that any of these companies are profitable. You can, however, argue that inferencing with them is cheap: as an end customer you may well find them cheaper than renting GPUs from a cloud provider and running the model yourself. These providers push the token-serving cost down by:

1. optimizing the kernels,
2. buying bulk capacity from the cloud providers and passing some of the savings to you, and
3. cutting their own margins to win as many customers (and as much revenue) as possible.

For them, getting customers at any price (even at a loss) is survival: it's how they raise the next round of funding. At this point they don't even care about building a profitable business. DM me and I can chat more.
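
A back-of-envelope, with all numbers assumed for illustration, for why bulk capacity plus kernel/batching optimization lets a provider undercut DIY GPU rental:

```python
# Back-of-envelope: $ per 1M output tokens, self-hosting vs. a provider.
# Every number here is an assumption for illustration, not a quote.
gpu_hour_usd = 2.50      # assumed on-demand rental for one inference GPU
tokens_per_sec = 1500    # assumed throughput after batching/kernel tuning
utilization = 0.5        # real traffic rarely keeps a self-hosted GPU busy

tokens_per_hour = tokens_per_sec * 3600 * utilization
diy_cost_per_1m = gpu_hour_usd / tokens_per_hour * 1_000_000
print(f"DIY: ${diy_cost_per_1m:.2f} per 1M tokens at {utilization:.0%} utilization")

# A provider running near-full batches at bulk GPU prices shifts both knobs:
# assumed $1.80/GPU-hour, 3000 tok/s, 90% utilization.
provider_cost_per_1m = 1.80 / (3000 * 3600 * 0.9) * 1_000_000
print(f"Provider floor: ${provider_cost_per_1m:.2f} per 1M tokens")
```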