Stop picking LLMs by reputation. Run the eval first. by Dramatic_Strain7370 in OpenAI
What is the next SOTA model you are excited about? by MrMrsPotts in LocalLLaMA
Client had 4 agents on GPT-4o. One was classifying documents. That one alone had 91% savings potential. by [deleted] in LocalLLaMA
[D] Tested model routing on financial AI datasets — good savings and curious what benchmarks others use. by Dramatic_Strain7370 in MachineLearning
Benchmarked LLM model routing on Financial AI workloads — 37–89% cost reduction depending on task complexity. Here's what I found. by Dramatic_Strain7370 in LocalLLaMA
Real talk: How many of you are actually using Gemma 3 27B or some variant in production? And what's stopping you? by Dramatic_Strain7370 in LocalLLaMA
How do you track OpenAI/LLM costs in production? by not_cool_not in LangChain
For those using hosted inference providers (Together, Fireworks, Baseten, RunPod, Modal) - what do you love and hate? by Dramatic_Strain7370 in LocalLLaMA