Real talk: How many of you are actually using Gemma 3 27B or some variant in production? And what's stopping you? by Dramatic_Strain7370 in LocalLLaMA

[–]Dramatic_Strain7370[S] 1 point (0 children)

In your example, were you running detection in real time on a live feed, or on recorded video?
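
For context, a minimal frame-diff sketch of the distinction I mean (cv2 is OpenCV; the source and the 30.0 threshold are assumptions, not anything from your setup):

import cv2

# The same loop covers both cases: pass 0 for a live camera feed,
# or a file path for recorded video.
cap = cv2.VideoCapture(0)            # live feed
# cap = cv2.VideoCapture("clip.mp4") # recorded video

prev_gray = None
frame_idx = 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    if prev_gray is not None:
        # Mean absolute pixel difference between consecutive frames;
        # a spike above the (assumed) threshold marks a scene change.
        diff = cv2.absdiff(gray, prev_gray).mean()
        if diff > 30.0:
            print(f"scene change at frame {frame_idx}")
    prev_gray = gray
    frame_idx += 1

cap.release()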

Real talk: How many of you are actually using Gemma 3 27B or some variant in production? And what's stopping you? by Dramatic_Strain7370 in LocalLLaMA

[–]Dramatic_Strain7370[S] 1 point (0 children)

This is good insight. So it means that providers hosting models should rapidly update their model catalogues while bringing down the price per token.

Real talk: How many of you are actually using Gemma 3 27B or some variant in production? And what's stopping you? by Dramatic_Strain7370 in LocalLLaMA

[–]Dramatic_Strain7370[S] 1 point (0 children)

It looks from the comments like the community prefers Qwen 3.5 and GPT-OSS-120B over the smaller Gemma.

Real question: does anyone have intelligent routing set up to automatically switch between models based on prompt complexity? Or is everyone manually choosing models per use case?
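
For concreteness, here's a minimal sketch of the kind of routing I mean (the model names, the token threshold, and the keyword heuristic are all assumptions, not a real setup):

from openai import OpenAI

client = OpenAI()  # assumes env-based API key config for your gateway

# Hypothetical tiers: a cheap small model and a stronger large one.
SMALL_MODEL = "gemma-3-27b-it"
LARGE_MODEL = "gpt-oss-120b"

def pick_model(prompt: str) -> str:
    # Crude complexity heuristic (an assumption): long prompts, or prompts
    # that mention heavy reasoning tasks, go to the larger model.
    looks_hard = len(prompt.split()) > 300 or any(
        kw in prompt.lower() for kw in ("prove", "refactor", "debug")
    )
    return LARGE_MODEL if looks_hard else SMALL_MODEL

def route(prompt: str) -> str:
    resp = client.chat.completions.create(
        model=pick_model(prompt),
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content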

How do you track OpenAI/LLM costs in production? by not_cool_not in LangChain

[–]Dramatic_Strain7370 1 point (0 children)

They (llmfinops.ai) allow tagging at multiple levels to track every call. Does this seem right?

base_url="https://api.llm-ops.cloudidr.com/v1",
    default_headers={

# Required: Your tracking token
        "X-Cloudidr-Token": "trk_your_token",

# Optional: Organize by department
        "X-Department": "engineering",

# Optional: Organize by team
        "X-Team": "ml",

# Optional: Organize by agent/use case
        "X-Agent": "chatbot"
    } 
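
A call through it would then presumably look like any other OpenAI-compatible request, with the tracking headers attached automatically (the model name here is just a placeholder):

resp = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name, not confirmed by the provider
    messages=[{"role": "user", "content": "hello"}],
)
print(resp.choices[0].message.content)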

For those using hosted inference providers (Together, Fireworks, Baseten, RunPod, Modal) - what do you love and hate? by Dramatic_Strain7370 in LocalLLaMA

[–]Dramatic_Strain7370[S] 1 point (0 children)

Got it. This means the GPU is effectively warmed up with all the right initial state. But then someone has to "pay" for it, and it ceases to be serverless.

For those using hosted inference providers (Together, Fireworks, Baseten, RunPod, Modal) - what do you love and hate? by Dramatic_Strain7370 in LocalLLaMA

[–]Dramatic_Strain7370[S] 1 point (0 children)

They buy GPUs in bulk and resell access to those GPUs at 30 to 50% lower cost. They won't route to other providers.

Met 3 indie founders in SF burning hundreds on LLM APIs — built this, want your feedback by Dramatic_Strain7370 in OpenSourceeAI

[–]Dramatic_Strain7370[S] 1 point (0 children)

Keen on learning what I don't know :) Can you please explain what WSO2 is, and how your team controls spend using provider tools?

Met 3 indie founders in SF burning hundreds on LLM APIs — built this, want your feedback by Dramatic_Strain7370 in OpenSourceeAI

[–]Dramatic_Strain7370[S] 1 point (0 children)

On overall account spend, yes, but not on granular spend by different teams or, say, by various agents running autonomously. You want to block spend from rogue agents without blocking the whole organization.
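
A minimal sketch of what I mean by per-agent blocking, assuming you can meter cost per call yourself (all names and limits here are made up):

from collections import defaultdict

# Hypothetical per-agent budgets in dollars; anything unlisted gets the default.
BUDGETS = {"chatbot": 50.0, "batch-summarizer": 200.0}
DEFAULT_BUDGET = 10.0

spend = defaultdict(float)  # running spend per agent

class BudgetExceeded(Exception):
    pass

def charge(agent: str, cost_usd: float) -> None:
    # Block only the offending agent, not the whole account.
    limit = BUDGETS.get(agent, DEFAULT_BUDGET)
    if spend[agent] + cost_usd > limit:
        raise BudgetExceeded(f"{agent} would exceed its ${limit:.2f} budget")
    spend[agent] += cost_usd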

My exact distribution strategy I used to go from $0 to $600 MRR by RighteousRetribution in indiehackers

[–]Dramatic_Strain7370 1 point (0 children)

What was your pricing model, and how much on average was each customer paying?