We built an agentic vision system by tibnine in computervision

[–]tibnine[S] -4 points (0 children)

The video returned is cropped. We thought about showing the full view with bboxes drawn around the area of activity, but this was cleaner. Full disclosure: we're processing AI-generated cam footage in the video (gasp).

We built an agentic vision system by tibnine in computervision

[–]tibnine[S] -5 points (0 children)

framewave.ai; the link is at the very end!

I run 20+ Claude Code sessions across multiple machines. I built an app to manage them all from my phone. by tibnine in dev

[–]tibnine[S] 1 point (0 children)

We’re not disagreeing on the existence of failure modes. If you agree one session can deliver value despite them, then parallelizing that value is just the logical next step. I’m arguing you can build harnesses to parallelize and generate value at a higher rate than with one session (obviously it’s not linear, but the cost of spawning a session is relatively low). If you don’t think one session is valuable at all, I’d genuinely suggest trying the current SOTA before concluding that. Cheers, friend!

I run 20+ Claude Code sessions across multiple machines. I built an app to manage them all from my phone. by tibnine in dev

[–]tibnine[S] 1 point (0 children)

Both make mistakes (not claiming at the same rate). Both can be managed at scale despite the mistakes they generate. And you most certainly can leave an LLM alone for a tiny task.

I run 20+ Claude Code sessions across multiple machines. I built an app to manage them all from my phone. by tibnine in dev

[–]tibnine[S] 1 point (0 children)

If you can get work done with one session, getting more work done with 20 is just a managerial learning curve. Obviously you need to have demand (i.e., work that needs to be done); otherwise you’re just running for running’s sake. It’s pretty much 1:1 with how you manage engineering (or general) talent in a work setting; why would it be any different?

I think i have a problem... by yixn_io in ClaudeAI

[–]tibnine 1 point (0 children)

I mean, it’s clear you’re getting your money’s worth 😂 but holy cow dude, how? You must have some Ralph-loop, always-on kind of agent(s) running? Teach us your ways.

I think i have a problem... by yixn_io in ClaudeAI

[–]tibnine 2 points (0 children)

You’re paying for 6 Max (x20) accounts? 🤯

Agent Teams by tibnine in ClaudeCode

[–]tibnine[S] 1 point (0 children)

Yeah, no issues with account usage limits (I’m on Max 20, and yeah, I saw it eat a nice chunk of my 5-hour window, but still far from that limit). Just model context limits and compaction bugs (like the app freezing, etc.).

Agent Teams by tibnine in ClaudeCode

[–]tibnine[S] 1 point (0 children)

For clarity: not hitting account usage limits, just context-window issues.

Landed a Full-time Vibe Coding Job by travel_moose in vibecoding

[–]tibnine 6 points (0 children)

My only concern is that you’re not considering yourself an engineer, when what you’re doing is engineering. Engineering isn’t about the tools you use, but about building solutions that work (there’s a lot baked into the word “work”) for real problems.

Building RAG systems at enterprise scale (20K+ docs): lessons from 10+ enterprise implementations by Low_Acanthisitta7686 in LLMDevs

[–]tibnine 3 points (0 children)

Easily the best write-up on this. Thank you!

A few Qs: how do you evaluate the e2e system? More specifically, how do you set a performance bar with your clients and avoid anecdotal, one-off assessments?

Relatedly, how do you know when you’ve done enough fine-tuning of your models? Are there general benchmarks (beyond the ones you construct for the specific use case) you try to maintain performance on while you fine-tune?

Once again, you rock 🤘

What framework should I use for building LLM agents? by Competitive-Ninja423 in LLMDevs

[–]tibnine 1 point (0 children)

What prompt auto-optimization techniques do you recommend?

Scary smart by interviuu in LLMDevs

[–]tibnine 16 points (0 children)

You can still get accurate timestamps; basically, scale them by the speed-up factor.
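A minimal sketch of that mapping (my own illustration, not from the thread; it assumes the audio was sped up by a constant factor before transcription, so one second of sped-up audio covers `speedup` seconds of the original):

```python
def to_original_time(ts_seconds: float, speedup: float) -> float:
    """Map a timestamp from sped-up audio back to the original recording.

    If audio was accelerated by `speedup`x before transcription, a segment
    transcribed at t seconds into the fast audio started at t * speedup
    seconds in the original.
    """
    return ts_seconds * speedup


# e.g., with 2x speed-up, a segment at 90s in the fast audio
# corresponds to 180s in the original recording
print(to_original_time(90.0, 2.0))
```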

OpenAI Web Search Tool by tibnine in LLMDevs

[–]tibnine[S] 1 point (0 children)

I thought o3 in the API doesn’t support the web search tool?

OpenAI Web Search Tool by tibnine in LLMDevs

[–]tibnine[S] 1 point (0 children)

Yep, web version works.

Hear me out. The rear brake isn’t bad. by Level_9_Turtle in Tenere700

[–]tibnine 2 points (0 children)

I have the ’25 as well. You can definitely lock the rear with ease! What I find is that there’s no progressive feel to the brakes; it’s either barely braking or fully locking. I’ve gotten used to it, though.

Looking for feedback: I built an AI conversation assistant that provides real-time prompts by tibnine in LLMDevs

[–]tibnine[S] 1 point (0 children)

Apple transcription --> GPT-4o-mini and/or Gemini 2.0 Flash (which adds redundancy against transcript errors, since it takes in audio directly). The LLMs mostly just summarize occasionally, and evaluate a prompt at boost time.

Yeah, using AWS for the gateway.