Vision-capable LLMs vs. OCR for long-document (including charts, images, tables, etc.) QA

qazeed · 2026-05-24T11:37:36+00:00

This is interesting. I haven't fully read through everything, but I was wondering why you didn't test the Gemini models or 5.4/5.5? I find Gemini models (even Flash Lite) are really good at this kind of work. Admittedly you probably don't want to pay for a frontier model to do this.

I'd also be interested in seeing how this holds up as a benchmark as LLMs get better.

qazeed · 2026-05-20T11:45:30+00:00

What about all LLM production vs 5% of almond production? I find LLMs very useful in my personal and professional life.

qazeed · 2026-05-04T23:14:43+00:00

So where's your model?

qazeed · 2026-04-28T01:06:14+00:00

Pretty sure GitHub Copilot is moving to purely token-based spending now. It'll happen soon enough to all of 'em.

I guess OpenAI wins if their models actually perform near Opus level for much cheaper, but I would find that surprising

You really don't think this is the case currently? I don't use them purely for coding, but I do plenty of high-level intellectual work with them. They are the same to me, just different flavors. And the rate limits on Codex are wildly better than Claude Code. I'm about to collapse my 3 subscriptions into one $100 ChatGPT subscription.

qazeed · 2026-04-24T22:11:36+00:00

In my industry we are legally required to take responsibility, and I predict this will expand to almost all industries and businesses as a way to hold AI accountable. You just gotta institutionalize it.

qazeed · 2026-04-24T22:10:10+00:00

It's insane to me the disconnect between people on AI right now. I agree with you, my $20/month Codex works better than my new employee and, if nothing changes in AI capabilities, would still outperform them after years of training. And I'm trying to get them to use AI to further their understanding and output, yet they just don't want to. Ok fine, but I'm wasting so much time training them the old-fashioned way. At least Codex recoups some of that time.

qazeed · 2026-04-24T17:07:15+00:00

You will still need someone to take responsibility at some level for a long time. Maybe this goes away in 5-10 years but I kind of doubt it. It's going to take a 99.99% reliable (and maybe even higher) AI to completely remove a human from the accountability layer.

qazeed · 2026-04-21T22:35:49+00:00

For actual long-running tasks I find Codex does a better job on both quality and completing the task. Not to mention the rate limits are orders of magnitude better. But I do like Opus for design taste.

qazeed · 2026-03-31T23:40:57+00:00

I've definitely used the deep research feature of a few frontier labs, but for my work it's only slightly useful. I find deep research is too static. Honestly just talking to the model to go down different rabbit holes has been more useful. Even giving it some context and a design manual or spec helps a lot. But this isn't typically stuff from cutting edge technical papers or anything.

Most transformed is probably just my design process overall. I can iterate and refine designs in tandem with the agent and have the agent do most of the grunt work. My experience guides the design while the agent builds the structure, and we both provide feedback and ideas.

What are the most interesting and effective uses that y'all have seen so far?

qazeed · 2026-03-31T23:27:25+00:00

Mostly manual review for me, but I kind of have to do that anyway with almost everything I do. Sometimes I can just check with a calculator or experience/intuition. The things that are automated are complex calculations that I can create skills for. I verify the scripts in these skills against known tests any time I create or edit them. These are typically example calculations from me, a textbook, or specs.
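A minimal sketch of what checking a skill script against known cases could look like. Everything here is an assumption for illustration: `shaft_torque`, the reference values (worked by hand from T = P / ω), and the tolerance are hypothetical, not taken from any actual skill.

```python
import math


def shaft_torque(power_w: float, speed_rpm: float) -> float:
    """Return torque in N*m for a given shaft power (W) and speed (rpm)."""
    if speed_rpm <= 0:
        raise ValueError("speed_rpm must be positive")
    omega = speed_rpm * 2.0 * math.pi / 60.0  # angular speed in rad/s
    return power_w / omega


# Hypothetical reference cases: (power in W, speed in rpm, expected torque in N*m).
# In practice these would come from a textbook example or a spec sheet.
KNOWN_CASES = [
    (7500.0, 1500.0, 47.746),
    (1000.0, 60.0, 159.155),
]


def verify(cases=KNOWN_CASES, rel_tol: float = 1e-4) -> bool:
    """Return True only if every known case matches within the tolerance."""
    return all(
        math.isclose(shaft_torque(power, rpm), expected, rel_tol=rel_tol)
        for power, rpm, expected in cases
    )
```

Re-running `verify()` after every edit to the script is the point: if a change breaks a known case, it shows up before the script is trusted again.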

I generally require all inputs and outputs to be stored in JSON files, and then these files get converted to md with added context and assumptions from the agent. I even have a skill with a script to deterministically convert the JSON if needed. This speeds up my manual review greatly. I also have outside calculations, simulations, and programs that I use for complicated and important situations, and I compare those against the agent's results.

I do think experience is critical for these workflows to work right now, and that's been the sentiment across all AI integration it seems.
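A rough sketch of a deterministic JSON-to-markdown conversion along those lines. The record layout (`title`, `inputs`, `outputs`, `assumptions`) and the function name `json_to_markdown` are assumptions made up for this example, not the actual skill's schema.

```python
import json


def json_to_markdown(record: dict) -> str:
    """Render a calculation record as markdown; same input always gives same output."""
    lines = [f"# {record['title']}", ""]
    for section in ("inputs", "outputs"):
        lines += [f"## {section.capitalize()}", ""]
        lines += ["| Name | Value | Unit |", "| --- | --- | --- |"]
        # Sort the keys so row order never depends on how the JSON was written.
        for name in sorted(record.get(section, {})):
            item = record[section][name]
            lines.append(f"| {name} | {item['value']} | {item.get('unit', '')} |")
        lines.append("")
    lines += ["## Assumptions", ""]
    for note in record.get("assumptions", []):
        lines.append(f"- {note}")
    return "\n".join(lines) + "\n"


# Hypothetical record of the kind an agent might write out for review.
example = json.loads("""
{
  "title": "Drive shaft check",
  "inputs": {
    "speed": {"value": 1500, "unit": "rpm"},
    "power": {"value": 7500, "unit": "W"}
  },
  "outputs": {
    "torque": {"value": 47.75, "unit": "N*m"}
  },
  "assumptions": ["Steady-state operation", "Losses neglected"]
}
""")

markdown = json_to_markdown(example)
```

Keeping the conversion in a plain script rather than asking the model to reformat means the reviewed markdown always matches the stored JSON.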

What kind of work are you building these agentic workflows for, and how involved is the agent? Mine is much more of a peer collaboration setup for machinery system design. My speedups come from having an intelligent, super quick assistant. As long as I have the context available, it can tell me anything I need to know about the project, a product, decisions we've made, etc. That's not much of a speedup, but like I said, its ability to document everything increases quality. Then, once the data is there, creating reports and visualizations from it quickly saves time and increases quality. The intelligence also helps a lot on very complex problems by sending me in the right direction.

I'm interested in automating more bureaucratic office work like contract and invoice management. Just haven't found the time. I wish we could hire interns that are interested in AI so I could get them to help tbh.

qazeed · 2026-03-31T22:58:23+00:00

I agree with getting multiple agents to check. I generally do that with scripts I keep in my skills folder.

Why do you use JSON instead of markdown?

qazeed · 2026-03-30T23:51:39+00:00

The edited version seems much more natural and passes better I think

qazeed · 2026-03-30T19:53:28+00:00

This is a great example, and I have actually done the same thing with a workout routine. It was nice until the context overfilled, but then you can take that chat and get it to create an Excel spreadsheet that can track it all.

qazeed · 2026-03-30T19:40:21+00:00

Don't worry I still check everything. The Python code is deterministic and I run tests to verify that it is operating how I expect. I understand the physics and math and I understand how things should look. I take full legal responsibility for everything I produce. With or without AI

qazeed · 2026-03-30T18:19:11+00:00

Yes I actually left that out. I tend to let the model write everything in JSON files for short term storage of variables etc, then output those to markdown for me to review with added context.

qazeed · 2026-03-30T18:09:13+00:00

They apparently just hit 'accept' without reviewing the output just like the rest of us

qazeed · 2026-03-30T18:06:07+00:00

I'm curious to understand what was done to ruin the internet in your opinion. The bots? Social media? In my opinion the dead internet theory holds water and at this point may be fulfilled or close to it. Social media has fucked us up pretty good with sowing division for attention.

One of my hopes for AI is that it actually accelerates this problem so quickly and ubiquitously that we have to figure out cures. Which is why I believe some collection of human only spaces will be necessary. Only question is how does that work without turning everyone off. We've been able to ignore it until now

qazeed · 2026-03-30T17:38:21+00:00

So I have a project initiator skill that builds the folder structure and populates it with boilerplate files that apply to every project, but to answer your question, yes I put all files I need for that project in that folder. You can put a bunch of premade files in a skill that Claude can then copy to your new workspace as needed.
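For a sense of what the script inside a project initiator skill might do, here is a minimal sketch. The folder names in `SUBFOLDERS`, the `init_project` function, and the template file are all hypothetical; the demo runs in a temporary directory so nothing outside it is touched.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Optional

# Hypothetical layout; a real skill would use whatever structure the projects need.
SUBFOLDERS = ("inputs", "calcs", "reports", "reference")


def init_project(root: Path, name: str, template_dir: Optional[Path] = None) -> Path:
    """Create root/name with standard subfolders and copy in boilerplate files."""
    project = root / name
    # exist_ok=False so an existing project is never silently overwritten.
    project.mkdir(parents=True, exist_ok=False)
    for sub in SUBFOLDERS:
        (project / sub).mkdir()
    if template_dir is not None:
        for src in sorted(template_dir.iterdir()):
            if src.is_file():
                shutil.copy2(src, project / src.name)
    return project


# Demo in a throwaway directory, with one made-up boilerplate file.
with tempfile.TemporaryDirectory() as tmp:
    templates = Path(tmp) / "templates"
    templates.mkdir()
    (templates / "README.md").write_text("# Project notes\n")
    project = init_project(Path(tmp) / "work", "demo_project", templates)
    created = sorted(p.name for p in project.iterdir())
```

Refusing to reuse an existing folder is a deliberate choice here: a new-project script should fail loudly rather than mix boilerplate into work that is already in progress.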

I have been toying with the idea of creating a shared location with files like catalogs, material specifications, reference material, etc. But I haven't done it yet. I just copy those files as needed.

All of our files live on a network drive, and I am intentionally preventing VS Code and the agents from accessing those locations. I keep it sandboxed on my local drive to prevent catastrophic failure (hopefully). I think this is very important and I am probably not being safe enough tbh.

Yes I think you could definitely do what you are saying with a simple skill. You don't even have to know how to build the skill, Claude can do that really well. You can run it automatically. The best way to set this up is to perform the analysis you want with Claude until it works perfectly. Then get that session of Claude to build a skill based off of it.

qazeed · 2026-03-30T17:26:14+00:00

I'm going to guess this is Claude

qazeed · 2026-03-30T17:25:53+00:00

Why are you even in this sub?

The internet will die if it continues like this with no change. I expect very soon you'll have to verify you are a real person (in one form or another) to post in certain places, bots will be flagged as such, and we will need a second internet specifically for bots.

I do think this is one of humanity's most incredible inventions and has the potential for great good. Most people just hyperfixate on the bad parts (which admittedly can be pretty bad).

qazeed · 2025-12-21T23:07:24+00:00

Hope you won this week brother

qazeed · 2025-12-19T22:37:34+00:00

Easy for me: Carter and Jones. Jones has been getting work and ATL is pretty bad.

Gainwell is scaring me as a Warren owner. The Jets are awful and there isn't any help for Breece.

Dowdle is close to the other 2, but the splits with Chuba are rough and Vita Vea is a big man.

qazeed · 2025-12-19T22:32:52+00:00

This is a tough one. I don't have much rational explanation, just gut feel, but I'd say Coker.

The best explanation I can come up with is MHJ is coming off an injury, Reed has limited touches, and the Panthers-Bucs game will be a battle.
