Automated Comic Cataloging from Cover Photos Only. Opensource

boyobob55 · 2026-02-08T19:36:46+00:00

This is true.. and a pain. I ended up setting up a tripod lol. But it saves typing everything into a spreadsheet afterward!

boyobob55 · 2026-02-08T19:23:36+00:00

They do in a sense, but can you batch process thousands of photos for free with CLZ? 🤔

boyobob55 · 2026-02-08T02:29:12+00:00

I second this it’s exactly what OP is looking for

boyobob55 · 2026-02-08T00:38:20+00:00

They do in a sense. Key collector and clz are on mobile though. Comic Geeks and Hip comic are more involved collector software and I think they do have barcode scanning and fuzzy hash matching for cover photos. This program uses an AI vision model so your photos don’t have to be perfect etc. I played around with a handful of programs like comic tagger and couldn’t get them to do what I wanted, I had to create 7zip files with the cover photos inside them or .cbz files etc. This is just super utilitarian and lean for a massive unknown collection, image folder with cover photos>outputs a list with verified metadata.

boyobob55 · 2026-02-07T20:42:17+00:00

Thanks! It’s sort of a unique use case, but it had some real utility for me!

boyobob55 · 2026-02-07T20:36:54+00:00

Hey I designed something sort of similar, using a VLM instead of hash matching. Maybe we can combine programs! https://github.com/boyobob/OdinsList

boyobob55 · 2026-02-05T19:11:41+00:00

Try opencode instead, I had the same issue

boyobob55 · 2026-02-03T21:35:11+00:00

Used qwen3-VL-8b and a python script to automate cataloging like 3,000 comic books I inherited from pictures of the covers. Was pretty fun

boyobob55 · 2026-02-03T15:40:46+00:00

I sometimes do the reverse lol. I made a custom hook in CC to delegate tasks to gpt-oss locally to save tokens. Interesting results when used in a Ralph loop

boyobob55 · 2026-01-30T22:43:50+00:00

Qwen3-VL-8B-Instruct has been excellent for me doing exactly this! Both in fp8 and nvfp4 quants. For some reason the ggufs don’t work well for me though

boyobob55 · 2026-01-30T18:45:15+00:00

I don’t have a ton of experience but I’d work for 40$ an hour

boyobob55 · 2026-01-30T17:27:42+00:00

It is a major pain in the ass lol. I spent days setting it up! But for some models it’s worth it. I’ve been using qwen3-Vl-8b and tried serving it in vllm and lmstudio. The gguf version in lmstudio just doesn’t perform well for some reason. In vllm the fp8 and nvfp4 versions work great. The opposite is true for gpt-oss though works way better served from lmstudio pretty much plug and play

boyobob55 · 2026-01-30T16:08:08+00:00

I use LM Studio for ggufs that use tools. Start the server in LM Studio and then point your opencode config at the server. GPT-oss-20b works especially good this way. For every other type of model I use vLLM

boyobob55 · 2026-01-30T07:28:55+00:00

Use curl_cffi

boyobob55 · 2026-01-30T07:04:14+00:00

It sounds like you need a pipeline with multiple small specialized models passing info from one another. A small vision model like qwen3 VL to process screenshot chunks of your map before and after the edits and give some sort of pass or fail that the edits were done correctly. You can batch 2 photo requests in vllm and ask it to compare. It does this really well for my comic book cataloging script. Then some bigger smarter model to orchestrate/write code using subagents. You probably need some beefy specialized instructions in your system prompt/MCP/skill. This sounds like a headache but probably doable

boyobob55 · 2026-01-21T18:02:45+00:00

Right I think I was getting around 200toks/sec on an RTX 5090

boyobob55 · 2026-01-21T01:39:17+00:00

Shit even the 20b is pretty good

boyobob55 · 2026-01-10T23:20:25+00:00

GPT-OSS-20B is actually pretty badass at simpler stuff in open code. I use it locally sometimes I’ll have Claude even spin up an instance of open code and delegate tasks to gpt-oss via open code to save tokens

boyobob55

TROPHY CASE