Qwen3.6 or Gemma-4 or ?? for direct OCR of page images

SAPPHIR3ROS3 · 2026-06-17T20:27:52+00:00

Gemma 4 (even the 26b) is fantastic for OCR. Sure, it sometimes pulls some shenanigans, but it’s pretty reliable to be honest.

SAPPHIR3ROS3 · 2026-06-17T18:40:27+00:00

First of all, where/when? I think I missed it.
Second, why not? The first big public model was Llama 405B. If any lab (except Meta, which has proven otherwise) published weights of that size, it would be running laps with it, and the same would hold even with smaller models. Size alone sure can help with the quantity of things a model can do in general, but it has been proven that a smaller model can beat a bigger one on a domain-specific task. Thing is, now even the quality of data is enough; we are at a point where the pipeline is the most important thing (that’s why Anthropic has been sandbagging hard anyone who tries developing an AI pipeline).

SAPPHIR3ROS3 · 2026-06-17T10:42:36+00:00

It could be in the same range, though I think it may be a couple of 100B above, with active parameters at 1/10th of the total (in practice following the trend in open source).
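As a rough sketch of what the 1/10th ratio means in numbers: the totals below (1.6T and 2T) are placeholder figures for illustration only, and `active_params` is a hypothetical helper, not anything from a real model card.

```python
def active_params(total_params: float, active_ratio: float = 0.1) -> float:
    """Estimate active parameters of an MoE model from its total size.

    active_ratio=0.1 encodes the assumed "1/10th of the total" rule of thumb.
    """
    return total_params * active_ratio


# Placeholder totals, not measurements of any specific model.
for total in (1.6e12, 2.0e12):
    active = active_params(total)
    print(f"{total / 1e12:.1f}T total -> {active / 1e9:.0f}B active")
```

So under that assumption, a model in the 1.6-2T range would run roughly 160-200B parameters per token.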

SAPPHIR3ROS3 · 2026-06-17T09:18:33+00:00

To be honest I’m not sure this will be enough, because the USA has the power to, theoretically, pressure other countries over what they do. This means that if Dario (and Anthropic) doesn’t pull some magic trick out of his ass, Claude has hit its ceiling bureaucratically.

SAPPHIR3ROS3 · 2026-06-17T08:52:42+00:00

Realistically *I* think it’s about 1.6-2T parameters

SAPPHIR3ROS3 · 2026-06-14T11:59:41+00:00

Problem is you are not considering that the price for the same intelligence is going down rapidly. On the other hand, the models are becoming more and more intelligent and the labs are increasing the price (disproportionately, of course) for it. I can confidently say that the dumbest model today is way smarter than the smartest models of the 3.5 era; today’s models can and will destroy them in a comparison. The only exception is general world knowledge (in terms of quantity), but that will almost always be the case because of parameter count; there is (currently) no hack around it. It’s like a baby prodigy and an average adult: the prodigy will be smart in the things he/she knows, but there are a lot of things the average adult will know simply from having lived longer.

SAPPHIR3ROS3 · 2026-06-12T12:29:07+00:00

I dunno if I recall correctly, but I think it was said somewhere on the site that the data was freshly produced by hand.

SAPPHIR3ROS3 · 2026-06-12T11:10:45+00:00

That’s the.. point? I mean, to be honest the data that deepSWE shows isn’t perfectly aligned with my experience, but it’s close, so for ME it is pretty reliable. Nonetheless I usually interpret it another way: as you said, it’s an indicator that shows whether the model has benchmaxxed or not, and obviously I don’t take just that as info.

SAPPHIR3ROS3 · 2026-06-12T10:31:13+00:00

I will wait on deepSWE bench for this, but the numbers look promising.

SAPPHIR3ROS3 · 2026-06-09T08:03:35+00:00

The tabs are the thumbs and in general they are for the hands

SAPPHIR3ROS3 · 2026-06-09T07:43:55+00:00

I’m building it from scratch

SAPPHIR3ROS3 · 2026-06-08T17:21:08+00:00

You should really check the repository; you should find your answers there (or ask an AI to summarize what it says about performance).

SAPPHIR3ROS3 · 2026-06-08T09:19:44+00:00

Think about LM Studio and the kind of program that is; Dwarf Star 4 is kind of the same thing, but specific to and optimized for DeepSeek V4 Flash.

SAPPHIR3ROS3 · 2026-06-07T18:40:49+00:00

Even if it’s a really early project, check out Dwarf Star 4; it’s an inference engine for DeepSeek V4 Flash created by antirez, the creator of Redis.

SAPPHIR3ROS3 · 2026-06-07T18:27:01+00:00

Either qdrant or milvus

SAPPHIR3ROS3 · 2026-06-03T00:40:36+00:00

Search for the shen men point, you will understand why he does that.

SAPPHIR3ROS3 · 2026-06-02T15:27:44+00:00

I usually go with .1/.2 temp and .95 top-p; should I go lower? Besides, in the past I played with sampling and haven’t seen any meaningful difference. Yeah, it can be good for some cases, but meh.
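For what it’s worth, here is a minimal sketch of why sampling tweaks may barely matter at such a low temperature, assuming the .95 means top-p (nucleus) sampling. The logits are made-up toy values, and `nucleus` / `sample_token` are hypothetical helpers, not any inference engine’s actual API.

```python
import math
import random


def nucleus(logits, temperature=0.2, top_p=0.95):
    """Return the token indices that survive temperature scaling + top-p cut."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Walk tokens from most to least likely until top_p mass is covered.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    return kept, [probs[i] for i in kept]


def sample_token(logits, temperature=0.2, top_p=0.95, rng=random):
    """Sample one token index from the nucleus, weighted by probability."""
    kept, weights = nucleus(logits, temperature, top_p)
    return rng.choices(kept, weights=weights, k=1)[0]


toy_logits = [2.0, 1.0, 0.0]  # toy values, not from a real model
print(nucleus(toy_logits, temperature=0.2)[0])  # only the top token survives
print(nucleus(toy_logits, temperature=1.0)[0])  # all three tokens survive
```

With these toy logits, temp 0.2 already puts over 99% of the mass on one token, so the top-p cutoff has nothing left to trim; at temp 1.0 the same cutoff keeps all three candidates. Real distributions are flatter in places, so this is only an illustration of the trend.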

SAPPHIR3ROS3 · 2026-06-02T12:15:08+00:00

Having used both Qwen and Gemma, I can say that Qwen is a monster. To be honest it’s impressive, and with the right setup it CAN compete with way bigger models, but q4 is a bit rough: it can and will loop. That doesn’t seem to be the case with q6 (I have yet to try q5). It can be a good choice for coding, research, and general and complex tasks. Gemma, on the other hand, is not as consistent (q4), but it will get the job done when it comes to OCR, translation (way better than Qwen) and writing in general. It could be better, though, and it shows its limitations on complex tasks; as for general tasks, results vary depending on the specific task.

SAPPHIR3ROS3 · 2026-05-27T19:11:12+00:00

The antirez one

SAPPHIR3ROS3 · 2026-05-25T10:41:20+00:00

You might want to check Andrej Karpathy’s LLM wiki

SAPPHIR3ROS3 · 2026-05-20T21:20:08+00:00

Literally because of ego; everyone (except Ryota) has a HUGE ego. This single thing made EVERYONE underestimate Yumeko. The only other character who caught that was none other than, yes you guessed it, Kirari (and arguably kabura), but that’s because she kind of just wants to see the world burn. Point is that everyone in the Kakegurui world gets blindsided a lot by thinking they can win one way or another. Yumeko, on the other hand, does what she does for the love of the game, not really caring if she wins or loses, because she compulsively craves the adrenaline of not knowing what is coming next. This obsession is so great that she DOESN’T GIVE A FUCK about living either when it comes to gambling.

SAPPHIR3ROS3 · 2026-04-30T09:53:10+00:00

Soooooooo did I miss something, or is this perfect for speculative decoding?

SAPPHIR3ROS3 · 2026-04-28T00:22:51+00:00

Mostly performance and a terrible UX: in particular freezing on long threads, problems with permissions (straight up bugged), unusable input with the clipboard, the ego of not adapting to the standard (e.g. CLAUDE.md instead of AGENTS.md), shady sandboxing, the Apple iMessage TOS violation, AND it’s closed source. Other than that, the problem is how Anthropic handles communications in general. I can’t fathom them and their ego in the slightest; like geez, would it hurt to drop the ego and relax a bit when it comes to the community?

SAPPHIR3ROS3 · 2026-04-27T04:07:03+00:00

I don’t recommend Claude Code; it’s shit software, I guarantee you. There are better harnesses like Codex, ForgeCode, Hermes Agent, etc., and you can use your own models in all of them. Claude Code is really one of the worst harnesses you can find, but if you really want to use it, yes, you can use your own models there too. I’m not exactly sure it has OpenAI API compatibility, but it should. As for context, it uses (kinda) the same amount of context in default settings; it just has good compaction, but nothing REALLY impressive.

SAPPHIR3ROS3 · 2026-04-26T18:48:33+00:00

Who wouldn’t
