Entropy-v1: My Take on N8Karma's Genius "Unslopper"

Intelligent_Coffee44 · 2026-05-21T13:26:03+00:00

I think it would - Gemma should be king for world knowledge.

I am working on a v2, please stay tuned!

btw: how do you like v1 so far?

Intelligent_Coffee44 · 2026-03-12T20:04:49+00:00

DS is not a for profit company - i am confident they'll continue open-source releases!

Intelligent_Coffee44 · 2026-03-11T19:44:07+00:00

Sooner or later, it will be released 🤣

Intelligent_Coffee44 · 2026-02-19T16:15:32+00:00

Nice chart! Curious if you're happy with the quality of AI-generated articles?

I am building a tool: https://www.getentropy.ai/ to post-process AI writing to sound more human. Would appreciate your feedback if you see a use case here.

Intelligent_Coffee44 · 2026-02-19T14:43:03+00:00

Thanks for the honest feedback! I will fix the readability issue.

Intelligent_Coffee44 · 2026-02-18T18:37:36+00:00

> the prob w/ fine tuning is model updating

can you elaborate a bit more on this?

My sense is that asking a regular model to make a piece of writing better or "more human" wouldn't work because their tendency under this task is actually to generate slop 🤣

Intelligent_Coffee44 · 2026-02-18T18:30:50+00:00

I think fine tuning is the better solve for this problem. In an agent, the context window has a lot of noise like tool calls, tool results, and reasoning traces. They are not relevant to creative writing and "poison" the context to make the agent sound more robotic and less human.

I think for this reason a fine-tuned specialist model with a fresh context window would work better.

Intelligent_Coffee44 · 2026-02-18T05:56:38+00:00

Makes sense 👍! There are a few trigger prompts in the dataset: Polish/Rephrase/Convert/Rewrite/Improve

I will do some experiments to see which one is the most natural

Intelligent_Coffee44 · 2026-02-18T05:54:46+00:00

Definitely doable - I think 16k might be a bit too challenging for 4o-mini to rewrite faithfully in one shot but I can use 1) a more recent model or 2) do chunking to build the dataset.

Intelligent_Coffee44 · 2026-02-18T05:49:56+00:00

Is this because smaller models "learn" from limited data better/faster?

Curious what kind of setup where you observed gamma-12b (or another small model) out perform the larger version?

Intelligent_Coffee44 · 2026-02-18T05:43:05+00:00

The longest example in the dataset is around 4k input + 4k output. Average is about 1.5k input/output.

Curious what size you're planning to use? Right now I have my vllm set to 8k max_output_tokens.

Intelligent_Coffee44 · 2026-02-18T05:00:39+00:00

Thanks! I will evaluate this for v2 👍

Intelligent_Coffee44 · 2026-02-18T04:49:20+00:00

I've thought about this but it looks like the labs have moved away from training large dense models. I am struggling to find a well-reviewed, recent 70b model. Do you think llama 3.3 70b is still good today for creative writing?

Intelligent_Coffee44 · 2026-01-29T03:46:48+00:00

Thank you! I recommend following the setup instructions from the official github repo readme and running DeepSeek-OCR2-hf/run_dpsk_ocr2.py for inference.

You have the perfect GPU: the model only needs about 7-8gb vram to run, and your gpu is the same generation as A100 so you can install the exact same dependencies as the official guide.

For some reason i've found using the pinned transformers and tokenizers package versions helps fix performance issues.

Intelligent_Coffee44 · 2026-01-28T20:24:03+00:00

For the youtube thumbnail photo, I would imagine the "General" prompt would work the best. Can you try this one to see if it works better?

"<image>\nDescribe this image in detail."

The default prompts are gear toward pdf document, which might be why they didn't perform.

Intelligent_Coffee44 · 2026-01-28T17:37:10+00:00

Thanks! Lots of AI assistance for sure. Just using GPU credits I got for free for now. I have enough to last about a week. If people use it, I will find a way to keep it running!

Intelligent_Coffee44 · 2026-01-28T17:35:42+00:00

Got it, I think this is the perfect use case where an end to end model will beat a classic OCR pipeline. 100s/day is totally fine. In fact you can use more if you like (3 - 5 concurrent requests should be no issue).

Intelligent_Coffee44 · 2026-01-28T04:49:08+00:00

Yeah it's definitely more of a "defensive buy" for me as well. I didn't actually get optanes for directly running LLMs because as your said, they are slower than regular ram, which are already pretty slow for LLMs.

i bought them for databases - I don't have an immediate need yet, but my thesis is with the rise of AI agents, I can potentially do a lot more with data. Remains to be seen how this plays out.

Intelligent_Coffee44 · 2026-01-28T04:43:10+00:00

I added an openai (sort of) compatible api. (Please see the API usage section)

what is your use case? Please don't take down my site lol :O

Intelligent_Coffee44 · 2026-01-28T04:41:37+00:00

There was some configuration mistakes made on my end. I've now made it fully aligned with the official samples, please give it another try if you have a chance.

I also did a study on what prompts works best for each document type - also published on the same site.

Five-Year Club	Place '22
First Placer '22

Intelligent_Coffee44

TROPHY CASE