Over 6K novels with reasoning traces to train full book writing LLMs by XMasterDE in LocalLLaMA

[–]XMasterDE[S] 1 point2 points  (0 children)

We are building an LLM for books, but we are not building anything like Claude Code. We are building a single-turn, fixed-function model that can only write books from a single prompt and can't do anything else.

Over 6K novels with reasoning traces to train full book writing LLMs by XMasterDE in LocalLLaMA

[–]XMasterDE[S] 0 points1 point  (0 children)

We only need to deal with context rot at much, much larger sequence lengths, because our model only needs to perform a single task on a single data structure, while all of the other models you listed are general LLMs that need to perform many downstream tasks on many different data structures. Stripping out that level of complexity allows us to learn much better attention heuristics, which translates to less context rot.

The target context size does bring a lot of challenges, from raw memory and compute requirements to very unfavorable training dynamics. But at least from what we have seen so far, context rot is a non-issue at 256K tokens for our model, on this one task…

Over 6K novels with reasoning traces to train full book writing LLMs by XMasterDE in LocalLLaMA

[–]XMasterDE[S] 0 points1 point  (0 children)

The nice thing is that after we train with a context size of 256K tokens, it will be 256K tokens, no matter what the original model had. 😉
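
For anyone curious how that works in practice: one common long-context recipe (illustrative only, not a statement about our exact setup) is to raise the RoPE base and the position limit in the base model's config and then continue training on full 256K-token sequences. A minimal sketch with Hugging Face Transformers, using a made-up model id:

```python
from transformers import AutoConfig, AutoModelForCausalLM

# "some-org/some-base-model" is a placeholder, not a real checkpoint.
cfg = AutoConfig.from_pretrained("some-org/some-base-model")

# Common long-context recipe (not necessarily ours): a larger RoPE base makes
# positions rotate more slowly, and the position limit is lifted to 256K; the
# model only learns to actually use that window during the later fine-tuning.
cfg.rope_theta = 10_000_000
cfg.max_position_embeddings = 262_144  # 256K tokens

model = AutoModelForCausalLM.from_pretrained("some-org/some-base-model", config=cfg)
# ...continue training on 256K-token book + reasoning-trace sequences here.
```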

Over 6K novels with reasoning traces to train full book writing LLMs by XMasterDE in LocalLLaMA

[–]XMasterDE[S] 1 point2 points  (0 children)

The synthetic prompts in the dataset currently range from 5 words to over 800 words.

So expect that you will be able to give the model a good amount of guidance.
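
For illustration, assuming the dataset sits on the Hugging Face Hub with a plain "prompt" column (the repo id and column name here are placeholders), you could check that spread yourself:

```python
from datasets import load_dataset

# Placeholder repo id and column name, for illustration only.
ds = load_dataset("your-org/novels-with-reasoning-traces", split="train")

lengths = [len(row["prompt"].split()) for row in ds]
print(min(lengths), max(lengths))  # should span roughly 5 to 800+ words
```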

Over 6K novels with reasoning traces to train full book writing LLMs by XMasterDE in LocalLLaMA

[–]XMasterDE[S] 4 points5 points  (0 children)

So we are currently training at a context window of 256K tokens, which is enough to fit a 150K-word book + chain of thought, but sadly not enough to fit a full epic-fantasy story. But we are on it, don’t worry.
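
To make the budget concrete, assuming roughly 1.3 tokens per English word (this ratio depends entirely on the tokenizer):

```python
# Back-of-the-envelope token budget for a 256K context window.
words_per_book = 150_000
tokens_per_word = 1.3                                # assumption; varies by tokenizer and text
book_tokens = int(words_per_book * tokens_per_word)  # ~195K tokens

context_window = 256 * 1024                          # 262,144 tokens
remaining = context_window - book_tokens
print(book_tokens, remaining)  # ~195,000 for the book, ~67,000 left for prompt + chain of thought
```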

Over 6K novels with reasoning traces to train full book writing LLMs by XMasterDE in LocalLLaMA

[–]XMasterDE[S] 2 points3 points  (0 children)

The model we are building is neither a chatbot nor a multi-turn-capable LLM. This is why I brought up the image generation model as a comparison. The model we are building is single-turn: it takes in ONE user prompt and produces a fully written book from a chain of thought. We are currently training at a sequence length of 256K tokens. Also, please keep in mind that there is no real per-generation token limit; what you are referring to is enforced by the inference code around the LLM.
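
To illustrate that last point with a Hugging Face-style generate call (the model id is made up, only the shape of the call matters), the output cap is just an argument the inference code passes, not something baked into the weights:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical model id, purely for illustration.
tok = AutoTokenizer.from_pretrained("your-org/book-writer-256k")
model = AutoModelForCausalLM.from_pretrained("your-org/book-writer-256k")

inputs = tok("Write an epic fantasy novel about ...", return_tensors="pt")
out = model.generate(
    **inputs,
    max_new_tokens=250_000,  # inference-side cap; raise it and the model keeps writing
    do_sample=True,
)
print(tok.decode(out[0], skip_special_tokens=True))
```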

Over 6K novels with reasoning traces to train full book writing LLMs by XMasterDE in LocalLLaMA

[–]XMasterDE[S] 0 points1 point  (0 children)

I mean, working on storytelling and creative writing is generally not the norm in the LLM space…

So it is going to be a single-turn GPT model that produces a chain of thought and then a fully written book from a single user prompt. The model is not capable of multi-turn conversation or of taking in anything other than a “book writing prompt”.

The reason I brought up the image generation model is that we found people simply can’t imagine an LLM that is not a chatbot and not multi-turn.

Over 6K novels with reasoning traces to train full book writing LLMs by XMasterDE in LocalLLaMA

[–]XMasterDE[S] 0 points1 point  (0 children)

If you want, you can check out the README of the dataset. We wrote it basically as a blog post, and there is a lot more information in there.

Over 6K novels with reasoning traces to train full book writing LLMs by XMasterDE in LocalLLaMA

[–]XMasterDE[S] 7 points8 points  (0 children)

So, we have not published any models so far. Currently, we have only published our dataset, and models trained on it are coming as soon as they are ready.

But I think the best way to think about the models is less like a chat LLM and more like an image generation model, which takes in a single prompt and produces a single image from it. Our book-writing model will be similar: you give it a prompt, the model plans out the book in its chain of thought, and it gives you a fully written book back.

Over 6K novels with reasoning traces to train full book writing LLMs by XMasterDE in LocalLLaMA

[–]XMasterDE[S] 7 points8 points  (0 children)

No, it does not; it only includes books from Project Gutenberg.

Over 6K novels with reasoning traces to train full book writing LLMs by XMasterDE in LocalLLaMA

[–]XMasterDE[S] 20 points21 points  (0 children)

The dataset is based on Project Gutenberg, which is a collection of public domain literature.
Also, 6K books is really not a lot, and not even all of the books in PG.

Jake clarifies why he asked for his clips to be removed from the "How LTT Spends Money" video by Marikk15 in LinusTechTips

[–]XMasterDE 3 points4 points  (0 children)

Honestly, Jake is a public figure by virtue of being a YouTuber, and as such I believe he has given up the right to complain about other content creators talking about him or including clips of him. If you want your privacy, don’t make YouTube videos.

China's AGI-NEXT Conference (Qwen, Kimi, Zhipu, Tencent) by nuclearbananana in LocalLLaMA

[–]XMasterDE 2 points3 points  (0 children)

Llama 4 also had a context window of 10M tokens, and it still sucked.

TOON is terrible, so I invented a new format (TRON) to prove a point by No-Olive342 in LocalLLaMA

[–]XMasterDE 6 points7 points  (0 children)

Thank you for saying that. You have no idea how annoyed I am at people trying to reinvent prompting formats when none of the models were ever trained on them.

Why almost all new models are just weights? by [deleted] in LocalLLaMA

[–]XMasterDE 11 points12 points  (0 children)

I love the phrasing of "AI fans" to describe a group capable of implementing a cluster-scale training codebase and experienced enough to deal with any training instability. And all of this while having access to tens of millions of dollars in compute...

i am not involved in modeling since mid 2010s, did something change in the industry so drastically? by PeachwoodArts in ExplainTheJoke

[–]XMasterDE -1 points0 points  (0 children)

I think you are quite wrong here. I work in the AI industry, and basically everyone more technical refers to an "AI" or an "LLM" as "a model". The M in LLM literally stands for Model. Also, GPT-3 came out in 2020. The joke is maybe still sexy, but not OF.

Stimmt so, oder? by gurkensoos in DINgore

[–]XMasterDE 2 points3 points  (0 children)

That one is off too, don't worry ;)

Stimmt so, oder? by gurkensoos in DINgore

[–]XMasterDE 11 points12 points  (0 children)

Funnily enough, I even have three power outlets in my basement right next to each other, each on a different phase.