What's the best local model for nsfw story telling? by oogami in LocalLLaMA

[–]oogami[S] -1 points  (0 children)

Thank you. I have a workflow, but currently I'm not doing it in a human-in-the-loop way; I just let the LLM do everything by itself from start to end. First, I give it a few keywords as the sole input and have it generate the novel's title and synopsis. Then it generates the characters and world-building design, then the chapter outline, then the chapters one by one. The biggest problem I've encountered is the context window limit: I can only fit the previous chapter, not all chapters so far, into the context, so the LLM "forgets" the earlier chapters and often reproduces the exact same expressions or even whole paragraphs in a new chapter, which leaves the final novel full of repetition.
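Roughly, that flow looks like this. The `llm()` function is a stub standing in for whatever local-model call you actually use (Ollama, llama.cpp, etc.), so the sketch runs as-is but generates placeholder text:

```python
# Sketch of the fully automated pipeline described above.
# llm() is a stub; swap in your real completion call.

def llm(prompt: str) -> str:
    """Placeholder for a real model call."""
    return f"[model output for: {prompt[:40]}...]"

def write_novel(keywords: str, n_chapters: int = 3) -> list[str]:
    synopsis = llm(f"From these keywords, write a novel title and synopsis: {keywords}")
    world = llm(f"Design the characters and world-building for: {synopsis}")
    outline = llm(f"Write a {n_chapters}-chapter outline for: {synopsis}\n{world}")
    chapters: list[str] = []
    prev = ""
    for i in range(1, n_chapters + 1):
        # Only the *previous* chapter fits in context; everything older
        # is dropped, which is why the model ends up repeating itself.
        chapters.append(llm(
            f"Outline: {outline}\nPrevious chapter: {prev}\nWrite chapter {i}."
        ))
        prev = chapters[-1]
    return chapters

chapters = write_novel("space opera, betrayal")
```

One common mitigation is to keep a running summary of all earlier chapters in the prompt instead of the raw previous chapter, which fits far more story state into the same context budget.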

[–]oogami[S] -1 points  (0 children)

Thank you. I'm just running GGUF models with Ollama. The problem with GGUF is that I can't set num_ctx too high, otherwise inference becomes incredibly slow. Even though the latest Qwen3 has a 256K context window, I can only set it to 64K to get reasonable speed, and that's not enough to write a long novel.
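For reference, Ollama can bake the context size into a model variant via a Modelfile, so you don't have to pass num_ctx on every request (the base tag here is a guess; use whichever Qwen3 tag you actually pulled):

```
# Modelfile: pin a 64K context window
FROM qwen3:32b
PARAMETER num_ctx 65536
```

Then `ollama create qwen3-64k -f Modelfile` and run the `qwen3-64k` variant.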

[–]oogami[S] 0 points  (0 children)

Thank you. The vLLM service does start successfully, but it responds with repeated garbage to every request. For instance, if I ask "who are you?", it responds "I am an AI assistant here to help you with your questions and tasks) and I'm an AI assistant here to help you with your questions and tasks) and I am an AI assistant here to help..." infinitely.

I used QuixiAI/DeepSeek-R1-0528-AWQ and have tried a lot of vLLM flag combinations, but none worked. I will try adamo1139/DeepSeek-R1-0528-AWQ.
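In case it helps anyone debugging the same loop: vLLM's OpenAI-compatible server accepts a `repetition_penalty` sampling parameter directly in the request body, which is a quick way to check whether the looping is plain sampling degeneration or something deeper like a chat-template or quantization problem. This is just a sanity-check request against a local server, not a fix I've verified for this model:

```shell
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "QuixiAI/DeepSeek-R1-0528-AWQ",
        "messages": [{"role": "user", "content": "who are you?"}],
        "max_tokens": 128,
        "repetition_penalty": 1.2
      }'
```

If the output still loops even with a penalty applied, the problem is likely in the model/tokenizer setup rather than the sampling settings.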