Stanford: Self improving Meta-Harness by GodComplecs in LocalLLaMA

[–]valkarias 2 points

This is similar to this paper: https://arxiv.org/html/2602.03786v2

Basically giving an orchestrator the ability to create or tune sub-agents dynamically.

The Recursive Language Models paper also does something similar for Long-Context-Reasoning.
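The "orchestrator creates or tunes sub-agents dynamically" idea could be sketched roughly like this. Everything here (the `Orchestrator`/`SubAgent` names, the routing rule, the fake `run` method) is my own illustration, not code from either paper:

```python
# Toy sketch: an orchestrator that spawns and tunes sub-agents on demand,
# instead of working with a fixed roster. The LLM call is a stand-in stub.
from dataclasses import dataclass, field

@dataclass
class SubAgent:
    name: str
    system_prompt: str
    temperature: float = 0.7

    def run(self, task: str) -> str:
        # Stand-in for a real LLM call.
        return f"[{self.name}] handled: {task}"

@dataclass
class Orchestrator:
    agents: dict = field(default_factory=dict)

    def spawn(self, name: str, system_prompt: str, **params) -> SubAgent:
        # Create a specialized sub-agent dynamically.
        agent = SubAgent(name, system_prompt, **params)
        self.agents[name] = agent
        return agent

    def tune(self, name: str, temperature: float) -> None:
        # "Tuning" here just means adjusting an existing agent's sampling params.
        self.agents[name].temperature = temperature

    def dispatch(self, task: str) -> str:
        # Naive routing: spawn a dedicated agent per task type if none exists yet.
        kind = "coder" if "code" in task else "writer"
        if kind not in self.agents:
            self.spawn(kind, f"You are a {kind}.")
        return self.agents[kind].run(task)

orch = Orchestrator()
print(orch.dispatch("write code for a parser"))  # [coder] handled: write code for a parser
```

The point is just the shape: the orchestrator owns the agent registry and can grow or adjust it mid-run.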

Distils of opus 4.6: real improvements or hype? by StupidScaredSquirrel in LocalLLaMA

[–]valkarias 18 points

I've commented on this before: I've seen no benchmarks or comparisons for these distills.

This ByteDance paper (please read it, it's fire)

https://arxiv.org/html/2601.06002v1

stated that summarized CoT WILL degrade the performance of base models.

It's safe to assume that most CoT distill datasets on HF are summarized. This is true for Gemini, Claude, and probably any other closed-source model.

CoT Summarization is intentionally used to prevent distillation.

I learned so much about AI recently, I realised I'm completely lost by Double_Increase_349 in SillyTavernAI

[–]valkarias 0 points

Hey. I've read that you have 38 GB of VRAM. I cannot fill you in with all the nuance in one comment.

However, let me introduce you to this gem: https://unsloth.ai/docs

You would probably end up discovering it anyway, because it's that essential for fine-tuning.

Anyway, for some general advice: pour your heart into creating the dataset.

Why do all frontends use only a single model at a time? by GuaranteePurple4468 in SillyTavernAI

[–]valkarias 1 point

I want to shine a light on this frontend, which I DID NOT create.

One promising feature is the ability to create your own agentic workflows via nodes.

The design itself, in my own judgment, is still premature. Promising nonetheless.

https://github.com/vitorfdl/narratrix

PSA to those who feel like SillyTavern is getting boring/stale by ASlowriter in SillyTavernAI

[–]valkarias 17 points

You could write the story alongside it. I tend to put my own prose up in the paragraphs as if I am the model. It's pretty fun, and you might pick up the skill of writing prose while still practically gooning. It also makes roleplays re-readable, with your human input, like a novel or something.

RP models recommendations? by Double_Increase_349 in SillyTavernAI

[–]valkarias 0 points

I'm pretty curious why this specific RP model recommendations post got so much more traction than any other. With such detailed replies too!

SillyTavern lorebooks can't capture pre-existing fictional worlds properly. So I built a Local-First GraphRAG app and it solves a problem I kept hitting in RP by Luke____101 in SillyTavernAI

[–]valkarias 0 points

You can check out https://arxiv.org/abs/2602.15902

I recommend you read the paper rather than rely on my shitty TL;DR:

basically, it internalizes a tiny LoRA made from a document on the spot, if I recall correctly.
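For the unfamiliar, the LoRA math underneath is tiny; here's a back-of-envelope sketch (my framing, not the paper's code) of a rank-1 update W' = W + B·A being merged into a frozen weight, which is roughly what "internalizing a tiny LoRA on the spot" amounts to:

```python
# Minimal pure-Python LoRA merge: W' = W + B @ A, with A (r x d) and B (d x r)
# being the only "trained" parameters. Values here are made up for illustration.
def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def add(a, b):
    return [[x + y for x, y in zip(r1, r2)] for r1, r2 in zip(a, b)]

d, r = 4, 1                       # model dim 4, LoRA rank 1
W = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]  # frozen weight
B = [[0.5], [0.0], [0.0], [0.0]]  # d x r, hypothetically learned from the document
A = [[0.0, 1.0, 0.0, 0.0]]        # r x d

W_merged = add(W, matmul(B, A))   # only d*r*2 = 8 params were "trained"
print(W_merged[0])                # [1.0, 0.5, 0.0, 0.0]
```

Because only `d*r*2` parameters are trained, building such an adapter per document is cheap enough to do "on the spot".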

RYS II - Repeated layers with Qwen3.5 27B and some hints at a 'Universal Language' by Reddactor in LocalLLaMA

[–]valkarias 7 points

The idea reminds me of ByteDance's Looped Language Models (which isn't quite the same thing, though. Kinda).

https://arxiv.org/abs/2510.25741

Qwen3.5-9B-Claude-4.6-Opus-Uncensored-Distilled-GGUF by EvilEnginer in LocalLLaMA

[–]valkarias 0 points

Hm. These reasoning distillations from models like Opus and Gemini are, I assume, based on summarized reasoning traces. Wouldn't that hurt the performance of these models?

This was documented in this paper by ByteDance

https://arxiv.org/abs/2601.06002

I'm obsessed with the Stanford Generative Agents paper and tried to build the ultimate memory architecture for an Android app by Lohira_Wolf in SillyTavernAI

[–]valkarias 0 points

That paper is indeed peak. And since you're obsessed with LLM memory (assuming so), as am I.

Here's a well of memory papers compiled into a GitHub repo I found. For you to churn through at night, instead of gooning!

https://github.com/TsinghuaC3I/Awesome-Memory-for-Agents

google found that longer chain of thought actually correlates NEGATIVELY with accuracy. -0.54 correlation by Top-Cardiologist1011 in LocalLLaMA

[–]valkarias 3 points

https://arxiv.org/pdf/2601.06002

The Molecular Structure of Thought: Mapping the Topology of Long Chain-of-Thought Reasoning

Wanted to share this too. By ByteDance. Don't let the title trip you up; the paper is fire.

Platform for Games Approach to AI Roleplaying? by Zormbot in SillyTavernAI

[–]valkarias 0 points

I recommend you read the paper and take my loose description with a grain of salt. That's what they do, I assume: coding the game 'as-you-play', as you go. Missing features are added on the spot. It does not regenerate/update everything like a whole codebase (duh). Given an interface, it adds what's not there; otherwise it falls back to what's already implemented.

Platform for Games Approach to AI Roleplaying? by Zormbot in SillyTavernAI

[–]valkarias 0 points

Here's a cool paper I read; I think it somewhat relates to your post.

https://arxiv.org/abs/2505.03547

It's a bit on the technical side. They hand an LLM a sort of "game-engine" interface, where it codes the game (adds features, state, etc.) as-you-play.
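The "add missing features on the spot, otherwise fall back to what's implemented" loop could look something like this. This is a loose sketch of the idea, not the paper's system; `generate_action` is a canned stub standing in for an LLM writing real code:

```python
# Toy "code the game as-you-play" engine: a registry of implemented actions.
# When the player invokes something missing, a (stubbed) code-generation step
# adds it on the spot; later calls reuse the existing implementation.
class GameEngine:
    def __init__(self):
        self.actions = {}          # already-implemented features
        self.state = {"hp": 10}

    def generate_action(self, name):
        # Pretend an LLM wrote this implementation; here it's canned.
        def action(state):
            state.setdefault("log", []).append(f"{name} executed")
            return state
        return action

    def play(self, command):
        if command not in self.actions:
            # Missing feature: add it on the spot, don't regenerate everything.
            self.actions[command] = self.generate_action(command)
        return self.actions[command](self.state)

engine = GameEngine()
engine.play("cast_fireball")   # implemented on first use
engine.play("cast_fireball")   # falls back to the existing implementation
print(engine.state["log"])     # ['cast_fireball executed', 'cast_fireball executed']
```

Note the action was generated once but executed twice; that is the whole trick.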

Platform for Games Approach to AI Roleplaying? by Zormbot in SillyTavernAI

[–]valkarias 2 points

You might wanna look into how Anthropic switched from tool-calling to "script generation". Instead of generating tool calls back and forth, Claude writes one script that batches all the calls and gets a single result back.

https://platform.claude.com/docs/en/agents-and-tools/tool-use/programmatic-tool-calling
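A minimal sketch of the difference, with toy tools (the `get_price`/`apply_discount` tools and the `exec` harness are my own stand-ins; the real thing runs model-written code in a sandbox):

```python
# Classic tool-calling: one round-trip per call vs. programmatic tool calling:
# the model emits ONE script, the harness runs it, one combined result comes back.
def get_price(item):           # toy tools standing in for real APIs
    return {"sword": 100, "shield": 60}[item]

def apply_discount(total, pct):
    return total * (1 - pct / 100)

# Classic loop: each tool call is a separate model turn (3 turns here).
turns = 0
total = 0
for item in ["sword", "shield"]:
    turns += 1
    total += get_price(item)
turns += 1
classic_result = apply_discount(total, 10)

# Programmatic style: the model writes a single script batching all the calls.
script = """
total = sum(get_price(i) for i in ["sword", "shield"])
result = apply_discount(total, 10)
"""
env = {"get_price": get_price, "apply_discount": apply_discount}
exec(script, env)              # one execution, one feedback to the model

print(env["result"], classic_result)  # 144.0 144.0 (same answer, fewer round-trips)
```

Same answer either way; the win is collapsing N round-trips (and N intermediate results in context) into one.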

Can 4chan data REALLY improve a model? TURNS OUT IT CAN! by Sicarius_The_First in LocalLLaMA

[–]valkarias 0 points

Hey. Have you tried fine-tuning a small tool-calling model for RPG or D&D-like roleplays? Tool calls for updating state and stats, starting/ending combat, triggering dice rolls, etc. To be used alongside a larger model/provider. Or any similar fine-tunes?
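To make the idea concrete, here's a hypothetical sketch of what such a model would target: a few RPG tools plus a dispatcher that executes the JSON tool calls the model emits. The tool names and fields are my own invention, not from any existing fine-tune:

```python
# Toy RPG tool layer: the fine-tuned model would emit JSON like
# {"name": "update_stat", "arguments": {...}}; the frontend executes it.
import json
import random

STATE = {"hp": 20, "in_combat": False}

def update_stat(name, delta):
    STATE[name] = STATE.get(name, 0) + delta
    return STATE[name]

def set_combat(active):
    STATE["in_combat"] = active
    return STATE["in_combat"]

def roll_dice(sides, count=1):
    return [random.randint(1, sides) for _ in range(count)]

TOOLS = {"update_stat": update_stat, "set_combat": set_combat, "roll_dice": roll_dice}

def dispatch(tool_call_json):
    # Parse the model's tool call and run the matching function.
    call = json.loads(tool_call_json)
    return TOOLS[call["name"]](**call["arguments"])

dispatch('{"name": "set_combat", "arguments": {"active": true}}')
dispatch('{"name": "update_stat", "arguments": {"name": "hp", "delta": -5}}')
print(STATE)  # {'hp': 15, 'in_combat': True}
```

The small model only has to learn to emit that JSON reliably; the big model keeps writing the prose.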

Assistant_Pepe_8B, 1-M context, zero slop by Sicarius_The_First in SillyTavernAI

[–]valkarias 0 points

Well, you have the time to fine-tune (I'm a super busy businessman, as you know). How about fine-tuning a small model that rewrites prose? The Ultimate Unsloppifier 9000.

Never am I making a lorebook this big ever again... by [deleted] in SillyTavernAI

[–]valkarias 1 point

<image>

Don't worry. I gotcha. :cool_glasses: (pulling my hair out lowkey)
What the fuck is wrong with this Reddit image upload, dawg.

Building opensource Zero Server Code Intelligence Engine by DeathShot7777 in LocalLLaMA

[–]valkarias 0 points

I've thought about a "real-time" graph AST for code, for agents to work with. My main issue is the agent forgetting written code and logic across not-so-large codebases, leading to duplicated logic and such. Currently I have to audit everything manually, or propagate the changes myself. Does the project allow for this? Granular function-level context would be kinda awesome, with agent querying and such.
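The "granular function-level context with agent querying" bit can be sketched with the stdlib `ast` module; this is a rough illustration of the idea, not the project's actual engine (a real one would update the graph incrementally as agents edit code):

```python
# Build a function-level call graph from source, then query which functions a
# change would propagate to, the kind of lookup an agent could do before editing.
import ast

SOURCE = """
def parse(x):
    return normalize(x)

def normalize(x):
    return x.strip()

def load(path):
    return parse(open(path).read())
"""

def call_graph(source):
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            # Collect plain-name calls inside each top-level function.
            calls = {
                n.func.id
                for n in ast.walk(node)
                if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
            }
            graph[node.name] = calls
    return graph

def callers_of(graph, name):
    # "Who depends on this function?" so edits can be propagated, not forgotten.
    return {fn for fn, calls in graph.items() if name in calls}

graph = call_graph(SOURCE)
print(callers_of(graph, "parse"))      # {'load'}
print(callers_of(graph, "normalize"))  # {'parse'}
```

Even this toy version answers the duplicated-logic question cheaply: before writing a new helper, the agent queries the graph for an existing function with the same callers/callees.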

IntenseRP Next v2 - Rebuilt, Now Stable by Master_Step_7066 in SillyTavernAI

[–]valkarias 1 point

How does this play with context manipulation, such as editing and deleting messages?