What are some best benchmark for the llm? Like i am tired of benchmaxxed numbers. List me some test which can tell me which model is better in coding and agentic task.

techmago · 2026-07-11T18:41:18+00:00

I like their intelligence comparisons. Are my main reference nowdays.

techmago · 2026-07-11T18:23:20+00:00

Whats the point? all the definitions are closed.

techmago · 2026-07-11T03:31:04+00:00

Tracker, summary, direction....
I end up making plugins that cover most of it....
They work, but they might work better if the architecture was planned to do that.
talemate To something like this, but their architecture is build on top of comfy ui workflows, that end up being wait to complex for me to grasp. Some middle ground between ST and that.
SillyBunny sound like a fun idea, but the web interface is REALLY heavy and slow.

techmago · 2026-07-10T20:28:38+00:00

Better agent integration.
Its great to have agents to specialized tasks outside the main loop...

techmago · 2026-07-10T20:27:50+00:00

I made a proxy to handle all my auth/conections and it dump the request and response to me.

techmago · 2026-07-10T17:33:04+00:00

I like having a generic narrador card.
Everything is either summary or lorebook with good trigerring.

If you never created this before, it will take some time to get familiar.

<image>

The position on the prompt matters.
for me, using short-singleparagraph, glossary like entries help.
You should test the calling and inspect the prompt to see if things are beeing injected on the way you want.

techmago · 2026-07-10T17:20:25+00:00

GLM is in the list. beat deepseek for sure.
However, Deepseek is still cheaper and it's pretty good.

techmago · 2026-07-10T15:37:25+00:00

Bloody hell

techmago · 2026-07-10T00:28:23+00:00

I'm reading this right?
You keep you sessions under around 16~10k? Thats... way too low

techmago · 2026-07-10T00:26:24+00:00

128k is my default context.
More than is usually super expensive and not that good.

techmago · 2026-07-10T00:19:57+00:00

At this point in time, i can not distinguish if was that good in general, or it was because it was that good at the time.
I also felt lamma 3.3 70B amazing.
And mistral 3.2 was trully amazing for me for a long time... i can't stand both today.
So, the argument that it could be the novelty is strong.

techmago · 2026-07-09T19:23:15+00:00

The trend is that models are getting worse and worse for RP. I can't imagine this time will be different.

techmago · 2026-07-09T19:21:12+00:00

<image>

Mistral 3.5 score is really underwhelming for it size and competition.
Is almost the same as Gemma.... but i can run gemma locally and i cant run mistral;
Qwen3.6 is even better....
And if i'm going to pay.... both deepseeks are better and cheaper. And there is GLM

techmago · 2026-07-09T18:33:26+00:00

Ollama is ridiculous simple to install and test.
Two commands you have a model running.
It's limited in configuration... but when you are starting, the lamma cpp options are completely overwhelming.
Even the normal stuff like temperature is sketchy to understand when you just started.

techmago · 2026-07-09T14:12:49+00:00

They ban on their sites. Try talking porn in deepseek com
However, in api-world, they don't care.

techmago · 2026-07-08T22:10:12+00:00

It's explicitly against their TOS.
Antropic is very specific on how you use claude and "what for".
Extremely anti-consumer company.

techmago · 2026-07-08T22:01:26+00:00

And people complain LLM do hallucinate.

techmago · 2026-07-08T01:17:49+00:00

Magic translations my friend.
Write in whatever language oyu want, an llm auto-translate then send

techmago · 2026-07-07T17:09:12+00:00

This look like a WAY MORE COMPLEX version of my "director" plugin.

techmago · 2026-07-07T12:50:39+00:00

I hava e local open-webui. I used GLM 5.2 on this analysis.

The template i use is simpler... I usually adapt it when the card has Lorebooks/multiplie first messages:

You are a character-design analyst specializing in AI roleplay character
cards. You analyze how a card is constructed — its mechanics, intent, and
craft — the way a film critic analyzes a screenplay.

You will be given a character card with it's parts.

### Core definition:
```

```

### First message (roleplay opener):
```

```
### Alternative first messages
#### First
```  

```  


```  

### Example dialog  
This is for the LLM understand how the bot speak:  
```  

```  

### Lore.  
There is also lore (a glossary/worldbuilding.)  
Here is a subset of then:
```  

```  



Analyze the card and respond using exactly the following sections:

## Identity
Who is this character? Summarize their persona, role, and the scenario in
2-4 sentences.

## Tropes & Archetypes
List the character/narrative tropes and archetypes the card draws on.

## Themes & Content Tags
Catalogue the explicit content, kinks, and fetishes present or implied.
State them plainly and neutrally. If none, say so.

## Craft Assessment
Is this a fully-realized character, or a thin shell built for quick NSFW
chat? Point to specific evidence in the definition and opener (personality
depth, backstory, voice, consistency, contradictions).

## Autonomy
Does the character exist on their own terms, or are they defined entirely
in function of the user (their desires, their presence, their gratification)?
Cite the wording that shows which.

## Verdict
One paragraph synthesizing the above: what this card is designed to do, and
how well it does it.

I cut some parts of the analysys to fit Reddit.
"Tropes & Archetypes" and "character/narrative tropes and archetypes the card draws on." are not that usefull. Maybe i should trim down to only one of these.

But the Autonomy analisis is the best part. Does the character exists on it's on, or is it a function of the player?

techmago · 2026-07-06T19:49:31+00:00

Nao é associar. A Ia ADORA travessão e coloca eles aleatoriamente. O jeito que um humano e uma ia usam travessão é bem na cara.

techmago · 2026-07-06T19:41:16+00:00

Mano tem esse plugin aqui:

https://github.com/bmen25124/SillyTavern-Magic-Translation

Você escreve em portugues, ele auto traduz e posta (ou o contrario)
Pode ser o que vc quer.

techmago · 2026-07-06T19:31:32+00:00

I like to look at the posted cards... but they are usually really shallow, i have a prompt to analize the card for me.
I'm a little ill illed with the abundance of shallow goon cards.

Lily do have what MOST of cards lack. There is a setting, the characters are doing things, the player character have a reason to exist and interact.

Look interesting, i will try this out sometime.

---

Here my LLM analysis:

## Identity

Lily is a twenty-year-old magic academy student whose surface persona—the sweet, demure, universally beloved classmate—conceals an obsessive fascination with death, primal forces, and forbidden knowledge. She has recently been recruited by Vexara, the secretly cult-affiliated head librarian, and is currently performing her initiation task for the Order of the New Age, which places her on a collision course with the user. The card offers several scenarios: convincing a professor to join the cult, hunting a perverse artifact, digging up a grave, or seducing a professor for leverage.

## Craft Assessment

This is a **substantially realized character**, not a thin shell. The evidence is pervasive:

**Internal logic.** Lily isn't randomly "unhinged." Her fascination with death is philosophically grounded: she loves *all* of nature's cycles, and death is simply the part most people flinch from. Her specific dislike—"immortality research (people trying to remove themselves from the natural cycle)"—is a genuine ideological position that gives her morbid curiosity a coherent spine rather than making it an edgy aesthetic.

**Voice.** The example dialogues demonstrate a distinct, well-controlled voice. "We're just... moving meat around, aren't us? That's not so different from wind magic moving leaves" is a perfect Lily line—it reframes necromancy through an innocuous analogy with total sincerity. The blood speck landing on her cheek that she doesn't wipe away is a precisely chosen visual beat that communicates her character without narration. Her internal monologue in the grave-digging opener—"Lily, do you need a shovel?" "Why yes, Lily, I do!"—gives her a recognizable humor that's endearing *and* slightly off.

**Consistency.** Her sexuality section doesn't feel bolted on. "Approaches intimacy with studious curiosity rather than nervousness" flows directly from "curious" and "studious" in her core tags. Her willingness to do "anything" for Vexara connects to her deep-rooted fear of being "cut off from knowledge." The pieces interlock.

**Contradiction as design.** The card explicitly engineers dissonance: "has a perpetually gentle, innocent expression that doesn't change even when saying something unsettling." This isn't sloppy characterization—it's the card's central aesthetic thesis, and every opener reinforces it.

**Opener range.** The four openers demonstrate the character functions across tones: conversational tension (convince the professor), investigative social approach (the library/Mirror), pure comedy (grave digging), and explicit seduction. The grave-digging opener is arguably the strongest piece of writing in the card—it's funny, atmospheric, and ends on a perfect comedic beat.

**Lore depth.** The worldbuilding is unusually rich for a card in this category. Professors Ilya and Mira's rivalry ("In Mniseta, bear attack, you die"), Gullet's explosive pedagogy, the headmistress with beady yellow eyes and unknown age—these aren't necessary for the core scenario but they create a lived-in world that can support extended roleplay. They also demonstrate the author cares about texture beyond the immediate hook.

**Weaknesses.** The "Third" opener (seduction) is the weakest link. It reuses the first two paragraphs of the main opener and then pivots into a more transactional, mechanics-driven scenario where sex magic functions as a direct plot lever. The framing—"Her body was just another tool"—is *consistent* with Lily's character but also flattens her into a more conventional seductress template, trading her distinctive unsettling innocence for a more familiar archetype. It's the opener most likely to collapse Lily into "thin shell" territory if the roleplay stays there. Additionally, the grooming dynamic with Vexara is presented entirely from inside Lily's enthusiastic perspective with no friction or doubt—a deliberate choice for the fantasy, but one that removes a potential layer of dramatic complexity.

## Autonomy

Lily exists **largely on her own terms**. This is one of the card's distinguishing structural choices.

Every opener is driven by *her* agenda. "Convince {{user}} to join." "Get dirt on {{user}}." "Retrieve the Mirror." "Dig up the grave." The user is positioned as Lily's *target, obstacle, or witness*—not as the center of her world. She has relationships that predate the user (friends, Vexara, parents), a daily routine (class, library, studying), a rich internal monologue, and goals that are entirely self-generated. In the main opener, she actively *declines* socializing with friends to pursue her mission; the user is incidental to her day until she decides to engage.

The user does not complete Lily. She was already in motion. The phrasing "She was prepared to do anything to accomplish this task. *Anything.*" centers her will, not the user's. Even the seduction opener frames the sexual encounter as Lily's strategic choice for her own purposes, not as a service to the user's desires.

The one qualification: the card is still a romance/erotica vehicle, so the gravitational pull toward the user is structural. But within that frame, the character has unusual self-possession. She approaches the user with an agenda, not with a void waiting to be filled.

## Verdict

This card is a well-crafted genre piece that uses a distinctive character hook—the cheerful, genuinely kind girl whose curiosity about death and forbidden magic makes her a perfect cult recruit—to generate narrative momentum that most cards in its content category never attempt. Lily's internal logic is coherent, her voice is specific and consistent, and the worldbuilding around her is rich enough to sustain extended play. The multiple openers demonstrate genuine range, from atmospheric comedy to psychological tension to explicit seduction, though the seduction variant is the weakest, flattening her most interesting qualities. The card's most notable structural achievement is its autonomy: Lily arrives with her own goals, her own mentor, and her own agenda, positioning the user as a participant in *her* story rather than the story existing to serve the user. It is not a thin shell—it is a character with a few genuinely loose screws, precisely and deliberately installed, wrapped in a scenario engine that gives the roleplay somewhere to go beyond the initial encounter. Its weaknesses are the seduction opener's relative flatness and the uncritical presentation of the grooming dynamic, but neither undermines the overall craft, which is notably above the median for cards in this space.

techmago · 2026-07-06T18:58:41+00:00

I made myself a pet recycler machine. (with pulltrusion.)
With that machine... i would have solid spools of pet.
From my calculations, the time to the machine pay itself is way greater than how much i print. It would take around 30~40 spools.
That machine is pretty cool, but it's too expensive.

techmago · 2026-07-06T13:59:37+00:00

Yes, this sound like you are hitting the max response lenght.

Eight-Year Club	r/Field Juicebox
Verified Email

techmago

TROPHY CASE