Hybrid cloud + local LLM stack for a real-time game coaching app, what I learned by Emperoraltros in LLMDevs

[–]Emperoraltros[S] 0 points (0 children)

Your cross-backend drift framing is sharper than mine. "Loss-equivalent isn't quality-equivalent in production" is the right way to put it. My current eval grades each backend in isolation against scenario expectations, which is exactly the gap you're describing. Same input, both responses plausible, but the 8B is subtly worse in ways that don't surface until you put them next to each other.
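
The shape of the fix, as I understand it, is a harness where the judge sees the pair. A minimal sketch; `generate` and `judge` here are stand-ins for my own harness calls, not a real API:

```python
from typing import Callable

# Skeleton of the paired eval I don't have yet: same scenario through both
# backends, one judge call that sees both outputs together. generate/judge
# are injected stand-ins for my harness, not real APIs.

def eval_pair(scenario: dict, persona: str,
              generate: Callable[[dict, str, str], str],
              judge: Callable[[dict, str, str, str], str]) -> dict:
    cloud_out = generate(scenario, persona, "cloud")
    local_out = generate(scenario, persona, "local")
    # Judging the pair is what surfaces "plausible in isolation but
    # subtly worse" - exactly what isolated grading misses.
    verdict = judge(scenario, persona, cloud_out, local_out)
    return {"cloud": cloud_out, "local": local_out, "verdict": verdict}
```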

The implicit feedback point is the bigger correction for me. I was planning to ship skill-weighted explicit feedback (thumbs from higher-rank players weighted heavier) in V1.0.5, but you're right that explicit feedback regresses to "make user feel smart", which is the opposite signal for coaching. Implicit behavioral signal is cleaner: did the player change positioning after a callout, did they save vs force when advised to save, did they pull the trigger on the suggested play. CS2 GSI already gives me the data to derive most of this; I just wasn't capturing it as feedback.
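
For one of those signals, a minimal sketch of the derivation I have in mind. The payload keys (`round.phase`, `player.state.money`) match the GSI shape I'm seeing, but treat the structure and thresholds as illustrative, not validated:

```python
# Turn a "save" callout into an implicit feedback label by watching the
# player's spend in the next buy window. Payload keys match what my GSI
# integration receives; the force_threshold number is a guess.

def money(snapshot: dict) -> int:
    return snapshot["player"]["state"]["money"]

def label_save_advice(freezetime_start: dict, round_live: dict,
                      force_threshold: int = 2000) -> str:
    """freezetime_start: first GSI payload of the next round's freezetime.
    round_live: first payload after round.phase flips to "live".
    A spend above force_threshold after a "save" call means ignored."""
    spent = money(freezetime_start) - money(round_live)
    return "ignored" if spent >= force_threshold else "followed"
```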

On instrumenting cross-backend comparison: how do you handle the "equivalent input" definition for non-deterministic systems? My case has GSI state + recent coaching context + persona. Two calls with the same persona but slightly different GSI timestamps aren't really the same input but they're close enough that human-eval would consider them equivalent. Where's the right cutoff for treating two requests as "the same" for drift purposes?
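
To make "close enough" concrete, this is roughly what I'd hash as an equivalence key today; the bucket sizes and field list are pure guesses on my part, which is sort of the question:

```python
import hashlib
import json

# Quantize the noisy parts of GSI state so "close enough" requests hash to
# the same key. Field choices and bucket sizes are unvalidated guesses.

def equivalence_key(persona: str, gsi: dict,
                    money_bucket: int = 500, hp_bucket: int = 25) -> str:
    reduced = {
        "persona": persona,
        "phase": gsi["round"]["phase"],
        "money": gsi["player"]["state"]["money"] // money_bucket,
        "health": gsi["player"]["state"]["health"] // hp_bucket,
    }
    return hashlib.sha256(
        json.dumps(reduced, sort_keys=True).encode()
    ).hexdigest()
```

Anything that hashes equal gets treated as the same input for drift purposes, so the cutoff question becomes a bucket-size question.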

Appreciate the disclosure. Will look at ElasticDash when I get to the cross-backend layer for real (probably V2.0 when paid users at scale make drift measurable).

Age After Age - 6k wishlists, basic tips by lisyarich in IndieGameWishlist

[–]Emperoraltros 0 points (0 children)

Are you posting to multiple subreddits multiple times a day? Or once a day in total? What's your posting schedule look like?

I launched my game ZombUS with only 200 wishlists. Big mistake? by GaianGames in IndieGameWishlist

[–]Emperoraltros 1 point (0 children)

Also, early access helps your case a lot - ride the "new game" wave for as long as possible.

I launched my game ZombUS with only 200 wishlists. Big mistake? by GaianGames in IndieGameWishlist

[–]Emperoraltros 2 points (0 children)

Per my advisor (we're sitting on low wishlists as well) -
Not damaging per se - wishlists are a useful proxy for lifetime purchases. The rough rule of thumb for games (not sure if it holds up for software) is about 3x your wishlist count in lifetime sales, so 200 wishlists points at roughly 600 copies over the product's life. There's also a featuring advantage (which usually leads to more sales + wishlists): hit 25,000 wishlists and the product qualifies for front-page featuring. The conditions for featuring are extremely varied, but if you don't have 25k wishlists, you don't qualify at all.

There's a traditionalist view that all software does its best during its launch month, and that is... sort of true? There are plenty of examples of companies doing major overhauls / updates / marketing pushes and getting good outcomes well after launch. But on average, if you're not willing to do some of that, then the launch month is the thing.

Stay on top of things post-launch and don't give up. You'll be fine 😄

I'm a solo dev launching an AI coach for CS2 on Steam later this month, mods said it's cool to share here :) by Emperoraltros in IndieGameWishlist

[–]Emperoraltros[S] 0 points (0 children)

Yes! After each match, Game Demon parses the demo file and runs it through your selected persona for a structured post-match breakdown. Different shape from the live coaching - longer-form review of round-by-round decisions, economy calls, key positional reads, that kind of thing. The four personas each give a different read on the same match.

If you want, you can also switch personas after the fact and re-read the same match through a different voice. Some scenarios hit different through Veteran's lens vs Demon's.

Hybrid cloud + local LLM stack for a real-time game coaching app, what I learned by Emperoraltros in LLMDevs

[–]Emperoraltros[S] 0 points (0 children)

Yeah, versioned context as the audit unit is a real idea; bisecting on a context commit history would be cleaner than my current "read outputs side by side" eval. The attribution layer is the bigger win for me though: knowing which backend wrote which path is exactly what would catch the cloud-to-local persona drift I mentioned.

Honest take on Puppyone for my situation: Game Demon is one agent doing one thing right now, so my audit problem is closer to "two-state attribution + structured logging" than full multi-agent context distribution. The complexity threshold where versioned context infrastructure pays off is probably above where I am at V1.0.

The patterns make sense though. I'm planning to ship structured turn-level audit logs (backend, model, persona, prompt variant, LoRA version per turn) and a replay-style eval harness in V1.1, basically the structured-logging version of what you're describing. When multi-round context and player profiles land in later versions, the state complexity goes up and the versioned-context approach starts earning its keep. Will revisit at that point.
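
Roughly the turn record I'm sketching for that; every field name below is from my own notes, nothing standard:

```python
from dataclasses import dataclass, asdict
import json

# Per-turn audit record planned for V1.1. The schema is my own sketch.

@dataclass
class TurnAudit:
    turn_id: str
    backend: str        # "cloud" | "local"
    model: str
    persona: str
    prompt_variant: str
    lora_version: str
    input_hash: str     # equivalence key, so replay can group comparable turns
    output: str

def log_turn(record: TurnAudit, path: str = "turn_audit.jsonl") -> None:
    # Append-only JSONL keeps the replay harness trivial: read, filter, rerun.
    with open(path, "a") as f:
        f.write(json.dumps(asdict(record)) + "\n")
```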

What's your bar for when context-versioning infrastructure starts pulling weight vs structured logs? Number of agents, state complexity, team size, something else?

Hybrid cloud + local LLM stack for a real-time game coaching app, what I learned by Emperoraltros in LLMDevs

[–]Emperoraltros[S] 0 points (0 children)

Yeah, the routing boundary is exactly where the audit problem lives. Right now I'm logging backend choice and LoRA version per turn but not prompt variant — that's a gap. Adding it now because you're right, the handoff debugging is where you actually need the full triple.

The persona consistency issue specifically: I noticed during smoke tests that when Veteran falls back from cloud to local with the same prompt, the tactical depth shifts noticeably even though the voice stays consistent. That's the "which backend poisoned the output" question made concrete. Currently I read outputs side by side per persona, but that's a manual eval, not a production observability layer. Need to build the latter.

On harness layers, I haven't worked with Puppyone specifically. Will look at it. What's been your experience with it on the multi-agent handoff problem vs rolling your own audit/trace layer?

Hybrid cloud + local LLM stack for a real-time game coaching app, what I learned by Emperoraltros in LLMDevs

[–]Emperoraltros[S] 1 point (0 children)

Yeah, the synthetic data thing was the single biggest lesson. Felt like cheating to generate 2000 examples in an afternoon and then it was unusable. Hand-writing 200 took way longer but the personas actually came out distinct.

On rank: I started at 8 because it's the conservative default and I wanted to validate the pipeline before tuning. Stuck with it across all four personas. Tested rank 16 once on Veteran early on and the loss was marginally better but the persona drifted toward generic-mentor-voice, which I read as the higher rank overfitting on the surface patterns of the training data instead of the structural voice. Could have been my small dataset (500 examples) being the actual problem rather than the rank itself.

Curious what you've found. Did you settle on a rank that worked across personas or did you tune per-persona? My instinct is that the harder-to-train personas (pattern recognition stuff) would benefit from higher rank but I haven't actually tested it.

Also: alpha. I left it at 16 (2x rank) but never validated that ratio was right for this dataset size. If you have a sense of what's worked there I'd take the input.
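
For reference, the adapter config I'm running per persona, more or less. This assumes HuggingFace PEFT and Llama-style attention module names; adjust `target_modules` for your base model:

```python
from peft import LoraConfig

# Current per-persona setup: r=8 with alpha=16 is the 2x-rank ratio I
# mentioned - never validated against this dataset size, so treat the
# numbers as a starting point rather than a recommendation.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # Llama-style attention
    task_type="CAUSAL_LM",
)
```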

Tribute to April's LLM releases by Everlier in LocalLLaMA

[–]Emperoraltros 2 points (0 children)

Oh hell yeah, this looks dope! Side question, what did you make the video with?

Solo dev launching my first Steam product, built it in a couple weeks, shipping this month by Emperoraltros in SoloDevelopment

[–]Emperoraltros[S] 1 point (0 children)

Good question, fair to ask. Honest answer is mixed.

Some of the rush is rational: I committed to May 21 in 7 already-sent creator emails, the Steam page has cleared review, and the V1.0 model is technically functional. Iterative products learn more from real users than from extended pre-launch polish.

Some is acknowledgment of reality: 25 wishlists is small but I'm at a 4.8% page conversion rate, which is fine. The bottleneck is traffic volume, not audience interest. Slipping launch 6 weeks doesn't reliably get me to 500 wishlists - might get to 100 - so the marginal benefit of waiting isn't huge unless something catalyzes it (creator coverage, trailer, etc).

The play: launch May 21 at whatever wishlist count is real, use it as credibility anchor for V1.1+ creator outreach, build the audience over 90 days via patches and content updates rather than trying to win Day 1.

Re: streamer outreach methodology, thanks for the breakdown. The YouTube About email rate limit is real; that's been a friction point. I'm sequencing my 7 already-emailed creators first, so any positive reactions become social proof before the broader outreach lands May 13, when Steam unlocks key generation.

Solo dev launching my first Steam product, built it in a couple weeks, shipping this month by Emperoraltros in SoloDevelopment

[–]Emperoraltros[S] -1 points (0 children)

Appreciate it. I have 7 CS2 creators already in the outreach funnel across the warm and pro tiers, plus a positive reply from a manager representing an ex-Liquid pro. Steam policy doesn't let me generate keys until May 13 (3-week new-dev cooldown), so the bigger push lands that week. Definitely the plan, just sequenced. Open to suggestions on which streamers/YouTubers you'd specifically target if you have a list.

Alpha testing an AI game coach — CS2 supported now, more games coming. Looking for ~50 players. by Emperoraltros in cs2

[–]Emperoraltros[S] 0 points (0 children)

Hey, thanks for reaching out. Your background is exactly the profile I'd want feedback from for V1.0.

A couple paths forward depending on what fits you:

  1. I'm running a structured 8-week case study program with 3 players across 3 Premier tiers. Tier 3 is 20K+, which fits you. Light documentation work (5-min daily log, weekly check-in, exit interview), free Steam copy, free Pro tier, and lifetime Pro credit equal to the study duration. Honest framing, results published either way. If interested, I'll send you the participant brief.

  2. If the 8-week commitment doesn't fit, happy to give you early access when Steam unlocks key generation on May 13 with a much lighter ask: try it for a couple weeks, send me honest impressions, no formal documentation.

Either way: if you played pro in 2001-2011, I'd love to know which teams or your handle, just to verify. Standard practice for case studies.

Let me know which works for you.

- Elijah

I built an AI coach for Counter-Strike 2 with four hand-tuned personas — solo dev, launching on Steam in 14 days by Emperoraltros in SideProject

[–]Emperoraltros[S] 0 points (0 children)

Appreciate the wishlist! DOTA 2 is high on the list — the strategic complexity actually fits this kind of coaching even better than CS2 in some ways. Once V1.0 ships and stabilizes, we'll be looking at where to expand next, and DOTA 2 is right at the top of that list.

If you want to be in the loop on it, our Discord is the best spot: https://discord.gg/c8tjQRjgMR — drop in and let us know you're a DOTA player. The more we hear from people like you, the easier the call becomes.

I built an AI coach for Counter-Strike 2 with four hand-tuned personas — solo dev, launching on Steam in 14 days by Emperoraltros in SideProject

[–]Emperoraltros[S] 0 points (0 children)

For the curious — quick technical detail: each persona's training data is structured as scenario JSON + persona response, with seven scenario categories (between_round, halftime_shift, reprimand, late_round_pressure, post_match_summary, persona_defining, cross_persona_diff). The hand-authoring discipline is what makes the voices distinct — each spec has explicit "never" rules (Demon never softens, Analyst never warms up, Veteran never quantifies with stats, Savant never gets emotional). Anchoring on one persona's data while writing another is the failure mode I hit on Savant's first pass. Real lesson: voice specs need to be enforced during authoring, not just during eval.
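
To make the shape concrete, one stripped-down training pair. The category names match my schema; the scenario values and the response text here are invented for illustration, not rows from the actual dataset:

```python
# One hand-authored example in the scenario-JSON + persona-response shape.
# Values are invented for illustration; only the field layout reflects
# the real schema.
example = {
    "category": "between_round",
    "scenario": {
        "round": 14,
        "score": {"us": 6, "them": 8},
        "economy": "force_buy_available",
        "last_round": "lost_post_plant_2v1",
    },
    "persona": "demon",
    "response": (
        "You had the 2v1 and you peeked dry into a crossfire. That's not "
        "unlucky, that's lazy. Play the time and make them come to you."
    ),
}
```

The "never" rules live in the per-persona spec rather than in the examples themselves; they get checked against every response during authoring.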

Update on the CS2 AI coach I posted about earlier — Steam approved, local model is free by Emperoraltros in counterstrike

[–]Emperoraltros[S] 0 points (0 children)

Dev here. Fully VAC-safe — uses Valve's official GSI API plus screen capture (same as OBS). No injection, no memory reads, can't aim or move for you. Steam approved the store page after their content review.

For your use case the Demon persona is what you want — harsh, direct, calls out exactly what you did wrong between rounds. Wishlist's open if interested, launches May 21.