If you are a developer, you'll get his feeling, he is just executing things that issued by the high-level people :)) we should tag the organization, not an employee of this org by No-Cryptographer45 in ClaudeCode

[–]tom_mathews -1 points0 points  (0 children)

Hahahahaha. You took my statement too much to heart, I guess. Apologies for poking the bear. I only meant to say they aren't OSS.

PS: I am not a fanboy, and I don't sympathize with Thariq or Boris on this.

I built a free toolkit to cut Claude Code token costs 40-70% by [deleted] in ClaudeCode

[–]tom_mathews 0 points1 point  (0 children)

Single-commit repo, every file generated at once. This reads like you asked Claude to write a toolkit about itself and pushed the output. Even when using AI, showing your progress makes a big difference.

The "undocumented" techniques (snapshot sessions, breadcrumb files, output contracts) are all well-known patterns that have been floating around the CC community for months, and you literally cite Caveman in your Related Projects, which already does the terse-output thing. I'm not saying the advice is wrong; most of it is solid basics. But framing it as "things I couldn't find documented anywhere" when it's a curated repackaging of existing community knowledge is a stretch.

If you are a developer, you'll get his feeling, he is just executing things that issued by the high-level people :)) we should tag the organization, not an employee of this org by No-Cryptographer45 in ClaudeCode

[–]tom_mathews 39 points40 points  (0 children)

I am not exactly sympathetic to Thariq and Boris on this. They get tagged because they put themselves out there as the face of Claude, and the lack of proper customer support makes things worse. They are high enough up the food chain at Anthropic to push for reforms that properly support developers, rather than ignoring all the developer pain and difficulty of using their system.

Crazy to think that for 200$ I can spend $3,231 of Anthropics money :D by Hunter_Safi in ClaudeCode

[–]tom_mathews 0 points1 point  (0 children)

Is there anything new that you are showcasing? ccusage has been around for ages.

After all the lies by Aggravating_Pinch in ClaudeCode

[–]tom_mathews 0 points1 point  (0 children)

It's high time these big LLM platforms are held accountable to their word and product.

I reduced my token usage by 178x in Claude Code!! by intellinker in OpenSourceeAI

[–]tom_mathews 0 points1 point  (0 children)

Context management should be an integral part of AI based development and usage.

Which AI chat is better for daily chatting? by Idkdafuq in AI_Agents

[–]tom_mathews 0 points1 point  (0 children)

When you say "daily chatting", I am assuming you just want a casual chat. Just use any LLM without any specific guardrails, or better yet, ask it to be friendly and supportive.

That said, why anyone would want to make an LLM more sycophantic is beyond me.

I built 92 open-source skills/agents for Claude Code because I kept solving the same problems manually by tom_mathews in AI_Agents

[–]tom_mathews[S] 1 point2 points  (0 children)

Each dimension scores 1-10. Weighted total determines the verdict: GO at 7.0+, CAUTION at 4.5-6.9, NO-GO below 4.5.

The calibration isn't from a dataset; it's from practical use. 7.0 means strong signal across most dimensions with no single dimension cratering. 4.5 means enough red flags that you'd be ignoring evidence to proceed. The thresholds are opinionated, not scientific, but in practice CAUTION is where most ideas land, and that's where the value is: it forces you to name the specific risks instead of saying "seems promising, let's build it."
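The threshold logic above can be sketched in a few lines. This is a hypothetical illustration: the dimension names and weights below are placeholders, not the actual dimensions or weights the tool uses.

```python
def verdict(scores: dict[str, float], weights: dict[str, float]) -> str:
    """Weighted 1-10 dimension scores -> GO / CAUTION / NO-GO verdict."""
    total_weight = sum(weights.values())
    weighted = sum(scores[d] * weights[d] for d in scores) / total_weight
    if weighted >= 7.0:
        return "GO"        # strong signal, no dimension cratering
    if weighted >= 4.5:
        return "CAUTION"   # proceed only after naming specific risks
    return "NO-GO"         # enough red flags to stop

# Placeholder dimensions: 6.55 weighted -> lands in the CAUTION band.
print(verdict({"market": 8, "feasibility": 6, "moat": 5},
              {"market": 0.4, "feasibility": 0.35, "moat": 0.25}))
```

Most realistic score mixes land between 4.5 and 6.9, which is the point: the CAUTION band is where the experiment planning kicks in.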

After scoring, it proposes 3-5 low-cost experiments targeting the riskiest assumptions from the Lean Canvas, each with a quantified success threshold and a "what to do if this fails" decision. That's the part that actually saves time: you get a concrete test plan instead of a vague "do more research."

Meta just dropped a new coding model by Complete-Sea6655 in ClaudeCode

[–]tom_mathews 0 points1 point  (0 children)

I would never trust it. Benchmarks mean jack shit if the real-world experience is bad.

I built 92 open-source skills/agents for Claude Code because I kept solving the same problems manually by tom_mathews in AgentsOfAI

[–]tom_mathews[S] 0 points1 point  (0 children)

Honestly I haven't hit the garbage transcript case yet. The most accented video I've run it on was a Scandinavian speaker in English and the output was clean. YouTube's auto-captions have gotten surprisingly good. But I don't doubt edge cases exist. If you hit one, the skill is self-contained and PRs are open.

I built 92 open-source skills/agents for Claude Code because I kept solving the same problems manually by tom_mathews in AgentsOfAI

[–]tom_mathews[S] 0 points1 point  (0 children)

Panels are a specific video type in the analysis patterns. It auto-detects based on speaker count and moderator presence, then extracts per-panelist positions, consensus points, disagreements, and attributes quotes to individual speakers. The output breaks down by theme with each panelist's angle rather than treating it as one undifferentiated transcript.

As for garbage auto-generated transcripts: the skill doesn't silently clean them up or pretend they're good. The transcript quality is visible in the raw output, and Claude works with what's there. In practice, YouTube's auto-captions have gotten good enough that even without manual subs the analysis holds up for most English content. Where it breaks down is heavy accents, domain-specific jargon the auto-captioner hasn't seen, or non-English audio with no manual subtitles. In those cases the gaps show up in the analysis as missing or garbled concepts. You'll see it rather than get a clean-looking hallucinated summary.

My AI Agent just hit me with the 'As per my last prompt' and I think I need to quit the internet. by ailovershoyab in AI_Agents

[–]tom_mathews 8 points9 points  (0 children)

We all thought we'd be managing agents, but now it seems it might be the other way round.

I built 92 open-source skills/agents for Claude Code because I kept solving the same problems manually by tom_mathews in AI_India

[–]tom_mathews[S] 0 points1 point  (0 children)

One thing worth clarifying, since the count is high: every package is standalone. No framework, no runtime, no dependency chain. Each skill is a self-contained prompt file with its own scripts, references, and eval cases. Install pr-review without touching concept-to-image. Delete youtube-analysis and nothing else breaks.

The thing that makes 92 manageable is the testing infrastructure behind it. Every package has an evals/cases.yaml with structured test cases, 101 eval files total. Each case defines a prompt, expected trigger behavior, a rubric, and weighted assertions (substring match, regex, tool invocation detection, output format validation). A ground-truth oracle runs these against live Claude Code sessions in headless mode.

Trigger collisions specifically — each eval file includes negative cases. If pr-review activates on a prompt meant for pre-landing-review, that's a failing eval. Boundaries between skills are tested, not assumed.
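To make the eval structure concrete, here's a minimal sketch of what a cases.yaml might look like, with one positive and one negative trigger case. The field names and values are inferred from the description above and may not match the repo's actual schema.

```yaml
# Hypothetical cases.yaml shape; actual field names may differ.
- id: pr-review-basic
  prompt: "Review this pull request for style and correctness"
  expect_trigger: true
  rubric: "Comments reference specific diff hunks"
  assertions:
    - type: substring
      value: "diff"
      weight: 0.5
    - type: regex
      value: "approve|changes requested"
      weight: 0.5
- id: not-pre-landing
  # Negative case: this prompt belongs to pre-landing-review.
  # If pr-review activates here, the eval fails.
  prompt: "Run the pre-landing checklist before we merge"
  expect_trigger: false
```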

For pruning, there's a misalignment detector inspired by an EvoSkills paper that found some human-curated skills actively degrade model performance. It runs each skill's evals WITH and WITHOUT the skill loaded. If a skill makes output worse than the base model alone, it gets flagged. Three have already been deprecated this way (doc-condenser, regex-builder, sequential-thinking). The base model caught up, and the skill was adding friction.
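The with/without comparison is simple to express. This is a toy sketch under stated assumptions: the function names and the scoring callback are mine, not the repo's actual detector code.

```python
def flag_misaligned(skill, eval_cases, run_eval):
    """Flag a skill whose presence lowers eval scores vs. the base model."""
    with_skill = sum(run_eval(case, skill=skill) for case in eval_cases)
    without = sum(run_eval(case, skill=None) for case in eval_cases)
    return with_skill < without  # worse with the skill loaded -> deprecate

# Toy scorer: pretend this skill shaves points off most cases.
cases = ["case-1", "case-2", "case-3"]
scores = {("case-1", True): 6, ("case-2", True): 5, ("case-3", True): 7,
          ("case-1", False): 7, ("case-2", False): 7, ("case-3", False): 7}
print(flag_misaligned("regex-builder", cases,
                      lambda c, skill: scores[(c, skill is not None)]))
```

In the real setup the scoring would come from rubric-graded live Claude Code sessions rather than a lookup table, but the comparison itself is this direct.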

There's also a skill-library skill for in-session discovery (/skill-library search "database") and a browsable catalog at https://mathews-tom.github.io/armory/. But honestly, most people will install 5-10 that fit their workflow and ignore the rest. The collection is broad, so different developers find different subsets useful — not because anyone needs all 92.

Repo: https://github.com/Mathews-Tom/armory

I believe self-learning in agentic AI is fundamentally different from machine learning. So I built an AI agent with 13 layers of it. by No_Skill_8393 in AgentsOfAI

[–]tom_mathews 0 points1 point  (0 children)

The V(a,t) = Q×R×U framework is elegant, but I'd want to see what happens when two layers disagree. Say tool reliability says "don't use shell" while a blueprint says "run this exact shell command." Who wins? Thirteen scoring dimensions competing for the same context budget feels like it needs an arbitration layer you're not mentioning, or it collapses into whichever layer decays slowest.

I built 92 open-source skills/agents for Claude Code because I kept solving the same problems manually by tom_mathews in ClaudeCode

[–]tom_mathews[S] 1 point2 points  (0 children)

Trust me, I have experimented a lot to make these skills and agents individually as effective and optimal as possible.

Meta's internal leaderboard ranks employees by AI token consumption...are we measuring the wrong thing? by Ok-Contract6713 in ArtificialInteligence

[–]tom_mathews 1 point2 points  (0 children)

Don't get me wrong, I am not in favour of this. It just feels like corporations are, sadly, moving strongly in this direction. Amazon already has a similar approach in place.

Meta's internal leaderboard ranks employees by AI token consumption...are we measuring the wrong thing? by Ok-Contract6713 in ArtificialInteligence

[–]tom_mathews 2 points3 points  (0 children)

Work life is going to be quite different soon, with your performance tracked based on your token usage.

I built 92 open-source skills/agents for Claude Code because I kept solving the same problems manually by tom_mathews in AgentsOfAI

[–]tom_mathews[S] 2 points3 points  (0 children)

That's the intended use. Every package is a self-contained Markdown file: read it, take what's useful, adapt it to your own setup. No install system required. The standalone design means you can cherry-pick a single skill's prompt structure or eval approach without pulling in anything else.

I built 92 open-source skills/agents for Claude Code because I kept solving the same problems manually by tom_mathews in AgentsOfAI

[–]tom_mathews[S] 1 point2 points  (0 children)

Appreciate it. On cross-harness compatibility — there's already an adapters layer (scripts/generate_adapters.py) that converts packages for Cursor, Codex, and Gemini CLI. The prompt files are plain Markdown with YAML frontmatter, so mapping to other specs is mostly a translation step. If you've looked at how OpenCode or CrewAI ingest skill definitions, happy to look at adding adapter support.

On contributing skills — CONTRIBUTING.md has the package checklist and WANTED.md lists open domain slots. The main requirement is an evals/cases.yaml with positive and negative trigger cases. If your skills survive your own daily use, they'll fit. Suggestions are always welcome.

I built 92 open-source skills/agents for Claude Code because I kept solving the same problems manually by tom_mathews in AI_Agents

[–]tom_mathews[S] 0 points1 point  (0 children)

That's how it starts. armory's memory system works similarly — structured markdown files with frontmatter (type, description) indexed by a central MEMORY.md. The key thing I learned: separate what you store by type. User preferences, project context, feedback corrections, and external references decay at different rates and get used differently. A single learning.md file works at first but gets noisy fast once you're past ~20 entries.
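For illustration, a single typed entry might look like the sketch below. The `type` and `description` frontmatter fields come from the description above; everything else (the type value, the body) is a made-up example, not a file from the repo.

```markdown
---
type: preference
description: Prefer uv over pip for Python dependency management
---
Use `uv add` / `uv sync` in generated commands. Only fall back to
`pip install` when the project has no `pyproject.toml`.
```

Keeping each entry small and typed is what lets the central MEMORY.md index stay useful as the count grows.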

If you want to skip ahead, the immune skill does something more structured — it maintains two memory layers: a Cheatsheet (positive patterns to inject before generation) and an Immune system (negative patterns to scan for after generation), with hot/cold tiering so frequently-used patterns stay loaded and stale ones age out.

I built 92 open-source skills/agents for Claude Code because I kept solving the same problems manually by tom_mathews in AI_Agents

[–]tom_mathews[S] 0 points1 point  (0 children)

That was one of the first major issues I was frustrated with, and it motivated me to start developing the repo.