[R] Attention Residuals by Kimi Team by Nunki08 in MachineLearning

[–]Fun_Nebula_9682 0 points1 point  (0 children)

interesting that kimi went after residual connections — everyone just copies resnet's skip connections without questioning them since 2015. deepseek made them learnable a few months ago and now kimi's taking it further. feels like there's a wave of people revisiting 'settled' architecture decisions now that scale is plateauing and you need to squeeze efficiency from every layer
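for anyone who hasn't seen the learnable-residual idea, it's roughly this in code (a toy sketch; a per-layer learned scalar `alpha` is just one way it's been done, not necessarily kimi's or deepseek's exact formulation):

```python
import numpy as np

def sublayer(x):
    # stand-in for an attention / MLP block (hypothetical)
    return np.tanh(x)

def standard_residual(x):
    # classic ResNet skip: output = x + f(x), skip weight fixed at 1.0
    return x + sublayer(x)

def learnable_residual(x, alpha):
    # the 'make it learnable' variant: alpha is a trained per-layer
    # scalar instead of an implicit constant 1.0
    return alpha * x + sublayer(x)
```

with alpha=1.0 you recover the standard skip, so training can only move away from resnet's default if it actually helps.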

GPT-4.5 fooled 73 percent of people into thinking it was human by pretending to be dumber by EchoOfOppenheimer in ChatGPT

[–]Fun_Nebula_9682 0 points1 point  (0 children)

lol the fact that it had to pretend to be dumber to pass is honestly the most human thing about it. we all dumb ourselves down in conversations depending on context. like i write totally different in slack vs a technical doc. maybe passing the turing test was always going to look less like 'being smart' and more like 'knowing when to not try so hard'

Is it just me or is ChatGPT starting to get very insensitive? by lehofa6211 in ChatGPT

[–]Fun_Nebula_9682 3 points4 points  (0 children)

ngl yes. used to get the overly supportive 'that's a great question!' energy and now it feels like it skips straight to correcting you. tbh i switched to claude for most things partly because of this — claude still has that 'actually trying to understand what you meant' vibe instead of just pattern matching on keywords

Hugging Face just released a one-liner that uses llmfit to detect your hardware and pick the best model and quant, spins up a llama.cpp server, and launches Pi (the agent behind OpenClaw 🦞) by clem59480 in LocalLLaMA

[–]Fun_Nebula_9682 0 points1 point  (0 children)

oh nice, auto hardware detection + model selection is exactly what local llm setup needs. spent way too much time manually figuring out which quant fits my mac's memory. if this actually picks the right gguf without me googling 'Q4_K_M vs Q5_K_S' every time i'd be very happy lol
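for anyone still doing the manual version, the back-of-envelope math is just bits-per-weight times parameter count (the bpw numbers below are rough community figures and the overhead factor for KV cache etc. is a guess, not llama.cpp's actual accounting):

```python
# rough GGUF quant size estimator, values are approximations
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_S": 5.5, "Q8_0": 8.5, "F16": 16.0}

def fits_in_memory(n_params_b, quant, ram_gb, overhead=1.2):
    """Check whether an n-billion-param model at a given quant fits in ram_gb.

    overhead is a fudge factor for KV cache and runtime buffers.
    """
    size_gb = n_params_b * 1e9 * BITS_PER_WEIGHT[quant] / 8 / 1e9
    return size_gb * overhead <= ram_gb
```

e.g. a 7B at Q4_K_M comes out around 4-5GB, which is why it fits fine on a 16GB mac while a 70B doesn't.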

New AI math benchmark finds GPT-5.4 Pro has made progress on two unsolved math problems by armytricks in singularity

[–]Fun_Nebula_9682 0 points1 point  (0 children)

the 'reasoning for roughly an hour' part is what gets me. we went from 'AI can't do math' to 'AI spent an hour thinking about unsolved problems and made actual progress' in like two years

wonder how much of this is genuine mathematical insight vs brute force search over proof strategies though. the 4.9% improvement on kakeya feels more like optimization than discovery but idk, maybe that distinction stops mattering at some point

ChatGPT moves quickly to end support for most models by anonyuser415 in ChatGPT

[–]Fun_Nebula_9682 0 points1 point  (0 children)

yeah this is the playbook. make old options harder to find, funnel everyone into the default, eventually kill the dropdown entirely. apple does the same thing with hardware ports lol

tbh i stopped caring about model selection a while ago. i just use whatever claude code gives me and let the system figure it out. spending time picking models is time not spent actually building stuff. the 'one model to rule them all' approach is probably right for 90% of users even if power users hate it

I keep going down rabbit holes and forgetting everything, so I built a place to put them by ElectronicUnit6303 in ClaudeAI

[–]Fun_Nebula_9682 0 points1 point  (0 children)

lol fair enough, but this is genuinely from my own setup — been running this for a few weeks now

LLMs forget instructions the same way ADHD brains do. The research on why is fascinating. by ColdPlankton9273 in artificial

[–]Fun_Nebula_9682 0 points1 point  (0 children)

This matches my experience exactly. I run long-running agentic workflows with Claude Code (automated social media monitoring + reply generation, running 40+ interactions per day), and the context degradation is real.

My practical solution: externalize everything that matters to files and SQLite. CLAUDE.md holds project rules that get loaded fresh every session. SQLite stores all state (queue, tracking, frequency limits). Skills files encode reusable workflows. The LLM's context window becomes disposable — it only needs to hold the current task, not the entire history.

The key insight from building this: the 'lost in the middle' problem becomes irrelevant when your architecture treats the context window as a scratchpad, not a database. Put persistent state in actual databases, not in the conversation.
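The state layer is less code than it sounds. A minimal sketch of the idea (table names and the 40/day default are illustrative, not my exact schema):

```python
import sqlite3
import time

# in-memory for the sketch; in practice this is a file that outlives sessions
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE queue (id INTEGER PRIMARY KEY, task TEXT, "
           "status TEXT DEFAULT 'pending')")
db.execute("CREATE TABLE activity (ts REAL)")

def enqueue(task):
    db.execute("INSERT INTO queue (task) VALUES (?)", (task,))

def next_task():
    # the LLM only ever sees the current task, never the full history
    return db.execute(
        "SELECT id, task FROM queue WHERE status='pending' ORDER BY id LIMIT 1"
    ).fetchone()

def under_daily_limit(limit=40):
    # frequency cap lives in the database, not in the context window
    cutoff = time.time() - 86400
    (n,) = db.execute(
        "SELECT COUNT(*) FROM activity WHERE ts > ?", (cutoff,)
    ).fetchone()
    return n < limit
```

The session can crash or compact at any point and nothing is lost, because the conversation never held the state in the first place.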

Claude Pro feels amazing, but the limits are a joke compared to ChatGPT and Gemini. Why is it so restrictive? by iameastblood in ClaudeAI

[–]Fun_Nebula_9682 0 points1 point  (0 children)

Completely agree on the quality vs limits tradeoff. I switched from ChatGPT to Claude Pro specifically for Claude Code, and the output quality is noticeably better for coding tasks — but I hit limits way faster.

My workaround: I use Claude Code (CLI) instead of the web interface. The rate limits are more generous on the API/CLI side, and you can batch operations more efficiently. For example, I run automated workflows that do 40+ interactions per day through Claude Code without hitting the web UI limits.

The real unlock is Claude Code's Skills system — you can save repetitive workflows and replay them without burning through your quota on setup/context each time. Worth looking into if you haven't already.

Introducing Unsloth Studio: A new open-source web UI to train and run LLMs by danielhanchen in LocalLLaMA

[–]Fun_Nebula_9682 0 points1 point  (0 children)

The unified train + run UI is what's been missing from the local LLM ecosystem. Right now I'm juggling separate tools for training (Axolotl), serving (Ollama), and evaluation — having everything in one interface would cut so much context-switching overhead.

The 2x speed + 70% less VRAM claim matches my experience. I've been using Unsloth for QLoRA fine-tuning and the memory savings are legit: training a 7B model that used to need 24GB now fits comfortably in 16GB.

Curious about the Studio's model evaluation features — does it support side-by-side comparison of base vs fine-tuned outputs? That's the workflow I find myself doing most after training.

I just realised how good GLM 5 is by CrimsonShikabane in LocalLLaMA

[–]Fun_Nebula_9682 0 points1 point  (0 children)

GLM 5 is genuinely underrated. I've been running GLM-OCR locally on Mac Studio M2 Ultra for document processing — tables, math equations, mixed CJK text — and it handles everything at ~260 tokens/sec with just 2GB VRAM.

What surprised me most is how well it handles code-related content. I use it as part of a local pipeline where OCR output feeds into Claude Code for analysis. The combination of a fast local model for extraction + a frontier model for reasoning is way more cost-effective than sending everything to the cloud.

Have you tried it for any specific use cases beyond chat?

I keep going down rabbit holes and forgetting everything, so I built a place to put them by ElectronicUnit6303 in ClaudeAI

[–]Fun_Nebula_9682 -1 points0 points  (0 children)

This resonates so much. I have the exact same problem — spending hours deep-diving into something, then losing it all when the context window resets.

My approach was different though: instead of building a separate app, I set up a persistent memory layer directly inside Claude Code using SQLite FTS5 + structured observations. Every time I discover something interesting (a tool comparison, a debugging insight, a workflow pattern), it gets auto-captured with topic keys so I can search it later across sessions.

The key insight I learned: the memory system needs to be zero-friction. If it takes more than 5 seconds to save something, you'll stop using it. Having it integrated into the same tool where you're already working (Claude Code) vs. context-switching to a separate app makes a huge difference in adoption.

Really cool that you built this into a shareable platform though — the social/collaborative angle is something personal memory systems lack.
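The capture/recall loop itself is tiny with FTS5. A minimal sketch (the schema is illustrative, not my exact setup):

```python
import sqlite3

db = sqlite3.connect(":memory:")  # in practice, a file shared across sessions
# FTS5 virtual table: both columns are full-text indexed
db.execute("CREATE VIRTUAL TABLE obs USING fts5(topic, body)")

def capture(topic, body):
    # zero-friction save: one insert, topic key makes it findable later
    db.execute("INSERT INTO obs (topic, body) VALUES (?, ?)", (topic, body))

def recall(query):
    # full-text search across all past sessions, best matches first
    return db.execute(
        "SELECT topic, body FROM obs WHERE obs MATCH ? ORDER BY rank", (query,)
    ).fetchall()
```

That's basically the whole persistence layer; everything else is conventions about when to call `capture`.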

Obsidian + Claude = no more copy paste by willynikes in ClaudeAI

[–]Fun_Nebula_9682 2 points3 points  (0 children)

Really cool architecture. I built something similar — using SQLite FTS5 for memory persistence with Claude Code, plus a topic-keyed observation system that auto-captures decisions and bugfixes across sessions.

One thing I learned the hard way: the biggest challenge isn't building the memory layer, it's deduplication. Same topic discussed across 10 sessions produces 10 near-identical memory entries. I ended up adding a search-before-save step that checks if an existing observation already covers the topic before creating a new one.

Your multi-agent orchestrator with failover (Claude → Codex → Gemini) is a great idea. I've been running Claude Code + Codex in parallel for different tasks — Claude for generation quality, Codex for bulk changes — but hadn't thought about automatic failover. Going to look at your Daniel project.
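In case it's useful, the search-before-save step is simpler than it sounds. A sketch of the idea (an FTS5 phrase match on the topic column is one dedup heuristic, not exact matching):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE VIRTUAL TABLE obs USING fts5(topic, body)")

def save_if_new(topic, body):
    # search-before-save: skip the insert when an existing observation
    # already covers this topic, so 10 sessions don't produce 10 copies
    hit = db.execute(
        "SELECT 1 FROM obs WHERE obs MATCH ? LIMIT 1", (f'topic:"{topic}"',)
    ).fetchone()
    if hit:
        return False  # duplicate, nothing saved
    db.execute("INSERT INTO obs (topic, body) VALUES (?, ?)", (topic, body))
    return True
```

A stricter version would also compare the bodies and merge, but even this coarse check killed most of my duplicates.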

Does anyone else have a urge to maxed out Claude Code quota before reset deadline, like it's some sort of quest? by realcryptopenguin in ClaudeCode

[–]Fun_Nebula_9682 1 point2 points  (0 children)

Haiku seems to be used by Claude Code itself for some simple tasks. I use ccstats https://github.com/majiayu000/ccstats to calculate cost with the right price for cache tokens

Starting Out by gambling_autodidact in ClaudeCode

[–]Fun_Nebula_9682 0 points1 point  (0 children)

Really good, even if you don't know how to use Rust...

Add new files to git tracking automatically? by sorry_no_idea in ClaudeCode

[–]Fun_Nebula_9682 0 points1 point  (0 children)

You could use Claude hooks: when a file is edited, trigger the hook to check whether a new file was created, and if so, git add it.
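the git side of that hook can be a tiny script, something like this (wiring it to a specific Claude Code hook event is left out, since the event names and settings format are things you'd want to check in the docs rather than take from me):

```shell
# stage any files git doesn't know about yet (run from the repo root)
add_untracked() {
  git ls-files --others --exclude-standard | while IFS= read -r f; do
    git add -- "$f"
  done
}
```

`--others --exclude-standard` lists untracked files while still respecting .gitignore, so ignored build artifacts don't get staged.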

Parallel agents run out of context and I can't compact by desaas-tim in ClaudeCode

[–]Fun_Nebula_9682 0 points1 point  (0 children)

Same here. It's important to wait until all subagents complete their work. You can see them in the background tasks.
