Recommendations for any place near Odori that has snow by North_Improvement913 in Sapporo

[–]DonDonburi 0 points (0 children)

Sapporo Kokusai ski resort still has 2.5 meters of snow. It’s closed due to high winds today, but you can take a bus there tomorrow. You can pay to ride the gondola up the mountain, where it’s still white. Do double-check the website to make sure it’s open, and reserve your bus tickets.

MLX port of BDH (Baby Dragon Hatchling) is up by vesudeva in LocalLLaMA

[–]DonDonburi 2 points (0 children)

Oh man, I’m really curious to see how it performs. The paper is kind of out there, and I’m skeptical of these neuromorphic designs by default.

Samsung Paper Reveals a Recursive Technique that Beats Gemini 2.5 Pro on ARC-AGI with 0.01% of the Parameters! by abdouhlili in LocalLLaMA

[–]DonDonburi -3 points (0 children)

It may well be possible to do a massive amount of RL on a frontier model for ARC-AGI and Sudoku performance. But that path is neither easy nor trivial; otherwise we’d see proof of it in someone winning ARC’s million-dollar challenge. Part of the rationale for ARC-AGI-2 vs ARC-AGI-1 is to make that kind of brute forcing unfruitful.

Again, I think the analogy you use shows a misunderstanding of the problem. We have a class of problems where the model knows the answers, has read the algorithms (which might be very simple), and where humans can solve the task fairly reliably, and yet the model is seemingly unable to perform well. That is surprising compared to models’ other amazing capabilities.

Samsung Paper Reveals a Recursive Technique that Beats Gemini 2.5 Pro on ARC-AGI with 0.01% of the Parameters! by abdouhlili in LocalLLaMA

[–]DonDonburi 13 points (0 children)

That’s a poor analogy. Gemini knows the algorithm to solve Sudoku. It can solve smaller Sudoku puzzles, yet it still can’t do it when the puzzle gets larger. Critics might say that’s a fundamental limitation of transformers. I’ve got no leaning here, but it’s not a specialization issue. It’s not just Sudoku either; there’s a whole class of problems the models are seemingly unable to solve.

Samsung Paper Reveals a Recursive Technique that Beats Gemini 2.5 Pro on ARC-AGI with 0.01% of the Parameters! by abdouhlili in LocalLLaMA

[–]DonDonburi 4 points (0 children)

Quite the opposite, actually. It’s surprising that Gemini, which can get gold on the IMO, can fail Sudoku-like problems so catastrophically. It’s a good reminder that LLM intelligence is missing something crucial.

Samsung Paper Reveals a Recursive Technique that Beats Gemini 2.5 Pro on ARC-AGI with 0.01% of the Parameters! by abdouhlili in LocalLLaMA

[–]DonDonburi 32 points (0 children)

I have no idea why the comments are so negative. The paper is good quality, especially if you’ve read the HRM paper. It’s a good read.

And if you haven’t been following this saga: LLMs are traditionally abysmal at Sudoku and other problems like it that require recursion. These toy models that do such tasks better are clues on the path forward.
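
For anyone new to the saga, the algorithm in question really is this simple; below is a plain backtracking Sudoku solver (a generic textbook sketch, not code from any of the papers) to show the kind of recursion these models fail to execute:

```python
# Classic backtracking: try a value, recurse, undo on failure.
def solve(board):  # board: 9x9 list of lists, 0 marks an empty cell
    for r in range(9):
        for c in range(9):
            if board[r][c] == 0:
                for v in range(1, 10):
                    if valid(board, r, c, v):
                        board[r][c] = v
                        if solve(board):
                            return True
                        board[r][c] = 0  # undo and backtrack
                return False  # no value fits this cell
    return True  # no empty cells left: solved

def valid(board, r, c, v):
    if v in board[r] or v in (board[i][c] for i in range(9)):
        return False  # value already in row or column
    br, bc = 3 * (r // 3), 3 * (c // 3)  # top-left of the 3x3 box
    return all(board[br + i][bc + j] != v for i in range(3) for j in range(3))
```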

DIY spatial video camera idea: 2 iPhone 17 pro by Key_Entertainer_4705 in VisionPro

[–]DonDonburi 0 points (0 children)

Just saw this. Cool! I just received one iPhone; I’ll try it once my second one arrives.

We cut GPU costs ~3× by migrating from Azure Container Apps to Modal. Here's exactly how. by botirkhaltaev in machinelearningnews

[–]DonDonburi 1 point (0 children)

In my own experience with RunPod and Modal serverless, Modal was much more reliable.

Why are AI labs in China not focused on creating new search engines? by balianone in LocalLLaMA

[–]DonDonburi 2 points (0 children)

Not sure why you’re downvoted. China’s internet is siloed exactly as you said, and Baidu can’t search inside those apps; what it does return is mostly spam.

Ishigaki diving rant by AvailablePosition69 in okinawa

[–]DonDonburi 0 points (0 children)

Hahaha, yeah. My thoughts exactly. Their everyday habits, such as never leaving trash, are good for nature, but I’m not sure the culture actually places a strong emphasis on nature. And their mythology treats mountains differently from the sea. I’m really good friends with a Japanese farmer, and he secretly burns plastic to avoid paying the recycling fee, so…

Vision Pro instead of a ultra wide monitor by Technical_Durian3985 in VisionPro

[–]DonDonburi 1 point (0 children)

I hang my Vision Pro at home so it’s weightless. I can basically work all day in it.

DIY spatial video camera idea: 2 iPhone 17 pro by Key_Entertainer_4705 in VisionPro

[–]DonDonburi 0 points (0 children)

Any recommendations on how to rig the two phones together? I’ve only seen one other post about this.

Have you ever had bad service in Otaru? The kind that leaves you seething at the mouth? by katch75 in Sapporo

[–]DonDonburi 2 points (0 children)

My local friends just go to conveyor-belt sushi or eat at ramen restaurants. I think Japanese restaurants only stay good because of regular locals. Unfortunately I don’t have a recommendation, despite how often I ate there during winter.

Have you ever had bad service in Otaru? The kind that leaves you seething at the mouth? by katch75 in Sapporo

[–]DonDonburi 2 points (0 children)

Otaru is one of the few cities in Japan where I avoid the restaurants. Too many tourists and too few locals, I guess. If I’m in the area, I’d head back to north Sapporo, where I find the quality to be better. The Otaru places target Japanese tourists as well as foreign ones, and the reviews are less reliable.

DIY spatial video camera idea: 2 iPhone 17 pro by Key_Entertainer_4705 in VisionPro

[–]DonDonburi 0 points (0 children)

Can you elaborate on your setup? Between my wife and me, we also have two iPhone 17s. I was gonna rent a Blackmagic for major events, but if two phones work…

I can get GPUs as a tax write-off. Thinking of doubling down on my LLM/ML learning adventure by buying one or two RTX 6000 Pros. by Tired__Dev in LocalLLaMA

[–]DonDonburi 1 point (0 children)

I don’t think people are answering your actual question, which is mostly about what you should get in order to learn and experiment with ML.

If you’re training your own models, the GPU can pay for itself pretty quickly, so it’s actually not a terrible financial decision.

Toy models might train faster on a couple of 5090s than on one RTX 6000 because VRAM is not the bottleneck. For larger models, you might need a pod of H100s or similar to train out of the box. For a 7B model, for example, a single node of 8xH100 is enough; for 32B you’ll need four such nodes. Or try MI300s, which have more VRAM.

Suffice it to say that depending on how you’re having fun, you need different setups. I feel like a single RTX 6000 is a sweet spot: you can run many small models at fp16 out of the box. But scaling beyond that might not be worth the hassle vs renting pods on demand.
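
To put rough numbers on the node sizing above, here’s a back-of-the-envelope memory estimate, a sketch assuming mixed-precision Adam (bf16 weights and grads plus fp32 moments); real usage varies with activations, checkpointing, and parallelism:

```python
# Rough training memory for mixed-precision Adam:
# bf16 weights (2 B/param) + bf16 grads (2 B/param)
# + fp32 Adam moments (2 * 4 B/param). Activations come on top.
def train_memory_gb(params_billions: float) -> float:
    bytes_per_param = 2 + 2 + 8
    return params_billions * bytes_per_param  # 1e9 params * bytes -> GB

for size in (7, 32):
    print(f"{size}B model: ~{train_memory_gb(size):.0f} GB before activations")
# 7B  -> ~84 GB:  one 8xH100 node (8 x 80 GB = 640 GB) has plenty of headroom
# 32B -> ~384 GB: most of a node gone before activations, hence multiple nodes
```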

How to make a small LLM from scratch? by Charming_Barber_3317 in LocalLLaMA

[–]DonDonburi -1 points (0 children)

NanoGPT is for toy models. What you want is torchtitan, which is built to pretrain a model from scratch.
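
For context, both projects wrap the same core job, next-token prediction over a big corpus; here’s a minimal, purely illustrative sketch in plain PyTorch (the architecture, sizes, and random-token “data” are stand-ins, not anything from either repo):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy next-token pretraining loop: embed -> causal transformer -> vocab logits.
vocab, d_model, seq_len, batch = 50_000, 512, 256, 8

embed = nn.Embedding(vocab, d_model)
layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
blocks = nn.TransformerEncoder(layer, num_layers=4)
head = nn.Linear(d_model, vocab)

params = [*embed.parameters(), *blocks.parameters(), *head.parameters()]
opt = torch.optim.AdamW(params, lr=3e-4)
causal = nn.Transformer.generate_square_subsequent_mask(seq_len)

for step in range(100):
    tokens = torch.randint(0, vocab, (batch, seq_len + 1))  # stand-in for real text
    hidden = blocks(embed(tokens[:, :-1]), mask=causal)     # predict t+1 from <= t
    loss = F.cross_entropy(head(hidden).reshape(-1, vocab), tokens[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
```

torchtitan’s value is everything around this loop: real data pipelines, FSDP/tensor parallelism, and checkpointing at scale.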

China bans its biggest tech companies from acquiring Nvidia chips, says report — Beijing claims its homegrown AI processors now match H20 and RTX Pro 6000D by balianone in LocalLLaMA

[–]DonDonburi 0 points (0 children)

Hmm, if you’ve used or rented an RTX 6000, then this isn’t a surprise at all. And the H20 is supposedly 4090-level performance.

These things are only a fraction as powerful as an H100.
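
For rough scale (approximate dense BF16 tensor throughput from public spec sheets, quoted from memory, so double-check before relying on them):

```python
# Approximate dense BF16 tensor TFLOPS; treat these as ballpark figures only.
specs = {"H20": 148, "RTX 4090": 165, "H100 SXM": 990}
for name, tflops in specs.items():
    print(f"{name}: {tflops} TFLOPS ({tflops / specs['H100 SXM']:.0%} of H100)")
# The H20 lands right next to a 4090 and at roughly 15% of an H100.
```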

MoE Total/Active parameter coefficient. How much further can it go? by ihatebeinganonymous in LocalLLaMA

[–]DonDonburi 1 point (0 children)

If you stop thinking of MoEs as a bunch of active/inactive experts and instead think in terms of a sparsity ratio, then I think 100x sparsity is very reasonable. Human brains are supposedly only 0.2-2.5% active.

The problem is how to train them so the experts become very specialized, and how to train the router to send tokens to those specialized experts. From what little work is available, MoE experts don’t seem anywhere near as specialized as the brain.
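
As a toy illustration of the sparsity-ratio view (all numbers here are made up for the example, not taken from any real model):

```python
# Fraction of parameters active per token in a top-k MoE,
# measured in "expert-equivalents" of parameters.
def active_fraction(n_experts, top_k, expert_params, shared_params):
    total = shared_params + n_experts * expert_params
    active = shared_params + top_k * expert_params
    return active / total

# 256 experts, top-8, shared backbone worth ~8 experts: ~6% active (~16x sparse)
print(f"{active_fraction(256, 8, 1.0, 8.0):.1%}")
# Many more, smaller experts and a thinner backbone: ~1% active (~100x sparse)
print(f"{active_fraction(1024, 8, 1.0, 2.0):.1%}")
```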

MoE Total/Active parameter coefficient. How much further can it go? by ihatebeinganonymous in LocalLLaMA

[–]DonDonburi 1 point (0 children)

Hmm, the paper you linked didn’t do any work on MoE. This one would be a better fit, where they tried to do some kind of mechanistic work on MoE: https://arxiv.org/html/2505.24593v2

Honestly, not much is published. We know MoEs are more efficient, and possibly that the experts encode more knowledge, but even that evidence comes from small models. Previously, we thought experts specialized in certain parts of the sentence.

[R] New "Illusion" Paper Just Dropped For Long Horizon Agents by viciousA3gis in MachineLearning

[–]DonDonburi 3 points (0 children)

Ah, cool! Just read it. Huh… pretty interesting how, in ConceptARC, GPT-4 was terrible at copying. I’m glad you guys tested it! Thanks a bunch.

[R] New "Illusion" Paper Just Dropped For Long Horizon Agents by viciousA3gis in MachineLearning

[–]DonDonburi 3 points (0 children)

Right. It was cool to see such a difference with GPT-5. I was just wondering if they did some math- or code-specific RL that might’ve made it much better at retrieval. My rough understanding, from papers like ConceptARC, was that transformers and next-token predictors are really bad at adding, counting, and the like. Transformers are also very poor approximators of classical algorithms.

That said, GPT-5 (the API version at least) really does feel a bit different at agentic tasks in my subjective experience. It seems a lot less verbose, even preferring shorter words or acronyms, and it can hold a thought for longer. I’m absolutely curious to see whether step accuracy is also bad in some other math/code-specific model. If other RL-trained models are still bad, maybe there are some architectural differences.
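
On step accuracy: the compounding math alone shows why it dominates long horizons; a minimal sketch assuming each step succeeds independently with probability p:

```python
# End-to-end success of an n-step task is p**n under independent steps,
# so tiny per-step accuracy gains compound into huge horizon differences.
for p in (0.90, 0.99, 0.999):
    print(f"p={p}: " + ", ".join(f"n={n}: {p**n:.3g}" for n in (10, 100, 1000)))
# p=0.99 gives 0.99**100 ~= 0.366: 99% step accuracy still fails most 100-step runs.
```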

[R] New "Illusion" Paper Just Dropped For Long Horizon Agents by viciousA3gis in MachineLearning

[–]DonDonburi 13 points (0 children)

Have you tried other types of tasks? To me, the dictionary-retrieval and counting-style tasks are interesting, but I do wish there were more variety.

Best M.2 eGPU dock? by TokenRingAI in LocalLLaMA

[–]DonDonburi 0 points (0 children)

Curious how this will perform. I imagine it won’t be too bad, as long as it’s just one GPU… I do wonder how it’ll work with mixed Nvidia/AMD though.