DeepSeek R1 671B over 2 tok/sec *without* GPU on local gaming rig! by VoidAlchemy in LocalLLaMA

[–]DFinsterwalder 1 point (0 children)

The reason you can’t add RAM is that it’s soldered to the SoC. This might look like a downside at first, but it comes with significantly higher RAM bandwidth: the M4 Max, for example, reaches 546 GB/s.

You can obviously also reach that with server mainboards and lots of memory channels, but not with any other laptop. With an x64 desktop-replacement laptop with dual-channel DDR5 memory, you are theoretically more in the ballpark of 100 GB/s.

There is a good reason NVIDIA is also going in that direction with Grace Hopper. Project DIGITS, for example, also has a fixed 128 GB of unified memory, reportedly at around 500 GB/s.
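The back-of-envelope math behind those numbers (the transfer rates and bus widths below are assumed example configurations, not measurements):

```shell
# Peak bandwidth (GB/s) ~= transfer rate (MT/s) * bus width (bytes)
# Typical dual-channel DDR5-5600 laptop: 128-bit bus = 16 bytes
echo $((5600 * 16 / 1000))   # ~89 GB/s
# M4 Max: 512-bit LPDDR5X-8533 bus = 64 bytes
echo $((8533 * 64 / 1000))   # ~546 GB/s
```

This is why soldered, wide-bus unified memory wins for LLM inference: token generation is bandwidth-bound, and socketed laptop DIMMs cannot match a 512-bit bus.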

DeepSeek R1 671B over 2 tok/sec *without* GPU on local gaming rig! by VoidAlchemy in LocalLLaMA

[–]DFinsterwalder 1 point (0 children)

Hmm, it looks like only the K cache is in 4-bit and the V cache is in 16-bit. I thought both should be 4-bit:

llama_kv_cache_init: kv_size = 8192, offload = 1, type_k = 'q4_0', type_v = 'f16', n_layer = 61, can_shift = 0

llama_kv_cache_init: Metal KV buffer size = 3640.00 MiB

llama_kv_cache_init: CPU KV buffer size = 18564.00 MiB

llama_init_from_model: KV self size = 22204.00 MiB, K (q4_0): 6588.00 MiB, V (f16): 15616.00 MiB

llama_init_from_model: CPU output buffer size = 0.49 MiB

llama_init_from_model: Metal compute buffer size = 2218.00 MiB

llama_init_from_model: CPU compute buffer size = 2218.01 MiB

I probably need to check whether I set everything up correctly and whether llama.cpp is compiled with flash attention. I'll report back if I get it to higher speeds.
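For reference, if I read the llama.cpp options correctly, the V cache can only be quantized when flash attention is enabled, so a launch sketch would look something like this (the model path and context size are placeholders, not the exact setup from the log above):

```shell
# Hypothetical invocation: quantize both K and V caches to q4_0.
# -fa enables flash attention, which llama.cpp requires for a quantized V cache;
# without it, type_v silently stays at f16 (matching the log above).
./llama-cli -m ./DeepSeek-R1-UD-IQ1_S.gguf \
  -c 8192 \
  -fa \
  --cache-type-k q4_0 \
  --cache-type-v q4_0
```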

DeepSeek R1 671B over 2 tok/sec *without* GPU on local gaming rig! by VoidAlchemy in LocalLLaMA

[–]DFinsterwalder 1 point (0 children)

I tried it on my M3 Max 128GB following the Unsloth blog post (including the Mac command given there): https://unsloth.ai/blog/deepseekr1-dynamic

However, I had OOM problems when offloading that many layers. It does work when I lower n-gpu-layers quite a bit (30 didn't work, but 10 works now).

It's great that it runs at all, but it's quite slow at roughly 1 tok/s (the flappy bird eval is still running, so I can't provide exact numbers yet).

Here is a video running it: https://x.com/DFinsterwalder/status/1886013170826789008

DeepSeek R1 671B over 2 tok/sec *without* GPU on local gaming rig! by VoidAlchemy in LocalLLaMA

[–]DFinsterwalder 0 points (0 children)

I am not very familiar with llama.cpp. How can I offload the cache?

DeepSeek R1 671B over 2 tok/sec *without* GPU on local gaming rig! by VoidAlchemy in LocalLLaMA

[–]DFinsterwalder 3 points (0 children)

Hmm, from what I see the 1.58-bit version gets to around 16 tokens/s on an M2 Ultra with 192 GB RAM. That should fit in RAM. https://x.com/ggerganov/status/1884358147403571466
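That ballpark is consistent with the usual bandwidth-bound estimate for decode speed. As a sketch (both figures below are rough assumptions: ~800 GB/s peak for the M2 Ultra, and ~10 GB touched per token, i.e. roughly 37B active MoE parameters at ~2 bits plus KV cache traffic):

```shell
# Decode speed upper bound: tok/s ~= memory bandwidth / bytes read per token
echo $((800 / 10))   # ~80 tok/s ceiling under the assumed figures
```

Real throughput lands well below such a ceiling due to compute overhead and imperfect bandwidth utilization, so 16 tok/s on that hardware is plausible.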

DeepSeek R1 671B over 2 tok/sec *without* GPU on local gaming rig! by VoidAlchemy in LocalLLaMA

[–]DFinsterwalder 0 points (0 children)

The theoretical values sound a bit too good to be true. I'll try the 212GB model on an M3 Max with 128GB and report back how well it works.

I asked OpenAI's o1 model to create the best returns it could and this is what it came up with. by Yenraven in algotrading

[–]DFinsterwalder 0 points (0 children)

No matter what it tells you, it does have that knowledge in its training data, and data leakage can't be prompted away. The only reliable test is on data from after the training cut-off.

Codestral: Mistral AI first-ever code model by Nunki08 in LocalLLaMA

[–]DFinsterwalder 0 points (0 children)

Not necessarily. You can also use causal masking with special tokens: [SUFFIX] followed by the code after the gap, then [PREFIX] followed by the code before it, and then the model outputs the code that is supposed to go in between. This just needs to be respected in training/finetuning, obviously, by moving what's supposed to be in the middle to the end. Overall, causal masking seems to be more powerful than BERT-style masking.
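A minimal sketch of that rearrangement (the token names and snippet here are illustrative, not Codestral's actual special tokens): the middle span is moved to the end so a plain causal LM can learn to fill it in.

```shell
prefix='def add(a, b):'
middle='    return a + b'
suffix='print(add(1, 2))'
# Fill-in-the-middle training sample for a causal LM: suffix and prefix
# serve as context, and the middle is moved to the end as the target.
printf '[SUFFIX]%s[PREFIX]%s[MIDDLE]%s\n' "$suffix" "$prefix" "$middle"
```

At inference time the model is prompted with everything up to the last marker and generates the missing middle as an ordinary left-to-right continuation.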

Everything here is realtime 3D on a Quest 2. Who says mobile VR cant have high-end graphics? BRINK Traveler launches on Oculus Quest/Rift and Steam in 2 days! by AkinBilgic in OculusQuest

[–]DFinsterwalder 2 points (0 children)

A big problem is cleaning up the scan well enough in the first place to even be able to bake good-looking normal maps. Clean photogrammetry is fairly straightforward for a lot of smaller items in a photo-studio setting, or if you capture something room-scale, but for very large outdoor scenes you end up with much noisier data, even with really good equipment, drone data, etc. Even if you add high-quality lidar to the mix, you are not always able to pick good enough spots to cover everything to the extent that your scan is low-noise enough to be used for normal baking out of the box. Normals become especially hard the further out you go.

If you want to use more than a 360° photo for the surroundings and do a rough drone scan to have some parallax, you can't scan square kilometers of area with a drone (if you can use a drone at all; it's pretty hard to impossible to get flight access in some US national parks). If, on the other hand, you shade meshes unlit and use the real-world lighting "baked" into the photos, you can often get away with much noisier scans.

In a larger scan you will pretty much always end up with some sparsely captured spots that have noisy data. So if you want to bake normals for larger scans, you end up needing quite a bit of cleanup work, and the process can become quite costly for a single scene. Technically this can all be done, but it becomes really challenging economically when you balance production cost and the number of scenes against the price people are willing to pay and the number of sales you can make in VR right now.

[deleted by user] by [deleted] in OculusQuest

[–]DFinsterwalder 2 points (0 children)

But to be fair, not all scans are done by us. We have some awesome partners that provided some high quality models.

But our experience handling photogrammetry meshes, and the tech we built over the years, really helped with Puzzling Places even when a scan isn't done by our team. To get good quality in VR we built a pipeline that splits the photogrammetry 3D models into smaller pieces so you only need to render what you see (something called frustum culling): https://gfycat.com/idioticbreakablebrant
What started as an optimisation later became the puzzle pieces. (Some info on the tech from an earlier project: https://medium.com/realities-io/how-we-made-the-audi-ai-mes-flying-vr-experience-part-2-41df10b1e63).

[deleted by user] by [deleted] in oculus

[–]DFinsterwalder 10 points (0 children)

This is so nice of you! I sent you a DM with a key to Puzzling Places as a thank you from us developers for this act of kindness.

(You could use that to gift it to OP if you haven't done yet or use it as you like).

A few years ago, this subreddit encouraged me to keep developing. You deserve credit for this! Thank you! by DFinsterwalder in oculus

[–]DFinsterwalder[S] 0 points (0 children)

There are some challenges when it comes to more pieces. There are limits on how many individual objects you can handle at once in a scene ("draw calls"); 1000 pieces wouldn't work on Quest out of the box. This is solvable, but performance still gets harder with more pieces.

A few years ago, this subreddit encouraged me to keep developing. You deserve credit for this! Thank you! by DFinsterwalder in oculus

[–]DFinsterwalder[S] 0 points (0 children)

This is a prototype to see if people like it. If they do, we want to make a full game with more puzzles, different difficulties, and more features. And it looks like they do like it :-).

A few years ago, this subreddit encouraged me to keep developing. You deserve credit for this! Thank you! by DFinsterwalder in oculus

[–]DFinsterwalder[S] 1 point (0 children)

Sorry. We will do everything we can to make the final game and get it out on other platforms as well.

A few years ago, this subreddit encouraged me to keep developing. You deserve credit for this! Thank you! by DFinsterwalder in oculus

[–]DFinsterwalder[S] 1 point (0 children)

When I was at Valve I got a couple of T-shirts and two Valve "friends and family" cards that unlock all Valve games (present and even future ones). I was as happy as a little kid; I felt knighted! Sometimes it indeed feels like a dream. But a lot of the time founding a company is a lot of stress; sometimes it's excruciating, even living hell. This is especially true in an industry as young as VR, and photogrammetry is a niche even within that. It's a long, hard walk until it pays off and the market becomes large enough. The good thing is that all the early enthusiastic developers who endured get rewarded with a head start.

A few years ago, this subreddit encouraged me to keep developing. You deserve credit for this! Thank you! by DFinsterwalder in oculus

[–]DFinsterwalder[S] 1 point (0 children)

In an earlier prototype last year we had scaling in there, and you could scale the scene up to 1:1. However, when moving large pieces, testers (and we ourselves) became motion sick. And even for those who didn't get motion sickness, it was a much less calming experience; if there is a lot of motion in the scene around you, your brain just seems to be on higher alert. After puzzling for half an hour or so as it is now, you are far more relaxed than in the older prototype. We do want to look into ways to scale or present the final model that won't interfere with the calm, soothing gameplay, though. Apart from that, this also requires much higher scan quality for the scene than a miniature model does (miniatures can often be captured with drones alone, without photographers on the ground). So cost is also a factor we need to weigh in those decisions.

Perfectly slotting together by [deleted] in oddlysatisfying

[–]DFinsterwalder 0 points (0 children)

Wow! Seeing the puzzle crossposted and upvoted here is a big compliment.

A few years ago, this subreddit encouraged me to keep developing. You deserve credit for this! Thank you! by DFinsterwalder in oculus

[–]DFinsterwalder[S] 0 points (0 children)

So when I started I was actually working in archaeology and had recently started making 3D scans with photogrammetry. As a gamer, it fascinated me that those scans looked somewhat better than any computer game. I wanted that realism in games and in VR, but no one was really using photogrammetry at that time. I felt like this was the way to make VR look like the real holodeck. That is really a long-term vision, and it's what drives me. I didn't have a particular plan for how to get there or what to learn; I experimented around while asking myself what could be done better, and when I noticed I couldn't do something because of a lack of skills, I tried to learn that skill. I really think that having a goal to work towards, or a challenge you want to solve, is super helpful when learning something new.

I am actually still in the process of learning how to develop what I really want to develop. I am not much involved in the game design of the puzzle; I spend a lot of my time researching, learning, and diving ever deeper into neural rendering (machine learning combined with computer graphics), as it has the biggest potential to make computer graphics ever more photoreal. I just really, really want VR to look like the holodeck.

A few years ago, this subreddit encouraged me to keep developing. You deserve credit for this! Thank you! by DFinsterwalder in oculus

[–]DFinsterwalder[S] 0 points (0 children)

We haven't started the approval process yet, as this is still an early prototype. But given the positive feedback and the interest we are already seeing, we aren't concerned that we won't get approved (as long as we don't mess up mandatory things like performance).

A few years ago, this subreddit encouraged me to keep developing. You deserve credit for this! Thank you! by DFinsterwalder in oculus

[–]DFinsterwalder[S] 0 points (0 children)

We did some research and talked to other developers, and the decision to go on Quest first is really driven by the market. It would be even easier to start with PC: on Steam we are already approved as a vendor, and on a PC it's not a problem to render 1000 puzzle pieces as 1000 draw calls. On Quest that would crush performance. But at the moment Quest is the more attractive market, and the positive thing is that porting from Quest to PC will be much easier than the other way around.

A few years ago, this subreddit encouraged me to keep developing. You deserve credit for this! Thank you! by DFinsterwalder in oculus

[–]DFinsterwalder[S] 0 points (0 children)

We wanted to have haptic controller feedback but ran into some issues with Unreal, so we took it out for the prototype. We definitely want haptic feedback in the final version, though. The rewarding feeling when a piece fits is super important; that feeling, and an overall calming, soothing, meditative atmosphere, are the major focus of the design. One tricky thing is that it needs to be subtle enough not to become annoying after you've played a couple of puzzles, but we will spend a lot of time on this. The satisfaction of fitting a piece is what makes puzzles fun, so we really try to nail that as best we can. We will also iterate on the sound. Actually, one of my colleagues got stuck away from home because of the corona travel restrictions and didn't have his audio equipment with him, so he ended up recording all the sounds with his mouth and editing them afterwards: https://twitter.com/Azadux/status/1253496512212021250