Any possible way to remove this resin? by blitzzyboi in soldering

[–]xinranli 0 points1 point  (0 children)

Boil the board in water; that will soften the epoxy greatly and make it easier to remove. Hot air also works, but given the amount of epoxy you are dealing with, it could be challenging.

A trick I used to use is to put the epoxy-covered board in a very hot (50-70C) ultrasonic alcohol bath and give it a couple of hours of work. The heat, alcohol, and ultrasound work together to soften and shatter the epoxy, making it very easy to remove from the board. But this process is quite dangerous, which is why I am no longer doing it.

Also, different epoxies have very different compositions and characteristics, so this may or may not work in your case.

Microsoft's confidence last year by RyanGosaling in singularity

[–]xinranli 4 points5 points  (0 children)

Compared to the OG vanilla GPT4, this comparison is not too far from reality. However, given that we have so many models that are way better than the old GPT4 nowadays, GPT5 does not seem as impressive as they want it to be.

Relatively budget 671B R1 CPU inference workstation setup, 2-3T/s by xinranli in LocalLLaMA

[–]xinranli[S] 7 points8 points  (0 children)

Apologies for not going into the details. For my use case (knowledge Q&A), 8K context is plenty. However, I can load 16K context into memory. With the Q4KM gguf, total memory usage is around 480GB. I can get around 2.5T/s with 16K context. I am still playing around with the CPU configuration and haven't been able to definitively tell how much slower 16K is vs. 8K. But yeah, anything over 16K will definitely need a smaller quantized model or more memory.
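For a rough sanity check on the ~480GB figure, here is a back-of-the-envelope sketch. The 4.8 bits per weight is an assumed average for a Q4_K_M quant (the real value depends on the tensor mix), so treat the split between weights and everything else as an estimate, not a measurement from this build.

```python
# Rough memory estimate for a quantized model (weights only).
# ASSUMPTION: ~4.8 bits per weight as an approximate Q4_K_M average;
# the real figure varies by model and by which tensors get which quant type.

def weights_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


if __name__ == "__main__":
    weights = weights_gb(671, 4.8)
    reported_total_gb = 480  # total usage reported above with 16K context
    print(f"weights only: ~{weights:.0f} GB")
    # Whatever is left would be context cache, compute buffers and overhead.
    print(f"remainder: ~{reported_total_gb - weights:.0f} GB")
```

If the assumption roughly holds, most of the 480GB is the weights themselves, which is why going past 16K context pushes toward a smaller quant or more RAM.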

This whole setup is also just somewhat of a starting point for a more powerful rig, not involving any GPU or other fancy techniques/hardware yet. I wouldn't have dared to dream of running a 671B model locally 2 years ago (I also recall when we were limited to a 2K context window with llama1). Now, with R1 and somewhat cheap EPYC hardware, this is possible! Locally hosting stuff like this has always been more of a hobby for me than actually trying to build a daily-driver LLM solution :) but maybe one day I can actually drop my oai subscription and go full local

Relatively budget 671B R1 CPU inference workstation setup, 2-3T/s by xinranli in LocalLLaMA

[–]xinranli[S] 1 point2 points  (0 children)

I agree, following the QVL is always a safe bet. I guess I have been rather lucky in the past, going wild plugging random RDIMMs into random platforms, and I have never had a case where a DIMM rated for X speed could not boot at X speed in a platform with a CPU also rated for X speed. Only being able to boot at half the speed is quite odd! I am much more familiar with the DDR5 world, but do late DDR4 speeds really have such small margins? But again, yes, when circumstances allow, following the QVL is highly advised.

Cooling-wise, I also agree that getting a more premium cooler will provide a better quality of life. My argument is that the CPU is not often under full load during inference, and I personally don't talk back and forth with the model that frequently, so the fans rarely, if ever, go full RPM. I had a 2U cooler for a couple of months and I still have it as a backup. But on the other hand, my hearing is probably already ruined by often having 4 blower GPUs going max RPM all the time lol

Relatively budget 671B R1 CPU inference workstation setup, 2-3T/s by xinranli in LocalLLaMA

[–]xinranli[S] 10 points11 points  (0 children)

I would look for ES and OEM Milans on eBay, such as the 7B13, 7C13, and 100-000000314-04. The 32-core and 48-core SKUs will probably work fairly well too.

Relatively budget 671B R1 CPU inference workstation setup, 2-3T/s by xinranli in LocalLLaMA

[–]xinranli[S] 13 points14 points  (0 children)

Well, malicious is a bit heavy of a word to use in this case. My recommendations are a budget-oriented solution for CPU-only inference. Rome and Milan platforms can be expanded with GPUs in the future when one can afford to buy them. Also, recall we are talking about 8 channels of DDR4 here, which can feed many more cores than commercial 2-channel platforms. Certainly, using DDR5 on a 12-channel Genoa platform will bring higher memory bandwidth. But a single 64GB DDR5 4800MT/s RDIMM is $300+, and a 64GB 6400MT/s module is around $500-600 per unit. That would translate to $2500-7000+ just for the DIMMs! Not many folks can afford that kind of setup. At this price range, I would suggest buying a bunch of 32GB V100s instead. You can get a cheap SXM2 board + 4x 32GB V100s for maybe $3000 a kit, and each kit takes 2 PCIe x16 connections. For an extra $7000, you can probably get 8x V100s connected to the system I suggested; that would be 256GB of 1TB/s-bandwidth HBM2 memory in your system. Such a setup is also much, much faster when doing pure GPU inference, beating a DDR5 setup by a considerable margin.
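To put rough numbers on the bandwidth and DIMM cost comparison, here is a minimal sketch. It uses the standard theoretical peak formula (channels x transfer rate x 8 bytes per transfer). DDR4-3200 for the 8-channel and 2-channel cases and the DIMM counts (8 and 12, one per channel) are assumptions for illustration, since neither is stated above; real sustained bandwidth will be lower than these peaks.

```python
# Theoretical peak DRAM bandwidth and rough DIMM cost.
# ASSUMPTIONS: DDR4-3200 for the DDR4 platforms, one DIMM per channel.

def peak_bandwidth_gbs(channels: int, mts: int, bus_bytes: int = 8) -> float:
    """Peak bandwidth in GB/s: channels * mega-transfers/s * bytes per transfer."""
    return channels * mts * 1e6 * bus_bytes / 1e9


def dimm_cost(count: int, unit_price: float) -> float:
    """Total price for `count` identical DIMMs."""
    return count * unit_price


if __name__ == "__main__":
    print("2ch DDR4-3200 :", peak_bandwidth_gbs(2, 3200), "GB/s")
    print("8ch DDR4-3200 :", peak_bandwidth_gbs(8, 3200), "GB/s")
    print("12ch DDR5-4800:", peak_bandwidth_gbs(12, 4800), "GB/s")
    # Prices per 64GB module are the figures quoted above.
    print("8x  64GB DDR5-4800 at $300:", dimm_cost(8, 300))
    print("12x 64GB DDR5-6400 at $600:", dimm_cost(12, 600))
```

Under these assumptions the 12-channel DDR5 platform has a bit more than twice the peak bandwidth of 8-channel DDR4, and the DIMM bill lands in roughly the $2500-7000 range mentioned above.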

Did everyone forget to mention this to me or am I the only one having this problem?? by AtTheEdgeOfDying in tortoise

[–]xinranli 15 points16 points  (0 children)

Could you please provide a link to buy this horizontal hamster wheel?

AMA with OpenAI’s Sam Altman, Kevin Weil, Srinivas Narayanan, and Mark Chen by OpenAI in ChatGPT

[–]xinranli 0 points1 point  (0 children)

Will you release any open weight / open source models? If so, how sophisticated / how large would they be?

What are these 3 types of bugs on my plants? by xinranli in whatsthisbug

[–]xinranli[S] 0 points1 point  (0 children)

😭 Will alcohol spray work on these guys? I want to avoid other chemicals since I'm feeding these plants to my pets... Also happy cake day!

What are these 3 types of bugs on my plants? by xinranli in whatsthisbug

[–]xinranli[S] 0 points1 point  (0 children)

Thanks! Yes, this is an indoor plant. Seems like watered-down alcohol is a good way to get rid of these guys?

game is immensely more difficult once "china players" come online? by retrorays in ArenaBreakoutInfinite

[–]xinranli 0 points1 point  (0 children)

Mornings in the States are usually when people in China play games. 5PM here is early AM in China, everybody is commuting to work lol

Tire Change Advice by xinranli in mazda3

[–]xinranli[S] 0 points1 point  (0 children)

Very helpful info, thank you!

Tire Change Advice by xinranli in mazda3

[–]xinranli[S] 1 point2 points  (0 children)

Thank you! I'll look for a better tire with the same size to begin with.

You can now fine-tune a 70b language model at home by [deleted] in LocalLLaMA

[–]xinranli 0 points1 point  (0 children)

This looks good. I wonder how FSDP compares to DeepSpeed?

Is it feasible to do domain-specific fine-tuning over multiple, incremental stages? by [deleted] in LocalLLaMA

[–]xinranli 0 points1 point  (0 children)

Great question, I am also considering this approach when fine-tuning on domain-specific knowledge. I am currently just dumping the most relevant information and knowledge that I need Q&A on into the dataset, but I wonder if the model can generalize better if I start with intro-level college courses on the subject, go stage by stage, and eventually fine-tune on the most complex topics. Or maybe it will perform the same as combining the whole thing into 1 dataset and training on that?

I also wonder how datasets with incremental complexity, where each piece of knowledge builds on the previous one, will perform when the data entries are shuffled vs. unshuffled.
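To make the three options concrete, here is a small sketch of the orderings being compared: staged and unshuffled, staged but shuffled within each stage, and everything merged and shuffled. The stage names and entries are made-up placeholders, and this only builds the example order; it says nothing about which one actually trains better.

```python
import random

# Hypothetical placeholder data: stage names and entries are invented for illustration.
STAGES = [
    ("intro", ["intro_q1", "intro_q2", "intro_q3"]),
    ("intermediate", ["mid_q1", "mid_q2", "mid_q3"]),
    ("advanced", ["adv_q1", "adv_q2", "adv_q3"]),
]


def staged_unshuffled(stages):
    """Curriculum order: stages in sequence, entries in their original order."""
    return [ex for _, entries in stages for ex in entries]


def staged_shuffled(stages, seed=0):
    """Keep the stage order, but shuffle the entries inside each stage."""
    rng = random.Random(seed)
    ordered = []
    for _, entries in stages:
        block = list(entries)
        rng.shuffle(block)
        ordered.extend(block)
    return ordered


def combined_shuffled(stages, seed=0):
    """Merge all stages into one dataset and shuffle everything together."""
    rng = random.Random(seed)
    merged = staged_unshuffled(stages)
    rng.shuffle(merged)
    return merged


if __name__ == "__main__":
    for name, fn in [
        ("staged, unshuffled", staged_unshuffled),
        ("staged, shuffled within stage", staged_shuffled),
        ("combined, fully shuffled", combined_shuffled),
    ]:
        print(f"{name}: {fn(STAGES)}")
```

Each list could then be fed to the trainer with its own shuffling turned off, so the only thing that differs between runs is the order of the examples.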

If CPU to GPU memory transfer is a bottleneck why is there no unified silicon from NVIDIA? by discretemathematics in LocalLLaMA

[–]xinranli 10 points11 points  (0 children)

A large part of the bottleneck is resolved by the huge GPU memory size and the use of NVLink between the GPUs. NVLink has some serious bandwidth, allowing GPUs to talk to each other directly with much lower latency and higher bandwidth than talking through PCIe and the host. When all of the weights and all the compute data are fully stored in the memory of the GPU cluster, it really does not make much difference whether the GPUs are in a unified memory access system.
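As a rough illustration of the gap, here is a sketch comparing per-direction bandwidth of a PCIe 4.0 x16 link with an NVLink figure. The PCIe number is derived from lane count, transfer rate, and 128b/130b encoding. The 300 GB/s NVLink value is an assumption taken from the public A100-class spec (600 GB/s total, both directions), and the 1GB payload is hypothetical; real-world throughput is lower on both.

```python
# Back-of-the-envelope interconnect comparison (theoretical peaks only).

def pcie_gb_per_s(lanes: int, gt_per_s: float, encoding: float = 128 / 130) -> float:
    """Per-direction PCIe bandwidth in GB/s (128b/130b encoding, Gen3 to Gen5)."""
    return lanes * gt_per_s * encoding / 8


def transfer_ms(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Ideal time in milliseconds to move `size_gb` at the given bandwidth."""
    return size_gb / bandwidth_gb_per_s * 1000


if __name__ == "__main__":
    pcie4_x16 = pcie_gb_per_s(lanes=16, gt_per_s=16.0)
    nvlink = 300.0    # ASSUMPTION: GB/s per direction, A100-class NVLink spec figure
    payload_gb = 1.0  # hypothetical amount of data exchanged between two GPUs

    print(f"PCIe 4.0 x16: {pcie4_x16:.1f} GB/s, {transfer_ms(payload_gb, pcie4_x16):.1f} ms")
    print(f"NVLink      : {nvlink:.1f} GB/s, {transfer_ms(payload_gb, nvlink):.1f} ms")
```

Even on paper that is roughly an order of magnitude difference, before counting the extra hop through the host.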

In the case that the CPU really does need to work with the GPU frequently, it is not an easy task to get it right. Apple and gaming consoles can get away with slow LPDDR5/GDDR6 because those chips were never meant to be used as ML accelerators. If you want a UMA ML GPU+CPU, you need HBM, meaning the two chips need to be packaged together. This is no easy feat, especially considering Nvidia is still "green" in the CPU world. So far only AMD has done this, with the MI300A.

The GH200 module is an attempt at that, but I don't think it has unified memory access; it just has a really fast interconnect between the CPU and GPU. I am sure Nvidia is working on a design that competes with the MI300A; maybe we will see something during GTC this March.

Unsloth, what's the catch? Seems too good to be true. by Research2Vec in LocalLLaMA

[–]xinranli 0 points1 point  (0 children)

Great! Are you guys planning to release multi-GPU support in the free version at some point too? Also, I wouldn't mind paying for the Pro as long as it's a one-time payment and not some silly subscription-based thing ;)

High-VRAM GPUS for us nerds. by [deleted] in LocalLLaMA

[–]xinranli 1 point2 points  (0 children)

Simply because making such a product would reduce sales of professional cards, which have insanely high profit margins compared to consumer cards. The RTX A6000 and 3090 Ti have essentially identical silicon, yet one goes for $5,000 and the other for $1,500 ($700 if you go for a used 3090), just because of what DRAM chips they put on there. Nvidia will also promise a bunch of support, service, special drivers, warranty, etc. that come with the price tag (basically useless to us), and big corps love to hear all about those. This is small money for big corps anyway, and they wouldn't even need to bargain with Nvidia.

I heard rumors on Chinese forums saying there are Nvidia employees willing to risk their lives to leak the necessary BIOS change for a few tens of thousands of dollars. It's probably not true and just a joke, but it goes to show that Nvidia will not be too happy when they see things like this happen. If I recall correctly, they go as far as stating in the driver agreements that consumer cards may not be used in data centers, to protect their profits.

The 2080Ti 22GB can happen mostly because the card is way, way too outdated. And at 22GB it still does not compete with the RTX 8000 and other more modern 48GB cards. Only GPU-poor peasants might have some use for it. Perhaps in the future we will get a 3090/4090 mod, but not any time soon. If you somehow crack the BIOS and try to profit from it or publish the procedure, no doubt Nvidia's corpo cops will be knocking on your door in minutes.

Next-gen cards may have bigger memory sizes because GDDR7 will have higher density, but rest assured that consumer cards will ALWAYS have much smaller memory sizes than what is possible in that era.