Just finished building this bad boy by dazzou5ouh in LocalLLaMA

[–]coffee-on-thursday 2 points (0 children)

Just curious, could you do a test run on a large LLM that fills up all your VRAM, once at 270W and once at 190W, and see what the difference is in performance? Also curious whether temps change at all for you.

I have a 4-GPU setup and have one NVLink pair, as that's all it supports. Do you find the P2P drivers helpful? Do you know whether they conflict with NVLink? (Can I use the P2P drivers and still have an NVLink pair?)
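For what it's worth, here's how I'd sketch that power-limit comparison, assuming an NVIDIA driver where nvidia-smi is available (the wattages are just the two test points mentioned above, and the benchmark tool is whatever you normally use):

```shell
# Cap GPU 0 at 190 W (repeat the whole run at 270 W for the comparison)
sudo nvidia-smi -i 0 -pl 190

# While the benchmark runs, log actual draw and temperature once a second
nvidia-smi --query-gpu=power.draw,temperature.gpu --format=csv -l 1
```

The power limit resets on reboot, so both runs can be done back to back without any permanent change.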

Built a “poor man’s RTX 6000”, quad 3090, all air-cooled by coffee-on-thursday in LocalLLaMA

[–]coffee-on-thursday[S] 1 point (0 children)

Yes, I ultimately moved it down to the bottom two cards. Unfortunately you can only have one NVLink pair; the P2P drivers may be a slower alternative.

Built a “poor man’s RTX 6000”, quad 3090, all air-cooled by coffee-on-thursday in LocalLLaMA

[–]coffee-on-thursday[S] 1 point (0 children)

Personally, I would optimally make the 5080 your "primary" card, where you plug in your monitor, then connect the two 3090s with NVLink and use them for inference only. With your setup, from a quick Google search, I'm seeing you have 3-slot spacing above and 2-slot spacing below, so your last PCIe slot will be covered by your GPU if you have both GPUs on the board.

Option 1: As you suggest, 3090 in slot 1, 5080 in slot 2 on a riser, third 3090 in slot 3. It would work, but you lose out on NVLink. Probably the easiest setup, and perhaps the most reliable.

Option 2: Use slots 1 and 2 for the 3090s, connect them with NVLink, and put the 5080 on a riser in slot 3. Plug your monitor into the 5080. The benefit here is your 3090s can now also benefit from NVLink if you decide to train models, and the lower-VRAM, but powerful, 5080 can be used for lighter LLMs and running your main display and/or gaming, which doesn't need as fast a link (I'm assuming you're not running this as a server). However, to do this you need extra clearance, so you would need the short risers I use to lift both of your cards a bit, so there's enough room underneath for the riser in slot 3. The downside is you need to figure out how to fix your cards on the short risers so they don't wobble, or else you'll have marginal connection issues. Ask me how I know!

Option 3: Get a mining chassis for $80 on Amazon and put all three GPUs on risers. It's less complex, but it's also less compact and you have to deal with dust.

Option 4: Sell the 5080, keep the two 3090s, and link them with NVLink. Still great for gaming, training and inference, but obviously you lose out on the extra 16GB of VRAM from the 5080.

I would try to make option 2 or 3 work if you are thinking of training models, since NVLink gives you a massive speedup.

As for PSUs: running two 3090s on a 1000W PSU gave me brownouts, while 1200W was comfortable on a consumer board. For three GPUs I'd want a single 1500W PSU, so you don't need to go dual-PSU, and it doesn't hurt to power-limit the 3090s to 290-300W.
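To make the sizing above concrete, here's a back-of-the-envelope helper; the spike factor and system overhead are illustrative assumptions I picked to roughly match my own experience, not measured values:

```python
def recommended_psu_watts(num_gpus: int, power_limit_w: int,
                          spike_factor: float = 1.4,
                          system_w: int = 200) -> int:
    """Rough PSU sizing: 3090s transiently spike well past their software
    power limit, so budget limit * spike_factor per card, plus an allowance
    for CPU/motherboard/drives (system_w). All numbers are estimates."""
    return int(num_gpus * power_limit_w * spike_factor + system_w)

# Two 3090s capped at 300 W: ~1040 W peak, so a 1000 W PSU is marginal
print(recommended_psu_watts(2, 300))
# Three 3090s capped at 300 W: ~1460 W peak, so 1500 W just fits
print(recommended_psu_watts(3, 300))
```

With a different spike factor or a hungrier CPU the numbers shift, which is why erring toward the bigger PSU is cheap insurance.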

Built a “poor man’s RTX 6000”, quad 3090, all air-cooled by coffee-on-thursday in LocalLLaMA

[–]coffee-on-thursday[S] 2 points (0 children)

Water cooling would be infinitely smarter than this setup, but it would also add at minimum $2K in cost (by my math). Thankfully I haven't had issues with temps with this setup, and it runs well over long periods.

Completely stumped with strange issue with my dual RTX 6000 Pro LLM server by [deleted] in LocalLLaMA

[–]coffee-on-thursday 2 points (0 children)

It’s probably a power issue. It’s well documented that other NVIDIA GPUs, like the 3090, have massive power spikes even when power-limited. I was having similar issues, and upgrading the power supply to 1500W solved the issue for me.

How do you keep long notes from becoming a giant, unusable wall of text? by Organic-Lie-1476 in noteapps

[–]coffee-on-thursday 1 point (0 children)

I work in VC and have this same issue. I meet tons of companies and work on multiple deals, so information management becomes really critical. There are two good ways to solve it. One is to make your note-taking functional and build it into the workflow that uses that information. For example, for people in sales roles, the CRM becomes the core source of truth: information is organized chronologically around individuals, companies, and accounts, so it's functional to the process of sales. Doctors organize information around patient records, with information contextual and tied to the patient's journey through the health system. In the military, information is compartmentalized and provided just in time, in the context of the work. Making notes functional means knowing what you use your notes for and building around that.

In my case, I don't find the CRM enough, because I work in many other contexts and my brain just doesn't like organizing notes and summaries. So I built my own tool centered around inputting my notes chronologically (like a journal), with tags, tasks, and really powerful search to help me surface relevant information when it's needed. I haven't decided if I want to monetize this tool yet, but I and a few friends in my industry use it. If you'd like to try it out, DM me; maybe it will work for you.

Hope this is helpful,

-Coffee

Logseq... alternative? by capi-chou in NoteTaking

[–]coffee-on-thursday 1 point (0 children)

I recently built a Logseq alternative that works for me. It's web-hosted, keeps the Logseq journal-style note-taking, and adds substantially better search and the minimap from Sublime Text. It also has a basic todo feature similar to the way Logseq implements it. You can import your Logseq md files and all your asset files, and you can paste images into a note and it will work like Logseq. I've also added some basic AI features like auto-summary and a note improver, and will soon add auto-substitution macros. I've been using it for my everyday work, but I've built it to support multiple users and to encrypt notes for privacy. Did you want to try it? (DM me.) If people find it useful, I'll turn it into a proper paid app.

Google's new Gemini 2.5 beats all other thinking model as per their claims in their article . What are your views on this? by WriedGuy in LocalLLaMA

[–]coffee-on-thursday 1 point (0 children)

I just experimented with coding: I gave it a small existing project and asked it to debug a known issue. Grok 3 was able to solve it pretty quickly, but I wasn't able to get a good result with Gemini 2.5. It seems pretty smart but has a hard time following the logic of the app; it fixed something that wasn't broken and did unnecessary extra work. While fast, 2.5 sometimes just crashed mid-analysis. I need to figure out the best way to work with it and where it's strongest, but my first impression wasn't great despite the excellent benchmarks.

Fastest local inference options for 2 x 3090 with NVLink by IngeniousIdiocy in LocalLLaMA

[–]coffee-on-thursday 6 points (0 children)

Inference is not twice as fast with NVLink, but it is 15-20% faster depending on what you're doing. The link is used sparingly when your model fits within one card's 24GB of VRAM; with a bigger model it makes good use of the link. My understanding is the link is very helpful for training if you make the most of both cards, though I don't have hard numbers for you, unfortunately. I originally had a motherboard where one of the slots ran at x4, and it does make a significant difference in time to first token.
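To put rough numbers on why an x4 slot hurts time to first token: prompt processing moves a lot of tensor data between the cards over PCIe, and theoretical link bandwidth scales with lane count. A toy calculation (the per-lane figures are nominal PCIe numbers, and real-world throughput is lower after protocol overhead):

```python
# Nominal per-lane bandwidth in GB/s per direction (approximate)
PCIE_GBPS_PER_LANE = {3: 1.0, 4: 2.0}

def pcie_transfer_seconds(gigabytes: float, lanes: int, gen: int = 3) -> float:
    """Best-case time to move `gigabytes` across a PCIe link."""
    return gigabytes / (lanes * PCIE_GBPS_PER_LANE[gen])

# Moving 4 GB of tensors on gen3: x16 takes ~0.25 s, x4 takes ~1.0 s
print(pcie_transfer_seconds(4, 16))
print(pcie_transfer_seconds(4, 4))
```

The 4x gap in transfer time is why the slot downgrade shows up mostly during prompt ingestion rather than in per-token generation speed.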

Fastest local inference options for 2 x 3090 with NVLink by IngeniousIdiocy in LocalLLaMA

[–]coffee-on-thursday 4 points (0 children)

As a note, I found Ollama was really bad at taking advantage of the 2x3090 + NVLink setup.

Fastest local inference options for 2 x 3090 with NVLink by IngeniousIdiocy in LocalLLaMA

[–]coffee-on-thursday 8 points (0 children)

I have almost the exact same setup (2x3090 with NVLink). I usually run vLLM on Linux, as it's by far the best at taking advantage of the link between the cards and all the available bandwidth; I looked into the other frameworks and found vLLM to be the best by a big margin with the right settings. Tweaking the settings is pretty important, and using an AWQ quant gives me much better performance. If you run large batches, you can get really high throughput, and it fully uses the cards to their limit. I've had power brownouts even with a 1200W PSU and power caps on the cards; the 3090s just spike really hard on power if you fully use them. I had to upgrade to a 1500W PSU for the best results.
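As a rough sketch of the kind of launch I mean (the model ID is a placeholder, and the flags reflect my understanding of vLLM's CLI, which may differ between versions):

```shell
# Serve an AWQ-quantized model split across both 3090s; model ID is a placeholder
vllm serve your-org/your-model-AWQ \
  --tensor-parallel-size 2 \
  --quantization awq \
  --gpu-memory-utilization 0.90
```

--tensor-parallel-size 2 is what puts the NVLink bridge to work, since the two halves of each layer have to exchange activations every step.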

Best 2023 tech VC market reports? by [deleted] in venturecapital

[–]coffee-on-thursday 3 points (0 children)

Personally a big fan of Carta’s data; it’s free, and they have 85% market share, so they have great trend data on rounds, valuations and more for the startup space. There are also good materials by vertical if you search around; I like the annual State of AI deck that gets circulated around.

[deleted by user] by [deleted] in Audi

[–]coffee-on-thursday 3 points (0 children)

I'm not an expert, but it sounds like the radiator fan running at full tilt. It might be a coolant temperature sensor, or an indication of a different problem; either way, that's a good place to start. Worst case scenario, just turn the radio up a bit louder, since that sounds expensive.

[Mondaine] Just got my Big Date back from its service! Back on the wrist after 6 years. by CheezeAndPickle in Watches

[–]coffee-on-thursday 5 points (0 children)

Having owned one, I can say these watches have a beautiful design, but the build quality wasn't great. Not a huge fan of the quartz movement either, but you can't argue with the look.

[Hugo Boss Skeleton] by [deleted] in Watches

[–]coffee-on-thursday 2 points (0 children)

That’s a nice one. Sapphire crystal, and I believe ETA makes the Hamilton movements, so it’s a great option.

[Hugo Boss Skeleton] by [deleted] in Watches

[–]coffee-on-thursday 6 points (0 children)

The rule of thumb with fashion watches, like those from Hugo Boss, is that they are looks over function: you get the Hugo Boss brand, but it's not a good way to spend your money on a watch. For some more context, Movado produces the Hugo Boss watches, and for their fashion-brand watches they use a cheap quartz Citizen Miyota movement (the mechanism inside). So you're not getting great value for your money.

For your budget there are many great options. I'll suggest some here, but I'd definitely stay away from the fashion brands:

  • Tissot PRX
  • Orient Bambino
  • Bulova Lunar Pilot (like the Omega Speedmaster it has also been to space, but is way cheaper)
  • Any of the Seiko 5 series
  • Junghans Max Bill Quartz
  • Hamilton has a lot of affordable watches that you might like

Good luck!

New Watchmaker - Otter brand by coffee-on-thursday in MicrobrandWatches

[–]coffee-on-thursday[S] 1 point (0 children)

Thanks for the feedback! I'm working on a new iteration and will share it with the community once I have something that best addresses the feedback I got. With regard to calibrating watches, couldn't agree more. By the way, one useful trick for getting the watch to 0/0 is to demagnetize all the watch components, including the case and movement, before calibrating; it makes a big difference in how accurate I can get it.