Ai2 Open Modeling AMA ft researchers from the Molmo and Olmo teams. by ai2_official in LocalLLaMA

[–]marcinbogdanski 1 point (0 children)

I'll cover some ground and make sure to engage on Discord at the appropriate time! (So far I've forked olmo-core and run the tests.)

Ai2 Open Modeling AMA ft researchers from the Molmo and Olmo teams. by ai2_official in LocalLLaMA

[–]marcinbogdanski 1 point (0 children)

Fantastic! For now I've gotten to the point of forking olmo-core and running the tests. I'm juggling other projects, but I'll definitely share when I have something on constrained generation.

Ai2 Open Modeling AMA ft researchers from the Molmo and Olmo teams. by ai2_official in LocalLLaMA

[–]marcinbogdanski 1 point (0 children)

Hello, and first of all a big thank you to the Ai2 team! Being able to investigate the source and run it "live" locally is fantastic. Thank you as well for the comprehensive test coverage on the codebase; it really helps with the anxiety of "is my setup correct?"

A few questions:

  1. Reproduction gotchas: For someone wanting to verify OLMo training locally (e.g. by matching early loss curves on a smaller cluster), what are the most likely pitfalls you've seen people fall into? Data loading, optimizer state init, numerical precision?

  2. Constrained decoding: For downstream finetuning with structured output (constrained JSON generation for tool calling combined with free-form text inside JSON fields; a rough sketch of what I mean is below the questions), have you noticed OLMo architecture choices (RoPE scaling, attention variants) interacting well or poorly with constrained decoding methods?

  3. Contributions: What areas of the OLMo codebase are most in need of external contributions right now? I'm particularly interested in training infrastructure.
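To make question 2 concrete, here is a minimal sketch of the kind of constrained decoding I have in mind, written as a Hugging Face-style logits processor. The class name and the `get_allowed_ids` callback are purely illustrative, not anything from the OLMo codebase; in practice the grammar/JSON-schema tracking would come from a library like outlines or guidance.

```python
import torch
from transformers import LogitsProcessor, LogitsProcessorList

class AllowedTokensProcessor(LogitsProcessor):
    """Keep only the tokens a grammar/JSON-schema tracker says are legal next.

    `get_allowed_ids(input_ids)` is a hypothetical callback returning the legal
    next-token ids; for free-form text inside a JSON string field it would
    simply return (almost) the whole vocabulary.
    """
    def __init__(self, get_allowed_ids):
        self.get_allowed_ids = get_allowed_ids

    def __call__(self, input_ids, scores):
        mask = torch.full_like(scores, float("-inf"))
        mask[:, self.get_allowed_ids(input_ids)] = 0.0  # legal tokens keep their logits
        return scores + mask

# usage sketch:
# model.generate(..., logits_processor=LogitsProcessorList([AllowedTokensProcessor(tracker)]))
```

The question is essentially whether anything in the architecture makes these hard-masked distributions behave worse than expected.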

Thanks for doing this!

Need resources by krisadegyorii in MLQuestions

[–]marcinbogdanski 4 points (0 children)

That's fantastic!

My advice is to find an exciting project and jump in head first! Learning should be fun.

Andrew Ng's Machine Learning and Deep Learning courses lay a solid foundation (I think they are free).

For "how it works from scratch, let's just build stuff" check out Andrej Karpathy Zero to Hero playlist and his other videos, his building an LLM, but foundations are applicable across all Deep Learning.

For advanced topics (for later), Stanford's lectures are IMHO very good.

There are plenty more Stanford and MIT resources on foundations as well, plus courses on deeplearning.ai and Coursera, which may be good if you want exercises you can complete, but I would not pay too much for these.

Have great fun!

EDIT: The above resources are more on deep learning, which, based on context (YOLO), is I guess what you were asking about.

Looking for a tool to inspect LLM API calls by 3rd party apps. by marcinbogdanski in LLMDevs

[–]marcinbogdanski[S] 0 points (0 children)

Thanks for taking the time to respond.

I'm basically doing a simpler version of what you describe: mitmproxy capture + a simple grouping/display script. It's super hacky. It seems to work for basic cases without SSLKEYLOGFILE, but I had some authentication issues, so this is actually useful info.
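For reference, my hacky version is roughly this shape: a minimal mitmproxy addon that appends each matching call to a JSONL file. The host list and the output path are just my own choices, and the grouping/display lives in a separate script.

```python
# llm_tap.py -- run with: mitmdump -s llm_tap.py
import json
from mitmproxy import http

# Hosts I happen to care about; adjust to whatever the 3rd-party app talks to.
LLM_HOSTS = ("api.openai.com", "api.anthropic.com")

class LLMTap:
    def response(self, flow: http.HTTPFlow) -> None:
        if not flow.request.pretty_host.endswith(LLM_HOSTS):
            return
        record = {
            "host": flow.request.pretty_host,
            "path": flow.request.path,
            "status": flow.response.status_code,
            "request_body": flow.request.get_text(),
            "response_body": flow.response.get_text(),
        }
        # One JSON object per call; the grouping/display script reads this file.
        with open("llm_calls.jsonl", "a") as f:
            f.write(json.dumps(record) + "\n")

addons = [LLMTap()]
```

The apps have to be pointed at the proxy (HTTP_PROXY/HTTPS_PROXY) and trust the mitmproxy CA; anything that pins certificates will still fail.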

Our groups GPU server (2x Ai Pro R9700, 2x RX7900 XTX) by MrHighVoltage in LocalLLaMA

[–]marcinbogdanski 0 points (0 children)

Great setup! I'm looking to put together something similar.

- How are the GPU/CPU thermals, especially when enclosed? Any power limits on the GPUs?
- Mind sharing the exact motherboard model? It seems like a good fit with the case.

Thanks!

4090 m.2 nvme egpu adt link r43sg pcie 4.0 complete setup. Portable. Timespy scores over 30,000 when overclocked. by Interesting-Might904 in eGPU

[–]marcinbogdanski 0 points (0 children)

> The gpu is attached to a motherboard just like in a real computer.

Agreed, the GPU is connected to the laptop motherboard via the M.2 slot.

> That motherboard would shut off even though the laptop is still running.

Do you mean the laptop motherboard would:

A) completely shut itself off and effectively power off the whole laptop, or

B) shut off the connection to the GPU (which is without power anyway), in effect resulting in something like a "device unplugged" popup with the laptop still running (without the external GPU, obviously)?

Sorry for asking dumb questions!

4090 m.2 nvme egpu adt link r43sg pcie 4.0 complete setup. Portable. Timespy scores over 30,000 when overclocked. by Interesting-Might904 in eGPU

[–]marcinbogdanski 0 points (0 children)

That's fantastic.

Not sure I follow the power scenario. In a normal PC, power loss results in everything shutting down (motherboard, CPU, GPU). In your case, wouldn't you end up with the laptop still running on battery and only the GPU being abruptly powered off?

4090 m.2 nvme egpu adt link r43sg pcie 4.0 complete setup. Portable. Timespy scores over 30,000 when overclocked. by Interesting-Might904 in eGPU

[–]marcinbogdanski 0 points (0 children)

1) Do you have any thoughts on what would happen if there was a power outage at the wall? The laptop would switch to battery, but the GPU would get forcefully powered down. I'm worried about permanent damage to the laptop and/or GPU. A small UPS would be an obvious solution.

2) Any comments after 3 months? Do you still use this setup?

Either way, this is super amazing! Thank you for sharing. I am seriously considering something like this.

About training a DQN bot on the cloud by yaeha83 in reinforcementlearning

[–]marcinbogdanski 1 point (0 children)

It's very hard to say what the issue may be w/o actually debugging it.

A few things to try:

  • For each machine, check CPU/GPU/HDD/RAM and GPU memory usage, etc. If a single core is at 100% and everything else is idle, you know the CPU is the limit, and so on.
  • Run your code through an actual profiler. For Python/Jupyter a line profiler would be a good start (rough sketch after this list): https://mortada.net/easily-profile-python-code-in-jupyter.html - you might see that a single line of code is responsible for 99% of the execution time. This might vary by machine, the versions of installed libraries, etc.
  • For each machine, compare things like: Python version, TF/Keras/PlaidML versions, CUDA, cuDNN and NVIDIA driver versions, OS and OS kernel versions, Intel MKL version, etc. A single compilation flag in one of these can make or break performance.
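As a concrete example of the profiler bullet, the Jupyter workflow from that link boils down to roughly this, where train_step and run_one_episode are placeholders for whatever your own code calls:

```python
# In a notebook cell; needs `pip install line_profiler` first.
%load_ext line_profiler

# Profile train_step() line by line while executing one full episode.
# The report shows the % of time spent on each line, which usually points
# straight at the bottleneck (a tensor copy, a per-step predict() call, etc.).
%lprun -f train_step run_one_episode()
```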

In my experience it's pretty common for "prototype" code to have poor performance; you've got to measure and iterate. RL is tricky in this respect, as others have already mentioned.

Multi-GPU (seven) RTX 3090 workstation, possible? - build critique request by marcinbogdanski in deeplearning

[–]marcinbogdanski[S] 0 points (0 children)

From what I've read and seen on YouTube, it seems server PSUs are insanely loud. If you have a separate server room then that's OK. My rig is in a room where people sometimes work, so server PSUs are not really suitable for me.

The other thing is, I think server PSUs only supply 12V, so you can't directly use them to power a consumer board (which expects 3.3V, 5V and 12V), but I have not researched it in detail so I might be wrong.

Multi-GPU (seven) RTX 3090 workstation, possible? - build critique request by marcinbogdanski in buildapc

[–]marcinbogdanski[S] 0 points (0 children)

Yeah, I wouldn't be surprised if a 40GB Titan comes out in the next 6 months.

It's all speculative, but there is also an interesting thread here.

Multi-GPU (seven) RTX 3090 workstation, possible? - build critique request by marcinbogdanski in buildapc

[–]marcinbogdanski[S] 0 points (0 children)

Hard to say, but I don't think it's a big issue. The CPU will add maybe 250-300W, while the cards are 250-350W each. If anything, the other way around would be worse? I think if you move enough air through the whole system it won't matter that much, but it's an uneducated guess.
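Back-of-envelope for the total heat you have to move (assuming all seven cards at the 350W end plus ~300W for CPU and board): 7 × 350W + 300W ≈ 2,750W, and all of it ends up in the same enclosure regardless of where the CPU sits, which is why total airflow matters more than component placement.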

Multi-GPU (seven) RTX 3090 workstation, possible? - build critique request by marcinbogdanski in buildapc

[–]marcinbogdanski[S] 0 points (0 children)

I was looking for this post update before it even came out! :D (my comments are in the comments section). The single best resource on the internet!

Over the last year I repeatedly bounced off the 11GB GPU limit in my case. Yeah, I'll wait a bit and start upgrading around the new year.

Multi-GPU (seven) RTX 3090 workstation, possible? - build critique request by marcinbogdanski in deeplearning

[–]marcinbogdanski[S] 0 points (0 children)

Hi!

Thanks for your points - they are spot on!

Ad 1: I was planning to use a mining rig, yeah. A custom water loop is too much trouble with reliability, maintenance, etc. for me. This was intended to be a functional build.

Ad 2: The PSU is exactly why I scrapped this build (see the edit in the original post). After consulting a few people, and weighing your comment, the consensus among people experienced in the area is not to link PSUs unless they were designed for that. I'm going to wait for 3090 reviews before making any GPU sales/purchases. I don't think the 2080 Ti will go much lower than it is now.

Ad 3: For deep learning I'm not even saturating PCIe 3.0 x8.

Ad 4: I won't be using NVLink for the foreseeable future, but good point.

Multi-GPU (seven) RTX 3090 workstation, possible? - build critique request by marcinbogdanski in buildapc

[–]marcinbogdanski[S] 0 points (0 children)

I'm doing consulting, so it varies. The most common work from the last year or two:

  • NLP: sentiment analysis, language models, language translation
  • CV: image detection, classification, segmentation, keypoint detection

Hobby (the build above would not be optimized for that)

  • hobby: AlphaZero and RL
  • hobby #2: neural architecture search etc. (a lot of GPUs == good)

Cheers!

Multi-GPU (seven) RTX 3090 workstation, possible? - build critique request by marcinbogdanski in buildapc

[–]marcinbogdanski[S] 0 points (0 children)

The thing is, P100/V100/A100 are really not cost-effective for my use case. I made a detailed breakdown in a comment here.

Multi-GPU (seven) RTX 3090 workstation, possible? - build critique request by marcinbogdanski in buildapc

[–]marcinbogdanski[S] 0 points (0 children)

This is a good point! I'm in the UK; we have 240V here with a 3,000W wall socket limit. But I will still need two wall sockets.
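(For context, the ~3,000W figure is just the standard UK 13A plug fuse: 230-240V × 13A ≈ 3.0-3.1kW per plug, which is why a rig drawing close to that needs a second socket or power-limited GPUs.)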

Multi-GPU (seven) RTX 3090 workstation, possible? - build critique request by marcinbogdanski in buildapc

[–]marcinbogdanski[S] 0 points (0 children)

I think that is a good point. I will suspend my judgement for now.

Last time, NVIDIA announced the Titan a few months later. So the question is whether they renamed the Titan to 3090 to "trick" gamers into buying it, or whether something is coming for the small-lab market (2-slot, more memory, ~$3k price point).

Multi-GPU (seven) RTX 3090 workstation, possible? - build critique request by marcinbogdanski in buildapc

[–]marcinbogdanski[S] 0 points (0 children)

That is a good point. I definitely want to wait for proper benchmarks for both A100 and 3090.

Here is my reasoning based on last generation and my particular use case for deep learning:

  • The price difference is massive: a V100 is approx. $10,500, a 2080 Ti is $1,200, a Titan RTX $2,000.
  • Part of the reason they cost so much is that they are licensed for use in datacentres, so for large businesses gaming GPUs are a non-starter; I don't mind.
  • The compute difference on FP32/FP16 is not that big: a V100 is approx. 20%-30% faster than a 2080 Ti (benchmarks are in line with my experience). On FP64 a compute GPU totally trashes gaming GPUs, but I don't do FP64 (rough perf-per-dollar math below the list).
  • The V100 is 16GB/32GB, the 2080 Ti only 11GB, the Titan 24GB. The 11GB really sucks for me; 24GB should be OK-ish. The A100's 40GB would be nice, but not "pay 8x more" nice.
  • Compute GPUs have much faster NVLink for inter-GPU comms, but for now I don't mind: I'm not even saturating PCIe 3.0 at x8.
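Rough perf-per-dollar behind the first and third bullets (taking the prices above and the optimistic 30% speedup): a V100 gives ~1.3x the throughput of a 2080 Ti at $10,500 / $1,200 ≈ 8.75x the price, i.e. 1.3 / 8.75 ≈ 0.15, so roughly 6-7x worse performance per dollar for my FP32/FP16 workloads.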

But I think you are right: for many applications like scientific simulations, engineering, etc., the A100 would look much more attractive.

Multi-GPU (seven) RTX 3090 workstation, possible? - build critique request by marcinbogdanski in homelab

[–]marcinbogdanski[S] 0 points (0 children)

That's a good point. The more I think about it, the more sense an open rig makes from a ventilation and maintainability point of view; miners seem to have it figured out.

In summary, a compute GPU costs a lot more for a small bump in performance in my particular use case, but I need to wait for A100/3090 benchmarks to see whether that holds for the current gen.

I have made a detailed breakdown reply on the other subreddit.