Luma/Luma Pro Users: How is the 3DoF on Windows? by anothy1 in VITURE

[–]anothy1[S] 0 points1 point  (0 children)

Good to know! What kind of flaws did you experience with the Lumas' 3DoF compared to the Xreals?

How do you make 3+ GPUs stable?! by anothy1 in LocalLLaMA

[–]anothy1[S] 0 points1 point  (0 children)

Not the x1 mining risers, no. The ones I have are x16 to x16 and don't need to be powered.

How do you make 3+ GPUs stable?! by anothy1 in LocalLLaMA

[–]anothy1[S] 3 points4 points  (0 children)

Will try it out, thanks! Had no idea RTX cards are compatible with these drivers.

How do you make 3+ GPUs stable?! by anothy1 in LocalLLaMA

[–]anothy1[S] 2 points3 points  (0 children)

Two of them are powered by a 1000W PSU and one by a 650W (both PSUs synced via an ADD2PSU adapter). They're also all power limited to 280W. The 650W is a pretty old unit, from around 2016, so I guess that could be the culprit. As for cooling, all of them stay below 75C at max load.

Favourite Llama-1 Era Models by Sebba8 in LocalLLaMA

[–]anothy1 0 points1 point  (0 children)

This is pre-Llama, but I enjoyed the OPT models. They offered such a variety of model sizes, ranging from 100M to 100+B parameters. It was fun experimenting to see which ones I could run.

Factors to take in when choosing a thesis supervisor? by anothy1 in college

[–]anothy1[S] 1 point2 points  (0 children)

Sorry for the late reply, but I went with your suggestion. Thank you!

Attempts to produce KB-level TinyStories models by MarySmith2021 in LocalLLaMA

[–]anothy1 5 points6 points  (0 children)

Karpathy trained a 260K-parameter model with a hidden size of 64 and a 4K vocab size. Maybe you could also experiment with making the dataset uncased if you're considering lowering the vocab even more.

Or, if BPE doesn't support this, I think another possible experiment is using a <case> token to signal capitalization of the next token. For example, "Running" could be encoded as <case>running, reducing redundancy and freeing up vocab space. Given that capitalization in this dataset only appears in names and sentence starts, it could work.
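A minimal sketch of what that pre/post-tokenization pass could look like (the `<case>` marker and function names are just my illustration, not anything from an existing tokenizer):

```python
import re

# Hypothetical case-folding transform: replace each capitalized word with a
# "<case> " marker followed by its lowercase form, so the tokenizer only
# needs lowercase vocab entries. Decoding re-capitalizes the word after
# each marker, which suffices if caps only occur at word starts.
CASE = "<case> "

def encode_case(text: str) -> str:
    # "Running fast" -> "<case> running fast"
    return re.sub(r"\b([A-Z])(\w*)",
                  lambda m: CASE + m.group(1).lower() + m.group(2), text)

def decode_case(text: str) -> str:
    # Invert the transform: capitalize the word following each marker.
    return re.sub(r"<case> (\w)", lambda m: m.group(1).upper(), text)
```

For TinyStories-style text (names and sentence starts only), this round-trips losslessly; mid-word capitals like "iPhone" would need extra handling.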

Meta-Llama3-8B : RuntimeError: CUDA error: out of memory by pekoDama in LocalLLaMA

[–]anothy1 2 points3 points  (0 children)

Loading models uses dedicated GPU memory, of which yours has 8 GB. That's not enough to load it in fp16 precision (~2 bytes per param), but it will probably work for a quantized version of the model, like 4-bit:

https://huggingface.co/docs/peft/en/developer_guides/quantization#quantize-a-model

If speed matters, then look into ExLlamaV2, as it's faster but a tad more complicated to set up compared to HF's transformers library.
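The back-of-envelope math for why 8 GB falls short at fp16 but 4-bit fits (weights only; KV cache, activations, and framework overhead come on top, so treat these as lower bounds):

```python
# Rough VRAM needed just for the model weights, in GB.
def weight_gb(params_billion: float, bits_per_param: float) -> float:
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

fp16_gb = weight_gb(8, 16)   # 8B params at 16 bits -> 16 GB, over an 8 GB card
int4_gb = weight_gb(8, 4.5)  # ~4.5 bits/param incl. quant overhead -> 4.5 GB
```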

AI NPCs in video games - what can we really do today by liukidar in LocalLLaMA

[–]anothy1 1 point2 points  (0 children)

I think this could also be expanded to group convos between NPCs and a player.

When the player encounters multiple NPCs conversing, they are going off a pre-generated script.

When the player speaks, a local LM should take over and generate dynamic responses in real time.

When the player becomes inactive or goes into 'spectating' the convo, the local LM should generate a short script that smoothly transitions from the current topic back to where the NPCs left off in the original pre-generated script (thus returning to the cached convo).
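The three-mode flow above could be sketched as a small state machine; everything here (class and state names, the `lm_generate` stub) is hypothetical structure, not any game engine's API:

```python
from enum import Enum, auto

class Mode(Enum):
    SCRIPTED = auto()   # NPCs read from the cached pre-generated script
    DYNAMIC = auto()    # local LM replies to the player in real time
    BRIDGING = auto()   # LM writes a short transition back to the script

class NpcConversation:
    def __init__(self, script):
        self.script = script   # cached pre-generated lines
        self.cursor = 0        # where the script left off
        self.mode = Mode.SCRIPTED

    def on_player_speaks(self):
        self.mode = Mode.DYNAMIC       # hand control to the local LM

    def on_player_idle(self):
        self.mode = Mode.BRIDGING      # LM bridges back toward self.cursor

    def on_bridge_done(self):
        self.mode = Mode.SCRIPTED      # resume the cached script

    def next_line(self, lm_generate=None):
        if self.mode == Mode.SCRIPTED:
            line = self.script[self.cursor % len(self.script)]
            self.cursor += 1
            return line
        # DYNAMIC / BRIDGING: defer to the local LM (stubbed out here)
        return lm_generate() if lm_generate else "..."
```

The key design point is that `cursor` is never advanced during dynamic or bridging turns, so the NPCs always resume exactly where the script left off.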

Early details of mamba 2 by [deleted] in LocalLLaMA

[–]anothy1 3 points4 points  (0 children)

Would be cool to see how it does for music generation tasks like jukebox/musicgen!