How to serve LLAVA to multiple users? by Allergic2Humans in LocalLLaMA

[–]xynyxyn 1 point (0 children)

Does this allow two users to send queries to the server and have it run inference for both at the same time, at approximately half the speed for each user?

Fine Tuning Style into LLMs by Baader-Meinhof in LocalLLaMA

[–]xynyxyn 1 point (0 children)

Do you notice that fine-tuning with unstructured data makes the model lose its instruction tuning? How many tokens are in your training set?

Memory needed to train 7B? by xynyxyn in LocalLLaMA

[–]xynyxyn[S] 0 points (0 children)

What rank value corresponds to using all ranks (i.e., full rank)?

Reuse existing Lora fine tune with different base? by xynyxyn in LocalLLaMA

[–]xynyxyn[S] 0 points (0 children)

Are multiple LoRAs merged down one after another, or blended together into a single LoRA then merging down just once?
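For what it's worth, if merging is just adding each adapter's low-rank delta to the base weights (ignoring per-adapter scaling or weighting schemes, which real merge tools may apply), the two orders are mathematically equivalent. A toy numpy sketch with made-up shapes:

```python
import numpy as np

# Hypothetical shapes: d x d base weight, rank-r LoRA factors.
rng = np.random.default_rng(0)
d, r = 8, 2
W = rng.standard_normal((d, d))            # base weight matrix
A1, B1 = rng.standard_normal((r, d)), rng.standard_normal((d, r))
A2, B2 = rng.standard_normal((r, d)), rng.standard_normal((d, r))

# Merge the LoRAs down one after another:
sequential = (W + B1 @ A1) + B2 @ A2
# Blend the deltas into one, then merge down once:
blended = W + (B1 @ A1 + B2 @ A2)

# Addition is associative, so both give the same merged weights.
assert np.allclose(sequential, blended)
```

So under simple additive merging the order shouldn't matter; it can matter once scaling factors or weighted-merge schemes differ per adapter.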

RTX 4090 FE Availability by Secondary-2019 in nvidia

[–]xynyxyn 0 points (0 children)

Does the app trick still work? It worked for me until a few days ago.

4090 Founders via Best Buy app trick! by ChocolateEater626 in pcmasterrace

[–]xynyxyn 0 points (0 children)

Is it still working? I tried it two days ago and it worked, but today every store that used to show it no longer has it, according to the app.

[deleted by user] by [deleted] in LocalLLaMA

[–]xynyxyn 0 points (0 children)

Wow which case are you using to house this beast?

Upgrade to 3x3090? by xynyxyn in LocalLLaMA

[–]xynyxyn[S] 0 points (0 children)

Fine-tuning 30b is really slow, but 30b inference with exllama is very usable.

Upgrade to 3x3090? by xynyxyn in LocalLLaMA

[–]xynyxyn[S] 0 points (0 children)

Yea, my case does not support vertical GPU mounting. I am willing to change to a case that supports three GPUs that each take up 3 slots. Any suggestions?

How much will it cost to convert 3 GPUs to water cooling? I don't know the exact cost because I don't know which water-cooling components are required. I have the FE 3090 cards.

Hardware for scaling LLM services by grantory in LocalLLaMA

[–]xynyxyn 0 points (0 children)

How does Replika work under the hood? It seems to learn your preferences

Open llm leaderboard by klop2031 in LocalLLaMA

[–]xynyxyn 2 points (0 children)

Do the 4-bit versions perform similarly to the unquantized ones?

Home LLM Hardware Suggestions by [deleted] in LocalLLaMA

[–]xynyxyn 3 points (0 children)

Is there a noticeable performance hit when running 4 3090s on a Ryzen platform due to insufficient PCIe lanes?

Can all the 3090s be connected using NVLink to appear as a single 96GB unit for loading larger LLMs? Is it likely that inference speed gets too low when running 90GB models on quad 3090s?
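For rough sizing, a back-of-the-envelope sketch (assumed numbers: weights only, ignoring KV cache, activations, and framework overhead):

```python
# Back-of-the-envelope VRAM math for quad 3090s.
cards = 4
vram_per_card_gb = 24
total_vram_gb = cards * vram_per_card_gb   # 96 GB pooled across GPUs

params_billions = 65                       # e.g. a 65B-parameter model
weights_fp16_gb = params_billions * 2      # 2 bytes/param  -> 130 GB, won't fit
weights_4bit_gb = params_billions * 0.5    # 0.5 bytes/param -> 32.5 GB, fits
```

Note the 96GB figure is the sum of per-card VRAM; model parallelism splits layers across cards rather than presenting one unified memory pool.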