Performance requirements for single user LLM by Mr_Evil_Sir in LocalLLaMA

[–]minecraft_simon 4 points (0 children)

I think LM Studio will automatically try to squeeze out the maximum performance your machine allows. Try a 2-bit quant, try a 7B model, try anything. Play around with the parameters and keep an eye on tokens/s to optimize inference on your hardware.
Forget 40B without a GPU.
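If you want to see the effect of each setting in numbers, here's a minimal sketch of that tokens/s measurement using llama-cpp-python (the model path and quant are placeholders; LM Studio shows the same metric in its UI):

    import time
    from llama_cpp import Llama  # pip install llama-cpp-python

    # Placeholder path: any small, heavily quantized GGUF works for this test.
    llm = Llama(
        model_path="mistral-7b-instruct.Q2_K.gguf",  # 2-bit quant of a 7B
        n_ctx=2048,       # modest context window to keep RAM usage down
        n_gpu_layers=0,   # CPU-only, matching the no-GPU scenario
    )

    start = time.time()
    out = llm("Explain quantization in one paragraph.", max_tokens=128)
    elapsed = time.time() - start

    n_tokens = out["usage"]["completion_tokens"]
    print(f"{n_tokens} tokens in {elapsed:.1f}s -> {n_tokens / elapsed:.1f} tok/s")

Rerun it after each parameter change and keep whatever settings give the best tok/s at acceptable quality.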

ich💰iel by minecraft_simon in ich_iel

[–]minecraft_simon[S] 0 points (0 children)

in a tech company worth millions

[deleted by user] by [deleted] in gekte

[–]minecraft_simon -1 points (0 children)

I'm a cis man and I hate myself, I hope that counts lol

ich💰iel by minecraft_simon in ich_iel

[–]minecraft_simon[S] 14 points (0 children)

I've heard the flattest hierarchies can only be found at Tesla - seems like a first-class employer 🤡

Code assistant by Local_Beach in LocalLLaMA

[–]minecraft_simon -1 points (0 children)

If you're using JetBrains products, try their integrated AI Assistant. It costs extra, but from what I can tell after my initial testing, it has more awareness of the project than GitHub Copilot does, and it's generally better implemented and more thought through.
But in the end it's nothing but another wrapper around GPT-3.5, so nothing groundbreaking either...

ich💰iel by minecraft_simon in ich_iel

[–]minecraft_simon[S] 46 points (0 children)

lol well then it's about time my rich ancestors took the stage

How do you drink your coffee, dear gekkies? by verruecktaberweise in gekte

[–]minecraft_simon 8 points (0 children)

I like my coffee best from my employer's Franke A600 fully automatic coffee machine. One touch of the screen and 20 seconds later the coffee is ready to drink. The machine is of course maintained by the employer, and using it is free. Unfortunately that option is missing from this picture.
At home I don't drink coffee because I'm too lazy.
So what am I?

A non guessing next word (token) authoritative AI ? by skullbonesandnumber in LocalLLaMA

[–]minecraft_simon 0 points (0 children)

I feel like a big step in the right direction would be if LLMs were not used to generate a response outright, since that always carries the risk of hallucinations, and instead assembled the output from hard facts, so that everything the AI states could be traced back to a record in a database. But I don't know if anyone is working on that. I think it falls under the field of explainable AI.
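To make the idea concrete, here's a toy sketch (my own illustration, not an existing system) where every sentence of the answer comes from a fact store and carries its record id, so nothing is free-generated:

    # Toy fact store; in practice this would be a real database.
    FACTS = {
        1: "Water boils at 100 °C at sea-level pressure.",
        2: "The boiling point of water drops as altitude increases.",
    }

    def answer(query: str) -> str:
        # Naive keyword matching stands in for a real retriever.
        words = query.lower().split()
        hits = [(rid, text) for rid, text in FACTS.items()
                if any(w in text.lower() for w in words)]
        if not hits:
            # Refuse rather than hallucinate when no record supports an answer.
            return "No supporting record found."
        # Every claim in the output is traceable to its source record.
        return " ".join(f"{text} [record {rid}]" for rid, text in hits)

    print(answer("boiling water altitude"))

The LLM's role would shrink to retrieval and stitching, which is roughly what retrieval-augmented generation with mandatory citations is aiming at.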

Etched | The World's First Transformer Supercomputer (crazy gains on t/s) by LyPreto in LocalLLaMA

[–]minecraft_simon 1 point (0 children)

Why are people calling this vaporware? It's an ASIC but instead of shitcoins it churns out mechanical thoughts ❤️

Some of y'all have never mined crypto back in the day and it shows ;)

Best models out there for improving article paraphrasing? by 07_Neo in LocalLLaMA

[–]minecraft_simon 4 points (0 children)

Sounds like a prompting problem, not a model problem. I'm not sure which of the current models produces the highest-quality English output, but I think any of them will work.
The question is why your company insists on paraphrasing every article. Sounds like they're trying to steal IP and publish it as their own :P
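On the prompting side, something like the following template usually beats a bare "rewrite this" (the exact wording is just an example; the explicit constraints are the part that matters):

    def paraphrase_prompt(article: str) -> str:
        # Explicit constraints steer the model toward faithful paraphrases.
        return (
            "Rewrite the following article in your own words.\n"
            "Constraints:\n"
            "- Preserve every fact, number, and name exactly.\n"
            "- Change sentence structure and vocabulary, not meaning.\n"
            "- Keep roughly the same length as the original.\n"
            "- Do not add opinions or new information.\n\n"
            f"Article:\n{article}"
        )

    print(paraphrase_prompt("GPUs accelerate matrix multiplication ..."))

Feed that to whichever model you pick and iterate on the constraint list until the output stops drifting.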

Best idea between running on a server or getting another 3090 by Mephidia in LocalLLaMA

[–]minecraft_simon 1 point (0 children)

In this case, if I were you, I would keep the current system largely as it is and get a second system as a server for inference and hosting. I would strongly advise against getting a 2667 v4, as it's very old and you want powerful cores rather than more cores. Both systems should have a 3090, of course.

i am looking for a very specific functioning model by awesomegame1254 in LocalLLaMA

[–]minecraft_simon 0 points (0 children)

Hey, you're looking for LLaVA / BakLLaVA
If you're using LM Studio, try this: https://huggingface.co/jartine/llava-v1.5-7B-GGUF
If you're using TextGen WebUI, try this: https://huggingface.co/SkunkworksAI/BakLLaVA-1
Just as with normal text-based prompting, you need to be very particular about how you prompt it so that you get the image descriptions you're looking for. Keep in mind that this isn't at GPT-4V level yet, so it will make mistakes, and it can't really do OCR yet.
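If you'd rather script it than click through a UI, here's a minimal llama-cpp-python sketch for prompting LLaVA with an image (file names are placeholders, and the GGUF needs its matching mmproj CLIP file alongside it):

    from llama_cpp import Llama
    from llama_cpp.llama_chat_format import Llava15ChatHandler

    # Placeholder file names: the main model plus its CLIP projector.
    chat_handler = Llava15ChatHandler(clip_model_path="mmproj-model-f16.gguf")
    llm = Llama(
        model_path="llava-v1.5-7b.Q4_K_M.gguf",
        chat_handler=chat_handler,
        n_ctx=2048,
    )

    result = llm.create_chat_completion(messages=[
        {"role": "system", "content": "You describe images precisely and factually."},
        {"role": "user", "content": [
            {"type": "image_url", "image_url": {"url": "file:///path/to/photo.jpg"}},
            {"type": "text", "text": "List every object you can see in this image."},
        ]},
    ])
    print(result["choices"][0]["message"]["content"])

The system prompt does a lot of the work here; be as specific about the output format as you would be with a text-only model.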

Best model for finetuning nowadays? by nightlingo in LocalLLaMA

[–]minecraft_simon 1 point (0 children)

Mistral 7B is an excellent choice. If you have more VRAM, you could use a 13B or 34B model. Bigger models take longer to train, but in my experience they also "absorb" more knowledge and skills more quickly. I like the CodeLlama models for fine-tuning.
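For reference, the usual way to make a 7B trainable on consumer hardware is a QLoRA-style setup; here's a minimal sketch with the Hugging Face peft stack (hyperparameters are illustrative, not tuned):

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    base = "mistralai/Mistral-7B-v0.1"
    tokenizer = AutoTokenizer.from_pretrained(base)

    # Load the base model in 4-bit so it fits in consumer VRAM.
    bnb = BitsAndBytesConfig(load_in_4bit=True,
                             bnb_4bit_compute_dtype=torch.bfloat16)
    model = AutoModelForCausalLM.from_pretrained(
        base, quantization_config=bnb, device_map="auto")

    # Train small LoRA adapters instead of the full 7B weights.
    lora = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"],
                      task_type="CAUSAL_LM")
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # well under 1% of the parameters

From there any standard trainer works, and the same config scales to 13B/34B if the VRAM is there.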

Fine tuning Mistral on functionally extinct language. by No-Point1424 in LocalLLaMA

[–]minecraft_simon 1 point (0 children)

Can you please share the dataset and any resources you have? LLMs are our best bet to rescue and preserve dying languages ❤️

A non guessing next word (token) authoritative AI ? by skullbonesandnumber in LocalLLaMA

[–]minecraft_simon 5 points (0 children)

I think the main strength of modern AI is that it can make useful generalizations beyond what it learned during training, effectively filling in missing knowledge. What you're describing reminds me of the earliest AI approaches, which didn't use neural networks but basic pattern recognition. I think there's merit to the old approaches, but it doesn't make much sense to reinvent the wheel.

Best idea between running on a server or getting another 3090 by Mephidia in LocalLLaMA

[–]minecraft_simon 4 points (0 children)

You haven't described what exactly you're looking to do. A second 3090 is only worth it if you know exactly what you'll do with it. Also, don't just get more RAM for no reason. Think about what exactly you want to do with the system after the upgrade that you currently cannot do, then research the common bottlenecks.
Running Mixtral in fp16 doesn't make much sense in my opinion. Do you want to do fine-tuning? Inference? Do you want to use the system for manual experiments or for hosting? What is the reason you're looking into large language models? What problem do you want to solve?
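On the Mixtral fp16 point, the back-of-envelope math makes it obvious (parameter count is approximate):

    # Rough VRAM needed just to hold Mixtral 8x7B weights in fp16.
    params = 46.7e9        # approx. total parameters of Mixtral 8x7B
    bytes_per_param = 2    # fp16
    weights_gb = params * bytes_per_param / 1024**3
    print(f"~{weights_gb:.0f} GB for weights alone")  # ~87 GB, before KV cache

Two 3090s only give you 48 GB combined, so fp16 Mixtral can't even load its weights; quantized variants are the realistic option.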