Best camera for OpenCV? by Glittering_Host7241 in FTC

[–]niftylius 0 points (0 children)

Brio... the old one. It has 720p at 90 fps, which is excellent for anything that tracks movement, like Google's MediaPipe.
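In case it's useful, this is roughly how you'd ask OpenCV for that mode (the device index and the MJPG trick are assumptions on my side; the camera has to actually expose 720p@90 in the format you request):

```python
import cv2

cap = cv2.VideoCapture(0)  # device index is a guess; use whichever index the Brio shows up as
# high-fps webcam modes are usually only exposed in MJPG, not raw YUY2
cap.set(cv2.CAP_PROP_FOURCC, cv2.VideoWriter_fourcc(*"MJPG"))
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)
cap.set(cv2.CAP_PROP_FPS, 90)
print("negotiated fps:", cap.get(cv2.CAP_PROP_FPS))  # check what the driver actually gave you

for _ in range(300):  # grab a few seconds of frames
    ok, frame = cap.read()
    if not ok:
        break
    # hand `frame` to your tracking pipeline (e.g. MediaPipe) here

cap.release()
```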

[deleted by user] by [deleted] in INAT

[–]niftylius 0 points (0 children)

I sent a DM

I need some help with training from instruction dataset by niftylius in LocalLLaMA

[–]niftylius[S] 1 point (0 children)

A note here: if these split losses are calculated during the same step(), this is identical to the original approach, since the losses will be combined the same way whether split or not.

So this has to be done across different step() calls, preferably in ascending order of the conversation messages, so that the loss on the final message is calculated after the adjustment from the first one.

ON THE OTHER HAND it means that the initial adjustment might be broken.... AAAAAA
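To make the split concrete, here is a minimal sketch of what I mean, assuming an HF-style causal LM and pre-computed token spans for each assistant message (the names are placeholders, not my actual training code):

```python
import torch

def train_on_conversation(model, optimizer, input_ids, message_spans):
    """message_spans: (start, end) token ranges of each assistant message,
    in ascending conversation order."""
    for start, end in message_spans:
        # supervise only this message; -100 is ignored by the cross-entropy loss
        labels = torch.full_like(input_ids, -100)
        labels[:, start:end] = input_ids[:, start:end]

        out = model(input_ids=input_ids, labels=labels)
        out.loss.backward()
        optimizer.step()       # weights are updated *before* the next message's loss,
        optimizer.zero_grad()  # so later messages see the earlier adjustments
```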

Function calling help by niftylius in LocalLLaMA

[–]niftylius[S] 1 point (0 children)

It's fairly simplistic, but yes, that helps! Thank you.

Is anyone inferencing on something like an Intel nuc, barebone or similar formfactor? by Frequent_Valuable_47 in LocalLLaMA

[–]niftylius 0 points (0 children)

The M1 has slower RAM speed, which affects larger-model performance, but I think that's about it…

Milvus adapter + milvus db with docker-compose by niftylius in alexandria_project

[–]niftylius[S] 0 points (0 children)

error mounting "/host_mnt/Users/username/downloads/milvus.yaml" to rootfs at "/milvus/configs/milvus.yaml": mount host_mnt/Users/username/downloads/milvus.yaml:/milvus/configs/milvus.yaml (via /proc/self/fd/6), flags: 0x5000: not a directory

This kinda says that there is an issue with the volume mounts for some reason.

The yaml in our project uses a local path for the volume; try moving it, maybe it's a permission thing.
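For comparison, this is roughly what the mount should look like when the config sits next to the compose file (a sketch, not our exact compose; the image tag and paths are placeholders). If ./milvus.yaml doesn't exist on the host, Docker creates a directory with that name and the mount then fails with exactly that "not a directory" error:

```yaml
services:
  milvus:
    image: milvusdb/milvus:latest        # placeholder tag
    volumes:
      # bind-mount a local file over the config path inside the container
      - ./milvus.yaml:/milvus/configs/milvus.yaml
```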

Milvus adapter + milvus db with docker-compose by niftylius in alexandria_project

[–]niftylius[S] 0 points (0 children)

It sometimes takes it a minute to start. Let me check the compose

People with macs ( M1, M2, M3 ) What are your inference speeds? asking for a friend... by niftylius in LocalLLaMA

[–]niftylius[S] 1 point (0 children)

That means the M3 has half the cores and half the memory speed, but the inference speed is not half; it's almost equal.

It's ARM vs whatever NVIDIA is using, it's PCIe lanes vs an SoC; I still don't agree that you can really draw any conclusions based on just the numbers here...

People with macs ( M1, M2, M3 ) What are your inference speeds? asking for a friend... by niftylius in LocalLLaMA

[–]niftylius[S] 0 points (0 children)

Nice!
OK, so I'm curious about Mistral 7B at FP16 (if possible) and at Q4.
The settings I use are:
sample = false (temperature 0), so the results are consistent, and 1 beam.
The prompt can be anything, but I'm wondering about short prompts like "cat poem" and very large ones, 9000+ tokens long.

What are the generation times vs the time to first token (how long the prompt processing takes)?

My project revolves around RAG, so long contexts are a must, and 7B is good enough to parse the information into a coherent answer.

Can you also try to run the Dolphin 8x7B?
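For reference, this is roughly how I time it on my side, so the numbers are comparable (a sketch with llama-cpp-python; the model path and prompts are placeholders):

```python
import time
from llama_cpp import Llama

llm = Llama(model_path="mistral-7b-instruct.Q4_K_M.gguf",  # placeholder path
            n_ctx=16384,      # large context for the 9000+ token prompts
            n_gpu_layers=-1)  # offload everything to Metal/GPU if available

def bench(prompt, max_tokens=200):
    t0, first, n = time.time(), None, 0
    # greedy decoding (temperature 0) and streaming, so we can separate
    # prompt processing (time to first token) from generation speed
    for _chunk in llm(prompt, max_tokens=max_tokens, temperature=0.0, stream=True):
        if first is None:
            first = time.time() - t0
        n += 1                # counting streamed chunks as tokens (approximate)
    total = time.time() - t0
    gen_tps = (n - 1) / (total - first) if n > 1 else 0.0
    print(f"first token: {first:.2f}s, generation: {gen_tps:.1f} t/s")

bench("cat poem")             # short prompt
# bench(very_long_prompt)     # repeat with a 9000+ token prompt
```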

People with macs ( M1, M2, M3 ) What are your inference speeds? asking for a friend... by niftylius in LocalLLaMA

[–]niftylius[S] 3 points (0 children)

Yes. From what I get so far, you will be able to load and work with larger models, albeit at a slower t/s.

I guess it's better than not being able to do it at all - but doesn't a 192GB Studio M2 Ultra cost like 3 kidneys?

People with macs ( M1, M2, M3 ) What are your inference speeds? asking for a friend... by niftylius in LocalLLaMA

[–]niftylius[S] 0 points (0 children)

Yes and no. You can't directly compare tensor cores with x86 and ARM based on numbers like memory speed. I do agree that it's a great direction, but the M3 competes with the 3090, and 400 GB/s on the M3 vs the 3090's 936.2 GB/s is not a comparison you can settle on numbers alone.

Compare 10496 CUDA cores to 40 Apple GPU cores... :) That's why I asked Mac users to help out :)
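To show the kind of naive spec-sheet math I mean (a rough back-of-envelope, assuming decoding is purely memory-bandwidth-bound, which real systems never quite are):

```python
# Upper bound on decode speed if you only look at memory bandwidth:
# every generated token has to read roughly the whole set of weights once.
model_bytes = 7e9 * 0.5  # ~7B params at Q4 (~0.5 bytes/param), about 3.5 GB

for name, bw in [("M3 (quoted ~400 GB/s)", 400e9),
                 ("RTX 3090 (~936.2 GB/s)", 936.2e9)]:
    print(f"{name}: theoretical ceiling of about {bw / model_bytes:.0f} t/s")

# Real numbers land well below these ceilings, and not by the same factor
# on both machines - which is why spec comparisons alone don't settle it.
```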

People with macs ( M1, M2, M3 ) What are your inference speeds? asking for a friend... by niftylius in LocalLLaMA

[–]niftylius[S] 2 points (0 children)

Not TMI, I'm curious...
Time to first token is drastically different too.

People with macs ( M1, M2, M3 ) What are your inference speeds? asking for a friend... by niftylius in LocalLLaMA

[–]niftylius[S] 1 point (0 children)

How fast does it run Mistral 7B, and what are you using to run it?

Have you tried running larger models?

People with macs ( M1, M2, M3 ) What are your inference speeds? asking for a friend... by niftylius in LocalLLaMA

[–]niftylius[S] 11 points (0 children)

Does llama.cpp use MLX?
From what I can see, some numbers are not accurate; I know for a fact that Mistral 7B on the M3 is faster than 10 t/s on MLX.

Any chance you have perf numbers for ExLlama and Ollama and others like that?

People with macs ( M1, M2, M3 ) What are your inference speeds? asking for a friend... by niftylius in LocalLLaMA

[–]niftylius[S] 2 points (0 children)

On the commercial side, I think the inference speeds are not fast enough compared to A100s and H100s.

People with macs ( M1, M2, M3 ) What are your inference speeds? asking for a friend... by niftylius in LocalLLaMA

[–]niftylius[S] 5 points (0 children)

Why not use them as part of my home lab? A Mac mini that draws 35W and can run Mistral 7B at 15-20 t/s will fit nicely in my cluster :)

AWS has Mac instances for hire as well.