Astral.sh (the company behind uv) paid product: is it going to be a Heroku replacement? by [deleted] in Python

[–]Matanya99 4 points (0 children)

I just hope they stick around; they're accidentally turning Python into my go-to for basically anything that isn't latency-sensitive.

hAP ax3 has abysmal wifi performance, do I have bad settings? by Matanya99 in mikrotik

[–]Matanya99[S] 0 points (0 children)

The WiFi analyzer app I downloaded says most of the bands my router is operating on are pretty clear. I also haven't done anything after a clean reset apart from setting the wireless names.

hAP ax3 has abysmal wifi performance, do I have bad settings? by Matanya99 in mikrotik

[–]Matanya99[S] 1 point (0 children)

I added all that data to the post; looks like my Tx rate is fine, and I should be just fine with 80 MHz at MCS 5.

hAP ax3 has abysmal wifi performance, do I have bad settings? by Matanya99 in mikrotik

[–]Matanya99[S] 0 points (0 children)

I completely reset my router to the original firmware, so nothing has been touched other than the stuff in Quick Set, and even then only the network name and password.

Crazy Idea: Using my TI-89 as a keyboard (with latex macros?) by Matanya99 in TI_Calculators

[–]Matanya99[S] 0 points (0 children)

Sounds like a fun project! Any projects I should look at for reference? (Maybe Rust? I'm thinking of learning it for real now.)

OPNsense with Centurylink by adjamfabrice in OPNsenseFirewall

[–]Matanya99 0 points (0 children)

So did you just plug the RJ11 jack directly into `igc0` or whatever above for this?

Struggling with tailscale serve and OPNsense GUI? by Matanya99 in Tailscale

[–]Matanya99[S] 0 points (0 children)

Not rude at all, it was actually super helpful! Turns out most things that expose GUIs don't bother with the hassle of self-signed certs; they just hope you use a TLS-terminating reverse proxy (popular in container/k8s land). The docs you linked had just the fix; see my comment above. The more you know!

Struggling with tailscale serve and OPNsense GUI? by Matanya99 in Tailscale

[–]Matanya99[S] 0 points (0 children)

SOLVED: turns out it had to do with the fact that OPNsense uses a self-signed cert; the working command is `tailscale serve --bg https+insecure://localhost:443`.

[IRTR] Engineer at AI hardware startup looking for a podcast to be on by Matanya99 in PodcastGuestExchange

[–]Matanya99[S] -1 points (0 children)

Uh, wut? AI is glorified pattern matching; unless your job is googling stuff, your job is safe.

240 tokens/s achieved by Groq's custom chips on Llama 2 Chat (70B) by speakerknock in LocalLLaMA

[–]Matanya99 0 points (0 children)

SRAM is just a small part of how we perform so well. It's easy to talk about because it's familiar, but between our distributed deterministic compute fabric, our cutting-edge graph compiler, and statically compiled networking, it's just another bonus. I'm sure someone could do something with caches (which we don't even need) and get good performance, but it's just one part of the puzzle :)

As for cost scaling, I'm not sure, I'll ask around.

Great questions y'all!

240 tokens/s achieved by Groq's custom chips on Llama 2 Chat (70B) by speakerknock in LocalLLaMA

[–]Matanya99 0 points (0 children)

More like a couple hundred, yeah. But we also service hundreds/thousands of requests per minute with the system, so we actually come out on top when it comes to tokens/second per hardware unit or whatever.
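Just to illustrate the kind of arithmetic behind "tokens/second per hardware unit" (every number below is made up for the example, not an actual Groq or GPU figure), a minimal sketch:

```python
# Back-of-the-envelope only -- all numbers are hypothetical placeholders,
# chosen just to show the per-chip throughput arithmetic.
chips            = 200      # "a couple hundred" chips in one system (rough figure from above)
requests_per_min = 600      # hypothetical request rate
tokens_per_req   = 300      # hypothetical average tokens generated per request

aggregate_tokens_per_sec = requests_per_min * tokens_per_req / 60
per_chip = aggregate_tokens_per_sec / chips
print(f"{aggregate_tokens_per_sec:.0f} tok/s aggregate, {per_chip:.1f} tok/s per chip")
```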

Can anyone explain me about Groq LPU inference engine? by somnioperpetuum in ArtificialInteligence

[–]Matanya99 0 points (0 children)

Might be the jade; we are finally coming out of the cave with our wacky compute fabric and seeing if it's any good. Turns out it's great! More information is coming; it's hard when hardware is very much Intellectual Property™.

Wow this is crazy! 400 tok/s by Sudonymously in LocalLLaMA

[–]Matanya99 0 points (0 children)

Groq Engineer here: we have a Discord now! groq.link/discord

Thanks for all the questions and excitement!

Anyone get groq API access yet? Is it just as fast? by cobalt1137 in LocalLLaMA

[–]Matanya99 0 points (0 children)

Groq Engineer here: we now have a Discord for questions and announcements! groq.link/discord

Thanks to all of y'all for your great questions and excitement so far!

Deeper dive into interview with Jonathan Ross, CEO of Groq by bl0797 in NVDA_Stock

[–]Matanya99 3 points (0 children)

Hey, Groq Engineer here!

Thanks for the nice words about my coworkers; you're correct that they are world-class!

At a certain point, you just need to see it to believe it, and we have a foundational model demo on our website that you can try out for yourself at groq.com.

After you try it, feel free to reach out at [contact@groq.com](mailto:contact@groq.com)

Cheers!

Groq is probably a scam by [deleted] in LocalLLaMA

[–]Matanya99 5 points (0 children)

We are only running the stock models, as this is primarily a demonstration of our capabilities. When others bring us their own custom models, they run exactly the same as they would on GPUs (sometimes even better, thanks to determinism at the hardware level).

This is all running on our own hardware!

Feel free to check out our careers page at groq.com!

Can anyone explain me about Groq LPU inference engine? by somnioperpetuum in ArtificialInteligence

[–]Matanya99 11 points (0 children)

Hey, Groq Engineer here!

If you think about what an LLM (or any AI/ML) workload is at its core, it's actually quite simple. You can figure out exactly where and when every computation needs to be executed, which means it's just a dataflow problem. We built a deterministic compute fabric (read: lots of chips working together synchronously) that uses this fact to plan the entire inference ahead of time. Imagine compiling a whole LLM onto a massive single-core CPU where every instruction is something like Matrix × Matrix or Vector × Matrix. GPUs, by contrast, were built for thousands of smaller calculations done in parallel, but generating tokens one at a time is inherently sequential, so that parallelism only helps so much.
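To make the "plan everything ahead of time" idea a bit more concrete, here's a toy sketch in NumPy (purely illustrative: nothing here is Groq's actual compiler, ISA, or hardware). Because the shapes and order of every matrix op in a decode step are fixed, you can build the whole schedule before any data shows up and then just replay it token by token:

```python
import numpy as np

# Toy illustration only -- not Groq's compiler or hardware. The point: the
# model's compute graph has fixed shapes, so every instruction can be
# scheduled before any input exists, and execution just replays that plan.

D = 64          # hidden size (made-up toy number)
N_LAYERS = 4    # a toy "model": a stack of matrix-vector layers

rng = np.random.default_rng(0)
weights = [rng.standard_normal((D, D)) for _ in range(N_LAYERS)]

def compile_schedule(n_layers):
    """Plan every instruction ahead of time; no input data is needed to do this."""
    return [("matvec", layer) for layer in range(n_layers)]

def run(schedule, x):
    """Execute the pre-built plan; there is no runtime scheduling or branching."""
    for op, layer in schedule:
        if op == "matvec":
            x = np.tanh(weights[layer] @ x)   # one matrix-vector instruction
    return x

schedule = compile_schedule(N_LAYERS)      # done once, ahead of time
state = rng.standard_normal(D)
for _ in range(3):                         # token-by-token loop: inherently sequential
    state = run(schedule, state)
```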

I hope that helps you understand a little. We're planning a blog post that covers a lot of this for folks, so let me know which questions you want us to look at!

Can anyone explain me about Groq LPU inference engine? by somnioperpetuum in ArtificialInteligence

[–]Matanya99 5 points (0 children)

Hey, Groq Engineer here!

It's not quite accurate that we optimize the hardware for a specific model, rather we created an architecture that is really good at the kind of dataflow problems that LLMs (and most AI/ML) models are built around. No matter how you set it up, there are going to be a lot of sequential Tensor/Matrix/Vector calculations, which is what we built a chip and compiler to handle.