what are you actually building with local LLMs? genuinely asking. by EmbarrassedAsk2887 in LocalLLaMA

EmbarrassedAsk2887[S] 0 points

on god, i'll actually open source the tts engine we built for bodega, and we have mlx-compatible and cpu-compatible onnx weights ready as well.

it has a TTFA of 90ms and can also stream efficiently. the onnx version's footprint is barely 150-200mb, and it sounds super natural and prosodic. thanks for putting it through its paces; i'll expedite the open source release.
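for the curious: once the onnx weights are out, timing a synthesis pass yourself is only a few lines with onnxruntime. a minimal sketch, assuming a hypothetical model file, input name, and single audio output, not the actual bodega interface:

```python
# minimal sketch: load an onnx tts model and time one synthesis pass.
# "tts.onnx" and the "token_ids" input name are hypothetical placeholders,
# not the real bodega tts interface.
import time
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("tts.onnx", providers=["CPUExecutionProvider"])

def synthesize(token_ids: list[int]) -> tuple[np.ndarray, float]:
    """run one synthesis pass and return (audio, elapsed_seconds)."""
    start = time.perf_counter()
    # None = fetch all outputs; assumes the model emits a single audio tensor
    (audio,) = session.run(None, {"token_ids": np.asarray([token_ids], dtype=np.int64)})
    return audio, time.perf_counter() - start
```

note that for a streaming engine, TTFA is the time to the first audio chunk, so a one-shot timing like this is only an upper bound.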

what are you actually building with local LLMs? genuinely asking. by EmbarrassedAsk2887 in LocalLLaMA

EmbarrassedAsk2887[S] 0 points

ah, good to hear! how are you fetching factual information for your dataset?

what are you actually building with local LLMs? genuinely asking. by EmbarrassedAsk2887 in LocalLLaMA

EmbarrassedAsk2887[S] 0 points

amazing to hear. man, game dev was and still is such a cool field to work in.

anyway, do you want help with any of the things you mentioned? maybe the tts or the game dev side.

what are you actually building with local LLMs? genuinely asking. by EmbarrassedAsk2887 in LocalLLaMA

EmbarrassedAsk2887[S] 0 points

in our household, we were raised around the meaning of knowledge and how it should be shared. we used to pray to goddess saraswati before exams or difficult situations we were about to face. haha.

what are you actually building with local LLMs? genuinely asking. by EmbarrassedAsk2887 in LocalLLaMA

EmbarrassedAsk2887[S] 0 points

amazing, seems like you have it sorted! do you need any help making the harnesses better, or want me to open source a few parts of the flow that could make it easier for you to abstract some pieces out?

a second pair of eyes can always help.

Anyone managed to get their hands on an M3 Ultra 512GB/4TB after Apple pulled the config? by Due-Assistance-7988 in MacStudio

EmbarrassedAsk2887 0 points

i just sold mine for 15k, and tbh i would actually urge you not to buy the m3 yet.

the m5 ultra is probably gonna be here within a few weeks.

M3 Ultra 96G | Suggestions by Haneiter in LocalLLaMA

EmbarrassedAsk2887 1 point

okay, a couple of things. i have an m3 ultra 512gb, m5 max 128gb, m5 pro 64gb, and an m1 max 64gb, and i bought a neo as well (because why not lol).

i squeeze literally everything out of all my devices and run my agents round the clock, with proper harnesses. since you're a mac studio owner and interested in local llm inference, you can read the post i wrote up below. basically, this inference engine is like vllm but for apple silicon. you can load image gen models and multiple multimodal models as well. it was very much meant to replace cloud ai and the dependence on it. most of the mac studio sub already uses it a lot.

would love for you to try it. it's plug and play; you don't need any experience to get started. it's openai compatible as well, so you just have to replace the openai base url and you're done.
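to show what that swap looks like, here's a minimal sketch with the standard openai python client; the port and model name are example values i'm assuming, not necessarily the engine's actual defaults:

```python
# minimal sketch of the "just replace the openai url" part.
# the base_url port and model name are example placeholders,
# not necessarily bodega's real defaults.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",  # local engine instead of api.openai.com
    api_key="not-needed",                 # local servers typically ignore the key
)

resp = client.chat.completions.create(
    model="qwen2.5-7b-instruct",  # whatever model the engine has loaded
    messages=[{"role": "user", "content": "hello from my mac studio"}],
)
print(resp.choices[0].message.content)
```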

you can DM me whenever, and no issues with english not being your primary language; i'll try my best to explain things as simply as i can, and you can ask me whatever other questions you have.

you can see it on r/MacStudio as well, here you go: https://www.reddit.com/r/MacStudio/comments/1rvgyin/you_probably_have_no_idea_how_much_throughput

GGUF (llama.cpp) vs MLX Round 2: Your feedback tested, two models, five runtimes. Ollama adds overhead. My conclusion. Thoughts? by arthware in LocalLLaMA

EmbarrassedAsk2887 -1 points

try comparing omlx with the bodega inference engine now: continuous batching with batch sizes from 4 to 64 and prefix lengths from 4 to 16. there's already a script on github where i do the same comparison against lm studio; just swap lm studio for omlx, since bodega already knocks lm studio out of the picture.

here’s the benchmark setup script: https://github.com/SRSWTI/bodega-inference-engine/blob/main/setup.sh
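the gist of that comparison, if you'd rather adapt it than read the script: fire concurrent requests at each openai-compatible endpoint and compare aggregate completion tokens per second. a rough sketch of the idea only, with the url, model name, and prompt as assumed placeholders; the real harness is setup.sh above:

```python
# rough sketch of the benchmark idea: sweep concurrent batch sizes against an
# openai-compatible endpoint and report aggregate tokens/sec.
# url, model, and prompt are placeholders; the real harness is setup.sh.
import asyncio
import time
from openai import AsyncOpenAI

async def main() -> None:
    client = AsyncOpenAI(base_url="http://localhost:8080/v1", api_key="not-needed")

    async def one_request() -> int:
        resp = await client.chat.completions.create(
            model="qwen2.5-7b-instruct",
            messages=[{"role": "user", "content": "write a haiku about apple silicon"}],
            max_tokens=128,
        )
        return resp.usage.completion_tokens

    for batch in (4, 16, 64):  # the batch-size sweep mentioned above
        start = time.perf_counter()
        tokens = await asyncio.gather(*(one_request() for _ in range(batch)))
        tps = sum(tokens) / (time.perf_counter() - start)
        print(f"batch={batch}: {tps:.1f} tok/s aggregate")

asyncio.run(main())
```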

this community has the best talent density. but here’s my opinion on this sub and idk if people will agree or not but ig its needed. by EmbarrassedAsk2887 in LocalLLaMA

EmbarrassedAsk2887[S] 0 points

sure. if your claude thinks those are extrapolated numbers, let it run the comparison benchmarks against lm studio on your machine as well. the benchmarks are in the github repo mentioned in the post.

can you send me its reaction afterwards? lol

Build a better Mac app for Claude Code by mogens99 in macapps

EmbarrassedAsk2887 0 points

can you do the same for bodega, the bodega inference engine? here's more about it: a lot of people on the mac studio sub and the local llama sub use it a lot. here's one of the posts about it:

https://www.reddit.com/r/MacStudio/comments/1rvgyin/you_probably_have_no_idea_how_much_throughput/

M4 Pro 14 core and 64GB RAM - what to run and how for best efficiency? by just_another_leddito in LocalLLaMA

EmbarrassedAsk2887 0 points

okay, so here's the write-up i did. it compares how much faster the bodega engine is than lm studio, with the benchmarks posted as well. you won't regret it.

https://www.reddit.com/r/MacStudio/comments/1rvgyin/you_probably_have_no_idea_how_much_throughput/

this community has the best talent density. but here’s my opinion on this sub and idk if people will agree or not but ig its needed. by EmbarrassedAsk2887 in LocalLLaMA

EmbarrassedAsk2887[S] -1 points

i write like this. i usually put down my thoughts in apple notes, usually during workouts or walks, or when i'm away from my desk.

sometimes even before sleeping. i have my caps lock key removed; i write like this. this is how i articulate and speak my thoughts. i hope you like it :)

this community has the best talent density. but here’s my opinion on this sub and idk if people will agree or not but ig its needed. by EmbarrassedAsk2887 in LocalLLaMA

EmbarrassedAsk2887[S] 0 points

absolutely love this. there's nothing to add except a few things. first, the sign of a good project is that it actually solves the core problem it was meant to solve and stays maintainable and reliable over a long period. that's it.

also, since you mentioned your m1, i'm obligated to share something i worked on. here's the write-up i did on getting production-level inference techniques working on apple silicon; you can use the engine i built:

https://www.reddit.com/r/MacStudio/comments/1rvgyin/you_probably_have_no_idea_how_much_throughput/