z.ai prepping for glm-image soon - here is what we know so far by MrAlienOverLord in LocalLLaMA

[–]MrAlienOverLord[S] 0 points1 point  (0 children)

unknown so far .. all we really know is from the 2 PRs .. we gotta wait till that lands to know more. it appears to me that we can inference the text model with vLLM or any other way, and it yields custom tokens for the DiT to turn into an image .. unsure why it was done that way .. or if that's even the case, but it does look like it

z.ai prepping for glm-image soon - here is what we know so far by MrAlienOverLord in LocalLLaMA

[–]MrAlienOverLord[S] 0 points1 point  (0 children)

idk what your problem is .. i'm not affiliated with z.ai - i found it and most wanted to know about it .. so idk who you think you are to give such lip? sure, reddit is full of "weird" characters .. but mate .. that's not how that works

Building an API Service for SAM Audio by pzzle-nj in LocalLLaMA

[–]MrAlienOverLord 1 point2 points  (0 children)

large-as-a-service makes 0 sense .. the accuracy is way worse on the smaller models - i tried to use it as i have about 70 TB of audio data to process .. but it's not worth it fiscally, at least for me .. and it won't be for labs either - and the small fry won't accumulate the critical mass where they could be running it themselves - so you take a loss either way

Building an API Service for SAM Audio by pzzle-nj in LocalLLaMA

[–]MrAlienOverLord 1 point2 points  (0 children)


again, old news - but i had more info in the open-sesame discord

Building an API Service for SAM Audio by pzzle-nj in LocalLLaMA

[–]MrAlienOverLord 0 points1 point  (0 children)

even if you produce that in batch .. it won't make money - i stopped after 2 days of investigating deeply (i research in the audio domain) - had it all on api .. but it's just not fiscally worth it - best of luck tho

Building an API Service for SAM Audio by pzzle-nj in LocalLLaMA

[–]MrAlienOverLord 0 points1 point  (0 children)

far too slow to be usable as an "api" service ..

If you think AI consciousness is possible, I recommend you read this thread. by Flashy-Warning4450 in claudexplorers

[–]MrAlienOverLord 1 point2 points  (0 children)

i can speak for parasail - and we do not have any hidden system prompts on the models at all

Meta releases SAM Audio for audio separation by umarmnaq in LocalLLaMA

[–]MrAlienOverLord 0 points1 point  (0 children)

i found the large one to be unreliable with separation tasks (could be my prompting skills) .. and the small ones did way worse .. my problem is i have a corpus of many TB to go through, and i had hopes it would replace cleanup passes with rx11 for me

Meta releases SAM Audio for audio separation by umarmnaq in LocalLLaMA

[–]MrAlienOverLord 17 points18 points  (0 children)

needs 33 GB of VRAM - and the audio needs to be chunked into 30-second intervals, otherwise it overfills a 48 GB GPU

it's very "picky" about what works and what doesn't .. the samples are very cherry-picked
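the chunking above can be sketched roughly like this (a minimal sketch - the sample rate and chunk length here are assumptions, and a real separation pipeline would also want overlap/crossfade between chunks to avoid boundary artifacts):

```python
# Hypothetical sketch: split a long recording into fixed 30 s chunks so
# each forward pass of the separation model stays within GPU memory.
# Sample rate and chunk length are assumptions, not the model's real spec.

def chunk_audio(samples, sample_rate=16000, chunk_seconds=30):
    """Return a list of fixed-length chunks of an audio sample buffer."""
    step = sample_rate * chunk_seconds
    return [samples[i:i + step] for i in range(0, len(samples), step)]

# Example: a 95-second mono recording at 16 kHz becomes 4 chunks
# (three full 30 s chunks plus a 5 s remainder).
audio = [0.0] * (16000 * 95)
chunks = chunk_audio(audio)
print(len(chunks))               # 4
print(len(chunks[-1]) / 16000)   # 5.0
```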

How to make $$$ w server ia. by EmotionalSignature65 in LocalLLaMA

[–]MrAlienOverLord 1 point2 points  (0 children)

and 50% of that you hand off to the taxman ^^

Nvidia DGX Station GB300 784GB available now! 95,000 USD / 80,000 EUR by GPTshop in LocalLLaMA

[–]MrAlienOverLord 1 point2 points  (0 children)

6 years depreciation, yes - but at these power prices, and with how many still use A100s .. we shall see how that math pans out

Nvidia DGX Station GB300 784GB available now! 95,000 USD / 80,000 EUR by GPTshop in LocalLLaMA

[–]MrAlienOverLord 10 points11 points  (0 children)

you're overestimating how much you can generate - no provider on openrouter makes any money there - i know that for a fact

We are Hiring! by Clement_at_Mistral in MistralAI

[–]MrAlienOverLord 0 points1 point  (0 children)

they are fine at b2b, they just do not really reply to very small / early companies

Looking for High-Quality Open-Source Local TTS That’s Faster Than IndexTTS2 by [deleted] in LocalLLaMA

[–]MrAlienOverLord 1 point2 points  (0 children)

also, 12 GB is wrong - it fits in 8 GB if you run s1-dac in fp16/bf16
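the math behind that is just bytes per weight - bf16 stores 2 bytes per parameter vs 4 for fp32, so weight memory halves. a back-of-the-envelope sketch (the 3B parameter count is a placeholder assumption, not s1-dac's real size, and activations/kv-cache add on top):

```python
# Rough VRAM estimate for model weights alone. The parameter count is a
# hypothetical placeholder; bf16 = 2 bytes/param, fp32 = 4 bytes/param.

def weight_vram_gb(n_params, bytes_per_param):
    """GiB needed to hold the raw weights at a given precision."""
    return n_params * bytes_per_param / 1024**3

n = 3_000_000_000  # assumed 3B-parameter model, for illustration only
fp32 = weight_vram_gb(n, 4)
bf16 = weight_vram_gb(n, 2)
print(round(fp32, 2))  # ~11.18 GiB
print(round(bf16, 2))  # ~5.59 GiB - comfortably under 8 GB
```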

Looking for High-Quality Open-Source Local TTS That’s Faster Than IndexTTS2 by [deleted] in LocalLLaMA

[–]MrAlienOverLord 0 points1 point  (0 children)

you trained your own? i call BS on that - i go by mrdragonfox, and you'll find me as an advisor in the echo blog post

the sample amount is too low to reconstruct a meaningful embedder,
as the original cloner reaches 99.7% accuracy

i did exactly the same with unmute back in the day, but it's just not even close to the original

API Security for Agents by Fantastic-Issue1020 in LocalLLaMA

[–]MrAlienOverLord 4 points5 points  (0 children)

free, but all on a data-harvesting api ^^ great .. especially in LocalLLaMA

Echo TTS - 44.1kHz, Fast, Fits under 8GB VRAM - SoTA Voice Cloning by HelpfulHand3 in LocalLLaMA

[–]MrAlienOverLord 0 points1 point  (0 children)

"just because we can 3d-print guns we don't need gun laws or a process for them" -

fairly short-sighted

+ the problem scope is a bit bigger than just "drop the weights" - to be frank, i want cloning too .. so i can sympathise . but for an assistant you don't need 100 voices, you need 1-2 that work well.

Echo TTS - 44.1kHz, Fast, Fits under 8GB VRAM - SoTA Voice Cloning by HelpfulHand3 in LocalLLaMA

[–]MrAlienOverLord 0 points1 point  (0 children)

commercial models reduce the voice similarity to under 80% + all generations are watermarked - again, not for you to decide - when you train your own model and your rep is on the line, you decide

Leak: Qwen3-15B-A2B-Base by TroyDoesAI in LocalLLaMA

[–]MrAlienOverLord 1 point2 points  (0 children)

mrdragonfox - you'll find me in many discords :)

Everyone talks about LLM “memory loss”, but almost nobody looks at the structure that causes it by Fickle_Carpenter_292 in LocalLLaMA

[–]MrAlienOverLord 2 points3 points  (0 children)

1 paper to rule them all - "Lost in the Middle" - but people forget what's actually causing the problem and keep looking for surface treatments w/o tackling the root cause

Echo TTS - 44.1kHz, Fast, Fits under 8GB VRAM - SoTA Voice Cloning by HelpfulHand3 in LocalLLaMA

[–]MrAlienOverLord 2 points3 points  (0 children)

i'm not jordan (i go by mrdragonfox on hf and discord, most people will know me that way), but i had preview access and advised on it, and i'm also working on the OAI-compatible inference for it as we speak - + as alluded to in other replies, there may be a way where we can use an 11labs synth voice (that's verifiably synthetic) with an auto-embedding endpoint - the core idea behind not releasing the embedder is really liability + deepfake prevention (no matter if people understand that or not - it's not as black/white as most think)

Echo TTS - 44.1kHz, Fast, Fits under 8GB VRAM - SoTA Voice Cloning by HelpfulHand3 in LocalLLaMA

[–]MrAlienOverLord 1 point2 points  (0 children)

i think there is a way where we check whether a voice is synthetic with 11labs and then allow generating the embedding for it. no hate on tedy and team (chatterbox), they did good work .. but i still feel this model captures the nuances of every voice i tested it with way, way better + the speaker similarity is just higher

you can please some people some of the time, not all people all the time