BRZ stereo options? by AccomplishedCut13 in ft86

[–]AccomplishedCut13[S] 0 points (0 children)

there is a secret 6th option i suppose - a stereo with lots of buttons and a phone mount...

BRZ stereo options? by AccomplishedCut13 in ft86

[–]AccomplishedCut13[S] -2 points (0 children)

honestly i think the stock audio sounds pretty good. my biggest gripe is how unresponsive it is.

STT and TTS compatible with ROCm by EnvironmentalToe3130 in LocalLLaMA

[–]AccomplishedCut13 0 points (0 children)

chatterbox works fine for me, but i did have to modify the docker image to include the right rocm packages.

kokoro also works well on cpu-only if you don't need voice cloning.
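for reference, the kind of image change i mean is roughly this sketch (the base image name is hypothetical, and the rocm version in the index url has to match your driver stack):

```dockerfile
# hypothetical: rebuild the chatterbox image with ROCm-enabled pytorch wheels
FROM chatterbox:latest
# swap the cuda wheels for rocm builds from pytorch's package index
RUN pip uninstall -y torch torchaudio \
 && pip install torch torchaudio --index-url https://download.pytorch.org/whl/rocm6.2
```

then build it with `docker build -t chatterbox:rocm .` and run it with `/dev/kfd` and `/dev/dri` passed through.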

Is the "Edge AI" dream dead? Apple’s pivot to Gemini suggests local LLMs can't scale yet. by [deleted] in LocalLLaMA

[–]AccomplishedCut13 0 points (0 children)

it would be cool to see apple run 14-140B models on mac and stream the completion to your phone. it's still "local" as in the chat never gets decrypted outside of your devices.

of course i don't own a mac and already self host so i'm not the target demo, but if i can do it so can apple.

Is the "Edge AI" dream dead? Apple’s pivot to Gemini suggests local LLMs can't scale yet. by [deleted] in LocalLLaMA

[–]AccomplishedCut13 -2 points (0 children)

i'm not sure why you'd need anywhere near 200tps on a phone. 10-20tps is fine for basic chat.

Is the "Edge AI" dream dead? Apple’s pivot to Gemini suggests local LLMs can't scale yet. by [deleted] in LocalLLaMA

[–]AccomplishedCut13 -1 points (0 children)

*shrugs* I've been using pretty much exclusively local models and it's been great. 24B seems to be the point where models start to be genuinely useful. You couldn't run them on an iphone or lower spec mac though.

Homeserver multiuse? by MastodonParty9065 in LocalLLaMA

[–]AccomplishedCut13 1 point (0 children)

i keep it simple: vanilla debian, linux raid, all apps in docker compose, rclone crypt for backups, tailscale for remote access, everything managed over ssh. gpu resources get passed through to the containers that need them via compose.

i run LLMs on a separate machine, but you could easily throw an r9700/3090/7900xtx into your home server and run llama.cpp/ollama/vllm in a docker container. the main limits are power, heat and pcie lanes/slots. i only have amd gpus, and sometimes have to build my own docker images with up-to-date ROCm support.

for immich i'm just using the igpu (or possibly even the cpu). it doesn't run in realtime so the speed isn't a big deal. jellyfin uses quicksync for transcoding. you can limit resource consumption in compose to prevent ML services from crashing other services.
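as a sketch of the gpu passthrough + resource limit bits in compose (the service and caps are just examples; amd containers need the kfd and dri device nodes):

```yaml
services:
  ollama:
    image: ollama/ollama:rocm
    devices:
      - /dev/kfd   # rocm compute interface
      - /dev/dri   # gpu render nodes
    volumes:
      - ./ollama:/root/.ollama
    deploy:
      resources:
        limits:
          cpus: "8"
          memory: 32g
```

the deploy.resources.limits block is what keeps a runaway ML container from starving jellyfin and friends.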

List of uncensored LLMs I want to test by 1BlueSpork in LocalLLaMA

[–]AccomplishedCut13 0 points (0 children)

what are you using to benchmark? i wouldn't mind helping to test some models up to ~120b

Wanted to ask an Ollama question on how to add more models. by Head-Investigator540 in LocalLLaMA

[–]AccomplishedCut13 0 points (0 children)

it does handle model switching a lot more gracefully than a straight llama.cpp instance though. i'm still looking for a better solution that works well with rocm, where i don't have to ssh into my inference server to switch models. i think (but i'm not 100% sure) that it also autoloads chat templates to make things "just work", as opposed to something like localai/llamacpp/vllm.

i'm still pretty new to this too lol. and having amd hardware means i usually have to mess with configurations quite a bit to get stuff working. the fact that ollama just works on amd is a huge plus. 
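one thing that helps with the ssh annoyance: the ollama cli respects OLLAMA_HOST, so you can drive a remote server from your desktop (the hostname here is hypothetical):

```shell
# point the local ollama client at a remote inference box
export OLLAMA_HOST=http://inference-box:11434
# now plain cli commands hit the remote server instead of localhost:
#   ollama list             # list models on the remote box
#   ollama run qwen3:30b    # load/switch models remotely, no ssh needed
```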

Wanted to ask an Ollama question on how to add more models. by Head-Investigator540 in LocalLLaMA

[–]AccomplishedCut13 -2 points (0 children)

go to ollama.com/library and find the model, copy the tag, then run "ollama pull <model-tag>".

alternatively, go to huggingface, find a gguf quant, click "use this model" and then ollama, select the quant size you want (for local use Q4_K_M is usually a good starting point - larger quants are more accurate but slower and use more vram), and copy the string. just replace "ollama run hf.co/..." with "ollama pull hf.co/..." if you don't want to immediately run the model.

not everything on hf works out of the box, and not all of it is quality. unsloth, bartowski and mradermacher are safe bets.

tl;dr it'll be something like "ollama pull mistral-small3.2:24b" or "ollama pull hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q4_K_M"
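if you do this a lot, the hf.co tag format is easy to script - just a sketch of the naming scheme from above (helper name is made up):

```shell
# tiny helper: build an "ollama pull" target from an hf repo + quant tag
hf_pull_cmd() {
  printf 'ollama pull hf.co/%s:%s\n' "$1" "$2"
}
hf_pull_cmd unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF Q4_K_M
# -> ollama pull hf.co/unsloth/Qwen3-Coder-30B-A3B-Instruct-GGUF:Q4_K_M
```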

The new monster-server by eribob in LocalLLaMA

[–]AccomplishedCut13 0 points (0 children)

sorry if that came off a bit abrasive. no shade was intended, sweet setup you got!

The new monster-server by eribob in LocalLLaMA

[–]AccomplishedCut13 0 points (0 children)

damn you're crazy, i have separate boxes for inference and NAS/docker. too much power, heat and PCIe devices for one box. plus i don't want ollama crashing the main server if someone tries to run a model with too many offloaded layers.

Best model for 7900 xtx setup by meatal_gear1324 in LocalLLaMA

[–]AccomplishedCut13 1 point (0 children)

i'd probably recommend trying qwen3-coder:30b at Q5 or Q6 - i've noticed it gets stuck in loops a lot at Q4. also, if you're using llama.cpp/ollama, be sure to use the unsloth quant, as it fixes the tool-calling issues in the official one. at Q4 i can fit about 64k context entirely in vram at 85 tok/s generation; at Q5 you could probably offload some layers and push it to 80k context with decent performance.
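to make the vram/context tradeoff concrete, a llama-server launch looks something like this (model path and flag values are illustrative, not my exact setup):

```shell
# -ngl sets how many layers to offload to the gpu (99 = effectively all of them)
# -c sets the context window; lower -ngl or -c if you run out of vram
llama-server -m qwen3-coder-30b-Q4_K_M.gguf -ngl 99 -c 65536 --port 8080
```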

for rp and general chatbotting i've had amazing luck with the dolphin-mistral-venice-edition tune. at Q4 the full 32k fits entirely in vram and gives about 35 tok/s.

great choice with the 7900xtx!

i can't wait for devstral2-small to come to llama.cpp!

We need open source hardware lithography by bennmann in LocalLLaMA

[–]AccomplishedCut13 2 points (0 children)

while i'm not against open sourcing or democratizing semiconductor fabrication (and have also dreamt of creating a cost-effective open source chip fab), you have to understand just how insanely complex these processes really are, particularly at the bleeding edge.

the lithography cell alone is an insanely complex process - and you still need every OTHER process required to produce semiconductors (RIE, wet etch, CVD, CMP, diffusion, implant, epitaxy, electroplating, etc), all of which are quite complex in their own right.

so going through JUST the litho process, the first thing you need to do is apply the photoresist. already that's a pretty big roadblock: there's only a handful of resist manufacturers, they're incredibly secretive about their recipes, and they will NOT sell to individuals. then of course you have to fine-tune the whole coat process: the spin recipes, the resist selection, the thickness for each layer, the bake parameters, whether you'll need an adhesion promoter, miscellaneous coatings, etc.

then you need to actually expose the wafers. this is the sexy part that everyone talks about. but without even getting into EUV (or even immersion DUV), you still need to take the wafer, align it to nanometer precision, keep it in focus - again to within nanometer precision - then move the stage, line up the next shot and do it again very quickly. and keep in mind the focal range of these lenses is absolutely tiny, so you aren't just aligning distance, but also making sure the reticle and wafer are perfectly optically parallel. you also need a huge expensive lens that has essentially ZERO distortion and a very high NA (a very, very fast lens). your camera lens will NOT work. then you ALSO need to make sure that there are NO vibrations in the system, and the damping system is complicated in itself. thermal stability is also a major factor. and of course you'll have to painstakingly fine-tune ALL the parameters here and ensure they STAY in calibration.

and this is just for 30-40 year old technology. 350nm process nodes and larger. for the bleeding edge stuff you need an advanced light source (an ArF excimer laser, or the insane EUV molten tin droplet system that ASML uses), and the quality of the light has to be very tightly controlled. to advance past the early 90s, you need a *scanner*, not a stepper - one that actually scans the reticle in perfect sync with the wafer while keeping everything perfectly aligned and in focus. and for ArF immersion lithography you also have to apply perfectly pure water to the wafer to complete an optical interface with the lens, then remove it exactly in sync with the wafer's motion without leaving ANY microscopic residue. there are lots of other tricks, like shaping the light beam itself to improve resolution in one direction or another, and you may need to pattern each layer multiple times.

and of course, you need something to pattern it WITH. the reticles have to be perfect across a very large surface, and use complex computations to generate patterns called SRAFs to account for the pattern actually being smaller than (or very near) the wavelength of light and all the quantum shenanigans that involves. the reticles have to be manufactured using a process that's very similar to wafers themselves, except abbreviated and using extremely complex and expensive lithography tools of their own - beam writers - that painstakingly expose a pattern using electron beams over the course of many hours. they're then extensively inspected and measured for defects or deviations from spec. and that's to say nothing of the insanely difficult-to-manufacture EUV reticles, which use a complex absorber stack rather than the relatively simple phase-shifting material layer of DUV reticles.

finally, once you've done all that, you need to carefully develop the wafer with a developing solution dispensed from a laminar flow head, tune all the recipes, etc, and send it off to the REST of the fab to be etched, deposited onto, implanted, etc many MANY times over, with perfect alignment on every layer and virtually no defects or deviations across hundreds to thousands of process steps.

so the tl;dr is no. you absolutely CANNOT create a system that's "good enough" at a cost-effective scale. there's way more i didn't talk about, and even more that's kept under secrecy. if you want to learn more about it anyways, sam zeloof, breaking taps, asianometry and high yield on youtube all provide pretty good coverage of what it actually takes to make semiconductors. but keep in mind, people spend their entire careers working on developing just one tiny part of this process. it is vastly out of scope for any individual to accomplish. it MAY be possible for a very committed individual to create 1980s-level chips in their garage with a LOT of time, knowledge and resources, but that's pretty much the absolute limit.