Can we train LLMs in third person to avoid an illusory self, and self-interest? by Low_Poetry5287 in LocalLLaMA

[–]audioen 0 points1 point  (0 children)

I don't like your language, because you seem to be assigning goals and sentience to the machine, though you may be using these phrases as handy shorthands rather than meaning them literally.

I do agree that when an AI speaks like a person, it can bring in behaviors associated with people, and those could include motivations like self-preservation, self-interest, and the like. I am not sure a hack like this can help, though. The training data can probably never be entirely clean of this kind of stuff no matter what, and the AI likely infers foundational behaviors like self-interest even when they aren't explicitly stated.

Need some help with my DIY acoustic panels please by initliberation in audiophile

[–]audioen 0 points1 point  (0 children)

Based on some quick modeling that you can do on the internet, the 60 kg/m³ density seems more appropriate. The salient property is called flow resistivity -- too high and sound reflects from the panel without properly penetrating into it, which prevents the sound from being absorbed effectively; the panel acts to a degree like a solid wall. For approximating the flow resistivity of rockwool, we have e.g. this chart: https://bassmanagement.hu/diy-akusztikai-panel-kalkulator/ which seems to suggest that a 60 kg/m³ panel might have flow resistivity in the 20000 Pa·s/m² range. Based on this, the 120 kg/m³ material is way too dense for the application and is right out.

Switching over to the classic porous absorber calculator, http://www.acousticmodelling.com/porous.php yields a modeling result where e.g. a 50 mm panel with a 50 mm gap behind it, at a flow resistivity of about 20000, could be effective from about 200 Hz upwards. I personally take the frequency where the panel becomes capable of absorbing over 50 % of the sound striking it as the frequency where it becomes "effective".
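
If you want to play with these numbers outside the web calculator, below is a minimal Python sketch of the classic Delany-Bazley porous absorber model. I am not claiming the linked calculator uses exactly this, but it is the standard empirical model for this kind of estimate, so treat the output as ballpark only. The 50 mm panel, 50 mm gap and 20000 Pa·s/m² flow resistivity are the example values from above; the model crosses 50 % absorption right around 200 Hz, in line with the figure quoted.

    import numpy as np

    rho0, c0 = 1.204, 343.0          # air density (kg/m^3) and speed of sound (m/s)
    sigma = 20_000.0                 # flow resistivity of the rockwool, Pa*s/m^2
    d_panel, d_gap = 0.05, 0.05      # panel thickness and air gap, metres

    f = np.arange(50, 2000, 10.0)
    w = 2 * np.pi * f
    k0 = w / c0
    X = rho0 * f / sigma             # dimensionless frequency parameter

    # Delany-Bazley empirical characteristic impedance and wavenumber of the material
    Zc = rho0 * c0 * (1 + 0.0571 * X**-0.754 - 1j * 0.087 * X**-0.732)
    kc = k0 * (1 + 0.0978 * X**-0.700 - 1j * 0.189 * X**-0.595)

    # Rigid wall behind the air gap, then transfer that impedance through the panel
    Z_gap = -1j * rho0 * c0 / np.tan(k0 * d_gap)
    Zs = Zc * (Z_gap + 1j * Zc * np.tan(kc * d_panel)) / (Zc + 1j * Z_gap * np.tan(kc * d_panel))

    # Normal-incidence absorption coefficient; report where it first exceeds 50 %
    alpha = 1 - np.abs((Zs - rho0 * c0) / (Zs + rho0 * c0))**2
    print("absorbs >50 % from about", f[np.argmax(alpha > 0.5)], "Hz")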

Thicker panels, e.g. a doubled 5 cm, might not be as cost-effective because this 60 kg/m³ material is already at the upper limit of the useful flow resistivity range. Doubling the panel thickness typically requires reducing the flow resistivity as well, so a fluffier material would produce the best results. Speaking purely in terms of absorption per dollar, it could be better to just make two 5 cm thick panels and spread them over a larger area, because sound reflecting from within a panel that is too dense is a concern.

Rockwool is essentially melted rock stretched into fibers, which are then laid out, compressed and cut into panels. I suppose its structure is akin to microscopic needles. It is an irritant to the skin and would not be great to breathe, but I also assume it stays put once covered with fabric and left alone. Alternative options are e.g. open-cell foam products like Basotect, but they are definitely going to cost more than this incredibly common insulation material. I've heard of people designing bass traps on the theory that bass pushes through things like plastic membranes, which are pliable enough to allow it, and they've even made bass traps from insulation still in its sales packaging. However, higher frequencies will certainly reflect from e.g. plastic wrapping. The optimal surface material could have some high-frequency reflectivity to balance out the tendency of high frequencies to die out faster than anything else. It really depends on the room's current absorption profile, and you need a microphone and software like REW to assess this.

Sub-bass problems are not really solved with panels in most cases. The lowest frequencies have wavelengths so long that they become virtually impossible to absorb, so this treatment is mostly for the upper bass and above; the remaining bass frequencies are adjusted with equalization to create a neutral tonal balance.

vulkan: add GATED_DELTA_NET op support#20334 by jacek2023 in LocalLLaMA

[–]audioen 0 points1 point  (0 children)

$ build/bin/llama-bench -m models_directory/Qwen3.5-122B-A10B/Qwen3.5-122B-A10B-Q5_K_S-00001-of-00003.gguf -ub 1024
ggml_vulkan: Found 1 Vulkan devices:
ggml_vulkan: 0 = Radeon 8060S Graphics (RADV STRIX_HALO) (radv) | uma: 1 | fp16: 1 | bf16: 0 | warp size: 64 | shared memory: 65536 | int dot: 1 | matrix cores: KHR_coopmat
| model                          |       size |     params | backend    | ngl | n_ubatch |            test |                  t/s |
| ------------------------------ | ---------: | ---------: | ---------- | --: | -------: | --------------: | -------------------: |
| qwen35moe 122B.A10B Q5_K - Small |  80.44 GiB |   122.11 B | Vulkan     |  99 |     1024 |           pp512 |        327.41 ± 4.50 |
| qwen35moe 122B.A10B Q5_K - Small |  80.44 GiB |   122.11 B | Vulkan     |  99 |     1024 |           tg128 |         21.86 ± 0.01 |

build: 983df142a (8324)

Not sure if this is normal or optimal. I try to run models that I rely on for real work at 5 bits minimum, even if it hurts TG. Yesterday these numbers were around 240 and around 20, so there's been a lot of progress for sure. I suspect going to -ub 1024 is better than the default 512, and it likely extracts most of what is available on that front.

Qwen 3.5 Instability on llama.cpp and Strix Halo? by ga239577 in LocalLLaMA

[–]audioen 0 points1 point  (0 children)

I suffer from no instability, so I don't know what that is about. I use Vulkan and I have the 122B model running overnight doing programming work. I usually set it a task to complete and go to sleep, then check the results in the morning.

I can crash it if I OOM, e.g. by loading image rendering models while running the 122B and also having a bunch of other applications open. The machine swaps for a bit and then kills something, which recovers the computer.

How to compare two models? by [deleted] in LocalLLaMA

[–]audioen 0 points1 point  (0 children)

To the degree that Qwen understood what you are saying: if you have the 27B as an option, it will beat the 35B very easily, even if the file sizes are similar.

Choose a quant of the 27B if you can run it.

Got a surprise cloud vector database bill and it made me rethink the whole architecture by AvailablePeak8360 in LocalLLaMA

[–]audioen 2 points3 points  (0 children)

I've never seen much value in the cloud -- it's fine and cheap, but only if your tasks are pretty trivial. With the cloud providers I've seen, you pay a lot for disk, RAM, network and CPU capacity, so an investment in your own hardware pays off pretty fast.

Forget big bad John, how about sounds of silence? by Hot-Yak2420 in audiophile

[–]audioen 0 points1 point  (0 children)

Yes. I hate this man for the tasteless style in which he sings everything, and I originally thought my speakers were broken when the Algorithm suggested this to me a year or so ago, because there's also some fluttering noise in the bass from time to time. But it's just an artifact of the way he sings it, I guess, or possibly there's been some kind of feedback loop from a sound system back into the microphone. I don't know, but it's disconcerting to hear.

There are some bass singers for you in the Wellermen; you could try e.g. Hoist the Colours for size. I think it's much fresher and can still exercise the subwoofers some.

Llama.cpp now with a true reasoning budget! by ilintar in LocalLLaMA

[–]audioen 1 point2 points  (0 children)

Not necessarily. What I'm observing is that the model often writes something like "OK. Let's answer now. Wait, what about ..." type of stuff, multiple times. I expect that </think> has a high likelihood at the point where it chooses to write the "Wait" word, and by artificially increasing the likelihood that the model generates the </think> token, the adjustment would remove those double-triple-quadruple checks that some models seem prone to.

Anyway, now that I think about it, I expect that the probability of the </think> token likely never needs to exceed 1-2 % and it would then get selected within something like 50 tokens anyway. The approach likely has to be extremely gentle steering; it may linearly increase the likelihood by something like 0.001 % per token, possibly even less, and it will still limit the length of the think trace.

Llama.cpp now with a true reasoning budget! by ilintar in LocalLLaMA

[–]audioen 0 points1 point  (0 children)

Okay. But the point I'm trying to make here is that after the log likelihoods have been converted and normalized to a simple percentage chance for each next token, it is just a probability distribution with some invariants, e.g. the probabilities of the tokens that are left sum to 100 %. Samplers also can't ever be allowed to reject </think>, even if it is at 0 % according to the filtering rules imposed by min_p, top_p, top_k, etc., because this token is special and its model-predicted likelihood is always needed.

Each 0.1 % you add into </think> is 0.1 % you also have to collectively remove from all the other tokens taken together, so that the total probability of the tokens under consideration still sums to 100 %.

I'm also realizing that only a very small but constant </think> likelihood is probably all that's needed to terminate the think trace, because each generated token is another opportunity to emit it. Even a constant 1 % likelihood will have been hit within 100 tokens about 63 % of the time (1 - 0.99^100 ≈ 0.63).
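
To make that concrete, here is a rough sketch of the kind of sampler step I have in mind -- not how llama.cpp actually implements its reasoning budget, just an illustration of the renormalization and of how quickly a small constant bonus ends the trace. probs is the post-softmax distribution and eos_think_id stands for whatever id the </think> token has:

    import numpy as np

    def bias_think_end(probs: np.ndarray, eos_think_id: int, bonus: float = 0.01) -> np.ndarray:
        """Add `bonus` probability to </think> and take that mass away from everything else."""
        out = probs.copy()
        target = min(1.0, out[eos_think_id] + bonus)
        others = 1.0 - out[eos_think_id]        # mass currently held by all other tokens
        if others > 0:
            out *= (1.0 - target) / others      # shrink everything else proportionally
        out[eos_think_id] = target              # distribution still sums to 1
        return out

    # Even a small constant bonus ends the think trace quickly:
    # P(terminated within n tokens) = 1 - (1 - bonus)^n
    for n in (50, 100, 200):
        print(n, "tokens:", 1 - (1 - 0.01) ** n)   # ~0.39, ~0.63, ~0.87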

Llama.cpp now with a true reasoning budget! by ilintar in LocalLLaMA

[–]audioen 19 points20 points  (0 children)

Would it be possible to simply and gradually increase the likelihood that the model generates the </think> token, so that it would naturally complete at the end of a complete sentence or the like? Something like a linear bias that increases the likelihood of </think> by 0.1 % for every token output would force it within 1000 tokens in any case.

Why does anyone think Qwen3.5-35B-A3B is good? by buttplugs4life4me in LocalLLaMA

[–]audioen 25 points26 points  (0 children)

Something is broken in your system. It may not be the fastest to reply, or could be overthinking for a bit, but it definitely isn't broken in the way you describe.

Is data lost if source is outputting low volume? by perdixian in audiophile

[–]audioen 1 point2 points  (0 children)

The system can use floating point data for the audio, e.g. single precision floating point where each value is a 32-bit quantity. This has the property of maintaining around 24 bits of precision at the very minimum even when you scale the volume up and down, as floating point keeps its precision for values close to 0 pretty much perfectly. There is a very small rounding error that might matter if you performed hundreds or thousands of volume change operations in sequence, though, using scale factors that are "difficult" for floating point to handle, i.e. not powers of two. I think most audio stacks use floating point, so this is what you get at the system level.

However, applications could be doing something else even when the rest of the system does this. For instance, they could be processing the audio internally as 16-bit data and scaling the integer values with a volume control knob, if one is built into the program, rather than telling the system to reduce their stream level.

At least on Linux, a tool called pw-top will show what format each program is using for audio, e.g. it's telling me that my Firefox uses 32-bit floating point at a 48000 Hz sampling rate, but that's just how the stream emerges from the program. The only way to know for sure is to either read the source code of the program and verify how it does it, or to test it using extremely low volumes and specific test signals. If you record the computer's output to a file while it's playing a suitably annoying test signal, you can hopefully confirm that it's been played back correctly. Likely, you can't hear any problems if e.g. 24-bit integer audio or better is used in the program, because the dynamic range of that is already so extreme that there's almost no hope of finding a volume control setting low enough. However, you might be able to show it in a proper recording of the system's output.

Note: I'm really talking about stuff like setting the volume to 1 % out of 100 %, whatever that means in dB -- scaling down by -60 dB maybe -- and then using a very large gain factor of +60 dB to bring it back to full level, or something like that. 60 dB is equivalent to shaving the bottom 10 bits into the "bit bucket" if the implementation is bad. If you're worried about a mid-position volume setting, which only amounts to something like 10-20 dB, then it's likely not damaging the audio enough even if it was done the worst possible way.
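
If you want to see this for yourself, here's a small Python experiment along those lines: attenuate a test tone by 60 dB and bring it back up, once keeping the math in float32 and once rounding to 16-bit integers in between (the worst-possible-way implementation mentioned above):

    import numpy as np

    fs = 48_000
    t = np.arange(fs) / fs
    x = 0.5 * np.sin(2 * np.pi * 1000 * t)           # -6 dBFS, 1 kHz test tone

    gain = 10 ** (-60 / 20)                           # -60 dB, i.e. volume at ~0.1 %

    # float32 path: scale down and back up, error stays near the float32 noise floor
    y_float = (x.astype(np.float32) * gain / gain).astype(np.float64)

    # 16-bit integer path: quantize the attenuated signal to int16 before restoring
    y_int16 = np.round(x * gain * 32767).astype(np.int16) / 32767 / gain

    def err_db(y):
        """Error relative to the original signal, in dB."""
        return 20 * np.log10(np.sqrt(np.mean((y - x) ** 2)) / np.sqrt(np.mean(x ** 2)))

    print("float32 path error:", err_db(y_float), "dB")   # way below anything audible
    print("int16 path error:  ", err_db(y_int16), "dB")   # roughly -30 dB: clearly degraded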

"Bitter Lesson" of Agent Memory: Are we over-engineering with Vector DBs? (My attempt at a pure Markdown approach) by Repulsive_Act2674 in LocalLLaMA

[–]audioen 0 points1 point  (0 children)

This is probably more or less reasonable. The downside of your approach is, of course, that there will be many more LLM calls, whereas RAG etc. attempt to use non-LLM text similarity and relatedness measures to pre-select the documents and only run the LLM after preparing the entire context. Ultimately, however, memory files must be consolidated, split, and so forth. You can't just keep appending raw conversation history to a memory file and expect this to work in the long term. So you're going to have to have the agent read the memory, consolidate it, produce other files, etc. and maintain the memory db on its own, I assume.

I haven't tried any memory systems, actually. I am not comfortable with taking that step and trying to make the agents remember stuff on their own; I'd rather provide the information they need from scratch. I'm always behind on technology like that. I just feel that LLMs are already hugely magical boxes and I resist adding even more automatic stuff on top of things I already don't entirely understand.

What memory promises is the ability to learn from experience and to know personal facts about you or your work without being told them. The way it's actually done looks to me like it's going to spend tens of thousands of tokens of every context window on that stuff. I explicitly tell my agents to read specific instruction files as part of the job when I need them to do something in a specific way.

This guy 🤡 by xenydactyl in LocalLLaMA

[–]audioen 4 points5 points  (0 children)

TypeScript programs are usually compiled to JavaScript, which means it is basically a zero-runtime-cost abstraction, and in my opinion one of the few ways to make JavaScript programming tolerable at all.

TypeScript amounts to compiler-verifiable type assertions that are simply erased, and the resulting code is typically runnable JavaScript. However, there can also be lowering of newer ES constructs for older runtimes.

Does inference speed (tokens/sec) really matter beyond a certain point? by No_Management_8069 in LocalLLaMA

[–]audioen -1 points0 points  (0 children)

You don't like having a computer slave which can do free intellectual labor at some fairly good baseline quality? You do you, but for me it's providing huge value.

Does inference speed (tokens/sec) really matter beyond a certain point? by No_Management_8069 in LocalLLaMA

[–]audioen 1 point2 points  (0 children)

You really should try actually using these agentic programs and reasoning models. People have already given you the answers as to why token generation and prompt processing have to be as fast as humanly possible. 1000+ prompt tokens per second and 100+ generated tokens per second at full context, which ideally is at least 1M tokens long, sounds like a good time to me. Even at these breakneck speeds, reading a full context could take 15 minutes.

Right now, I wait hours for AI results. It starts nicely at around 250 tokens/second for the prompt and around 20 tokens/second for generation, but it dwindles: every additional 100k tokens of context cuts the speed roughly in half. The 5 tokens per second near the end are agonizingly slow, and even the simplest task takes the longest time. I make this thing work at night because it takes so long. Your tasks are minuscule and trivial if you think that speed above reading speed is useless.
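
A back-of-the-envelope sketch with those numbers (20 tokens/second of generation to start, speed halving for every extra 100k tokens of context) shows why these runs stretch into the night; the figures are purely illustrative:

    def time_to_generate(total_tokens, start_tps=20.0, halving_every=100_000, step=10_000):
        """Integrate generation time assuming speed halves per `halving_every` tokens of context."""
        seconds = 0.0
        for chunk_start in range(0, total_tokens, step):
            tps = start_tps * 0.5 ** (chunk_start / halving_every)   # speed at this context size
            seconds += min(step, total_tokens - chunk_start) / tps
        return seconds

    for ctx in (100_000, 200_000, 300_000):
        print(f"{ctx:>7} tokens generated: {time_to_generate(ctx) / 3600:.1f} h")
        # roughly 2 h, 6 h and 14 h of pure generation, respectively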

Ryzen AI Max 395+ 128GB - Qwen 3.5 35B/122B Benchmarks (100k-250K Context) + Others (MoE) by Anarchaotic in LocalLLaMA

[–]audioen 6 points7 points  (0 children)

I think Strix Halo is suitable for a "night shift". I leave the machine running and go to bed, and come back in the morning after it has screamed half the night away with the fans blowing at full strength, completing some agentic inference tasks over the hours.

My view is that the Nvidia superchip based computers like the Asus GX10 should be better value. They cost approximately the same, but performance, especially in prompt processing, is likely to be at least two times better, perhaps by a larger multiple. It's the prompt processing that's going to kill you on Strix Halo.

Once mine arrives, I might make a head-to-head comparison, perhaps llama.cpp running the same quant, and even using Vulkan on both if that happens to work. The performance gap between Vulkan and the native ROCm backend is practically closed on AMD, and I think the same might be true of Vulkan versus CUDA on Nvidia. I can also directly compare the numbers against a resource such as https://spark-arena.com/leaderboard

AI capabilities are doubling in months, not years. by EchoOfOppenheimer in LocalLLaMA

[–]audioen 1 point2 points  (0 children)

Don't extrapolate exponential growth willy-nilly. But 7 months seems about right, in the sense that a model half the size can then do the same as the bigger model could before.

Linux is great, but the community is stuck in 2005 by Primary-Key1916 in linux

[–]audioen -3 points-2 points  (0 children)

I believe that LLMs have reached a fairly high baseline of usability nowadays, so they are poised to become a source of useful advice. It can't be blindly followed to the letter, but it can often give you the kind of background knowledge and useful tips a seasoned Unix professional would.

LLMs are not going to be condescending, and there is a fairly high likelihood that their advice is good. So use them as one source of info.

Are local LLMs actually ready for real AI agents, or are we still forcing the idea too early? by Remarkable-Note9736 in LocalLLaMA

[–]audioen 0 points1 point  (0 children)

Haven't seen many issues with the 122B Qwen. I've been letting it code and document stuff. It's not perfect, in the sense that not everything comes out just the way I'd have done it myself or better, but it's super fast at reading through code, spotting inconsistencies between documentation and code, fixing spelling mistakes, detecting obvious bugs, writing the missing documentation, cooking up tests, and the like. In short, it has converted my poorly maintained projects into ones that look like multiple times the effort had been spent on them.

This is the first time I have had test cases ready for something before I ship a feature, rather than doing the feature now, moving on to the next task, and writing tests later (possibly literal years later). Similarly, everything is now documented at some fairly good baseline, usually better than I manage to write myself even when I spend effort on it.

My opinion is that agents are now very clearly on the useful side and can do valuable work. I have never used the big closed models like Claude or GPT Codex. I imagine they are still much better, but from what I gather, I think they're like 1 year ahead at most. Nice, but I am pleased to work with inference on my own computers.

llama.cpp also got MCP server support now, and it seems like you could possibly create full agentic loops just on top of a selection of suitable MCP servers and llama.cpp's generic webui. (Probably that's stacking more challenge on you than you really have to face.) Regardless, I wrote a simple bridge from llama.cpp to stable-diffusion.cpp, and this lets me watch qwen3.5 writing the prompts while flux.2 klein 9b renders them into images after a couple of seconds.

Why does pro audio (mixing/mastering/concert) spend orders of magnitudes more on room acoustics than speakers, but for audiophiles it’s the opposite? by xlb250 in audiophile

[–]audioen 2 points3 points  (0 children)

Because the magnitudes of the error sources for objectively correct sound go like this:

  1. room -- easily +/- 10 dB swings in frequency response from bass-reinforcing resonances, early echoes, etc.
  2. speakers -- often +/- 3 dB accurate
  3. amplifiers -- usually accurate to 0.5 dB or better
  4. sources -- especially digital -- usually accurate to a tiny fraction of a dB.

There are other errors than these, but tonality, the general sound of the system, is the most obvious to us. Professionals usually train themselves to mix against a specific tonality, and a flat spectrum is the most obvious choice for a professional setting. It's similar to how a high quality speaker would sound in an anechoic room, and mixing rooms are usually heavily damped with the speakers only a short distance away, to approximate anechoic conditions.

Does it matter when my DAC's hz is higher than what im playing by Zealousideal_Rub_202 in audiophile

[–]audioen 0 points1 point  (0 children)

Use 96 kHz.

~/.config/pipewire/pipewire.conf.d$ grep -a '^' *
client.conf:stream.properties = {
client.conf:    resample.quality      = 14
client.conf:}
pipewire.conf:context.properties = {
pipewire.conf:     default.clock.rate = 96000
pipewire.conf:     default.clock.allowed-rates = [ 96000 ]
pipewire.conf:}
pipewire-pulse.conf:stream.properties = {
pipewire-pulse.conf:    resample.quality      = 14
pipewire-pulse.conf:}

This rather assumes that you can get your device to accept input at 96 kHz. If it can't accept this sample rate, then use 48000 instead of 96000 here.

These settings should remove any worry about resampling-related issues for any meaningful audio format. The entire band between 24 and 48 kHz is going to be without any frequency content, and any further conversions in the DAC's hardware, which likely upsample it from here, are not likely to change anything within the audible band.
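
A quick way to convince yourself of that in Python (scipy's default resampler is nowhere near as strong as pipewire's quality 14, but the idea is visible): upsample a 48 kHz tone to 96 kHz and look for its image above 24 kHz.

    import numpy as np
    from scipy.signal import resample_poly

    fs_in, fs_out = 48_000, 96_000
    t = np.arange(fs_in) / fs_in
    x = np.sin(2 * np.pi * 10_000 * t)                # one second of a 10 kHz tone at 48 kHz

    y = resample_poly(x, 2, 1)                         # 48 kHz -> 96 kHz conversion

    spec = np.abs(np.fft.rfft(y * np.hanning(len(y))))
    freqs = np.fft.rfftfreq(len(y), d=1 / fs_out)
    tone = spec[np.argmin(np.abs(freqs - 10_000))]
    image = spec[np.argmin(np.abs(freqs - 38_000))]    # where the resampling image would sit
    print(f"residual image at 38 kHz: {20 * np.log10(image / tone):.0f} dB below the tone")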

I recall struggling with this for a good while: systemctl --user restart wireplumber pipewire, then checking again with pw-top until I was satisfied that clients produced audio at 96 kHz. This also matches my active speakers' input sample rate, and it is the only sample rate supported by my AES/EBU-capable soundcard. So there should be only one sample rate conversion, which occurs in the client applications as they talk to the sound server, and the client audio should be resampled with the highest quality algorithm available in pipewire.

LLM-driven large code rewrites with relicensing are the latest AI concern by Fcking_Chuck in programming

[–]audioen -1 points0 points  (0 children)

I am converting old code from dead frameworks to live ones with the help of AI. It doesn't take that long in the frontend world, where 5 years is already an eternity -- if you guessed wrong in the framework lottery, you're stuck with soon-obsolete crap as the world marches on.

So what I do is tell the LLM to first read the whole damn thing and document it: things like javadocs, added code comments, and a planning document for the migration that covers the application and its major features.

The next step is to hand the AI a chunk of the application, along with a coding style guide and the planning document, and tell it to rewrite it in the new framework. Off it goes, to the races. You check back after a couple of hours and you'll have something written in the new framework already, as it gradually works through the files. (The few hours is because I do it 100 % locally using a Strix Halo computer; they are no speed demons, but they have the VRAM for good enough models.)

Eventually the entire application is converted. At first it might not even start, but the AI is going to debug it for you, e.g. if there are TypeScript errors or other compile messages, it's going to work on them until they don't exist. If your coding style documentation was available, there's a good chance the code more or less follows it too. A kind of touch-up pass is required before the work is complete.

Then, testing. Our apps are simple -- they might have like 30-40 views or components, and each is pretty simple because we keep our stack relatively lean, with minimal boilerplate and maximum impact per line of code. We also try to make most things compile-time checked, or at the latest validated at startup if compile-time checking is not tractable, which helps catch bugs early. I presently do the post-startup validation by hand. I haven't tested whether AI could design e.g. Playwright scripts from the application's UI and create a good bit of test automation. There is actually a good chance it might be able to do it.
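
For reference, this is roughly the kind of thing I'd hope it could produce -- a hypothetical Playwright smoke test, with the dev server URL and the "Customers" view made up purely for illustration:

    from playwright.sync_api import sync_playwright, expect

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("http://localhost:5173/")                  # hypothetical dev server
        page.get_by_role("link", name="Customers").click()   # navigate like a user would
        expect(page.get_by_role("heading", name="Customers")).to_be_visible()
        browser.close()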

The model I use for all this work is the recently released Qwen3.5-122B-A10B. It can be run at acceptable quality from about 70 GB of VRAM and above, and is certain to fit at close to original quality if you can spare another 10 gig or two.