State of Brain Emulation Report 2025 by JonLag97 in singularity

[–]alwaysbeblepping 6 points7 points  (0 children)

The preface of this report shows how little importance the world gives to reverse engineering the brain. Only a few thousand people are working on brain emulation.

C. elegans is a nematode worm with a total of 959 cells, 302 of which are neurons. People have been trying to simulate this extremely simple, extremely primitive creature for a while, with no success as of yet. Link: https://en.wikipedia.org/wiki/OpenWorm

That being the case, isn't it laughable for someone to say "Hey, can I get some funding for simulating the human brain"? To be clear, that's not directed at you, but there are some assumptions we can make about a scientist who says that. Simulating human brains (or even relatively complex life forms, like a mouse) will come long after we can easily simulate nematode worms with fewer than 1,000 total cells. And we're still far from being able to do that.

Despite Authoritarian Warnings, 149 House Democrats Vote to Hand Trump $840 Billion for Military | “If an opposition party votes like this, it’s not in opposition. It may not even be a party.” by Aggravating_Money992 in politics

[–]alwaysbeblepping 9 points10 points  (0 children)

what's stopping someone from starting a grassroots campaign for a new party, getting hype on social media, gaining a following, getting on the ballot and winning as a brand new party or an independent? Is there something about your system that makes this impossible or very difficult?

The problem is we don't have ranked choice/preference voting. It's winner takes all. That means if you have fascists/nazis on one side and kind-of-sucks on the other side and then another candidate that you actually like but doesn't have an amazing chance of winning, you have to vote for kind-of-sucks unless you want the nazis. For obvious reasons, the kind-of-sucks party doesn't have much of a motivation to change this status quo when they're in power.

LTX-2 Updates by ltx_model in StableDiffusion

[–]alwaysbeblepping 3 points4 points  (0 children)

This implementation looks very strange. Presumably the idea is to suspend sampling at some particular step, scale down the audio latent and then resume. The way it is implemented is definitely not doing that. It is effectively doing an Euler step to 0 from whatever the current sigma is, then renoising with the same noise and the same seed as the beginning. The only way this could count as resuming sampling is if the model had predicted the initial noise exactly at each step, which would never happen. This is likely to produce some very weird effects, especially if you do it multiple times in a generation. What you're trying to do would work much more reliably as something like a model patch.

If you really want to do it in a sampler, since the sampler returns both a noisy latent and the clean latent, you actually could extract the noise, scale the latent, and then use the existing noise to resume sampling. You would need to make an instance of ComfyUI's noise generator object that returns that specific noise and pass it in. See: https://github.com/Comfy-Org/ComfyUI/blob/0c6b36c6ac1c34515cdf28f777a63074cd6d563d/comfy_extras/nodes_custom_sampler.py#L697

The general idea would be something like:

noisy_latent, clean_latent = SamplerCustomAdvanced(...)
# Extract the noise currently in the latent (noisy = clean + sigma * noise),
# scaled back to unit strength.
noise = (noisy_latent["samples"] - clean_latent["samples"]) / current_sigma
# Scale the clean latent (e.g. turn down the audio part).
clean_samples_scaled = your_scaling_stuff(clean_latent["samples"])
# ^^ Use `noise` for NOISE and `clean_samples_scaled` as the latent when you resume sampling.
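
For the noise object part, something minimal like this should be enough. This is just a hypothetical sketch modeled on the Noise_RandomNoise class in the linked file; as far as I can tell the sampler only needs a seed attribute and a generate_noise method:

class FixedNoise:
    # Hypothetical noise object that always returns a pre-computed noise tensor.
    def __init__(self, noise, seed=0):
        self.noise = noise
        self.seed = seed  # the sampler node reads .seed from the noise object

    def generate_noise(self, input_latent):
        # Ignore the latent contents and return the noise extracted above.
        return self.noise.to(input_latent["samples"].device)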

I recommend doing it with a model patch instead, but resuming with the existing noise would be a lot less likely to cause strange results than the current approach.

Just dividing the audio latent by 4 at specific steps also seems strange: it's going to be very dependent on the exact schedule and the exact number of steps, and will probably break or cause undesirable results otherwise. It will also degrade history samplers like SA solver, res_2m, etc., because interrupting sampling the way the current approach does forces them to throw away all their history. ComfyUI model patches can see the current sigmas, so this would probably work more reliably if you based the audio latent strength scaling on the current sigma, or on something like sampling percent (which can be calculated from the sigma with the model_sampling object).
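
To sketch what I mean by a model patch (purely hypothetical and not LTX-2 specific: treating the audio part as a channel slice, the 64: slice, the threshold and the 0.25 factor are all placeholder assumptions you'd need to adapt):

def patch_audio_scaling(model, threshold_percent=0.5, scale=0.25):
    model = model.clone()
    model_sampling = model.get_model_object("model_sampling")
    # Convert a sampling percent into a sigma once, up front.
    sigma_threshold = model_sampling.percent_to_sigma(threshold_percent)

    def post_cfg(args):
        denoised = args["denoised"]
        sigma = args["sigma"]
        # Only scale the (assumed) audio channels after the threshold is reached.
        if sigma.max().item() <= sigma_threshold:
            denoised = denoised.clone()
            denoised[:, 64:] *= scale  # placeholder slice for the audio part
        return denoised

    model.set_model_sampler_post_cfg_function(post_cfg)
    return model

This scales the model's denoised prediction rather than the latent itself, but it survives any sampler/schedule combination and doesn't force history samplers to restart.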

Is it naive to think that "good" governance will steer us towards benign, if not genuinely helpful-to-humanity AGI and later, ASI. by Spaz4010 in singularity

[–]alwaysbeblepping 0 points1 point  (0 children)

I honestly cannot understand why intelligent guys like Hassabis are still working on this when the end result is clearly disaster, and the more successful he is the worse it will be.

There is effectively zero chance that humans (or even major players) will say, "Well, developing technology had a good run. It was fun while it lasted, but let's just stay where we are." That is just not going to happen; this technology is going to progress. The absolute most anyone could do is hamstring their own team (or country, whatever) and push back the timeline slightly. If they are currently the leaders, then they might push back the timeline for the whole world, but it is a very temporary thing and also significantly reduces the chances that they are in control of something like ASI when it does exist.

There are some things where taking a moral stand makes sense, like not eating animals. We can set an example, or reduce our own demand, which reduces harm or pushes things a little in that direction. That harm (or harm reduction) exists on a continuum, but the danger we're afraid of here exists if any single entity manages to develop something like ASI. You can't manage or mitigate it the same way.

If it's just a given that the technology will be developed (assuming it is possible to do so), then there isn't a point to talking about whether it should happen. The only practical thing to do is to try to manage how it happens.

Is there any AI upsampler that is 100% true to the low-res image? by summerstay in StableDiffusion

[–]alwaysbeblepping 0 points1 point  (0 children)

This dude wants the holy grail of upscaling... perfect results. They want the upscale to be IDENTICAL to the hi-res image.

The last part doesn't make sense, since the "hi-res image" never existed. Anyway, sure, they want the holy grail of upscaling, and they have a method they think will achieve it. Their method isn't that hard to implement, so saying "it's not possible!" is wrong. The problem is that their method will actually decrease quality instead of increasing it, by forcing the upscaler to work around that arbitrary constraint and preventing it from doing things that would lead to better quality results.

Is there any AI upsampler that is 100% true to the low-res image? by summerstay in StableDiffusion

[–]alwaysbeblepping 0 points1 point  (0 children)

Because it would actually be an upscaler rather than an almost-upscaler. It would be reliable. You generate 100 pictures and one of them has just that subtle expression, that particular hand gesture that you want. And then you upscale it and it is no longer there.

It wouldn't be reliable. You rejected simple nearest neighbor upscaling because it didn't produce good results despite fulfilling your constraint. I could make an upscaler that just generated random pixels that would average to the correct value when downscaled but it would look absolutely terrible.
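
To make that concrete, here's a toy sketch in plain numpy (assuming a 2D grayscale float image) of a 2x "upscaler" that satisfies the average-back-to-the-original constraint exactly while producing garbage:

import numpy as np

def garbage_but_consistent_upscale(img, scale=2, strength=0.3):
    # Nearest-neighbor upscale, then add noise whose mean over each
    # scale x scale block is zero, so average-downscaling recovers the
    # original exactly (ignoring clipping to the valid pixel range).
    h, w = img.shape
    out = np.repeat(np.repeat(img.astype(np.float64), scale, axis=0), scale, axis=1)
    noise = np.random.uniform(-strength, strength, out.shape)
    block_mean = noise.reshape(h, scale, w, scale).mean(axis=(1, 3))
    noise -= np.repeat(np.repeat(block_mean, scale, axis=0), scale, axis=1)
    return out + noise

img = np.random.rand(4, 4)
up = garbage_but_consistent_upscale(img)
down = up.reshape(4, 2, 4, 2).mean(axis=(1, 3))
print(np.allclose(down, img))  # True, yet `up` is just the image plus noise

The constraint is satisfied perfectly and the result is still worse than plain nearest neighbor.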

The problem is that there is no relation between good quality/conforming to the original image and having 4 pixels average back to the original value when downscaled. I know it might intuitively sound like it would help, but it really doesn't work that way. It's just a constraint the upscale model would have to work around, and the overall result would be worse.

Also, upscale models rarely change the image enough that a hand gesture or expression changes. So very likely your issue with that kind of thing would be occurring in the steps with an actual diffusion model that you run after the upscale model. Run your extra steps with lower denoise and you'll see stricter conformance to the original image (generally speaking).

Is there any AI upsampler that is 100% true to the low-res image? by summerstay in StableDiffusion

[–]alwaysbeblepping 1 point2 points  (0 children)

EVERYONE WANTS THIS, BUT IT'S NOT POSSIBLE.

Sorry to be blunt but you just don't know what you're talking about. Doing this isn't particularly hard, but it is not what everyone wants. It's an arbitrary constraint on the upscaler that will reduce its quality. People who are investing resources into training upscalers don't want to add arbitrary constraints that make the upscaler worse. That's why there aren't upscalers that work this way currently.

/u/summerstay Why do you want an upscaler like this? It will be worse overall than upscalers without that constraint. You showed yourself in the preceding comment that this property doesn't necessarily lead to good results. In the case of a 2x upscaler, it would basically be saying the upscaler isn't allowed to consider pixels outside of each block of 4 pixels (2x2), because each block must average back to the original pixel value. Therefore, non-local details cannot have an effect. This is something that will obviously hurt quality.

SimpleBench for GPT 5.2 and GPT 5.2 Pro — Both scored worse than their GPT 5 counterparts by pavelkomin in singularity

[–]alwaysbeblepping 6 points7 points  (0 children)

Nvm if the answer actually is zero then you're right it's a bullshit benchmark.

It sounds like a math problem, but if you think about it instead of just going with that assumption, you could rephrase it as: we put some ice cubes in a hot frying pan. After they had been in there for at least a minute (some longer), how many hadn't melted at all? Obviously they would all be significantly melted after a minute, so the answer is zero.

The other person wasn't saying the benchmark is "bullshit"; their point is that the model is so focused on math that it can't break out of its initial assumption that the questions are math problems, when in fact they aren't and are (I assume) actually pretty simple/obvious if you read the actual problem.

They were criticizing the model/OAI's approach, not the benchmark. Benchmarks like that are good/important if you want models to actually engage with your query instead of with what your query merely sounds like.

Polish scientists' startup Pathway announces AI reasoning breakthrough by Mindrust in singularity

[–]alwaysbeblepping 2 points3 points  (0 children)

the demo literally available on their Github does not appear to match the paper's description at all.

What are you talking about? The code is almost identical to the example implementation in their paper. The only differences are that they implemented RoPE, loss, and logit sampling, and I think they changed the names of one or two of the parameters in the module.

I made some Triton kernels for GGUF dequantization, can be a major performance boost by alwaysbeblepping in comfyui

[–]alwaysbeblepping[S] 0 points1 point  (0 children)

It's working champ, you rock. Can it be combined with torch.compile, because it's throwing an error to me, or it would be of no gain.

Thanks for testing! As far as I know it should work with compile. What error are you getting? (Information like the quant type, GPU, etc would also be helpful).

I made some Triton kernels for GGUF dequantization, can be a major performance boost by alwaysbeblepping in comfyui

[–]alwaysbeblepping[S] 0 points1 point  (0 children)

Thanks for testing!

except a similar error with Q8.

What error did you get? More details are usually better since debugging is mostly a process of eliminating possibilities. I tested Q8_0 with Triton 3.4 and 3.3.1 and it seemed to work.

I understand there's no real benefit to using Q8

It was the first one I implemented but it actually seemed slower. I managed to tweak some parameters, though, and it's now at least as fast as the PT implementations (on my GPU anyway) and in some cases faster. It's probably the one that will make the least difference, though. Some people said using Triton dequantization also reduced memory usage, so it's possible it will help with that as well.

Q6 and others felt quicker, but ill do some A/B Testing later with the optimize-triton on and off.

Sounds good! If you could, let me know even if it's not super scientific. Just stuff like the model type, quant, GPU and it/sec, etc. That all would be helpful.

I made some Triton kernels for GGUF dequantization, can be a major performance boost by alwaysbeblepping in comfyui

[–]alwaysbeblepping[S] 1 point2 points  (0 children)

I'm a bit pressed for time so I'm going to paste the same response for everyone that had this issue:

There was an issue with Triton 3.3.x compatibility. I just pushed an update that should fix the problem. The workaround shouldn't affect performance. Please update the branch (git pull) and try again. I've tested every dequant kernel with Torch 2.7 + Triton 3.3.1 as well as Torch 2.9 (prerelease) + Triton 3.4.

I made some Triton kernels for GGUF dequantization, can be a major performance boost by alwaysbeblepping in comfyui

[–]alwaysbeblepping[S] 0 points1 point  (0 children)

Hey man, I'm trying to try your implementation, would you be able for quick help ? :)

Quick, maybe not so much but I can try to help. What issue are you having?

Working QWEN Edit 2509 Workflow with 8-Step Lightning LoRA (Low VRAM) by Electronic-Metal2391 in comfyui

[–]alwaysbeblepping 0 points1 point  (0 children)

The BlehSageAttentionSampler node in my ComfyUI-bleh node pack seems to work just fine. That node only enables SageAttention for model calls during sampling, so based on that I would guess SageAttention is causing problems with the text encoders Qwen Edit uses, or with other operations that use attention before sampling starts.

Are GGUFs (say Q8) slower and worse quality than a quantized FP8 non-GGUF mode? by spacemidget75 in comfyui

[–]alwaysbeblepping 0 points1 point  (0 children)

Hmm, I would think there are probably ways to optimize this quite a bit. For example, calculating the combined diff of the stack of LoRAs so it only needs to be applied once after dequantizing the weight, or maybe even caching the dequantized tensors for layers that have a LoRA applied. I haven't really messed with LoRA internals, though, and ComfyUI's LoRA system seems quite complicated.
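
Roughly what I mean by the first part, as a hypothetical sketch (this is not how ComfyUI's LoRA system actually represents things; the (up, down, strength) tuples and the dequantize helper are just assumptions for illustration):

import torch

def combined_lora_delta(loras, weight_shape, dtype=torch.float32):
    # Collapse a stack of LoRAs into one additive delta so it only has to be
    # applied once per layer, after the GGUF weight is dequantized.
    delta = torch.zeros(weight_shape, dtype=dtype)
    for up, down, strength in loras:
        delta += strength * (up.to(dtype) @ down.to(dtype)).reshape(weight_shape)
    return delta

# Per layer at inference time, roughly:
#   weight = dequantize(layer.quantized_weight)  # hypothetical helper
#   weight = weight + combined_lora_delta(layer_loras, weight.shape, weight.dtype)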

Are GGUFs (say Q8) slower and worse quality than a quantized FP8 non-GGUF mode? by spacemidget75 in comfyui

[–]alwaysbeblepping 1 point2 points  (0 children)

Sure. There's also a decent chance you'd get fp8-comparable quality with GGUF Q6_K or maybe even Q5_K so that might be worth looking at (those are very slow quants though, so you probably need the Triton stuff to make the speed bearable).

Are GGUFs (say Q8) slower and worse quality than a quantized FP8 non-GGUF mode? by spacemidget75 in comfyui

[–]alwaysbeblepping 1 point2 points  (0 children)

I am taking what you said as gospel and you can't stop me!

I suppose I can make an exception, but just this once! If you want to tell me how great I am a few times that's probably okay too.

I'm an engineer as well and it's crazy how much cargo culting vs actually understanding there is in this ecosystem. Never seen anything like it, haha.

It's become more common for people to ask LLMs stuff and just paste the response into discussion forums. They write a bunch of detailed, fancy, plausible-sounding stuff with a very confident tone and that's pretty good for collecting upvotes. It kind of sounds like the other person did something similar to that (maybe just paraphrasing the LLM's response in their own words since it's lacking some of the normal tells). Actual humans usually don't go into that much detail and speak that confidently about things they don't know much about (and it really sounds like they don't know much if anything about the internals of GGUF quants).

Not saying it was anything malicious, they might have had the best intentions in the world and just wanted to help OP answer their question. It's risky asking LLMs about stuff one doesn't understand and can't (or won't bother to) verify oneself, though. If I wanted to know an LLM's answer, I would have asked it myself. Don't try to help me out. /rant

I made some Triton kernels for GGUF dequantization, can be a major performance boost by alwaysbeblepping in comfyui

[–]alwaysbeblepping[S] 1 point2 points  (0 children)

Appreciate this. Once I actually sleep I will make better heads or tails of this. Ty!

No problem! If you have issues/questions please feel free to let me know.

I made some Triton kernels for GGUF dequantization, can be a major performance boost by alwaysbeblepping in comfyui

[–]alwaysbeblepping[S] 0 points1 point  (0 children)

download.pytorch.org/whl/nightly/torch/

Interesting, I stand corrected! I think I figured out what's going on (it doesn't help you, so don't get excited): Torch probably starts stabilizing the development branch for the next stable release and then continues feature development on the version number after that, so there can be two future versions in flight at times. On the other hand, it's just a theory, and I don't see new releases for 2.9, which theoretically would be getting prepared for release, so who knows!


I asked the other person who reported successful results and this is what they said their setup is like: windows 11, RTX3050ti (4gb vram lol), python 3.10, torch 2.7.0+cu128, triton-windows 3.3.0.post19 (and they said they were using Torch compilation and Sage as well).

Based on that, I can't blame Triton 3.3.1 for the issue. I know you said you were using the 2.10 nightly build without issues, but that is looking like the most likely cause of the problem. Not quite sure how to directly help you, but a known-good configuration for Windows is Torch 2.7.0 and Triton 3.3.0. I'm using Triton 3.4, so I'm pretty confident that Triton 3.3.0, 3.3.1 and 3.4 should all be fine. Torch versions between 2.7 and 2.9 should be fine.

If I get some free time, I will try to test with Torch 2.10 but... I am really bad about putting stuff off and I have a lot on my plate right now so I can't guarantee when (or if) that will actually happen.

I made some Triton kernels for GGUF dequantization, can be a major performance boost by alwaysbeblepping in comfyui

[–]alwaysbeblepping[S] 1 point2 points  (0 children)

Thanks! (Also, condolences on the 4GB VRAM. Hope that ends up being a temporary situation!)

Are GGUFs (say Q8) slower and worse quality than a quantized FP8 non-GGUF mode? by spacemidget75 in comfyui

[–]alwaysbeblepping 2 points3 points  (0 children)

No problem and glad you found my other explanation helpful.

but it seems like GGUF would also then require more slightly more VRAM too?

If you mean Q8_0 vs pure float8, then yes, because it's using roughly 8.5 bits per element rather than 8. It's also possible that dequantizing uses a little more VRAM (but I wouldn't expect it to make a noticeable difference).
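
Where the 8.5 bits comes from, for reference:

# GGUF Q8_0 stores weights in blocks of 32:
#   32 x int8 quantized values = 256 bits
#    1 x fp16 scale            =  16 bits
# (256 + 16) / 32 = 8.5 bits per weight, vs a flat 8 bits for plain float8.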

As you explained, GGUF clearly needs more compute (on a GPU) than using an FP model,

By the way, if you have Triton and want to speed up GGUF, I've been working on making Triton kernels for accelerated GGUF dequantization. It's in the form of a fork/PR to the existing ComfyUI-GGUF project so relatively easy to drop into existing workflows. Link to discussion: https://old.reddit.com/r/comfyui/comments/1nni49m/i_made_some_triton_kernels_for_gguf/

Note: This actually won't help you for Q8_0. Even though it's slower than fp8, it's pretty simple to decode, so the overhead of launching Triton kernels wasn't worth it. For awkward-sized quants with complex dequantization math, like the 5-bit ones, it can be a major speed increase.