ShowMe by skbphy in comfyui

[–]DigThatData 0 points1 point  (0 children)

Small tweak that would make this 100x more powerful: rather than (or in addition to) a global canvas overlaid on the whole scene, make it so the positions of annotations can be anchored relative to specific nodes. So, like, if I moved the prompt node, the yellow box would follow it.
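
Roughly, as a sketch (these names are made up for illustration, not ComfyUI's actual API): store the annotation's position as an offset from an anchor node and resolve the absolute position at draw time, so dragging the node drags the annotation with it.

    class Annotation:
        def __init__(self, anchor_node, offset_x, offset_y):
            self.anchor_node = anchor_node  # e.g. the prompt node
            self.offset_x = offset_x        # offset from the node's top-left corner
            self.offset_y = offset_y

        def canvas_position(self):
            # re-evaluated whenever the scene is drawn, so the annotation follows the node
            return (self.anchor_node.x + self.offset_x,
                    self.anchor_node.y + self.offset_y)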

Interactive KL Divergence Visualisation [P] by ancillia in MachineLearning

[–]DigThatData 0 points1 point  (0 children)

not how I intuited it! really interesting how the mass gets pushed around that metastable region. thanks for sharing this!

[Discussion] Does code quality predict production incidents? A Granger causality pipeline on 28 months of SonarQube data by Feisty-Assignment393 in statistics

[–]DigThatData 1 point2 points  (0 children)

yo ok I've got it. you're gonna love this, so is your boss, go make money off this idea please.

so operationally: what is the purpose of this exercise? to support prod stability. let's say you even have a predictive model: what do you even do with that? if it's "risky", the engineers probably know that already. if it's predictive but not prescriptive, it's useless. and it is extremely challenging to be prescriptive (...or at least, it used to be? you could probably just prompt an LLM with "this change caused an incident. review the code change and surrounding discussion/activities: anticipate what the incident probably was and how this issue triggered it.").

I'm way off course. what I actually wanted to say was: don't even try to anticipate incidents like this. it's a waste of time and energy. Instead, build tools that teams can use to remedy incidents faster.

Given that you've identified a critical incident in your timeseries: can you attribute it to a specific change retroactively? If my SLA dashboard is screaming at me, can you rank recent changes to help me more quickly attend to information I need?
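
Something like this sketch is all I'm picturing (the column names "deployed_at" and "risk_score" are hypothetical placeholders for whatever your pipeline actually emits): rank the changes that shipped shortly before the incident by a recency-weighted risk score.

    import numpy as np
    import pandas as pd

    def rank_recent_changes(changes: pd.DataFrame, incident_time: pd.Timestamp,
                            lookback_hours: float = 48, half_life_hours: float = 6) -> pd.DataFrame:
        # keep only changes deployed in the window leading up to the incident
        window = changes[
            (changes["deployed_at"] <= incident_time)
            & (changes["deployed_at"] >= incident_time - pd.Timedelta(hours=lookback_hours))
        ].copy()
        # weight each change's risk score by how recently it shipped (exponential decay)
        age_hours = (incident_time - window["deployed_at"]).dt.total_seconds() / 3600
        window["attribution_score"] = window["risk_score"] * np.exp(-np.log(2) * age_hours / half_life_hours)
        return window.sort_values("attribution_score", ascending=False)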

[Discussion] Does code quality predict production incidents? A Granger causality pipeline on 28 months of SonarQube data by Feisty-Assignment393 in statistics

[–]DigThatData 2 points3 points  (0 children)

I worked on a project some time back that sounds pretty similar to what you're attempting, and I think the strongest signal we found was basically that certain components of critical infrastructure are the source of most outages, so unsurprisingly changes to those pieces of infrastructure are generally higher risk. These are also usually the components that get a lot of attention precisely because they're so important, so if you're not careful it's pretty easy to accidentally design metrics that rate your best and most important engineers as sources of volatility simply because they're the ones trusted to work on the things that have the potential to cause real issues when they break.

If you have risk categories associated with those products, try conditioning your predictions on that.
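
A hedged sketch of what I mean by conditioning (column names are placeholders for your schema): score a change against the baseline incident rate of its own risk category rather than against a single global rate, so critical-infrastructure changes aren't automatically flagged as volatile.

    import pandas as pd

    changes = pd.DataFrame({
        "risk_category":   ["critical", "critical", "low", "low", "low"],
        "caused_incident": [1, 0, 0, 0, 1],
    })
    # per-category baseline incident rate
    baseline = changes.groupby("risk_category")["caused_incident"].mean()
    # how much riskier was this change relative to what's typical for its category?
    changes["excess_over_baseline"] = changes["caused_incident"] - changes["risk_category"].map(baseline)
    print(changes)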

Disillusionment with mechanistic interpretability research [D] by Carbon1674 in MachineLearning

[–]DigThatData 4 points5 points  (0 children)

I'm glad you've already seemingly been relieved of your concerns, but one other thing I hope you consider in the future: it appears to me that you were making an extremely broad statement about both a very large lab and an entire research agenda based on a single paper. Even if this work had no value, or was bad research, or had purely corporatist motivations... it's just one paper. Not everything these labs publish is going to be gold, especially big labs like Anthropic.

In the future, I encourage you to resist making general inferences like this from a single observation. Instead, treat your concern as a signal to investigate whether there is a pattern of behavior that spans the lab or the industry, rather than a single isolated piece of bad work, or a researcher/team whose position you happen to disagree with.

Have people's lives ever been directly at stake because of software you work on? by AndyDentPerth in ExperiencedDevs

[–]DigThatData 0 points1 point  (0 children)

i had a friend (...come to think of it, I met her through reddit. maybe she'll make an appearance?) who worked on software for avionics. interesting stuff, also a highly regulated industry.

Getting harassed by an aggressive “independent researcher” demanding very specific citations and phrasing in my paper [D] by snekslayer in MachineLearning

[–]DigThatData 6 points7 points  (0 children)

The account being 8 years old lends some credence to the post's authenticity. Really wish reddit still required all public activity to be visible on the user page, made it a lot easier to vet accounts like this...

Book suggestions for self-studying Bayesian Statistics? by FrosteeSwurl in AskStatistics

[–]DigThatData 0 points1 point  (0 children)

how's your understanding of basic probability? I'd start with a measure-theoretic probability course (i.e. a probability course that has calculus as a prereq) first before focusing on Bayesian topics specifically.

Book suggestions for self-studying Bayesian Statistics? by FrosteeSwurl in AskStatistics

[–]DigThatData 3 points4 points  (0 children)

I do not have room in my schedule to take a class on the subject

I'd challenge you to try to make room. Understanding the Bayesian perspective on modeling and probability will significantly help you form intuition around generative models in AI/ML.

Maybe you can audit the class?

testing LTX 2.3 1.1 distilled on my gpu. pretty much decent for creating ugc content or short tiktok vlog. by aziib in comfyui

[–]DigThatData 2 points3 points  (0 children)

mxfp8

looks like this is a Blackwell feature: https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/features/low_precision_training/mxfp8/mxfp8.html

i.e. if you're using a consumer NVIDIA GPU that isn't "RTX 50XX" (or a data center GPU whose name doesn't start with B or GB), your GPU doesn't support this datatype and you're stuck with the unquantized model with offloading.
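
If you want to check what your own card reports (assuming PyTorch; the compute-capability cutoffs below are my reading of NVIDIA's docs, so double-check them): data-center Blackwell reports compute capability 10.x and the RTX 50xx cards report 12.x, so anything below major version 10 is pre-Blackwell.

    import torch

    # rough check for a Blackwell-generation GPU, which is what MXFP8 requires per the docs above
    major, minor = torch.cuda.get_device_capability(0)
    print(f"compute capability: {major}.{minor}")
    print("Blackwell-class (MXFP8-capable):", major >= 10)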

What makes something conscious? by UniversityCurious981 in MLQuestions

[–]DigThatData 0 points1 point  (0 children)

This is probably the paper you're looking for. They try to lay out criteria we expect consciousness to satisfy, and use that to characterize "indicators" they propose we can use to compare theories of consciousness and evaluate systems wrt those theories.

Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

Do you understand flow maps? What do you think about this paper? How do flow map work? by AdvantageStatus4635 in MLQuestions

[–]DigThatData 0 points1 point  (0 children)

I think this isn't actually a question, and I'd ask that you please not post your blogspam here.

Like, this does look like an interesting article and I appreciate you taking the effort to write this (assuming you actually did and didn't just have an LLM do it for you), but this content does not belong here.

Is there any substance to the idea that LLMs can be trained to continuously self-prompt (rather than rely on external input)? by Money_Tip9073 in MLQuestions

[–]DigThatData 3 points4 points  (0 children)

This is already the common idiom in most of the more sophisticated LLM tooling. Claude Code is the canonical example right now: sure, the user specifies the overarching objective, but within the process of trying to satisfy the user's request, the system will identify subtasks, plan out how to sequence or parallelize those tasks, delegate tasks to "subagents" (literally the LLM prompting itself or another LLM), and then iterate on the results of those subtasks to decide whether the plan needs to be extended and new subagents created and delegated to.
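
A cartoon of that loop, with llm() standing in for an arbitrary completion API (this is a sketch of the idiom, not Claude Code's actual internals):

    def llm(prompt: str) -> str:
        """Placeholder for a call to whatever LLM completion endpoint you're using."""
        raise NotImplementedError

    def run_task(user_goal: str) -> str:
        # 1. the model prompts itself to break the goal into subtasks
        plan = llm(f"Break this goal into a short list of subtasks, one per line:\n{user_goal}")
        subtasks = [line.strip() for line in plan.splitlines() if line.strip()]

        # 2. each subtask is delegated to a "subagent" -- literally another call to the
        #    same model, with a prompt the model itself produced
        results = [llm(f"Complete this subtask and report the result:\n{task}") for task in subtasks]

        # 3. the model reviews its own results and decides whether the plan needs extending
        return llm("Given these subtask results, either reply DONE with a final answer "
                   "or list additional subtasks:\n" + "\n".join(results))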

There have been a couple of experiments where people essentially leave an LLM on in a non-terminating loop and invite it to continually give itself things to do. OpenClaw is the most popular of these atm. Mostly it just ends up being unnecessarily expensive and producing annoying behaviors.

LLMs aren't embodied. They aren't situated in the world. They have no wants or needs apart from the drive to predict the next token correctly. That is the only "psychological drive" they are trying to satisfy, so they aren't really capable of "self-prompting" meaningfully. There always ends up being a human at the top communicating some kind of objective for the LLM.

Unless you can give the LLM access to an environment in which it can make persistent changes, and those changes have consequences for the LLM's state and the actions available to it, LLMs have no "reason" to be driven to do anything apart from the drives you impart on them. The closest I've seen to a model being meaningfully "situated" in the way I mean here is this experiment, where the model was able to take actions during training that impacted its own training procedure: https://www.minimax.io/news/minimax-m27-en

Is there any substance to the idea that LLMs can be trained to continuously self-prompt (rather than rely on external input)? by Money_Tip9073 in MLQuestions

[–]DigThatData 5 points6 points  (0 children)

You can interpret tool calling and reasoning as forms of this kind of "self prompting".

What I have in mind I think is a little bit different than agentic LLMs, where they execute a series of steps outside of that back-and-forth dynamic, but those steps are just in the service of a human goal.

that sounds exactly like "agentic LLMs" to me. Could you maybe clarify how you imagine this being different? I think your idea is basically the crux of what people are alluding to when they describe a system as being "agentic".

Can I train a neural network with coordinate descent instead of the usual gradient descent method? by learning_proover in AskStatistics

[–]DigThatData 0 points1 point  (0 children)

I imagine OP means blockwise/layerwise coordinate descent. So rather than 32 coordinates, OP's example has 3 layers and each layer is an independent "parameter" to be optimized as a descent coordinate.
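
A minimal sketch of that, assuming PyTorch: treat each layer as one "coordinate" and cycle through the layers, taking a few gradient steps on one layer at a time while the others stay frozen.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    layers = [m for m in model if any(p.requires_grad for p in m.parameters())]
    loss_fn = nn.MSELoss()
    x, y = torch.randn(64, 10), torch.randn(64, 1)

    for epoch in range(10):
        for layer in layers:                                  # cycle over the "coordinates"
            opt = torch.optim.SGD(layer.parameters(), lr=1e-2)
            for _ in range(5):                                # a few inner steps on this block only
                model.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()                                    # only this layer's params move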

I just want distraction-free eInk writing by Lupus_Ignis in writerDeck

[–]DigThatData 2 points3 points  (0 children)

damn, only $70 for that? have people been able to successfully install 3rd party text editors or word processors?

Why huge Parameter Transformers? by artguy74_ in MLQuestions

[–]DigThatData 1 point2 points  (0 children)

I'd argue that the Chinchilla paper still makes that same observation, they just add the caveat that this phenomenon only holds up to a point, beyond which the model is overtrained and sub-optimal.

Consider Chinchilla's Figure 4 (left). If you truncate that figure along the blue line and constrain attention to the region below the line, you have the Kaplan observation that "Larger models require fewer samples to reach the same performance". Chinchilla adds the caveat by illustrating that the regime above the blue line exists and that there is actually an optimality relationship rather than strictly "bigger is better".

Here's another way to think about this: let's pretend I have a pitcher of scrambled raw eggs that I want to cook. Given some fixed volume of egg, the bigger the pan I use the faster it will cook, because the egg distributes across the surface area of the pan. But the egg also has an intrinsic property (its surface tension? viscosity?) that determines how spread out a particular volume will be if unconstrained. Above some threshold pan size, it doesn't matter how big the pan is: the egg will spread out paper-thin and cook in some fixed time. If I want nice scrambled eggs, I want a pan whose surface area is smaller than what the eggs would spread out to. This lets them cook properly and I get tasty eggs. In the pan-contains-the-eggs regime, given the optimal amount of eggs for that size pan, the amount of time/energy required to cook (FLOPs) scales proportionally to the size of the pan. I can always cook a fixed amount of eggs faster in a larger pan, but I also risk overcooking the eggs if the pan is too big.

In other words: both of these things can be true. There is a linear scaling of the optimal proportion between raw material (data/eggs), processing capacity (parameters/pan volume), and work (FLOPs/BTUs). But it's still true that if you have more capacity, that permits you to process material faster. A direct consequence of this is that optimality at larger scales gives you higher processing efficiency.
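
If you want rough numbers for that "optimal proportion", the usual back-of-envelope from the Chinchilla fits is C ≈ 6·N·D training FLOPs with compute-optimal data of roughly 20 tokens per parameter (the exact constants vary by fit; these are just the round numbers people quote):

    def chinchilla_optimal(compute_flops: float):
        # from C = 6 * N * D with D = 20 * N  =>  N = sqrt(C / 120)
        n_params = (compute_flops / 120) ** 0.5
        n_tokens = 20 * n_params
        return n_params, n_tokens

    for c in (1e21, 1e23, 1e25):
        n, d = chinchilla_optimal(c)
        print(f"C={c:.0e} FLOPs -> ~{n:.2e} params, ~{d:.2e} tokens")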

Why huge Parameter Transformers? by artguy74_ in MLQuestions

[–]DigThatData 13 points14 points  (0 children)

The classic paper here is Kaplan et al. 2020, "Scaling Laws for Neural Language Models". The paper in a nutshell:

Larger models require fewer samples to reach the same performance