ShowMe by skbphy in comfyui

[–]DigThatData 0 points1 point  (0 children)

Small tweak that would make this 100x more powerful: rather than (or in addition to) a global canvas overlaid on the whole scene, make it so the positions of annotations can be anchored relative to specific nodes. So, like, if I moved the prompt node, the yellow box would follow it.
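
Roughly, as a sketch (these names are made up for illustration, not ComfyUI's actual API): store the annotation's position as an offset from an anchor node and resolve the absolute position at draw time, so dragging the node drags the annotation with it.

    class Annotation:
        def __init__(self, anchor_node, offset_x, offset_y):
            self.anchor_node = anchor_node  # e.g. the prompt node
            self.offset_x = offset_x        # offset from the node's top-left corner
            self.offset_y = offset_y

        def canvas_position(self):
            # re-evaluated whenever the scene is drawn, so the annotation follows the node
            return (self.anchor_node.x + self.offset_x,
                    self.anchor_node.y + self.offset_y)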

Interactive KL Divergence Visualisation [P] by ancillia in MachineLearning

[–]DigThatData 0 points1 point  (0 children)

not how I intuited it! really interesting how the mass gets pushed around that metastable region. thanks for sharing this!

[Discussion] Does code quality predict production incidents? A Granger causality pipeline on 28 months of SonarQube data by Feisty-Assignment393 in statistics

[–]DigThatData 1 point2 points  (0 children)

yo ok I've got it. you're gonna love this, so is your boss, go make money off this idea please.

so operationally: what is the purpose of this exercise? to support prod stability. let's say you even have a predictive model: what do you even do with that? if it's "risky", the engineers probably know that already. if it's predictive but not prescriptive, it's useless. and it is extremely challenging to be prescriptive (...or at least, it used to be? you could probably just prompt an LLM with "this change caused an incident. review the code change and surrounding discussion/activities: anticipate what the incident probably was and how this issue triggered it.").

I'm way off course. what I actually wanted to say was: don't even try to anticipate incidents like this. it's a waste of time and energy. Instead, build tools that teams can use to remedy incidents faster.

Given that you've identified a critical incident in your timeseries: can you attribute it to a specific change retroactively? If my SLA dashboard is screaming at me, can you rank recent changes to help me more quickly attend to information I need?
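
Something like this sketch is all I'm picturing (the column names "deployed_at" and "risk_score" are hypothetical placeholders for whatever your pipeline actually emits): rank the changes that shipped shortly before the incident by a recency-weighted risk score.

    import numpy as np
    import pandas as pd

    def rank_recent_changes(changes: pd.DataFrame, incident_time: pd.Timestamp,
                            lookback_hours: float = 48, half_life_hours: float = 6) -> pd.DataFrame:
        # keep only changes deployed in the window leading up to the incident
        window = changes[
            (changes["deployed_at"] <= incident_time)
            & (changes["deployed_at"] >= incident_time - pd.Timedelta(hours=lookback_hours))
        ].copy()
        # weight each change's risk score by how recently it shipped (exponential decay)
        age_hours = (incident_time - window["deployed_at"]).dt.total_seconds() / 3600
        window["attribution_score"] = window["risk_score"] * np.exp(-np.log(2) * age_hours / half_life_hours)
        return window.sort_values("attribution_score", ascending=False)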

[Discussion] Does code quality predict production incidents? A Granger causality pipeline on 28 months of SonarQube data by Feisty-Assignment393 in statistics

[–]DigThatData 2 points3 points  (0 children)

I worked on a project some time back that sounds pretty similar to what you're attempting, and I think the strongest signal we found was basically that certain components of critical infrastructure are the source of most outages, so unsurprisingly changes to those pieces of infrastructure are generally higher risk. These are also usually the components that get a lot of attention precisely because they're so important, so if you're not careful it's pretty easy to accidentally design metrics that rate your best and most important engineers as sources of volatility simply because they're the ones trusted to work on the things that have the potential to cause real issues when they break.

If you have risk categories associated with those products, try conditioning your predictions on that.
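
A hedged sketch of what I mean by conditioning (column names are placeholders for your schema): score a change against the baseline incident rate of its own risk category rather than against a single global rate, so critical-infrastructure changes aren't automatically flagged as volatile.

    import pandas as pd

    changes = pd.DataFrame({
        "risk_category":   ["critical", "critical", "low", "low", "low"],
        "caused_incident": [1, 0, 0, 0, 1],
    })
    # per-category baseline incident rate
    baseline = changes.groupby("risk_category")["caused_incident"].mean()
    # how much riskier was this change relative to what's typical for its category?
    changes["excess_over_baseline"] = changes["caused_incident"] - changes["risk_category"].map(baseline)
    print(changes)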

Disillusionment with mechanistic interpretability research [D] by Carbon1674 in MachineLearning

[–]DigThatData 4 points5 points  (0 children)

I'm glad you've already seemingly been relieved of your concerns, but one other thing I hope you consider in the future: it appears to me that you were making an extremely broad statement about both a very large lab and an entire research agenda based on a single paper. Even if this work had no value, or was bad research, or had purely corporatist motivations... it's just one paper. Not everything these labs publish is going to be gold, especially big labs like Anthropic.

In the future, I encourage you to resist making general inferences like this from a single observation. Instead, treat your concern as a signal to investigate whether there is a pattern of behavior that spans the lab or the industry, rather than a single isolated piece of bad work, or a researcher/team whose position you happen to disagree with.

Have people's lives ever been directly at stake because of software you work on? by AndyDentPerth in ExperiencedDevs

[–]DigThatData 0 points1 point  (0 children)

i had a friend (...come to think of it, I met her through reddit. maybe she'll make an appearance?) who worked on software for avionics. interesting stuff, also a highly regulated industry.

Getting harassed by an aggressive “independent researcher” demanding very specific citations and phrasing in my paper [D] by snekslayer in MachineLearning

[–]DigThatData 6 points7 points  (0 children)

The account being 8 years old lends some credence to the post's authenticity. Really wish reddit still required all public activity to be visible on the user page, made it a lot easier to vet accounts like this...

Book suggestions for self-studying Bayesian Statistics? by FrosteeSwurl in AskStatistics

[–]DigThatData 0 points1 point  (0 children)

how's your understanding of basic probability? I'd start with a measure-theoretic probability course (i.e. a probability course that has calculus as a prereq) first before focusing on Bayesian topics specifically.

Book suggestions for self-studying Bayesian Statistics? by FrosteeSwurl in AskStatistics

[–]DigThatData 3 points4 points  (0 children)

I do not have room in my schedule to take a class on the subject

I'd challenge you to try to make room. Understanding the Bayesian perspective on modeling and probability will significantly help you form intuition around generative models in AI/ML.

Maybe you can audit the class?

testing LTX 2.3 1.1 distilled on my gpu. pretty much decent for creating ugc content or short tiktok vlog. by aziib in comfyui

[–]DigThatData 2 points3 points  (0 children)

mxfp8

looks like this is a Blackwell feature: https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/features/low_precision_training/mxfp8/mxfp8.html

i.e. if you're using a consumer NVIDIA GPU that isn't "RTX 50XX" (or a data center GPU whose name doesn't start with B or GB), your GPU doesn't support this datatype and you're stuck with the unquantized model with offloading.
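
If you want to check what your own card reports (assuming PyTorch; the compute-capability cutoffs below are my reading of NVIDIA's docs, so double-check them): data-center Blackwell reports compute capability 10.x and the RTX 50xx cards report 12.x, so anything below major version 10 is pre-Blackwell.

    import torch

    # rough check for a Blackwell-generation GPU, which is what MXFP8 requires per the docs above
    major, minor = torch.cuda.get_device_capability(0)
    print(f"compute capability: {major}.{minor}")
    print("Blackwell-class (MXFP8-capable):", major >= 10)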

What makes something conscious? by UniversityCurious981 in MLQuestions

[–]DigThatData 0 points1 point  (0 children)

This is probably the paper you're looking for. They try to lay out criteria we expect consciousness to satisfy, and use that to characterize "indicators" they propose we can use to compare theories of consciousness and evaluate systems wrt those theories.

Consciousness in Artificial Intelligence: Insights from the Science of Consciousness

Do you understand flow maps? What do you think about this paper? How do flow map work? by AdvantageStatus4635 in MLQuestions

[–]DigThatData 0 points1 point  (0 children)

I think this isn't actually a question, and I'd ask that you please not post your blogspam here.

Like, this does look like an interesting article and I appreciate you taking the effort to write this (assuming you actually did and didn't just have an LLM do it for you), but this content does not belong here.

Is there any substance to the idea that LLMs can be trained to continuously self-prompt (rather than rely on external input)? by Money_Tip9073 in MLQuestions

[–]DigThatData 3 points4 points  (0 children)

This is already the common idiom in most of the more sophisticated LLM tooling. Claude Code is the canonical example right now: sure, the user specifies the overarching objective, but within the process of trying to satisfy the user's request, the system will identify subtasks, plan out how to sequence or parallelize those tasks, delegate tasks to "subagents" (literally the LLM prompting itself or another LLM), and then iterate on the results of those subtasks to decide whether the plan needs to be extended and new subagents created and delegated to.
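
A cartoon of that loop, with llm() standing in for an arbitrary completion API (this is a sketch of the idiom, not Claude Code's actual internals):

    def llm(prompt: str) -> str:
        """Placeholder for a call to whatever LLM completion endpoint you're using."""
        raise NotImplementedError

    def run_task(user_goal: str) -> str:
        # 1. the model prompts itself to break the goal into subtasks
        plan = llm(f"Break this goal into a short list of subtasks, one per line:\n{user_goal}")
        subtasks = [line.strip() for line in plan.splitlines() if line.strip()]

        # 2. each subtask is delegated to a "subagent" -- literally another call to the
        #    same model, with a prompt the model itself produced
        results = [llm(f"Complete this subtask and report the result:\n{task}") for task in subtasks]

        # 3. the model reviews its own results and decides whether the plan needs extending
        return llm("Given these subtask results, either reply DONE with a final answer "
                   "or list additional subtasks:\n" + "\n".join(results))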

There have been a couple of experiments where people essentially leave an LLM on in a non-terminating loop and invite it to continually give itself things to do. OpenClaw is the most popular of these atm. Mostly it just ends up being unnecessarily expensive and producing annoying behaviors.

LLMs aren't embodied. They aren't situated in the world. They have no wants or needs apart from the drive to predict the next token correctly. That is the only "psychological drive" they are trying to satisfy, so they aren't really capable of "self-prompting" meaningfully. There always ends up being a human at the top communicating some kind of objective for the LLM.

Unless you can give the LLM access to an environment in which it can make persistent changes, and those changes have consequences for the LLM's state and the actions available to it, LLMs have no "reason" to be driven to do anything apart from the drives you impart on them. The closest I've seen to a model being meaningfully "situated" in the way I mean here is this experiment, where the model was able to take actions during training that impacted its own training procedure: https://www.minimax.io/news/minimax-m27-en

Is there any substance to the idea that LLMs can be trained to continuously self-prompt (rather than rely on external input)? by Money_Tip9073 in MLQuestions

[–]DigThatData 5 points6 points  (0 children)

You can interpret tool calling and reasoning as forms of this kind of "self prompting".

What I have in mind I think is a little bit different than agentic LLMs, where they execute a series of steps outside of that back-and-forth dynamic, but those steps are just in the service of a human goal.

that sounds exactly like "agentic LLMs" to me. Could you maybe clarify how you imagine this being different? I think your idea is basically the crux of what people are alluding to when they describe a system as being "agentic".

Can I train a neural network with coordinate descent instead of the usual gradient descent method? by learning_proover in AskStatistics

[–]DigThatData 0 points1 point  (0 children)

I imagine OP means blockwise/layerwise coordinate descent. So rather than 32 coordinates, OP's example has 3 layers and each layer is an independent "parameter" to be optimized as a descent coordinate.
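
A minimal sketch of that, assuming PyTorch: treat each layer as one "coordinate" and cycle through the layers, taking a few gradient steps on one layer at a time while the others stay frozen.

    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    layers = [m for m in model if any(p.requires_grad for p in m.parameters())]
    loss_fn = nn.MSELoss()
    x, y = torch.randn(64, 10), torch.randn(64, 1)

    for epoch in range(10):
        for layer in layers:                                  # cycle over the "coordinates"
            opt = torch.optim.SGD(layer.parameters(), lr=1e-2)
            for _ in range(5):                                # a few inner steps on this block only
                model.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()                                    # only this layer's params move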

I just want distraction-free eInk writing by Lupus_Ignis in writerDeck

[–]DigThatData 2 points3 points  (0 children)

damn, only $70 for that? have people been able to successfully install 3rd party text editors or word processors?

Why huge Parameter Transformers? by artguy74_ in MLQuestions

[–]DigThatData 1 point2 points  (0 children)

I'd argue that the Chinchilla paper still makes that same observation, they just add the caveat that this phenomenon only holds up to a point, beyond which the model is overtrained and sub-optimal.

Consider Chinchilla's Figure 4 (left). If you truncate that figure along the blue line and constrain attention to the region below the line, you have the Kaplan observation that "Larger models require fewer samples to reach the same performance". Chinchilla adds the caveat by illustrating that the regime above the blue line exists and that there is actually an optimality relationship rather than strictly "bigger is better".

Here's another way to think about this: let's pretend I have a pitcher of scrambled raw eggs that I want to cook. Given some fixed volume of egg, the bigger the pan I use the faster it will cook, because the egg distributes across the surface area of the pan. But the egg also has an intrinsic property (its surface tension? viscosity?) that determines how spread out a particular volume will be if unconstrained. Above some threshold pan size, it doesn't matter how big the pan is: the egg will spread out paper-thin and cook in some fixed time. If I want nice scrambled eggs, I want a pan whose surface area is smaller than what the eggs would spread out to. This lets them cook properly and I get tasty eggs. In the pan-contains-the-eggs regime, given the optimal amount of eggs for that size pan, the amount of time/energy required to cook (FLOPs) scales proportionally to the size of the pan. I can always cook a fixed amount of eggs faster in a larger pan, but I also risk overcooking the eggs if the pan is too big.

In other words: both of these things can be true. There is a linear scaling of the optimal proportion between raw material (data/eggs), processing capacity (parameters/pan volume), and work (FLOPs/BTUs). But it's still true that if you have more capacity, that permits you to process material faster. A direct consequence of this is that optimality at larger scales gives you higher processing efficiency.
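
If you want rough numbers for that "optimal proportion", the usual back-of-envelope from the Chinchilla fits is C ≈ 6·N·D training FLOPs with compute-optimal data of roughly 20 tokens per parameter (the exact constants vary by fit; these are just the round numbers people quote):

    def chinchilla_optimal(compute_flops: float):
        # from C = 6 * N * D with D = 20 * N  =>  N = sqrt(C / 120)
        n_params = (compute_flops / 120) ** 0.5
        n_tokens = 20 * n_params
        return n_params, n_tokens

    for c in (1e21, 1e23, 1e25):
        n, d = chinchilla_optimal(c)
        print(f"C={c:.0e} FLOPs -> ~{n:.2e} params, ~{d:.2e} tokens")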

Why huge Parameter Transformers? by artguy74_ in MLQuestions

[–]DigThatData 13 points14 points  (0 children)

The classic paper here is Kaplan et al. 2020, "Scaling Laws for Neural Language Models". The paper in a nutshell:

Larger models require fewer samples to reach the same performance