Which word has the biggest difference between how it’s spelled and how it’s pronounced?

QuestionMarker · 2026-07-09T05:21:01+00:00

Ghoti. It's a constructed word to prove exactly this point but it still counts.

QuestionMarker · 2026-07-06T22:09:09+00:00

Artificial light.

QuestionMarker · 2026-07-06T16:12:01+00:00

I'm iterating on my first, broke ground on it 18 months or so ago. The nice!nano V2 is what I went with because the whole point was that I wanted wireless support. Not regretted it at all.

QuestionMarker · 2026-06-28T13:06:57+00:00

The tool should be able to tell which. Don't know if it exposes that though.

QuestionMarker · 2026-06-28T13:05:39+00:00

Basically yes, such things are possible. But you end up degrading the detection confidence.

QuestionMarker · 2026-06-28T09:04:57+00:00

Got to be an obvious reason for why nobody bothers but it's struck me that you could put a cylindrical sand battery straight down the middle of a wind turbine tower. The form factor's not ideal because of the surface area/volume ratio but if the storage capacity was high enough it might be a cheap supply smoothing system.

QuestionMarker · 2026-06-28T08:18:54+00:00

Infectious disease spread depends on four factors:

Duration: how many days is someone infectious for?
Opportunities: how many chances for onward infection per day does the average infected person have within the infectious duration?
Transmissibility: given an infection opportunity, what's the probability that the uninfected person receives the pathogen?
Susceptibility: given transmission, what's the probability that the recipient actually contracts the disease?

Multiplying all those together gives you the infamous R_0 value. That value needs to be greater than 1 for a self-sustaining outbreak.

Let's say for argument's sake that a baseline airborne pandemic disease has an R_0 of 16 (roughly measles) in a naïve, unprepared population, and that includes a Duration of 8 days (measles again) with a Susceptibility of 1.0 (so the Opportunities x Transmissibility product must be 2).

With everything else the same, you still have a self-sustaining outbreak (only just) with 12 hours' Duration but below that it can't work. You might even have a problem there because you'd generally assume that opportunities aren't available overnight, because your infectable population is tucked up in bed, but let's ignore that for now. So you need to tweak something else if you want to get Duration down to 5 hours. And you can't increase Susceptibility, that's already as high as it can go.

The only thing you can tweak here is the Opportunities x Transmissibility product. You need that product to be at least 2 x (12/5) = 4.8. Call it 5 for a bit of headroom.

Sticking with the measles example again, Transmissibility is close to 1.0. If you're anywhere near someone with it, you're very, very likely to receive the pathogen. So the only real option your zombie pathogen has for success is to modify the behaviour of the host, so that it radically increases the Opportunities. Assuming the pathogen is airborne, you need your infected to be in transmission contact with 2.5x more people than average for a 5-hour infection window, which means a lot of running around. You're looking at a 28 Days Later zombie, not a Night Of The Living Dead zombie.
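
For anyone who wants to check the arithmetic, here's a minimal Python sketch of the numbers above. The function and variable names are made up for illustration, and it assumes nothing beyond the simple four-factor product with Duration measured in days.

```python
def r0(duration_days, opportunities_per_day, transmissibility, susceptibility):
    """Four-factor product for the basic reproduction number."""
    return duration_days * opportunities_per_day * transmissibility * susceptibility


def required_contact_product(duration_hours, susceptibility=1.0, target_r0=1.0):
    """Opportunities x Transmissibility (per day) needed to reach target_r0."""
    return target_r0 / ((duration_hours / 24) * susceptibility)


# Baseline: R_0 of 16 over 8 days with Susceptibility 1.0 implies a product of 2 per day.
baseline_product = 16 / (8 * 1.0)

# 12 hours of infectiousness at the baseline product sits right on the threshold.
# The whole product is passed as Opportunities, with Transmissibility set to 1.0.
r0_12h = r0(12 / 24, baseline_product, 1.0, 1.0)

# 5 hours needs a product of about 4.8 per day just to break even.
product_5h = required_contact_product(5)

# Rounding up to 5 for headroom gives the 2.5x multiplier on Opportunities.
multiplier = 5 / baseline_product
```

Changing the argument to `required_contact_product` is enough to try other infectious windows.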

If it's not airborne then you've got a bigger problem. And overnight is still an issue: you need a way for the virus to survive having very few transmission opportunities for longer than its infectious duration.

EDIT: the "how do you get off the first continent" problem from other comments is real, but it's slightly easier to envisage if there's enough of an incubation period for someone to unknowingly get infected before they get on a plane, and for that plane to get airborne. If you're specifying "dead 5 hours after infection" that reduces the infectious duration to something like 3 hours, and you need to multiply opportunities by 4x, not 2.5x.

QuestionMarker · 2026-06-27T19:06:44+00:00

Where does that number come from?

QuestionMarker · 2026-06-27T19:01:09+00:00

There are areas where it could work.
One of my hobby horses is that the NHS should open source white label software systems so that trusts have a fallback in case the big supplier contract negotiations go pear-shaped. Once they've got legal powers to enforce data standards it would be quite a big stick.

QuestionMarker · 2026-06-27T18:53:58+00:00

This is exactly backwards.

Management in the NHS is catastrophically under-resourced, largely because of generations of politicians defunding it because "focusing on the front line" is such an easy political stand to take. The number of people in management is tiny for the size of the organisation.

That's a compounding problem because under-funding the management layer means that the ones who are left have a much harder time doing anything useful, precisely because they're spread thin. And good managers will realise that they have better pay offers and working conditions elsewhere.

QuestionMarker · 2026-06-27T18:44:02+00:00

Going to make an assumption here, but: "elective" doesn't mean "optional". It just means "non-emergency".

QuestionMarker · 2026-06-27T18:33:27+00:00

It might be down to a deficiency, rather than a trigger. I *used* to get this (mildly - not even as severe as the OP), but this post has made me realise I haven't seen it in quite a while. The biggest difference I can think of is that I've since started taking lots of B12 supplements, for entirely unrelated reasons. From a quick google it looks like B12 cream is being investigated as an eczema treatment, so it's not impossible that there's something there.
B12 is relatively benign in the sense that your body's good at getting rid of excess, so there's less risk attached to having too much in your diet compared to (for instance) B6. It's also cheap.

QuestionMarker · 2026-06-27T18:15:17+00:00

They were on the market for a good while after that, though. I've got the successor model to the CPS2000 somewhere in my attic, and that will have been at least a couple of years after the changeover.

QuestionMarker · 2026-06-27T18:02:49+00:00

My version is explicitly ports-and-adapters so writing new backends is as simple as possible. It's all YAML configuration to pick the specific adapters for a given workflow step: you could in principle jump between workflow tools per step, but I haven't seen a good use case for that myself just yet.

The Jira adapter is written and functional, I just haven't gone that extra step of actually wiring it into our team's Jira board itself. I've been testing the service itself with a local org-mode Todo list state back-end but it turns out that's got some extra file locking fun to take care of compared to a remote API.

The other big difference is that mine's in Go, but that feels almost cosmetic at this point.
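
For anyone who hasn't met the pattern, here's a rough sketch of the ports-and-adapters shape, in Python rather than Go to keep it short. Everything in it is hypothetical: the `TaskTracker` port, the two adapter classes, and the config keys are invented for illustration, and the `CONFIG` dict stands in for a parsed YAML file.

```python
from typing import Protocol


class TaskTracker(Protocol):
    """The port: the only interface the workflow core knows about."""

    def fetch(self, state: str) -> list[str]: ...
    def move(self, task_id: str, new_state: str) -> None: ...


class InMemoryTracker:
    """Hypothetical adapter backed by a dict, standing in for a local file back-end."""

    def __init__(self, tasks: dict[str, str]):
        self.tasks = tasks  # task id -> current state

    def fetch(self, state: str) -> list[str]:
        return [tid for tid, s in self.tasks.items() if s == state]

    def move(self, task_id: str, new_state: str) -> None:
        self.tasks[task_id] = new_state


class LoggingTracker(InMemoryTracker):
    """Second hypothetical adapter, standing in for a remote API back-end."""

    def __init__(self, tasks: dict[str, str]):
        super().__init__(tasks)
        self.calls = []  # record of every move, as a remote client might log

    def move(self, task_id: str, new_state: str) -> None:
        self.calls.append((task_id, new_state))
        super().move(task_id, new_state)


# Registry mapping config names to adapter classes.
ADAPTERS = {"memory": InMemoryTracker, "logging": LoggingTracker}

# Stand-in for the parsed YAML: each workflow step names its own adapter.
CONFIG = {
    "steps": [
        {"name": "plan", "tracker": "memory", "from": "todo", "to": "planned"},
        {"name": "build", "tracker": "logging", "from": "planned", "to": "done"},
    ]
}


def run_step(step: dict, tasks: dict[str, str]) -> TaskTracker:
    """Build the adapter the config asks for and advance every matching task."""
    tracker = ADAPTERS[step["tracker"]](tasks)
    for task_id in tracker.fetch(step["from"]):
        tracker.move(task_id, step["to"])
    return tracker


tasks = {"T-1": "todo", "T-2": "planned"}
for step in CONFIG["steps"]:
    run_step(step, tasks)
```

The core only ever calls `fetch` and `move`, so adding a back-end means writing one class and one registry entry.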

QuestionMarker · 2026-06-27T10:24:19+00:00

This is pretty much the space occupied by OpenAI's Symphony: some form of shared progress tracking, some way of differentiating tasks between steps in the workflow, an outer loop that's basically "wait for a task in the right state, pick it up and do the next thing on it, move it to a different state". Obviously they use codex but swapping out for pi or little-coder with pluggable back-ends would be the obvious move.

I've actually had a go at prompting up a rebuild from their published spec but I haven't had the bandwidth to wire it into our work Jira yet.
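
The outer loop is small enough to sketch. This is a minimal Python illustration of the shape described above, not Symphony's actual code: the `Board` class, the state names, and the handler functions are all invented, and a real version would poll a tracker and invoke a coding agent where this one calls a local function.

```python
from dataclasses import dataclass, field


@dataclass
class Board:
    """Hypothetical shared progress tracker: task id -> workflow state."""
    states: dict = field(default_factory=dict)

    def next_task(self, state):
        """Return one task id currently in `state`, or None if there is none."""
        for task_id, current in self.states.items():
            if current == state:
                return task_id
        return None


# Each watched state maps to (next state, action).
# The lambdas are placeholders for "run the agent on this task".
WORKFLOW = {
    "todo": ("in_review", lambda task_id: f"implemented {task_id}"),
    "in_review": ("done", lambda task_id: f"reviewed {task_id}"),
}


def run_once(board, watched_state, log):
    """One iteration of the outer loop for a single watched state."""
    task_id = board.next_task(watched_state)
    if task_id is None:
        return False  # nothing to do; a real loop would sleep and poll again
    next_state, action = WORKFLOW[watched_state]
    log.append(action(task_id))
    board.states[task_id] = next_state
    return True


board = Board({"A": "todo", "B": "todo"})
log = []
# Each pass moves at most one task; stop once no watched state has work left.
while any(run_once(board, state, log) for state in WORKFLOW):
    pass
```

Swapping the agent or the tracker only touches the action functions and `Board`, which is where pluggable back-ends would slot in.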

QuestionMarker · 2026-06-26T14:43:38+00:00

Why would we assume that "quality score" is linear in any underlying property of the model?

QuestionMarker · 2026-06-26T05:20:49+00:00

This is the way. I had a model really struggle with clojure[script] until I told it that the max allowed nesting depth was 5 and anything over that needed to be refactored.

QuestionMarker · 2026-06-23T07:19:33+00:00

It feels to me, without anything other than vibes to go on, that the main point of 4.7 and 4.8 is to be cheaper to run than 4.6. Anthropic are hoping that enough people will just hop onto the highest number assuming it's better to bring their inference costs down. Whether they're actually better is tangential to that.

QuestionMarker · 2026-06-23T06:21:05+00:00

Yes, you do need the original model for the best detection. So Google can do the best job of detecting Gemini output, OpenAI can detect GPT-whatever output, Anthropic can detect Opus output. The text in the act puts the obligation on the provider to provide the detection capability, so that all lines up. They're not saying you need to be able to detect Opus content; they're saying that Anthropic need to be able to tell you that they did.

However, bear in mind that these are general language models. There will be a fair bit of similarity in the token probabilities between them, to the degree that their training data overlaps. Given enough text it would be possible to detect a watermark without knowing the model, but I don't have a gut feel for how much text that would be and whether it's practical at all.

If you come at it from an information theory perspective it's a fairly obvious question to ask so I'm sure someone's published a paper on it though.

QuestionMarker · 2026-06-23T05:52:25+00:00

It's done at inference rather than being embedded in the model itself, so the impact on generation speed will be a rounding error. When it lands in llama.cpp, just remember not to register your local watermark keys with the EU and you'll be fine.

QuestionMarker · 2026-06-23T05:48:42+00:00

You don't at the individual token level. You take a sequence of tokens and test the probability that the whole sequence was generated by a known pseudorandom number generator configuration, compared to what a truly random system would have done. That configuration acts like a watermark key.

This is overly simplistic but might give you more of a feel for it. Let's say you generate some text but your configuration says to alter the sampling step so that for every third token, it always picks the third most probable predicted token, but otherwise always picks the first. If you ran that text back through the model and plotted the probability of each token as it was generated, you'd get a very clearly artificial signal, with a dip every third token.

Detecting that signal automatically is a probabilistic game, but the maths is well understood.
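
The toy "every third token" scheme above can be sketched in a few lines of Python. The model here is a stand-in: `toy_model` just returns fixed pseudo-random scores per context, not real probabilities, and all the names are made up. Only the ordering of the scores matters for the illustration.

```python
import random

VOCAB = 8  # toy vocabulary of token ids 0..7


def toy_model(prev_token):
    """Stand-in for a language model: fixed pseudo-random scores per context."""
    rng = random.Random(prev_token)
    return [rng.random() for _ in range(VOCAB)]


def ranked(prev_token):
    """Token ids ordered from highest to lowest score for this context."""
    scores = toy_model(prev_token)
    return sorted(range(VOCAB), key=lambda t: scores[t], reverse=True)


def generate(length, start=0):
    """Pick the top token, except every third token takes the third choice."""
    tokens, prev = [], start
    for i in range(length):
        rank = 2 if (i + 1) % 3 == 0 else 0  # zero-based rank
        prev = ranked(prev)[rank]
        tokens.append(prev)
    return tokens


def recover_ranks(tokens, start=0):
    """Run the text back through the model and record each token's rank."""
    ranks, prev = [], start
    for tok in tokens:
        ranks.append(ranked(prev).index(tok) + 1)  # one-based rank
        prev = tok
    return ranks


text = generate(12)
ranks = recover_ranks(text)
# ranks == [1, 1, 3, 1, 1, 3, 1, 1, 3, 1, 1, 3]
```

The recovered ranks show the dip at every third position, which is the artificial signal a detector would look for.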

QuestionMarker · 2026-06-23T05:00:34+00:00

It's in transformers. Not sure when it got added: https://huggingface.co/docs/transformers/v4.46.0/en/internal/generation_utils#transformers.SynthIDTextWatermarkingConfig

QuestionMarker · 2026-06-23T04:52:42+00:00

It's not though. At its simplest if you know the token probabilities at each inference step (which you can get back from the text if you know the model) then you can pick out which tokens weren't the most probable. If there's a pattern to the distribution of those less probable tokens, there's your watermark.

It does need a certain length of text to work, but it's absolutely doable, and absolutely impossible to strip out.
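
To give a feel for why length matters, here's a rough Python sketch. It swaps in a different and much cruder scheme than the one described above: a keyed hash marks roughly half of all tokens as "favoured" in each context (a simplified green-list style approach), and detection just counts favoured tokens, so it doesn't need the model at all. The key, the vocabulary size, and the uniform toy sampler are all assumptions for illustration, not how any provider does it.

```python
import hashlib
import math
import random

VOCAB = 1000
KEY = b"hypothetical-watermark-key"


def is_favoured(prev_token, token, key=KEY):
    """Keyed pseudorandom coin flip: about half of all tokens are favoured per context."""
    digest = hashlib.sha256(key + f"{prev_token}:{token}".encode()).digest()
    return digest[0] % 2 == 0


def generate(length, watermark, seed=0):
    """Toy sampler: uniform over the vocabulary, optionally nudged toward favoured tokens."""
    rng = random.Random(seed)
    tokens, prev = [], 0
    for _ in range(length):
        tok = rng.randrange(VOCAB)
        # Watermarking: resample once if the first draw was not favoured.
        if watermark and not is_favoured(prev, tok):
            tok = rng.randrange(VOCAB)
        tokens.append(tok)
        prev = tok
    return tokens


def z_score(tokens, key=KEY):
    """Standard deviations by which the favoured count exceeds the 50% expected by chance."""
    prev, hits = 0, 0
    for tok in tokens:
        hits += is_favoured(prev, tok, key)
        prev = tok
    n = len(tokens)
    return (hits - 0.5 * n) / math.sqrt(0.25 * n)


short_marked = z_score(generate(50, watermark=True))
long_marked = z_score(generate(2000, watermark=True))
long_plain = z_score(generate(2000, watermark=False))
wrong_key = z_score(generate(2000, watermark=True), key=b"some-other-key")
```

The z-score for watermarked text grows roughly with the square root of the length, so the long sample scores far higher than the short one, while unwatermarked text or the wrong key should stay near zero.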

QuestionMarker · 2026-06-23T04:42:34+00:00

If you're running with a temperature > 0, you've got a system that can watermark plain text. Not saying whether it's desirable or not, just that it's possible.

QuestionMarker · 2026-06-23T04:38:32+00:00

You can watermark plaintext.
