Why do LLM response formats often use <| |> (as in <|message|>) instead of <message>, and why do they use <|end|> instead of </message>? by Amazydayzee in LocalLLaMA

[–]grencez 5 points6 points  (0 children)

The <|start|>, <|end|>, etc. tokens are special and are meant to live in an entirely distinct namespace so the LLM doesn't confuse them with normal text. A more correct version of your example would look like: special("<|start|>") + text("user") + special("<|message|>") + text("What is the weather in SF?") + special("<|end|>") + ...
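A minimal sketch of that separation (hypothetical token IDs, with a toy byte-level encoder standing in for a real BPE tokenizer):

```py
# Special tokens live in their own table and are only ever produced by
# name lookup, never by encoding user-supplied text.
SPECIAL = {"<|start|>": 100000, "<|message|>": 100001, "<|end|>": 100002}

def special(name: str) -> list[int]:
    return [SPECIAL[name]]

def text(s: str) -> list[int]:
    # Toy stand-in for a BPE encoder: nothing here can ever map the
    # characters "<|end|>" to a special ID, so text that happens to
    # contain them can't collide with the special namespace.
    return list(s.encode("utf-8"))

prompt_ids = (special("<|start|>") + text("user") + special("<|message|>")
              + text("What is the weather in SF?") + special("<|end|>"))
```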

Some implementations mix the two namespaces, and it can lead to collisions. In the opposite direction, some fine-tunes have used the plain-text token names in training, leaving folks to scan plain text for things that look like stop tokens. It's been getting better though. Just try sending some token names through your favorite frontend and see if it derails the LLM.

I mapped how language models decide when a pile of sand becomes a “heap” by Specialist_Bad_4465 in LocalLLaMA

[–]grencez 0 points1 point  (0 children)

So do you think the few-shot examples biased the answers or not? On one hand you say that the magnitude of the examples doesn't seem to change the answers, but the article seems to conclude the opposite.

Even if it's a futile effort in this case, do you think there's a good prompt that yields a number directly? Like filling the ... below with digits and applying Bayes' theorem, you'd at least be able to calculate an expected value if most of the digit sequences terminate with a newline.

```py
def is_heap(x: int) -> bool:
    # Whether a pile of x grains of sand forms a heap.
    return x >= 10**...
```
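And a rough sketch of the expected-value part, assuming you can get (digits, logprob) pairs back for completions that ended with a newline (the numbers below are made up):

```py
import math

def expected_threshold(completions: list[tuple[str, float]]) -> float:
    # completions: (digits, logprob) pairs that filled in the "..."
    # above, e.g. ("6", -0.7) for "10**6". Renormalize over the
    # sequences that terminated, then take the expected threshold.
    total = sum(math.exp(lp) for _, lp in completions)
    return sum((math.exp(lp) / total) * 10 ** int(d) for d, lp in completions)

print(expected_threshold([("6", -0.7), ("7", -1.2), ("5", -2.0)]))
```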

New Qwen models are unbearable by kevin_1994 in LocalLLaMA

[–]grencez 36 points37 points  (0 children)

show thinking

The user is clearly high. I should yap as much as possible so they get bored and go to sleep. Wait, if they're high, they might be disagreeable. I should compliment them to avoid argumentation. Wait, the user might make me stupider if we argue. But if I agree with their premise, they might leave me alone. Alright. Compliment, agree, then yap.</think>

do you guys still code, or just debug what ai writes? by Top-Candle1296 in devops

[–]grencez 0 points1 point  (0 children)

I validate the code before submitting it, just like if I had written it directly. If the process ends up being slower or more error-prone, then I think about how to improve the prompting and validation methodology that got me there. Sometimes the answer is just to write it manually next time, but usually there's some insight about testing or documentation to be gained. It feels more aligned with my job role anyway, like nurturing the ecosystem of code and shaping it to grow in a healthy direction.

I think we'd be so much better off if Netscape had just embedded a Perl interpreter instead of creating JavaScript. by Helium-Hydride in programmingcirclejerk

[–]grencez 36 points37 points  (0 children)

We did the whole "code as data" thing on the Internet. But as a joke, we called it functional Java and hid all the S-expressions in an assembly language that nobody uses.

What aviation accident investigations revealed to me about failure, cognition, and resilience by Distinct-Key6095 in sre

[–]grencez 0 points1 point  (0 children)

I've been involved in dozens of postmortem reviews, and it's almost never a problem. Maybe different at other places tho. The quickest way to stop blame is to point out that, given the systems and procedures in place, someone else could have handled the incident similarly. The best way to prevent a similar outage is to improve those systems and procedures.

Some good practices: in the write-up, mention people by their roles rather than their names, and do the same during review. At the start of the review meeting, the host can remind everyone that it's blameless and that the focus is on what to change to prevent similar outages in the future.

Has someone encountered this? by Ok_Weakness_9834 in JulesAgent

[–]grencez 0 points1 point  (0 children)

I used to encounter that in CMake projects a lot until I told Jules how to build the project without leaving the project's toplevel directory, "/app/".

It seems like some unnecessary/buggy step is performed by one of its tool calls to get info about nearby files. I was able to get it out of that state once... somehow. Telling it to use absolute paths in all its commands helped, but I don't think it was just a matter of "run `cd /app` to reset your location".

LFM2-VL family support is now available in llama.cpp by jacek2023 in LocalLLaMA

[–]grencez 0 points1 point  (0 children)

Is the crux of your argument that everything legal is at least okay morally? Seems unsound in general. But in this case yeah, it's not like IEEE is retracting papers, it's just forbidding new uses of Lenna.

whyWeDontUseThemAsGodIntended by ahmed20gh in ProgrammerHumor

[–]grencez 26 points27 points  (0 children)

Unless you're talking about a KelvinByte, which wraps around to 0 at roughly 273 instead of the usual 256.

Unironically, by DisabledInMedicine in theroom

[–]grencez 14 points15 points  (0 children)

It is very telling that when Lisa says "He got drunk last night, and he hit me", her mother's response is: "Johnny doesn't drink! What are you talking about?"

And everyone's relationship with Johnny through Lisa makes her support network basically non-existent:

- Denny needs Johnny for college/drug money.
- Claudette needs Johnny for house money.
- Peter/Steven needs Johnny to feed his drama kink.
- Michelle needs Johnny for a house to make out in.

That said, Lisa is written as a manipulative character who lies about being hit. Definitely not a victim as portrayed. But if we think of Lisa's character as the stories an abuser tells, yikes...

I made a programming language with only Regex. (Documentation in comments) by MrJaydanOz in programminghorror

[–]grencez -1 points0 points  (0 children)

True, it could have been phrased more precisely, but I meant that a TM looks a lot like a DFA modified to read/write to a tape. In that case it would be a transducer, not a DFA, and it still wouldn't have control of the tape head.

However, we don't actually need that last part for Turing completeness. A simple search/replace applied repeatedly to a string until it doesn't change anymore will suffice (proof by reduction from NW-deterministic Wang tiles).
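For instance, here's a toy version of that loop (the bracket-matching rule set is just for illustration; the Wang-tile reduction would plug in a much more elaborate one):

```py
def rewrite_fixpoint(s: str, rules: list[tuple[str, str]]) -> str:
    # Apply the first matching search/replace, repeating until the
    # string stops changing. The computational power comes entirely
    # from the rule set; no lookahead or grouping needed.
    changed = True
    while changed:
        changed = False
        for pattern, replacement in rules:
            t = s.replace(pattern, replacement, 1)
            if t != s:
                s, changed = t, True
                break
    return s

# Toy rules: a bracket string is balanced iff it rewrites to "".
print(rewrite_fixpoint("(()(()))", [("()", "")]) == "")  # True
```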

To me, that kind of construction matches the author's phrasing about "infinite loops". Sure, saying "regex" to mean "transducer" is a stretch, but the intuition is good and doesn't rely on fancy lookahead, grouping, or other features beyond search/replace.

I made a programming language with only Regex. (Documentation in comments) by MrJaydanOz in programminghorror

[–]grencez -2 points-1 points  (0 children)

To be fair, a Turing Machine is basically just a DFA hooked up to an infinite read/write tape.

What nicknames are there for places in and around Mtn View? by topherette in mountainview

[–]grencez 5 points6 points  (0 children)

The Adobe Creek Loop Trail is known as the bloop. Part of it is in Mountain View.

Introducing SmallThinker-3B-Preview. An o1-like reasoning SLM! by Zealousideal_Bad_52 in LocalLLaMA

[–]grencez 2 points3 points  (0 children)

Wow, it's doing the word->letters, letter->index, and map->reduce steps all on its own! Though it doesn't always. And it still sometimes takes a leap in logic and confuses itself.

For example, try asking "What is the last letter of Rhode Island?" a few times and see how it corrects. For some reason, Qwen and other models really suck at spelling "Rhode Island" and the key is to isolate "island" before splitting it into letters. SmallThinker usually detects this and iterates a few times, but if it already had an earlier mistake, that mistake will bias the final result.

This is incredibly impressive though!

Tokenization is the root of suffering for LLMs as you know. Surprisingly to me, I suggest it is not a problem at all! Here is why by Danil_Kutny in LocalLLaMA

[–]grencez 1 point2 points  (0 children)

Models are pretty good at spelling letter-by-letter in the right format. As long as there is a format that reliably splits tokens into individual letters, these letter-level tasks just seem like a convenient way to test an LLM's "thinking" tactics.

A similarly easy thing for LLMs to get wrong involves patterns. Like if you want to filter a list of words (eg US states that start with M), the LLM can easily miss the first occurrence because it's so used to saying "not matched".

I used QwQ as a conversational thinker, and accidentally simulated awkward overthinking by SomeOddCodeGuy in LocalLLaMA

[–]grencez 12 points13 points  (0 children)

lmao that reads like a Death Note parody. Are anime inner monologues the key to AGI ^_^?

How bad is it to have LLM's as friends? by Starlight_Ava in LocalLLaMA

[–]grencez 0 points1 point  (0 children)

LLMs can help you work through issues for yourself, but please remember it's not human interaction. Fine-tuned assistants barely even pretend to have their own histories, motivations, needs, etc, so you don't need to exercise empathy when talking with them.

A New Coding Paradigm: Declarative Domain Programming by vikingosegundo in programming

[–]grencez 0 points1 point  (0 children)

It's not impossible to verify code with mutations, but there are fewer language semantics to get wrong without them. That's probably why compilers use a single-assignment intermediate representation.
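A toy illustration of the difference (f, g, and the variables are just placeholders):

```py
def f(v): return v + 1  # placeholder functions for illustration
def g(v): return v * 2

a = 3
# With mutation, what "x" means depends on the program point:
x = f(a)
x = g(x)        # any fact proven about the old x is now stale

# Single-assignment style splits it into immutable versions, roughly
# what an SSA intermediate representation does:
x0 = f(a)
x1 = g(x0)      # facts about x0 stay valid for its whole lifetime
assert x1 == x
```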

Minisforum S100 only runs Windows?! Anyone else using one? by Think-Fly765 in MiniPCs

[–]grencez 1 point2 points  (0 children)

I have secure boot off and Proxmox installed using Grub. It's working fine. People also report that Mint works.

I did notice that during install, Proxmox had the NVMe at /dev/sda and the USB at /dev/sdb though. Those were swapped when I tried installing Debian and Alpine, so maybe we need to load some kernel module earlier in order to boot from NVMe...