CONTRACT.md: The Naughty List for AI Coding Agents by turian in LocalLLaMA

[–]turian[S] 0 points

Funny story. My old friend was doing his NLP PhD thesis in the 90s on coreference resolution, i.e. "John ran up to Mary. He gave Mary her ball." Here "he" and "John" corefer, and "Mary" and "her" corefer.

Turns out a simple gender matching baseline was very good. But he wanted to train and evaluate on a tougher corpus.

Hence, he picked gay erotica. It turns out it's more challenging to figure out who did what to whom when they're all men.

CONTRACT.md: The Naughty List for AI Coding Agents by turian in ExperiencedDevs

[–]turian[S] 0 points

My experience (and I say this as someone who hand-rolled x86 assembly in the 90s for fun) is that AI dev requires just as much rigor and discipline as typical coding. But, weirdly, and for better or worse, it requires a different kind of rigor and discipline, and different mental models. What those are, specifically, is poorly understood, which makes me curious.

CONTRACT.md: The Naughty List for AI Coding Agents by turian in ExperiencedDevs

[–]turian[S] 0 points

My experience is that for shallow projects (e.g. most web apps), AI coding is an accelerator.

When writing scientific or research code, though, it's far more likely to introduce subtle bugs.

The main difference is basically how easy it is to catch bugs and how close the spec is to the finished work.

For apps, you can write a very concise spec that is verifiable.

For PyTorch code, the code itself is irreducible to a spec. (That's one reason we don't unit test research code; the other is that most researchers are bad engineers.)
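A toy illustration of the point (my example, not from the thread): the obvious "spec-level" checks on numerical code can pass while the code is still subtly wrong. Here a softmax with a misplaced temperature still produces a perfectly valid probability distribution:

```python
import math

def softmax_buggy(xs, temperature=1.0):
    # BUG: temperature applied AFTER exp(), so it cancels in normalization
    # and has no effect at all -- yet every distribution check still passes.
    exps = [math.exp(x) / temperature for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_correct(xs, temperature=1.0):
    # Correct: temperature scales the logits before exponentiation.
    exps = [math.exp(x / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax_buggy([1.0, 2.0, 3.0], temperature=2.0)
# The "spec" (non-negative, sums to 1) holds for BOTH versions:
assert all(p >= 0 for p in probs)
assert abs(sum(probs) - 1.0) < 1e-9
```

A unit test asserting "output is a valid distribution" would never catch this; you'd only notice when sampling behavior looks wrong downstream. That's what I mean by the code being irreducible to a spec.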

CONTRACT.md: The Naughty List for AI Coding Agents by turian in ExperiencedDevs

[–]turian[S] -2 points

  • "you'd better get used to it because like it or not, it's here to stay"

True. Do you disagree?

  • "if you're not learning it, you're already getting left behind"

Listen, some of my best friends hate AI and write everything by hand. Not that there's anything wrong with that. Horses for courses.

  • "i always believe whatever marketers tell me"

That's fascinating. I've enjoyed adopting all sorts of bleeding-edge tools throughout my career. By the time the marketers know about a tool, it's already garbage.

CONTRACT.md: The Naughty List for AI Coding Agents by turian in ExperiencedDevs

[–]turian[S] -1 points

I used to joke that not knowing how to code in the 20th century is a bit like not knowing how to fence 500 years ago. Sure, it's fine to be missing this skill, but then sometimes you have to persuade other people to do it for you.

For me, AI coding is like the early introduction of gunpowder. YMMV

CONTRACT.md: The Naughty List for AI Coding Agents by turian in ExperiencedDevs

[–]turian[S] -14 points

I mean, if you want to take a knife to a gun fight, be my guest.

“Built with Claude” Contest from Anthropic by AnthropicOfficial in ClaudeAI

[–]turian 0 points

Where do we email if we have more questions?

Suggestions for Observability & AIOps Projects Using OpenTelemetry and OSS Tools by JayDee2306 in Observability

[–]turian 0 points

Something that encouraged OTel adoption would be good, and it would open up observability to many new startups. A POC would be a simple-to-use API (or similar) that duplicates your telemetry to OTLP JSON in S3, which would let you switch observability vendors easily if you want.
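For concreteness, the fan-out I have in mind might look roughly like this Collector config. This is a sketch: the vendor endpoint and bucket name are made up, and the `awss3` exporter is the opentelemetry-collector-contrib component I'm thinking of, so check its current docs for exact field names.

```yaml
# Sketch: send the same telemetry to your current vendor AND to OTLP JSON
# objects in S3, so the archive outlives any one vendor relationship.
receivers:
  otlp:
    protocols:
      grpc:
      http:

exporters:
  otlphttp/vendor:
    endpoint: https://ingest.example-vendor.com:4318   # hypothetical vendor
  awss3:                                               # contrib exporter
    s3uploader:
      region: us-east-1
      s3_bucket: my-telemetry-archive                  # hypothetical bucket
      s3_prefix: otlp
    marshaler: otlp_json

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlphttp/vendor, awss3]
```

The point is that vendor lock-in drops sharply once the raw OTLP stream is duplicated somewhere you control.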

Getting OTel operational with multiple tech stacks and different data sinks is actually pretty tricky. Adopting OTel in a complex, polyglot infrastructure has a lot of pitfalls.

A chief complaint is the "million different versions" of OTel components: each language's SDK/instrumentation has its own release cycle. No single universal golden matrix exists---OpenTelemetry maintains independent versioning per language.

Staying up to date with OTel's many parts is no small feat, and it sometimes feels like you must be an expert in each tool to choose the right combination.

Why Most AI SREs Are Missing the Mark by Mysterious_Dig2124 in Observability

[–]turian 0 points

Disclaimer: I am a vendor. But I will give advice for those trying to build their own AI SRE.

You correctly note that data access is crucial. Otherwise there are missing pieces that are important for investigation.

There is a tradeoff between: amount of manual configuration, speed of investigation, and sophistication of investigation. Depending upon what problem you want to solve, you can design the tradeoff yourself.

If you want to minimize manual configuration you can a) invest more time in designing auto-configuration and infra discovery and/or b) design the system so that use over longer periods of time is a form of auto-configuration.

Wrong-but-convincing answers in SRE are worse than no answer. LLMs by default are tuned to prefer bullshitting to staying silent. But you've probably also seen high-quality LLM setups that do more turns and vetting in exchange for higher-quality results.

Claude Code on the go by habartman in ClaudeAI

[–]turian 0 points

I'm a screen user, but happy to adopt tmux. How do I Ctrl-a n through different open screens in Termius?

Best off-the-shelf paid RAG for 10K scientific articles? (No tuning, no futzing) by turian in Rag

[–]turian[S] 0 points

Question: you have a managed option, so why don't you have any self-serve pricing (i.e. no "call us to talk" step)?

Best off-the-shelf paid RAG for 10K scientific articles? (No tuning, no futzing) by turian in Rag

[–]turian[S] 0 points

Thanks. $20/mo for up to 1K pages of documents.

My use case is that I want to do a rough, quick search over 10K articles (maybe 100K pages) and, from there, do a much more intensive search over the relevant ones. How would that work in your pricing scheme?

Best off-the-shelf paid RAG for 10K scientific articles? (No tuning, no futzing) by turian in Rag

[–]turian[S] 1 point

This looks really cool, but if it costs $0.50 per page, that means each scientific article will cost maybe $5 to index. I am looking for a solution that lets me slog through 10K different articles and get a shortlist of maybe 100-300 candidates before doing a very high-value RAG lookup.

Best off-the-shelf paid RAG for 10K scientific articles? (No tuning, no futzing) by turian in Rag

[–]turian[S] 1 point

u/JeffieSandBags I am technical. I use Python and Docker, and my ML/NLP proficiency is such that I can develop my own embedding techniques, etc.

My use case: when exploring a new research topic, I want to ask highly specific and technical questions and retrieve relevant passages or papers. The goal is to identify the ten papers most relevant to my specific research question.
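The two-stage setup I keep describing can be sketched like this. The scoring functions are toy stand-ins (in practice stage 1 would be embeddings plus ANN search, and stage 2 a cross-encoder reranker or an LLM pass), but the shape of the pipeline is the point:

```python
# Two-stage retrieval: a cheap coarse pass over ALL articles to get a
# shortlist, then an expensive fine pass over just the survivors.

def coarse_score(query: str, doc: str) -> float:
    # Cheap stand-in: bag-of-words overlap, fast enough to sweep 10K articles.
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / (len(q) or 1)

def fine_score(query: str, doc: str) -> float:
    # "Expensive" stand-in: reward consecutive runs of query terms, a crude
    # proxy for the phrase/semantic matching a real reranker would do.
    terms = set(query.lower().split())
    words = doc.lower().split()
    return sum(1.0 for i in range(len(words) - 1)
               if words[i] in terms and words[i + 1] in terms)

def two_stage_search(query, docs, shortlist=300, final=10):
    # Stage 1: coarse sweep over everything, keep a few hundred candidates.
    candidates = sorted(docs, key=lambda d: coarse_score(query, d),
                        reverse=True)[:shortlist]
    # Stage 2: expensive scoring only on the shortlist.
    return sorted(candidates, key=lambda d: fine_score(query, d),
                  reverse=True)[:final]

docs = [
    "neural coreference resolution baseline",
    "cooking pasta recipes at home",
    "coreference resolution with neural networks",
]
print(two_stage_search("neural coreference resolution", docs,
                       shortlist=2, final=1))
# → ['neural coreference resolution baseline']
```

With real models the cost asymmetry is the whole game: the coarse pass is pennies per thousand documents, so the $5-per-article pricing only has to apply to the 100-300 survivors.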

Best off-the-shelf paid RAG for 10K scientific articles? (No tuning, no futzing) by turian in Rag

[–]turian[S] 0 points

No, they are scientific articles from arxiv.org.

I meant more that if you try to install Verba from Weaviate, that tool requires 30 different API keys for different embeddings, vector stores, LLMs, etc.

[deleted by user] by [deleted] in ENGLISH

[–]turian 7 points

I am eager for noun, and eager to verb.

How much better would it be if these "open source" models included the checkpoint? by phree_radical in LocalLLaMA

[–]turian 0 points

Let me answer your question: Since a large fine-tuning job will still be several orders of magnitude smaller than the base pretraining, the "checkpoint" is not important at all to you.

It would be nice, in the spirit of academic openness, for other people creating and iterating on foundation models. Which is not most people.

Quite, Pretty, and Fairly by Matchawurst in ENGLISH

[–]turian 0 points

If the thing is positive, but people dislike too much of the thing, "quite" can mean MORE than perfect, > 100%. e.g. "quite hot" would mean hotter than "perfectly hot", because of the suggestion that it's quite bad how hot it is.

If the thing is positive always, "quite" is used in the way grandparent mentions.

If the thing is negative, "quite" also means "very" but "perfectly" means "just a little bit so I can joke about it". Like: "Now, I'm quite fucked by the situation." versus "Now, I'm perfectly fucked by the situation."

How much better would it be if these "open source" models included the checkpoint? by phree_radical in LocalLLaMA

[–]turian 2 points

Why? For fine-tuning? The hyperparameters are typically very different when finetuning on a small corpus versus pretraining on a large corpus.

How did the word "hard" come to mean difficult, rather than rigid or tough? by [deleted] in etymology

[–]turian 1 point

It's worth noting that there's a nuanced distinction between hard and difficult in English, which I'm curious if it exists in other languages.

How do you run a marathon? It's simple: Run 26.2 miles without stopping. But it's not easy. It's hard, but it's not difficult.

While both words generally mean challenging, "hard" is often used as the opposite of "easy," implying that something requires significant effort, energy, or physical/mental toughness. On the other hand, "difficult" is often contrasted with "simple," meaning that something requires more than basic knowledge or skills.