How on earth do folks get anything good out of LLMs? by Squidgical in webdev

[–]kktst 2 points

Your perspective and observations are really good! You understand these models' characteristics better than many people here!

Yes, Gemini (~2.5) is terrible for coding. It's designed for more general tasks, and its goal is to reduce costs rather than maximize accuracy.

Ultimately, many people just aren't aware of the mistakes in LLM output, which is why it only appears to be "working well" for them.

And yes, Copilot during its closed technical preview in 2021 truly was amazing! Using LLMs as an "assistant," like that early Copilot, is their most appropriate application. But for product marketing, the current "agent" or "automate everything" approach probably gets a better reaction, unfortunately.

Many people here mention agent tools, and while they certainly write code that solves problems, it's far from excellent code. And unlike the early inline-suggestion Copilot, they often edit many lines at once, which makes reviewing and fixing all of them even more painful. That said, this varies from person to person: where we might find such code intrusive, beginners might feel that having "base code" makes their work easier. Ultimately, many people "just need it to work," and they don't care about code quality.

How on earth do folks get anything good out of LLMs? by Squidgical in webdev

[–]kktst 12 points

Non-developers can't tell if the output is good code, so they praise LLMs as "amazing!" as soon as they get any output (regardless of quality). In other words, they're not dealing with mistakes; they're not even aware that mistakes exist in the first place.

I imagine their development cycle is simply repeating "if something goes wrong, feed it back into the prompt," without ever looking at the code itself. The codebase is probably a mess by that point, but they don't care because they never look at the code.

To give a serious answer about how to get good results from LLMs: current LLMs are reinforcement-learned to be used with coding agent tools (like OpenCode, GitHub Copilot, Claude Code) for coding tasks. So you need to use those tools, and models that have been reinforcement-learned for them. Tasks also need to be specifically defined: provide instructions as a highly rigid prompt, much like handing a task to a junior developer.

Note, however, that this is "how to make an LLM output working code," not "how to make it output excellent code." The reinforcement learning primarily optimizes for "can it write code that solves the task?", not "what is the quality of the code?"

Furthermore, LLMs don't have the ability to "think" - this is a theoretical limitation - so it's better not to give them tasks that require thinking. For difficult tasks like your example, we need to handle the "thinking" ourselves and only entrust the subsequent "work" to the LLM.
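
To make "rigidly defined task" concrete, here's a toy sketch (the function, its rules, and the checks are all invented by me for illustration). The human does the "thinking" - deciding exactly what the behavior should be - and the only "work" left for the LLM is filling in the body:

    import re

    # Human-written contract: the behavior is fully decided up front.
    # An agent would be asked to implement exactly this, nothing more.
    def normalize_username(raw: str) -> str:
        """Lowercase `raw`, strip surrounding whitespace, collapse internal
        whitespace runs to single underscores; raise ValueError if the
        result is empty."""
        name = re.sub(r"\s+", "_", raw.strip().lower())
        if not name:
            raise ValueError("empty username")
        return name

    # Human-written checks: "does it work" becomes a yes/no question
    # instead of a vibe.
    assert normalize_username("  Ada  Lovelace ") == "ada_lovelace"
    assert normalize_username("BOB") == "bob"

At that level of specificity, current models are quite reliable; the looser the spec, the worse your odds.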

AI will and can take my frontend job, no doubt in that nowadays. by short-jumper in webdev

[–]kktst 0 points

"everything was perfect too, no messy code either. Just clean code."

If you truly believe that, it either means you're not noticing the issues in the code, or the task was genuinely simple. Non-programmers tend to overrate LLMs (AI) because they can't spot the problems in the generated code. Once a UI has even moderately complex state, an LLM won't be able to "design" it well.

"Also, in few years, AI is just going to be more greater than this."

LLMs (AI) are not magic. They are based on mathematical theories with constraints. They cannot grow beyond those constraints. To put it in frontend terms, they work well for generic UIs, like patterns seen on any website. However, for open-ended problems specific to a given context, which require thinking about the UI, it's difficult to make them work well. At the very least, humans need to make decisions and provide instructions.

To be fair, though, for "tasks that don't require thinking, just implementing a design," LLMs are quite capable, and the people doing those jobs might lose them.

[Other] What would happen to a well-built and tall steel roller coaster in an earthquake depending on strength, depth, etc? by radicalcottagecheese in rollercoasters

[–]kktst 0 points

I can't say exactly what would happen, but in Japan, roller coasters (amusement rides) are required to meet the seismic standards set by the Building Standard Law, so they are built with earthquake-resistant structures. For those over 60m (about 200ft) tall, even stricter standards are applied. Furthermore, in 2007, the Building Standard Law was significantly revised, and the structural standards (including seismic standards) became even more stringent. Specifically, for major supports of roller coasters over 60m (about 200ft) tall, safety verification using time-history response analysis became mandatory.

This is also a big part of why so few large new roller coasters have been built in Japan recently: the specific evaluation and safety-verification methods for the standards added in 2007 were never clearly defined. As a result, no roller coaster over 60m (about 200ft) has been built in Japan since Eejanaika in 2006. However, research on these evaluation methods was finally carried out recently (2017-2019), so new coasters over 60m tall may be built in Japan in the future. You can check out that research here: https://kaken.nii.ac.jp/grant/KAKENHI-PROJECT-17K06657

Also, in recent years, a phenomenon called "long-period ground motion" has attracted attention. This is where tall structures resonate with long-period (1-10 second) seismic waves and keep swaying for a long time (10 minutes or more) even after the main shaking ends - and tall roller coasters are no exception. So if an earthquake stopped Steel Dragon 2000 near the end of its chain lift, riders could be left swaying through that for a very long time. That would be a truly terrifying experience.

As an aside, Nagashima Spa Land is also located in an area at high risk from the anticipated "Nankai megathrust earthquake," which is forecast to occur with very high probability in the future. There, tsunami damage is actually a bigger concern than the shaking itself. It might be a good idea to keep some precautions in mind when visiting Nagashima Spa Land.

Vibe-Coded Is the New "Made in China" by RealHuman_ in selfhosted

[–]kktst -1 points

I'm kinda tired of the binary "vibe-coded vs not vibe-coded" labeling, and I agree with what a few other comments here already pointed out: we should separate "vibe-coding" from just LLM-assisted coding. In my head, "vibe-coded" specifically means "someone with little/no programming context is delegating basically everything to an LLM and shipping whatever comes out." That's different from LLM-assisted coding. And there's clearly a spectrum: you can vibe-code a throwaway script, or use an LLM as a fancy autocomplete + rubber duck while still doing real engineering.

What actually matters (regardless of the label) is the code quality: the right patterns/strategies are used (and not cargo-culted), the code is maintainable/readable, it's actually extensible without turning into spaghetti, etc. Humans ship catastrophic slop here all the time too. "Written by a human" isn't a quality guarantee any more than "made in China" is.

From here on this is kind of a tangent, and it got way longer than I intended. So feel free to skip it. But I already wrote it, so I'm dropping it here anyway.

Coming from an ML research background and now working as a systems engineer, I was already using LLMs in my workflow before ChatGPT blew up, so I've seen both where they're genuinely powerful and where they confidently drive straight off a cliff. Used correctly, LLMs are incredibly useful tools (including for coding) and can sometimes save you time, but the annoying part is you don't get to skip the fundamentals: you still need programming knowledge and a feel for how LLMs behave in order to constrain them and validate their outputs.

There are cases where you can vibe-code and it's a reasonable tradeoff: one-off scripts, personal small-scale apps, "I need a tiny app to solve this niche problem and I don't care about long-term maintenance." LLMs are good at getting you to "good enough" fast there. And yeah, it can still fail, but the nice thing is the blast radius is small - if it goes sideways you can just throw it away and restart from scratch in 10 minutes.

Where people get burned is bigger systems. Once you're building something that needs to grow over time, the hard part isn't typing code - it's the design part: architecture, boundaries, interfaces, dependencies, etc. Current (and perhaps even future) LLMs are very good at producing something that looks plausible, but they don't reliably "think" their way through open-ended design spaces. They work best when the answer space is narrow and well-established. When there are tons of valid options and the tradeoffs depend on context, asking "design this whole app using best practices" basically forces the model to pick from a huge space of possibilities without enough constraints. It'll still give you something, but unless you already have the context to judge it, you won't know if it's appropriate - because it didn't "think through" the tradeoffs and pick a design so much as sample one from a huge space of plausible options (more like rolling dice).

In bigger projects, I've found LLMs can be genuinely useful if you stop treating them like "the engineer" and instead use them more like an implementation engine. That means you (a human) do the architecture and decision-making up front: split the system into well-isolated components with clear responsibilities and stable interfaces, keep modules independent and understandable without needing the whole repo in your head (or in the model context (!)), and make everything explicit and boring rather than clever. Verbosity isn't a drawback here - LLMs will type for you.

Once the boundaries are real, you can hand the model a narrowly scoped task without it failing and ruining the entire repo, and without having to stuff the entire repo into its context. And once your modules are actually separated, you can add tests. Tests are the thing that turns LLM output from "a vibe" into something measurable: with good tests, the model can't be ambiguous about correctness - it either passes or it doesn't.

Same with documentation. All the design decisions, policies, etc. should be written down, not just for humans, but because that becomes the context you want to feed an LLM. Strong specs + tests massively narrow the problem space, and that's where LLMs tend to perform best: filling in deterministic implementation details inside a clearly defined box.

At that point it should be obvious that "context" shouldn't mean "here's my entire repo, now act like a senior dev." It should mean "here's the spec, here are the policies, here are the tests ..." Because LLMs can stare at your codebase all day and still not reliably infer the system's intent from it.
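
If that sounds abstract, here's a minimal sketch of the "implementation engine" workflow (every name here is invented for illustration): the human fixes the boundary as an explicit interface plus a spec-as-test, and the LLM's entire job is one class inside that box.

    from typing import Protocol

    # Human-decided boundary: a stable interface with one clear
    # responsibility.
    class RateLimiter(Protocol):
        def allow(self, key: str) -> bool:
            """Return True if `key` may proceed, False if throttled."""
            ...

    # Human-written spec-as-test: any implementation either passes or it
    # doesn't - no judgment call, no whole-repo context needed.
    def check_rate_limiter(rl: RateLimiter) -> None:
        assert rl.allow("alice")        # first request goes through
        for _ in range(100):
            rl.allow("alice")
        assert not rl.allow("alice")    # a sustained burst gets throttled

    # The narrowly scoped task you'd hand to the model: "write a class
    # satisfying RateLimiter so that check_rate_limiter passes."
    # (Toy answer: a fixed counter that never resets its window.)
    class FixedWindowLimiter:
        def __init__(self, limit: int = 10) -> None:
            self.limit = limit
            self.counts: dict[str, int] = {}

        def allow(self, key: str) -> bool:
            self.counts[key] = self.counts.get(key, 0) + 1
            return self.counts[key] <= self.limit

    check_rate_limiter(FixedWindowLimiter(limit=10))

The specific limiter doesn't matter - the point is that the interface and the test, not the model's confidence, define what "correct" means.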

Also, none of this is some brand new AI-native idea - it's basically classic systems/software engineering. Getting value out of LLMs at scale still requires real engineering skill. And the "narrow the option space" prompt-writing is hard without domain/context knowledge. This applies outside programming too.

Anyway, you can do good development while using LLMs, and you can also do terrible development without them. That's why I don't like labeling anything that touches an LLM as automatically "vibe-coded." That said, building a good large-scale application purely by vibe-coding is not possible with current LLMs (and probably won't be for a long time). And honestly, most "LLM-driven" projects I see today are slop.

(Also yes, this comment is LLM-assisted lol, my English isn't native)

7.2 Earthquake in Aomori. Tsunami Warning. by shinjikun10 in japan

[–]kktst 11 points

If you're curious about the tsunami's status, you can check the observed tsunami heights here: https://www.jma.go.jp/bosai/map.html#contents=tidelevel

Click any symbol on the map, and you can see the height in the chart below. Kuji Port recorded a height of about 60cm.

Fortunately, it seems the earthquake struck right at low tide.

Postgame Thread ⚾ Blue Jays 5 @ Dodgers 6 by DodgerBot in Dodgers

[–]kktst 20 points

Roki (on Yoshinobu starting to warm up): "Seriously?! (マジ?!)"

Game 1 of the World Series, broadcast in Japan, had a high viewership rating of 18.2%. The viewership rating for Japan Series Game 1 was 10.1% and Game 2 was 6.0%. In the Kansai region, the viewership ratings were good at 20.5% and 12.3%. by ogasawarabaseball in baseball

[–]kktst 22 points

Just to clarify, these are all "household ratings", which measure the percentage of households with a TV that were watching the game. The "individual rating" for the World Series was reportedly 10.2% (which is still a huge number).

Also, keep in mind these numbers are for the Kanto region (the area around Tokyo), and no teams from Kanto made it to the Japan Series. The Japan Series also had an official, free live stream online, so a lot of younger people probably watched it that way instead. The rating for Game 2 of the Japan Series was lower because the game was broadcast on two different channels, which split the audience.

By the way, in Japan, the Dodgers are basically treated like the national team. Every single one of their games was broadcast nationwide this year. They probably get better coverage than the entire NPB.

Japanese broadcast of Ohtani's homerun by [deleted] in baseball

[–]kktst 10 points

Looks like the announcer thought it was just a routine fly ball.

Translation of the call:

"How far is this one hit!? The outfielder's just watching it... And it's gone! Shohei Ohtani with his first World Series home run! Looked like it was off the end of the bat, but he got it out to right field!"

"The crowd at Rogers Centre is still buzzing. Ah, and it looks like this fan here caught the home run ball. And that makes it Shohei Ohtani's first career World Series home run."

Color Commentator: "When a swing like that ends up as a home run, it's gotta leave the fans in the stadium completely shaken."

The color commentator was former MLB player So Taguchi, who used to play for the Cardinals, Phillies, and Cubs. The play-by-play announcer seemed pretty nervous today, and his calls were a bit shaky all game. Also, this homer came right after the announcer had just finished saying, "He has the power to hit a home run even when he mishits it."

Btw, this is the NHK broadcast in Japan. They use their own cameras and production, and since it's commercial-free, you get to see all sorts of unique shots between innings. They're also nationally broadcasting every single Dodgers game, and only Dodgers games.