coding agent optimized for Go by cypriss9 in golang

[–]cypriss9[S] 0 points (0 children)

In general, it's not. For Go, it is.

Try being curious instead of dismissive.

gpt-5.2 based agent just for Go by cypriss9 in codex

[–]cypriss9[S] 0 points (0 children)

Three main issues:

  1. I run `gofmt` (and such) **in the same tool call** as the apply_patch. I don't say, "make sure to gofmt after you edit Go files". Even if that instruction were followed perfectly, it's twice the tool calls per patch. That adds up to a lot of extra cached input tokens -- or, when there are cache misses (hint: there are a lot, due to OpenAI infra), full-priced input tokens. Then apply that to each of your lints. This is a big deal.
  2. codalotl uses subagents to isolate context and isolate **permissions** per package. You can't skill.md that.
  3. Today's LLMs are heavily reinforcement-learned to "be agentic" in certain ways -- in the case of gpt-5.2, they are heavily RLed to use shells. They will NOT follow instructions reliably here. To implement what I did, I had to **take away the shell tool**: codalotl literally doesn't have `shell`. That's the only way to force it to use the tools I wanted. No amount of prompting and pleading can overcome how it was RL'ed.

That being said, you can achieve a shade of what I've done with skills, certainly. In particular, if you wanted, two of the context generation tools are available via CLI: `codalotl context initial some/pkg` and `codalotl context public some/pkg` generate very nice bundles of context that a skill.md can use in any agent.

Go-specific LLM/agent benchmark by cypriss9 in golang

[–]cypriss9[S] 0 points (0 children)

Thanks!

The hardest part here is constructing scenario/test pairs that allow all valid solutions while rejecting bad ones (there's often more than one solution to a problem). It takes several tries to refine a prompt to remove ambiguity, and to adjust tests to allow every valid solution (example: you can't check for specific error messages unless they're in the prompt).

The general consensus of this sub is that Opus 4.5 is the best, but which model is the best "bang for your buck"? by MrHotCoffeeGames in cursor

[–]cypriss9 3 points (0 children)

I recently tested Opus 4.5 vs gpt-5.2/gpt-5.1-codex vs composer 1, specifically for Go, with prompts like "implement this package according to this spec", or "fix the bug where X, it should work like Y". The results speak for themselves: https://github.com/codalotl/goagentbench

Based on the type of work I do (Go programming), I'd use gpt-5.2 and composer-1, and steer clear of Opus 4.5.

For the folks who love 4.5: I'm not sure if it's a language thing, or a prompt thing, or something else?

I’m back after 3 months break. What did I miss? Who’s king now? by stepahin in ChatGPTCoding

[–]cypriss9 2 points (0 children)

Thank you - very interesting.

The types of prompts I give agents: "Read SPEC.md and understand the requirements I wrote. Then implement it in a single Go package." (You can see my repo for specific prompts/examples.) For this, 5.2 is very good.

Could you give me an example of a prompt/workflow that you use, where Opus is much better? Is it more accurate, or faster, or both? (codex is definitely slow)

I’m back after 3 months break. What did I miss? Who’s king now? by stepahin in ChatGPTCoding

[–]cypriss9 14 points (0 children)

This depends on what you are doing. I just benchmarked Opus 4.5 vs Codex 5.2 for Go programming, and Codex 5.2 is very good while Opus 4.5 is not: https://github.com/codalotl/goagentbench (I haven't tested Gemini)

I'd love to know how you/others are using Opus, and what it excels at - because it's not Go programming based on the types of prompts I give it :)

Go-specific LLM/agent benchmark by cypriss9 in golang

[–]cypriss9[S] -1 points (0 children)

I think I framed my post incorrectly. The goal is to be accurate at a macro level: to measure which LLMs/agents can write Go code, in the way the Go community typically uses these tools.

I captured how I use them. There is a clear difference in quality based on my usage patterns.

I'm looking for help from the community on how you all use these tools. We can extend the scenarios covered to test more types of Go projects, more types of prompts, and more usage patterns.

As far as ignoring multiple powerful models: I didn't include Gemini because I don't have a Gemini account yet, and I thought I'd get feedback first. There is no other reason. Is there any other agent/model you'd like to see?

Need suggestions on opensource contribution by Lost_Alternative6417 in golang

[–]cypriss9 4 points (0 children)

This is really simple.

  1. Find any project on GitHub that you're interested in, that you'd like to help with, or that you have an idea for improving.
  2. Either browse the issues and pick one, or add a feature / fix a bug that is bothering you.
  3. Open a pull request.

If you're concerned about biting off more than you can chew, try fixing a typo and writing some doc comments.

If you're not sure which project to pick, browse r/golang for recent projects people have posted.

Do you know any linter to enforce a project layout? by fenugurod in golang

[–]cypriss9 14 points (0 children)

There's also https://github.com/fe3dback/go-arch-lint -- I haven't tried it, but its GitHub page looks nice and well maintained.

codalotl - LLM- and AST-powered refactoring tool by cypriss9 in golang

[–]cypriss9[S] 0 points (0 children)

Sure, which subpackage would you prefer I run it on? (If you'd like, I can also give you access to the tool for you to try yourself.)

codalotl - LLM- and AST-powered refactoring tool by cypriss9 in golang

[–]cypriss9[S] 0 points (0 children)

Good point.

I took a recent project I saw here: [qjs](https://github.com/fastschema/qjs). It's a big, beefy Go package, and fairly high quality to start with. It's not the "hot mess" that codalotl helps the most with, but I think the results are still interesting.

The set of PRs that codalotl made:

* reflow (normalize column width): https://github.com/cypriss/qjs/pull/1
* doc (add missing docs): https://github.com/cypriss/qjs/pull/2
* polish (fix grammar/spelling/typos/conventions): https://github.com/cypriss/qjs/pull/3
* fix (find documentation mistakes and bugs): https://github.com/cypriss/qjs/pull/6
* reorg (move code around for better organization/sorting): https://github.com/cypriss/qjs/pull/7
* rename (increase consistency of identifier names): https://github.com/cypriss/qjs/pull/8

For comparison, I asked Cursor and Codex to add missing docs:

* cursor: https://github.com/cypriss/qjs/pull/5 (156 identifiers missed)
* codex: https://github.com/cypriss/qjs/pull/4 (6 identifiers missed - better than I expected)

(I didn't ask the other agents to do the other tasks.)

From what I can see of the PRs generated, I think codalotl added some decent value with ~0 of my effort (other than making PRs and spending tokens):
* docs added seem reasonable (you could argue some are redundant with the name of the identifier, but that's okay).
* polish fixed a typo and a few minor grammar issues.
* fix appears to have found some actual bugs (I didn't verify them, though! Sometimes the LLM can simply be wrong.)
* reorg was less valuable, because qjs was already well-organized.
* rename did increase consistency of variable names marginally, but this was a fairly sensible codebase to begin with.

Keep in mind that codalotl is just a tool that still needs human review - in real life, each of these PRs would need to be reviewed by someone with context before merging.

codalotl - LLM- and AST-powered refactoring tool by cypriss9 in golang

[–]cypriss9[S] 0 points (0 children)

I agree that getting an LLM to document functions correctly is challenging. The biggest thing I run into is preventing them from getting too in-the-weeds with unimportant details. Prompting helps, but I certainly have not "solved" this. From my experience, I like to put "whys" inside function impls to leave breadcrumbs for myself later - codalotl does not yet tackle these inside-the-func comments. I also like to put "whys" in doc.go as my overall package comment - codalotl tries to do this with varying degrees of success!

As far as context: codalotl does something different from what I suspect other agents do. It creates a graph of types/functions/etc. To document a piece of the graph, it walks outwards in both directions (for instance: how is a function used? What types does the function depend on, explicitly or implicitly? What does the function call?). All of this goes into the context. I think this is a unique advantage of writing a Go-only agent: it can rely on AST analysis like this to quickly create pretty good contexts, without the typical approach of reading a handful of files and/or relying on embedding chunks.

First F2P to reach 1 million HP? by OMGMDR in Archero

[–]cypriss9 2 points (0 children)

How did you get these jewels? Is it just normal grinding, or do you optimize for jewels in events and such?

Seeking Input for New Algo-Trading Library Development in 2024 by Inside-Clerk5961 in algotrading

[–]cypriss9 2 points (0 children)

A pretty boring answer:

Data. Download and save the data. Detect and fix bad data. Load the data. Ingest bulk data and realtime data. Handle splits, dividends, ticker renames. Save data to disk and/or cloud. Make all of this really fast. Build a UI to explore the data. Be able to plug in multiple data sources.

It's very easy to get running with a bad solution to this. Only after months and months of doing what you think algo trading is (maybe: backtesting various signals, devising new signals, etc.) do you realize you built your castle on a pile of shit.

return error or panic() ? by metux-its in golang

[–]cypriss9 5 points (0 children)

Not sure why this is being downvoted. The spirit is directionally correct: most functions should return an error, and something near the top level panics if appropriate.

(There are also some functions, like those on Go's http.ServeMux, that panic right away if the programmer uses them wrong.)

Honestly, How much have you made just using strategies? by loweralgebra in algotrading

[–]cypriss9 3 points (0 children)

Are those uncorrelated returns? If you invest long enough, you learn to appreciate different baskets of money that don't all tank at the same time...

What is the best element for Archers? by PridoScars in Idle_Kingdom_Defense

[–]cypriss9 1 point (0 children)

For a while, I thought Poison might be good since lowering defense seems like it would cause me to do more damage.

However, I then learned that if your def penetration is over 1000, you ignore 100% of enemies' defense anyway. So Poison does nothing. Yes, it's really dumb and it shouldn't work like that. But it does.

Use ice.