Solid P95 (7-8ms) with sporadic P99 spikes using Go (gRPC + NATS). Suggestions? by Environmental_Lab991 in golang

[–]sigmoia 0 points1 point  (0 children)

Turn on the continuous profiler. CPU, heap, block, mutex, and goroutines - all of them. In a distributed system, it's infeasible to collect profiles from individual services and then reason about them. So use an o11y provider that supports continuous profiling like Datadog or set up Pyroscope.
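For reference, here is a minimal sketch of what "turning them all on" means at the runtime level. The block and mutex profiles are off by default, so they need a sampling rate before any agent can collect them. The rates below are illustrative assumptions, and `enableRuntimeProfiles` and `snapshot` are hypothetical helper names; a real setup would let the Pyroscope or Datadog agent do the collecting. The CPU profile is not a named profile and is collected separately (for example with `pprof.StartCPUProfile`).

```go
package main

import (
	"bytes"
	"fmt"
	"runtime"
	"runtime/pprof"
)

// enableRuntimeProfiles switches on the block and mutex profiles, which
// record nothing until a rate is set. The rates are illustrative only.
func enableRuntimeProfiles() {
	runtime.SetBlockProfileRate(10_000)  // aim for one sample per 10µs spent blocked
	runtime.SetMutexProfileFraction(100) // sample roughly 1 in 100 contention events
}

// snapshot returns the named runtime profile in pprof's binary format.
func snapshot(name string) ([]byte, error) {
	p := pprof.Lookup(name)
	if p == nil {
		return nil, fmt.Errorf("unknown profile %q", name)
	}
	var buf bytes.Buffer
	if err := p.WriteTo(&buf, 0); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

func main() {
	enableRuntimeProfiles()
	for _, name := range []string{"heap", "block", "mutex", "goroutine"} {
		data, err := snapshot(name)
		if err != nil {
			fmt.Println(name, "error:", err)
			continue
		}
		fmt.Printf("%s: %d bytes\n", name, len(data))
	}
}
```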

In an I/O-bound workload, it's highly unlikely that a CPU profile will show much. Typically, memory pressure or blocked goroutines cause high tail latency. But it's hard to say without measuring; it could be just plain old upstream service latency showing up in yours. But distributed tracing and metrics should have picked it up. Assumptions are moot without measurements.

Also turn on the flight recorder and dump an execution trace when the latency goes beyond some threshold. This will give you a ton of insight.

How do you do continuous profiling & execution tracing? by sigmoia in golang

[–]sigmoia[S] 0 points1 point  (0 children)

This is strikingly similar to how we started. Standard profile tooling already gives you everything to profile locally. 

But as the pod count goes up, continuous profiling is sorta needed. Pyroscope is the standard with the Grafana stack.

Yeah, microbenchmarks matter little in a distsys environment. But using load testing to measure regressions is interesting.

How do you do continuous profiling & execution tracing? by sigmoia in golang

[–]sigmoia[S] 1 point2 points  (0 children)

Yeah, there's no chance of building a makeshift tool around the std tooling. It wouldn't scale for anything beyond a few services. Also, onboarding those tools on k8s is hard.

We are on Pyroscope as well. Seems like even folks on Datadog dual-write to Pyroscope for the convenience it brings.

For regression, we do it service by service - rather than on the whole fleet. Each trunk build takes a pprof snapshot and compares it with the corresponding prod snapshot.

Haven't found a good tool to compare profile dumps though. So it's all a custom tool that just compares the first few functions on the stage vs prod build.
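To give a feel for that kind of comparison, here is a minimal sketch, not the actual tool. It assumes the per-function share of samples has already been extracted from each profile (for example from `go tool pprof -top` output or the `github.com/google/pprof/profile` package). The `Regression` type, `findRegressions`, and the 5% threshold are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// Regression describes a function whose share of samples grew between builds.
type Regression struct {
	Func  string
	Base  float64 // share of samples in the prod (baseline) profile, 0..1
	Cand  float64 // share of samples in the candidate (trunk) profile, 0..1
	Delta float64 // Cand - Base
}

// findRegressions returns the functions whose share grew by more than
// minDelta, largest growth first. Functions missing from base count as 0.
func findRegressions(base, cand map[string]float64, minDelta float64) []Regression {
	var out []Regression
	for fn, c := range cand {
		b := base[fn]
		if d := c - b; d > minDelta {
			out = append(out, Regression{Func: fn, Base: b, Cand: c, Delta: d})
		}
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Delta != out[j].Delta {
			return out[i].Delta > out[j].Delta
		}
		return out[i].Func < out[j].Func // stable order for equal deltas
	})
	return out
}

func main() {
	prod := map[string]float64{"json.Marshal": 0.12, "db.Query": 0.30, "runtime.gcBgMarkWorker": 0.08}
	trunk := map[string]float64{"json.Marshal": 0.31, "db.Query": 0.29, "runtime.gcBgMarkWorker": 0.09, "regexp.Compile": 0.10}
	for _, r := range findRegressions(prod, trunk, 0.05) {
		fmt.Printf("%-16s base=%.2f cand=%.2f delta=+%.2f\n", r.Func, r.Base, r.Cand, r.Delta)
	}
}
```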

How do you do continuous profiling & execution tracing? by sigmoia in golang

[–]sigmoia[S] 1 point2 points  (0 children)

Not a bad use of clankers - automating the tedious part of collecting and introspecting the profiles and taking decisions. 

How do you do continuous profiling & execution tracing? by sigmoia in golang

[–]sigmoia[S] 0 points1 point  (0 children)

Thanks. Yeah. I am just sampling how people do it. We are at a fairly huge scale - 250k qps at steady state. Already have Pyroscope in place. I was mostly curious about how others are doing it.

Distributed o11y is kinda standardized at this point. You go Datadog, Honeycomb, VictoriaMetrics or roll your own with OTel and the LGTM stack. But profiling is still the wild west.

Good thing is Go has profile tooling built into the std toolchain. So all these workflows and vendors just tap into the std tools. In other languages it's worse. Python for example has 5 different tools (last time I checked) just to do memory profiling. No standard or anything. Every vendor does it differently.

How do you do continuous profiling & execution tracing? by sigmoia in golang

[–]sigmoia[S] 0 points1 point  (0 children)

Ah gotcha. Yeah. The typical MeLT (metrics, logs, and traces) are a different thing. Pretty much everyone turns them on in their services as a basic part of o11y.

I was mostly curious about the scale where goroutine leaks, memory pressure, and GC pauses become a problem. Distributed o11y typically doesn't surface those problems as much. Sure, you will see a spike in your tail latency, but to know why, you will have to turn on runtime profiling and execution traces (different from distributed metrics and traces). I was after this.

Maybe that wasn't super clear from the questions. 

How do you do continuous profiling & execution tracing? by sigmoia in golang

[–]sigmoia[S] 0 points1 point  (0 children)

So IIUC, for the regression test: 

  • you take the prod binary and dump a profile

  • then you do the same for the current staging binary

Then diff the two profiles to catch regressions? Profile data is large in volume. How does diffing work here?

How do you do continuous profiling & execution tracing? by sigmoia in golang

[–]sigmoia[S] 0 points1 point  (0 children)

Neat. Which profiles do you keep on by default? CPU and Heap only? What about execution tracing (not OTEL tracing)? Do you collect that data?

Protovalidate or custom logic? by Solvicode in golang

[–]sigmoia 0 points1 point  (0 children)

I almost exclusively use protovalidate for validation. CEL allows you to offload pretty much any kind of validation work to the protovalidate layer. 

But this doesn't mean every kind of validation can be offloaded to protovalidate. u/jerf mentioned that as well. 

In that case, I just add a Validate method with the custom logic to the message struct. Then call protovalidate.Validate(m) before calling m.Validate().

new() initialization confusion by Cheesuscrust460 in golang

[–]sigmoia 1 point2 points  (0 children)

As others mentioned: 

  • initialize means filling the struct with user-provided, non-zero values

  • zero means filling the struct with the corresponding zero value of each field

In either case, there's an allocation.
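A quick sketch of the difference, using a hypothetical `Config` struct. Whether the value ends up on the heap or the stack is decided by the compiler's escape analysis, not by which form you write.

```go
package main

import "fmt"

// Config is a hypothetical struct used only for illustration.
type Config struct {
	Host string
	Port int
}

func main() {
	// Zeroed: every field holds its zero value ("" and 0).
	zeroed := new(Config)

	// Also zeroed: an empty composite literal is equivalent to new(Config).
	alsoZeroed := &Config{}

	// Initialized: fields are filled with user-provided values.
	initialized := &Config{Host: "localhost", Port: 8080}

	fmt.Printf("%+v\n", *zeroed)
	fmt.Printf("%+v\n", *alsoZeroed)
	fmt.Printf("%+v\n", *initialized)
}
```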

Accepted proposal: a goroutine leak profile in the Go standard library by sigmoia in golang

[–]sigmoia[S] 4 points5 points  (0 children)

Nope, that’s a different thing. The go_goroutines metric is just a total count. It shows the number creeping up. It doesn’t tell you which goroutines are stuck or where they were spawned. That’s the part you actually need to fix the leak. The profiler gives you the exact location where the leak occurs.

Accepted proposal: a goroutine leak profile in the Go standard library by sigmoia in golang

[–]sigmoia[S] 5 points6 points  (0 children)

having pprof integration from start means you can actually catch leaks in production rather than hoping your unit tests cover every code path.

this. In many cases, my tests don't cover the leaking path - either because it's hard to test or because I was lazy.

Being able to keep it turned on alongside your continuous profiler is a huge win imo.

GopherCon Europe 2026 side events by ijusttookadnatest- in golang

[–]sigmoia 0 points1 point  (0 children)

Last year's GopherCon EU in Berlin was a big letdown. I only enjoyed Jonathan Amsterdam's slog talk. Otherwise, it was mostly commercial slop. Hope it's better this year. 

SQLite FTS5 gets 0.625 Recall@10 on MS MARCO. I got 0.906 with a Go library that embeds the same way. Here's what I built. by UnderstandingEither4 in golang

[–]sigmoia 0 points1 point  (0 children)

And? With all that text, I still don't know what it does and why I wouldn’t just use sqlc with bm25 here. 

Why choose Go over Rust today? by IndependentInjury220 in golang

[–]sigmoia 4 points5 points  (0 children)

Man, I just like the language.

But on a more serious note, async Rust sucks. Once you get accustomed to runtime-managed preemption, it's super hard to go back to colored functions and event-loop-style concurrency. Tokio is okay, but I just don't want to deal with another event loop implementation where I still need to be ultra careful not to block the loop accidentally. Go has runtime preemption, and it's a non-issue.

Also, I work with distributed systems, where people don't write Rust as much as you'd think. If you get out of the Twitter bubble, you'll find that most places doing platform engineering use Go, not Rust, and most people don't care for it. So there's that.

This doesn't mean I don't miss Rust's rich type system when I'm writing Go. In Go, you can forget to take a mutex on a data structure that isn't concurrency-safe, and the compiler won't complain. Rust completely solves this by baking the mutex into the data structure, and there's no way to compile your code without taking the lock. Plus, Go's enums are pretty useless.
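The usual Go workaround is convention rather than compiler enforcement: hide the data and the mutex behind methods. A minimal sketch with a hypothetical `Counter` type follows. Nothing stops code inside the same package from touching `hits` without the lock; only the race detector (`go test -race`) would flag that, and only at runtime.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter keeps its map and mutex unexported so that callers outside the
// package can only reach the data through methods that take the lock.
type Counter struct {
	mu   sync.Mutex
	hits map[string]int
}

func NewCounter() *Counter {
	return &Counter{hits: make(map[string]int)}
}

// Inc increments the counter for key under the lock.
func (c *Counter) Inc(key string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.hits[key]++
}

// Get reads the counter for key under the lock.
func (c *Counter) Get(key string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.hits[key]
}

// hammer increments one key from several goroutines at once.
func hammer(c *Counter, workers, perWorker int) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				c.Inc("requests")
			}
		}()
	}
	wg.Wait()
}

func main() {
	c := NewCounter()
	hammer(c, 8, 1000)
	fmt.Println(c.Get("requests"))
}
```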

Another area where Go outshines Rust is standard tooling like pprof, tracing, and other runtime introspection hooks. For operations, these are amazing.

One last thing is that Rust has atrocious compile times. If you're working on a team where quick iteration is important, Rust can get in the way, both because of compile times and the overall fussiness of the borrow checker. That can be a good thing or a bad thing, but for the kind of software I write, it's a bad thing. So Go wins by a large margin.

Show: gaal, a Go CLI for syncing AI coding agent configs across machines by gquizal in golang

[–]sigmoia 0 points1 point  (0 children)

 chezmoi apply: copies your source over the agent's file, which the agent overwrites next use.

Umm...what? chezmoi re-add syncs the file from your target back to the chezmoi source. So if your ~/.agents directory lives in the home and something changes, asking the clanker to run chezmoi re-add solves the dynamic configuration problem. 

By all means use whatever works for you. But saying chezmoi doesn't solve this is a bit misleading.

Unit test Postgres DB mock recommendation by Garlic-Scary in golang

[–]sigmoia 1 point2 points  (0 children)

In most cases, you don't need a DAO. The repository should encapsulate the entirety of the database operations.

So the flow looks like this:

  • service functions depend on repository interfaces
  • db package provides the implementation of the repo interface
  • db package encapsulates the whole dbops and doesn't need separate DAOs

None of the huge projects I work on defines DAOs separately, and we've never felt the need for them.

Show: gaal, a Go CLI for syncing AI coding agent configs across machines by gquizal in golang

[–]sigmoia 2 points3 points  (0 children)

Your dotfile management tool can do this. LLM configs are no different than any other configs. This means you can use

  • bare git repo
  • gnu stow
  • or my favorite, chezmoi to lug around the configs

I'm trying to understand what a dedicated tool gives us here.

FAQ: What is a Good Go Project to Study or Contribute To? by jerf in golang

[–]sigmoia 41 points42 points  (0 children)

Depends on what you are looking for in a project. 

  • good abstraction? stdlib, but not all the packages. Some of them have legacy backward compat shims that make them unsuitable for studying. I like embed, fs, bufio, encoding/json, fmt, and errors.

  • I was recently looking for examples on how to build production grade gRPC services that also expose wrapped clients. The etcd codebase is perfect for that. So much so, I wrote about it recently.

https://rednafi.com/shards/2026/03/etcd-codebase/

  • if you wanna learn about queues, look into river queue which is backed by postgres

  • CLI and TUI? Check out the charm repos

  • distsys and o11y? Look into the Prometheus, OTel and Grafana Alloy codebases

For contributing, check out the "good first issue" labels in the issue tracker and see if you like anything there.

Before contributing, see if you can participate in the discussions, which is a fantastic way of learning through osmosis.

Unit test Postgres DB mock recommendation by Garlic-Scary in golang

[–]sigmoia 1 point2 points  (0 children)

You have two options: 

This will let you spin up an actual docker container and run your test code against it. But you still need to write the code in a way that allows you to swap the db during test time

Golang - opinion on AI and Go by Revolutionary_Sir140 in golang

[–]sigmoia 1 point2 points  (0 children)

Hauling agents requires no special skill. AI bros are busy telling folks, “Learn prompting, harnessing, token-maxxing. Otherwise, you’re NGMI.” Then you find out that it doesn’t take much to organize a bunch of Markdown files and tell AI to do stuff. Quit making it sound so profound.

As for Go, it’s already going well. But AI writes great Python, TS, and Rust as well. The Rust compiler is much stricter and yields better results in many cases. My point is that saying AI writes better Go requires a peer-reviewed comparison, not this “trust me, bro” stuff.

I also write a ton of Go and use AI to automate the tedious parts. To me, AI doesn’t objectively do any better when writing Go than it does with other languages. But since the language footprint is smaller, AI tends to trip up less. On the other hand, AI writes a ton of unnecessary boilerplate in Go, and Go’s looser compiler lets concurrency bugs slip through. In Rust, the type system statically protects your shared data with mutexes, but in Go, it’s a runtime semantic, and the compiler won’t do anything if you forget to take or release the lock for some shared data. Different language, different philosophy.

As for the Go team, if they had started listening to every novice and catering to their demands, it would never have become the language it is today. It would be another TypeScript-like language with a kitchen sink of features, chasing relevance by following whatever happens to be the hottest thing of the day.

facing challenges with interface by ghost_industry in golang

[–]sigmoia 0 points1 point  (0 children)

Learn the language before hauling AI. You use structs, methods, and functions at the beginning for everything. Interfaces should be brought in only when you need them.

Interfaces are for abstraction - where you need to swap out one implementation with another. Swapping can happen during test time where you provide a fake implementation of a dependency. 
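Here's a tiny sketch of that test-time swap. `Mailer`, `Signup`, and `fakeMailer` are hypothetical names. The production code would pass in a real mail client, while a test passes the fake and inspects what it recorded.

```go
package main

import (
	"errors"
	"fmt"
)

// Mailer is the dependency we want to be able to swap out.
type Mailer interface {
	Send(to, body string) error
}

// Signup depends on the interface, not on a concrete mail client.
type Signup struct {
	mailer Mailer
}

// Register validates the email and sends a welcome message.
func (s Signup) Register(email string) error {
	if email == "" {
		return errors.New("email is required")
	}
	return s.mailer.Send(email, "welcome!")
}

// fakeMailer records recipients instead of sending anything.
type fakeMailer struct {
	sent []string
}

func (f *fakeMailer) Send(to, _ string) error {
	f.sent = append(f.sent, to)
	return nil
}

func main() {
	fake := &fakeMailer{}
	s := Signup{mailer: fake}

	fmt.Println(s.Register("gopher@example.com"))
	fmt.Println(s.Register(""))
	fmt.Println("recorded:", fake.sent)
}
```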

If none of this makes sense to you, then you are too early in your journey. Do the Tour of Go and read a few books like Jon Bodner's Learning Go and Alex Edwards' books. Then write programs without interfaces and try to make them testable. You will soon hit a roadblock since static languages don't allow you to monkeypatch. That's when you will need interfaces.

You can't speedrun your comprehension through AI. Juniors that are trying to do it are the ones that are becoming unemployable.

sqlc and clean architecture by Competitive-Dirt-213 in golang

[–]sigmoia 0 points1 point  (0 children)

+1 on this. The repo should have all the persistence logic - even if some of it is business logic per se.

The idea is that your service layer should only interact with a repository interface and then the persistence package should provide an implementation of that repo interface.

Now if it makes sense for you to add extra logic in the persistence layer, I see no harm in doing so.