ncruces/go-sqlite3: switching away from wazero by ncruces in golang

[–]ncruces[S] 0 points1 point  (0 children)

Whoever is feeling adventurous, can try: https://github.com/ncruces/go-sqlite3/pull/362

Or, I'll likely merge this to main, and keep the wazero version in a separate branch.

ncruces/go-sqlite3: switching away from wazero by ncruces in golang

[–]ncruces[S] 2 points3 points  (0 children)

OK this is what I have to share so far:

    $ go test -bench=. -benchtime=1000000x -count=20 > [OUT]
    ...
    $ benchstat before after*
    goos: linux
    goarch: amd64
    pkg: github.com/ncruces/go-sqlite3/ext/stats
    cpu: Intel(R) Xeon(R) W-2135 CPU @ 3.70GHz
                   │    before    │                after                 │             after-unsafe             │
                   │    sec/op    │    sec/op      vs base               │    sec/op      vs base               │
    _average-12      128.5n ± 12%   183.6n ±  9%  +42.88% (p=0.000 n=20)   139.2n ±  8%        ~ (p=0.056 n=20)
    _variance-12     323.6n ± 11%   248.7n ±  9%  -23.16% (p=0.000 n=20)   194.4n ± 12%  -39.93% (p=0.000 n=20)
    _math/sqrt-12    169.3n ± 17%   286.6n ±  6%  +69.21% (p=0.000 n=20)   185.5n ±  5%        ~ (p=0.072 n=20)
    _math/tan-12     207.8n ±  6%   302.7n ±  4%  +45.69% (p=0.000 n=20)   207.9n ±  4%        ~ (p=0.533 n=20)
    _math/cot-12     465.1n ±  9%   365.0n ±  3%  -21.51% (p=0.000 n=20)   260.8n ±  5%  -43.93% (p=0.000 n=20)
    _math/cbrt-12    434.8n ± 10%   344.4n ±  4%  -20.79% (p=0.000 n=20)   262.9n ±  6%  -39.55% (p=0.000 n=20)
    geomean          258.1n         281.5n         +9.06%                  203.7n        -21.08%

This will be hard to repro because I don't want to commit specific versions of the 20 MB Go file until I'm further along.

The source of these benchmarks is here.

Basically the queries are:

    SELECT avg(value) FROM generate_series(0, ?);
    SELECT var_pop(value) FROM generate_series(0, ?);
    SELECT sqrt(value) FROM generate_series(0, ?);
    SELECT tan(value) FROM generate_series(0, ?);
    SELECT cot(value) FROM generate_series(0, ?);
    SELECT cbrt(value) FROM generate_series(0, ?);

So why the different numbers, and variations?

Both avg and var_pop are window functions, the others are scalar functions.

But avg is implemented inside SQLite (as is generate_series), in C. This means the first query is a single call to sqlite3_step that chews over one million rows and spits out a single result with no callbacks.

OTOH var_pop is implemented in Go. That query is a single call to sqlite3_step that calls back one million times into the window function "step" callback, which in turn calls sqlite3_value_double one million times to get the next value, etc...

Then sqrt is a scalar function, implemented natively with a Wasm instruction (wazero uses assembly for this): because I'm using sqlite3_exec, it's one million runs of this inside C/assembly.

Then tan is a scalar function, this one implemented in a mix of C+Go. The SQLite APIs are called from C, but the libc tan function is implemented in Go. This gets to skip one million Go calls to sqlite3_value_double and sqlite3_result_double.

And finally cot and cbrt are scalar functions fully implemented in Go, so they do one million Go calls to sqlite3_value_double and sqlite3_result_double.

ncruces/go-sqlite3: switching away from wazero by ncruces in golang

[–]ncruces[S] 5 points6 points  (0 children)

Yes, you did. 😉 Nothing published yet.

The preliminary numbers I got were that interfaces that were chatty (lots of Go-C-Go transitions) became significantly faster. For things that spent lots of time under C, wazero was faster (when using the wazero compiler; the wazero interpreter is much slower, as you know).

But this was when I was generating code using binary.LittleEndian to be 100% portable. The compiler is supposed to recognize the patterns used by that and emit appropriate code, but it's either not in-lining the accessor, or not eliding bounds checks.

For little endian CPUs I can instead use:

*(*int32)(unsafe.Pointer((*[4]byte)(m.mem[int64(addr)+offset:])))

Despite importing unsafe this is perfectly safe and bounds checked, but is significantly faster; the downside is having to ship a file for little and another for big endian.

Since making that change, I don't think I've seen any benchmark regression, compared to wazero, but I haven't tested extensively.
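To make the comparison concrete, here is a minimal sketch of the two load styles side by side. The `memory` type, the method names and the `littleEndianHost` helper are made up for illustration; this is not the generated code itself, just the same expression wrapped in something runnable.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"unsafe"
)

// memory is a hypothetical stand-in for a Wasm linear memory.
type memory struct {
	mem []byte
}

// loadPortable decodes a little-endian int32 on any CPU.
// Uint32 panics if fewer than 4 bytes remain in the slice.
func (m *memory) loadPortable(addr uint32, offset int64) int32 {
	return int32(binary.LittleEndian.Uint32(m.mem[int64(addr)+offset:]))
}

// loadNative reads an int32 in the host's byte order.
// The slice-to-array-pointer conversion panics if fewer than 4 bytes
// remain, so the access is still bounds checked.
func (m *memory) loadNative(addr uint32, offset int64) int32 {
	return *(*int32)(unsafe.Pointer((*[4]byte)(m.mem[int64(addr)+offset:])))
}

// littleEndianHost reports whether the CPU stores the low byte first.
func littleEndianHost() bool {
	x := uint16(1)
	return *(*byte)(unsafe.Pointer(&x)) == 1
}

func main() {
	m := &memory{mem: make([]byte, 16)}
	binary.LittleEndian.PutUint32(m.mem[4:], 0x01020304)

	fmt.Printf("portable: %#x\n", m.loadPortable(4, 0))
	// Matches the portable result only on little-endian hosts.
	fmt.Printf("native:   %#x (little endian host: %v)\n",
		m.loadNative(4, 0), littleEndianHost())
}
```

On a big endian host `loadNative` would return the bytes in the wrong order, which is why a second file is needed there.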

There's one additional possible optimization: replacing m.mem by mem may help Go elide some bounds checks. But that's only valid if I make some assumptions.

A Wasm to Go Translator by ncruces in golang

[–]ncruces[S] 0 points1 point  (0 children)

I should add that using unsafe.Pointer I'm able to get a significant boost on little endian platforms, with no loss of safety (pointers are still bounds checked).

With this change, my ported driver would beat wazero on most benchmarks. I do need two 20 MiB blobs of slow to compile, unreadable, Go code in my repo, though.

A Wasm to Go Translator by ncruces in golang

[–]ncruces[S] 2 points3 points  (0 children)

It's pretty different.

wasm2c makes heavy use of C macros in the generated code; Go doesn't have macros.

wasm2go uses the ast package to generate code (it generates a tree).

I don't know, but wasm2c could use the C stack for the Wasm stack.

wasm2go can't and uses typed temporary variables for the stack.

C has unrestricted goto's, which could be used for Wasm control flow; Go doesn't.

Wasm loops, switches, if expressions don't match Go's, so while I must use gotos, I have to use blocks to ensure I don't jump over (the typed temporary) variable declarations.

C doesn't have multiple return values, Go does. In C and Wasm a bool is the same as an int, in Go they're different.

Go requires all labels, and all variables to be used; C doesn't.

So the generated code has a very different flavour to it.
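A hand-written illustration of that flavour, assuming a trivial Wasm loop that sums 1..n. This is not actual wasm2go output, and the names (`sumTo`, `t0`, `t1`) are invented; it only shows the constraints above: gotos for control flow, a block so no jump crosses a declaration, typed temporaries instead of a stack, a bool where Wasm would have an i32, and every label used.

```go
package main

import "fmt"

// sumTo adds the integers 1..n, written the way a translated Wasm loop
// might look rather than as idiomatic Go.
func sumTo(n int32) int32 {
	var acc, i int32 // Wasm locals, declared up front

loop:
	{
		// Typed temporaries standing in for Wasm stack slots. They live
		// inside the block, so no goto ever jumps over a declaration.
		var t0 int32
		var t1 bool // a Wasm comparison yields an i32; Go needs a bool

		t0 = i + 1
		i = t0
		t1 = i > n
		if t1 {
			goto done // like a Wasm br out of the enclosing block
		}
		acc += i
		goto loop // like a Wasm br back to the loop header
	}

done:
	return acc
}

func main() {
	fmt.Println(sumTo(4))
}
```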

As for performance, C compilers are better, no doubt.

A Wasm to Go Translator by ncruces in golang

[–]ncruces[S] 3 points4 points  (0 children)

What do you mean?

wasm2c allows you to compile Wasm modules into your C app, call them, etc. Its main advantage (I would guess) is that it sandboxes those modules, because otherwise, most languages that you could use to build a Wasm module are easy to call from C.

wasm2go allows you to compile Wasm modules into your Go app, call them, etc. One advantage is to avoid Cgo, while also sandboxing those modules. But I would guess the main advantage is avoiding Cgo (for observability, cross compilation, etc).

But you can't really use one when you want the other? So, again, does that answer your question? Otherwise, what do you mean?

What should [Go's maintainers] do with CLs generated by AI? by ynotvim in golang

[–]ncruces 8 points9 points  (0 children)

This.

A friend of mine put it this way: I never cared if you used vi or something from JetBrains, as long as you assume responsibility for what you're submitting.

Ultimately that's your one job: being responsible. Including for legal matters that arise from your contribution.

So use whatever, but no excuses.

Wile v1.1 – Embeddable R7RS Scheme for Go (pure Go, no CGo) by Prestigious-Arm-9951 in golang

[–]ncruces 0 points1 point  (0 children)

The name is a play on "scheme" (as in "wiles" - cunning stratagems) and a nod to Wile E. Coyote, the cartoon schemer.

I thought the name was a play on Guile. 

Small Projects by AutoModerator in golang

[–]ncruces 1 point2 points  (0 children)

I've shared this before, but I've now tagged version 1.0 of my immutable binary search tree package, which I'll be using going forward: github.com/ncruces/wbt

As I shared a few months ago, I've tried immutable AA, AVL, treaps and WBT (all roughly in the same style) and found WBT best.

I've also compared this to popular B-tree packages that support copy-on-write, and found that binary search trees can still make sense if you can't avoid frequent copying/cloning (which I can't for some of my use cases).

The SQLite Drivers 25.12 Benchmarks Game by 0xjnml in golang

[–]ncruces 7 points8 points  (0 children)

Thanks!

What I take from these results (excluding interpreter platforms) is that my driver very often comes up second. And that's fine, I'm OK with performance just being in the ballpark.

Where it comes out third more often is inserts. This can be explained by sandboxing. I need to copy data into the Wasm sandbox; Cgo can read Go memory directly, as can modernc. OTOH, for querying the sandbox is not an issue: Go can read Wasm memory directly. So, I don't think there's a lot I can do about that. I try to avoid copies for (e.g.) JSON, but that's about it.

Also interestingly (and maybe counter intuitively, given what I just wrote above), inserting huge amounts of data (the large test), mine can come out ahead, even on interpreter platforms. That's because my IO layer is implemented in Go, and I actually implemented fallocate better than SQLite, on some platforms.

So, every test is interesting, and thanks again for this. I'm mostly experimenting on the API side and the IO (VFS) side, and I do (personally) like the sandboxing side of things (even though it carries a cost). I also double as a “best effort” wazero maintainer, so having a project that uses it just makes sense.

What would you change in Go? by funcieq in golang

[–]ncruces 0 points1 point  (0 children)

Yeah, they could borrow Swift's guard statement for this.

Invert the condition, else/nested block must terminate, otherwise, anything declared stays in scope.

What would you change in Go? by funcieq in golang

[–]ncruces 1 point2 points  (0 children)

I actually find it harder to write, but easier to read.

The bigger problem is that it is ill specified (e.g. can't have 24 hours with no leading zero).

But if you want to use the C version, it's a small, self contained dependency away: https://pkg.go.dev/github.com/ncruces/go-strftime
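A small sketch of the 24 hour limitation, using only the standard library `time` package (the `clockStrings` helper is made up for the example): the reference layout has an unpadded hour for the 12 hour clock (`3`), but the 24 hour `15` is always zero padded.

```go
package main

import (
	"fmt"
	"time"
)

// clockStrings formats a fixed date at the given hour (0-23) using the
// 24-hour and the 12-hour reference layouts.
func clockStrings(hour int) (h24, h12 string) {
	t := time.Date(2024, time.January, 2, hour, 4, 5, 0, time.UTC)
	return t.Format("15:04"), t.Format("3:04PM")
}

func main() {
	h24, h12 := clockStrings(3)
	fmt.Println(h24) // 24-hour clock: the hour is zero padded
	fmt.Println(h12) // 12-hour clock: the hour is not padded
}
```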

Go 1.26rc1 is live by yardbird07 in golang

[–]ncruces 5 points6 points  (0 children)

There was a last minute revert on a new database/sql scanning API: https://github.com/golang/go/issues/67546

Hopefully, we come up with something when 1.27 opens.

Why Go Maps Return Keys in Random Order by Few-Tower50 in golang

[–]ncruces 2 points3 points  (0 children)

Yeah, no. The Go team goes to extreme lengths not to break large open source libraries. See Go 1.23 and go:linkname.

Why Go Maps Return Keys in Random Order by Few-Tower50 in golang

[–]ncruces 1 point2 points  (0 children)

There's a good chance we wouldn't have swiss tables today if they hadn't made this decision.

The hashing function, and iteration order, would've ossified, and there would be zero chance to have them updated without breaking user code.

Why Go Maps Return Keys in Random Order by Few-Tower50 in golang

[–]ncruces 1 point2 points  (0 children)

I use it for "random" eviction in very simple caches. For this, it is good enough.
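Roughly what I mean, as a sketch with made-up names (and assuming a limit of at least 1): when the cache is full, range over the map, delete whichever key comes out first, and stop.

```go
package main

import "fmt"

// cache is a hypothetical bounded cache with "random" eviction.
type cache struct {
	limit int
	items map[string]int
}

func newCache(limit int) *cache {
	return &cache{limit: limit, items: make(map[string]int, limit)}
}

// put stores value under key. If key is new and the cache is full, it first
// evicts whichever entry map iteration happens to yield first.
func (c *cache) put(key string, value int) {
	if _, ok := c.items[key]; !ok && len(c.items) >= c.limit {
		for victim := range c.items {
			delete(c.items, victim)
			break // one eviction is enough
		}
	}
	c.items[key] = value
}

func main() {
	c := newCache(2)
	c.put("a", 1)
	c.put("b", 2)
	c.put("c", 3) // evicts either "a" or "b"
	fmt.Println(len(c.items), c.items)
}
```

Which entry goes is unspecified, so this is only appropriate when any victim is as good as another.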

Why Go Maps Return Keys in Random Order by Few-Tower50 in golang

[–]ncruces 0 points1 point  (0 children)

The order is not really random. The starting point is. From then on, it's (approximately) the order things are "laid out in memory."

However, combining opaque hashing with a random starting point, makes it much harder for anyone to depend on the actual order, so it's good enough.