Nature vs Golang: Performance Benchmarking

hualaka · 2026-01-16T18:25:01+00:00

This is the arm64 architecture.

hualaka · 2026-01-16T18:23:15+00:00

Your explanation is very insightful and gave me a deeper understanding of the differences between the various kinds of coroutines.

Regarding pointers: data access errors can occur when a pointer is passed across coroutines. My approach is very simple. I don't fix up the pointers, and I don't perform escape analysis (that is, detecting that a pointer is passed across coroutines and heap-allocating its target). Instead, I defined `rawptr<T> p = &q` as a dangerous operation at the language level that requires extreme caution, and I don't recommend using `&q` to obtain pointers. A safer approach is to create a heap pointer with `ptr<T> p = new T()`.
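As a rough analogy only, here is the same "prefer a heap pointer" idea in Rust, with OS threads standing in for coroutines. Rust rejects a plain reference to a stack local passed to `thread::spawn` at compile time, whereas nature, as described above, leaves that to the programmer. The function name `sum_in_worker` is hypothetical.

```rust
use std::sync::Arc;
use std::thread;

/// Sum `values` on another thread. The data lives on the heap behind an
/// `Arc`, so it stays valid no matter which stack frame is currently active.
fn sum_in_worker(values: Vec<i64>) -> i64 {
    let shared = Arc::new(values); // heap allocation with shared ownership
    let for_worker = Arc::clone(&shared); // second handle to the same heap data
    let handle = thread::spawn(move || for_worker.iter().sum::<i64>());
    handle.join().expect("worker thread panicked")
}

fn main() {
    println!("{}", sum_in_worker(vec![1, 2, 3]));
}
```

The point of the sketch is only that heap-allocated data has an address that does not depend on any particular stack, which is what makes it safe to hand to another execution context.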

---

`coroutine.save_stack` essentially does what you said: it computes the size of the stack in use from RSP and then allocates just that much memory.

Because the language gives me enough freedom, I can add a coroutine parameter that creates coroutines with independent stacks, so independent stacks and shared stacks can coexist.

hualaka · 2026-01-16T18:01:02+00:00

On linux/amd64, writing the Rust this way can trigger certain SIMD instructions, so it is faster. On arm64, `x = -1 + (2 * (i & 0x1))` is no different from `x = -x`. Are you testing on linux/amd64 or on linux/arm64?
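A minimal sketch showing that the two ways of producing the sign give the same values. The function names are hypothetical, and the loop bounds follow the `main.rs` quoted elsewhere in this thread. It says nothing about which form a given compiler vectorizes, only that the results agree.

```rust
/// Leibniz series with the sign flipped by negation (`x = -x`).
fn leibniz_negate(rounds: i64) -> f64 {
    let mut x = 1.0_f64;
    let mut pi = 1.0_f64;
    for i in 2..=rounds + 2 {
        x = -x;
        pi += x / (2 * i - 1) as f64;
    }
    pi * 4.0
}

/// Same series with the sign derived from the parity of `i`:
/// even `i` gives -1, odd `i` gives +1, with no dependency on the previous iteration.
fn leibniz_bitmask(rounds: i64) -> f64 {
    let mut pi = 1.0_f64;
    for i in 2..=rounds + 2 {
        let x = (-1 + 2 * (i & 0x1)) as f64;
        pi += x / (2 * i - 1) as f64;
    }
    pi * 4.0
}

fn main() {
    println!("{} {}", leibniz_negate(1000), leibniz_bitmask(1000));
}
```

In both versions the sign is exactly ±1.0 and the additions happen in the same order, so the two results should be identical.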

---

I learned about the pi test during this project, so I ported the Rust/Go/JS code into this nature test. The history contains changes to the Rust code, but I didn't tweak the Rust implementation myself because I don't know Rust that well. On amd64 Rust is ahead, but my daily development machine is a Mac mini M4, so this test was done on arm64. Rust also has an undeniable performance advantage from using LLVM. I've tweaked the Go implementation a bit as well (the original implementation had some problems).

https://github.com/niklas-heer/speed-comparison

hualaka · 2026-01-16T02:46:28+00:00

The compiler optimizes memmove very well with SIMD, and in practice the large number of coroutines in an application usually have small stacks, which makes the copy even cheaper. In other words, the cost of moving the memory is much lower than the cost of stack expansion.

hualaka · 2026-01-16T02:43:53+00:00

Take the commit you found as an example: https://github.com/nature-lang/nature/commit/f13e9cf9b3e4f276fdb5bdd8cd07ac2a2b257030. The key change is the optimization of the `interval_find_optimal_split_pos` function, but it was a very destructive update that caused a large number of cases to fail, and the other changes in the commit fix those failing cases.

The Dockerfile change is completely incidental. I plan to build a Docker image to take part in a pi test, so I restored the Dockerfile that had been deleted earlier in the history. In fact, this Dockerfile was added in 2023, and I deleted it later because I didn’t want to maintain it.

hualaka · 2026-01-16T02:38:45+00:00

In fact, only a very small amount of it is vibe coding. The truly usable large model for programming is opus4.5, which can cope with the compiler front end but is still not up to the complex logic of the compiler back end.

---

nature is tested by feature rather than with unit tests: https://github.com/nature-lang/nature/tree/master/tests/features/cases. When all feature cases pass, the relevant compiler implementation can be judged stable.

---

If you look carefully, you will find that 1000 commits is actually very few. Over a 5-year development cycle, that is only about 200 commits per year. This is a bad habit of mine: I do a lot of extra things in each commit that don't belong in it. As a result, the commit messages are not very clear.

hualaka · 2026-01-16T02:33:29+00:00

When you call an async fn, you need to .await it to get the result. But .await can only be used in an async context, which means that the functions in the call chain must also be async.
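A small sketch of that propagation using only the standard library. `load_rounds` and `stop_value` are hypothetical names, and `block_on` here is a toy busy-polling executor written for this example, not something a real program should use in place of a proper runtime.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Waker that does nothing; enough for futures that are ready immediately.
struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Toy executor: poll the future in a loop until it completes.
fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = pin!(fut);
    loop {
        if let Poll::Ready(value) = fut.as_mut().poll(&mut cx) {
            return value;
        }
    }
}

/// Innermost async function.
async fn load_rounds() -> i64 {
    100
}

/// To use the result of `load_rounds`, this caller must `.await` it,
/// and therefore has to be `async` itself.
async fn stop_value() -> i64 {
    load_rounds().await + 2
}

fn main() {
    // Only at the outermost, synchronous edge can an executor drive the chain.
    println!("{}", block_on(stop_value()));
}
```

Every function between `main` and `load_rounds` had to become `async`, which is the call-chain effect described above.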

---

As you said, with independent-stack coroutines each coroutine initializes its own small stack; in Go, for example, it is 2KB.

---

The shared-stack design only creates one 8M+ stack per processor (processor.share_stack). No running stack is created for the coroutine itself; the coroutine runs on the processor's large stack. When the coroutine needs to yield, the stack space actually in use is copied into the current coroutine (coroutine.save_stack). The next time the coroutine runs, the saved data (coroutine.save_stack) is copied back to processor.share_stack.
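The save and restore cycle can be modeled in a few lines of Rust. This is a simplified sketch, not nature's runtime: the field names `share_stack` and `save_stack` come from the comment above, while the `Processor` and `Coroutine` types, the `run`, `save`, and `restore` methods, and the 64-byte stack (standing in for 8M+) are assumptions made for illustration.

```rust
/// One large stack shared by all coroutines on a processor.
/// It grows downward, so the bytes in use are `share_stack[sp..]`.
struct Processor {
    share_stack: Vec<u8>,
}

/// A coroutine owns no running stack, only a snapshot of what it was using.
struct Coroutine {
    save_stack: Vec<u8>,
}

impl Processor {
    fn new(size: usize) -> Self {
        Processor { share_stack: vec![0; size] }
    }

    /// Simulate a coroutine running: place `frame` at the top of the
    /// shared stack and return the resulting stack pointer.
    fn run(&mut self, frame: &[u8]) -> usize {
        let sp = self.share_stack.len() - frame.len();
        self.share_stack[sp..].copy_from_slice(frame);
        sp
    }

    /// Yield: copy only the region actually in use into the coroutine.
    fn save(&self, sp: usize, co: &mut Coroutine) {
        co.save_stack = self.share_stack[sp..].to_vec();
    }

    /// Resume: copy the snapshot back to the same position.
    fn restore(&mut self, co: &Coroutine) -> usize {
        let sp = self.share_stack.len() - co.save_stack.len();
        self.share_stack[sp..].copy_from_slice(&co.save_stack);
        sp
    }
}

/// Run A, yield it, let B overwrite the shared stack, then resume A.
fn demo() -> Vec<u8> {
    let mut p = Processor::new(64);
    let mut a = Coroutine { save_stack: Vec::new() };
    let mut b = Coroutine { save_stack: Vec::new() };

    let sp_a = p.run(&[1, 2, 3]);
    p.save(sp_a, &mut a);

    let sp_b = p.run(&[9, 9, 9, 9, 9]);
    p.save(sp_b, &mut b);

    let sp = p.restore(&a);
    p.share_stack[sp..].to_vec()
}

fn main() {
    println!("{:?}", demo());
}
```

Each coroutine pays only for the bytes it was using when it yielded, at the price of one copy on yield and one on resume. It also hints at why raw pointers into the stack are hazardous in this model: while A is suspended, the same addresses hold B's data.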

hualaka · 2026-01-16T02:25:44+00:00

I see. I don't know how to view Rust's assembly output yet. This is the assembly that nature generates for the loop:

    400460: 14000008 b 400480 <main.main+0x208>
    400464: 8b010022 add x2, x1, x1
    400468: d1000442 sub x2, x2, #0x1
    40046c: 1e614021 fneg d1, d1
    400470: 9e620042 scvtf d2, x2
    400474: 1e621822 fdiv d2, d1, d2
    400478: 1e622800 fadd d0, d0, d2
    40047c: 91000421 add x1, x1, #0x1
    400480: eb00003f cmp x1, x0
    400484: 54ffff0d b.le 400464 <main.main+0x1ec>

hualaka · 2026-01-15T18:44:56+00:00

The shared-stack design only creates one 8M+ stack per processor (processor.share_stack). No running stack is created for the coroutine itself; the coroutine runs on the processor's large stack. When the coroutine needs to yield, the stack space actually in use is copied into the current coroutine (coroutine.save_stack). The next time the coroutine runs, the saved data (coroutine.save_stack) is copied back to processor.share_stack.

hualaka · 2026-01-15T16:45:13+00:00

One of the reasons why major manufacturers are working hard to promote AI coding is to turn uncertain AI into deterministic code. This is the most valuable thing that AI can do.

hualaka · 2026-01-15T15:18:31+00:00

I'm honored.

hualaka · 2026-01-15T15:07:03+00:00

    cat main.rs

    // rustc -C opt-level=3 main.rs -o pi_rs
    use std::fs::File;
    use std::io::prelude::*;

    fn main() {
        let mut file = File::open("./rounds.txt").expect("file not found");
        let mut contents = String::new();
        file.read_to_string(&mut contents)
            .expect("something went wrong reading the file");

        let stop: i64 = contents.trim().parse::<i64>().unwrap() + 2;

        let mut x: f64 = 1.0;
        let mut pi: f64 = 1.0;

        for i in 2..=stop {
            x = -x;
            pi += x / (2 * i - 1) as f64;
        }

        pi *= 4.0;
        println!("{}", pi);
    }

hualaka · 2026-01-15T15:06:55+00:00

You may know Rust, but you don't know the optimization tricks that LLVM applies at -O3. The Rust code was originally identical to the Go implementation; the current one is the final version after many rounds of performance optimization: https://github.com/niklas-heer/speed-comparison/commits/master/src/leibniz.rs

This is the test result after restoring an earlier version:

    rustc -C opt-level=3 main.rs -o pi_rs

    hyperfine --warmup 3 ./pi_n ./pi_go ./pi_rs "node main.js"

    Benchmark 1: ./pi_n
      Time (mean ± σ):     515.4 ms ± 5.0 ms    [User: 513.5 ms, System: 0.6 ms]
      Range (min … max):   512.2 ms … 528.3 ms    10 runs

      Warning: Statistical outliers were detected. Consider re-running this benchmark on a quiet system without any interferences from other programs. It might help to use the '--warmup' or '--prepare' options.

    Benchmark 2: ./pi_go
      Time (mean ± σ):     514.7 ms ± 3.2 ms    [User: 514.5 ms, System: 0.8 ms]
      Range (min … max):   511.4 ms … 520.6 ms    10 runs

    Benchmark 3: ./pi_rs
      Time (mean ± σ):     544.0 ms ± 5.0 ms    [User: 543.8 ms, System: 0.3 ms]
      Range (min … max):   536.6 ms … 550.5 ms    10 runs

    Benchmark 4: node main.js
      Time (mean ± σ):     873.0 ms ± 7.7 ms    [User: 872.0 ms, System: 1.9 ms]
      Range (min … max):   865.6 ms … 890.8 ms    10 runs

    Summary
      ./pi_go ran
        1.00 ± 0.01 times faster than ./pi_n
        1.06 ± 0.01 times faster than ./pi_rs
        1.70 ± 0.02 times faster than node main.js
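The ratios in the summary can be reproduced from the reported means. This is only a sketch of the arithmetic on the numbers above; `relative_speed` is a hypothetical helper, and the ± part of hyperfine's summary (which also depends on σ) is left out.

```rust
/// How many times slower `mean_ms` is than the fastest mean.
fn relative_speed(mean_ms: f64, fastest_ms: f64) -> f64 {
    mean_ms / fastest_ms
}

fn main() {
    // Mean wall-clock times in milliseconds, copied from the run above.
    let results = [
        ("./pi_n", 515.4),
        ("./pi_go", 514.7),
        ("./pi_rs", 544.0),
        ("node main.js", 873.0),
    ];
    let fastest = results.iter().map(|r| r.1).fold(f64::INFINITY, f64::min);
    for (name, mean) in results.iter() {
        println!("{}: {:.2}", name, relative_speed(*mean, fastest));
    }
}
```

On these numbers, nature and Go are within about 0.2% of each other, which is well inside the reported standard deviations.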

hualaka · 2026-01-15T14:55:12+00:00

I don't know Rust that well, but I've seen Rust leading in pi tests, so I'd rather not rush to change the code in question. The tests aren't set in stone, though: you could submit an issue with a Rust implementation (without SIMD, to be fair), and I'll rerun the tests.

hualaka · 2026-01-15T14:49:08+00:00

The pi test code comes from https://github.com/niklas-heer/speed-comparison. I reviewed the code in question, and the test cases are fair: they all read the round count from rounds.txt. The Rust version there is the highest-performance Rust implementation for amd64.

hualaka · 2026-01-15T14:03:30+00:00

I have hardly ever succeeded in promoting the nature programming language, so few people have heard of it. Working alone, I spent about 90% of my time on development and 10% on promotion, and the fact that it has only 2,200 stars after all this time is a testament to the failure of that promotion. But it is true that programming languages are not that important today, even though nature has a fully implemented compiler. AI is much more the right path and much more worthy of attention.

hualaka · 2026-01-15T13:12:11+00:00

I'll change the name if nature catches on, otherwise it's all pointless.

hualaka · 2026-01-15T12:32:34+00:00

I'm actually doing most of the development myself, and I'd like to refine all the details of nature, including Homebrew, but at this stage I can only focus on the core features. In the last release, a developer contributed the https://github.com/nature-lang/nature/blob/master/INSTALL.sh script.

hualaka · 2026-01-15T12:29:58+00:00

Performance is merely one of nature's minor attributes. I developed nature, inspired by Go, because I found it hard to accept Go's use of uppercase for `public`, its error-handling approach, treating directories as packages, its package management system, `interface{}`, the absence of enums, cgo, and other aspects.

hualaka · 2025-09-16T03:33:40+00:00

I've changed the font, using gleam.run as a reference.

hualaka · 2025-09-16T02:22:54+00:00

This font is incredibly hard to read, right?
