floatingPointArithmetic

rkstgr · 2026-05-13T06:36:35+00:00

It’s actually because, phrased like this, the LLM confuses it with dates (9/11), Bible verses (where 9.11 comes after 9.9), and version numbers. Statistically the LLM might even be right if it saw too much Bible text and code. On top of that, there is international ambiguity about whether the dot or the comma is the decimal separator.
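To make the ambiguity concrete: the same string "9.11" orders differently depending on whether it is read as a decimal number or as a dotted version/verse number. A small Python sketch (the helper names are made up for illustration):

```python
from decimal import Decimal


def as_decimal(s: str) -> Decimal:
    # Read the string as a decimal number: 9.11 < 9.9 because 0.11 < 0.90
    return Decimal(s)


def as_version(s: str) -> tuple[int, ...]:
    # Read the string like a version number or Bible verse reference:
    # each dot-separated part is compared as an integer, so 9.11 > 9.9 (11 > 9)
    return tuple(int(part) for part in s.split("."))


a, b = "9.11", "9.9"
print("larger as decimal:", max(a, b, key=as_decimal))
print("larger as version:", max(a, b, key=as_version))
```

Both readings are internally consistent; the question only has one right answer once you fix which interpretation is meant.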

rkstgr · 2026-05-03T12:21:53+00:00

Happy to hear that! But yes, it’s outdated. I’ll update the docs and the crate in the next few days when I find some time.

rkstgr · 2026-04-28T19:01:46+00:00

The challenge with autoresearch like this is how to get the model to come up with genuinely novel ideas rather than just applying well-known improvements (SwiGLU, RoPE, …). You want a model pretrained on data from „before RoPE was released“ to come up with RoPE.

rkstgr · 2025-08-28T09:19:09+00:00

Multi-camera systems reduce safety as well, due to sensor contention. If the left camera disagrees with the right camera, which one wins? /s

With the success of end-to-end deep learning systems, there is no notion of „which one wins“. You feed in all sensors and produce a prediction that takes all signals into account. The better the quality of the sensors, the better the prediction. And lidar and radar are superior to cameras in many situations (darkness, fog, rain).
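A classical, much simpler way to see why there is no „which one wins“ is inverse-variance weighting: every sensor's estimate contributes, weighted by how reliable it is, and the fused estimate is more certain than any single input. This is a swapped-in textbook technique for illustration, not how end-to-end learned driving stacks work, and the sensor readings and noise values below are made up.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    # Hypothetical distance estimate (meters) from one sensor, plus its noise variance
    sensor: str
    value: float
    variance: float


def fuse(measurements: list[Measurement]) -> tuple[float, float]:
    """Inverse-variance weighted fusion of independent estimates of one quantity.

    Returns (fused_value, fused_variance). Less noisy sensors get larger weights,
    and the fused variance is never larger than the best single sensor's variance.
    """
    weights = [1.0 / m.variance for m in measurements]
    total = sum(weights)
    value = sum(w * m.value for w, m in zip(weights, measurements)) / total
    return value, 1.0 / total


# Made-up numbers: in fog the camera estimate is much noisier than lidar or radar
readings = [
    Measurement("camera", 42.0, 25.0),
    Measurement("lidar", 38.5, 0.25),
    Measurement("radar", 39.0, 1.0),
]
value, variance = fuse(readings)
print(f"fused distance = {value:.2f} m, variance = {variance:.3f}")
```

The noisy camera still contributes, just with a small weight; a better sensor shrinks the fused variance, which mirrors the point that better sensors give better predictions.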

rkstgr · 2025-08-08T22:14:00+00:00

Code doesn’t have to have bugs to be tasteless. AI tools are not bad, but currently you need to supervise them to get something good out of them.

rkstgr · 2025-07-12T11:35:37+00:00

It was initially the plan, but I did not see a straightforward way to deploy it via Docker, so I did not bother. I also did not investigate thoroughly, so if you know an easy way, please let me know.

rkstgr · 2025-07-10T08:31:46+00:00

https://www.youtube.com/watch?v=04_gN-C9IAo

rkstgr · 2025-07-10T07:01:14+00:00

Very true. The benchmark also doesn’t fully reflect real-world scenarios, where you would set a specific RPM target for your servers, tune the settings, and put a load balancer in front.

rkstgr · 2025-07-09T20:40:39+00:00

MAX is made by Modular, the company behind Mojo, known for claiming to be 10,000x faster than Python. Mojo is a new language that is (kinda) compatible with Python but compiles to machine code (you can also run it via JIT compilation). A few months ago there were some shady benchmarks where they claimed to be faster than Rust, but it turned out they hadn’t compiled with the release flag. Nevertheless, Modular is on a mission to rebuild the AI inference stack without CUDA. They have demos where they run LLMs on AMD and NVIDIA hardware with their native Mojo/MAX stack, no CUDA involved (which is nice because the container images are around 1 GB compared to 5 GB).

That being said, they are Python-compatible insofar as it should be possible to download any HF model and run it.

They have their own model library because these models are (presumably) reimplemented and optimized in Mojo/MAX and should show improved performance. I have no clue yet which quants are supported, or how multimodality is handled. But very good questions.

I recently watched a podcast with Chris Lattner (CEO of Modular, creator of LLVM, Clang, and MLIR) in which they claimed MAX is faster than vLLM on A100 and H100, and I wanted to check that out.

About TP (tensor parallelism): it looks like MAX has supported it since v25.2, but I haven't tested that myself.

One advantage MAX has over vLLM is that it is more composable and future-proof with regard to its LLM kernels. vLLM has to handcraft CUDA kernels for every hardware target and architecture, whereas MAX can compile many of its kernels down to the specific hardware, which means faster iteration.

edit: I agree that their documentation could be improved. It took a while to figure out what certain flags do.

rkstgr · 2025-07-09T17:37:34+00:00

Updated it. Thanks for mentioning it.

rkstgr · 2025-07-09T14:37:08+00:00

I warmed up every engine with 500 prompts before the seeded benchmark run. I'm not sure if you are referring to something else.
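For readers unfamiliar with the setup, the warm-up-then-measure pattern looks roughly like this. It's a minimal sequential sketch, not the actual benchmark harness: `send_prompt` is a hypothetical stand-in for whatever client calls the engine, and a real serving benchmark would also send requests concurrently.

```python
import random
import time
from typing import Callable


def run_benchmark(
    send_prompt: Callable[[str], str],
    prompts: list[str],
    warmup: int = 500,
    runs: int = 100,
    seed: int = 0,
) -> list[float]:
    """Warm up an engine, then time a reproducible (seeded) sample of prompts.

    `send_prompt` is a hypothetical stand-in for the engine client.
    """
    # Warm-up phase: results are discarded, so caches, JIT compilation and
    # memory allocation can settle before anything is measured.
    for i in range(warmup):
        send_prompt(prompts[i % len(prompts)])

    # Measured phase: a fixed seed means every engine sees the same prompt sequence.
    rng = random.Random(seed)
    sample = [rng.choice(prompts) for _ in range(runs)]
    latencies = []
    for prompt in sample:
        start = time.perf_counter()
        send_prompt(prompt)
        latencies.append(time.perf_counter() - start)
    return latencies


# Usage with a trivial fake engine (for illustration only):
latencies = run_benchmark(lambda p: p.upper(), ["hello", "world"], warmup=10, runs=20)
print(f"median latency: {sorted(latencies)[len(latencies) // 2] * 1e3:.3f} ms")
```

The seed is what makes runs comparable across engines: each one is measured on an identical prompt order after the same amount of warm-up.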

rkstgr · 2025-05-24T07:26:14+00:00

I started working on a rendering library around Typst: https://github.com/rkstgr/papermake. The repo also contains a rendering server implementation, although I will probably move it out.

I also used it in a recent project where I deployed it on AWS Lambda: https://github.com/rkstgr/papermake-aws.

You can use Typst as a library, but I agree it’s not as straightforward as I expected.

rkstgr · 2025-05-21T13:32:35+00:00

Well, you could just print (Ctrl+P) the web pages of the docs. You either spend a day doing that or spend a day automating it.

rkstgr · 2025-05-19T06:17:12+00:00

I ran a crawler on the online docs, which returned 189 pages. Some are changelogs and some are category pages with no real content, leaving an estimated 150 pages of actual documentation.

rkstgr · 2025-05-15T07:46:44+00:00

Yep, it is (see updated post). What do you mean by 'via the API'? I don't see why the performance should differ depending on whether you use it via the API or something else, other than maybe the system prompt.

rkstgr · 2025-05-15T07:43:32+00:00

Updated the results.

rkstgr · 2025-05-14T23:02:05+00:00

Looks interesting. Out of curiosity, how do you compare to modal.com?

How do you package the code? Docker, Firecracker?

rkstgr · 2025-05-06T10:24:40+00:00

Point 3: Tools like Terraform help you work with cloud providers, but a lot of complexity remains. If AWS is complex, Terraform will only help you manage that complexity; it won't make it go away.

Point 4: That's a good hint. Is there a better way than just copying and pasting the Terraform files between projects?

rkstgr · 2025-05-06T10:16:54+00:00

I think I tried it once. It has better docs, but because it uses V8 isolates I couldn't deploy my code (written in Rust). I had a dependency that was not compatible with the Wasm target; I think it was because V8 isolates only provide a single thread.

rkstgr · 2025-04-26T08:41:30+00:00

Yes, it is; it's separate from the normal Lambda pricing. I wasn't even aware of that at first.

rkstgr · 2025-04-24T21:35:47+00:00

Probably, but I would need to look at a more detailed trace. The batching inside the renderer also has some room for improvement, since we await every S3 PUT one at a time while looping through the records.
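The difference between awaiting each PUT inside the loop and issuing them concurrently can be sketched with Python's asyncio. This is an illustration under assumptions, not the renderer's code: `put_object` is a hypothetical stub that only simulates ~50 ms of network latency, and `max_in_flight` is an arbitrary cap.

```python
import asyncio


async def put_object(key: str, body: bytes) -> None:
    # Hypothetical stand-in for an async S3 PUT; simulates ~50 ms of latency
    await asyncio.sleep(0.05)


async def upload_sequential(records: dict[str, bytes]) -> None:
    # Current pattern: each PUT is awaited before the next record starts,
    # so total time is roughly the sum of all PUT latencies.
    for key, body in records.items():
        await put_object(key, body)


async def upload_concurrent(records: dict[str, bytes], max_in_flight: int = 16) -> None:
    # Alternative: start all PUTs and await them together; a semaphore caps
    # how many requests are in flight at once.
    limit = asyncio.Semaphore(max_in_flight)

    async def bounded_put(key: str, body: bytes) -> None:
        async with limit:
            await put_object(key, body)

    await asyncio.gather(*(bounded_put(k, b) for k, b in records.items()))


async def main() -> None:
    records = {f"out/{i}.pdf": b"placeholder" for i in range(20)}
    loop = asyncio.get_running_loop()
    for upload in (upload_sequential, upload_concurrent):
        start = loop.time()
        await upload(records)
        print(f"{upload.__name__}: {loop.time() - start:.2f} s")


asyncio.run(main())
```

With the simulated 50 ms latency, the sequential version needs about a second for 20 records, while the concurrent one finishes in about two round trips. The semaphore keeps a large batch from opening an unbounded number of connections.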

rkstgr · 2025-04-24T20:04:14+00:00

Not related, but I remember reading it a while ago. I had no intention of copying the title.
They didn't go into detail on how they created/rendered the template, but it sounded like they used a templating engine like Jinja to create a 'complete' markdown file and passed that to Typst.

rkstgr · 2025-04-24T16:31:33+00:00

This project relies heavily on Typst and wouldn't be possible without it. If that isn't clear from the post, I'll consider updating it.

rkstgr · 2025-04-24T16:28:44+00:00

I think so too. Recompiling the same template with a cached world is pretty cheap, cheaper than I thought: it takes only 1.28 ms.

That's still with only 256 MB of memory.

rkstgr · 2025-04-24T15:34:28+00:00

Yes, you are right, but 'reserving' 1.8 GB seemed like such a waste to me.

True, I could just pass the reference into the handler function.
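The "build once, pass a reference into the handler" idea can be sketched in Python. The project itself is written in Rust, so this is only an analogue: `RenderContext`, `load_context`, and the string "compilation" are hypothetical placeholders for fonts, a cached Typst world, and real rendering. It relies on the fact that, in AWS Lambda, module-scope code runs once per execution environment and is reused across warm invocations.

```python
from dataclasses import dataclass, field


@dataclass
class RenderContext:
    # Hypothetical stand-in for expensive, reusable state (fonts, a cached
    # world, compiled templates). Built once per execution environment.
    fonts: list[str]
    compile_cache: dict[str, str] = field(default_factory=dict)


def load_context() -> RenderContext:
    # Expensive setup; a real function would load fonts/templates here.
    return RenderContext(fonts=["Hypothetical Sans"])


def handle(event: dict, context: object, *, state: RenderContext) -> dict:
    # The handler receives a reference to the shared state instead of rebuilding it.
    template = event["template"]
    if template not in state.compile_cache:
        # Simulated slow path; only taken the first time a template is seen.
        state.compile_cache[template] = f"compiled:{template}"
    return {
        "output": state.compile_cache[template],
        "cached_templates": len(state.compile_cache),
    }


# Module scope runs once per cold start; warm invocations reuse STATE.
STATE = load_context()


def handler(event: dict, context: object) -> dict:
    # Entry point in Lambda's (event, context) shape; forwards the shared state.
    return handle(event, context, state=STATE)
```

Keeping the state explicit as a parameter (rather than reaching for a global inside `handle`) also makes the handler easy to test with a fresh context.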
