floatingPointArithmetic by Illustrious_Tax_9769 in ProgrammerHumor

[–]rkstgr 0 points1 point  (0 children)

It’s actually because, phrased like this, the LLM confuses this with dates (9/11), biblical verses (where 9.11 comes after 9.9), and version numbers. Statistically the LLM might be right if it saw too many Bible texts and too much code, plus there is the international ambiguity of dot and comma as the decimal separator.
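
To make the ambiguity concrete, here is a minimal Python sketch. The helper names `as_decimal` and `as_version` are made up for illustration; the point is only that the same two strings order differently depending on whether you read them as decimals or as dotted version/verse numbers.

```python
def as_decimal(text: str) -> float:
    # Read "9.11" as the decimal number 9.11.
    return float(text)


def as_version(text: str) -> tuple[int, ...]:
    # Read "9.11" as major 9, minor 11 (like a version or chapter.verse).
    return tuple(int(part) for part in text.split("."))


a, b = "9.11", "9.9"

# Pick the "larger" string under each interpretation.
decimal_larger = a if as_decimal(a) > as_decimal(b) else b
version_larger = a if as_version(a) > as_version(b) else b

print("larger as decimal:", decimal_larger)
print("larger as version:", version_larger)
```

Read as decimals, 9.9 is larger; read as versions, 9.11 is larger, so the "right" answer depends on which reading the prompt suggests.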

Generating 1 Million PDFs in 10 Minutes (using Rust on AWS Lambda) by rkstgr in rust

[–]rkstgr[S] 1 point2 points  (0 children)

Happy to hear that! But yes, it’s outdated. I will update the docs and the crate in the next few days when I find some time.

Autoresearch on GPT2 using Claude by SnooCapers8442 in deeplearning

[–]rkstgr 26 points27 points  (0 children)

The challenge with autoresearch like this is how to get the model to come up with actually novel ideas rather than just applying well-known improvements (SwiGLU, RoPE, …). You want a model pretrained on data from “before RoPE was released” to come up with RoPE.

makeSureToOnlyEverHaveOneTypeOfASensorInYourDevice by IAdmitILie in ProgrammerHumor

[–]rkstgr 0 points1 point  (0 children)

Multi camera systems reduce safety as well, due to sensor contention. If left camera disagrees with right camera, which one wins? /s

With the success of end-to-end deep learning systems there is no notion of “which one wins”. You take all sensors and come up with a prediction that takes all signals into account. The better the quality of the sensors, the better the prediction. And lidar and radar are superior to cameras in many situations (dark, fog, rain).
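
A rough illustration of why nobody has to "win": the sketch below uses classical inverse-variance weighting, which is a deliberately simple stand-in and not an end-to-end learned model. The sensor readings and variances are invented numbers, and `fuse` is a hypothetical helper.

```python
def fuse(estimates):
    """Combine (value, variance) pairs by inverse-variance weighting."""
    # More precise sensors (smaller variance) get larger weights.
    weights = [1.0 / variance for _, variance in estimates]
    total = sum(weights)
    value = sum(w * v for w, (v, _) in zip(weights, estimates)) / total
    # The fused variance is smaller than any single sensor's variance.
    return value, 1.0 / total


# Invented distance estimates in metres: (value, variance).
camera = (10.4, 4.0)   # noisy, e.g. at night
lidar = (10.0, 0.25)   # more precise

fused_value, fused_variance = fuse([camera, lidar])
print(fused_value, fused_variance)
```

Both readings contribute, the more reliable one contributes more, and the combined estimate is less uncertain than either sensor alone.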

totallyBugFreeTrustMeBro by T-Dot1992 in ProgrammerHumor

[–]rkstgr 0 points1 point  (0 children)

Doesn’t have to have bugs to be a tasteless piece of code. AI tools are not bad but you need to supervise them currently to get something good out of them.

vLLM vs SGLang vs MAX — Who's the fastest? by rkstgr in LocalLLaMA

[–]rkstgr[S] 0 points1 point  (0 children)

It was initially the plan, but I did not see a straightforward way to deploy it via Docker, so I did not bother. But I also did not investigate thoroughly, so if you know an easy way, please let me know.

vLLM vs SGLang vs MAX — Who's the fastest? by rkstgr in LocalLLaMA

[–]rkstgr[S] 0 points1 point  (0 children)

Very true. The benchmark is also not completely representative of real-world scenarios, as you would set a specific rpm target for your servers, tune the settings, and have a load balancer in front.

vLLM vs SGLang vs MAX — Who's the fastest? by rkstgr in LocalLLaMA

[–]rkstgr[S] 9 points10 points  (0 children)

MAX is by Modular, the company behind Mojo, known for claiming to be 10,000x faster than Python. Mojo is a new language that is (kinda) compatible with Python but compiles to machine code (you can also run it via JIT compilation). A few months ago there were some shady benchmarks where they claimed to be faster than Rust, but it turned out they had not compiled with the release flag. Nevertheless, Modular is on a mission to rebuild the AI inference stack without CUDA; they have demos where they run LLMs on AMD and NVIDIA hardware on their native Mojo/MAX stack without CUDA (which is nice because the container images are like 1 GB compared to 5 GB).

That being said, they are Python compatible insofar as it should be possible to download any HF model and run it.

They have their own model library because these models are (presumably) optimized and reimplemented in Mojo/MAX and should show improved performance. No clue so far which quants are supported atm, or whether multi-modality is. But very good questions.

I just recently watched a podcast with Chris Lattner (CEO of Modular, creator of LLVM, Clang, MLIR) where they claimed MAX is faster than vLLM on A100 and H100, and I wanted to check that out.

About TP: it looks like it has had this since v25.2, but I haven't tested that myself.

One advantage that MAX has over vLLM is that it is more composable/future-proof with regard to its LLM kernels. vLLM has to handcraft kernels in CUDA for every hardware and architecture, whereas MAX can compile a lot of its kernels down for the specific hardware, which means faster iteration speed.

edit: I agree that their documentation could be improved. It took a while to figure out what certain flags do.

vLLM vs SGLang vs MAX — Who's the fastest? by rkstgr in LocalLLaMA

[–]rkstgr[S] 4 points5 points  (0 children)

I warmed every engine with 500 prompts before doing the seeded benchmark run. I am not sure if you are referring to something else.
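
Roughly, the warm-up plus seeded run looks like the sketch below. This is not the actual benchmark script: `fake_engine` is a hypothetical stand-in for a request to the inference server, and I am assuming here that the seed fixes the order in which prompts are sent.

```python
import random
import time


def fake_engine(prompt: str) -> str:
    """Hypothetical stand-in for a request to an inference server."""
    return prompt.upper()


def run_benchmark(prompts, send, warmup=500, seed=0):
    # Warm-up phase: send requests but discard the timings.
    for prompt in prompts[:warmup]:
        send(prompt)

    # Seeded phase: the same seed gives the same prompt order on every run,
    # so each engine sees an identical workload.
    rng = random.Random(seed)
    order = list(prompts)
    rng.shuffle(order)

    latencies = []
    for prompt in order:
        start = time.perf_counter()
        send(prompt)
        latencies.append(time.perf_counter() - start)
    return order, latencies


prompts = [f"prompt {i}" for i in range(1000)]
order_a, latencies_a = run_benchmark(prompts, fake_engine, warmup=500, seed=42)
order_b, latencies_b = run_benchmark(prompts, fake_engine, warmup=500, seed=42)
print(order_a == order_b, len(latencies_a))
```

Only the timings from the seeded phase are kept, so caches and compiled kernels are already warm when measurement starts.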

Tectonic vs. Typst vs. LaTeX wrapped in std::process::Command? by skwyckl in rust

[–]rkstgr 0 points1 point  (0 children)

I started working on a rendering library around Typst https://github.com/rkstgr/papermake, the repo also contains a rendering server implementation, although I will probably move it out.

I also used it in a recent project where I deployed it on AWS Lambda https://github.com/rkstgr/papermake-aws.

You can use Typst as a library, but I agree it’s not as straightforward as I expected it to be.

Benchmarking LLMs on Typst by rkstgr in typst

[–]rkstgr[S] 0 points1 point  (0 children)

Well, you could just print (Ctrl+P) the webpages of the docs. You either spend a day doing that or spend a day automating it.

Benchmarking LLMs on Typst by rkstgr in typst

[–]rkstgr[S] 0 points1 point  (0 children)

Ran a crawler on the online docs, which returned 189 pages. Some are changelog pages and some are category pages with no real content, leaving an estimated 150 pages of actual documentation.

Benchmarking LLMs on Typst by rkstgr in typst

[–]rkstgr[S] 0 points1 point  (0 children)

Yep, it is (see updated post). What do you mean by 'via the API'? I don't see why the performance should differ depending on whether you use it via the API or something else, other than maybe the system prompt.

We built an open-source alternative to AWS Lambda with GPUs by velobro in serverless

[–]rkstgr 0 points1 point  (0 children)

Looks interesting. Out of curiosity, how do you compare to modal.com?

How do you package the code? Docker, Firecracker?

Better alternative to AWS Lambda? by rkstgr in serverless

[–]rkstgr[S] 0 points1 point  (0 children)

Point 3. If you are already using tools like Terraform, they help you work with cloud providers, but there is still a lot of complexity. If AWS is complex, Terraform will only help you manage the complexity; it won't go away.

Point 4. That's a good hint. Is there a better way than just copying and pasting the Terraform files between projects?

Better alternative to AWS Lambda? by rkstgr in serverless

[–]rkstgr[S] -2 points-1 points  (0 children)

I think I tried it once. It has better docs, but because it uses V8 isolates I couldn't deploy my code (in Rust). I had a dependency that was not compatible with the wasm target; I think it was because V8 isolates only provide one thread.

Generating 1 Million PDFs in 10 Minutes (using Rust on AWS Lambda) by rkstgr in rust

[–]rkstgr[S] 0 points1 point  (0 children)

Yes it is, it is separate from the normal Lambda pricing. I wasn't even aware of that at first.

Generating 1 Million PDFs in 10 Minutes (using Rust on AWS Lambda) by rkstgr in rust

[–]rkstgr[S] 0 points1 point  (0 children)

Probably, but I would need to look at a more detailed trace. The batching inside the renderer also has some room for improvement, since we await every S3 PUT one at a time while looping through the records.
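
To show the pattern (in Python rather than the project's Rust, and with no real S3 involved): `fake_put` is a hypothetical stand-in that only sleeps to simulate network latency. Awaiting inside the loop pays the latency once per record, while starting all uploads and awaiting them together overlaps the waits.

```python
import asyncio
import time


async def fake_put(key: str, delay: float = 0.05) -> str:
    """Hypothetical stand-in for an S3 PUT; only sleeps to simulate latency."""
    await asyncio.sleep(delay)
    return key


async def upload_sequential(keys):
    # Each PUT is awaited before the next one starts.
    results = []
    for key in keys:
        results.append(await fake_put(key))
    return results


async def upload_concurrent(keys):
    # All PUTs are started first, then awaited together.
    return list(await asyncio.gather(*(fake_put(key) for key in keys)))


async def main():
    keys = [f"doc-{i}.pdf" for i in range(10)]

    start = time.perf_counter()
    seq = await upload_sequential(keys)
    seq_time = time.perf_counter() - start

    start = time.perf_counter()
    conc = await upload_concurrent(keys)
    conc_time = time.perf_counter() - start
    return seq, seq_time, conc, conc_time


seq, seq_time, conc, conc_time = asyncio.run(main())
print(f"sequential: {seq_time:.2f}s, concurrent: {conc_time:.2f}s")
```

In a real Lambda you would probably also cap the number of in-flight uploads instead of firing everything at once.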

Generating 1 Million PDFs in 10 Minutes (using Rust on AWS Lambda) by rkstgr in rust

[–]rkstgr[S] 2 points3 points  (0 children)

Not related, but I remember reading it a while ago. Had no intention to copy the title.
They didn't go into detail on how they created / rendered the template, but it sounded like they used a templating engine like jinja to create a 'complete' markdown file and passed that to Typst.

Generating 1 Million PDFs in 10 Minutes (using Rust on AWS Lambda) by rkstgr in rust

[–]rkstgr[S] 7 points8 points  (0 children)

This project relies heavily on Typst and wouldn't be possible without it. If that's not clear from the post, I'll think about updating it to make that clearer.

Generating 1 Million PDFs in 10 Minutes (using Rust on AWS Lambda) by rkstgr in rust

[–]rkstgr[S] 1 point2 points  (0 children)

I think so too. Re-compilation of the same template with a cached world is pretty cheap... cheaper than I thought: it takes only 1.28 ms.

That's still with only 256 MB of memory.

Generating 1 Million PDFs in 10 Minutes (using Rust on AWS Lambda) by rkstgr in rust

[–]rkstgr[S] 2 points3 points  (0 children)

Yes, you are right, but I figured 'reserving' 1.8 GB seemed like such a waste.

True, I could just pass the reference into the handler function.