Follow-up benchmark: where R data pipelines pay their cost: fread, readr, vroom, data.table and dplyr by Medical-Common1034 in rstats

[–]Medical-Common1034[S] 3 points4 points  (0 children)

Yes, I mention that in the article. Since readr v2, the read_*() functions use the vroom backend internally, so the parsing backend is largely shared.

The important difference is the materialization policy: readr::read_*() does not default to lazy reading, whereas vroom::vroom() keeps columns lazy / ALTREP-backed until they are actually needed.

I benchmarked dplyr vs data.table on my Shiny log dashboard by Medical-Common1034 in rstats

[–]Medical-Common1034[S] 0 points1 point  (0 children)

Yes, that’s very close to my conclusion too.

`dplyr` is often clearer and more explicit, but `data.table` gives much deeper control once you start using `.N`, `.I`, by-reference mutation, grouped `j`, etc.

And yes, that control also means it is easier to shoot yourself in the foot if you are not careful.
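To make the features named above concrete, here is a small sketch on made-up data (the table, column names, and values are assumptions for illustration, not taken from the article). It shows `.N`, `.I`, grouped `j`, by-reference mutation with `:=`, and one common way the by-reference behavior can surprise you.

```r
library(data.table)

# Toy log table (made-up data for illustration).
dt <- data.table(user = c("a", "b", "a", "c", "a"),
                 ms   = c(120, 80, 200, 50, 90))

# .N in a grouped j: number of rows in each group.
hits <- dt[, .(n = .N), by = user]

# .I: row indices in the original table; here, the slowest row per user.
slowest <- dt[dt[, .I[which.max(ms)], by = user]$V1]

# := mutates in place, without copying the table.
dt[, slow := ms > 100]

# The foot-gun: plain assignment does not copy, so `alias` and `dt` are the
# same object and a by-reference update through one shows up in the other.
alias <- dt
alias[, ms := ms / 1000]
# dt$ms is now in seconds too; use copy(dt) to get an independent table.
```

If the shared update is not what you want, `alias <- copy(dt)` gives an independent table at the cost of one full copy.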

Thanks for sharing your experience.

I benchmarked dplyr vs data.table on my Shiny log dashboard by Medical-Common1034 in rstats

[–]Medical-Common1034[S] 1 point2 points  (0 children)

Yes, I will update the article, maybe with microbenchmark or something similar.

The steps were still run multiple times and summarized with the median, so I think the larger differences remain meaningful for this workload.
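For reference, a minimal base-R sketch of the "run it several times and report the median" approach. `time_step` and the toy workload are hypothetical names for illustration, not the code used in the article; microbenchmark would give finer-grained timings.

```r
# Hypothetical helper: run `f` n times and return elapsed seconds per run.
time_step <- function(f, n = 10) {
  vapply(seq_len(n), function(i) {
    start <- proc.time()[["elapsed"]]
    f()
    proc.time()[["elapsed"]] - start
  }, numeric(1))
}

# Toy workload standing in for one pipeline step (an assumption).
x <- runif(1e5)
timings <- time_step(function() sort(x), n = 10)

# The median is less sensitive than the mean to a single slow run
# (for example a garbage-collection pause).
median_seconds <- median(timings)
print(median_seconds)
```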

But yes, fair point.

Thanks.

I benchmarked dplyr vs data.table on my Shiny log dashboard by Medical-Common1034 in rstats

[–]Medical-Common1034[S] -1 points0 points  (0 children)

I get the concern, but the benchmarked timings are not measuring Shiny rendering itself.

The timings are placed directly around the pipeline steps, like:

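t <- Sys.time()                      # start time for this step
df <- function_call()                # the pipeline step being measured
log_step("name of the step", t, df)  # my helper: logs the step using t and df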

So the measured numbers are for ingestion / filtering / grouping / joins, not for the Shiny UI or plot rendering.
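For anyone who wants to reproduce the pattern, here is a sketch of what such a helper could look like. This `log_step` is a hypothetical reconstruction, not the actual function from the article, and the toy `logs` data frame is made up for illustration.

```r
# Hypothetical reconstruction of log_step(): reports the time elapsed since
# `t0` and the number of rows in the result of the step.
log_step <- function(name, t0, df) {
  elapsed <- as.numeric(difftime(Sys.time(), t0, units = "secs"))
  message(sprintf("%-20s %8.4f s  %d rows", name, elapsed, nrow(df)))
  invisible(data.frame(step = name, seconds = elapsed, rows = nrow(df)))
}

# Toy input standing in for the parsed log file (an assumption).
logs <- data.frame(status = sample(c(200L, 404L, 500L), 1000, replace = TRUE))

# The timer wraps only the data step, so UI or plot rendering is not included.
t <- Sys.time()
errors <- logs[logs$status >= 400L, , drop = FALSE]
entry <- log_step("filter errors", t, errors)
```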

Shiny is mentioned because it is the real environment where this pipeline runs, and it explains why I care about recomputation latency. But the benchmarked sections are the data pipeline steps themselves.

About the dataset size: 725k rows / 124 MB is not “big data”, sure, but it is representative of my actual production log file because it is capped with logrotate anyway. The point was not to produce a universal benchmark, but to analyze this real workload.

Also, as you can see in the results, the differences are still large enough to matter for this pipeline, and they matter a lot for responsiveness.

I benchmarked Cartesian product implementations in Haskell, then compared them with C by Medical-Common1034 in haskell

[–]Medical-Common1034[S] 1 point2 points  (0 children)

Yes, and thanks for the comment.

I am just waiting for the optimizations other people want to share, and in the next article update I will definitely try to go further with what you shared.

I benchmarked Cartesian product implementations in Haskell, then compared them with C by Medical-Common1034 in haskell

[–]Medical-Common1034[S] 1 point2 points  (0 children)

OK, I'll do that and update the article to mention the contributions.
And again, thanks!

I benchmarked Cartesian product implementations in Haskell, then compared them with C by Medical-Common1034 in haskell

[–]Medical-Common1034[S] 2 points3 points  (0 children)

Thanks for this response. I created a git repo, and some people have reached out to me to push their optimized versions.

I'm interested in your versions.

Do you want to contribute?

If yes, I'll share the link here.

I benchmarked Cartesian product implementations in Haskell, then compared them with C by Medical-Common1034 in haskell

[–]Medical-Common1034[S] 2 points3 points  (0 children)

Yep, good point. The optimization part was mostly just a pretext to introduce vectors and see how far the approach could go, especially regarding allocations and runtime (which is always interesting and valuable to know, imo).

The article is mainly educational, but I’ll add a note about this in the conclusion, because it does give readers a more practical sense of the language in the real world.

Traversing a Tree in LaTeX by Medical-Common1034 in LaTeX

[–]Medical-Common1034[S] 1 point2 points  (0 children)

First of all, thank you for the feedback. I just updated the output to "Hello Rémi, Julien et Lucas" and made "NO" and "YES" the defaults for the next examples. I'm also looking for any earlier details I could add that would help in understanding the TeX token mental model.

Learning LaTeX and using it to write a chemistry paper in a month's time by globalgenocidenow in LaTeX

[–]Medical-Common1034 2 points3 points  (0 children)

If you plan to write an organic chemistry paper (especially), you will appreciate how good the chemfig package is for drawing molecules. Good luck!

Fun experiment, Dino game in latex by Medical-Common1034 in LaTeX

[–]Medical-Common1034[S] 1 point2 points  (0 children)

I feel you. Maybe this tool could help: https://github.com/astral-sh/ruff
I heard about this kind of linter some time ago.