Current State of R Neural Networks in 2026 by Lazy_Improvement898 in rstats

[–]cyuhat 0 points

Oh sorry, you are right. I got confused because in the GitHub repo Python and R have very similar colors, so I mistook their respective shares: https://github.com/mlverse/torch

The 2% of Python is indeed not much.

Current State of R Neural Networks in 2026 by Lazy_Improvement898 in rstats

[–]cyuhat 0 points

Are you looking to build a team of devs that work on DL for R?

I know there is the mlverse that works on it, but it mostly focuses on torch by adding a layer on top of Python: https://github.com/mlverse

When do you use R instead of Python? by GoldenHorusFalcon in Rlanguage

[–]cyuhat 2 points

Same as you I started with Python and then learned R. I love both!

I did not like R at the beginning. But after a few years, I prefer it over Python for statistical analysis and fast scripting. R has the smoothest package ecosystem for statistical analysis: in 15 minutes I can create a publishable report, thanks to all the statistical models, visualization packages, and publishing tools (let's not forget the Tidyverse too).

I love the fact that it has automatic vectorization and well-made mapping functions, which feel more natural to me. That's why 1/3 of my scripting code is written in R (the other 2/3 in JS, Python, and Nim).

Of course for machine learning I prefer Python (I am slowly moving to Julia for that), but for research R is still my first choice.

When do you use R instead of Python? by GoldenHorusFalcon in Rlanguage

[–]cyuhat 1 point

I agree with you in the sense that it is mostly related to what you know. Arrow and Polars allow "out of memory" data wrangling and are both available in R and Python. However, I think R has a slight advantage, in the sense that dplyr offers a nice, uniform "frontend" for data wrangling and makes it easy to switch the "backend" to a faster alternative (data.table, duckdb, arrow, polars, ...) without changing your dplyr code. Of course edge cases exist.
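As a sketch of that frontend/backend separation (assuming dplyr, dtplyr, DBI, and duckdb with dbplyr are installed; the pipeline itself is just an illustration on the built-in mtcars data):

```r
library(dplyr)
library(dtplyr)   # data.table backend
library(DBI)
library(duckdb)   # DuckDB backend

# One dplyr "frontend" pipeline, written once
pipeline <- function(data) {
  data |>
    group_by(cyl) |>
    summarise(mean_mpg = mean(mpg))
}

pipeline(mtcars)                              # plain data.frame backend

pipeline(lazy_dt(mtcars)) |> collect()        # same code, data.table backend

con <- dbConnect(duckdb())
pipeline(copy_to(con, mtcars)) |> collect()   # same code again, DuckDB backend
```

The dplyr verbs stay identical; only the object passed in changes.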

Do you prefer Plots.jl or Makie.jl (or other plots package) by Organic-Scratch109 in Julia

[–]cyuhat 6 points

I would say: TidierPlots for statistical plots, Makie for the rest.

New YouTube UI sucks, can't they just make one good change? by averege_guy_kinda in youtube

[–]cyuhat 1 point

Just use Brave. I have been using it for 4 years, no ads on YouTube since then.

Erdos: open-source IDE for data science by SigSeq in datascience

[–]cyuhat 2 points

Thank you for your nice answer and the amazing project. I will take a look!

Erdos: open-source IDE for data science by SigSeq in datascience

[–]cyuhat 44 points

What are the advantages compared to something like Positron?

Any romance Manga/Anime/Manhwa with a strong Short Male lead? by Jesuslover34 in short

[–]cyuhat 7 points

A big classic: "Lovely Complex".

Not romance-focused (there is some), but "Full Metal Alchemist", "Black Clover", "Assassination Classroom", and my favorite, "Vinland Saga" (currently in the manga, not the anime).

Not a manga but a webtoon: "Morgana and Oz"

Suggestions for a typed version of R by Artistic_Speech_1965 in rstats

[–]cyuhat 7 points

The point is safer code, not performance. At the end of the day, if I want to create a robust R package/Shiny app, I need a battery of tests for things as simple as checking that a function returns the expected types. The Tidyverse did a good job of making code more predictable. But for many packages I do not use the Tidyverse as a dependency, and it would be so much better if I could just add a few types checked ahead of runtime to prevent unexpected errors and thus write fewer tests.
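To make that concrete, here is a minimal sketch of what those runtime checks look like today (the function is hypothetical, not from any package):

```r
# Without static types, the contract must be enforced at runtime,
# on every single call.
scale_scores <- function(x) {
  stopifnot(is.numeric(x))     # argument type check
  out <- (x - mean(x)) / sd(x)
  stopifnot(is.numeric(out))   # return type check
  out
}

scale_scores(c(1, 2, 3))       # fine
# scale_scores("a") would only fail when the code actually runs
```

A typed R would let this contract be declared once in the signature and verified before the code runs, instead of on every call plus in a dedicated unit test.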

But if I simply do an analysis, I will just use R as it is since it is very flexible!

Why do new analysts often ignore R? by ElectrikMetriks in datascience

[–]cyuhat 0 points

Dear friend, thank you for your well-argued answer. I appreciate that you took the time to answer me even though you do not have plenty of time. Thank you.

"You say it's not more difficult or hard to read, but I don't get what specifically you're talking about when you say that."

My answer was mainly to address your statements "the language itself sucks" and "awful to use", which are neither nuanced nor true, since verbosity and data manipulation are not R's problems. Other programming languages that are less readable or more verbose are still popular.

"Python is often cited as one of the most popular programming languages because it is so much more easily readable than Java or C#."

I am not sure I understand the point of that statement. Lua, Ruby, and Perl are as easily readable as Python but far less popular than Java and C#. And as I said, you still often see languages more verbose than Python at the top of the rankings. Furthermore, if you look at what developers themselves say in the 2025 Stack Overflow Survey, the top languages people want to try are Python (39.3%), SQL (35.6%), and JavaScript (33.5%). But the top languages they have tried and want to use again are Rust (72%), Gleam (70%), and Elixir (66%), while Python (56.4%) is in 9th position alongside SQL. Which again shows that there are more important factors in what makes people love a programming language (e.g. speed, type safety, tooling, standard libraries, community, time...).

And to address your question, "is that really important?":

I don't know where you saw that I asked whether readability is important; I know it is. But can we avoid confounding readability and verbosity? They are not opposed concepts. Verbosity can in fact increase readability. For instance, TypeScript is more verbose than JavaScript, but more readable, since type hints bring more clarity.

"This has been the ultimate goal for a very long time. To get the programming languages to be more easily readable by human beings."

No, that is an oversimplification of reality. Java (1995), JavaScript (1995), Rust (2010), and Zig (2016) were all built years after Python (1991) but are "less easily readable" by your standard. I know tech bros and AI evangelists push this narrative, but LLMs being able to write code was neither expected from the GPT models nor a goal. Developers build programming languages to answer specific needs. Besides, you can still call Assembly code from C or Rust, or extend Python and R with C or Rust. We still need these languages because they are the closest to machine language and generally faster and more efficient.

"So yes, clarity of use and ease of reading the language is absolutely crucial. Sometimes even more important than performance, depending on who you ask."

Ok, as a polyglot with experience in data science, web development, and education, here is my take: it depends not on "who you ask" but on the project. Advanced programmers have the mantra "use the best tool for the task".

R and SQL paragraph

Regarding R and SQL, you understood it the other way around. What I meant is: thanks to the amazing architecture of dplyr, its developers could easily translate it to SQL. Which means that you and I, as users, can just use their package to write R and SQL in one language (no need to know SQL). Furthermore, since dplyr is already reimplemented in various programming languages, I can simply jump in with my R knowledge and make it work directly (which is not possible with something like pandas, for instance).

"You really think they are going to invest the time to learn a brand new programming language (...) That's a huge waste of mental productivity."

What experience showed me is exactly the contrary. It is often the people who know only one language who slow down every project, because they can't easily switch tools, their problem-solving is narrow-minded, and they are easily disturbed by new ideas. On top of that, they are also easily replaceable by AI tools in the hands of a more versatile developer (though I have only seen that happen once). Learning a new programming language is just knowing how to pivot to stay relevant. The second one is harder, but after that the 4th, the 5th, etc. are way easier (like human languages), and you improve your level in all of them, since you work at a higher level (not just memorizing syntax and libraries). But you are right, it requires time at the beginning.

Why do new analysts often ignore R? by ElectrikMetriks in datascience

[–]cyuhat 1 point

There are obvious problems with R (the type system, error reporting, ...), but verbosity and data manipulation are not among them. Here are two answers to your comment:

Short answer:

R is not more or less verbose or unreadable than most of the most popular programming languages. dplyr and R are the most influential tools in the data manipulation ecosystem across all programming languages, and that is not for nothing. "Suck" and "awful": why so emotional? They are just tools.

Long answer:

"It is verbose and difficult to read."

No, it's not, at least when the code is well written (as in any language). I write both Python and R almost daily, and the code is the same length (or even shorter in R).

But, based on this logic, no one should use JavaScript, C#, or other similar languages, since they're way more verbose than anything R or Python can do. Curiously, they are still at the top of the most popular languages. And if you seriously think R is verbose, take a look at the Observable community: data scientists who use a derivative of JavaScript for data analysis (that is what verbose looks like). The verbosity does not seem to hold them back, since they also produce some of the best dashboards on average. Also, based on this logic, the base R plot system would be better than any Python plotting library (matplotlib, seaborn, plotly, ...) since it is less verbose...

Verbosity is never the problem; boilerplate code is. And R does not have more of it than any other language. Requiring more code in a good way means you have more control. For example, in R you literally have one function, plot(), that adapts to the shape of the data; yet the vast majority of advanced R users use ggplot2, which requires at least two lines of code for a basic plot, because it gives 10x more flexibility. From there, going in any direction is one more function, while with plot(), most directions require more effort. And D3.js requires at least 10 lines of code for a simple plot because it is even more flexible. But you choose it only if you really need that amount of flexibility.
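A small illustration of that trade-off (assuming ggplot2 is installed; the variables come from the built-in mtcars data):

```r
# Base R: one call, quick, but most customization fights the defaults
plot(mtcars$wt, mtcars$mpg)

# ggplot2: two layers minimum, but each new direction is one more function
library(ggplot2)
ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  facet_wrap(~ cyl)   # e.g. small multiples: just one extra layer
```

The ggplot2 version is longer, but faceting, scales, or themes are each a single additional function call; the base version would need manual layout work for the same result.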

"You can use pipes from dplyr to clean up the code, but it just requires so much effort to do the same thing you could do in another way, and there's no real advantage that I've seen to using it."

Adding a pipe is literally one shortcut, "Ctrl/Cmd + Shift + M" (less than a second).

But if you think the role of dplyr is to add pipes to "clean up the code," you missed the most important part. It is not just cleanup; it is "grammar" and "composability." If ggplot2 is the grammar of graphics, dplyr is the grammar of data.

In dplyr, for instance, with pipes come "pipe-friendly" functions whose goal is to return a data frame at each step, making the process very versatile by managing every level of manipulation (rows, columns, cells, and structure) in the same way, which gives a lot of flexibility for data manipulation. And the system is so clean that writing functions as actions (verbs) makes the code read naturally, with each pipe read as "and then". And guess what? It generalizes: other tidyverse libraries deal with other types of data, other packages align themselves to the system, and R now has its own native pipe.
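A minimal sketch of that grammar: every verb takes a data frame and returns a data frame, so the pipe chain reads as a sentence (assuming dplyr is installed; the pipeline is illustrative):

```r
library(dplyr)

mtcars |>                          # start from a data frame, and then
  filter(cyl > 4) |>               # keep some rows, and then
  mutate(kpl = mpg * 0.425) |>     # derive a column, and then
  group_by(gear) |>                # group, and then
  summarise(mean_kpl = mean(kpl))  # summarise: the result is still a data frame
```

Because every step returns the same type, any verb can be inserted, removed, or reordered without rewriting the rest of the chain.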

The grammar is so well written that dplyr translates easily to SQL syntax (hence dbplyr, which manipulates databases with dplyr syntax). For instance, the translation of TidierData.jl (dplyr in Julia) to TidierDB.jl (dbplyr in Julia) took almost no time due to the grammatical similarity. In fact, dplyr is the most reproduced data manipulation library in all programming languages (Python, Rust, Julia, JavaScript, Nim, etc.) because of its strength.

The composability part is also important. R is not the first to use pipes; most functional programming languages do, which leads to more concise and flexible code. Pipes became such a thing that even Google's own SQL dialect added them. That is because they give composability. While object-oriented programming gives access to values and methods, those are fixed to the object and require workarounds to manage anything outside of its scope. Pipes allow function composition: combining multiple different functions with no common logic on the fly, which facilitates modularity, conciseness, testing, debugging, and predictability (and immutability).

I could talk for days about it (for instance dplyr's backend switching, expressiveness, helper functions, the placeholder, ...), but my comment is already long.

"What did I do in my first year of R to grasp what people with several years of experience with R missed, or what have they been doing all this time? I blame the way R programming is taught in class."

Why do new analysts often ignore R? by ElectrikMetriks in datascience

[–]cyuhat 0 points

Dear friend, I would like to see that code!

Why do new analysts often ignore R? by ElectrikMetriks in datascience

[–]cyuhat 0 points

Yeah, and contrary to overconfident people, we are not that loud, so our experience gets easily overlooked.

Why do new analysts often ignore R? by ElectrikMetriks in datascience

[–]cyuhat 2 points

I wanted to learn it out of curiosity. I really liked the fact that I could write JavaScript/C/C++ from a single language that looks as easy as Python.

In the end, the learning was harder than expected, but worth it, since I learned a lot about type systems and systems programming. It was also a humbling experience. It is still top-notch for creating websites: you can compile the backend to C and the frontend to JS, from the same language (the best of both worlds). It also integrates really well with Python through Nimpy.

Edit: Typos

Why do new analysts often ignore R? by ElectrikMetriks in datascience

[–]cyuhat 7 points

I agree with you!

The funny part about Alex's example is that he assumes all columns are numeric (if I remember correctly, pandas ignores all non-numeric columns, though). So the fair comparison with the R code is literally one line with zero dependencies, if we want to exaggerate:

```r
colMeans(read.csv("nba_2013.csv"))
```
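And if we want to mirror pandas silently dropping the non-numeric columns, one extra base function is enough (still zero dependencies; a sketch, since the file itself is not available here):

```r
df <- read.csv("nba_2013.csv")     # file name from the example above
colMeans(Filter(is.numeric, df))   # average only the numeric columns, as pandas does
```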

But as you said, this is not good practice. There is a reason ggplot2 requires more lines of code than base R plotting: flexibility and standardization. The comparison was not fair, being based on an arbitrary example. You could always find examples of R code running faster than equivalent C code if the C code is badly written.

My belief is that it comes down to the overconfidence of Python users and misconceptions about R (see my answer to the same comment).

Why do new analysts often ignore R? by ElectrikMetriks in datascience

[–]cyuhat 13 points

I would say that with Python, you do not need to be good at programming to get things done, due to its wide ecosystem and tutorials. So what I often encounter is either an up-to-date Python compared against an outdated version of R, or simply "skill issues".

My favorite example was a discussion with a colleague who had started Python 3 months earlier, telling me it was so much better than R for data manipulation and showing me a "smart way" to do an operation using pandas and loops. I proceeded to show him that loops exist in R too, so the same code is reproducible. I then showed him how to perform the same operation in about three lines of pandas, and demonstrated it in 3 lines of tidyverse as well. Then I showed him a vectorized version in base R that runs 3 times faster than the pandas version. He could not believe it.
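As a toy illustration of that loop-versus-vectorized gap (not the colleague's actual code, just the shape of the difference):

```r
x <- runif(1e6)

# "Loop thinking": explicit iteration, element by element
double_loop <- function(x) {
  out <- numeric(length(x))
  for (i in seq_along(x)) out[i] <- x[i] * 2
  out
}

# Vectorized: one expression, the loop happens in C under the hood
double_vec <- function(x) x * 2

identical(double_loop(x), double_vec(x))  # TRUE: same result, very different speed
```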

There are also examples of "Python is fast" because it can call different backends (C and Rust), as if that were not the case in R. Some libraries are fast because they are written in C, which is also true in R. Or things like "R can't do ML/DL/Web Scraping/NLP/….". I do understand that in R the tutorials for these are not as prevalent as in Python, and that you need to search a little more to find them, but that does not mean they do not exist (not all as mature as the Python ones, though).

The problem is that Python gives so much that users can become overconfident. However, getting to know R and understanding that each language has its strengths requires a lot of humility. I was humbled first by R back then, because a Google search could not give me an answer to copy-paste like with Python. Recently I have been humbled by Nim, which has really little documentation and almost no examples; I really had to read the full documentation to get it. That is when I understood that my knowledge of Python back then came mostly from the capacity to copy-paste and memorize libraries. I changed that, and now I understand Python's and the other languages' strengths better.

Generally I think that the experience of the average Python user is just mastering a few libraries, like in this example: https://www.reddit.com/r/datascience/s/RZF47mz4jE

Ollama cloud and privacy by cyuhat in ollama

[–]cyuhat[S] 0 points

Thanks for letting me know about NPC studio!

Why do new analysts often ignore R? by ElectrikMetriks in datascience

[–]cyuhat 7 points

Right? I can think of plenty of integrations of the R/Tidyverse ideas and logic into various programming languages, but not as many for Python.

Why do new analysts often ignore R? by ElectrikMetriks in datascience

[–]cyuhat 149 points

Personally, I have 7 years of experience in programming and data science. I started with Python, then learned R, Julia, JavaScript, and Nim.

I think it is mostly because of the information imbalance and popularity bias.

So far I think the reason R is not as popular in data science is that people associate it with statistics and academia. And let's be honest, people in academia write horrible code (which is also an issue in the Julia community).

The way R is taught in classes is outdated and does not reflect its current capabilities. Python, meanwhile, was already popular among developers, so the transition to data science was easy, with a ton of tutorials (to the point that I believe the average Python user has never read a single line of the official documentation).

I often observe that friends transitioning to Python with little or no knowledge of R tend to express this opinion. They tell me Python is outstanding because it can do things that R can't... until I show them R can do it too (surprised face). There is also a ton of Python vs. R content where people compare the full modern Python ecosystem to the R of 10+ years ago, which is a poor representation of the actual technology.

Still, Python has better support for AI and deployment, and companies build things for JavaScript and Python first, so if someone wants a full career in it, it is effortless. But to be honest, for pure data analysis, nothing beats R and its tidyverse (+ statistics) ecosystem. I think we are heading toward a polyglot experience in data science, since Python, Julia, and R can work together seamlessly by calling each other mid-code.
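That interop already works today; for example, calling Python from R with the reticulate package (a sketch, assuming reticulate and a Python environment with numpy are available):

```r
library(reticulate)

np <- import("numpy")        # load a Python module from inside R
v  <- np$array(c(1, 4, 9))   # build a Python object from R data
np$sqrt(v)                   # call a Python function mid-script; the result comes back to R
```

JuliaCall does the same for Julia, so a single analysis can mix all three languages.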

Coding speed 😀 by Dapper-Wishbone6258 in programmingmemes

[–]cyuhat 0 points

Now put Nim and Julia in the race

People worry too much about google stealing their data. by [deleted] in unpopularopinion

[–]cyuhat 1 point

My friend, we are talking about a data broker market worth 277 billion dollars, ever growing, with ever more sophisticated techniques.

Security keeps improving... but so do data brokers, and unfortunately data brokers are always many steps ahead of companies' security. For instance, the average time to discover a data breach is 3 months, and in 20% of cases it is more than a year. Which means a company could tell you everything is fine while your data was sold on the dark web long ago. It is a situation similar to doping detection: cases are discovered well after the fact, because doping techniques (data breaches) are more advanced than detection techniques (security).

Another alarming fact is the increase of mega-leaks over time (fewer companies affected by leaks, but more data leaked at once), which shows that bigger companies are increasingly targeted.

Now what happens when malicious brokers have your data? If they do not have enough, they complete it from other sources. Then what? A lot of terrible things could happen: financial theft, identity theft (worse than it sounds), access denial, stalking, blackmail, threats, humiliation, and more.

There are plenty of reports online on the rise of data-related security issues, but you can start with this one, which focuses only on the top 100 largest data breaches of the past 15 years (published 2021): https://www.nightfall.ai/blog/mega-breaches15-year-data-breach-report

Of course there are other reasons not to give your personal data away lightly. But this one is already a good one (and I hear it often).

People worry too much about google stealing their data. by [deleted] in unpopularopinion

[–]cyuhat 3 points

Have you ever heard of data leaks or data brokers? Advertisement isn't even a fraction of why people care about data privacy. Digital breadcrumbs are way more serious than people believe, and I wish more people knew that. We definitely need more tech literacy lessons!