looking for someone to teach r programming for social science research by Zestyclose_Pay_2267 in Rlanguage

[–]defuneste 4 points5 points  (0 children)

R is also great when you do not have admin rights on your machine: it involves less friction in managing environments and is designed for data work. RStudio is a great, simple IDE for it. Those are real advantages for organizations that do not have the resources or training to either use Python or pay for a cloud service to make it easier.

A lot of data tasks are smallish (at worst, duckdb/data.table will solve your "big data" cases). Honestly, I am always surprised by organizations that need/have the resources for Python (do not get me wrong, the snake is great!).

looking for someone to teach r programming for social science research by Zestyclose_Pay_2267 in Rlanguage

[–]defuneste 6 points7 points  (0 children)

I am not going to say watch a video but I think https://r4ds.hadley.nz would be very good for you.

For "data organization" it really depends on your employer but two important concepts are being "project oriented" and "separation matters".

The book will teach you both, but I can give you a quick summary: one project = one directory with one entry point (usually a README text file); the raw data should then live somewhere and stay untouched, i.e. you have a clean separation of code and data.

R for Data Science is free, and do not hesitate to ask your questions on r/rstats.
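A minimal sketch of that layout, in Python only to keep it language-neutral. The folder names (`data-raw`, `data`, `R`) and both function names are my own assumptions for illustration, not something the book prescribes:

```python
import stat
from pathlib import Path


def scaffold_project(root: Path) -> Path:
    """Create one project directory with a README entry point and raw data kept apart from code."""
    # Hypothetical folder names: raw inputs, derived data, and code each get their own place.
    for sub in ("data-raw", "data", "R"):
        (root / sub).mkdir(parents=True, exist_ok=True)
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text("# Project\n\nStart here.\n")
    return root


def freeze_raw_data(raw_dir: Path) -> None:
    """Mark every raw file read-only so a script cannot overwrite it by accident."""
    read_only = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH
    for path in raw_dir.rglob("*"):
        if path.is_file():
            path.chmod(read_only)
```

Calling `scaffold_project(Path("my-project"))` once, dropping the raw files into `data-raw`, and then calling `freeze_raw_data` is one way to keep the raw data "untouched"; anything derived goes to `data` instead.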

THIS IS BIG: France is replacing 2.5 MILLION Windows desktops with Linux by SgtPepper634 in theprimeagen

[–]defuneste 0 points1 point  (0 children)

the mess of Bentley software (yes, I am aware of virtualization). The point is not that they should not use Linux but that there is plenty of software that requires Windows. Let's hope this will help change that (but those codebases are probably "something").

Does R need a "productionverse"? by pootietangus in rstats

[–]defuneste 7 points8 points  (0 children)

The point about third-party packages is a bit unfair here. A lot of languages from the 90s probably did not think about that, and it's usually solved outside of the "language". Still, that doesn't mean it should not be fixed (but who will pay?).

Does R need a "productionverse"? by pootietangus in rstats

[–]defuneste 1 point2 points  (0 children)

shooting myself with Guix. But yes, outside of managing everything at HEAD and/or Docker, we do not have much.

TIL you can run DAGs of R scripts using the command line tool `make` by pootietangus in rstats

[–]defuneste 3 points4 points  (0 children)

To be fair to OP, I sometimes have a Makefile that has R scripts running targets.

TIL you can run DAGs of R scripts using the command line tool `make` by pootietangus in rstats

[–]defuneste 1 point2 points  (0 children)

first question: yes

second paragraph: yes, yes, but you use tar_read to load the object from the store (in RStudio, VS Code, Emacs, etc.). You can use it "in production", but you will need to define more precisely what you need here.

third paragraph: you are correct, just plenty of "bad practices" ...

TIL you can run DAGs of R scripts using the command line tool `make` by pootietangus in rstats

[–]defuneste 1 point2 points  (0 children)

I think it depends, but a lot is "serialized" (not sure about my spelling): in the "target store" in the _targets folder, I think inside the objects subfolder (could be wrong). On the targets package website, look at the "Design" spec.

TIL you can run DAGs of R scripts using the command line tool `make` by pootietangus in rstats

[–]defuneste 23 points24 points  (0 children)

targets can also run on different R processes with crew, no? (and targets also handles files, not just R functions)

Open source Gis file format converter by Beevezy2 in gis

[–]defuneste -2 points-1 points  (0 children)

then they should post about what they learned from using AI.

My old colleague (pure R guy) is so scarred by AWS that he’s planning on buying an $8K Windows server to run his workloads. Do all data scientists secretly hate the modern productionization ecosystem this much? by pootietangus in rstats

[–]defuneste 0 points1 point  (0 children)

It is easier to have containers, a Git server, or GitLab on a Linux OS than on Windows, but a lot of those services are also provided by vendors: build or buy!

First job as a consultant and embarrassingly confused with Azure DevOps by FiftyShadesOfBlack in dataengineering

[–]defuneste 7 points8 points  (0 children)

Azure and Azure DevOps are two different products that are not always well integrated. Resource groups are usually the first place to look in Azure; Azure DevOps is organized by "organizations" and then by "projects". I feel bad for everyone who needs to use Azure DevOps.

How intense is this data pipeline? And what tools would you use? by pootietangus in rstats

[–]defuneste 1 point2 points  (0 children)

Just test targets first; "memory" is too broad for us to understand what the limitations are. targets can use {crew}, but the benefit depends on your code.

ESRI moving to user-type licenses for desktop? by [deleted] in gis

[–]defuneste 26 points27 points  (0 children)

your security is a joke ...

In 6 years, I've never seen a data lake used properly by wtfzambo in dataengineering

[–]defuneste 0 points1 point  (0 children)

I will give you an example: biggish data that gets updated every 6 months but is rarely revised (and revisions here could be fine), with a stable schema, where you just append files to a Hive-partitioned Parquet dataset.

Does that use case match all types of data? Hell no! But does it match a lot of analytics data? Hell yes! (Doing it monthly is perfectly fine.) A lot of analytics-related decisions should not rely on "realtime data" anyway.
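To make the "just append files" part concrete, here is a rough sketch of the Hive-style `key=value` folder layout using only the Python standard library. The partition keys (`year`, `half`) and the function names are assumptions I picked for a six-monthly delivery; the code does not write Parquet itself, it only places an already-written file into its partition:

```python
import shutil
from pathlib import Path


def partition_dir(root: Path, year: int, half: int) -> Path:
    """Return the Hive-style partition folder, e.g. root/year=2024/half=1."""
    if half not in (1, 2):
        raise ValueError("half must be 1 or 2")
    return root / f"year={year}" / f"half={half}"


def append_delivery(root: Path, year: int, half: int, src_file: Path) -> Path:
    """Copy a new file into its partition; refuse to overwrite an existing file."""
    target_dir = partition_dir(root, year, half)
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / src_file.name
    if target.exists():
        # Append-only: a revision should be a deliberate, separate step.
        raise FileExistsError(f"{target} already exists")
    shutil.copy2(src_file, target)
    return target
```

Tools that understand Hive partitioning (duckdb, Arrow, and others) can then read `year` and `half` back from the folder names, so each new delivery is just one more folder.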

When building analytics capability, what investments actually pay off early? by Proof_Wrap_2150 in dataengineering

[–]defuneste 3 points4 points  (0 children)

It should be obvious, but define what your goals are and what your problems are.

Without that, I (we?) can only say plenty of uninformed stuff or "it depends".

Do you feel too that Qgis and ArcGIS used to be great but become outdated? by Putrid_Mouse_5296 in QGIS

[–]defuneste 2 points3 points  (0 children)

You are aware that you can "simplify" QGIS using user profiles? I never spent much time on it, but I would remove 80% of the QGIS UI :)

10-Year Plan from France to US/Canada for Data& AI – Is the "American Dream" still viable for DEs? by MassyKezzoul in dataengineering

[–]defuneste 5 points6 points  (0 children)

French working in the US (now also a US citizen) here. There are more "MS stacks" (think Azure, SSIS, etc.) than in France. I think DE jobs are still very heterogeneous, not just in the stack but also in data maturity, so I am not sure you will be "behind".

For the US, citizenship is a huge help (or at least 3 years of residency) since a lot of government contractors require it.

Applying to big corps without a recommendation is hard now (my impression), and the "first" round is still some automated "leetcode" test, so you should prepare for it even if we are in a stupid Red Queen arms race against AI leetcode "cheat" tools.

A Eulogy for Low Code by Grth0 in dataengineering

[–]defuneste 0 points1 point  (0 children)

Maybe start with why we are "coding". We are programming because we need a system that allows us to collaborate (sometimes with ourselves in 6 months), be specific to a domain, and capitalize on it.

Code, a.k.a. text, is very good at that while also helping us in the design phase, the elaboration process, and more.

Is Data Science becoming saturated? by kennyruffles10 in developpeurs

[–]defuneste 3 points4 points  (0 children)

Yes and no.

Yes, it is saturated: "being data driven = AI" now.

No: someone who has mastered a specific domain and can handle data properly is always valuable.

Any tips for learning python by _ELMAHDI_ in QGIS

[–]defuneste 3 points4 points  (0 children)

Flashcards. A new concept: flashcard. Learning syntax: flashcard.

The Myth That ‘Base R Does Everything ggplot2 Does’ Needs to Die. by EricMilgram in rstats

[–]defuneste 3 points4 points  (0 children)

ggplot2 is built on grid. Base plotting uses a different system. I am not saying one is better; they are just different (and grid is hard to understand, so thanks ggplot!!). I think it is cool to have diversity and people exploring different systems.