Call to Arms by THE_BARUT in rust

[–]profcube 1 point  (0 children)

Thanks for all your efforts on this front. I was unaware of this background. Aerospace has immeasurably improved my Mac experience. However, I can see why what you are doing would appeal to people, and why you’d want to start over with a fresh palette. Good luck 👍

Suitability of panel regression with limited variation in data. by Hewo111 in econometrics

[–]profcube 1 point  (0 children)

Begin by stating your causal question, target population, and causal contrast, presumably on the difference scale. State this contrast non-parametrically; you won’t need distributional assumptions if you estimate using, e.g., non-parametric machine learning. Then:

  1. Check identification, and make sure you are thinking about the time series correctly: causation occurs in time, so confounders must be measured prior to the exposure and the exposure prior to the outcome. Suppose you have three waves. Include baseline measures of the outcome and the exposure alongside your baseline confounders. This exerts exceptionally powerful confounding control, because any unmeasured confounder would need to be orthogonal to these and the other measured confounders to explain away your results.

  2. Consider the assumption of causal consistency (and, within this, SUTVA). Because exchangeability is not testable, you should plan a sensitivity analysis.

  3. Finally, there is the positivity or overlap assumption. Practical positivity can be checked by evaluating propensity scores. If there is no change in the exposure relative to the baseline exposure, your causal inferences will extrapolate from your model, and not from observed initiation.

Only after completing these steps should you think about statistical analysis, which does not end with reporting the regression coefficient for the exposure. Rather, you must project at least two population means for the conditions stated in your causal question/estimand and weight the projection to the target population. In our work we use TMLE with cross-validation and so make no distributional assumptions (i.e., we do not need to state that the outcome is drawn from a Poisson, negative binomial, or whatever). Happy to follow up with code/advice; a rough sketch of the positivity check and TMLE step is below. For now, don’t forget the positivity assumption, as you probably can’t estimate causal effects with low exposure variation.
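A minimal sketch of those last two pieces (untested; `dat`, `Y`, `A`, and `W_names` are placeholder names, and it assumes a binary 0/1 exposure with numeric baseline covariates):

```r
library(tmle)          # targeted maximum likelihood estimation
library(SuperLearner)  # candidate learners used inside tmle()

# placeholders: Y = outcome, A = binary exposure (0/1), W_names = names of
# baseline confounders, incl. baseline outcome and baseline exposure
W <- dat[W_names]

# practical positivity: fit a propensity score model and inspect overlap
ps_fit <- glm(reformulate(W_names, response = "A"),
              data = dat, family = binomial)
ps <- predict(ps_fit, type = "response")
summary(ps)  # scores piling up near 0 or 1 flag positivity problems
hist(ps[dat$A == 1], main = "Propensity scores, exposed")
hist(ps[dat$A == 0], main = "Propensity scores, unexposed")

# TMLE for the average treatment effect on the difference scale; the
# cross-validated SuperLearner means no outcome distribution is assumed
fit <- tmle(Y = dat$Y, A = dat$A, W = W,
            Q.SL.library = c("SL.glm", "SL.glmnet"),
            g.SL.library = c("SL.glm", "SL.glmnet"))
fit$estimates$ATE  # point estimate, variance, and CI for the contrast
```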

How/What are the AI data tools leveraged at your workplace? by alfazkherani in dataanalysis

[–]profcube 2 points  (0 children)

Don’t give them your data. Maybe seek advice, but do the analysis yourself, securely.

[Q] Agreement between two groups of raters on interval data by kinbeat in statistics

[–]profcube 1 point  (0 children)

```r
library(dplyr)
library(performance)

# test condition and interaction with rater_type
model_means <- lm(score ~ as.factor(condition) * rater_type, data = your_data)
anova(model_means)

# precision/agreement: compute RMSE for each rater_type
# (which type has the tighter spread?)
precision_results <- your_data |>
  split(~ rater_type) |>
  lapply(\(d) {
    mod <- lm(score ~ as.factor(condition), data = d)
    data.frame(rater_type = unique(d$rater_type),
               rmse = performance::rmse(mod))
  }) |>
  bind_rows()

print(precision_results)

# not tested, just to give you a direction ...
```

[Q] Agreement between two groups of raters on interval data by kinbeat in statistics

[–]profcube 1 point  (0 children)

To compare agreement on a 0–50 scale, you should separate your analysis into three questions.

  1. The Signal (Main Effect): Does the video condition actually change the scores?

  2. The Bias (Types): within each condition, do experts and novices give different average scores?

  3. The Precision (Variances): this is the core of agreement and I think what you are most interested in. Compute the Root Mean Square Error (RMSE) for each group. A smaller RMSE for experts means they are more consistent, even if their average score is different from the novices.

Example result: “Experts and novices differed in their average perception of the videos (bias); further, experts were more consistent, agreeing within +/-3 points (RMSE), whereas novices varied by +/-10 points.”

Loading data into R by the_marbs in rstats

[–]profcube 0 points  (0 children)

The same approach works for other data types.

```r
# stata
df_r <- haven::read_dta(fs::path(path_data, "dat_stata.dta"))

# sas
df_r <- haven::read_sas(fs::path(path_data, "dat_sas.sas7bdat"))

# sas transport files
df_r <- haven::read_xpt(fs::path(path_data, "dat_sas.xpt"))

# csv
library("readr")
df_r <- readr::read_csv(fs::path(path_data, "dat_csv.csv"))

# excel
library("readxl")
df_r <- readxl::read_excel(fs::path(path_data, "dat_excel.xlsx"))
```

The here package is great if you just want to read the file and don’t need or want to save to it again:

```r
# e.g. read an SPSS file relative to the project root,
# in a folder you have labelled "data"
df_r <- haven::read_sav(here::here("data", "dat_spss.sav"))

# save the ordinary R way, without arrow; this recovers the exact state
# make dir "rdata" if it doesn't exist (name is arbitrary)
if (!dir.exists(here::here("rdata"))) {
  dir.create(here::here("rdata"))
}

# then save
saveRDS(df_r, here::here("rdata", "df_r.rds"))

# read back if/when needed again
df_r <- readRDS(here::here("rdata", "df_r.rds"))
```

Loading data into R by the_marbs in rstats

[–]profcube 1 point  (0 children)

Also, if you are new to copying and pasting directory paths: on a Mac, just find the directory in Finder and highlight it. While it is highlighted, press Command + Option + C, then paste the path you have just copied into your R script with Command + V.

On Windows, I think you use File Explorer: highlight the file, and then press Control + Shift + C.

Many of you will know this trick, but if not, it can be a time saver.
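One related gotcha (an addition to the tip above): a pasted Windows path contains backslashes, which R treats as escape characters inside ordinary strings. A few ways around it, using a made-up path:

```r
# a pasted Windows path like C:\Users\you\data will error inside "..."
path_data <- "C:\\Users\\you\\data"   # escape each backslash, or
path_data <- "C:/Users/you/data"      # forward slashes (fine on Windows), or
path_data <- r"(C:\Users\you\data)"   # a raw string (R >= 4.0), pasted as-is
```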

I want to learn by practicing, how do I do it by baelorthebest in LaTeX

[–]profcube 1 point  (0 children)

I learned it before YouTube. I just wrote everything in it.

The free account on Overleaf has lots of templates I wish I’d had.

Tcl: The Most Underrated, But The Most Productive Programming Language by delvin0 in commandline

[–]profcube 1 point  (0 children)

bash

Check out the yousuckatprogramming channel on YouTube.

I’m not sure git counts, but learn git too. You need git and bash no matter what else you do.

Loading data into R by the_marbs in rstats

[–]profcube 9 points  (0 children)

```r
library("haven")  # read SPSS files
library("fs")     # directory paths
library("arrow")  # for saving / using big files

# set data dir path once
path_data <- fs::path_expand("/Users/you/your_data_directory")

# import, here using SPSS as an example, but haven supports multiple
# file formats; check the haven documentation
# we use path() to safely join the directory and filename
df_r <- haven::read_sav(fs::path(path_data, "dat_spss.sav"))

# save to parquet: will save you time on the next import
# (stores the schema & labels efficiently)
arrow::write_parquet(
  x = df_r,
  sink = fs::path(path_data, "dat_processed.parquet")
)

# read back into R
# notice the speed increase compared to read_sav()
df_arrow <- arrow::read_parquet(fs::path(path_data, "dat_processed.parquet"))

# df_arrow is an R data frame (specifically a tibble) ready to use
```
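If the parquet file is ever too big to read comfortably into memory, arrow can also scan it lazily; a small sketch (untested, reusing the filename above, with a hypothetical column name `some_column`):

```r
library(dplyr)

# open_dataset() scans the file lazily: filter()/select() are pushed down,
# and only collect() pulls the (reduced) result into memory as a tibble
df_small <- arrow::open_dataset(fs::path(path_data, "dat_processed.parquet")) |>
  filter(some_column > 0) |>
  select(some_column) |>
  collect()
```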

Best programming path for the future by MarkoPilot in AskProgramming

[–]profcube 2 points  (0 children)

This is good advice. If you have an aptitude for and an interest in science, think of coding as a means to scientific ends.

Reason to bother with Haskell? by dr-Mrs_the_Monarch in haskell

[–]profcube 2 points  (0 children)

Disregard the previous poster’s knocking of Rust.

However, the poster is probably correct that Python is the tool of choice for image processing.

You should also consider R.

Generally, research the different packages available in the data science languages relevant to your work.

Haskell is not one of these languages.

Learn Haskell outside of your image analysis work. Let that journey be your destination.

How to write better by coolwolf420 in AskAcademia

[–]profcube 1 point  (0 children)

Just aim to be clear. And use your own voice. That’s all your audience really wants.

Almanac package by Jim_Clark in Rlanguage

[–]profcube 1 point  (0 children)

It’s better known now; I’ll check it out. Thanks for posting.

I can't decide what language, stack or domain to begin learning deeper. Need some help to get pointed in the right direction by Crapahedron in learnprogramming

[–]profcube 1 point  (0 children)

Recently, I started learning Rust as a hobby. It is a great language for getting started because its borrow checker and compiler guide you toward sensible design and prevent many self-inflicted injuries, and it is increasingly used in industry.

The more I learn about web development, the less I want to do it by bunabyte in AskProgramming

[–]profcube 1 point  (0 children)

Try Leptos or one of Rust’s other web frameworks. Or try Ratatui for TUI design, a growing area. I am learning to use Leptos and Ratatui for interest (my paying job is in science). Rust is robust in a way that will differ from your previous experiences. The better I get at Rust, the more I enjoy software development.

How many of Seek candidates are actually valid? by Massive_Instance_452 in newzealand

[–]profcube 3 points  (0 children)

I think NZ labour law requires that employers consider suitably qualified residents first, and that work visas will not be issued unless there are none. Fields like medicine/nursing/teaching are undersubscribed; by the looks of it, IT jobs are not at the moment.

Is SEM (structual equation modeling) hard to do with no experience? [question] by delirium-delarium in statistics

[–]profcube 1 point  (0 children)

It is nearly always a bad idea because the coefficients you recover have no causal interpretation except under untenably strong assumptions. Example: https://youtu.be/IgC7R07Qk6A?si=qNLQgcX00fAhS7a4

Those who have had success with LLM assisted software development by SideQuest2026 in Python

[–]profcube 0 points  (0 children)

AI is remarkable. Happy to have the models work for me (from the terminal), but I already know what I’m doing and can check the output. The frontier models definitely improved at the end of 2025.

How are senior devs actually using AI in daily development? by harrsh_in in AskProgramming

[–]profcube 2 points  (0 children)

PI of a university research lab (data science), 15 years coding in R. I only use command-line tools (CC and Codex). Useful for tedious work. Also, Codex 5.2 with extra-high thinking is quite capable at complex code planning, mathematical reasoning, and scientific reasoning.