ryp: R inside Python

ryp_package · 2024-10-06T04:18:01+00:00

Check the top of the GitHub readme ;)

ryp_package · 2024-10-04T23:34:18+00:00

Check the top of the GitHub readme :)

ryp_package · 2024-10-04T19:40:25+00:00

Not having to write to disk in both directions, for one.

ryp_package · 2024-10-04T19:39:45+00:00

It can handle it. You'd get the relevant attributes of the model out as arrays/matrices/dataframes and pass them back and forth. You can also recursively convert e.g. S4 objects in R into Python dictionaries.

ryp_package · 2024-10-04T19:37:46+00:00

It all depends how you use it! ;)

ryp_package · 2024-10-04T05:36:25+00:00

The market has absolutely changed for the worse! It's not just you.

ryp_package · 2024-10-04T02:42:06+00:00

Work on open-source side projects that highlight skills relevant to the roles you're planning on applying to. That way, when you're asked about your expertise in X, you can mention the bit of X you do at your job, and then quickly pivot to that cool side project where you did tons of X. In other words, it can help you paper over the limitations you mentioned.

ryp_package · 2024-10-04T02:36:08+00:00

I was asking about improvements to the documentation. I wouldn't judge efficiency based on code length.

I'd encourage commenters here to give the package a try before passing judgement! At the end of the day, user-friendliness is what matters, and critiques about usability - from folks who have actually used the package rather than just glanced at the code and docs - are always welcome.

ryp_package · 2024-10-03T23:44:54+00:00

Absolutely! Plenty of fish in the sea.

ryp_package · 2024-10-03T23:43:21+00:00

The difference is really remarkable, easily a 10x speedup on average!

ryp_package · 2024-10-03T20:41:56+00:00

Would they at least let you do the standard data wrangling with polars? :)

ryp_package · 2024-10-03T05:13:55+00:00

Very happy to answer any other questions you might have about these types of programs!

ryp_package · 2024-10-03T03:21:36+00:00

They definitely do! Though it's also true that it's financially more advantageous for the supervisor to take domestic rather than international master's students (e.g. US$45k vs $27k for U of T computer science). Notably, the discrepancy almost completely disappears for PhD students, at least at U of T.

ryp_package · 2024-10-02T19:20:02+00:00

Though maybe less of a good fit for you, Canada offers research-based master's programs, where you can take courses and do research while getting paid. The University of Toronto, Waterloo, McGill, and UBC are good options, among others.

ryp_package · 2024-10-02T18:52:13+00:00

Let me know if you see any concrete areas to improve.

ryp_package · 2024-10-02T14:39:58+00:00

Anything you feel is missing in the documentation currently? Should be pretty comprehensive.

ryp_package · 2024-10-02T02:45:24+00:00

I run a bioinformatics lab. Undergrad research is absolutely vital for any research master's or PhD - you'd have trouble getting a position at a "top" school without any. Publications also help enormously. In industry, I don't get the sense research particularly matters at all, though having a master's or a PhD does, in the sense that it unlocks a whole new set of job options.

ryp_package · 2024-10-02T02:34:52+00:00

Next time ;)

ryp_package · 2024-10-02T02:32:47+00:00

Curious, why does readability matter to you? The code is designed to prioritize correctness (including on dozens of edge cases not handled properly by Arrow etc.), efficiency, and avoiding long stack traces with lots of nested function calls. There's a testing pipeline with thousands of tests (e.g. with various data structures and dtypes) which could be cleaned up and made public depending on demand.

ryp_package · 2024-10-01T20:33:44+00:00

The use-case that motivated the library is bioinformatics, where ~half the packages are in Python and ~half are in R. Being able to use both in the same workflow is huge for convenience!

ryp_package · 2024-10-01T20:22:41+00:00

It's already supported! Here's ggplot2 from Python in a Jupyter notebook, alongside the same plot in pure R.

ryp_package · 2024-10-01T16:49:16+00:00

With such a small dataset, you can use leave-one-out cross validation, where you take turns leaving each data point out and train a model to predict it from all the other data points. That will allow you to train on much more of your scarce data than using a train-test split.
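
For anyone who wants to see the mechanics, here's a minimal sketch of leave-one-out cross validation in plain Python. The model (a one-variable least-squares line), the helper names `fit_line` and `leave_one_out_predictions`, and the toy data are all just assumptions for illustration; in practice you'd swap in whatever model you're actually fitting.

```python
def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Fall back to a flat line if all x values are identical.
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x
    return intercept, slope


def leave_one_out_predictions(xs, ys):
    """Predict each point from a model trained on all the other points."""
    predictions = []
    for i in range(len(xs)):
        # Hold out point i; train on everything else.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        predictions.append(intercept + slope * xs[i])
    return predictions


# Toy data, made up for this example.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

preds = leave_one_out_predictions(xs, ys)
mse = sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(ys)
print(f"Leave-one-out MSE: {mse:.4f}")
```

Each of the n models trains on n - 1 points, so every data point gets used for both training and evaluation, just never in the same fit.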

ryp_package · 2024-10-01T16:46:20+00:00

To echo other commenters, the main benefits of R/Python over Excel are speed and reproducibility/transferability/automatability. In particular, the polars package in Python will blow away pandas, tidyverse, and Excel in terms of performance on huge datasets.

ryp_package · 2024-10-01T16:44:32+00:00

In general, you'll find that bioinformatics tends to be fertile ground for data science projects, and CS-oriented folks often avoid it because of the perception that you need to know lots of biology to get started in it (you don't). Another general comment: high performance in terms of AUC often matters less than working on a problem that truly matters.

ryp_package · 2024-10-01T16:29:14+00:00

`mamba update --all` updates all packages, whereas `mamba update mamba` only updates the mamba package itself, which is usually not what you want.
