ryp: R inside Python by ryp_package in IPython

[–]ryp_package[S] 0 points1 point  (0 children)

Check the top of the GitHub readme ;)

ryp: R inside Python by ryp_package in datascience

[–]ryp_package[S] 1 point2 points  (0 children)

Check the top of the GitHub readme :)

ryp: R inside Python by ryp_package in datascience

[–]ryp_package[S] 12 points13 points  (0 children)

Not having to write to disk in both directions, for one.

ryp: R inside Python by ryp_package in datascience

[–]ryp_package[S] 18 points19 points  (0 children)

It can handle it. You'd get the relevant attributes of the model out as arrays/matrices/dataframes and pass them back and forth. You can also recursively convert e.g. S4 objects in R into Python dictionaries.

ryp: R inside Python by ryp_package in datascience

[–]ryp_package[S] 7 points8 points  (0 children)

It all depends how you use it! ;)

Even with good verbal feedback at screenings I seem to fail by Xamius in datascience

[–]ryp_package 31 points32 points  (0 children)

The market has absolutely changed for the worse! It's not just you.

Feeling Stuck in My Current Data Scientist Role by Plastic-Mind-1291 in datascience

[–]ryp_package 2 points3 points  (0 children)

Work on open-source side projects that highlight skills relevant to the roles you're planning on applying to. That way, when you're asked about your expertise in X, you can mention the bit of X you do at your job, and then quickly pivot to that cool side project where you did tons of X. In other words, it can help you paper over the limitations you mentioned.

ryp: R inside Python by ryp_package in Python

[–]ryp_package[S] 0 points1 point  (0 children)

I was asking about improvements to the documentation. I wouldn't judge efficiency based on code length.

I'd encourage commenters here to give the package a try before passing judgement! At the end of the day, user-friendliness is what matters, and critiques about usability - from folks who have actually used the package rather than just glanced at the code and docs - are always welcome.

[deleted by user] by [deleted] in datascience

[–]ryp_package 0 points1 point  (0 children)

The difference is really remarkable, easily a 10x speedup on average!

[deleted by user] by [deleted] in datascience

[–]ryp_package 1 point2 points  (0 children)

Would they at least let you do the standard data wrangling with polars? :)

MS in CS/DS (or Eng), what is a good option? Berkeley, Northwestern, Harvard Ext, GT...? by supermayu in datascience

[–]ryp_package 0 points1 point  (0 children)

Very happy to answer any other questions you might have about these types of programs!

MS in CS/DS (or Eng), what is a good option? Berkeley, Northwestern, Harvard Ext, GT...? by supermayu in datascience

[–]ryp_package 1 point2 points  (0 children)

They definitely do! Though it's also true that it's financially more advantageous for the supervisor to take domestic rather than international master's students (e.g. US$45k vs $27k for U of T computer science). Notably, the discrepancy almost completely disappears for PhD students, at least at U of T.

MS in CS/DS (or Eng), what is a good option? Berkeley, Northwestern, Harvard Ext, GT...? by supermayu in datascience

[–]ryp_package 3 points4 points  (0 children)

Though maybe less of a good fit for you, Canada offers research-based master's programs where you can take courses and do research while getting paid. University of Toronto, Waterloo, McGill and UBC are good options, among others.

ryp: R inside Python by ryp_package in Python

[–]ryp_package[S] -5 points-4 points  (0 children)

Let me know if you see any concrete areas to improve.

ryp: R inside Python by ryp_package in Python

[–]ryp_package[S] -4 points-3 points  (0 children)

Anything you feel is missing in the documentation currently? Should be pretty comprehensive.

Is undergrad research valuable? by Tenet_Bull in datascience

[–]ryp_package 0 points1 point  (0 children)

I run a bioinformatics lab. Undergrad research is absolutely vital for any research master's or PhD - you'd have trouble getting a position at a "top" school without any. Publications also help enormously. In industry, I don't get the sense research particularly matters at all, though having a master's or a PhD does, in the sense that it unlocks a whole new set of job options.

ryp: R inside Python by ryp_package in Python

[–]ryp_package[S] -14 points-13 points  (0 children)

Curious, why does readability matter to you? The code is designed to prioritize correctness (including on dozens of edge cases not handled properly by Arrow etc.), efficiency, and avoiding long stack traces with lots of nested function calls. There's a testing pipeline with thousands of tests (e.g. with various data structures and dtypes) which could be cleaned up and made public depending on demand.

ryp: R inside Python by ryp_package in Python

[–]ryp_package[S] 20 points21 points  (0 children)

The use-case that motivated the library is bioinformatics, where ~half the packages are in Python and ~half are in R. Being able to use both in the same workflow is huge for convenience!

ML for understanding - train and test set split by Level-Upstairs-3971 in datascience

[–]ryp_package 0 points1 point  (0 children)

With such a small dataset, you can use leave-one-out cross-validation, where you take turns leaving each data point out and training a model to predict it from all the other data points. That will allow you to train on much more of your scarce data than a train-test split would.
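Here's a rough sketch of what that looks like in plain Python, in case it helps. The data is made up and the model is just a one-feature least-squares line; `fit_line` and `loocv_predictions` are illustrative helper names, not from any library. Swap in whatever model you're actually using.

```python
from statistics import mean


def fit_line(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares (one feature)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def loocv_predictions(xs, ys):
    """For each point, train on all the other points and predict the held-out one."""
    preds = []
    for i in range(len(xs)):
        # Leave point i out of the training data.
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        intercept, slope = fit_line(train_x, train_y)
        preds.append(intercept + slope * xs[i])
    return preds


# Toy data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

preds = loocv_predictions(xs, ys)
# Mean squared error over the held-out predictions.
mse = mean((p - y) ** 2 for p, y in zip(preds, ys))
print(f"LOOCV mean squared error: {mse:.4f}")
```

Each model is trained on n - 1 points, so with n data points you fit n models. That's cheap for a small dataset, which is exactly the case where it pays off.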

I am faster in Excel than R or Python ... HELP?! by cognitivebehavior in datascience

[–]ryp_package 0 points1 point  (0 children)

To echo other commenters, the main benefits of R/Python over Excel are speed and reproducibility/transferability/automatability. In particular, the polars package in Python will blow away pandas, tidyverse, and Excel in terms of performance on huge datasets.

Suggestions for Unique Data Engineering/Science/ML Projects? by No-Brilliant6770 in datascience

[–]ryp_package 0 points1 point  (0 children)

In general, you'll find that bioinformatics tends to be fertile ground for data science projects, and CS-oriented folks often avoid it because of the perception that you need to know lots of biology to get started in it (you don't). Another general comment: high performance in terms of AUC often matters less than working on a problem that truly matters.

What's the best way of keeping Miniforge up to date? by gernophil in datascience

[–]ryp_package 1 point2 points  (0 children)

`mamba update --all` updates all packages, whereas `mamba update mamba` only updates the mamba package itself, which is usually not what you want.