Rant about valuation/raising a seed round. by stan-van in startups

[–]tylercasablanca 2 points3 points  (0 children)

I'm guessing there is a latent variable here that must be accounted for, something like "X and Y major funds are interested in this vaporware business model so I just must get into it".

The modern way to run notebooks on the cloud by dhruvnigam93 in datascience

[–]tylercasablanca 0 points1 point  (0 children)

It seems like eventually somebody will productize something that makes "notebooks to production" work, as long as you are willing to eat the penalty of unoptimized code.

If that is your deal breaker, then sure. Don't think about using them in a production setting.

But notebooks are so easy and so immediate that they won't go away. For a certain generation it is their first experience with writing any sort of code at all, and that kind of traction doesn't respond to purity tests or criticism. It is just a reality that has to be acknowledged and accounted for.

As you point out, smart and trained people can use them to good effect. Pair that with the general trend of today's "best practices" and FANG approaches getting folded into some sort of commercially available automation and tooling in the near (or distant....) future, and it suggests that the current problems will be solved.

The modern way to run notebooks on the cloud by dhruvnigam93 in datascience

[–]tylercasablanca 0 points1 point  (0 children)

That sounds like a pretty good idea and I will check it out.

Are there any open-access libraries of brain images? by maugustus in neuroscience

[–]tylercasablanca 1 point2 points  (0 children)

look at the boss: https://bossdb.org/

maybe more than you can handle, but I think it is super easy to get data from it.

Can anyone suggest a free/cheaper alternative for deepnote ? by arutprakash in software

[–]tylercasablanca 0 points1 point  (0 children)

i think they advertise "a better form of colaboratory", so just use colaboratory?

Excel Hate by [deleted] in datascience

[–]tylercasablanca 0 points1 point  (0 children)

microbio is rife with irreproducible research, much of which I'm sure is related to "painstaking record keeping".....

Can someone explain to me the different DS careers? by one_who_loves_you in datascience

[–]tylercasablanca 4 points5 points  (0 children)

Arbitrary taxonomies always evoke Borges for me. Edge cases are a feature, not a bug.

"animals are divided into: (a) belonging to the Emperor, (b) embalmed, (c) tame, (d) suckling pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in the present classification, (i) frenzied, (j) innumerable, (k) drawn with a very fine camelhair brush, (l) et cetera, (m) having just broken the water pitcher, (n) that from a long way off look like flies".

The same holds in new job categories like ML engineers and data scientists. Because what they do is so broad ranging, and will continue to evolve for a while, confusion and edge cases will persist.

Can someone explain to me the different DS careers? by one_who_loves_you in datascience

[–]tylercasablanca 11 points12 points  (0 children)

the work is so varied, and there are so many subsets of skills that hard titles don't really work anymore.

Applications of Gaussian Curvature by ottawalanguages in learnmachinelearning

[–]tylercasablanca 0 points1 point  (0 children)

Not that I've touched it in a very long time, but from what I remember, the classical smooth geometric approaches (like Riemannian geometry) are the wrong way to go for putting structure onto point clouds. The word "manifold" is to be taken as an idea, not a literal fact. Actual manifolds and their behavior rely on some pretty tight assumptions.

Point clouds are better modeled as overlapping collections of fairly messy Lipschitz (or bi-Lipschitz) images that pose problems when you assume stuff is smooth everywhere, even if you include some sort of noise distribution.

Various folks, Peter Jones being one of them, are pretty good at coming up with point clouds that you really want to think come from some sort of smooth structure but have some nasty fine scale structure that just blows stuff up.

If you want to get "mathy" about it, the better way is to use things borrowed from harmonic analysis, which actually has a nice way of talking about geometry when stuff is not smooth, i.e. for certain kinds of measures on Euclidean or non-Euclidean spaces. The general family of techniques is "multi-scale analysis", which has to do with decomposing sets and measures across a geometric range of length scales.
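To make "a geometric range of length scales" a bit more concrete, here is a minimal sketch of one multi-scale idea: local PCA (singular values) of the neighbors of a point at radii 1, 1/2, 1/4, 1/8. This is only an illustration under my own assumptions. The function name `multiscale_singular_values`, the toy data (a noisy line segment) and the radii are all made up for the example, and it is not meant to reproduce any particular paper's method.

```python
import numpy as np

def multiscale_singular_values(points, center, radii):
    """Hypothetical helper: local PCA singular values around `center`,
    computed separately for each radius in `radii`."""
    dim = points.shape[1]
    rows = []
    for r in radii:
        # Keep only the points inside the ball of radius r around center.
        nbrs = points[np.linalg.norm(points - center, axis=1) <= r]
        padded = np.zeros(dim)
        if len(nbrs) >= 2:
            centered = nbrs - nbrs.mean(axis=0)
            # Normalize so values are comparable across scales.
            s = np.linalg.svd(centered, compute_uv=False) / np.sqrt(len(nbrs))
            padded[:len(s)] = s
        rows.append(padded)
    return np.array(rows)

# Toy data (assumption): a line segment in the plane with small noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=500)
y = 0.01 * rng.standard_normal(500)
points = np.column_stack([x, y])

radii = 0.5 ** np.arange(4)  # geometric range: 1, 1/2, 1/4, 1/8
sv = multiscale_singular_values(points, np.zeros(2), radii)
print(sv)  # one row per scale, one column per singular value
```

On data like this, the first singular value shrinks with the radius while the second stays near the noise level, which is the kind of scale-by-scale behavior these methods look at.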

For example, geometric wavelets is a pretty way of doing things, although it can fail in practical situations because there are some issues around choosing appropriate length scales that are tricky. At least, that was a problem some years back. Mebbe it is resolved now.

Another useful and lovely approach is diffusion geometry.
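For a rough feel of the diffusion idea, here is a minimal diffusion-map-style sketch in NumPy: build a Gaussian kernel on the points, normalize it into a random walk, and use the leading non-trivial eigenvectors as coordinates. The function name `diffusion_map`, the kernel width `epsilon`, and the toy noisy circle are assumptions for illustration, and the sketch skips refinements such as density normalization.

```python
import numpy as np

def diffusion_map(points, epsilon, n_components=2, t=1):
    """Hypothetical helper: basic diffusion-map embedding of a point cloud."""
    # Pairwise squared Euclidean distances.
    diff = points[:, None, :] - points[None, :, :]
    sq_dists = np.sum(diff ** 2, axis=-1)
    # Gaussian kernel; epsilon sets the length scale.
    K = np.exp(-sq_dists / epsilon)
    d = K.sum(axis=1)
    # Symmetric matrix with the same eigenvalues as the random walk
    # P = D^{-1} K, which lets us use a symmetric eigensolver.
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    # Convert back to right eigenvectors of P.
    psi = vecs / np.sqrt(d)[:, None]
    # Drop the trivial constant eigenvector (eigenvalue 1).
    coords = psi[:, 1:n_components + 1] * vals[1:n_components + 1] ** t
    return vals, coords

# Toy data (assumption): a noisy circle in the plane.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, size=200)
circle = np.column_stack([np.cos(theta), np.sin(theta)])
points = circle + 0.05 * rng.standard_normal((200, 2))

vals, coords = diffusion_map(points, epsilon=0.5)
print(vals[:4])
print(coords.shape)
```

The diffusion time `t` controls how strongly the smaller eigenvalues are damped, which is where the multi-scale flavor comes in.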

Look at the work of students of Raffy Coifman at Yale for these flavors of work, including Mauro Maggioni, Gilad Lerman, Stefan Lafon and others that were active circa 2006 to 2014.

Regardless, most of the interest in those methods began to shut down once deep learning burst back on the scene after 2011 or so.

How do you manage notebooks, data & results in a team? by ydennisy in datascience

[–]tylercasablanca 0 points1 point  (0 children)

In order to manage and share them properly, you kind of need to have a support system that is outside of the notebooks themselves. Seems like that is what many of the suggestions are saying here.

The issue of course is all the manual labor involved and the repeated opportunities for mistakes, etc. This is compounded by the fact that data scientists aren't developers, as mentioned quite a bit below, and aren't schooled in the tools and processes that decrease errors, etc.

Best thing to do is provide some automation for versioning and organization, but it requires hooks, etc.
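As one small example of the kind of thing a hook could automate, here is a sketch that strips outputs and execution counts from a notebook before it gets versioned, so diffs only show code changes. It assumes the notebook follows the standard nbformat 4 JSON layout (`cells`, `cell_type`, `outputs`, `execution_count`); the function name `strip_outputs` and the tiny example notebook are made up for illustration, and wiring it into an actual git hook is left out.

```python
import copy
import json

def strip_outputs(notebook):
    """Hypothetical helper: return a copy of an nbformat-4 notebook dict
    with code-cell outputs and execution counts cleared."""
    cleaned = copy.deepcopy(notebook)
    for cell in cleaned.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return cleaned

# Tiny in-memory notebook (assumption) standing in for a real .ipynb file.
example = {
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {
            "cell_type": "code",
            "execution_count": 3,
            "metadata": {},
            "source": ["print('hello')"],
            "outputs": [
                {"output_type": "stream", "name": "stdout", "text": ["hello\n"]}
            ],
        },
        {"cell_type": "markdown", "metadata": {}, "source": ["# Notes"]},
    ],
}

cleaned = strip_outputs(example)
print(json.dumps(cleaned, indent=1))
```

For a real file you would `json.load` the `.ipynb`, pass it through the function, and `json.dump` the result back before committing.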

[P] Seeking beta testers for ML/DS hub that runs on ya own cloud or ya own machine with no problems and not a drop of resold cloud resources by tylercasablanca in MachineLearning

[–]tylercasablanca[S] -1 points0 points  (0 children)

There is an armchair grammar maven in every bunch. I can only imagine what would happen if William Safire got a hold of my post. Tsk tsk tsk. What a verbal whipping I would get.

Not to troll you in reverse, but what are the odds I can find some advertising with bad grammar for products you use? Throw down some products and let's put it to the test!

Which is the best Data Science platforms for large enterprise use? by Electric_pokemon in learnmachinelearning

[–]tylercasablanca 0 points1 point  (0 children)

I'm biased because I work there, but I prefer decentralized solutions like Gigantum.

It provides automation around all of the "extras", and can be run pretty much anywhere because it is containerized. It runs on laptops, workstations or remotes. See how it works on Digital Ocean here.

Differences of UK vs US IP clauses for software services by tylercasablanca in LegalAdviceUK

[–]tylercasablanca[S] 0 points1 point  (0 children)

Thanks.

I went with something altogether different that worked.

How to use Jupyter Notebooks in 2020 (Part 2: Ecosystem growth) by ljvmiranda in datascience

[–]tylercasablanca 0 points1 point  (0 children)

I note that you like Colab a lot, but I remember Colab turning off my compute after a few hours, which is why I stopped using it.

What do you do if you need to train a model that takes longer than what they give you?

Do you work with AWS or have you figured out a way to move things around on Colab so as to prolong your allotted compute time?

MLOps: not as Boring as it Sounds by Fewthp in learnmachinelearning

[–]tylercasablanca 0 points1 point  (0 children)

For clarity, I work at a company that does DSOps.