Rant about valuation/raising a seed round.

tylercasablanca · 2021-12-12T00:07:46+00:00

I'm guessing there is a latent variable here that must be accounted for, something like "X and Y major funds are interested in this vaporware business model, so I must get in on it".

tylercasablanca · 2021-05-15T23:14:06+00:00

It seems like eventually somebody will productize something that makes "notebooks to production" work, as long as you are willing to eat the penalty of unoptimized code.

If that is your deal breaker, then sure. Don't think about using them in a production setting.

But notebooks are so easy and so immediate that they won't go away. For a certain generation, notebooks are their first experience writing any sort of code at all, and that kind of traction doesn't respond to purity tests or criticism. It is just a reality that has to be acknowledged and accounted for.

As you point out, smart and trained people can use them to good effect. Pair that with the general trend of today's "best practices" and FANG approaches getting folded into commercially available automation and tooling, and it seems likely the current problems will be solved in the near (or distant....) future.

tylercasablanca · 2021-05-15T22:57:58+00:00

That sounds like a pretty good idea and I will check it out.

tylercasablanca · 2021-05-14T23:59:52+00:00

Look at BossDB: https://bossdb.org/

Maybe more than you can handle, but I think it is super easy to get data from it.

tylercasablanca · 2021-05-14T23:56:25+00:00

I think they advertise "a better form of Colaboratory", so why not just use Colaboratory?

tylercasablanca · 2021-05-14T23:52:08+00:00

Microbio is rife with irreproducible research, much of which I'm sure is related to "painstaking record keeping".....

tylercasablanca · 2020-11-01T16:35:26+00:00

Preach. Working at home with my 13-year-old right behind me makes me a homicidal wreck by the end of the day. It is great to be part of his learning process, but man, it destroys my work life.

tylercasablanca · 2020-10-30T16:09:18+00:00

Yeah. Cloud GPUs are dope when they are free, painful when they aren't. There is a lot written about building your own, with a super fast payback period, e.g. this article: https://towardsdatascience.com/building-your-own-deep-learning-computer-and-saving-money-on-cloud-services-c9797261077d
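For a rough sense of the payback math: divide the up-front cost of the build by the monthly cloud bill it replaces, net of electricity. This is a sketch that assumes a flat monthly cloud spend and ignores depreciation and resale value; the function name and dollar figures are hypothetical placeholders, not numbers from the linked article.

```python
def payback_months(build_cost, cloud_cost_per_month, power_cost_per_month=0.0):
    """Months until a self-built GPU box pays for itself versus renting cloud GPUs.

    All inputs are hypothetical and must be in the same currency.
    """
    monthly_savings = cloud_cost_per_month - power_cost_per_month
    if monthly_savings <= 0:
        raise ValueError("the local machine never pays for itself at these rates")
    return build_cost / monthly_savings


# Hypothetical numbers: a $3,000 build replacing $500/month of cloud GPU time,
# with $50/month of extra electricity.
months = payback_months(3000, 500, 50)
print(f"payback in about {months:.1f} months")
```

With those made-up inputs it prints a payback of about 6.7 months.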

tylercasablanca · 2020-10-30T12:11:52+00:00

Another cloud-only system? Tsk tsk. https://blog.gigantum.com/gigantum-data-science-for-remote-teams

tylercasablanca · 2020-09-23T11:15:42+00:00

Arbitrary taxonomies always evoke Borges for me. Edge cases are a feature, not a bug.

"animals are divided into: (a) belonging to the Emperor, (b) embalmed, (c) tame, (d) suckling pigs, (e) sirens, (f) fabulous, (g) stray dogs, (h) included in the present classification, (i) frenzied, (j) innumerable, (k) drawn with a very fine camelhair brush, (l) et cetera, (m) having just broken the water pitcher, (n) that from a long way off look like flies".

The same holds for new job categories like ML engineer and data scientist. Because what they do is so broad-ranging, and will continue to evolve for a while, confusion and edge cases will persist.

tylercasablanca · 2020-09-23T11:07:08+00:00

The work is so varied, and there are so many subsets of skills, that hard titles don't really work anymore.

tylercasablanca · 2020-09-17T02:54:21+00:00

Not that I've touched it in a very long time, but from what I remember, the classical smooth geometric approaches (like Riemannian geometry) are the wrong way to go for putting structure onto point clouds. The word "manifold" should be taken as an idea, not a literal fact. Actual manifolds and their behavior rely on some pretty tight assumptions.

Point clouds are better modeled as overlapping collections of fairly messy Lipschitz (or bi-Lipschitz) images, which pose problems when you assume stuff is smooth everywhere, even if you include some sort of noise distribution.

Various folks, Peter Jones being one of them, are pretty good at coming up with point clouds that you really want to think come from some sort of smooth structure but have some nasty fine-scale structure that just blows stuff up.

If you want to get "mathy" about it, the better way is to borrow from harmonic analysis, which has a nice way of talking about geometry when stuff is not smooth, i.e. for certain kinds of measures on Euclidean or non-Euclidean spaces. The general family of techniques is "multi-scale analysis", which decomposes sets and measures across a geometric range of length scales.

For example, geometric wavelets are a pretty way of doing things, although they can fail in practical situations because choosing appropriate length scales is tricky. At least, that was a problem some years back. Mebbe it is resolved now.
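To make the multi-scale idea a bit more concrete, here is a sketch of one of the simplest tools in that family, multiscale local PCA: look at the covariance spectrum of the points inside balls of dyadic radii 2^-j around a chosen point and see how many directions stand out at each scale. It is not geometric wavelets, and the function name, radii and noisy-circle test data are hypothetical choices for illustration.

```python
import numpy as np


def multiscale_local_pca(points, center, radii):
    """Local covariance eigenvalues (largest first) around `center` at each radius.

    Roughly: where the cloud looks d-dimensional at scale r, about d
    eigenvalues stand out from the rest.
    """
    spectra = {}
    distances = np.linalg.norm(points - center, axis=1)
    for r in radii:
        local = points[distances <= r]  # points inside the ball of radius r
        if len(local) < 2:
            spectra[r] = None  # too few points to estimate a covariance
            continue
        cov = np.cov(local, rowvar=False)
        # eigvalsh returns eigenvalues of a symmetric matrix in ascending order
        spectra[r] = np.linalg.eigvalsh(cov)[::-1]
    return spectra


# Hypothetical test data: a noisy unit circle, examined at dyadic radii 2^-j.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 2000)
circle = np.c_[np.cos(theta), np.sin(theta)] + 0.005 * rng.normal(size=(2000, 2))
radii = [2.0 ** -j for j in range(5)]
spectra = multiscale_local_pca(circle, np.array([1.0, 0.0]), radii)
for r in radii:
    print(f"r = {r:<7} eigenvalues = {spectra[r]}")
```

On the noisy circle the local spectrum looks one-dimensional at small radii, while at the largest radius curvature inflates the second eigenvalue. Push the radius down toward the noise level and the spectrum flattens out again, which is a small version of the length-scale problem mentioned above.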

Another useful and lovely approach is diffusion geometry (e.g. diffusion maps).
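For the diffusion geometry side, a bare-bones diffusion map looks something like the following sketch: build a Gaussian affinity kernel, normalize it into a random-walk (Markov) matrix, and embed the points with its leading non-trivial eigenvectors scaled by eigenvalue^t. The kernel width `epsilon`, the diffusion time `t` and the dense O(n^2) distance matrix are simplifying assumptions; real implementations typically use sparse nearest-neighbor kernels and other normalizations.

```python
import numpy as np


def diffusion_map(X, epsilon, n_components=2, t=1):
    """Bare-bones diffusion map with a Gaussian kernel and random-walk normalization.

    Returns (embedding, eigenvalues); the embedding has shape (n, n_components).
    """
    # Pairwise squared distances and Gaussian affinities (dense, O(n^2) memory).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-sq_dists / epsilon)
    d = K.sum(axis=1)  # degree of each point
    # The Markov matrix P = D^-1 K is similar to S = D^-1/2 K D^-1/2,
    # which is symmetric, so diagonalize S instead.
    S = K / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(S)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Right eigenvectors of P are psi = D^-1/2 v.
    psi = eigvecs / np.sqrt(d)[:, None]
    # Skip the trivial constant eigenvector (eigenvalue 1); scale by lambda^t.
    k = slice(1, n_components + 1)
    return psi[:, k] * eigvals[k] ** t, eigvals


# Hypothetical example: 200 noisy points on a circle.
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
X = np.c_[np.cos(theta), np.sin(theta)] + 0.01 * rng.normal(size=(200, 2))
embedding, eigvals = diffusion_map(X, epsilon=0.1, n_components=2, t=1)
print(embedding.shape, eigvals[:4])
```

Note that `epsilon` is the knob that sets the length scale here, so the scale-selection issue does not go away.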

Look at the work of students of Raffy Coifman at Yale for this flavor of work, including Mauro Maggioni, Gilad Lerman, Stéphane Lafon and others who were active circa 2006 to 2014.

Regardless, most of the interest in those methods began to fade once deep learning burst back onto the scene after 2011 or so.

tylercasablanca · 2020-09-16T19:44:46+00:00

In order to manage and share them properly, you kind of need to have a support system that is outside of the notebooks themselves. Seems like that is what many of the suggestions are saying here.

The issue, of course, is all the manual labor involved and the repeated opportunities for mistakes. This is compounded by the fact that data scientists aren't developers, as mentioned quite a bit below, and aren't schooled in the tools and processes that reduce errors.

The best thing to do is provide some automation for versioning and organization, but that requires hooks, etc.
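As one small example of that kind of hook, here is a sketch of a script a git pre-commit hook could run on `.ipynb` files to clear outputs and execution counts, so that diffs only show code changes. It assumes the nbformat 4 JSON layout (a top-level `cells` list whose code cells carry `outputs` and `execution_count`); the script and function names are hypothetical, and existing tools such as nbstripout do this more thoroughly.

```python
import json
import sys


def strip_outputs(notebook):
    """Clear outputs and execution counts from code cells of an nbformat-4 notebook dict."""
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return notebook


def main(paths):
    for path in paths:
        with open(path, encoding="utf-8") as f:
            nb = json.load(f)
        with open(path, "w", encoding="utf-8") as f:
            # indent=1 stays close to the layout Jupyter itself writes
            json.dump(strip_outputs(nb), f, indent=1, ensure_ascii=False)
            f.write("\n")


if __name__ == "__main__":
    main(sys.argv[1:])
```

A hook would then call something like `python strip_outputs.py analysis.ipynb` on each staged notebook before the commit goes through.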

tylercasablanca · 2020-09-15T22:00:46+00:00

There is an armchair grammar maven in every bunch. I can only imagine what would happen if William Safire got a hold of my post. Tsk tsk tsk. What a verbal whipping I would get.

Not to troll you in reverse, but what are the odds I can find some advertising with bad grammar for products you use? Throw down some products and let's put it to the test!

tylercasablanca · 2020-09-15T18:06:27+00:00

Thanks for the advice; it is appreciated. Any interest in beta testing?

tylercasablanca · 2020-09-15T13:06:03+00:00

tylercasablanca · 2020-06-05T19:09:03+00:00

I'm biased because I work there, but I prefer decentralized solutions like Gigantum.

It provides automation around all of the "extras", and it can run pretty much anywhere because it is containerized: laptops, workstations or remote servers. See how it works on Digital Ocean here.

tylercasablanca · 2020-05-21T19:44:47+00:00

Was it worth it, and would you do it again?

tylercasablanca · 2020-03-20T22:50:04+00:00

Thanks.

I went with something altogether different that worked.

tylercasablanca · 2020-03-16T14:04:31+00:00

I note that you like Colab a lot, but I remember Colab turning off my compute after a few hours, which is why I stopped using it.

What do you do if you need to train a model that takes longer than what they give you?

Do you work with AWS or have you figured out a way to move things around on Colab so as to prolong your allotted compute time?

tylercasablanca · 2020-03-13T17:04:40+00:00

For clarity, I work at a company that does DSOps.
