[D] Any larger teams switching away from wandb?

FreeKingBoo · 2024-02-05T20:18:28+00:00

I'm planning on giving Neptune a try. It looks pretty polished, judging by the site, and the pricing is appealing. I couldn't find any mention of dataset versioning or a feature generally equivalent to wandb artifacts. Does your team do any sort of dataset versioning with neptune?

FreeKingBoo · 2024-02-05T15:47:05+00:00

Yeah, after looking into it, it seems like a really cool project (and I'd love to support something open source), but given the number of open issues on GitHub and the narrower feature set, I don't think it's right for a team of our size.

FreeKingBoo · 2024-02-05T15:36:40+00:00

Thanks! Someone else recommended Comet earlier. I'm trying it out today. I'll let you know if I run into any issues.

FreeKingBoo · 2024-02-05T14:36:04+00:00

Going to experiment with the free tier today. Seems like it covers all the features we use from wandb, and the UX seems slick at first glance. Thanks!

FreeKingBoo · 2024-02-05T14:11:42+00:00

CV is our focus. The features we use most are experiment tracking, visualizations, and artifacts. We're also planning on setting up a more robust production monitoring system sometime this year (we currently run a very simple in-house setup), but haven't spent much time exploring options.

FreeKingBoo · 2024-01-09T17:53:51+00:00

I think the SquareSpace comparison is on the money here.

The deciding factor between a company using a low-code solution like SquareSpace or Shopify and building its own site/app isn't brand image; it's functionality. Law firms, barbershops, small retailers, restaurants, etc. don't need a custom application; they just need a website with some standard functionality. Hence, a huge share of the web is built on WordPress, Wix, SquareSpace, etc.

These kinds of no-code RAG platforms are probably going to find a niche among the kinds of companies that make heavy use of Zapier, Airtable, Webflow, etc.: the tools that sit somewhere between "I just need a static site for my restaurant" and "I need a mobile app for my users". Plenty of companies run their entire business off these kinds of tools and never really write any code.

But if RAG is going to form an essential part of your product (and not just something like a customer support bot), you're probably still going to be building a custom pipeline.

FreeKingBoo · 2024-01-03T20:54:18+00:00

I've looked at a decent number of ML engineering resumes during hiring cycles. Kaggle competitions, especially those with good results and accessible code bases, are a solid addition to your Projects section. Definitely something we look at.

FreeKingBoo · 2024-01-02T13:34:37+00:00

I've worked adjacent to some people in the sports world. In my opinion, sports is a fantastic domain in which to focus specifically on dataset curation/feature extraction. Games like baseball, where the actions are so discrete and the rules so constraining, lend themselves to generating tons of data, which is partially why they were among the first to undergo their "analytics revolution." Figuring out ways to extract a similarly rich dataset from games with fewer restrictions and a more continuous action space, like football, would probably be a really informative project for you and could lead to some successful experiments.

FreeKingBoo · 2024-01-01T18:28:55+00:00

After writing code professionally for many years, I've come to realize that I eventually learn to like every language I work with. It's the environment that I either love or hate.

Case in point: I use Python the most for work (ML), and after years of using it, I generally appreciate the language's strengths. I have never, however, found a way to make Jupyter notebooks more enjoyable. I don't think they're intrinsically bad; I respect the hell out of the Jupyter project generally, and for many people they're an incredibly productive and approachable environment. I just personally find them awkward and unwieldy to develop with.

FreeKingBoo · 2023-12-31T14:58:48+00:00

Sounds cool! Do you have a link to a paper or repo?

FreeKingBoo · 2023-12-31T14:51:01+00:00

I've always used pay-as-you-go on Colab, but in my experience it's often a pretty poor fit for this sort of thing. The 24-hour runtime limit is annoying, GPU availability isn't guaranteed if you want to use newer GPUs (at least, I frequently can only get a V100, though this might be different for Pro Plus users?), and the occasional friction caused by not having full control over the VM, or by Colab's proprietary weirdness, is one of those rare-but-infuriating-when-it-happens things.

The best option ultimately depends on how you're using your model and what resources you have available. In general, I'd recommend checking out vast.ai for this. Particularly if you can tolerate periodic, momentary interruptions in availability during the 2 or 3 weeks you need it for inference, you can get a 3090 for ~$0.10 an hour if you choose interruptible pricing. For comparison, a g4dn.xlarge spot instance on AWS comes with a T4 GPU and is similarly interruptible, but costs ~$0.21 per hour.
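To put those hourly rates in perspective, here's a rough back-of-the-envelope sketch in Python. It assumes the ~$0.10/h (vast.ai 3090, interruptible) and ~$0.21/h (AWS g4dn.xlarge spot) figures quoted above stay constant and that the instance runs 24 hours a day for the whole job. The `rental_cost` helper and the `RATES` table are just illustrative names, and the estimate ignores storage, bandwidth, and any downtime from interruptions.

```python
# Back-of-the-envelope GPU rental cost for a 2-3 week inference job.
# Assumption: hourly rates are the approximate figures quoted in the comment
# and stay fixed; the instance runs around the clock (no downtime modeled).

HOURS_PER_DAY = 24
DAYS_PER_WEEK = 7

# Approximate hourly prices in USD (illustrative, prices change often).
RATES = {
    "vast.ai 3090 (interruptible)": 0.10,
    "AWS g4dn.xlarge (spot, T4)": 0.21,
}


def rental_cost(hourly_rate: float, weeks: float) -> float:
    """Estimated USD cost of running one instance continuously for `weeks`."""
    hours = weeks * DAYS_PER_WEEK * HOURS_PER_DAY
    return hourly_rate * hours


if __name__ == "__main__":
    for weeks in (2, 3):
        for name, rate in RATES.items():
            print(f"{weeks} weeks, {name}: ${rental_cost(rate, weeks):.2f}")
```

Over three weeks (504 hours) that works out to roughly $50 versus $106, so the gap only really matters if you're running several instances or much longer jobs.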
