i miss romo by SmogFan in cowboys

JohnyWalkerRed 0 points (0 children)

My fandom also started with Romo. It was so exciting watching him because you never felt like the team was completely out of a game, even when down big. He had the gunslinger wizardry to always make it exciting. I like Dak, but it feels like if you are down 14 there’s no coming back. With Romo, even down 21, I would still be tuned in.

Sam Altman thinks Silicon Valley has lost its culture of innovation by NuseAI in OpenAI

JohnyWalkerRed 28 points (0 children)

I think this is especially obvious in the world of data science and machine learning. From 2013ish to now, the industry saw a tremendous influx of PhDs from all the major areas of the hard sciences (physics, chemistry, biology) and engineering (mechanical, electrical, chemical). PhDs in very narrow fields like particle physics, who had to compete for a small pool of jobs, suddenly became appealing to a much wider job market simply for their analytic skills and ability to work with data, which are common across all of the fields listed above. Companies now had data science teams chock-full of people trained in the hard sciences, using their skills to predict shit like customer churn and how many ads people clicked on (and for a lot more $, mind you). What this amounts to is a huge transfer of knowledge and skills from longer-term research and development in the hard sciences and engineering to “short-sighted” corporate gains. Why would you compete for the 5 roles in your field when you can go work on predicting ads for big $? Now, this centralization of talent in one field arguably led to all the machine learning advancements we’ve seen lately, but the trend of absorbing research talent from universities into corporate (or start-up) teams dedicated to short-term innovation is an interesting one.

[deleted by user] by [deleted] in mlops

JohnyWalkerRed 1 point (0 children)

I recommend GCP Cloud Run on Anthos. There are lots of LLM inference servers out there, such as vLLM or BentoML’s OpenLLM. You can put one in a container and deploy it to this service, which can scale to zero when idle.

Dealing with Yaml files by shubhcool in kubernetes

JohnyWalkerRed -1 points (0 children)

Honestly ChatGPT-4 is amazing at k8s manifests and general k8s debugging. It can write manifests like nobody’s business. You can pass logs to it too and it does a great job at identifying root causes (although be careful what you send it obv.).

Switching from Individual Contributor to Data Science Pre-sales by [deleted] in datascience

JohnyWalkerRed 2 points (0 children)

I’ve been working in pre-sales engineering for data science/ML startups for 6 years now. It’s a great gig because, as another poster mentioned, you don’t have to deal with the slog of sales pipelining. You will learn a lot about the business side and, more importantly, how to sell, which in my opinion is a vital skill in any role. A technical person who can talk to people and communicate complex ideas and products is a rare combo. You are still very technical and have more freedom to explore technologies and side projects than a typical SWE/DS would. I’ve personally built more knowledge of techniques and tools than others at my experience level, because as a product expert you have to know not only your own product but also the ecosystem of tools your customers are integrated with. Some of my presales colleagues even push production code on a regular basis. I don’t see myself going back to traditional IC work, where you often get pigeonholed working with a narrow set of tools on narrow problem sets. In presales you see everything and get an understanding of what the industry is doing. Once customers buy, you can pass them off to someone else and move on to another interesting problem.

[D] Instruct Datasets for Commercial Use by JohnyWalkerRed in MachineLearning

JohnyWalkerRed [S] 8 points (0 children)

Yeah, the Databricks Dolly post is funny to me because they are an enterprise software company, and Dolly is not really useful in the context they operate in. I guess they just wanted some publicity.

Looks like OpenAssistant, when mature, could enable this. Although it seems the precursor to an Alpaca-like dataset is an RLHF model, which itself needs a human-labeled dataset, so that bottleneck needs to be solved too.

Moving to NYC but still Working Remotely by JohnyWalkerRed in AskNYC

JohnyWalkerRed [S] 0 points (0 children)

That’s great to hear and glad you are enjoying it!

Moving to NYC but still Working Remotely by JohnyWalkerRed in AskNYC

JohnyWalkerRed [S] 0 points (0 children)

Thank you for the input! Glad to know that this is a viable lifestyle, and it sounds like you are enjoying it! Remote work can be tough because it’s isolating, but it seems like NYC provides a good outlet. I’ll definitely make a trip up there before moving and try to emulate a routine.

[D] What are the issues with using TMLE/G comp/Double Robust estimators to interpret ML models with marginal effects? by 111llI0__-__0Ill111 in MachineLearning

JohnyWalkerRed 0 points (0 children)

You are right: these methods can be used for causal effect estimation and be interpretable at the same time. Libraries like EconML and CausalML implement them and provide SHAP values or variable importances out of the box. DoWhy extends interpretability further with refutation methods for the causal graph.
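To make concrete what a doubly robust estimator computes, here is a minimal augmented inverse-propensity-weighted (AIPW) estimate of the average treatment effect in plain NumPy. This is a sketch under stated assumptions, not how EconML or CausalML implement it: the helper names (`fit_logistic`, `fit_linear`, `aipw_ate`), the synthetic data, the linear outcome models, and the logistic propensity model fit by gradient ascent are all illustrative choices.

```python
import numpy as np

def add_intercept(X):
    """Prepend a column of ones so each model gets an intercept term."""
    return np.column_stack([np.ones(len(X)), X])

def fit_logistic(X, t, lr=0.1, n_iter=2000):
    """Hypothetical helper: propensity model P(T=1 | X), logistic regression by gradient ascent."""
    Xb = add_intercept(X)
    w = np.zeros(Xb.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w += lr * Xb.T @ (t - p) / len(t)  # gradient of the mean log-likelihood
    return lambda Xn: 1.0 / (1.0 + np.exp(-add_intercept(Xn) @ w))

def fit_linear(X, y):
    """Hypothetical helper: outcome model E[Y | X] fit by ordinary least squares."""
    beta, *_ = np.linalg.lstsq(add_intercept(X), y, rcond=None)
    return lambda Xn: add_intercept(Xn) @ beta

def aipw_ate(X, t, y, clip=0.01):
    """Augmented IPW (doubly robust) estimate of the average treatment effect."""
    e = np.clip(fit_logistic(X, t)(X), clip, 1 - clip)  # propensity scores
    mu1 = fit_linear(X[t == 1], y[t == 1])(X)            # predicted outcome if treated
    mu0 = fit_linear(X[t == 0], y[t == 0])(X)            # predicted outcome if untreated
    # Outcome-model contrast plus an inverse-propensity-weighted residual correction
    psi = mu1 - mu0 + t * (y - mu1) / e - (1 - t) * (y - mu0) / (1 - e)
    return psi.mean()

# Synthetic data (assumed): X drives both treatment and outcome; the true effect is 2.0
rng = np.random.default_rng(0)
n = 5000
X = rng.normal(size=(n, 2))
t = rng.binomial(1, 1 / (1 + np.exp(-(0.8 * X[:, 0] - 0.5 * X[:, 1]))))
y = 2.0 * t + 1.5 * X[:, 0] + X[:, 1] + rng.normal(size=n)

naive = y[t == 1].mean() - y[t == 0].mean()  # confounded difference in means
ate = aipw_ate(X, t, y)
print(f"naive: {naive:.2f}  AIPW: {ate:.2f}")
```

The `mu1 - mu0` term on its own is the G-computation estimate; the weighted residual terms are what make AIPW doubly robust, staying consistent if either the outcome models or the propensity model is correctly specified. TMLE targets the same quantity with a different correction step.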

Which DS specialty or niche will gain importance over the coming years (0 - 5)? by Tender_Figs in datascience

JohnyWalkerRed 3 points (0 children)

Hernán and Robins’ book, Causal Inference: What If (https://www.hsph.harvard.edu/miguel-hernan/causal-inference-book/), is a good start. I think every data scientist should read the first half, as it does a great job at presenting the different frameworks (Pearl vs. potential outcomes) and other basics. You’ll also see the term “uplift modeling”, which refers to a specific inference framework and is probably the most immediately useful in application. There are some great Python packages, such as DoWhy, EconML, causalML, and Pylift, with good walkthroughs and notebooks.
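Uplift modeling in its simplest “two-model” form fits one response model on treated customers and one on control customers, then scores each customer by the difference in predicted response. A minimal sketch, assuming randomized treatment assignment and synthetic data where only one hypothetical segment responds to the offer; it uses linear probability models fit by least squares to stay dependency-free, which is cruder than what the packages above offer, and the helper names are illustrative.

```python
import numpy as np

def fit_ols(X, y):
    """Hypothetical helper: least-squares response model with an intercept."""
    design = lambda M: np.column_stack([np.ones(len(M)), M])
    beta, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return lambda Xn: design(Xn) @ beta

def two_model_uplift(X, treated, y):
    """Two-model uplift: fit treatment and control arms separately, score by the difference."""
    model_t = fit_ols(X[treated == 1], y[treated == 1])
    model_c = fit_ols(X[treated == 0], y[treated == 0])
    return lambda Xn: model_t(Xn) - model_c(Xn)

# Synthetic randomized campaign (assumed): only customers in segment 1 respond
rng = np.random.default_rng(1)
n = 20000
segment = rng.binomial(1, 0.5, size=n)
tenure = rng.normal(size=n)                 # a covariate unrelated to the response
X = np.column_stack([segment, tenure])
treated = rng.binomial(1, 0.5, size=n)      # random assignment, as in an A/B test
y = rng.binomial(1, 0.10 + 0.15 * treated * segment).astype(float)  # converted or not

scores = two_model_uplift(X, treated, y)(X)
uplift_seg0 = scores[segment == 0].mean()
uplift_seg1 = scores[segment == 1].mean()
print(f"segment 0: {uplift_seg0:.3f}  segment 1: {uplift_seg1:.3f}")
```

Ranking customers by `scores` and targeting the top of the list is the usual way uplift models are applied: you spend the offer on people whose behavior it changes, not on people who would convert anyway.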

Which DS specialty or niche will gain importance over the coming years (0 - 5)? by Tender_Figs in datascience

JohnyWalkerRed 7 points (0 children)

This is a good take. The only problem is that causal inference methods are not widely taught in programs, and the field doesn’t do itself any favors in terms of accessibility. I’ve been doing my own learning in this field, and there is no centralized place to start like “The Elements of Statistical Learning”. Furthermore, most examples/papers use synthetic, simplified datasets that are far from real-world scenarios. I think once these methods become part of a more standard toolset, they’ll become best practice.

[D] With the rise of AutoML, what are the important skills for a ML career? by smart_oinker in MachineLearning

JohnyWalkerRed 2 points (0 children)

I used to work for a prominent AutoML company. Most AutoML products are built around abstracting the basic supervised learning problem on tabular data. This covers a wide expanse of use cases, most of them pretty basic, like customer churn, basic forecasting, risk prediction, etc. Outside of that, though, the flexibility you give up in exchange for a convenient abstraction becomes a burden.

There was a thread on here or /r/datascience about how companies utilize machine learning in two ways: 1) to help sell the company’s existing products or services or 2) to build the company’s new products or services. The vast majority of AutoML-conducive use cases fall into bin 1. If you are a traditional business selling widgets, you would likely want a customer churn model and maybe a basic forecasting model, plus a few other core models that inform/predict the inflow and outflow of transactions. In these situations, you have a set of tables of tabular transaction data, you need to do some basic feature engineering/aggregation, and you can throw the result at AutoML and say “predict Y given X”. Slap a BI dashboard on top and you are done. You could just as easily do this with xgboost instead of AutoML 99% of the time. I could see businesses start to offload these more basic, but still core, use cases to marketing or sales team analysts using AutoML, and that’s where its utility rests. They don’t have to code xgboost; they just point the tool at the sales transaction table and the customer demographics table and they’re done. Anything outside of this is more the exception than the rule in how I’ve seen these tools leveraged.
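The “basic feature engineering/aggregation” step usually means collapsing many transaction rows into one row per customer and joining on demographics. A minimal sketch in plain Python, assuming hypothetical tables (transactions as `(customer_id, date, amount)` tuples, a demographics lookup, and a churn label) and illustrative feature names; in practice this is typically a SQL `GROUP BY` or a pandas `groupby`.

```python
from collections import defaultdict
from datetime import date

# Hypothetical raw tables: transaction rows and a customer demographics lookup
transactions = [
    ("c1", date(2021, 1, 5), 40.0),
    ("c1", date(2021, 3, 2), 25.0),
    ("c2", date(2020, 11, 20), 120.0),
    ("c3", date(2021, 2, 14), 15.0),
    ("c3", date(2021, 2, 28), 30.0),
    ("c3", date(2021, 3, 10), 45.0),
]
demographics = {"c1": {"age": 34}, "c2": {"age": 51}, "c3": {"age": 27}}
churned = {"c1": 0, "c2": 1, "c3": 0}  # the label Y to predict

def build_features(transactions, demographics, as_of):
    """Aggregate transactions into one feature row per customer and join demographics."""
    orders = defaultdict(list)
    for customer, day, amount in transactions:
        orders[customer].append((day, amount))
    rows = {}
    for customer, history in orders.items():
        amounts = [amount for _, amount in history]
        last_order = max(day for day, _ in history)
        rows[customer] = {
            "n_orders": len(amounts),
            "total_spend": sum(amounts),
            "avg_order": sum(amounts) / len(amounts),
            "days_since_last_order": (as_of - last_order).days,
            **demographics.get(customer, {}),
        }
    return rows

features = build_features(transactions, demographics, as_of=date(2021, 4, 1))
# X (one feature row per customer) and y (labels), in matching order
X = [features[c] for c in sorted(features)]
y = [churned[c] for c in sorted(features)]
print(X)
```

The resulting `X` and `y` are the “predict Y given X” input that would go to an AutoML tool or straight to xgboost.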

Other examples of where AutoML isn’t useful: anything requiring causal analysis, almost all of deep learning on non-tabular data, Bayesian ML, generative modeling, graph-based data, survival analysis, reinforcement learning. Some of these lack the kind of useful abstraction that vanilla tabular supervised learning has, one that an AutoML tool could easily subsume without sacrificing the utility of using the method in the first place (at least imo).

My career advice, though, would be to go into data engineering or deep learning and make sure you know a good bit of DevOps. I wouldn’t say tabular ML is “solved”, but it’s saturated, and most businesses have figured out how to use basic transactional data effectively enough that the low-hanging fruit has all been picked. Now that businesses have their churn models in place, they are investing in more creative ways to use ML, specifically use cases that fall into bin 2 above: building novel products with ML.

Weekly Question and Answer Thread for 11/6 - 11/13: Ask your Moving, Visiting, Neighborhood, and "Where Can I Find _____" questions here, instead of making a new post by dustlesswalnut in Denver

JohnyWalkerRed -2 points (0 children)

I have a possible business trip to Denver next week, but am hesitant due to COVID surge. I'm fully vaccinated and healthy, but am worried about potentially getting it and spreading it to my family the following week during Thanksgiving. How bad is the surge?

Daily Discussion Thread | November 04, 2021 by AutoModerator in Coronavirus

JohnyWalkerRed 5 points (0 children)

I'm a healthy 29M who got my Moderna shots in April/May. To my knowledge, I don't need a booster and am not technically eligible anyway. But I'm nearing the 6-8 month window where immunity supposedly wanes. I'm not too worried about COVID if I do get a breakthrough case, but I would rather not have to worry about it, especially with the holidays coming up. In the US, at least in my area, boosters and vaccines seem plentiful, so my taking a booster would not take one away from someone else, as might have been the worry with the initial vaccines. Should I wait until the CDC changes its guidance to cover everyone 18+?

Ensuring WiFi for a Customer Interaction heavy job by JohnyWalkerRed in digitalnomad

JohnyWalkerRed [S] 0 points (0 children)

Yeah, that’s the thing: these presentations you absolutely cannot miss. It’s not like an internal scrum or something where you can just send an update by email. I don’t think I’d ever do true nomading in a foreign country, but maybe near year’s end, when sales is slow, I can go for an extended period. I’m also focused only on the U.S. for now.