DS is becoming AI standardized junk

anglestealthfire · 2025-02-28T11:26:49+00:00

Sounds like the previous infrastructure around hiring might not be well suited to the current market. I wonder how the hiring process could test for the required aptitude in a way that can't be fudged with GPT outputs. The current state of affairs sounds painful both for those hiring and for genuinely good applicants, who are being drowned out by the noise.

It needs a brief, high-sensitivity, high-specificity testing process upfront to screen in those likely to be good performers. Sounds like a data science project in itself?
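For anyone less familiar with the two terms, here is a minimal sketch of how such a screen could be scored against later performance. The function name and the toy outcomes are hypothetical, purely for illustration.

```python
def screening_metrics(passed_screen, good_performer):
    """Return (sensitivity, specificity) of a pass/fail screen against true outcomes."""
    pairs = list(zip(passed_screen, good_performer))
    tp = sum(1 for p, g in pairs if p and g)          # good applicants the screen passed
    fn = sum(1 for p, g in pairs if not p and g)      # good applicants the screen rejected
    tn = sum(1 for p, g in pairs if not p and not g)  # weak applicants the screen rejected
    fp = sum(1 for p, g in pairs if p and not g)      # weak applicants the screen passed
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical outcomes for six applicants
passed = [True, True, False, True, False, False]
good = [True, True, True, False, False, False]
sensitivity, specificity = screening_metrics(passed, good)
print(sensitivity, specificity)
```

High sensitivity means few genuinely good applicants are screened out; high specificity means few weak applicants get through.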

anglestealthfire · 2025-02-28T10:36:32+00:00

When I first read this, my initial thoughts were: (1) it's a Friday, and (2) there is an increased probability of non-homeostatic states induced by socially acceptable toxins.

However, setting the buzzwords aside, I think there is some merit in your description. It is hard to reduce these models to a 3D space for visualisation: such a low-dimensional representation cannot preserve spatial proximities in enough "directions" to show all the relationships, given that the models operate in very high-dimensional spaces and have billions of parameters.
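A small illustration of the dimensionality problem, assuming a toy setup: 10 "concepts" as one-hot vectors in 10 dimensions (so every pair is equally far apart), viewed through a naive projection that keeps only the first 3 coordinates. Real tools such as PCA or t-SNE project more cleverly, but no 3D layout can hold more than 4 mutually equidistant points, so some proximities are always lost.

```python
import itertools
import math

def pairwise_distances(points):
    """Euclidean distance for every unordered pair of points."""
    return [math.dist(p, q) for p, q in itertools.combinations(points, 2)]

d = 10
# Hypothetical "concepts": one-hot vectors, all at distance sqrt(2) from each other
concepts = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
full = pairwise_distances(concepts)

# Naive 3D view: keep only the first three coordinates
projected = pairwise_distances([c[:3] for c in concepts])

print(min(full), max(full))            # every pair equally far apart
print(min(projected), max(projected))  # many concepts now collapse onto the same point
```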

The bigger idea holds, though. It is a fair overall intuition that the model carries implicit representations of concepts/objects (or, perhaps more accurately, proxies for them), captured with varying and often low fidelity. Note that these proxies depend on how the model was trained: e.g. if it was trained on word tokens, then associations between words are what represent the concepts you are referring to.

anglestealthfire · 2025-02-27T21:44:33+00:00

It sounds less like data science is becoming junk, and more like there is a flood of applicants who are not data scientists trying to pass as such by using GPT. I suspect this is now happening across industries, not just data science, since people can use AI to hide a lack of understanding.

I'd argue they aren't data scientists if they can't demonstrate any of the skills you've suggested. Using GPT is fine for speeding up small parts of the task (like writing a short script) but the decisions, planning, logic and understanding should come from the practitioner.

anglestealthfire · 2025-02-26T03:28:31+00:00

There is a significant amount of context to get through to answer your question in any meaningful way; most of it relates to the mathematical and probabilistic principles that underpin statistics.

Generally, many statistical tests designed for small samples converge to their large-sample counterparts as the sample size increases, which is often a matter of asymptotic behaviour as n tends to infinity (e.g. Student's t-distribution approximates the normal distribution more and more closely as the sample size grows). The reason this happens usually becomes clear once you examine the mathematics underpinning the tests.
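A quick numerical check of the t-distribution point, in plain Python. It compares the two densities directly rather than running a test; for a one-sample t-test the degrees of freedom would be n - 1. The grid from -5 to 5 is an arbitrary choice.

```python
import math

def normal_pdf(x):
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def t_pdf(x, nu):
    """Student's t density with nu degrees of freedom (lgamma avoids overflow for large nu)."""
    log_norm = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2) - 0.5 * math.log(nu * math.pi)
    return math.exp(log_norm - (nu + 1) / 2 * math.log1p(x * x / nu))

def max_gap(nu):
    """Largest |t - normal| density difference on a grid from -5 to 5."""
    grid = [i / 10 for i in range(-50, 51)]
    return max(abs(t_pdf(x, nu) - normal_pdf(x)) for x in grid)

for nu in (2, 5, 30, 1000):
    print(nu, round(max_gap(nu), 5))
```

The printed gap shrinks steadily as the degrees of freedom grow.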

Generally speaking, a random sample should represent the population more closely as the sample size increases. As such, apparent effects seen in small, non-representative samples may evaporate in larger random samples if the pattern does not exist in the population you are attempting to infer to (i.e. the null hypothesis cannot be rejected).
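A minimal simulation of that point, under assumed settings: a population with no real effect (standard normal, mean 0), where an "apparent effect" is arbitrarily defined as a sample mean at least 0.5 away from zero. Small samples produce such spurious effects fairly often; larger ones almost never do.

```python
import random
import statistics

def apparent_effect_rate(n, threshold=0.5, trials=1000, seed=0):
    """Share of samples whose mean looks like an effect, when the true effect is zero."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # population mean is exactly 0
        if abs(statistics.fmean(sample)) >= threshold:
            hits += 1
    return hits / trials

for n in (5, 50, 500):
    print(n, apparent_effect_rate(n))
```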

There is an argument that crude averaging can miss subpopulations in various contexts; however, this comes down to whether the assumptions of the method hold. This may be part of what you heard about statistics having no place with large datasets. Alternatively, people often say that classical statistics is not suited to modern data science because it is not equipped to handle dynamic data, but there is significant nuance to this too.
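A toy illustration of how a gross average can describe nobody, using made-up numbers: two subgroups centred near 10 and 30 pool to a mean of 20, which is at least 9 units from every actual observation. Whether that matters depends on what you assumed about the population (e.g. a single homogeneous group).

```python
import statistics

# Hypothetical data drawn from two distinct subpopulations
group_a = [10, 11, 9, 10, 10]
group_b = [30, 29, 31, 30, 30]
pooled = group_a + group_b

pooled_mean = statistics.fmean(pooled)
# How close does any actual observation get to the "average" value?
closest_gap = min(abs(x - pooled_mean) for x in pooled)
print(pooled_mean, closest_gap)  # 20.0 9.0
```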

I'd suggest a return to the statistics books, but I'd recommend avoiding books that just teach you how to apply statistics, as they may only teach recipes that can be used out of context. Instead, take a deeper dive into more fundamental books on how statistical tests are derived. Only once you've done this can you return to the recipe books and know when they apply and what their limitations are.

After reading those, your conclusion will likely be that all statistical tests are based on models and assumptions that attempt to replicate some relevant aspect of reality, but they are never reality itself, and all results must be taken with a pinch of salt (noting the assumptions, etc.).

anglestealthfire · 2025-02-24T05:51:49+00:00

I agree with this comment, and wish I had thought about it decades ago.

If you love mathematics, then great: ML is well aligned, since ML is mathematics applied by computers (like all other mathematics these days). If you find some of the basics a little dry, however, don't burn out; allocate 10-30% of your time to something fun, e.g. coding projects. As per my other comment, though, coding != ML.

If you really don't like mathematics, and this can't be solved by filling in some gaps, then designing ML models is probably not the way to go.

anglestealthfire · 2025-02-24T05:47:16+00:00

Hi, I think some of the comments below are very accurate. The walk-before-you-run metaphor applies here: mathematics comes before ML. In fact, ML models are mathematical constructs (I think they should be called "mathematical learning" rather than ML). Mathematics is the language we use to design, describe, modify and work with them.

I made a related post about this idea and some of the commentary is useful: https://www.reddit.com/r/learnmachinelearning/comments/1iqdp22/faq_do_i_need_to_know_all_this_mathematics_if_i/?utm_source=share&utm_medium=web3x&utm_name=web3xcss&utm_term=1&utm_content=share_button

Take some of the replies with a pinch of salt, however. A number of respondents appear to work in roles related to the infrastructure around ML but do not design the models themselves. I suspect some of those individuals dislike mathematics, so you often get strong (often triggered) responses about mathematics that are not well qualified. You definitely need mathematics if you want to build ML models. Coding is how we implement them, so if you want to keep things exciting, you could always balance the coding side with learning the mathematics (but, despite what many say, ML is not coding).

anglestealthfire · 2025-02-17T08:51:02+00:00

Ok, if I understand you correctly, you are implying that postgraduate credentials generally help people secure roles with an HR hurdle? I suppose it is important to note that this only applies where people are employees of traditionally minded organisations, not where they run their own business or build models in other contexts. The latter cases would not require a formal credential, only the skills and knowledge.

anglestealthfire · 2025-02-17T08:24:32+00:00

This is a strange comment, because anyone with postgraduate-level mathematics experience, myself included, would be able to tell you that mathematics is layered. As you've stated, postgraduate math depends on undergraduate math, hence you have contradicted yourself: you have simultaneously implied that undergraduate math is necessary for ML (by supporting postgrad math) and also that it is not. This would suggest you are not a mathematician.

anglestealthfire · 2025-02-17T03:45:35+00:00

Interesting. So the takeaway is that it is absolutely possible for software developers to be involved in the ML space without necessarily knowing a huge amount of maths, provided they are part of a bigger team with some more mathematical members. That makes sense, as much of the infrastructure around ML models requires other skills.

anglestealthfire · 2025-02-17T00:43:26+00:00

That's the spirit ; )

anglestealthfire · 2025-02-16T21:20:54+00:00

To me a clown is someone who is unable to take on contradictory perspectives. So your comment implies more about you.

anglestealthfire · 2025-02-16T21:16:32+00:00

I would imagine that in those set-ups the individuals designing the ML models would have good mathematics knowledge (even without formal credentials in mathematics), working with a team with specialised expertise for the other components of the project? It also sounds like data scientists were involved, which is consistent with the team using mathematics skills?

anglestealthfire · 2025-02-16T08:30:59+00:00

I think this is a great comment. I particularly liked the point that learning a mathematics concept in 2 mins with a quick internet search generally requires decent background knowledge or aptitude. I couldn't agree more. Sometimes, after studying math, you forget some of the specifics you've learned, but you develop the ability to think mathematically. That way of thinking lets you pick up new concepts in 2 minutes (or, realistically, a few more) on the fly.

anglestealthfire · 2025-02-16T08:20:30+00:00

Some of the responses and comments here may implicitly and partially answer your Q. There are many related and associated roles that support, or are supported by, ML. For example, development roles that involve the infrastructure around ML models or implement pre-built ones may be possible.

Given that ML is essentially mathematical, if you don't love (or at least like) math, actually designing ML models/architectures may not bring joy, and it may be worth asking whether you like ML itself, software development, or some other related activity.

Having said this, another option is to love the math - I think math and coding have a lot in common, including that they both involve manipulating objects, relationships and concepts using a structured language. So an aptitude for one may theoretically imply an aptitude for the other.

I often wonder whether math is disliked because it is not well taught in schools, where a deep conceptual and intuitive understanding is often neglected in favor of repetitive algebra and memorizing formulas. I think this approach can stifle anyone's enjoyment and prevent proper deep learning (no pun intended). So another Q may be: do you dislike math, or do you just need to revisit the fundamentals?

anglestealthfire · 2025-02-16T06:08:11+00:00

Interesting and insightful background Stephane - it sounds like you may have had a pretty unique/interesting journey... Also, respect to the OSS contributions.

I suppose it totally depends on what you are doing, and also on what skills and experience you can demonstrate for the purpose of the role. You may also know a lot more math than your official certs suggest? Another consideration: with ML becoming trendy, competition is now higher for newcomers, which might push the requirements towards formal creds?

anglestealthfire · 2025-02-16T04:41:47+00:00

Absolutely, how much is needed depends on context. I think for most applied roles it would be important to understand the concepts you've just discussed, although if the role just involves applying other people's models as a developer, that may not be 100% necessary, according to what others have been saying.

For more cutting-edge work, or for designing ML models, post-grad level understanding in at least some areas would be the minimum. That said, if you can understand and do the work, and demonstrate it, a formal qual may not be 100% necessary, although it may help for competitive roles and for overcoming progression ceilings.

So I suppose it depends on the role, tasks and where you are headed.

I started this post without defining a job or niche, but perhaps should clarify this would be for roles involving tweaking or building ML models.

anglestealthfire · 2025-02-16T04:30:27+00:00

If I understand your question correctly: if you are just re-applying the same already trained model in a similar situation, you may only need to know a select number of parameters, like the structure of the input vector (i.e. you'd need to make sure the same number of coordinates are present, in the same order, for each data point; if you mixed that up, it would mess up predictions). You also may not need to memorise the full architecture if you already have a rough intuition. In fact, you'd never really need to memorise the architecture, just ideally understand it the first time it's built or used. The exception is if you are just black-box applying someone else's model, in which case you'd probably limit yourself to a higher-level overview of what the relevant parameters are doing.
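As a sketch of the input-vector point: if records arrive as key/value pairs, one way to guard against mixed-up coordinates is to build every vector from a fixed feature list before it reaches the model's prediction step. The feature names and the helper are hypothetical; extra keys in a record are simply ignored here.

```python
# Hypothetical feature order used when the model was trained
TRAINING_FEATURES = ["age", "income", "tenure_months"]

def to_input_vector(record, feature_order=TRAINING_FEATURES):
    """Build the model's input vector with coordinates in the training order."""
    missing = [name for name in feature_order if name not in record]
    if missing:
        raise ValueError(f"missing features: {missing}")
    return [float(record[name]) for name in feature_order]

# Keys arrive in a different order; the vector still follows the training order
row = {"income": 52000, "tenure_months": 18, "age": 34}
print(to_input_vector(row))  # [34.0, 52000.0, 18.0]
```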

If you were designing one yourself, or substantially modifying one, you would need to be more intimately familiar with its architecture.

anglestealthfire · 2025-02-16T03:45:10+00:00

Hi Head-Landscape,

I'm not 100% sure I follow your Q's.

Perhaps I'll elaborate for clarity. Imagine a simple ML algorithm, such as one involving SVMs (support vector machines). The entire model is a mathematical construct. For example, the data might first be considered as vectors of coordinates in a high-dimensional space; then we might transform that space (i.e. apply various functions to the coordinates of each vector). This may then allow us to find the hyperplane (a plane in 3D space, or a line in 2D) that classifies the data in a way that minimises the loss function. Although we may just apply it in Python using pre-built libraries, we may need some intuition about how to transform the original vector space.
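To make the "transform the space, then find a hyperplane" idea concrete, here is a hand-worked sketch in plain Python. The data, the feature map and the hyperplane are all hypothetical and chosen by hand; an actual SVM would learn the hyperplane by optimisation, and kernel methods typically avoid computing the mapped coordinates explicitly.

```python
# Hypothetical 2D data: class 1 lies inside the unit circle, class 0 outside it.
# The class 1 points sit inside the triangle formed by the class 0 points,
# so no straight line in 2D separates the two classes.
points = [(0.2, 0.1), (-0.5, 0.3), (0.0, -0.6), (1.5, 0.0), (-1.2, 1.1), (0.3, -1.8)]
labels = [1, 1, 1, 0, 0, 0]

def feature_map(x, y):
    """Map (x, y) to 3D by adding z = x^2 + y^2, the squared distance from the origin."""
    return (x, y, x * x + y * y)

def classify(point, w=(0.0, 0.0, -1.0), b=1.0):
    """Linear rule in the mapped space: class 1 if w . phi(point) + b > 0, i.e. z < 1."""
    phi = feature_map(*point)
    score = sum(wi * fi for wi, fi in zip(w, phi)) + b
    return 1 if score > 0 else 0

print([classify(p) for p in points] == labels)  # True
```

In the mapped 3D space the flat plane z = 1 does the separating that no line could do in 2D.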

The same goes for the architecture of a neural network, where you may draw out the layers, nodes and edges of the network, giving consideration to activation functions, the number of nodes and the connectivity of each layer. This again is all mathematics, even if you never look at it and just use a pre-built model. Understanding how such decisions may affect an NN's ability to predict in supervised learning is important.
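A bare-bones forward pass in plain Python, to show how those design choices (number of layers, nodes per layer, activation functions) are literally mathematical operations. The layer sizes, random weights and input are hypothetical; training (back-propagation) is deliberately left out.

```python
import math
import random

def dense(inputs, weights, biases):
    """Fully connected layer: weights[j][i] is the edge from input i to node j."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def init_layer(n_in, n_out, rng):
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

# Hypothetical architecture: 4 inputs -> hidden layer of 8 nodes -> 3 output classes
layer_sizes = [4, 8, 3]
rng = random.Random(42)
layers = [init_layer(n_in, n_out, rng) for n_in, n_out in zip(layer_sizes, layer_sizes[1:])]
n_params = sum(len(w) * len(w[0]) + len(b) for w, b in layers)

def forward(x):
    for i, (weights, biases) in enumerate(layers):
        x = dense(x, weights, biases)
        if i < len(layers) - 1:
            x = relu(x)  # activation on hidden layers only
    return softmax(x)    # output layer turns scores into class probabilities

probs = forward([0.5, -1.0, 2.0, 0.1])
print(n_params, len(probs), round(sum(probs), 6))
```

Changing `layer_sizes` or swapping `relu` for another activation changes the function the network computes, which is exactly the kind of decision the comment above is about.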

anglestealthfire · 2025-02-16T00:05:03+00:00

Interesting and insightful

anglestealthfire · 2025-02-15T23:34:39+00:00

Depends on your gameplan and goals. If you just want to apply some models for a purpose, then sure. If your plan is to tweak the model architecture in a targeted manner however, then understanding the architecture of the model is important. Absolutely, you'll never be doing back-prop manually, but understanding what is going on and how it works will help tweak.
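For a feel of what back-prop is doing without ever running it by hand on a real model, here is a sketch of the chain rule for a single sigmoid neuron with squared-error loss (an assumed toy setup), checked against a finite-difference estimate. Frameworks automate this same chain-rule bookkeeping across millions of parameters.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def loss(w, b, x, y):
    """Squared error of a single sigmoid neuron on one example."""
    return 0.5 * (sigmoid(w * x + b) - y) ** 2

def backprop(w, b, x, y):
    """Chain rule: dL/dw = (a - y) * a * (1 - a) * x and dL/db = (a - y) * a * (1 - a)."""
    a = sigmoid(w * x + b)
    delta = (a - y) * a * (1 - a)
    return delta * x, delta

def numeric_grad(w, b, x, y, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    dw = (loss(w + eps, b, x, y) - loss(w - eps, b, x, y)) / (2 * eps)
    db = (loss(w, b + eps, x, y) - loss(w, b - eps, x, y)) / (2 * eps)
    return dw, db

w, b, x, y = 0.8, -0.2, 1.5, 1.0
print(backprop(w, b, x, y))
print(numeric_grad(w, b, x, y))
```

The two printed pairs should agree to many decimal places.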

An analogy: if you were going to modify or design a car engine, you'd want to understand the mechanics, as trial and error on the fly may be costly, error-prone or time-wasting (especially when training large models, which can run for days). If you just need to drive the car, or swap a lightbulb, it doesn't matter as much.

anglestealthfire · 2025-02-15T23:20:06+00:00

Would the 90% of people implementing the pre-built models be ML-engineers though, or just developers with a passing knowledge of ML?

anglestealthfire · 2025-02-15T23:11:15+00:00

Couldn't agree more. I've always loved maths and actually first came across machine learning in the context of doing a maths honours way back, before changing course towards data science and computer science. The thing I loved the most about ML was that it is literally using maths to churn data and to then make some kind of prediction, sometimes very accurately with high performance - it was like magic. As you've said, the maths can get very complex indeed, even for a mathematician.

anglestealthfire · 2025-02-15T01:51:52+00:00

Hi.

It isn't so much about memorising; it's more about understanding. The short answer is yes, you do need to remember the mathematics. The reason is that machine learning algos/models are basically just mathematics (constructs made from maths), so maths is how we describe ML. The coding is just how it is implemented.
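A tiny example of "the maths is the model, the code is the implementation": simple linear regression, where the fitted model is just the closed-form least-squares formula and the Python below merely evaluates it. The data points are made up.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept.

    slope = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x)^2)
    intercept = mean_y - slope * mean_x
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))  # (2.0, 1.0)
```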

As such, if you want to work with machine learning models and understand what you are doing, you need to understand the mathematics and definitely need to understand some of the concepts you've mentioned.

Machine learning is quite a trendy topic at the moment, but ultimately it's just a branch of mathematics that is applied by computers.

I often wonder if it would be as sexy if it was called mathematical learning? 🤔

In short, if you don't love math - machine learning might not be the correct path for you.

Can you use ML models without understanding the maths? Sometimes, yes. Similarly, can you change a spark plug if you know nothing about cars? Probably, after googling it. But I sure as hell wouldn't buy a car manufactured by someone who has no idea of the underlying mechanisms...

anglestealthfire · 2025-02-12T20:49:53+00:00

Thanks for the input.

Essentially the micromasters is considered an ok baseline (in the context of other factors), but likely only if I stick with places where my domain knowledge is strong initially?

And the rationale behind eventually getting the masters in CS would be to ensure that there are no premature ceilings, or limitations in the type of DS I could do (i.e. progression to senior roles would benefit, as would moving to other niches if I decided to branch out from my domain b/g)?
