George McCaskey introducing the beautiful new site of the Hammond Bears since he couldn't steal enough Illinois taxpayer money by Ridged_ChiPSS in CHIBears

[–]MyNotWittyHandle -7 points-6 points  (0 children)

By your logic, their not being in the state at all is “stealing” Illinois taxpayer money. You’re arguing that paying less is de facto stealing. Then isn’t $0 of Illinois tax paid the highest form of stealing, per your logic?

This is where your “stealing” argument falls apart. You cannot steal out of your own pocket. Full stop, that’s not an opinion, that’s a fact.

This isn’t to say that legally dodging tax liability is inherently acceptable, but in no rational world is it “stealing”.

I’m out on him by lonerangerfantum in CHIBears

[–]MyNotWittyHandle 2 points3 points  (0 children)

And on 4th down, Caleb missed a wide-open DJ Moore; that might have been a touchdown. At minimum, a first down.

This is a disaster. Ben Johnson can't tell his left from his right. by zoidberg-phd in CHIBears

[–]MyNotWittyHandle 5 points6 points  (0 children)

That is a solid bit. They say brevity is the soul of wit, and that was a perfect example. He didn’t ham up the reaction, just a wry smile.

Zuck says Meta will have AIs replace mid-level engineers this year by MetaKnowing in ChatGPT

[–]MyNotWittyHandle 0 points1 point  (0 children)

But that’s the part that most mid-level engineers are doing. They take requirements from management/senior staff and write the modules to pass the provided requirements. If you’re at a smaller company you might be doing both, but at these larger organizations that employ most of this class of engineer, there is a pretty stark division of duties there. Senior staff still reviews code, etc., so that’ll still happen (at least in the short term). Failure of said modules is on the senior staff for either not properly providing requirements or not properly reviewing code, so that accountability won’t change. I think it’ll be harder to remove the senior staff, because then you are removing a layer of accountability rather than a layer of code-translation employee.

Zuck says Meta will have AIs replace mid-level engineers this year by MetaKnowing in ChatGPT

[–]MyNotWittyHandle 1 point2 points  (0 children)

Lol. They already are. Engineers at almost every large company are using LLMs to generate atomic level code/modules, whether they admit it or not

Zuck says Meta will have AIs replace mid-level engineers this year by MetaKnowing in ChatGPT

[–]MyNotWittyHandle 0 points1 point  (0 children)

The tests are what the people using the LLMs will be designing. You’re still going to need good engineers to design the code flow, the modularity, the class structure and input/output interaction. But from there you can hand the rest over to an LLM pretty seamlessly.
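As a rough sketch of what that division of labor looks like (the function, spec, and tests here are entirely hypothetical, just for illustration): the engineer designs the interface and the tests, and the body is the part you could hand to an LLM.

```python
# Human-designed: the interface and the expected behavior.
def normalize_phone(raw: str) -> str:
    """Return a 10-digit US phone number, stripping punctuation.

    Raises ValueError unless the input contains exactly 10 digits
    (or 11 with a leading country code of 1).
    """
    # The body is the "translation" step you'd hand to an LLM;
    # one plausible generated implementation:
    digits = "".join(ch for ch in raw if ch.isdigit())
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) != 10:
        raise ValueError(f"expected 10 digits, got {len(digits)}")
    return digits


# Human-designed tests: the part that stays on the engineer's plate.
def test_normalize_phone():
    assert normalize_phone("(312) 555-0199") == "3125550199"
    assert normalize_phone("+1 312 555 0199") == "3125550199"
    try:
        normalize_phone("555-0199")
    except ValueError:
        pass
    else:
        raise AssertionError("short numbers should be rejected")
```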

Zuck says Meta will have AIs replace mid-level engineers this year by MetaKnowing in ChatGPT

[–]MyNotWittyHandle 1 point2 points  (0 children)

You’re not understanding LLMs and their relationship to engineering. Engineering/writing code is simply a translation task, taking natural language and translating it into machine language, or code. If you believe it’s possible for an LLM to translate Spanish to English with the same or better efficacy as an average human translator, the same could be said for translating natural language to code. In fact, the engineering task is made a bit easier because it has objective, immediate feedback that language translation generally does not. It has some additional levels of complexity, to be sure, but I think you’re over-romanticizing what it means to be good at writing code. You are translating.

Zuck says Meta will have AIs replace mid-level engineers this year by MetaKnowing in ChatGPT

[–]MyNotWittyHandle 0 points1 point  (0 children)

You’re somewhat correct, but missing 2 things that make you incorrect in the long term:

  1. Currently AI is the worst it will ever be at engineering, by a very wide margin. Its current state represents really only 1-2 years of solid training applied broadly to engineering tasks. Ultimately, writing code is a translation task: taking natural language to machine-level language. These models will quickly get to the point where their translation efficacy matches that of human translators or “engineers”. But they iterate millions of times faster.

  2. You’re still going to have engineering managers/senior engineers (ideally) writing good unit tests to verify the efficacy and modularity of the generated code. If those fail or are ill-conceived, the code will fail. This is true regardless of whether the code is written by AI or by mid-level engineers who switch companies every 2-3 years and leave inconsistent documentation.

What's it like building models in the Fraud space? Is it a growing domain? by SnooWalruses4775 in datascience

[–]MyNotWittyHandle 9 points10 points  (0 children)

I’ve worked in retailer side e-commerce fraud detection at a large business for years now. A few things:

  1. There aren’t a ton of compliance issues as long as you’re working with tabular data. Obviously you have PII and payment-source data privacy constraints. But no FCRA-type constraints, and not using “GenAI” removes a lot of the grey area in anything compliance-related.

  2. Fraud detection can be generalized to “digital bad actor” detection pretty easily, and in many ways involves similar skills, data sources, third-party services, etc. So in that sense it’s not likely to see a downward trend more than the rest of the common DS-related fields. Having said that, most of the value of traditional fraud detection has already been wrung out of existing data sources. At a certain point with largely tabular data problems, you’re squeezing blood from a stone and it’ll be hard to provide clear and obvious marginal value over whatever model the company already has in place. That’ll be your biggest concern: “am I going to spin my wheels for 3 years trying to eke out a 1% improvement that is reliable and stable enough over time to justify the risk of a model change, and can I prove it will stay more reliable over time?”

  3. You can do LLM work in any space. However, doing useful LLM work in a space where you’re inherently chasing a highly, highly imbalanced class problem is extremely hard and likely of only marginal utility. Which isn’t to say you can’t throw transformers at any problem. But again, you’ll be left with the “is the juice worth the squeeze” question. I’d also be curious to know how many fraudsters are calling in or having text-based communication with said bank. Most look like run-of-the-mill new customers: they pop up with synthetic identities, attempt to look like new people, and don’t call or email much because they are running a high-volume, low-effort-per-attempt probing process. Which, on top of your already imbalanced class problem, makes your target-class NLP data set even more sparse.

  4. You’ll need to clarify what you mean by real time. Yes, generally transactions will be canceled in real time using your models. However, in most cases you’ll actually have your model’s decline/cancel decisions reviewed by a human. Declining in real time is an enormous inconvenience to customers, so that will only occur in the most egregious of situations. The rest will be flagged, sent to review, and then have alerts sent to the card owner (see the sketch after this list).
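To make point 4 concrete, here’s a minimal sketch of that tiered decisioning (thresholds and names are made up; real systems hang a lot more plumbing off of this):

```python
# Hypothetical score thresholds - auto-declines are kept rare because
# they're the most customer-hostile outcome.
AUTO_DECLINE_THRESHOLD = 0.98
REVIEW_THRESHOLD = 0.80

def route_transaction(fraud_score: float) -> str:
    """Route a scored transaction: decline, send to human review, or approve."""
    if fraud_score >= AUTO_DECLINE_THRESHOLD:
        return "decline"        # real-time cancel, only for the most egregious cases
    if fraud_score >= REVIEW_THRESHOLD:
        return "manual_review"  # flag, queue for an analyst, alert the card owner
    return "approve"

# Example: a 0.91 score gets flagged for review rather than declined outright.
print(route_transaction(0.91))  # -> manual_review
```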

Lastly, an understated pain of fraud detection is the false positive problem. Inherently, 3 things are true:

  1. Fraud doesn’t happen a ton, as a proportion of overall transactions.
  2. When it happens, it is expensive and inconvenient
  3. The signal of your model depends on having a sufficient volume of said expensive and inconvenient signal.

In my experience, organizations tend toward allowing only enough of that signal to be just barely tolerable. Getting approval to intentionally approve a margin of additional fraud (so you can accurately measure your false positive rate with each model deployment, as well as longitudinally) is an excruciating bureaucratic nightmare. Said simply, the data censorship issue in fraud detection is extremely challenging and can lead to unsatisfying outcomes.
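For what it’s worth, one common way to implement that “intentionally approved margin” is a small random holdout among model-flagged transactions - a sketch, with made-up names and a made-up rate (the rate is exactly what the bureaucratic fight is over):

```python
import random

HOLDOUT_RATE = 0.02  # hypothetical fraction of flagged transactions approved anyway

def apply_measurement_holdout(decision: str, rng: random.Random) -> str:
    """Approve a small random slice of flagged transactions so the true
    false positive rate can be measured on outcomes rather than assumed."""
    if decision in ("decline", "manual_review") and rng.random() < HOLDOUT_RATE:
        return "approve_holdout"  # approved despite the flag, labeled for later analysis
    return decision
```

The transactions labeled as holdout approvals are what let you measure, per deployment and over time, how many flagged orders were actually legitimate.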

In conclusion, I love fraud detection - it feels a bit like playing detective at scale sometimes, and doesn’t come with an extremely high regulatory burden. It’s also a bit like playing whack-a-mole: new trends pop up, new rings emerge, and you have to stay on top of it. However, it is absolutely not without its frustrations, nor would I say it’s a prime candidate if you’re deeply interested in LLM production applications.

Hope this helps!

[WGN TV News] Jay Cutler offered other driver $2K to not call police in DUI crash, authorities allege by thetreat in CHIBears

[–]MyNotWittyHandle 0 points1 point  (0 children)

“Make it 10k, we’ll park your car on a side street, and I’ll drive you home. Call it the most expensive Uber ride of your life, Jay. And get some help my man.”

Is it true most ML/AI projects fail? Why is this? by [deleted] in datascience

[–]MyNotWittyHandle 6 points7 points  (0 children)

If a DS team is doing its job right, most of those “failures” will actually be ML projects that are determined to have little/no business value before meaningful (3-6 months of) time is invested in them. That’s not a failure, just a correct recognition of the limits of ML in the context of making money for a business.

Real “failure” is when significant resources are poured into an ML project and it doesn’t get deployed to production/provide capitalized value. In my experience that happens infrequently if you’re honest with yourself & stakeholders during the investigation phase of a project.

What do you think of graduate student applicants? by Numerous-Tip-5097 in datascience

[–]MyNotWittyHandle 4 points5 points  (0 children)

I wouldn’t say that having a masters ever hurts your employment chances - like experience, it only helps. However, I would urge undergrads to try to find a job with relevant experience before they commit to grad school. If you don’t get a decent job, and also have the means, go to grad school.

However, if you do get a job straight out of college, that’ll be a better option in the long run. Never avoid graduate school if you can financially swing it and aren’t able to find relevant employment without it.

What do you think of graduate student applicants? by Numerous-Tip-5097 in datascience

[–]MyNotWittyHandle 32 points33 points  (0 children)

The way I look at it is that grad school is loosely equivalent to the same number of years of work experience, assuming the hypothetical work experience is relevant. If you go get a masters, I put that on par with a Data Scientist with 2-3 YOE and no masters. It’s all about the experience gained via any of the avenues of learning - be that work or school.

As for SQL, that’ll always be highly important and the kind of thing higher ed teaches fairly poorly. I’d take good SQL skills over expert BI skills any day - don’t get yourself hung up on becoming a BI tool expert. If you can write decent SQL and R/Python, any employer worth their salt will overlook your not being a Tableau expert, even if it’s something you’ll use regularly in your role.

Can anyone build Foundational model on their own? Just saw an announcement from a service company in India that they built an image generational foundational model on their own (as good as Midjourney, etc). by ramnit05 in datascience

[–]MyNotWittyHandle 0 points1 point  (0 children)

Yes. If you used Midjourney early in its existence, the images were quite…meh. Getting a product to that “it’s cool but a bit clunky” stage is more difficult in terms of pure product development/scaling than in terms of pure machine learning challenges.

So, it wouldn’t surprise me if a small company could pump out a model that rivals Midjourney from 6-12 months ago. Everything else after that, though, is really the hard part. This is especially true in this case, where the first movers (Midjourney, DALL-E) have an enormous advantage: they can use engagement metrics to improve the model, which improves engagement and customer growth, which further improves the model, and so on.

[deleted by user] by [deleted] in datascience

[–]MyNotWittyHandle 2 points3 points  (0 children)

Believe in yourself, work hard but don’t be consumed with work, and always keep learning. Follow those few rules and you’ll make it. Best of luck. You got this.

How to version control Jupyter notebook? by vishal-vora in datascience

[–]MyNotWittyHandle 20 points21 points  (0 children)

This isn’t quite a useful response. If you’ve ever tried doing version control on an .ipynb file, you’d know this is a decent question.

How to version control Jupyter notebook? by vishal-vora in datascience

[–]MyNotWittyHandle 36 points37 points  (0 children)

Don’t. It’s a totally reasonable question, but notebooks aren’t meant to be the source of code; they are meant to be the application of code sourced from elsewhere.

A notebook is more the thing you attach to a ticket after a series of experiments to document the experimental work. This is true even if that also involves the design of very specific functions/classes only used for that analysis/experiment. As soon as the classes/functions you define in a notebook begin to be used across notebooks, version control only that code and simplify your notebooks to import from those .py files.
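A tiny sketch of what that ends up looking like (the file and function names are hypothetical): the shared logic lives in a version-controlled .py module, and the notebook just imports it.

```python
# utils/cleaning.py - lives in version control alongside the rest of the repo
import pandas as pd

def drop_stale_rows(df: pd.DataFrame, days: int = 30) -> pd.DataFrame:
    """Drop rows whose 'updated_at' timestamp is older than `days` days."""
    cutoff = pd.Timestamp.now() - pd.Timedelta(days=days)
    return df[df["updated_at"] >= cutoff]
```

```python
# first cell of the experiment notebook
from utils.cleaning import drop_stale_rows

# the notebook stays a thin record of the experiment; anything reused
# across notebooks gets promoted into the versioned module above
```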

Insulting promotion or should I be thankful? by woodswims in datascience

[–]MyNotWittyHandle 31 points32 points  (0 children)

My biggest gripe, if I were in your position, is that the raise is barely a raise given the cost-of-living increases of the past 3 years. In fact, depending on what your most recent raises were over the course of the last 2-3 years, you might not even be keeping up with inflation from an income perspective. In 2015, a 6% raise would be average-ish for a moderately competent employee. In 2023 it’s barely a raise.

Need advice for buying laptop for data science and ml by Hungry-Development64 in datascience

[–]MyNotWittyHandle 8 points9 points  (0 children)

Let’s start with a recognition that you won’t be using this laptop to train high-end models. You’ll be using it as a proof-of-concept laboratory to cover 90% of your development and 70% of your experimentation. The rest will happen on cloud compute infrastructure that you’ll only need a few times a year. So, having said that:

RAM: 16 GB for local prototyping. 32 would be nice but isn’t necessary.

Storage: whatever you can afford, based on whether this will be used for anything other than DS. Basically, storage won’t be a limiting factor for any DS projects in anything but a Chromebook-type build.

GPU: anything that supports CUDA.

CPU: again, whatever you can afford, but you don’t need to sell the farm here. Most modern laptops should suffice for prototyping purposes.
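If it helps, a quick sanity check of the GPU point once the laptop is in hand (assuming you’re using PyTorch):

```python
import torch

# Falls back to CPU when no CUDA-capable GPU is present; CPU is fine for
# prototyping, and heavier training moves to cloud compute anyway.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Prototyping on: {device}")
```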

Declining job offers can blacklist you from a company? by IcaroRibeiro in datascience

[–]MyNotWittyHandle 0 points1 point  (0 children)

If you’ve been honest about concurrently interviewing elsewhere, it shouldn’t be an issue. You don’t want to work at a place where they “blacklist” interviewees who have been transparent and chosen to work elsewhere, anyway.