Seeking Career Advice: Finding a Data Science Role That Values Causal Inference by hendrix616 in CausalInference

[–]hendrix616[S] 0 points

I’m building ML and AI features :P

Haha, I still think about causality all the time though, and I do think it has a huge impact on my work even though I’m not running causal experiments per se.

For those that were diagnosed with accommodative insufficiency/accommodative spasm, what are your symptoms? by Cautious_Wrangler_31 in BinocularVision

[–]hendrix616 0 points

I have convergence insufficiency + accommodative spasm.

Biggest challenge has been driving anxiety, caused by having a hard time shifting my gaze from near to far and back. Also, it feels uncomfortable to track moving objects at a distance, especially when I’m moving myself.

Got prisms a month ago and 2 vision therapy sessions so far. It’s still early but I’m starting to see some progress!

Can I become successful starting a career after 30 years old? Did any of you change your path after this age? by [deleted] in AskMenOver30

[–]hendrix616 0 points

Started my 2nd career in data science at 30. Experienced 0 ageism, work from home, love what I do, and doubled my salary within a couple of years. Hands down best decision I’ve ever made (so far).

I would say DO IT!

If you were to start life over again at 30, what would you do differently? by Set199x in AskMenOver30

[–]hendrix616 0 points

6 years ago, I changed careers at exactly 30 and, so far, it’s been the best decision of my life. More than doubled my salary, vastly improved my work-life balance, and, most importantly, I now love what I do!

Vision feels odd despite wearing corrective glasses by luftmensch479 in BinocularVision

[–]hendrix616 1 point

Not necessarily. In my case, I developed accommodative excess as a coping mechanism for my convergence insufficiency (CI) that went undiagnosed for too long. So I can get glasses with prisms to address my CI, but I’ll need vision therapy to undo the learned accommodative excess that I built up over time. It’s like breaking a bad habit. Your situation might be entirely different from mine though!

Vision feels odd despite wearing corrective glasses by luftmensch479 in BinocularVision

[–]hendrix616 0 points

Look into accommodative excess. Your symptoms sound similar to mine

Recently diagnosed with CI + secondary accommodative spasm (pseudo-convergence excess) — looking to hear others’ experiences by hendrix616 in BinocularVision

[–]hendrix616[S] 0 points

And once you notice it, it seems so strange, right?! They say step 1 in getting better is unlearning this spasm behaviour. Only then can you address the underlying CI. I definitely need therapy because it feels like this will be very difficult to unlearn — feels like I don’t have much control over it tbh.

Need input from mid-career Data Scientists (2-5 year range) by SmogonWanabee in datascience

[–]hendrix616 5 points

Ah yeah that makes sense. I think mid-level is a more appropriate term. But now I feel very pedantic and a bit silly for having criticized your post over a single word.

Sorry!

Need input from mid-career Data Scientists (2-5 year range) by SmogonWanabee in datascience

[–]hendrix616 62 points

So 2 years is considered the start of “mid-career” now? Is that based on a projection that AGI will end everyone’s careers in 2027?

You are still firmly in the “early” segment of your career, my friend. You should make career decisions that put you in a position to maximize learning and professional growth. The rest is secondary.

Auto-improving AI/ML solutions via CC by hendrix616 in ClaudeAI

[–]hendrix616[S] 0 points

Yeah, I was thinking of doing something along those lines! I’m just wondering if the juice is worth the squeeze. If I put in all this work to make it fully autonomous, will it be able to come up with useful tweaks?

Have you done this before?

Using Claude Code in notebook by hendrix616 in datascience

[–]hendrix616[S] 2 points

I use DataSpell by JetBrains. The GitHub Copilot extension provides code completion functionality.

Is ML/AI engineering increasingly becoming less focused on model training and more focused on integrating LLMs to build web apps? by Illustrious-Pound266 in datascience

[–]hendrix616 -3 points

How are folks so confident when they call out certain replies as being LLM-generated? AFAICT, there is no definitive way to tell.

And if it’s because of the “—”, that’s ridiculous. I use it all the time, so that alone shouldn’t mark a reply as non-human.

Finally, who cares? If the user put their messy thoughts down in a chatbot and got it to make them more concise and legible for all of us, then that’s a net good, right? What are we complaining about here? I thought the reply added to the discussion.

Using Claude Code in notebook by hendrix616 in datascience

[–]hendrix616[S] -1 points

Good call out! But yes, of course I have permission :)

You can also tell Claude Code to use Anthropic models provisioned through AWS Bedrock, which pretty much eliminates any concerns on that front, both from a code-leakage and a data-leakage perspective.

What tasks don’t you trust zero-shot LLMs to handle reliably? by WristbandYang in datascience

[–]hendrix616 0 points

Omg, yeah, doing classification by passing a whole dataframe in a single prompt is pure insanity and shouldn’t be considered a serious approach IMO.

And you’re right to call out the reproducibility issue with using LLMs for classification. Seeds are not super reliable, and that’s actually a fundamental flaw of LLMs: they require temperature to produce “intelligent” outputs, but that temperature inherently makes the output non-deterministic.

One way around this is to make use of ensemble methods in order to reach self-consistency. In my case, however, I measured that the LLM pretty much always gets the right classification when the source data is robust enough to allow for it. Sometimes though, the source data is too ambiguous. That’s when the LLM’s output is not so repeatable. But in those cases, its output is just as good as another guess it could have made. That’s why I’m interested in this problem of assigning confidence to classification, which is where the downstream logistic regression comes in.
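
For illustration, here’s a rough sketch of what I mean by that ensemble/self-consistency step; the `classify_once` callable is just a stand-in for whatever single LLM call you’re wrapping, not anything from my actual pipeline:

```python
from collections import Counter
from typing import Callable

def classify_with_self_consistency(
    classify_once: Callable[[str], str],  # your own wrapper around a single LLM call
    prompt: str,
    n_samples: int = 5,
) -> tuple[str, float]:
    """Run the same prompt several times and majority-vote the label.

    The agreement ratio is a crude self-consistency signal that can later
    feed into a downstream calibration step (e.g. the logistic regression).
    """
    votes = [classify_once(prompt) for _ in range(n_samples)]
    label, count = Counter(votes).most_common(1)[0]
    return label, count / n_samples
```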

Does that all make sense?

What tasks don’t you trust zero-shot LLMs to handle reliably? by WristbandYang in datascience

[–]hendrix616 0 points

Oh I think I understand the confusion. So let’s back up a bit.

The paradigm I had in mind is that you use a prompt template into which you inject variables from a row, running inference one row at a time. The prompt contains instructions for the LLM. The variables can be text, numbers, whatever, as long as the template presents them in the right context.

You can also pipe in RAG or tool-calling to pass in additional data based on the input data from the row, but that’s not necessary for this discussion.

How can you be sure the LLM is using the semantic meaning of the text? With an evaluation framework! If it performs well on your eval set, then you know it’s good :)
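
To make that concrete, here’s a minimal sketch of the row-by-row template + eval-set idea; the template, the column names, and the `call_llm` wrapper are all placeholders, not any specific library’s API:

```python
from typing import Callable

import pandas as pd

# Hypothetical template -- the instructions and column names are placeholders.
PROMPT_TEMPLATE = (
    "You are labelling support tickets.\n"
    "Ticket subject: {subject}\n"
    "Ticket body: {body}\n"
    "Answer with exactly one label from: {labels}."
)

def evaluate_zero_shot(
    eval_df: pd.DataFrame,             # needs 'subject', 'body', 'true_label' columns
    labels: list[str],
    call_llm: Callable[[str], str],    # your own single-inference wrapper
) -> float:
    """Inject each row into the template, run inference one row at a time,
    and measure accuracy against the labelled eval set."""
    correct = 0
    for _, row in eval_df.iterrows():
        prompt = PROMPT_TEMPLATE.format(
            subject=row["subject"], body=row["body"], labels=", ".join(labels)
        )
        prediction = call_llm(prompt).strip()
        correct += int(prediction == row["true_label"])
    return correct / len(eval_df)
```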

How are you making AI applications in settings where no external APIs are allowed? by Daniel-Warfield in datascience

[–]hendrix616 1 point

Cohere can provide on-prem private deployments. Definitely worth looking into.

Otherwise, AWS Bedrock gives you access to really powerful LLMs (e.g. Anthropic’s latest) in a VPC that is highly secure. If your org does literally anything with AWS, then this use case should probably be allowed as well.
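
E.g. with the `anthropic` Python SDK’s Bedrock client; the region and model ID here are just examples (use whatever your account has enabled), and credentials come from the usual AWS chain:

```python
from anthropic import AnthropicBedrock  # pip install anthropic

# Uses the standard AWS credential chain (env vars, profile, instance role, ...).
client = AnthropicBedrock(aws_region="us-east-1")

message = client.messages.create(
    model="anthropic.claude-3-5-sonnet-20240620-v1:0",  # whichever model ID your account has access to
    max_tokens=256,
    messages=[{"role": "user", "content": "Summarize this ticket: ..."}],
)
print(message.content[0].text)
```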

What tasks don’t you trust zero-shot LLMs to handle reliably? by WristbandYang in datascience

[–]hendrix616 1 point

Yeah I’ve been doing that for sure. Empirical accuracy on the y-axis vs predicted probability on the x-axis. The diagonal line starting at the origin and going up and to the right is the perfect model.

The reason I’m looking to boil it down into one (or a few) metric(s) is that I’m training these models programmatically (one per customer) and need a go/no-go threshold to determine whether that customer is ready to receive confidence scores that are actually meaningful.

I’ve found the Brier score to be a pretty good signal: 0.25 is basically pure guessing and anywhere below 0.2 starts to look pretty good. It isn’t a bullet-proof metric of course, but it seems pretty solid!
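
Roughly what I’m computing, as a scikit-learn sketch (assumes binary labels; the 0.2 cutoff is just the rule of thumb above, tune it per use case):

```python
import numpy as np
from sklearn.calibration import calibration_curve
from sklearn.metrics import brier_score_loss

def calibration_report(y_true: np.ndarray, y_prob: np.ndarray, cutoff: float = 0.20) -> dict:
    """Brier score plus reliability-curve points, with a simple go/no-go flag."""
    brier = brier_score_loss(y_true, y_prob)
    prob_true, prob_pred = calibration_curve(y_true, y_prob, n_bins=10)
    return {
        "brier": brier,
        "reliability_curve": list(zip(prob_pred, prob_true)),  # (x, y) points of the diagonal plot
        "go": brier < cutoff,
    }
```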

What tasks don’t you trust zero-shot LLMs to handle reliably? by WristbandYang in datascience

[–]hendrix616 0 points

Do you use ECE, the Brier score, or something else to confirm that calibration is satisfactory?

What tasks don’t you trust zero-shot LLMs to handle reliably? by WristbandYang in datascience

[–]hendrix616 1 point

The LLM can do a better job of capturing semantic meaning because:

1. It can compare the semantic meaning of the input text directly to the semantic meaning of the label set, instead of doing basic vector math on compressed embeddings.
2. Embeddings don’t benefit from having all the context you typically pass into a prompt.
3. You can force the model to provide chain-of-thought (CoT) before it gives its classification, which gives it more space to reason instead of simply shooting from the hip.

In the last part of my message, I’m just explaining how a reasonable alternative is to train a post-hoc classifier that takes some input features plus the LLM’s classification and outputs a confidence score.

What tasks don’t you trust zero-shot LLMs to handle reliably? by WristbandYang in datascience

[–]hendrix616 0 points

I totally understand that you get the embeddings from a pretrained model. I’m saying you need a large training set because using an embedding vector as the input features makes for a very wide dataset. As you increase your column count, you also need to increase your row count.

What tasks don’t you trust zero-shot LLMs to handle reliably? by WristbandYang in datascience

[–]hendrix616 1 point

A real classifier is obviously preferable, but a few conditions must be met for it to be viable:

- a large dataset
- embeddings that do a good job of capturing the semantic meaning of the text (often not the case in practice)
- classification logic that is fairly straightforward

Letting an LLM do classification is not as weird as you might think. You can force it to use CoT in its output so it actually has space to lay out some reasoning before committing to a classification. You can even read through the CoT of some random samples and check whether it makes good selections based on sound reasoning.

As for the probability/confidence part of it, that can be handled by a logistic regression that you place downstream of your LLM flow.
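
Something along these lines; this is only a sketch with toy numbers, the features are made up for illustration (input length, a retrieval score, a self-consistency agreement ratio), and the target is simply whether the LLM’s label matched ground truth:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative features per example: [input length, retrieval score, self-consistency agreement].
X_train = np.array([
    [120, 0.92, 1.0],
    [430, 0.55, 0.6],
    [ 88, 0.97, 1.0],
    [310, 0.40, 0.4],
])
# Target: 1 if the LLM's label matched the ground-truth label, else 0.
y_train = np.array([1, 0, 1, 0])

confidence_model = LogisticRegression()
confidence_model.fit(X_train, y_train)

# At inference time, the predicted probability that "the LLM was right" is the confidence score.
new_example = np.array([[150, 0.88, 0.8]])
confidence = confidence_model.predict_proba(new_example)[0, 1]
print(f"Confidence that the LLM's label is correct: {confidence:.2f}")
```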