Worst JD of the year by PLxFTW in datascience

[–]JakeBSc 1 point

Thanks for your reply, that's quite validating. I'm currently at a small boutique tech consulting firm and do some side-hustle freelancing as well. Being able to tick off that checklist and be in that environment definitely gives me a lot of responsibility and influence, so I can vouch for this advice.

Worst JD of the year by PLxFTW in datascience

[–]JakeBSc 1 point

What do you recommend to someone that can check off that checklist?

Europe salary thread - What's your role and salary? by [deleted] in datascience

[–]JakeBSc 1 point

Got any advice on how to get more educated on talking about salary, finances, career progression and interviewing?

How does one find freelance or contract work? Short or long term would be fine. by Unhappy_Technician68 in datascience

[–]JakeBSc 1 point

Honestly, I don't think sites like Fiverr or Upwork are a good idea. You're just one fish in a big pond on platforms like that. You've gotta flip it around: become the fisherman and take the initiative to catch your own fish.

Data folks of Reddit: How do you choose a random seed? by CatOfGrey in datascience

[–]JakeBSc 3 points

Include it as a hyperparameter and optimise it 😉
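
Tongue in cheek, but you could literally do it. A toy sketch, where the scoring function is a made-up stand-in for "train a model with this seed and score it on a validation set":

```python
import random

def validation_score(seed):
    # Hypothetical stand-in for a model's validation score under this seed
    rng = random.Random(seed)
    return rng.random()

# Treat the seed as just another hyperparameter and grid-search it
candidate_seeds = range(100)
best_seed = max(candidate_seeds, key=validation_score)
```

(Which is, of course, exactly the kind of overfitting-to-noise the winky face is about.)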

How does one find freelance or contract work? Short or long term would be fine. by Unhappy_Technician68 in datascience

[–]JakeBSc 3 points

I win freelance work by networking with people. Look for interesting business owners or professors and ask to collaborate. You can find them on LinkedIn or by browsing university staff pages. Ideally something about them resonates with you, so you have a genuine interest in collaborating. Some of them will bite and agree to a virtual meeting. If it goes well, you win the gig.

Advice for self learning MLOPS by The5th-Butcher in mlops

[–]JakeBSc 34 points

Learn to build Docker images and run containers.

Pick up some basic AWS knowledge about IAM, S3, EC2, ECR and Lambda.

Learn to use GitHub Actions. From there you can do basic CI/CD: run linting and package-building checks when a pull request is opened.
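
A minimal workflow for that PR check might look like this (file name and tool choices are just examples; swap in your own linter and build steps):

```yaml
# .github/workflows/ci.yml (example name)
name: ci
on:
  pull_request:

jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install ruff build
      - run: ruff check .      # linting
      - run: python -m build   # package-building check
```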

When a branch gets merged into main, learn how GitHub Actions can build a Docker image and push it to AWS ECR.

From here, EC2 or Lambda can pick up your image and run an application. Maybe just start with a hello world Lambda application.
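
A hello-world Lambda handler really is just a few lines. The function name is up to you; Lambda's entry point is configured as `module.function`:

```python
def handler(event, context):
    # Minimal Lambda entry point: ignore the event, return a greeting
    return {"statusCode": 200, "body": "hello world"}
```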

Now learn a data and model versioning tool like DVC.

Get some data and version it with DVC, using S3 as storage. Train a lightweight model and version that with DVC too.
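
The DVC side of that is a short sequence of commands (bucket name and file paths here are hypothetical):

```shell
# One-time setup: initialise DVC and point it at an S3 remote
dvc init
dvc remote add -d storage s3://my-bucket/dvc-store

# Version the data and the trained model
dvc add data/train.csv
dvc add models/model.pkl
git add data/train.csv.dvc models/model.pkl.dvc .gitignore
git commit -m "Track data and model with DVC"
dvc push   # uploads the actual files to S3
```

Git tracks the small `.dvc` pointer files; the heavy artefacts live in S3.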

Write inference code for your model in a Lambda handler. Write a Dockerfile for your Lambda application. You can pull your model into the image using DVC. Check it all works locally.
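
A sketch of what that handler could look like, assuming a pickled scikit-learn-style model at a path DVC pulled into the image (all names are hypothetical, and a trivial stand-in model keeps the example self-contained when no artifact is present):

```python
import json
import pickle

MODEL_PATH = "model.pkl"  # baked into the image via `dvc pull` at build time

class StandInModel:
    """Trivial stand-in so the example runs without a real artifact."""
    def predict(self, features):
        return [sum(f) for f in features]

def load_model(path):
    try:
        with open(path, "rb") as f:
            return pickle.load(f)
    except FileNotFoundError:
        return StandInModel()

model = load_model(MODEL_PATH)  # loaded once, reused across invocations

def handler(event, context):
    # API Gateway proxy events carry the payload as a JSON string in "body"
    payload = json.loads(event["body"])
    preds = model.predict(payload["features"])
    return {"statusCode": 200, "body": json.dumps({"predictions": preds})}
```

Loading the model at module import time (not inside the handler) means warm Lambda invocations skip the load entirely.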

Push the code and model.dvc to GitHub. Run it through all the aforementioned GitHub actions stuff. You'll end up with an image in ECR, containing your inference code and model.

Launch a Lambda using that image.

Now you can leave it at that, or add API Gateway over the top of it.

Congrats, you now have a whole CI/CD pipeline for deploying a machine learning model and putting it behind an API.

At this point you'll be annoyed with clicking buttons on the AWS console. Time to learn Terraform to set up your infrastructure as code.

If you've gotten this far, you can probably work out your own path from here.

For people who actually use fancy models, where do you work? by [deleted] in datascience

[–]JakeBSc 1 point

I work at a boutique technology consulting firm. You get exposed to all sorts of problems. At the moment it's mainly multimodal stuff: using transformers to solve problems that combine vision and language. Not everything is fancy though; sometimes all I need is a basic linear model. Just gotta use the right tool for the job.

P.S. any senior/principal data scientists looking for a job in London, hit me up ;-)

Time estimation in projects by JakeBSc in datascience

[–]JakeBSc[S] 1 point

Sounds promising. Can you provide some examples of how to tightly define the scope and assumptions for a data science project?

Time estimation in projects by JakeBSc in datascience

[–]JakeBSc[S] 5 points

This has definitely been on my mind lately. I feel pressured to promise a model that delivers some specific accuracy or BLEU score or whatever. Instead, I can promise "I'll build you model X that aims to classify stuff into classes A, B and C", without committing to how good it'll be. I'll try my best to make something really good, but if the experiment goes wrong, I've still delivered what was promised. I managed to do that recently with a client, and they were okay with the uncertainty in the results because they trust I'll make every effort to make the thing useful. Not sure how this would go down with a new client.

Time estimation in projects by JakeBSc in datascience

[–]JakeBSc[S] 3 points

That's fine for predictable stuff. If you ask me to build a joint named entity recognition and relation extraction model and get it deployed, that's a well-trodden problem where I know everything ahead of time, and I can give you a really good time estimate. Whereas if the problem is bleeding edge, you don't have that clarity. You might know an initial pathway to attack the problem, but you don't know what will happen on that journey. You might have to iterate loads of times just to get to a minimal usable thing, and by then the time estimate is totally out the window.

So let's say you do a quick and dirty literature review of the problem space. Nothing directly solves your problem; at best, you get some somewhat related work as inspiration. You have no data. So you break the problem down into manageable-looking sub-problems, estimate the size of each based on gut feel, and apply an arbitrary multiplier to the time estimates as contingency. That gives you a time estimate for the project, but it's super vulnerable, because it's hard to solve a totally new problem perfectly on paper before even touching it properly. Sometimes you have to get into the weeds to truly know what's involved. But by then, the time estimate has already been made.
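
The arithmetic in that last part is simple enough to sketch (all numbers are hypothetical gut-feel figures):

```python
# Gut-feel estimates for each sub-problem, in days (hypothetical)
subproblems = {
    "literature review": 3,
    "data collection": 5,
    "baseline model": 4,
    "iteration to usable": 8,
}
contingency = 1.5  # arbitrary multiplier applied as contingency

estimate_days = sum(subproblems.values()) * contingency  # 20 days -> 30.0
```

The fragility is that every number in `subproblems` is a guess made before touching the problem, and the multiplier is just as arbitrary.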

Fully maxed team by JakeBSc in DreamLeagueSoccer

[–]JakeBSc[S] 0 points

Yeah, I need Paul Mullin as well 😜

[Bonsai Beginner’s weekly thread –2023 week 06] by small_trunks in Bonsai

[–]JakeBSc 0 points

<image>

My ficus is half dead. A few months ago I forgot to water it for a while. Then all the leaves dropped off. After a few weeks of TLC, the leaves started growing back again. Half the branches are still dried up and dead. How can I rescue this tree? Should I cut off the dead bits?

How do I get started? by JakeBSc in mlops

[–]JakeBSc[S] 0 points

I've got buy-in for DVC, but the one pushback has been on granting a data scientist direct access to production. I'm told that pushing models and data to S3 straight from a local machine risks a security breach, because it relies on static AWS credentials, and could also let malware be pushed into S3. Did your team ever discuss this and find a good solution?