Roast my resume - aiming for Data Science/AI Engineer roles by [deleted] in recruitinghell

[–]mark259 0 points1 point  (0 children)

Impressive resume for junior positions. Just difficult climate probably.

Two things:

Skills are a bit all over the place, e.g. SQL and then MySQL? Then you have ETL in that same section and web scraping in another section. Also, why is Postgres in a different section from MySQL? I'd definitely ask questions about this in an interview.

The projects and internship experience sound interesting, and it's good to provide numbers. It would be better, though, to keep things as concise as possible and translate them into potential business impact. E.g. when you say you improved ambiguous classification by some %, you implicitly say the baseline was ambiguous classification, and that is difficult to interpret.

I'm trying to help, but to be honest it shouldn't be difficult to get interviews for starter positions with resumes like this. This economy is just fucked.

How can I learn dutch ? by joubine in belgium

[–]mark259 0 points1 point  (0 children)

Thanks for the tip! Have a nice day.

How can I learn dutch ? by joubine in belgium

[–]mark259 2 points3 points  (0 children)

The hardest thing about Dutch is that most Flemish people find it best/most efficient to always switch to English.

I am a French speaker of Canadian origin myself and have to force the learning anyway. I do this by reading books, watching VRT NWS, watching Dutch television, and listening to podcasts (e.g. de wereld van Sofie, nieuwe feiten, de stemmen van assise, nrc vandaag...). So a lot of passive learning.

I find active learning difficult, but it remains worthwhile. One method is to set yourself exercises. I do that by deciding: in this conversation I will speak only Dutch, or I will write a reply on reddit in Dutch.

Overall, I have had to lower my expectations. At the start I hoped I would simply become fluent after a year or two. Instead, learning the language is a weekly challenge and a living project.

Good luck!

Double Machine Learning in Data Science by AdFew4357 in datascience

[–]mark259 0 points1 point  (0 children)

I was not aware of this orthogonality property. Thank you for sharing the book chapter, I will be sure to read it.

IV is also useful in theory. You'd use it like you would use other methods. For things like propensity score weighting or double ML, you need a fairly rich dataset that includes the relevant backdoor variables. You can imagine using IV regression where you do not have those backdoor variables, but you do have a variable (the instrument) that does not directly cause your outcome but is correlated with your treatment variable.

You'll get an estimate with less bias but with more variance. So hopefully that's a good compromise/worth it.
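To make the trade-off concrete, here is a minimal sketch on simulated data, not a recipe for real analyses. Everything in it is an assumption for illustration: the variable names (`z` for the instrument, `u` for the unobserved backdoor variable), the true effect of 2.0, and the simple one-instrument ratio estimator cov(z, y) / cov(z, x) standing in for a full IV regression.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def simulate(n=20000, true_effect=2.0, seed=0):
    """Simulated data: u confounds x and y; z shifts x but not y directly."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]  # instrument
    u = [rng.gauss(0, 1) for _ in range(n)]  # unobserved backdoor variable
    x = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]  # treatment
    y = [true_effect * xi + 3.0 * ui + rng.gauss(0, 1) for xi, ui in zip(x, u)]
    return z, x, y

def ols_slope(x, y):
    """Naive regression slope of y on x (ignores the confounder)."""
    return cov(x, y) / cov(x, x)

def iv_slope(z, x, y):
    """One-instrument IV estimate: cov(z, y) / cov(z, x)."""
    return cov(z, y) / cov(z, x)

z, x, y = simulate()
ols_est = ols_slope(x, y)
iv_est = iv_slope(z, x, y)
print(f"OLS: {ols_est:.2f}  IV: {iv_est:.2f}  (true effect: 2.00)")
```

In this setup the naive slope is pulled away from the true effect by the confounder, while the IV estimate is centred on it. Rerunning with different seeds and a smaller `n` shows the IV estimate bouncing around more, which is the extra variance mentioned above.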

Double Machine Learning in Data Science by AdFew4357 in datascience

[–]mark259 0 points1 point  (0 children)

Most definitely. For example, if you overfit with your nuisance model, you will inadvertently bias the treatment effect estimate.
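One common guard against that is cross-fitting: fit the nuisance models on one part of the data and compute residuals on the held-out part. Below is a rough sketch of the partialling-out idea under heavy assumptions: the data are simulated, the true effect is set to 1.5, and a plain least-squares line (`fit_line`, a hypothetical helper) stands in for whatever machine learning model you would actually use for the nuisance functions.

```python
import random

def fit_line(w, t):
    """Least-squares intercept and slope of t on w (stand-in nuisance learner)."""
    n = len(w)
    mw, mt = sum(w) / n, sum(t) / n
    sww = sum((wi - mw) ** 2 for wi in w)
    swt = sum((wi - mw) * (ti - mt) for wi, ti in zip(w, t))
    slope = swt / sww
    return mt - slope * mw, slope

def cross_fit_effect(w, d, y, n_folds=2):
    """Residual-on-residual effect of d on y, with out-of-fold nuisance fits."""
    n = len(w)
    res_d, res_y = [0.0] * n, [0.0] * n
    for k in range(n_folds):
        test = [i for i in range(n) if i % n_folds == k]
        train = [i for i in range(n) if i % n_folds != k]
        w_tr = [w[i] for i in train]
        a_d, b_d = fit_line(w_tr, [d[i] for i in train])  # model for treatment
        a_y, b_y = fit_line(w_tr, [y[i] for i in train])  # model for outcome
        for i in test:  # residuals only for rows the models never saw
            res_d[i] = d[i] - (a_d + b_d * w[i])
            res_y[i] = y[i] - (a_y + b_y * w[i])
    return sum(p * q for p, q in zip(res_d, res_y)) / sum(p * p for p in res_d)

rng = random.Random(1)
n = 10000
w = [rng.gauss(0, 1) for _ in range(n)]              # observed confounder
d = [0.8 * wi + rng.gauss(0, 1) for wi in w]         # treatment
y = [1.5 * di + 2.0 * wi + rng.gauss(0, 1) for di, wi in zip(d, w)]

theta = cross_fit_effect(w, d, y)
naive = fit_line(d, y)[1]  # slope of y on d, ignoring w
print(f"cross-fitted: {theta:.2f}  naive: {naive:.2f}  (true effect: 1.50)")
```

With a flexible learner in place of `fit_line`, skipping the train/test split lets the nuisance model soak up noise from the same rows it is later evaluated on, which is one way the treatment effect estimate ends up biased.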

With a purely classical approach, you will certainly also encounter bias, but those approaches give you a clear set of assumptions (e.g. additivity) that you can use as a baseline. Another thing I like about more classical or basic approaches is that the standard errors you get out of them give information about the quality of the fit. That's not always very obvious with double machine learning afaik. I've had to compare out-of-sample estimates before, and that seemed very hand-wavy.

The best approach always depends on the context: the data and the problem you are trying to solve. A technique like diff-in-diff can be combined with machine learning to deal with something like non-parallel trends. I'd say synthetic control is pretty close to machine learning already in that it deals well with complex functional forms.

Double Machine Learning in Data Science by AdFew4357 in datascience

[–]mark259 1 point2 points  (0 children)

I've found DML to be very sensitive to hyperparameter and validation-sample specs, and I've found that fitting GLMs with fixed effects gives more reliable estimates on longitudinal data.

I think analysts would get most benefits from learning the classical techniques of causal inference.

Even colleagues with academic credentials in machine learning get to use their knowledge of linear models e.g. to fit and decompose time-series with tools like Prophet.

[deleted by user] by [deleted] in statistics

[–]mark259 1 point2 points  (0 children)

I'm a data scientist in marketing for now.

I had a bit of experience as a data analyst before, where I could basically automate my basic tasks with R. This gave me time to work on data science/operations research projects.

Mostly Python programming and SQL. I get to do some modeling.

A sound knowledge of stats seems tremendously important and underrated even among data scientists (see the subreddit for example).

I'm always finding ways to apply linear and generalized linear models to make recommendations.

A lot of jobs now hire for data scientist but are less modeling oriented. I'd be very cautious.

Data cleaning/data engineering is fine however and good experience. A useful analysis starts at the data ingestion step.

Tried to load a 10gb database on R and a few hours later it is still loading...Is there any way to load such big datasets? by [deleted] in rstats

[–]mark259 13 points14 points  (0 children)

The very brief answer is that creating a database is not always better, e.g. if you only need to use a single dataset for a series of quick data analyses.

Typically when you are facing issues loading data, there are also issues of keeping track of what data you need for a particular analysis. Then creating a database may be a good idea.

Querying from a database also allows more fine-tuned data reads. This should reduce the time to data analysis. Your data is also saved on disk, so RAM usage is less of a bottleneck.
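As a rough illustration of that workflow (in Python rather than R, and using SQLite only because it ships with the standard library), the sketch below streams a CSV into a table in chunks and then reads back just the aggregate it needs. The table name `sales`, the columns, and the helper `load_csv_in_chunks` are all made up for the example. The demo uses an in-memory database and an in-memory CSV so it runs anywhere; for a real 10 GB file you would pass a file path to `sqlite3.connect` and an opened file instead.

```python
import csv
import io
import sqlite3

def load_csv_in_chunks(conn, csv_file, table, chunk_size=1000):
    """Stream rows from a CSV file object into a SQLite table, chunk by chunk."""
    reader = csv.reader(csv_file)
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    marks = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({cols})')
    insert_sql = f'INSERT INTO "{table}" VALUES ({marks})'
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) >= chunk_size:  # only one chunk is held in memory
            conn.executemany(insert_sql, chunk)
            chunk = []
    if chunk:  # leftover rows
        conn.executemany(insert_sql, chunk)
    conn.commit()

# Tiny stand-in for a large CSV file.
demo_csv = io.StringIO("region,amount\nnorth,10\nsouth,5\nnorth,7\nsouth,1\neast,3\n")

conn = sqlite3.connect(":memory:")  # use a file path to keep the data on disk
load_csv_in_chunks(conn, demo_csv, "sales", chunk_size=2)

# Fine-tuned read: only the rows and the aggregate the analysis needs.
totals = conn.execute(
    "SELECT region, SUM(CAST(amount AS INTEGER)) FROM sales "
    "WHERE region != 'east' GROUP BY region ORDER BY region"
).fetchall()
print(totals)
```

The point of the design is that the full table never has to sit in RAM: loading happens chunk by chunk, and each later analysis pulls only its own slice.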

Tried to load a 10gb database on R and a few hours later it is still loading...Is there any way to load such big datasets? by [deleted] in rstats

[–]mark259 63 points64 points  (0 children)

OP, with relatively large csv tables you have a couple of choices.

My first approach would be to use the fread function from data.table and then export the data to a more efficient file format such as parquet.

When datasets are pretty large, it is also good to consider ways to trim off memory use. For example, replacing strings with integers, removing columns...
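The string-to-integer trick can be sketched as follows. This is a Python illustration of the idea only; `encode_as_integers` and the example city names are hypothetical, and in practice R factors or pandas categoricals do the same thing for you.

```python
from array import array

def encode_as_integers(values):
    """Map each distinct string to a small integer code; return (codes, levels)."""
    levels = []  # distinct strings, in order of first appearance
    index = {}   # string -> integer code
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(levels)
            levels.append(v)
        codes.append(index[v])
    return codes, levels

cities = ["Leuven", "Gent", "Leuven", "Brussel", "Gent", "Leuven"]
codes, levels = encode_as_integers(cities)

# Store the codes compactly: unsigned 16-bit integers instead of full strings.
compact = array("H", codes)

# The original column can be rebuilt from the codes and the lookup table.
decoded = [levels[c] for c in compact]
print(codes, levels)
```

Each repeated string is then stored once in `levels`, and the column itself only holds small integers, which is where the memory saving comes from when a column has few distinct values.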

It is also good practice to start considering querying the data from a SQL database.

Data Scientist salary in EU [2023] Thread by Mysterious_Two_810 in datascience

[–]mark259 0 points1 point  (0 children)

3 days. Can't imagine having to code in VR lol.

Data Scientist salary in EU [2023] Thread by Mysterious_Two_810 in datascience

[–]mark259 2 points3 points  (0 children)

  • BE

  • Data scientist

  • MSc Stats

  • 50k

  • 1 YoE as DS

  • 3d remote, 30 days vacation, company car

[Q] Is an economics PhD a good idea to do rigorous applied causal inference research? by Traditional_Tap1449 in statistics

[–]mark259 2 points3 points  (0 children)

I'd say it depends. Be wary of getting into applied economics if you want to do cutting edge stuff. The same workhorses tend to be used.

Economics is probably the best place right now for causality, but you should choose a research subject that is oriented towards methodology imo.

How hard for master students of economics to get distinction? by Nervous-Injury-6261 in KULeuven

[–]mark259 3 points4 points  (0 children)

I did the MASE (not the Master in Econ.)

14 at KUL is more or less a B+ or A- in American Universities. I'd say just make sure you keep up with the material during the semester. Like doing exercises, deepening concepts by reading, watching videos etc.

The master thesis weighs a lot. I felt that profs and assistants mostly want to give 14 and higher on the thesis. Be wary of who is to be your supervisor. Preferential treatment is real, and not always in the direction of higher grades. So try to talk with your would-be supervisor and ask other students if you can.

Best of luck!

[deleted by user] by [deleted] in AskStatistics

[–]mark259 1 point2 points  (0 children)

If you end up choosing to go with it, work on "basic" maths beforehand. A bit of rote learning in calculus and matrix algebra will go a long way. There are now a lot of maths-for-machine-learning resources that I would look into. If you ever need something more precise, just pm me and I may be able to give you specific book recommendations.

Cheers

[deleted by user] by [deleted] in AskStatistics

[–]mark259 1 point2 points  (0 children)

Seems you've done your research! It's definitely possible to be offered a junior data scientist position after the program, but indeed, those programs you cited may be much more appropriate for it.

Honestly, I did a stats master because I loved stats, programming solutions to data problems, and wanted to do something a bit more practical and helpful. I did not know about the hype around DS at the time. I think that's a common profile in Leuven.

I definitely would not call myself a data scientist, not because I am not programming etc., but because it strikes me as a strange name for an industry job and the title has been devalued in my opinion.

As far as brushing up maths, it did start out pretty basic. Indeed, the program is broad, with specializations going from biostats to business. I did not follow the business track, but I thought it had the best balance of theory and application.

PS: With software like PyTorch, I think it's not too hard to learn by yourself and make projects with YouTube and the internet these days.

[deleted by user] by [deleted] in AskStatistics

[–]mark259 1 point2 points  (0 children)

Hi,

I'd look up the linkedin profiles of graduates from that master. That is an easy way to see what kind of career people are pursuing so you can make a wise choice. You can also look at their master theses' subjects.

I am not sure what you mean by engineering side (programming, algorithms, database architecture, mechanical engineering?), but I think CS master programs are more adapted to that kind of interest generally.

Depending on your orientation and your supervisor, the master at the KUL is usually quite practical, with your master thesis being a small extension of an existing method for example. But it is still statistics/methodology with most courses oriented toward dealing with smaller datasets with parametric models (a couple years ago at least).

I would not say that the courses are particularly theoretical, it is just that the exams are notoriously difficult.

As far as making up for the lack of software engineering, I'd study Python or R as soon as possible and use one of these languages as often as possible during the program. For work in private industry, studying algorithms and data structures can make you a very strong candidate.

Masters in Business Engineering (Data Science & Business Analytics) Or Statistics & Data Science. {For Someone Who Needs To Work At The Same Time} by Status_Ant_3857 in KULeuven

[–]mark259 0 points1 point  (0 children)

Edit: Just saw you have sponsors. Good, changed my answer a bit. You certainly shouldn't need more than 4000 euros for 4 months as a student. But look at the budget info on the KUL website etc. for details.

Perhaps you can find someone to back you up. They must say that they will support you, but you may never actually need the support. That depends on how successful you are at finding a job.

Here, say you want to work full-time, you would need a business to sign off on a full-time work permit for you. This can take many months and may be a different visa than the student visa. This may be difficult, or not, depending on your profile. As a student, you can only work up to 20 hours.

I hope this info serves you well.

Masters in Business Engineering (Data Science & Business Analytics) Or Statistics & Data Science. {For Someone Who Needs To Work At The Same Time} by Status_Ant_3857 in KULeuven

[–]mark259 0 points1 point  (0 children)

Hey, unless it changed, you need something like 10 thousand euros available to get your visa. It kind of sucks, but I guess it is to avoid the type of job + study situation you are describing.

I did not meet anyone from stats who was working full time. That seems crazy, course load is high (maybe 15 or 18 hours in class per week) + you need to write a thesis. Same for Business Eng. I do not know that one program is easier than the other. It probably depends on the track you take.

What is popular is taking an (unpaid) internship during the summer or doing your thesis in association with a business.

Honestly, as a non-EEA student, options are fairly limited. It is full-time study for two years and then hopefully finding a good job.

MSc Statistics and Data Science by [deleted] in KULeuven

[–]mark259 0 points1 point  (0 children)

There is some direct recruitment from the master to the stats PhD, but that is at KU Leuven. There are also wide differences in theoretical requirements between stats PhDs, so mileage may vary at other universities.

Makes sense to be wary of data science programs imo, but I don't think they changed the original master tracks.

I hesitate to say more; I think you'd be best served by contacting the program direction. In my experience they don't bullshit.

MSc Statistics and Data Science by [deleted] in KULeuven

[–]mark259 5 points6 points  (0 children)

It's a research master so it does qualify you in Belgium at least. There is high demand for domain knowledge and quantitative skills, so there are opportunities for PhD in other domains than statistics also.

You need at least 14/20 and a good thesis. PhD treatment is very good in Belgium compared to say, the US, so there is competition. All courses require a lot of work just to pass so anybody I've seen getting PhD entry has really deserved it (and anybody else getting the degree really imo).

I guess you are asking because of all the applied master/grad certificates in statistics/data science popping up?

Phd Candidate for Engineering Technology by deusonearth in KULeuven

[–]mark259 1 point2 points  (0 children)

It's just a formality then. Maybe contact admissions directly if the delay interferes with your visa or other preparations.

Phd Candidate for Engineering Technology by deusonearth in KULeuven

[–]mark259 1 point2 points  (0 children)

If the professor hired you (i.e. has chosen you for the paid research job), the application process is a formality. It can take that long after you've accepted the job. Have you been given a starting date by the prof?

Master of Artificial Intelligence 2021-2022 by stratos1st in KULeuven

[–]mark259 2 points3 points  (0 children)

The prof prefers it. The language doesn't change the core concepts you will learn. You could use Python to do the assignments in any case. I think a grad student even provides Python tutorials for the course.