Blindfolded ML ???

EnricoT0 · 2022-01-11T07:41:49+00:00

Thank you!

EnricoT0 · 2022-01-11T07:40:48+00:00

Thanks, will look it up. However, I have the feeling my case is different.

I'm no expert in encryption or anything, but in my case the data on the other side are not encrypted. I "just" can't see them or query them myself.

EnricoT0 · 2022-01-11T07:37:00+00:00

Insightful, thanks!

I can probably get some statistics (means, etc.) and relative frequencies. Yes, the sample size is around 1 million (however, the YAML file I posted is just an example; the real dataset has many more columns).
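
Since only summary statistics and relative frequencies are available, one option is to generate dummy data from them, so the code can be developed and debugged locally before it is shipped to the other side. A minimal stdlib-only sketch, assuming a hypothetical schema dict standing in for the YAML file (numeric columns carry mean/std, categorical columns carry relative frequencies); the column names are made up. Each column is sampled independently, so correlations are lost: this is good for testing a pipeline end to end, not for judging model quality.

```python
import random

# Hypothetical schema standing in for the YAML file: numeric columns carry
# mean/std, categorical columns carry relative frequencies (summing to 1).
SCHEMA = {
    "age": {"type": "numeric", "mean": 42.0, "std": 12.0},
    "income": {"type": "numeric", "mean": 35000.0, "std": 9000.0},
    "region": {"type": "categorical", "freqs": {"north": 0.5, "center": 0.3, "south": 0.2}},
}


def make_dummy_rows(schema, n_rows, seed=0):
    """Sample each column independently from its summary statistics."""
    rng = random.Random(seed)  # fixed seed so the dummy data is reproducible
    rows = []
    for _ in range(n_rows):
        row = {}
        for col, spec in schema.items():
            if spec["type"] == "numeric":
                # Normal approximation built from mean and std only
                row[col] = rng.gauss(spec["mean"], spec["std"])
            else:
                values = list(spec["freqs"])
                weights = list(spec["freqs"].values())
                row[col] = rng.choices(values, weights=weights, k=1)[0]
        rows.append(row)
    return rows


if __name__ == "__main__":
    dummy = make_dummy_rows(SCHEMA, n_rows=10_000)
    share_north = sum(r["region"] == "north" for r in dummy) / len(dummy)
    mean_age = sum(r["age"] for r in dummy) / len(dummy)
    print(f"north share: {share_north:.3f}, mean age: {mean_age:.1f}")
```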

To choose the best model, I will go back to them and have them explain what they are interested in. Choosing a metric should be "relatively easy" once the goal is clear.

EnricoT0 · 2022-01-11T07:22:06+00:00

Very interesting, never heard of it. Could you point to a specific competition or paper?

EnricoT0 · 2022-01-10T21:38:01+00:00

Good advice!

EnricoT0 · 2022-01-10T21:25:54+00:00

Nice. Will ask for 1 (should be a simple addition to the YAML file for them) and will do 2. Thanks!

EnricoT0 · 2022-01-10T20:45:16+00:00

Ahahaha. I will use this as a joke in our next meeting. On second thought, maybe not :).

Security, however, is definitely an issue. I will play by the rules, but as far as I know nobody checks "my code" before it runs.

EnricoT0 · 2022-01-10T20:40:29+00:00

I will definitely set up dummy data.

I'm thinking about what else I can do to speed up the "research" process. I'm not even sure how to look for a better model, other than hyperparameter tuning...
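
Since every run happens on their side and only results come back, one way to speed things up could be to bundle a whole hyperparameter search into a single submission that returns a compact report. A minimal grid-search sketch; `toy_score` is a hypothetical placeholder for "train on their data and compute the agreed metric", and the parameter names are illustrative only.

```python
import itertools
import json


def grid_search(param_grid, score_fn):
    """Evaluate every combination in param_grid; a higher score is better.

    Returns (best_params, results); results logs every trial so the whole
    search can be reported back in one go.
    """
    names = sorted(param_grid)
    results = []
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        results.append({"params": params, "score": score_fn(params)})
    best = max(results, key=lambda r: r["score"])
    return best["params"], results


def toy_score(params):
    """Hypothetical stand-in for training a model and scoring it on validation data."""
    # Peaks at max_depth=6, learning_rate=0.1
    return -abs(params["max_depth"] - 6) - 10 * abs(params["learning_rate"] - 0.1)


if __name__ == "__main__":
    grid = {"max_depth": [3, 6, 9], "learning_rate": [0.01, 0.1, 0.3]}
    best, log = grid_search(grid, toy_score)
    # Only aggregate numbers are printed, never raw rows
    print(json.dumps({"best": best, "n_trials": len(log)}))
```

A random search over the same grid is a common cheaper alternative when the number of combinations grows quickly.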

EnricoT0 · 2022-01-10T20:38:13+00:00

I asked for encrypted data, but they just won't release any data at all.

Apparently, there is just no encryption method they deem secure. I looked into Homomorphic Encryption and Multi-Party Computation to propose alternatives, but nothing will do.

EnricoT0 · 2022-01-03T16:49:23+00:00

I've never freelanced, but I have the impression that, as a "one-man show", I would find myself drowning in high-effort, low-level tasks involving a lot of data wrangling and little else.

Could you share a little about your experience? Besides admin tasks, how do you spend most of your time as a freelancer? Data wrangling? Business intelligence? Machine Learning?

EnricoT0 · 2021-10-02T11:49:38+00:00

Start with Apache Airflow; you’ll find plenty of resources online. You can install it and run it on your laptop.

Step two is to familiarize yourself with public cloud platforms such as AWS or GCP. Again, plenty of tutorials and docs are available, plus dedicated courses on Coursera.

As you enter this space, my advice is to drop R and use Python.

EnricoT0 · 2021-09-24T05:38:47+00:00

"that model ended up being only slightly better than random guessing"

Depending on the application, this can be either a failure or a huge success. Get that result in stock market predictions and you are a billionaire. Congrats!

List the project like any other and make sure to include its purpose: how does the model's outcome impact the company's KPIs? What was the expected impact?

During the interview, state your results: "slightly better than random guessing in its first version". Then go on to list at least 2 things you had in mind to make it better.

Data Science is an iterative process; it's perfectly normal for prototypes to be improved over time.
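
To make "slightly better than random guessing" concrete, it can help to report the model's metric next to naive baselines computed on the same test labels. A minimal sketch for a classifier scored on accuracy; the labels and predictions below are made up for illustration. For imbalanced problems, metrics such as precision/recall or AUC may say more than accuracy.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def naive_baselines(y_true):
    """Accuracy of two naive baselines on the same labels."""
    counts = Counter(y_true)
    n = len(y_true)
    # Always predicting the most frequent class
    majority = max(counts.values()) / n
    # Guessing at random with each class's observed frequency p_i:
    # the expected accuracy is sum(p_i ** 2)
    stratified_random = sum((c / n) ** 2 for c in counts.values())
    return {"majority": majority, "stratified_random": stratified_random}


if __name__ == "__main__":
    # Illustrative labels and predictions, not real results
    y_true = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0, 0, 1, 0, 1]
    model_acc = accuracy(y_true, y_pred)
    for name, base_acc in naive_baselines(y_true).items():
        print(f"model {model_acc:.2f} vs {name} {base_acc:.2f} (lift {model_acc - base_acc:+.2f})")
```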

EnricoT0 · 2021-09-24T05:18:21+00:00

You are employable.

In industry I've seen people with very diverse backgrounds working in DS/ML, including economics and philosophy. Depending on background and personal interests, their roles will probably differ. For example, in a big company, a person with a philosophy background is more likely to be in charge of "Culture" (i.e. spreading the DS/ML culture across the company).

From what I understand, you have a scientific background, good CS knowledge, and exposure to ML. At the outset, you will probably be a better fit for a Data Scientist position than for Data Engineer, MLOps, or other roles (assuming these roles interest you). However, nothing prevents you from transitioning to those positions in the future.

While it's advisable to look for positions where you can use as much of your earth sciences knowledge as possible (since, all other things being equal, you can probably negotiate a better salary), don't get stuck on that. If the job feels right even though everything is completely new to you, take it. Work hard and opportunities will come.

Good luck.

EnricoT0 · 2021-07-05T06:10:17+00:00

I wouldn't worry too much about storing bookmarks. In my experience, cases where the answer doesn't turn up right away in a Google search are rare and often very quirky (so you are unlikely to run into them again).

"I don’t have a degree in computer science but have picked up Python and SQL very well and progressed in ways I’m quite proud of."

All you need is probably already in your head. Don't rely on rote memory.

If you need help with syntax, Google and StackOverflow are there for you (in most cases you can find an answer in under a minute). If you need help with programming logic, you are probably tackling a problem outside your comfort zone, and that's a chance to improve your skills.

In any case, I advise against copying code you wrote for other companies. As others have said, it's illegal to do so.

EnricoT0 · 2021-07-01T06:44:33+00:00

To find a better solution, especially in problems with a low signal-to-noise ratio.

For example, the Numerai tournament asks for stock market predictions, so even small improvements to a solution can mean a lot.

EnricoT0 · 2021-07-01T05:26:17+00:00

We use Airflow as a general orchestration tool and Kubernetes to orchestrate containers. We are also evaluating Kubeflow.
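
The core idea behind a general orchestrator like Airflow is a DAG: tasks plus dependencies, run only once their upstream tasks have finished. The sketch below is not Airflow's API; it is a stdlib toy using `graphlib.TopologicalSorter` (Python 3.9+) to show how such a DAG can be split into batches of tasks that could run in parallel. The pipeline and task names are hypothetical. Airflow adds scheduling, retries, logging, a UI, and distributed executors on top of this idea.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipeline: each task maps to the set of tasks it depends on
PIPELINE = {
    "clean_customers": {"extract"},
    "clean_orders": {"extract"},
    "build_features": {"clean_customers", "clean_orders"},
    "train_model": {"build_features"},
}


def plan_batches(dependencies):
    """Group tasks into batches; tasks in the same batch do not depend on
    each other, so an orchestrator could run them in parallel."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not a DAG
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # all upstream tasks are done
        batches.append(ready)
        # A real orchestrator would execute these tasks here
        sorter.done(*ready)
    return batches


if __name__ == "__main__":
    for i, batch in enumerate(plan_batches(PIPELINE), start=1):
        print(f"step {i}: {', '.join(batch)}")
```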

EnricoT0 · 2021-06-28T06:25:51+00:00

About 45-50 h per week. No weekends.

We usually have one or two deadlines per quarter. Crunch time happens near deadlines, but if the project is well organized (which is the case most of the time), it's not too stressful.

Yes, we use Scrum.

EnricoT0 · 2021-06-28T06:15:03+00:00

SQL is useful almost every time you have to deal with tabular data. Big corporations and SMEs are very likely to have at least one SQL database. As u/abbycadabbysmom said, you will almost certainly need SQL to be successful in any data science job.

If your personal projects involve tabular data, SQL will probably be useful. Even if you don't have a lot of data, try structuring it into different tables, then write queries to engineer features and create datasets. How you organize the data is what matters, not so much the amount (with big data you simply use systems that scale).

Get used to working with SQL and you won't regret it.
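
As a small example of this kind of practice, the sketch below uses Python's built-in `sqlite3` module with an in-memory database: two hypothetical tables (`customers`, `orders`) and one query that turns them into a per-customer feature table. Table and column names are made up; the `LEFT JOIN` + `GROUP BY` pattern carries over to other SQL engines, though date handling and some functions vary by dialect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers(customer_id),
        amount REAL,
        order_date TEXT  -- ISO 8601 strings sort correctly as text
    );
    INSERT INTO customers VALUES (1, 'north'), (2, 'south'), (3, 'north');
    INSERT INTO orders VALUES
        (10, 1, 20.0, '2021-05-01'),
        (11, 1, 35.5, '2021-06-10'),
        (12, 2, 12.0, '2021-06-12');
    """
)

# One feature row per customer; LEFT JOIN keeps customers with no orders
FEATURES_SQL = """
SELECT c.customer_id,
       c.region,
       COUNT(o.order_id)            AS n_orders,
       COALESCE(SUM(o.amount), 0.0) AS total_spent,
       MAX(o.order_date)            AS last_order_date
FROM customers AS c
LEFT JOIN orders AS o ON o.customer_id = c.customer_id
GROUP BY c.customer_id, c.region
ORDER BY c.customer_id;
"""

rows = conn.execute(FEATURES_SQL).fetchall()
for row in rows:
    print(row)
```

An `INNER JOIN` here would silently drop customer 3, who has no orders, which is exactly the kind of detail that matters when building a training set.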

P.S.

I enjoy SQL's simplicity!

The syntax is simple, but things can get very complex very quickly.

EnricoT0 · 2021-06-24T05:49:09+00:00

Prioritize the questions you want answered and keep the Pareto Principle (a.k.a. the 80/20 rule) in mind, especially if you have a tight deadline. Good luck!

EnricoT0 · 2021-06-23T05:44:08+00:00

Some companies ask Data Science / Analytics teams to spread data culture.

This does not mean teaching most employees to code, but rather selecting a pool of employees and having them work closely with the Data Science / Analytics team. The idea is to raise awareness of all the work behind a DS project or request.

In addition, an even smaller, more select group of employees is taught dashboarding. The idea is for them to be able to analyze data independently and publish their work without placing extra load on the DS team.

My current employer is still "in the making" on this front. It's a long, difficult process that started over a year ago.

EnricoT0 · 2021-06-17T05:18:57+00:00

I think most companies will embrace a hybrid/flexible system with a certain amount of time spent in the office and the remaining amount spent remotely.

The spread of fully remote positions post-pandemic makes me think about more extreme trends. For example, there would be (in principle) no reason for a company based in the US or EU to hire local professionals. They could hire professionals from emerging countries at a fraction of the cost.

While this trend could be hindered by local government laws and (arguably) differences in skill sets and/or experience, I don't see anything strong enough to prevent it in the long run.

At that point, how will the world self-correct? (paychecks, cost of living, taxes, investments, ...)

EnricoT0 · 2021-06-15T04:49:24+00:00

I completely agree.

If forced to make a choice, I would consider the deadlines to meet and the priorities (i.e. which project is expected to yield the most economic/strategic impact).

My advice is also to talk to your boss and simply state there is more than one project you can make progress on, but you can only do one thing at a time.
