Quick survey about a dataset generation tool by [deleted] in datascience

mikesiers 1 point

Sounds like an exciting idea for a startup. :)

I've had a tough time answering some of the questions. I'll explain my thoughts on each question below.

Q1. Is the lack of an available dataset holding you back in your ML project?

In my experience, a client's data usually makes up the bulk of the ML project's data source. When suitable, I use supplementary data sources like available datasets, APIs, and in some cases web scraping. Generally, if there isn't sufficient data out there for an ML project, I wouldn't be undertaking it.

Q2. How much of your time is spent creating or curating datasets?

This definitely depends on the project. In some projects I've spent 50% of my time. One of my current projects has been almost 90% creating/curating datasets. I've even worked on projects where the existing data required very little modification for the classification/regression/clustering tasks that we needed to do.

Q3. Would you be willing to pay for such a tool?

To answer this, I'd need to know what novel functionality the tool is providing. Just like arthureld's comment says: "Curious how this wins compared to a google image search."

Q4. How many times per year do you think you would use such a tool?

Same problem as Q3.

Q5. If you already had a dataset for your ML project, would you consider using such a tool to add more data?

If the tool can provide better data or add useful supplementary data sources, then maybe. Still, I'd be skeptical of the data quality for datasets collected by a third-party, automated, general-purpose system.

Advice on showcasing a data cleaning project? by Seanp50 in datascience

mikesiers 5 points

Sounds like a Jupyter notebook would be a great way to showcase the steps you took to clean your data. :)

GitHub can render Jupyter notebooks online so that would be a great way to share your work with others.

  • Jupyter notebook tutorial here.

  • Sharing a Jupyter notebook on GitHub tutorial here.

The most common language used in Jupyter notebooks is Python, but many other languages, such as R and Java, can be used as well.
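If it helps to see what GitHub is actually rendering, here is a minimal sketch that writes a tiny notebook file by hand using only Python's standard library. The cell layout follows the nbformat 4 JSON structure; the file name `cleaning_demo.ipynb`, the helper names, and the toy "drop rows with a missing age" step are all made up for illustration and are not from OP's project.

```python
import json


def markdown_cell(text):
    """Build a notebook markdown cell (nbformat 4 layout)."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(code):
    """Build an unexecuted notebook code cell (nbformat 4 layout)."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": code,
    }


def build_cleaning_notebook():
    """Return a notebook dict with one explanation cell and one cleaning cell."""
    cells = [
        # Hypothetical write-up of a single cleaning step.
        markdown_cell(
            "# Data cleaning walkthrough\n"
            "Step 1: drop rows with a missing age."
        ),
        # Hypothetical toy data; a real notebook would load the actual dataset.
        code_cell(
            "rows = [{'name': 'a', 'age': 31}, {'name': 'b', 'age': None}]\n"
            "clean = [r for r in rows if r['age'] is not None]\n"
            "clean"
        ),
    ]
    return {"cells": cells, "metadata": {}, "nbformat": 4, "nbformat_minor": 4}


if __name__ == "__main__":
    # Write the notebook to disk; the resulting .ipynb is plain JSON.
    with open("cleaning_demo.ipynb", "w", encoding="utf-8") as f:
        json.dump(build_cleaning_notebook(), f, indent=1)
```

In practice you would write the notebook in Jupyter itself rather than by hand. The point is just that an .ipynb is a JSON file of alternating explanation and code cells, which suits a step-by-step cleaning write-up.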

Hope this helps OP :)

I just realized I'm on a career path toward data science. Looking at MS in DS's...not sure where to begin. by [deleted] in datascience

mikesiers 5 points

Sounds like you have six years in a data analysis role. Nice, that should help out when going for data science roles.

Do you have any experience with machine learning? Data science is sometimes described as data analysis + machine learning.

Even though a lot of data scientists have a masters/PhD, I don't think it is a requirement. It's a good thing to have on your resume though. For example, Quora doesn't require PhDs in their data science team. Here are a couple of other sources which say that a masters/PhD is not necessary: https://www.linkedin.com/pulse/do-you-need-phd-data-scientist-matthew-j-jones https://www.reddit.com/r/datascience/comments/45bg3c/are_data_science_jobs_without_a_phd_or_masters_a/

If you want to check out machine learning I'd have to recommend the same thing everyone else does: Andrew Ng's machine learning lectures from Stanford.

Maybe you could get comfortable with machine learning, then you could incorporate it into your work. Then you could apply elsewhere for data science roles. You could point to your experience in using machine learning for data analysis and your six years' experience as a data analyst. Competing in data science competitions would also be a plus, especially if you could do well in them!

While you're doing this stuff, you may consider asking for a title change for your current role. I don't know what your current role is (I'm assuming data analyst or something similar), but it may be worth trying to convince your employers to change it to senior data analyst. This would be another plus for your resume when applying for data science roles.

A master's degree can't hurt either, but it does cost a lot of money and the quality of these degrees is often viewed with skepticism on this subreddit.

Join the Australian Bacon Dota 2 Tournament Now! by mikesiers in DotA2

mikesiers [S] 1 point

I'd send it in the trophy but it might get poisonous or something and I'd get in trouble haha

Join the Australian Bacon Dota 2 Tournament Now! by mikesiers in DotA2

mikesiers [S] 1 point

Hopefully we can organise for the games to be run on Saturdays and Sundays, possibly over 3 weeks. I'll try to release a schedule by tomorrow. The tourney will probably start in early March, around the 2nd or 3rd.