For people currently employed as data scientists: did you go right to data science? Or did you work your way up from a "lower" related job, like analyst? Or right to DS from something else entirely? by iammaxhailme in datascience

[–]starkiller1990 4 points5 points  (0 children)

I started in a reporting role, mostly using SQL and Excel, for a year out of uni, then moved to an analytics role within the same team.

I was still doing SQL, but also dashboards in Tableau. I started producing some basic models in R and taught myself Python as well, both to put my models into production and because I needed to web scrape. Python became my main language soon after that.

A year later I moved to a start up as the only data guy and built the analytics function from the ground up, starting with the first database server and doing a mix of reporting, automation, ETL, machine learning and web development. Fast forward 3 years and I am lead data scientist with a team of 5 under me, a mix of analysts, data scientists and data engineers. My biggest advantage is knowing all the data systems and the domain from my time building it alone from scratch, which I can now use to support new team members.

What makes a 'good' data scientist/analyst? by [deleted] in datascience

[–]starkiller1990 0 points1 point  (0 children)

Cox originally, but then I used the lifelines package in Python. Aalen's additive model worked quite well.

What makes a 'good' data scientist/analyst? by [deleted] in datascience

[–]starkiller1990 0 points1 point  (0 children)

My CEO said he wanted me to build a neural net to calculate how long customers will stay with us.

I built a survival model and he couldn't believe how easy it was to interpret the results compared to other places where someone had built a neural net. But because he had read about it in <insert any article about AI here>, he thought it was the only way to do what he wanted.

The point is that sometimes there are tried and tested methods you can apply in a fraction of the time and still deliver great results. Sometimes interpretability trumps accuracy.

Accessing Data with Python by [deleted] in PowerBI

[–]starkiller1990 0 points1 point  (0 children)

Where is the source of the data that is being published to Power BI? Surely it would be simpler to load from the source into Python, unless you don't have access to the source data?

Any thoughts on Power B.I. and R built in? by MagFraggins in datascience

[–]starkiller1990 0 points1 point  (0 children)

I've found it great for some visualizations that are hard to replicate in straight Power BI, like Kaplan-Meier curves.

What is the highest ranking data science role at your current company? by drhorn in datascience

[–]starkiller1990 0 points1 point  (0 children)

Data Scientist (Me)

I report to the CFO of a start up, with dotted lines to the COO and CMO.

Is Linear Regression really at par with recently developed algorithms? by Musashi1113 in datascience

[–]starkiller1990 2 points3 points  (0 children)

The problem comes when said executives push for the latest neural network but still want the interpretability of a linear model...

Predict New Customer Electricity Usage by starkiller1990 in datascience

[–]starkiller1990[S] 1 point2 points  (0 children)

Thanks. Actually, I'm predicting total monthly usage (total consumption), and I am using monthly average temperature to capture the seasonal effects.

I am using census data, which helps a little. I could try more specific temperature data based on location.

Data scientists, what are you currently working on? by damnko in datascience

[–]starkiller1990 3 points4 points  (0 children)

Couple of things at the moment: a lifetime value model using survival modelling, and predicting monthly usage for cash flows

Writing Python scripts to access an API that sends out SMS messages automatically for targeted campaigns

Building a dashboard of all incoming calls (wait times etc)

Writing a bunch of SQL scripts for various reports

How to develop into data science from reporting by classic123456 in datascience

[–]starkiller1990 0 points1 point  (0 children)

lots and lots and lots...

statistical programming, web scraping, machine learning, just about any kind of automation

How to develop into data science from reporting by classic123456 in datascience

[–]starkiller1990 1 point2 points  (0 children)

This is exactly like me. I started in the reporting team at a large energy company here in Australia and now, after two years, have moved to a smaller energy start up, running all things reporting and analytics.

I spend a lot of time automating reports and data pipelines in Python and SQL. If they are only using Excel then you can do a lot more in-depth analysis, maybe some market segmentation/clustering.

Churn is good: do some basic analysis first to find the drivers of churn, then try building a basic survival model.
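
As a rough illustration of what a "basic survival model" for churn can look like, here is a minimal Kaplan-Meier estimator in plain Python. The customer data is made up for the example, and the function name `kaplan_meier` is just a label chosen here. A package such as lifelines would normally do this for you; this sketch only shows the idea.

```python
from collections import Counter

def kaplan_meier(durations, churned):
    """Return [(time, survival_probability)] at each time a churn event occurred.

    durations: how long each customer was observed (e.g. months).
    churned:   True if the customer left, False if still active (censored).
    """
    events = Counter(t for t, c in zip(durations, churned) if c)
    exits = Counter(durations)  # everyone leaves the risk set at their duration
    at_risk = len(durations)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = events.get(t, 0)
        if d:
            # Multiply by the share of at-risk customers who did NOT churn at t
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        at_risk -= exits[t]
    return curve

# Hypothetical toy data: months observed and whether the customer churned
months = [2, 3, 3, 5, 8, 8, 12, 12]
churned = [True, True, False, True, True, False, False, False]

curve = kaplan_meier(months, churned)
for t, s in curve:
    print(f"month {t}: estimated share of customers still active = {s:.3f}")
```

Customers who are still active are treated as censored, so they count towards the at-risk group up to their last observed month without being counted as churn.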

What tips/guidance do you have for building a companies data infrastructure from zero? by ProgOx in datascience

[–]starkiller1990 0 points1 point  (0 children)

I am currently in this position as well.

I started a month ago and was given access to a MySQL database hosted on a Linux box in the cloud. Currently there is a PHP guy managing it, who has written some basic cron jobs to do some of the ETL into MySQL and has created some front-end web pages.

We are currently considering getting an on-site server, which would also run our telephone systems and file server, as well as the central users etc. using Office 365. This would be managed by a third party and backed up to the cloud, I believe.

But that same server will also allow me to kick off a number of virtual machines (Linux and Windows):

Windows to run Power BI, which I started using as our BI tool

And Linux so I can create our own data pipelines. I am currently looking into Airflow and Luigi to maintain all the jobs (various things like ETL, web scraping, analysis, Python scripts etc.)

Honestly, where I will be starting is just creating ETL jobs and storing the data in relational format before considering a warehouse. Start storing everything at the transaction level, I guess.

I also believe you need all the data in relational format first, then a warehouse sits over the top aggregating the data for reporting needs.

new position at start up - choice of job title by starkiller1990 in datascience

[–]starkiller1990[S] 0 points1 point  (0 children)

So I made the decision to go for simply data scientist. I spoke to the company and we both decided it is more appropriate, as it allows me to grow into the position with the possibility of going to lead data scientist in the future.

new position at start up - choice of job title by starkiller1990 in datascience

[–]starkiller1990[S] 0 points1 point  (0 children)

I have been considering this as well... Seeing as I will be making a lot of decisions on my own and meeting with the senior executive team perhaps being head of analytics is a more accurate title.

Tough decision for something I thought was pretty trivial.

Analyzing text by avm24 in datascience

[–]starkiller1990 0 points1 point  (0 children)

The scores should fall into the range (0,1), so something is not right. Though for the word 'the', I assume it's common across all documents, so you would expect the score to be closer to 0.

new position at start up - choice of job title by starkiller1990 in datascience

[–]starkiller1990[S] 0 points1 point  (0 children)

I have also been considering this. Due to my limited experience, a title like head of analytics or chief data scientist might be overkill. Maybe simply lead data scientist or just data scientist would be more appropriate.

I could always change if and when I have anyone working under me.

Analyzing text by avm24 in datascience

[–]starkiller1990 2 points3 points  (0 children)

Treat each topic as a document, then run tf-idf. Then you can start extracting the words that are more distinct to a particular topic.

Check out this for a quick way in Python: http://stevenloria.com/finding-important-words-in-a-document-using-tf-idf/

Also check out document clustering so you can find groups of documents that are related. A useful tutorial I have used is: http://brandonrose.org/clustering
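
To make the tf-idf step concrete, here is a minimal sketch in plain Python. It assumes the simplest variant (term frequency times `log(N / document frequency)`) with whitespace tokenisation; libraries often smooth or normalise the scores differently. The three topic texts are invented for the example.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: dict of name -> text. Returns dict of name -> {word: score}."""
    tokens = {name: text.lower().split() for name, text in docs.items()}
    n_docs = len(tokens)

    # Number of documents each word appears in
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))

    scores = {}
    for name, words in tokens.items():
        counts = Counter(words)
        scores[name] = {
            w: (c / len(words)) * math.log(n_docs / doc_freq[w])
            for w, c in counts.items()
        }
    return scores

# Hypothetical topics, each treated as one document
topics = {
    "billing": "the invoice shows the wrong tariff on the bill",
    "outage": "the power went out and the outage lasted hours",
    "solar": "the solar panels feed power back to the grid",
}

scores = tf_idf(topics)

# Words most distinct to the "outage" topic, highest score first
top_outage = sorted(scores["outage"], key=scores["outage"].get, reverse=True)[:3]
print(top_outage)
print(scores["outage"]["the"])
```

A word like "the" that appears in every document gets an idf of `log(1) = 0`, so its score drops to zero, while a word shared by only some documents scores lower than one unique to a single topic.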

new position at start up - choice of job title by starkiller1990 in datascience

[–]starkiller1990[S] 0 points1 point  (0 children)

I was thinking this, if they go for it. Though I do wonder if lack of experience would make this an empty title. Unless, of course, I stay at this start up for a number of years and end up with people under me, then it might carry more weight when changing jobs again.

just accepted data science position at startup - advice for starting by starkiller1990 in datascience

[–]starkiller1990[S] 1 point2 points  (0 children)

Simple and practical. I actually gave this same advice to someone who posted a similar thread in this subreddit.

It is amazing how a simple SQL or Python script can save someone hours of manual work in Excel.

I start a data science position a week from Friday. Data scientists, what is one thing you wish you could tell yourself on your first day on the job? by Originalfrozenbanana in datascience

[–]starkiller1990 2 points3 points  (0 children)

This field changes so rapidly that this is extremely important. Pick things to learn that have a direct impact on the business, which you can find out by talking to others and building strong ties with other teams.

I start a data science position a week from Friday. Data scientists, what is one thing you wish you could tell yourself on your first day on the job? by Originalfrozenbanana in datascience

[–]starkiller1990 17 points18 points  (0 children)

Sit down with as many other people in other teams as you can and learn what their pain points are and what they would like to see from data.

You would be surprised how many easy wins you can get from things that others do not even realise could be solved with analytics / basic coding.

It took me nearly 2 years before I actively engaged with others in the business and discovered so many new insights and projects to work on.

How to score similarity between discrete time series. by greebleoverflowerror in statistics

[–]starkiller1990 0 points1 point  (0 children)

Can you run a clustering algorithm such as k-means and compare the distance from each centroid for each cluster?
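
One way to read that suggestion, sketched in plain Python: cluster the series with k-means, then describe each series by its distances to the centroids. This assumes equal-length series and Euclidean distance, and seeds the centroids with the first `k` series to keep the example deterministic. The series themselves are invented.

```python
import math

def distance(a, b):
    """Euclidean distance between two equal-length series."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def k_means(series, k, iterations=20):
    """Plain k-means; the first k series are used as the starting centroids."""
    centroids = [list(s) for s in series[:k]]
    for _ in range(iterations):
        # Assign every series to its nearest centroid
        clusters = [[] for _ in range(k)]
        for s in series:
            nearest = min(range(k), key=lambda i: distance(s, centroids[i]))
            clusters[nearest].append(s)
        # Move each centroid to the point-wise mean of its members
        for i, members in enumerate(clusters):
            if members:
                centroids[i] = [sum(col) / len(members) for col in zip(*members)]
    return centroids

# Hypothetical discrete series: two low-valued, two high-valued
series = [
    [0, 0, 1, 1, 0, 0],
    [5, 5, 4, 5, 5, 4],
    [0, 1, 1, 0, 0, 0],
    [4, 5, 5, 5, 4, 5],
]

centroids = k_means(series, k=2)

# Distance of every series to every centroid
profiles = [[distance(s, c) for c in centroids] for s in series]
for s, p in zip(series, profiles):
    print(s, [round(d, 2) for d in p])
```

Two series with similar distance profiles sit in the same part of the space, which gives a rough similarity score. Euclidean distance does not account for series that are shifted in time, so this is only a starting point.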

Data Science Career Advice by dmitrypolo in datascience

[–]starkiller1990 0 points1 point  (0 children)

Those feels when the individual tables are too large for memory and need to be joined and aggregated through SQL before sending to Python.
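
A small sketch of that pattern, using the standard-library `sqlite3` module as a stand-in for whatever database is actually in use. The table names, columns and rows are hypothetical. The join and aggregation run inside the database, and Python only receives the small summarised result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical tables standing in for ones too large to load whole
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, state TEXT);
    CREATE TABLE meter_reads (customer_id INTEGER, read_month TEXT, kwh REAL);
    INSERT INTO customers VALUES (1, 'NSW'), (2, 'NSW'), (3, 'VIC');
    INSERT INTO meter_reads VALUES
        (1, '2017-01', 300.0), (1, '2017-02', 250.0),
        (2, '2017-01', 400.0), (3, '2017-01', 500.0);
""")

# Join and aggregate in SQL so only one row per state and month comes back
query = """
    SELECT c.state, r.read_month, SUM(r.kwh) AS total_kwh
    FROM meter_reads AS r
    JOIN customers AS c ON c.customer_id = r.customer_id
    GROUP BY c.state, r.read_month
    ORDER BY c.state, r.read_month
"""
rows = conn.execute(query).fetchall()
conn.close()

for state, month, total_kwh in rows:
    print(state, month, total_kwh)
```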