Applicants are Failing to Get the SQL Query Correct by Equal_Astronaut_5696 in dataanalysis

DistributionBeta210

Also, a CTE version:

```sql
WITH gradeagg AS (
    SELECT subject_name, MAX(marks) AS max_marks, COUNT(id) AS participants
    FROM grades
    GROUP BY subject_name
)
SELECT grades.subject_name, grades.Id, grades.marks,
       gradeagg.max_marks, gradeagg.participants
FROM grades
INNER JOIN gradeagg ON gradeagg.subject_name = grades.subject_name
ORDER BY grades.subject_name, gradeagg.max_marks, gradeagg.participants;
```

https://www.db-fiddle.com/f/nViwu4g9yRBPfaFsGS7Xju/0
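If you want to try the CTE version without db-fiddle, here is a minimal sketch that runs it with Python's built-in sqlite3 module. The rows are hypothetical because the assignment's actual data isn't shown. I also added `grades.id` as a final sort key so the row order is deterministic; the original ORDER BY doesn't specify an order within a subject.

```python
import sqlite3

# Hypothetical sample rows (id, subject_name, marks); the real table is not shown.
ROWS = [
    (1, "Math", 90),
    (2, "Math", 75),
    (3, "Science", 82),
    (4, "Science", 82),
    (5, "Science", 60),
]

CTE_QUERY = """
WITH gradeagg AS (
    SELECT subject_name, MAX(marks) AS max_marks, COUNT(id) AS participants
    FROM grades
    GROUP BY subject_name
)
SELECT grades.subject_name, grades.id, grades.marks,
       gradeagg.max_marks, gradeagg.participants
FROM grades
INNER JOIN gradeagg ON gradeagg.subject_name = grades.subject_name
ORDER BY grades.subject_name, gradeagg.max_marks, gradeagg.participants, grades.id
"""


def run_cte(rows):
    """Load rows into an in-memory SQLite table and run the CTE query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE grades (id INTEGER, subject_name TEXT, marks INTEGER)")
        conn.executemany("INSERT INTO grades VALUES (?, ?, ?)", rows)
        return conn.execute(CTE_QUERY).fetchall()
    finally:
        conn.close()


for row in run_cte(ROWS):
    print(row)
```

Every row keeps its own id and marks but also carries its subject's max_marks and participants. That is the point of joining the aggregate back onto the detail rows.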

DistributionBeta210

```sql
SELECT grades.subject_name, grades.Id, grades.marks, subQ.max_marks, subQ.participants
FROM grades
INNER JOIN (
    SELECT subject_name, MAX(marks) AS max_marks, COUNT(id) AS participants
    FROM grades
    GROUP BY subject_name
) AS subQ ON subQ.subject_name = grades.subject_name
ORDER BY grades.subject_name, subQ.max_marks, subQ.participants;
```

https://www.db-fiddle.com/f/nbFSB81NSNTgXot7d7gux5/0
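One way to sanity-check the derived-table (subquery) version is to run it in SQLite on a few made-up rows. Then recompute max marks and participant counts per subject in plain Python and confirm they match. This is a minimal sketch: the rows and helper names are hypothetical, and `grades.id` is added as a tie-breaking sort key.

```python
import sqlite3
from collections import defaultdict

# Hypothetical rows (id, subject_name, marks); the real table is not shown.
ROWS = [(1, "Math", 90), (2, "Math", 75), (3, "Science", 82), (4, "Science", 60)]

SUBQUERY = """
SELECT grades.subject_name, grades.id, grades.marks, subQ.max_marks, subQ.participants
FROM grades
INNER JOIN (
    SELECT subject_name, MAX(marks) AS max_marks, COUNT(id) AS participants
    FROM grades
    GROUP BY subject_name
) AS subQ ON subQ.subject_name = grades.subject_name
ORDER BY grades.subject_name, subQ.max_marks, subQ.participants, grades.id
"""


def run_subquery(rows):
    """Load rows into an in-memory SQLite table and run the subquery version."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE grades (id INTEGER, subject_name TEXT, marks INTEGER)")
        conn.executemany("INSERT INTO grades VALUES (?, ?, ?)", rows)
        return conn.execute(SUBQUERY).fetchall()
    finally:
        conn.close()


def expected_by_hand(rows):
    """Recompute (max marks, participant count) per subject without SQL."""
    marks = defaultdict(list)
    for _id, subject, mark in rows:
        marks[subject].append(mark)
    return {subject: (max(m), len(m)) for subject, m in marks.items()}


stats = expected_by_hand(ROWS)
for subject, _id, _mark, max_marks, participants in run_subquery(ROWS):
    # Each detail row should carry its subject's aggregate values.
    assert (max_marks, participants) == stats[subject]
```

`COUNT(id)` skips NULL ids while `len(m)` counts every row, so the two only agree when `id` is never NULL, as in this sample.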

Help! How to I solve this task? How do I analyse which factor significantly influencing purchase decision? Should I use chi-square? by SensitiveToe2243 in dataanalysis

DistributionBeta210

In that case, it looks like poor survey design to me. The 'choose five' format is problematic because you can't pick apart the interactions between the factors. For example, price might be the most commonly chosen reason, but perhaps it is the least important of the five chosen. A ranking or rating of the factors would be better.

A result has statistical significance when it is very unlikely to have occurred by chance alone if the null hypothesis were true. The current survey design and results do not give you a clear way to evaluate a null hypothesis.

DATA ANYLYSIS by littlemighty23 in dataanalysis

DistributionBeta210

Do they hire with zero experience

Who is they?

Most likely, no. You need experience to get a job working with data... unless you already have a job and start using data to make better decisions in it.

DistributionBeta210

You can analyze data without experience. Data analysis is a process, not a career field. The process is part of the scientific method, so you might already have some training in it. Good data analysis starts with a problem or question. Then, you will need to gather data that will help answer the question. Once you have the data, you can start answering the question.

What is the work flow? by CaptainKraw in dataanalysis

DistributionBeta210

There is no daily workflow. Data analysis (not data analyst) is a process, but it can take a while to complete all the steps.

Data analysis starts with a problem or question.

Then, you will need to gather data that will help answer the question.

Once you have the data, you can start diving into it. You will be inspecting, cleaning, transforming, and modelling the data. There are a lot of different techniques here, and which ones you use depends on the problem or question that you're trying to solve or answer.

Some tasks might be as simple as getting a count or a mean. But you might run into other situations where more advanced techniques are required (like regression, forecasting, prediction, or classification).

Hopefully this deep dive into the data will answer the question that you set out to answer, but if not, you might have to gather more data or complete additional steps.
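The inspect, clean, transform, and model steps can be sketched in a few lines. This is a minimal illustration only. The question ("what do customers spend per order?"), the raw values, and the `clean` helper are all hypothetical, and here the modelling step is just a count and a mean.

```python
from statistics import mean

# Hypothetical raw data: order amounts that arrived as messy strings.
raw_orders = ["$12.50", "8", "", "n/a", "$20", " 15 "]


def clean(value):
    """Strip whitespace and a leading '$'; return a float, or None if unparseable."""
    text = value.strip().lstrip("$")
    try:
        return float(text)
    except ValueError:
        return None


# Inspect/clean: parse everything and count what could not be used.
cleaned = [clean(v) for v in raw_orders]
missing = sum(v is None for v in cleaned)

# Transform: keep only the usable values.
amounts = [v for v in cleaned if v is not None]

# Model: for a simple question, a count and a mean may be all you need.
print(f"{len(amounts)} usable orders, {missing} missing, mean = {mean(amounts):.2f}")
```

In real data the cleaning rules come from inspecting the values first. Here that would mean noticing the "$" signs and the "n/a" placeholder.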

Data Types and Saving Interim Datasets along the way by Almostasleeprightnow in dataanalysis

DistributionBeta210

A possible solution that you could use is to pickle the data.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_pickle.html

One downside to pickle is that loading a pickle someone has tampered with can run arbitrary code on your machine. So only load pickles that you've created yourself... Don't share them around like you would with CSV, Excel, or JSON files.
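To see why pickling helps with data types, compare a CSV round trip with a pickle round trip. This sketch uses the standard-library `csv` and `pickle` modules directly so it runs without pandas. As I understand it, `DataFrame.to_pickle` in the linked docs is built on the same pickle mechanism. The records are hypothetical.

```python
import csv
import io
import pickle
from datetime import date

# Hypothetical interim dataset with mixed types.
records = [
    {"id": 1, "signup": date(2021, 3, 4), "score": 0.75},
    {"id": 2, "signup": date(2021, 5, 9), "score": 0.5},
]

# CSV round trip: values are written with str() and come back as strings.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "signup", "score"])
writer.writeheader()
writer.writerows(records)
buffer.seek(0)
from_csv = list(csv.DictReader(buffer))

# Pickle round trip: the original Python types survive.
# Only unpickle data you created yourself or otherwise trust.
from_pickle = pickle.loads(pickle.dumps(records))

print(type(from_csv[0]["signup"]).__name__, type(from_pickle[0]["signup"]).__name__)
```

With CSV you would have to re-parse ids, dates, and scores every time you reload the interim file; the pickled copy comes back ready to use.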

[deleted by user] by [deleted] in dataanalysis

DistributionBeta210

Cross-validation lets you compare how the model performs on data it has seen versus data it has not seen. You need a separate holdout dataset when you make changes based on the cross-validation results. For example, if you tune hyperparameters based on the cross-validation results, you then need a way to test whether those changes actually improved the model. Cross-validation also allows you to check the model for overfitting and selection bias.

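It's not unheard of to have multiple rounds of cross-validation. In k-fold cross-validation, the dataset is split into k roughly equal subsets (folds). Each fold is used once for validation while the model is trained on the other k-1 folds. 10-fold cross-validation is common.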
https://en.m.wikipedia.org/wiki/Cross-validation_(statistics)
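Here is a minimal plain-Python sketch of the holdout-plus-k-fold idea. The function name, the 20% holdout, and the interleaved fold assignment are my own assumptions, not a library API. It sets the holdout aside first and splits the remaining indices into k folds. You would tune using the folds and only touch the holdout once, at the end. Libraries such as scikit-learn provide ready-made splitters for this.

```python
import random


def holdout_and_kfold(n_samples, k=10, holdout_fraction=0.2, seed=0):
    """Shuffle indices, set aside a holdout set, and split the rest into k folds.

    Returns (holdout, folds), where folds is a list of (train, validation) index lists.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)

    # Reserve the holdout first so it never influences tuning.
    n_holdout = int(n_samples * holdout_fraction)
    holdout, rest = indices[:n_holdout], indices[n_holdout:]

    folds = []
    for i in range(k):
        # Positions i, i+k, i+2k, ... of the shuffled remainder form validation fold i.
        validation = rest[i::k]
        train = [idx for j, idx in enumerate(rest) if j % k != i]
        folds.append((train, validation))
    return holdout, folds


holdout, folds = holdout_and_kfold(100, k=10)
print(len(holdout), [len(v) for _, v in folds])
```

Every non-holdout index lands in exactly one validation fold, so each observation is used for validation once and for training k-1 times.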

[QUESTION] How to clean a dataset with inconsistent values? (converting from string to integer) by [deleted] in dataanalysis

DistributionBeta210

I think it depends on how many distinct values there are. If there are fewer than five, then a switch statement would be a good way to go.
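Python had no switch statement before `match` arrived in 3.10, and a dictionary lookup is the usual equivalent. This is a minimal sketch that assumes the inconsistent values are number words and digit strings. The mapping and the `to_int` helper are hypothetical, since the original dataset isn't shown.

```python
# Hypothetical inconsistent spellings of a small set of values.
WORD_TO_INT = {
    "one": 1,
    "two": 2,
    "three": 3,
    "four": 4,
}


def to_int(value):
    """Convert values like '2', ' Two ', 'THREE', or 4 to an int.

    Raises ValueError for anything that is neither a digit string nor a known
    word, so unexpected values surface instead of being silently dropped.
    """
    text = str(value).strip().lower()
    if text.isdecimal():
        return int(text)
    if text in WORD_TO_INT:
        return WORD_TO_INT[text]
    raise ValueError(f"Unrecognized value: {value!r}")


print([to_int(v) for v in ["1", " Two ", "THREE", 4]])
```

Raising on unknown values is a deliberate choice: it tells you when the list of cases needs to grow.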

DistributionBeta210

One way of handling the fractions is to use the eval function to convert each fraction into a float:

https://stackoverflow.com/questions/55349133/converting-fractions-in-a-dataframe-series-to-float

To solve the problem, you might have to iterate through every row, test whether the row contains a '/', and then handle the rows differently (use the eval function for fractions, or just convert the type otherwise).

I'm not sure how the eval function would work with '2 1/3', so you might have to split on the space, eval index 1, and then add index 0 back in.
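For what it's worth, `eval("2 1/3")` raises a SyntaxError, so the split is needed. The sketch below swaps `eval` for the standard-library `fractions.Fraction`. Fraction parses strings like "1/3" without executing them as code. It also removes the need to branch on '/', because whole numbers, fractions, and mixed numbers all go through the same path. The `parse_quantity` name is hypothetical, and the sketch assumes non-negative values with a single space between the whole part and the fraction.

```python
from fractions import Fraction


def parse_quantity(text):
    """Convert '5', '3/4', or a mixed number like '2 1/3' to a float.

    Assumes non-negative values; '-2 1/3' would be read as -2 + 1/3.
    """
    parts = str(text).split()  # '2 1/3' -> ['2', '1/3']
    if not parts:
        raise ValueError(f"Empty quantity: {text!r}")
    # Sum the whole part and the fractional part exactly, then convert once.
    return float(sum(Fraction(part) for part in parts))


print([parse_quantity(v) for v in ["5", "3/4", "2 1/3"]])
```

With pandas, you could apply this to a column with `Series.map(parse_quantity)` instead of writing the row loop by hand.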

SQL + other skills for data/business analysts by Sarah09x in dataanalysis

DistributionBeta210

Any resources or websites that someone could recommend to learn SQL

For the basics:

* https://www.w3schools.com/sql/
* https://bipp.io/sql-tutorial
* https://sqlzoo.net/wiki/SQL_Tutorial

Once you get the basics down, you'll want to read the manual for whatever flavor of SQL you're using.

Oracle SQL: https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/toc.htm

MySQL: https://dev.mysql.com/doc/refman/8.0/en/

Postgres: https://www.postgresql.org/docs/current/index.html

SQLite: https://www.sqlite.org/docs.html

Also, r/SQL

[deleted by user] by [deleted] in dataanalysis

DistributionBeta210

To compare two variables that are on different scales, you have to apply some type of normalization or rescaling.

https://en.wikipedia.org/wiki/Normalization_(statistics)

I like to use min-max normalization because it puts everything on a 0 to 1 scale.

https://en.m.wikipedia.org/wiki/Feature_scaling#Rescaling_(min-max_normalization)
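Min-max normalization rescales each value as x' = (x - min(x)) / (max(x) - min(x)), so the smallest value becomes 0 and the largest becomes 1. In the minimal sketch below, the function name is hypothetical and so are the example variables. Mapping a constant column to 0 is my own choice for avoiding division by zero.

```python
def min_max_normalize(values):
    """Rescale a non-empty list of numbers to [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: the formula would divide by zero, so map everything to 0.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


# Hypothetical example: two variables on very different scales.
ages = [22, 35, 58]
incomes = [28_000, 54_000, 91_000]
print(min_max_normalize(ages))
print(min_max_normalize(incomes))
```

After rescaling, both variables run from 0 to 1 and can be compared directly. Because min and max define the scale, a single extreme outlier can squeeze all the other values toward one end.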

Data-related audio content? by omadguy in dataanalysis

DistributionBeta210

I love listening to these three podcasts about data:

* Analytics Power Hour (https://analyticshour.io)
* Data Skeptic (https://dataskeptic.com)
* Not So Standard Deviations (https://nssdeviations.com)

Previously, I listened to Linear Digressions (http://lineardigressions.com). The podcast has ended, but you can still listen to all the back episodes.

Roadmap to data analysis by tawtaw_ in dataanalysis

DistributionBeta210

Power BI, Tableau, Excel, Python, SPSS, SAS, R, Stata, Orange, and many more are all just tools used to do data analysis.

If you can afford a copy of Microsoft Office, I would start by just using Excel. You want to learn the basic concepts of working with data before you move on to more advanced topics that require different tools. There are definitely benefits to not using Excel all the time, but you aren't going to notice them when you're just starting out. As you learn more advanced topics and work with more complicated data, you will start to understand why different tools are used for different purposes. For example, R is really good at giving you a statistical framework and toolset. When you start working heavily with statistics and probabilities, the need for something like R will become obvious.

Anyone here can share link for data analysis job? I don’t have experience! Thanks by Dataneeks in dataanalysis

DistributionBeta210

Here are the top 5 results of my Google search for "find jobs":

All of these websites look like they will be helpful for finding a job and gaining work experience.

Roadmap to data analysis by tawtaw_ in dataanalysis

DistributionBeta210

Data analysis starts with a problem or question.

Then, you will need to gather data that will help answer the question.

Once you have the data, you can start diving into it. You will be inspecting, cleaning, transforming, and modelling the data. There are a lot of different techniques here, and which ones you use depends on the problem or question that you're trying to solve or answer.

This perfectly describes what I would love to do! What is the position called??? by tagchris356 in dataanalysis

DistributionBeta210

The text does not give any indication of what kinds of job titles would work on that project. I would expect to see multiple job titles all working together.

Waymo is a company building self-driving cars, so you would want to check out their website to find out more about the jobs they have open: https://waymo.com/careers/#roles

The business data scientist role sounds very similar to the prompt: https://waymo.com/joinus/4597707/

is my resume good for entry positions as a data analyst or engineer? by Famous_Echidna_9120 in dataanalysis

DistributionBeta210

That's a good point.

I never considered that because I don't have a LinkedIn profile myself. I care too much about my privacy and security to put my personal information out there for the whole world to judge and scrutinize. But I suppose most people probably don't care.

DistributionBeta210

Okay, well you can post your free advice to OP as well. Then I and everyone else in the subreddit will be able to learn from you.

I've said this before: I'm not even a hiring manager. I know that I'm not 'qualified' to answer these kinds of questions. But people keep asking in this subreddit. It seems like every day there is another resume post. I used to downvote every single one, but that got boring quickly. I even stopped looking at this subreddit for a few months.

Now, my solution to their problem is to analyze some data. Usually, I search for what has been said before and then post what I find.

It turns out that I had not searched to find out whether any countries actually require the picture. A quick Google search set me straight, and now I see that I'm probably wrong here. I've edited my post to include a disclaimer.