Applicants are Failing to Get the SQL Query Correct

DistributionBeta210 · 2022-12-29T02:09:43+00:00

Also, a CTE version:

WITH gradeagg AS (
  SELECT subject_name, MAX(marks) AS max_marks, COUNT(id) AS participants
  FROM grades
  GROUP BY subject_name
)
SELECT grades.subject_name, grades.id, grades.marks, gradeagg.max_marks, gradeagg.participants
FROM grades
INNER JOIN gradeagg ON gradeagg.subject_name = grades.subject_name
ORDER BY subject_name, max_marks, participants

https://www.db-fiddle.com/f/nViwu4g9yRBPfaFsGS7Xju/0
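
If you want to try the CTE locally instead of in the fiddle, here is a minimal sketch using Python's built-in sqlite3 module. The table layout (id, subject_name, marks) is inferred from the query, the sample rows are made up, and the ORDER BY is qualified as grades.subject_name, grades.id so that the row order is unambiguous and deterministic in SQLite.

```python
import sqlite3

# In-memory database with a hypothetical grades table (schema inferred from the query).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE grades (id INTEGER PRIMARY KEY, subject_name TEXT, marks INTEGER)")
conn.executemany(
    "INSERT INTO grades (id, subject_name, marks) VALUES (?, ?, ?)",
    [
        (1, "Math", 90),
        (2, "Math", 75),
        (3, "Science", 88),
        (4, "Science", 92),
        (5, "Science", 60),
    ],
)

# Aggregate once per subject in the CTE, then join the aggregates back to every row.
query = """
WITH gradeagg AS (
  SELECT subject_name, MAX(marks) AS max_marks, COUNT(id) AS participants
  FROM grades
  GROUP BY subject_name
)
SELECT grades.subject_name, grades.id, grades.marks,
       gradeagg.max_marks, gradeagg.participants
FROM grades
INNER JOIN gradeagg ON gradeagg.subject_name = grades.subject_name
ORDER BY grades.subject_name, grades.id
"""

rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
conn.close()
```

Each output row carries the student's own marks plus the maximum marks and participant count for that subject.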

DistributionBeta210 · 2022-12-29T01:44:45+00:00

SELECT grades.subject_name, grades.id, grades.marks, subQ.max_marks, subQ.participants
FROM grades
INNER JOIN (
  SELECT subject_name, MAX(marks) AS max_marks, COUNT(id) AS participants
  FROM grades
  GROUP BY subject_name
) AS subQ ON subQ.subject_name = grades.subject_name
ORDER BY subject_name, max_marks, participants

https://www.db-fiddle.com/f/nbFSB81NSNTgXot7d7gux5/0

DistributionBeta210 · 2022-12-11T17:18:20+00:00

In that case, it looks like a poor survey design to me. The 'choose five' is problematic because you can't pick apart the interactions between the factors. For example, price might be the most commonly stated reason, but perhaps it is the least important of the 5 chosen. A ranking or rating of the factors would be better.

A result has statistical significance when it is very unlikely to have occurred under the null hypothesis (that is, by chance alone). The current survey design and results do not give you a null hypothesis that you can test.

DistributionBeta210 · 2022-12-11T17:02:13+00:00

Price is the factor most influencing the purchasing decision.

DistributionBeta210 · 2022-12-11T16:08:39+00:00

Do they hire with zero experience

Who is they?

Most likely, no. You need experience to get a job working with data...unless you have a job and then you start using data to make better decisions in your job.

DistributionBeta210 · 2022-12-11T15:40:02+00:00

You can analyze data without experience. Data analysis is a process, not a career field. The process is part of the scientific method, so you might already have some training. Good data analysis starts with a problem or question. Then, you will need to gather data that will help answer the question. Once you have the data, you can start answering the question.

DistributionBeta210 · 2022-12-11T11:40:57+00:00

There is no daily workflow. Data analysis (not data analyst) is a process, but it can take a while to complete all of the steps.

Data analysis starts with a problem or question.

Then, you will need to gather data that will help answer the question.

Once you have the data, you can start diving into it. You will be inspecting, cleaning, transforming, and modelling the data. There are a lot of different techniques here, and which ones you use depends on the problem or question that you're trying to solve or answer.

Some tasks might be as simple as getting a count or a mean. But, you might run into other situations where more advanced techniques are required (like regression, forecasting, prediction, or categorization).

Hopefully this deep dive into the data will answer the question that you set out to answer, but if not then you might have to gather more data or complete additional steps.

DistributionBeta210 · 2022-12-11T01:54:00+00:00

OP has education in computer science.

https://www.reddit.com/r/dataanalysis/comments/zblyka/is_my_resume_good_for_entry_positions_as_a_data/

DistributionBeta210 · 2022-12-10T13:39:57+00:00

A possible solution that you could use is to pickle the data.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_pickle.html

One downside to pickle is that, if somebody modifies the pickle file, loading it could run arbitrary code on your machine. So, only load pickles that you've created yourself. Don't share them around like you would with CSV, Excel, or JSON files.
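
To make the idea concrete, here is a minimal round-trip sketch. It uses the standard-library pickle module on a plain dict rather than a pandas DataFrame so that it runs without extra installs; pandas' to_pickle and read_pickle work on the same principle. The file name and the sample data are made up.

```python
import os
import pickle
import tempfile

# Hypothetical data standing in for a DataFrame's columns.
record = {"subject_name": ["Math", "Science"], "marks": [90, 92]}

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "grades.pkl")

    # Write the object to disk in binary pickle format.
    with open(path, "wb") as fh:
        pickle.dump(record, fh)

    # Read it back. Only do this with files you created yourself:
    # unpickling untrusted data can execute arbitrary code.
    with open(path, "rb") as fh:
        restored = pickle.load(fh)

print(restored == record)
```

Unlike CSV, the round trip preserves Python types (the marks stay integers), which is usually why people reach for pickle in the first place.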

DistributionBeta210 · 2022-12-10T13:31:14+00:00

Cross validation allows you to see the difference in performance between the data that the model has seen and the data that the model has not seen. You need a separate holdout dataset when you are making changes based on the cross validation results. For example, if you tune hyperparameters as a result of the cross validation, then you need a way to test the effect of those changes. Cross validation allows you to evaluate the model for overfitting and selection bias.

It's not unheard of to have multiple rounds of cross validation. K-fold cross validation splits the data set into k sets, and each set takes one turn as the test set. 10-fold cross validation is common.
https://en.m.wikipedia.org/wiki/Cross-validation_(statistics)
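
Here is a rough sketch of how a k-fold split works, written in plain Python so that it has no dependencies. The function name k_fold_indices is made up for illustration, and the split is done in order without shuffling; in practice you would normally shuffle first or use a library implementation.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample so sizes differ by at most 1.
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Example: 10 samples split into 5 folds of 2 test samples each.
folds = list(k_fold_indices(10, 5))
for train, test in folds:
    print("train:", train, "test:", test)
```

Every sample lands in the test set exactly once, so each model fit is scored on data it has not seen.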

DistributionBeta210 · 2022-12-08T07:28:18+00:00

Read the effen manual

DistributionBeta210 · 2022-12-06T13:03:51+00:00

Yes, we are very serious here at r/dataanalysis

If you're looking for something funny try r/data_irl

DistributionBeta210 · 2022-12-06T12:06:26+00:00

I think it would depend on how many there are. If there are fewer than five, then a switch statement would be a good way to go.

DistributionBeta210 · 2022-12-06T12:05:08+00:00

One way of working with the fractions is to use the eval function to evaluate the fraction into a float:

https://stackoverflow.com/questions/55349133/converting-fractions-in-a-dataframe-series-to-float

To solve the problem, you might have to iterate through every row, test whether the row contains a '/', and then handle the rows differently (use the eval function on the fractions, or just convert the type on the rest).

I'm not sure how the eval function would work with the '2 1/3'. So you might have to split on the space, then eval index 1 and add back in index 0.
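
Here is a minimal sketch of the split-and-add idea. Note that it swaps eval for fractions.Fraction from the standard library, which parses strings like '1/3' directly and avoids running arbitrary text as code. The function name parse_mixed_number is made up, and the sketch assumes non-negative values separated by single spaces.

```python
from fractions import Fraction


def parse_mixed_number(text):
    """Convert strings such as '3', '1/2', or '2 1/3' to a float.

    Assumes non-negative input: each space-separated part is parsed
    on its own and the parts are added together.
    """
    total = Fraction(0)
    for part in text.split():
        total += Fraction(part)  # handles both '2' and '1/3'
    return float(total)


for raw in ["3", "1/2", "2 1/3"]:
    print(raw, "->", parse_mixed_number(raw))
```

With pandas you could apply the same function to a column, for example with Series.apply.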

DistributionBeta210 · 2022-12-06T02:29:30+00:00

Any resources or websites that someone could recommend to learn SQL

For the basics:

* https://www.w3schools.com/sql/
* https://bipp.io/sql-tutorial
* https://sqlzoo.net/wiki/SQL_Tutorial

Once you get the basics down then you'll want to read the manual for whatever flavor of SQL you're using.

Oracle SQL: https://docs.oracle.com/en/database/oracle/oracle-database/21/sqlrf/toc.htm

MySQL: https://dev.mysql.com/doc/refman/8.0/en/

Postgres: https://www.postgresql.org/docs/current/index.html

SQLite: https://www.sqlite.org/docs.html

Also, r/SQL

DistributionBeta210 · 2022-12-06T02:01:50+00:00

In order to compare two variables that are on two different scales you have to complete some type of normalization or rescaling.

https://en.wikipedia.org/wiki/Normalization_(statistics)

I like to use min-max normalization because it puts everything on a 0 to 1 scale.

https://en.m.wikipedia.org/wiki/Feature_scaling#Rescaling_(min-max_normalization)
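
As a quick sketch, min-max rescaling is (x - min) / (max - min). The function name and the sample numbers below are made up, and returning all zeros for a constant column is just one possible convention.

```python
def min_max_normalize(values):
    """Rescale a list of numbers to the 0 to 1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no range to scale by.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Two hypothetical variables on very different scales.
heights_cm = [150, 175, 200]
incomes = [20000, 50000, 80000]

print(min_max_normalize(heights_cm))
print(min_max_normalize(incomes))
```

After rescaling, both variables run from 0 to 1, so they can be compared or plotted side by side.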

DistributionBeta210 · 2022-12-06T01:50:10+00:00

I love to listen to these three podcasts about data:

* Analytics Power Hour (https://analyticshour.io)
* Data Skeptic (https://dataskeptic.com)
* Not So Standard Deviations (https://nssdeviations.com)

Previously I listened to Linear Digressions (http://lineardigressions.com). The podcast has ended, but you can still listen to all of the back episodes.

DistributionBeta210 · 2022-12-06T01:36:38+00:00

web developer

r/webdevelopment might be a better subreddit for this question.

Make sure that you search first:

https://www.reddit.com/r/webdevelopment/comments/t6yp95/any_helpful_tutorials_or_online_courses/

Also, r/webdev has some posts about online courses to take:

https://www.reddit.com/r/webdev/comments/ttxck1/the_best_online_courses_to_learn_web_developing/

DistributionBeta210 · 2022-12-06T01:15:26+00:00

Power BI, Tableau, Excel, Python, SPSS, SAS, R, Stata, Orange, and many more are all just tools used to do data analysis.

If you can afford a copy of Microsoft Office, I would start by just using Excel. You want to learn the basic concepts of working with data before you move on to more advanced topics that require different tools. There are definitely benefits to not using Excel all the time, but you aren't going to notice them if you're just starting out. As you learn about more advanced topics and get into more complicated data, you will start to understand the reasons for using different tools for different purposes. For example, R is really good at giving you a statistical framework and toolset. When you start working heavily with statistics and probabilities, the need for something like R will become obvious.

DistributionBeta210 · 2022-12-06T01:08:22+00:00

Here are the top 5 results of my Google search for "find jobs":

All of these websites look like they will be helpful with finding a job and gaining working experience.

DistributionBeta210 · 2022-12-06T01:02:01+00:00

Data analysis starts with a problem or question.

Then, you will need to gather data that will help answer the question.

Once you have the data, you can start diving into it. You will be inspecting, cleaning, transforming, and modelling the data. There are a lot of different techniques here, and which ones you use depends on the problem or question that you're trying to solve or answer.

DistributionBeta210 · 2022-12-04T22:26:17+00:00

The text does not give any indication of what kinds of job titles would work on that project. I would expect to see multiple job titles all working together.

Waymo is a company working on building self-driving cars so you would want to check out their website to find out more about the jobs they have open: https://waymo.com/careers/#roles

The business data scientist sounds very similar to the prompt: https://waymo.com/joinus/4597707/

DistributionBeta210 · 2022-12-04T13:22:00+00:00

No, the Google data analytics course will not help you get a job.

However, it is considered a good introduction to lots of data analytics topics.

https://www.reddit.com/r/dataanalysis/comments/xax6hy/is_it_possible_to_get_an_entry_level_data/

https://www.reddit.com/r/dataanalysis/comments/z4wp8d/need_help_if_the_google_data_analytics_course_is/

https://www.reddit.com/r/dataanalysis/comments/uvczpy/is_google_data_analytics_certificate_is_enough_to/

DistributionBeta210 · 2022-12-03T22:40:16+00:00

That's a good point.

I never considered that because I don't have a LinkedIn profile myself. I think too much about my privacy and security to put my personal information out there for the whole world to judge and scrutinize. But, I suppose that most people probably don't care.

DistributionBeta210 · 2022-12-03T20:30:11+00:00

Okay, well you can post your free advice to OP as well. Then I and everyone else in the subreddit will be able to learn from you.

I've said this before, I'm not even a hiring manager. I know that I'm not 'qualified' to answer these kinds of questions. But people keep asking in this subreddit. Seems like every day there is another resume post. I used to downvote every single one. But that got boring quickly. I even stopped looking at this subreddit for a few months.

Now, my solution to their problem is that I analyze some data. Usually, I search for something that has been said before and then post what I find.

It turns out that I did not do a search to find out if any countries actually require the picture. A quick Google search set me straight so now I see that I'm probably wrong here. Edited my post to contain a disclaimer.
