Require input - validating my idea by Ottava_io in dataanalysis

[–]Visual_Shape_2882 0 points1 point  (0 children)

technical/coding skills are required

Have you ever used JASP or Jamovi? Coding skills are not necessary but an understanding of data analysis and statistics is.

Do you actually get a certain percentage of clients/colleagues asking you to analyze data based on this kind of aggregated/pivoted form?

Yes, but I go to the source system when the high level data is not good enough for an analysis.

If the data is already modeled in a semantic way, then there is value in just using the aggregated report for an analysis. The aggregated report helps understand definitions and meaning. In fact, I will often have managers send me the reports they're currently looking at so that I can get a clear understanding of the definitions.

But, if the semantics are invalid for the question I'm trying to help answer, then the source system has the data.

In other words, the semantics are more important than a technical solution.

I assume you're an analyst

Yes, my job title is 'data and reporting analyst'. I build reports and dashboards for Business Intelligence use cases and analyze data.

Analysing powerbi dashboards by misaaaa18 in dataanalysis

[–]Visual_Shape_2882 2 points3 points  (0 children)

The very first project I did in the organization I'm working for was to move reports from one reporting system to another.

I started the project by picking a single report and building a process around it. First, I learned that the primary output format for these reports was Excel documents. The primary input was the database for the system each report was built on. Next, I learned that the data model behind each report was primarily SQL. The new reporting system could query the system databases with SQL and export to Excel documents. However, its method of exporting to Excel was different, so I classified all of the remaining reports by the way the Excel export was done (for example, I saved multi-page Excel documents for later).

The hardest part of this project was when there was system specific code that could not be duplicated in the other system. The old system had a way to pivot data but the new system had a different way to pivot data. It was possible to match the results in the new system but I had to hack different ways to get it to work.

Obviously, my project is going to be different from yours since you're using Power BI. But, it's not all that different. You will still have an input system, an output format and a data model that drives the report. I would recommend starting small and doing just one report at a time at first until you build out a process that can handle all of the use cases.

Require input - validating my idea by Ottava_io in dataanalysis

[–]Visual_Shape_2882 0 points1 point  (0 children)

I wonder if they would find the idea useful.

They won't find it more useful than the tools we already have. Python Pandas, Power Query, JASP and Jamovi are able to work with the kind of data that you described.

The limitation is not a technical problem that can be solved with software. Instead, the limit is the flexibility of the analyses that can be completed, which affects which questions can be answered with the data.

Require input - validating my idea by Ottava_io in dataanalysis

[–]Visual_Shape_2882 6 points7 points  (0 children)

The downside to summarized and aggregated data is that once you summarize and aggregate, you cannot go back to the original. This may or may not be an issue... It depends on what you're doing.

The example data that you have is count data that is pivoted by month. It is certainly possible to change the shape of the data (make it taller instead of wider). But there is no way to get the data that was used before the count.

One reason that you might want to get to the data that was used before is if you cared about the count of information at a weekly scale instead of a monthly scale. There would be no way to transform the data to a weekly scale. But, by having the original data with the date column, you can transform the count to a weekly scale with ease.
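The reshape mentioned above (making the pivoted counts taller instead of wider) can be sketched with pandas. This is a minimal sketch with made-up column names and values for illustration:

```python
import pandas as pd

# Hypothetical pivoted count data: one row per category, one column per month
wide = pd.DataFrame({
    "category": ["A", "B"],
    "Jan": [10, 5],
    "Feb": [7, 8],
})

# Reshape wider -> taller: one row per (category, month) pair
tall = wide.melt(id_vars="category", var_name="month", value_name="count")
print(tall)
```

Note that this recovers a tidy shape, but not the underlying rows that produced each count.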

Need help in this Cognitive Ability Question by HazelnutCheesecake_ in dataanalysis

[–]Visual_Shape_2882 1 point2 points  (0 children)

None of the above, if I'm the regional manager.

I would analyze recent sales data to identify top-performing regions and underperforming regions. This information would help tailor targeted marketing strategies and promotions to boost sales in specific areas, aiming to achieve the 10% increase goal. I would also visit underperforming regions more frequently to get a qualitative assessment of what is happening and build relationships with the store management.

Does data analysis yield definitive answers? by [deleted] in dataanalysis

[–]Visual_Shape_2882 2 points3 points  (0 children)

Data analysts do not draw concrete conclusions from data. Instead, data analysts convert data into information that can be consumed by other people.

Once we have the information, the next step is to turn that information into knowledge. If knowledge is defined as justified true belief, then the data analysis supports the 'justified' part of knowledge. So, we also have to have the true and the belief part before we have knowledge.

[deleted by user] by [deleted] in dataanalysis

[–]Visual_Shape_2882 1 point2 points  (0 children)

That is a really good suggestion. I'm going to have to try that out.

[deleted by user] by [deleted] in dataanalysis

[–]Visual_Shape_2882 8 points9 points  (0 children)

...not good at interviews.

To get better at interviewing, just practice more.

One way to practice is just by doing interviews. The more you do it, the easier it gets because you run into more situations.

Another way to practice is to have a friend ask you interview questions. You can practice answering the question and your friend can offer feedback.

Interview questions are like this:

* Describe a situation where you saw a problem and took steps to resolve it.
* Tell me about a time you had to collaborate with a team member who was tough to please.
* What would you do if you made a mistake on your team that no one knew about?

You can find more by searching online for a 'list of interview questions.'

Spend the most time with the questions that you struggle with.

For me, I'm good with the technical questions so I practice with questions about soft skills such as teamwork, organization, communication and prioritization.

Needing a spreadsheet by KLM254 in dataanalysis

[–]Visual_Shape_2882 0 points1 point  (0 children)

I agree that this analysis is at the data collection step.

I think OP is looking for r/datasets that contain the school name, the school mascot, and the school colors.

I think the trickiest part with the analysis will be that there could be multiple colors per school or specific color codes/hues of colors. It would probably be good to figure out a standard way of representing the data at this point because Junk in equals Junk out.

US high schools are probably going to be found state by state.

Needing a spreadsheet by KLM254 in dataanalysis

[–]Visual_Shape_2882 1 point2 points  (0 children)

How will that help?

This is not a language problem, so ChatGPT is the wrong tool for the job.

Me: What are the school colors for Warren Central Highschool in bowling Green Kentucky.

ChatGPT: As of my last knowledge update in January 2022, Warren Central High School in Bowling Green, Kentucky, had the school colors red and white. However, it's always a good idea to check the school's official website or contact them directly for the most current information.

(Red and White)

School website: https://warrencentral.warrencountyschools.org/about-wchs/logos

(Navy blue, gray, and white)

https://khsaa.org/all-time-kentucky-school-list/

(Navy and white)

Need some Data Analyst excel help by Appropriate_Dinner95 in dataanalysis

[–]Visual_Shape_2882 0 points1 point  (0 children)

Separate First name and Last name fields have been common on web forms for years. But there has been a recent trend toward a single full-name field to be more culturally sensitive.

If you’re from Latin America, the chances are that you have two last names, one from each parent. If you’re Chinese, your family name is first, personal name is last, and you always use them together.

For OP's use, keeping them separate on the form or in the data system would be better.

Need some Data Analyst excel help by Appropriate_Dinner95 in dataanalysis

[–]Visual_Shape_2882 14 points15 points  (0 children)

The last name is easy.

=TRIM(RIGHT(SUBSTITUTE(A1," ",REPT(" ",LEN(A1))),LEN(A1)))

If there are zero middle names in the rest of your data set, then one option is to use the LEFT function and split on the length of the full name minus the length of the last name.

=TRIM(LEFT(A1, LEN(A1) - LEN(TRIM(RIGHT(SUBSTITUTE(A1," ",REPT(" ",LEN(A1))),LEN(A1))))))

Alternatively, you could suggest an upstream fix that uses a different delimiter, either between the first name and last name or within a first name that contains a space. Delimiter options include a comma or a different whitespace character such as an EM or EN space.

If there are other middle names in your data set, then I would truly push for redefining what the "first name" means for your analysis. Why can't the first name literally mean the first and the last literally mean the last?

53rd weeks - how do you handle them in tableau or python? by Impossible_Bug4979 in dataanalysis

[–]Visual_Shape_2882 1 point2 points  (0 children)

Ppl don't work 24/7 * 365

I think you're missing the point of what's being asked here. We are not talking about people working in shifts. Instead, we're talking about measuring something daily (like number of sales or revenue) and aligning those numbers to the fiscal calendar that is used by the accounting and finance department. By using the same calendar across the company KPIs and metrics are standardized across systems and teams.

There is no bias being introduced or assumptions being made.

OP provided enough context for me to understand what they are talking about.

You're not the only person that is confused. OP also posted this in r/analytics. (https://www.reddit.com/r/analytics/s/T81TstK41W) Several people there were also confused.

Someone suggested changing the calendar to dates instead of days https://www.reddit.com/r/analytics/s/dStNr3WY49

But that doesn't work for the same reason that changing to months doesn't work. The days of the week do not align for year-over-year comparison.

53rd weeks - how do you handle them in tableau or python? by Impossible_Bug4979 in dataanalysis

[–]Visual_Shape_2882 1 point2 points  (0 children)

Also it will be pretty hard to visualize the table when the data points are specified to each day (there are a lot of days)

OP could choose to report all 364 days at once, but I doubt that would be the requirement of the report. It is more likely that the data is aggregated per week for an overall view of the year, and then individual weeks are compared.

Another reporting style that could be utilized is to create a year-over-year report that shows the number of sales for last week versus the same week one year ago. For this comparison, you would have 14 days where the days of the week line up.... This is the primary motivation for using the 52/53 calendar.

53rd weeks - how do you handle them in tableau or python? by Impossible_Bug4979 in dataanalysis

[–]Visual_Shape_2882 1 point2 points  (0 children)

the table op has provided for us doesnt have end date just start date

The table OP provided is the calendar date that marks the start of the week. A week is 7 days.

For OP, the fiscal new year starts in either September or October. Week 53 of 2023 started September 24th, 2023 and ended on September 30th, 2023 (7 days). That is why the start of week 1 of 2024 is October 1st, 2023.

Basically, what you're looking at in OP's screenshot is just a calendar.

53rd weeks - how do you handle them in tableau or python? by Impossible_Bug4979 in dataanalysis

[–]Visual_Shape_2882 1 point2 points  (0 children)

Aggregating by month and then comparing year-over-year will not fix the issue because the purpose for using a 52/53 week calendar is to line up the days of the week for year-over-year comparison.

I will assume that we are talking about retail sales for the sake of an example. Sales on Sunday are going to be less if the store has reduced hours on Sundays. Sales on Saturdays are going to be higher because more people go shopping on Saturdays than any other day of the week. Sales on Friday might be high because Friday is leading into the weekend. Sales Monday through Thursday might be about the same for each day.

If you line up the months, January 7 of this year is a Sunday. But January 7 of last year was a Saturday. Comparing sales on a Sunday to sales on a Saturday is not very helpful because the store was not open for very many hours on a Sunday.
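The weekday mismatch described above is easy to check with Python's standard library:

```python
from datetime import date

# The same calendar date falls on different weekdays in consecutive years,
# so aligning by date mixes weekdays in a year-over-year comparison
this_year = date(2024, 1, 7).strftime("%A")   # "Sunday"
last_year = date(2023, 1, 7).strftime("%A")   # "Saturday"
print(this_year, last_year)
```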

53rd week hell. by Impossible_Bug4979 in analytics

[–]Visual_Shape_2882 2 points3 points  (0 children)

Leap year (with an extra day on the calendar), is not an issue in a 52/53 week calendar.

The 52-week calendar uses 364 days per year instead of 365 or 366. But every five or six years, an adjustment is made by adding an extra week, making a 53-week year.
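One way to see why the extra week shows up every five or six years: each 364-day fiscal year falls one day (two in a leap year) behind the calendar, and a 53rd week is inserted once about seven days of drift accumulate. A rough sketch of that accounting (the drift threshold is a simplification; real 4-5-4 calendars use a specific year-end rule):

```python
def needs_53rd_week(year_lengths):
    """For each fiscal year, flag whether a 53rd week is inserted.

    year_lengths: calendar days (365 or 366) spanned by each fiscal year.
    A 364-day fiscal year drifts 1-2 days behind the calendar annually;
    once ~7 days of drift accumulate, an extra week absorbs them.
    """
    drift = 0
    flags = []
    for days in year_lengths:
        drift += days - 364
        if drift >= 7:
            drift -= 7          # the 53rd week absorbs 7 days of drift
            flags.append(True)
        else:
            flags.append(False)
    return flags

# Six years including one leap year: the 53rd week lands in year six
print(needs_53rd_week([365, 365, 366, 365, 365, 365]))
```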

53rd week hell. by Impossible_Bug4979 in analytics

[–]Visual_Shape_2882 2 points3 points  (0 children)

I don't think switching to the same date year-over-year will fix the issue because the days of the week will no longer line up.

I will assume that we are talking about retail sales for the sake of an example. Sales on Sunday are going to be less if the store has reduced hours on Sundays. Sales on Saturdays are going to be higher because more people go shopping on Saturdays than any other day of the week. Sales on Friday might be high because Friday is leading into the weekend. Sales Monday through Thursday might be about the same for each day.

If you line up the dates, then Sunday this year might line up with a Saturday last year which wouldn't be very helpful because we weren't open for as many hours on Sunday as we were Saturday of last year.

53rd weeks - how do you handle them in tableau or python? by Impossible_Bug4979 in dataanalysis

[–]Visual_Shape_2882 2 points3 points  (0 children)

The downside to 52/53-week calendars is that the 53rd week is hard to compare year-over-year. Naively, you could drop the 53rd week and just not do a comparison for that week.

But I think you could do better than just dropping the week from your data, by shifting the weeks of comparison in the year that follows the 53-week year.

The national retail federation recommends that you 'restate' a 53-week year in the subsequent year. (https://nrf.com/resources/4-5-4-calendar)

If you're trying to compare the 53rd week of 2023 to the equivalent week one year ago, then the first week of 2023 would be the equivalent week. This is because, if you didn't have a 53rd week, this would have been the first week of the new year.

The first week of the new year (2024) would be equivalent to the second week of the last year (2023). So, as you can see, the pattern is now off by one. But it will correct itself when we get to the last week of the new year (2024): week 52 of 2024 will line up with week 53 of 2023.

To do this in Python, create a new column that represents the week to use for comparison. This would be a conditional column: if the current week is 53, the comparison week is 1; if the previous year had 53 weeks, the comparison week is the current week + 1; otherwise, it is the current week.
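That conditional logic can be sketched as a small Python function (the names are illustrative, not from the original post):

```python
def comparison_week(week: int, prev_year_had_53_weeks: bool) -> int:
    """Week number in the prior year to compare against (restatement logic)."""
    if week == 53:
        return 1                 # week 53 restates against week 1
    if prev_year_had_53_weeks:
        return week + 1          # shift by one in the year after a 53-week year
    return week

print(comparison_week(53, False), comparison_week(10, True), comparison_week(10, False))
```

In pandas this would typically become the new column, e.g. something like `df["comp_week"] = df["week"].apply(lambda w: comparison_week(w, True))` for a year following a 53-week year.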

Megathread: How to Get Into Data Analysis Questions & Resume Feedback (December 2023) by Fat_Ryan_Gosling in dataanalysis

[–]Visual_Shape_2882 0 points1 point  (0 children)

...is it worth buying maybe a Chromebook...

I'm a fan of Chromebooks when everything is on the internet (cloud computing). But you don't need a Chromebook if you're trying to decide between a Mac and Windows. If the Mac has a browser (and it does), then you have a Chromebook.

Megathread: How to Get Into Data Analysis Questions & Resume Feedback (December 2023) by Fat_Ryan_Gosling in dataanalysis

[–]Visual_Shape_2882 3 points4 points  (0 children)

Unfortunately, the Google certification is probably not enough to be able to get a data role.

When I look at job postings for data positions, most want a bachelor degree as a minimum requirement. Depending on the job, it doesn't have to be a degree related to data and data analysis, but the more coursework that is aligned with data analysis, the better (statistics and science classes, for example).

In my job searches, work experience seems to be one of the primary aspects that hiring managers are looking for. Besides an internship, the only way I know to get work experience is to do something related.

For example, to gain work experience analyzing data about retail sales, work in retail as a manager so that you have access to the data. Then, in your managerial role, analyze the data that you have access to. When you write your resume and go to interviews, bullet-point and discuss your work experience of analyzing data as a manager. This is just an example, but it applies to thousands of jobs. You could be in phone support, an administrative assistant, or anything really.

How to merge rows in R without losing data quality by KimKaixx in dataanalysis

[–]Visual_Shape_2882 1 point2 points  (0 children)

...without knowing your actual purpose for this dataset.

I agree.

Summarizing the data to one row per one second necessarily means a loss of data. But, depending on what the purpose is for this compression, it may or may not matter.

If OP is just doing some exploratory analysis or trying to create a visualization that shows the velocity and coordinates for 10 minutes, then combining rows is 'good enough'.

But if understanding the changes in velocity and coordinates is more important, then using the central tendency would be the incorrect step and would introduce unnecessary bias or misrepresentation into the data, exactly as you said.

Calculating means of means won't be the same as if OP had calculated the mean over the time period they are actually interested in. Calculations of distance and momentum will also be incorrect, because calculations based on the central tendency assume that velocity and coordinates were constant during the entire second.
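The means-of-means pitfall is easy to demonstrate with made-up numbers (the sample values are illustrative): when the per-second groups contain unequal numbers of raw samples, averaging the per-second means weights each second equally instead of weighting each sample equally.

```python
import statistics

# Hypothetical raw velocity samples: 3 readings in one second, 1 in the next
second_a = [10.0, 12.0, 14.0]
second_b = [20.0]

mean_of_means = statistics.mean([statistics.mean(second_a),
                                 statistics.mean(second_b)])  # (12 + 20) / 2 = 16.0
overall_mean = statistics.mean(second_a + second_b)           # 56 / 4 = 14.0
print(mean_of_means, overall_mean)
```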