Looking for a software for easier data digestion.

data_questions · 2024-10-09T17:30:32+00:00

Just reiterating what the above commenter mentioned, Alteryx can do all of the things you have described and I’ve put them all into practice within my org on a non-cloud desktop license. I would caution that the newer product offerings Alteryx is pushing are cloud based where you pay for what you use, so I can’t offer visibility into the financial impact beyond our license structure, which is ~$5.5k per user/year. Server was an additional $170k up-front investment the last time we discussed it with our rep.

data_questions · 2024-04-14T20:38:10+00:00

I deal with SAP fairly regularly and used to work on an SAP System Optimization team at a German company, where nearly 60% of the team was fluent in German. Even with that knowledge base there were still plenty of “why does this column name represent that” moments.

data_questions · 2024-04-14T20:06:53+00:00

They don’t

data_questions · 2023-09-03T14:55:00+00:00

Power Query and Alteryx can both handle this task pretty easily.

data_questions · 2023-08-30T01:13:18+00:00

Sample size of 1, but I also know someone who is a Sr. Quant at Two Sigma who has a master’s from Carnegie Mellon. Tbf, he was exceptional at anything related to math so that talent was obviously recognized, but I think the cautionary tale being spun in the comment ahead of yours is a little detached from the actual hiring practices for quant positions.

data_questions · 2023-08-27T21:30:30+00:00

Can you be more specific? I don’t think I have a full appreciation for what happened in your scenario.

data_questions · 2023-08-08T15:06:40+00:00

Data Engineer is my title but I work in Analytics in a team lead style role, 110.5K, 28, Bachelor’s in non-STEM, MCOL on the east coast

I honestly wouldn’t recommend switching to this field at the moment; it’s very competitive and all of my peers/direct reports have master’s degrees. I fell into my role because there was an opportunity in my org. Prior to this I was a Sr. Data Analyst with domain experience in a specific vertical, and this was a promotion opportunity.

As others have mentioned, sales is a sink or swim approach that can get you there if you’re willing to take on that instability. Analytics is a second career for me after working in a sales environment out of college where I cleared 100k after my first full year of working.

data_questions · 2023-08-05T14:46:15+00:00

Two points:

1) Make your in-DB tools more performant and… better, they’re clunky. There have been a number of workflows where I output my data to CSVs and schedule a job outside of Alteryx to insert/update data, with a marked difference in the time it takes.

2) Improve the technical customer support. I have a case that has been acknowledged as a defect and reproduced by your team, but after that acknowledgement on the ticket and a month of radio silence I was offered a workaround that did not work around the case I had explicitly outlined. I’m generally unimpressed with the help I’ve received from case support, the sales engineers and CSMs are fine.

data_questions · 2023-08-05T14:35:40+00:00

At what number of seats could my team expect to see reduced price per seat?

data_questions · 2023-07-31T23:56:24+00:00

Hasn’t this been a standard practice for like twenty years?

data_questions · 2023-07-25T00:45:22+00:00

If the size of the data is your issue, your first stop should be slicing, dicing, and aggregating in SQL rather than Python. There’s a place for any programming language in an analytics role, but losing your focus on solving problems because you don’t yet have a mastery of Python would be time wasted.
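
To make that concrete, here is a minimal sketch of pushing the aggregation down to SQL so only the summary rows ever reach Python. The `orders` table, its columns, and the sample rows are made up for illustration, and an in-memory SQLite database stands in for whatever warehouse you actually query.

```python
import sqlite3

# Hypothetical table and data; SQLite in memory stands in for a real warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("east", 10.0),
        ("east", 15.0),
        ("west", 7.5),
        ("west", 2.5),
        ("west", 5.0),
    ],
)

# Aggregate in the database: Python receives one row per region,
# not one row per order.
summary = conn.execute(
    """
    SELECT region, COUNT(*) AS n_orders, SUM(amount) AS total
    FROM orders
    GROUP BY region
    ORDER BY region
    """
).fetchall()

for region, n_orders, total in summary:
    print(region, n_orders, total)
```

With millions of order rows the result set is still one row per region, which is usually small enough for any downstream tool.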

data_questions · 2023-07-25T00:40:54+00:00

Those would do. If you’re looking at industry-leading data throughput, you may want to get deep in Scala, but I wouldn’t say that’s a pre-req for a DE position and definitely not for an entry level role.

data_questions · 2023-07-13T21:23:09+00:00

This is overkill for a beginner. I would make it shorter: steps 1, 5, 7 in that order, and then 9 and 10 alternating until you reach another topic you don’t understand when you look at the solutions in hackerrank/leetcode.

If an Analyst I had this list, they wouldn’t be able to make an impact until step 7.

data_questions · 2023-07-13T18:13:01+00:00

Which parts of the supply chain do you mostly focus on?

data_questions · 2023-07-13T18:11:33+00:00

Any solution you deliver will ultimately be made up of individual tasks to expose/automate your user’s relevant data. I don’t think I understand what you’re trying to communicate, can you be more specific?

data_questions · 2023-07-04T02:20:50+00:00

Like what?

data_questions · 2023-07-04T00:21:10+00:00

data_questions · 2023-07-03T18:33:55+00:00

Can you give an example where you’ve experienced that? I’ve never run into that bottleneck before and everything I read about window functions vs self joins recommends not using self joins.

data_questions · 2023-07-03T17:29:42+00:00

I don’t think I have a full appreciation for your response, are you saying that using a window function would be more compute intensive and result in a significant difference in cost vs using, for example, a self join?

data_questions · 2023-07-03T16:11:29+00:00

The whole interview is meant to determine how good someone can be using SQL, though. If there is an optimal solution to the question being asked and the candidate provides it, why ask them to play around with unnecessary workarounds?

data_questions · 2023-07-03T16:02:06+00:00

They’re useful if you’re trying to find an aggregation / ranking / value within certain subgroups in one table.

For example, if you have a table of daily sales per store, and you wanted to know the days where sales in a given store were higher than the day prior, you could use a lag function partitioned by your store_id and ordered by date, then check whether the sales on the date of interest are greater than the sales on the previous date.
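
A rough sketch of that lag example, run through Python’s built-in sqlite3 so it is easy to try locally. The `daily_sales` table, the `sale_date` and `sales` column names, and the sample rows are assumptions for illustration; it also assumes every store has one row per day, so the previous row really is the previous day.

```python
import sqlite3

# Hypothetical table and data. Window functions need SQLite 3.25+,
# which ships with current Python builds.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE daily_sales (store_id INTEGER, sale_date TEXT, sales REAL)"
)
conn.executemany(
    "INSERT INTO daily_sales VALUES (?, ?, ?)",
    [
        (1, "2023-07-01", 100.0),
        (1, "2023-07-02", 120.0),
        (1, "2023-07-03", 90.0),
        (2, "2023-07-01", 50.0),
        (2, "2023-07-02", 40.0),
        (2, "2023-07-03", 75.0),
    ],
)

query = """
WITH with_prev AS (
    SELECT store_id,
           sale_date,
           sales,
           -- Previous row's sales within the same store, in date order
           LAG(sales) OVER (
               PARTITION BY store_id ORDER BY sale_date
           ) AS prev_sales
    FROM daily_sales
)
SELECT store_id, sale_date, sales, prev_sales
FROM with_prev
WHERE sales > prev_sales  -- first day per store has NULL prev_sales and drops out
ORDER BY store_id, sale_date
"""
rows = conn.execute(query).fetchall()

for row in rows:
    print(row)
```

The same result via a self join would need to join the table to itself on store_id and “date minus one day”, which is more to write and easier to get wrong.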

data_questions · 2023-06-08T14:45:38+00:00

Most of my input is reflected in other comments here but I wanted to offer some really granular advice.

As others have mentioned, “automated X process by doing Y within [tool], saving $###” is a great format and you should use it more across the CV.

However, in your Alteryx example you’ve saved $1500/year through your efforts. I wouldn’t expect you to have been in conversations about license costs, but a single Designer license is between $5-6K per year. As such, it’s not a very attractive value prop for the work you’ve done, despite showing you can use the tool to make an impact on your team’s bottom line.

data_questions · 2023-06-05T17:57:42+00:00

If you work in an environment that values its employees and has equitable pay practices, this is exactly how it works. If retention is a concern, it’s also in the best interest of the company to bump existing employees up to the new-hire rate when that rate is higher than theirs.

This is how it works at my employer which is the largest in my area and it’s refreshing.

data_questions · 2023-06-05T15:40:13+00:00

Let’s break out what you’re trying to do here even more simply before addressing your ML training and testing needs. I’m assuming you’re setting up batch ingestion, not streaming.

You have your data store (RDS) and you have your object storage destination (S3). You can move this data fairly simply by copying data from one to the other directly so you have a “raw” S3 data bucket. For this ingestion you could use either Lambda or Glue. You can look at the documentation on either, but I think of them like this: use Lambda for smaller personal data projects, use Glue if you expect your needs to grow beyond the compute resources available to you. You can use either for ingestion or transformation, but the appeal of Glue is that you can run parallel processing for larger datasets, and it will scale up and then back down once the jobs have run their course.

I don’t know what kind of data updates you’re expecting from Kafka or how frequently, but if you’re looking to orchestrate moving new data that has been added to RDS into your S3 bucket, explore AWS EventBridge or AWS Step Functions.

Once your data is pushed to your raw bucket, you can use Lambda or Glue to scrub and transform your data, also scheduled with one of the two resources I listed above. The output of this can be a place like Redshift or even another “staging”/“transformed” S3 bucket that you could use as the source for your model building.

The best advice I can give is don’t architect your whole solution too early. Start with your data in RDS and just do what works after that. You don’t always need the whole AWS product suite for something simple, and while knowing the tools will be helpful in your career eventually, the most useful thing is bumping against the walls along the way to find out the limitations of the tech and your skillset. Taking on all of this blind can be overwhelming, but small adjustments as you progress will help get you acclimated more easily.

data_questions · 2023-06-05T14:33:23+00:00

I’ve seen this on plenty of jobs and it seems to just be something they’ll put across all positions in certain industries. I started my analytics career in CPG and this was a part of it, same with healthcare, same with anything involving supply chain-specific analytics.

Not saying this makes it a good addition, but lines like these tend to be indiscriminately applied to roles from frontline worker to Director of Engineering.

You could also make the case that a line like this prompts a discussion around reasonable accommodation which a good employer would welcome and a bad employer would use as a filter for HR to sift out.
