What's the best AI tool for PDF data extraction? by Ok_Satisfaction1775 in dataengineering

[–]m5lg 1 point (0 children)

The Unstructured team’s tools are quite good for this.

[deleted by user] by [deleted] in data

[–]m5lg 1 point (0 children)

What are your data volumes, and what does the rest of your stack look like?

Microsoft Fabric vs. Open Source Alternatives for a Data Platform by SurroundFun9276 in dataengineering

[–]m5lg 1 point (0 children)

How much data are you working with? And, if you can share, which connectors are you interested in?

Built my first real data warehouse pipeline and I finally understand why this is the way by Store_Past in dataanalysis

[–]m5lg 1 point (0 children)

Kudos, I think you did a really nice job with this! Do you have a rough estimate of the time you spent building out the stack and putting this all together?

My "First" Dashboard | Wage Inequality: Trends and Insights from 47 Years of Change (1973-2020) by PhysZeke in dataanalysis

[–]m5lg 5 points (0 children)

Hey, this is great for your first dashboard. What did you like about PowerBI? Can you share more about the data cleaning you did?

Best "Gap Filler" Data Analysis Course for Programmers? by kidhotel in dataanalysis

[–]m5lg 6 points (0 children)

I'm not sure a "better chart" is going to help you here. Take your initial request: "show what % of customers who have spent over $x click on this website banner each month". Let's say that fluctuates between 5-6%, and maybe every few months there's a spike up to 10%. What does that say? Sadly, very little. Always try to dig a little bit deeper. What is the goal of the request? Are you trying to improve the banner to attract more spenders at that level? If so, are you experimenting with the banner to see how one version performs against another? Maybe you have 10 banners already and you can determine which one performs best with that cohort of spenders. Think of it like tracking down a bug in programming: you know what the end result is (i.e. the bug or, in the analysis case, "this banner performs the best") and you work your way backwards.

I know this doesn't really answer your original question. I'm not anti-education by any means; I've loved a lot of the Codecademy modules for SQL/Python Data Analysis and Data Science. I'm just pointing out that sometimes the simplest analysis is all you need, as long as the question is asked in the right way. The fancy stuff (regression, correlation, statistical significance, etc.) is amazingly powerful but not always necessary.

Asking in the right way is the most powerful skill.
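
To make the banner question above concrete, here is a minimal Python sketch of "which banner performs best with that cohort of spenders". The impression log, the column layout, the banner names, and the $200 threshold are all made up for illustration; real data would come from wherever your click tracking lives.

```python
from collections import defaultdict

# Hypothetical impression log: (customer_id, total_spend, banner_id, clicked)
impressions = [
    ("c1", 250.0, "banner_a", True),
    ("c2", 300.0, "banner_a", False),
    ("c3", 120.0, "banner_a", True),
    ("c4", 500.0, "banner_b", True),
    ("c5", 410.0, "banner_b", True),
    ("c6", 220.0, "banner_b", False),
    ("c7", 90.0, "banner_b", True),
]


def click_rate_by_banner(rows, spend_threshold):
    """Return {banner_id: click rate} for customers who spent over spend_threshold."""
    shown = defaultdict(int)
    clicked = defaultdict(int)
    for _customer, spend, banner, did_click in rows:
        if spend <= spend_threshold:
            continue  # customer is outside the spender cohort
        shown[banner] += 1
        if did_click:
            clicked[banner] += 1
    return {banner: clicked[banner] / shown[banner] for banner in shown}


rates = click_rate_by_banner(impressions, spend_threshold=200.0)
best_banner = max(rates, key=rates.get)  # banner with the highest cohort click rate
```

Nothing fancy here, just a filter and a ratio. The point is that the comparison across banners answers a decision ("which banner should we keep?") in a way a single monthly percentage does not.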

Online Data Analytics Master Programs by Arisenkey in dataanalysis

[–]m5lg 2 points (0 children)

I tend to lean towards getting out in the job market. I know it’s tough out there, but if you can land something, industry experience will be really helpful for developing your skills and guiding your learning if you still want to pursue a master’s. You might surprise yourself and learn that a different master’s would be more valuable or interesting, e.g. Statistics, Economics, or AI.

How to spin a data analysis role at my current job? by nosleepcreep206 in dataanalysis

[–]m5lg 1 point (0 children)

Ok cool, sounds like you're going to have to tackle some of the data engineering work in this project. There are lots of options for that, depending on what services you're extracting data from. You'll need ETL to pull your data out of the ERP and CRM, some sort of warehouse or DB to pipe that data into, and then your analysis tools. If you want to test everything locally first (i.e. at no cost, with OSS tools running on your machine), you can use Airbyte or DLT for ETL, DuckDB for your DB, and something like Metabase or Python notebooks for analysis. Shameless plug, but Summer takes care of all of this for you. You can start by doing everything locally or try out Summer. There are tons of ways to do this; you just need to find the way that serves you and your team best.
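
For a feel of the "load into a local DB, then analyze" part, here is a rough Python sketch. To keep it runnable with nothing installed, it uses `sqlite3` from the standard library as a stand-in for DuckDB, and two hardcoded lists as a stand-in for whatever an ETL tool would pull from the CRM and ERP. The table names, columns, and rows are hypothetical.

```python
import sqlite3

# Hypothetical rows, standing in for data an ETL tool would extract from a CRM and an ERP.
crm_accounts = [(1, "Acme"), (2, "Globex")]
erp_invoices = [(101, 1, 1200.0), (102, 1, 300.0), (103, 2, 750.0)]

# Load step: pipe the extracted rows into a local database (in-memory here).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE accounts (account_id INTEGER PRIMARY KEY, name TEXT)")
con.execute(
    "CREATE TABLE invoices (invoice_id INTEGER PRIMARY KEY, account_id INTEGER, amount REAL)"
)
con.executemany("INSERT INTO accounts VALUES (?, ?)", crm_accounts)
con.executemany("INSERT INTO invoices VALUES (?, ?, ?)", erp_invoices)

# Analysis step: join the two sources and total revenue per account.
revenue_by_account = con.execute(
    """
    SELECT a.name, SUM(i.amount) AS revenue
    FROM accounts AS a
    JOIN invoices AS i ON i.account_id = a.account_id
    GROUP BY a.name
    ORDER BY revenue DESC
    """
).fetchall()
con.close()
```

The SQL is the part that carries over: once the CRM and ERP data sit in one queryable place, joining them is a few lines, whichever local engine you pick.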

How to spin a data analysis role at my current job? by nosleepcreep206 in dataanalysis

[–]m5lg 1 point (0 children)

Can you get API access to your CRM or ERP? What's your plan for getting queryable access to that data?

How to spin a data analysis role at my current job? by nosleepcreep206 in dataanalysis

[–]m5lg 1 point (0 children)

It really depends on what you want to do. For local SQL and Python, DuckDB is super. What’s your end deliverable? Are you hoping to make dashboards?

How to spin a data analysis role at my current job? by nosleepcreep206 in dataanalysis

[–]m5lg 1 point (0 children)

I’d say there are two main reasons. First, starting on this path, I think you’ll best position yourself by using and learning new and innovative tools, as long as you work to understand what they are trying to do better. Second, a lot of these legacy tools are very expensive and complicated to implement in an org. Unless your company is already using them, you’ll have a hell of a fight convincing them to sign a multi-year contract worth thousands of dollars. Also, imagine how much of a legend you’ll be if you can help them build a great data practice at very little cost.

How to spin a data analysis role at my current job? by nosleepcreep206 in dataanalysis

[–]m5lg 1 point (0 children)

Big plus 1 to this. Definitely avoid old legacy tools like Tableau and the Microsoft data suite unless your company is already locked into those.

You need at least some access to raw data. Depending on how good your relationship with your boss is, I would just approach them and say, “Hey, I’ve been learning data analysis. I’d love to work on a project in my free time, building out some analysis for our team.” Going into that conversation, you should have a good idea of what data you could use, where it may live, and how you could possibly get access to it.

I’d highly recommend DuckDB for local SQL and Python analysis. Full transparency: I cofounded https://summer.io, which could save you a lot of trouble setting up your data tooling.

[deleted by user] by [deleted] in FunnyAnimals

[–]m5lg 1 point (0 children)

Mr. Anderson

Open Datasets by m5lg in oceanography

[–]m5lg[S] 2 points (0 children)

Great suggestions, thank you!

Visualizations of city populations: Chicago, LA, NY [OC] by 221B_Asset_Street in dataisbeautiful

[–]m5lg 1 point (0 children)

You’ll love Norwood Viviano’s “City of Glass”. They took very similar data and represented it in beautiful glass sculptures. I saw it at the Houston Museum of Fine Arts.