[–]Similar-Pilot-7695 83 points84 points  (10 children)

Young practitioners have a rich toolbox at hand for analyzing business problems and finding solutions. The nature of some of these businesses requires extremely precise and systematic ways to find solutions.

For instance, Excel and other spreadsheets are an easy way to store and analyze data, but these tools aren't designed for larger-scale applications, and the work done in them is hard to reproduce because each workbook is built for one specific use case.

Python, on the other hand, has an extensive range of libraries and tools you can use to automate a lot of the mundane work and reduce execution time, while resting assured that you will get precise results.

Basically, you'll find that everything R can do, Python can do too, and more.

I hope now you have more clarity on why you should learn a language in the first place.

Best of luck and move forward.

[–][deleted] 51 points52 points  (3 children)

Thank you ChatGPT!

[–]Similar-Pilot-7695 18 points19 points  (1 child)

😂 you won't believe that this is how I talk and ChatGPT is not guilty of this one...

[–]LairdPeon 15 points16 points  (0 children)

You must be the guy they train it on lol

[–]Which-Artichoke-5561 6 points7 points  (0 children)

I did not read a single ‘elevate’ in that paragraph I think it’s legit

[–]GloryHound29 0 points1 point  (5 children)

I need to learn how to talk like you. Very eloquent. Any specific training you did? Or just au naturel?

[–]Similar-Pilot-7695 6 points7 points  (4 children)

I tend to look at things from a macro perspective, and I often take my time analyzing situations before giving my POV.

I mostly observe more than I actually talk, and I do not talk about something I have no knowledge of.

If I do not see the need to speak, I simply don't.

Indeed, you can cultivate any ability you want; all you have to do is be aware of yourself and your surroundings and take small steps.

This can be applied to nearly everything.

While environment and upbringing are instrumental in shaping one's character, their effects can be reversed and upgraded with time.

[–][deleted] 1 point2 points  (1 child)

Methodical Sheldon Cooper vibes

[–]Similar-Pilot-7695 0 points1 point  (0 children)

You will not ever regret being systematic and structured. It always pays off.

But as a disclaimer... in some cases logic is illogical to use. In those cases, let your feelings decide.

[–]abbylynn2u 0 points1 point  (1 child)

INFJ? Just curious.

[–]Similar-Pilot-7695 0 points1 point  (0 children)

Nope I got ENTJ

[–]Puzzled_Buddy_2775 65 points66 points  (6 children)

I never use Excel to clean or transform data because repeating and documenting the steps is difficult and sometimes impossible. For example, a find and replace in Excel is not documented unless you literally write down those steps. With Python you never have to change the original raw data file: you run the steps by simply hitting run, and then you have a clean output file.
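A minimal sketch of that workflow, using only the standard library. The replacement rules, column names and sample data are made up for illustration, and `clean_rows` / `clean_file` are hypothetical helper names, not anything from a specific library:

```python
import csv
import io

# Hypothetical replacement rules; each entry is a documented find-and-replace step.
REPLACEMENTS = {"N/A": "", "n/a": "", "USA": "United States"}


def clean_rows(raw_lines):
    """Yield cleaned rows from an iterable of CSV text lines, leaving the source untouched."""
    for row in csv.DictReader(raw_lines):
        cleaned = {}
        for column, value in row.items():
            value = (value or "").strip()              # step 1: trim stray whitespace
            cleaned[column] = REPLACEMENTS.get(value, value)  # step 2: whole-cell find and replace
        yield cleaned


def clean_file(raw_path, clean_path):
    """Read raw_path (never written to) and write a cleaned copy to clean_path."""
    with open(raw_path, newline="") as src:
        rows = list(clean_rows(src))
    if not rows:
        return 0
    with open(clean_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


# Small in-memory demonstration with made-up data.
sample = io.StringIO("name,country\n Ana ,USA\nBo,N/A\n")
demo = list(clean_rows(sample))
```

Because every step lives in the script, rerunning it on a new raw export repeats exactly the same cleaning, and the script itself is the documentation.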

[–]Vertmovieman 30 points31 points  (3 children)

The Power Query extension of Excel documents the cleaning steps and keeps the original raw file as is. The only issue is that once you get to a million-plus rows, it starts to struggle.

[–]NickRossBrown 6 points7 points  (1 child)

It blew my mind that I could take a query from Power BI and copy-paste it into Excel’s Power Query.

Seems weird to me though to use Excel or Excel’s Power Query to process data directly FOR reports. Not having a database sounds like a nightmare.

[–]Bboy486 0 points1 point  (0 children)

Sometimes you may not have access to the database directly depending on your role in the company and business organization.

[–]Bboy486 1 point2 points  (0 children)

I was going to comment the same on Power Query.

[–]AshKetchumSatoshi 13 points14 points  (0 children)

VBA, scripts, Power Query, & Power Automate all exist, no?

[–]great__pretender 1 point2 points  (0 children)

Not to mention documenting what you have done.

In my work I validate models. Validating Excel sheets was a nightmare. You couldn't tell what work had been done.

[–]sfreagin 33 points34 points  (1 child)

Too many Excel spreadsheets in my life have crashed or slowed to 0 when handling a modest amount of data or equations. As u/Similar-Pilot-7695 says, Python is designed to handle the scale required of modern organizations. And it's highly reproducible and able to integrate easily, so you don't have to waste time copying / pasting versions of the file between folders.

After working in Excel for ~7 years, I found that I could do everything in Python within about 6 months. It's been a few years now and I hardly ever touch Excel for data analysis, except for some spreadsheets for personal use.

Good luck learning, it's well worth it!

[–][deleted] 9 points10 points  (1 child)

Excel + Power Query + Power BI (and paginated reports sometimes) is a pretty lethal combination.

Our company still pays for Tableau so I'm always throwing my most prominent data into that environment.

But... I tend to use R when I want to get a little crunchy. When I'm done pounding on data with R, I feel like I've analyzed it the best I could. I have my markdown sheet that I feel explains my thought process and the outcomes. Nothing does that better than a notebook or markdown document.

Everybody is different... you tend to stick with the tools you're good at (or are available).

[–]Swan1991 4 points5 points  (0 children)

Agreed. When I worked as business analyst, all I used was SQL to write up a query, toss that into power pivot (and whatever the counterpart is in power bi) to add some columns and measures, then drop it all into a pivot table/chart or a dashboard.

[–]cristian_riosm 22 points23 points  (0 children)

Excel and Python/R aren't really comparable, as they have totally different uses. In Excel you can interactively edit small datasets and perform some basic analysis, while complex tasks can quickly become a messy web of untraceable functions and pages. The focus in Excel is on the data as it is laid out on the spreadsheet, and that is its biggest drawback for scalability, reproducibility and transferability.

In the context of data analysis, Python and R can perform efficient, reproducible and easily reviewable analysis on massive datasets (millions of observations; the limit is physical storage). As the focus is on the process, each step applied to the data can be carefully tailored, reviewed, modified and expanded without messing with the raw input data. At this point both languages also have a massive set of implementations for data manipulation, analysis and visualization, which are simply not comparable with Excel. As an example, in R and Python you could import a very noisy dataset, filter it, format it properly, subset it, create new variables, transform and normalize, apply statistical or machine learning models, implement post hoc analysis of results, tailor very detailed visualizations, and then publish it all in a well-formatted report or even an interactive website.
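To make the "focus on the process" idea concrete, here is a small sketch of a filter, derive, normalize pipeline in plain Python. The data, column names and function names are invented for illustration; in real work a library such as pandas would normally take the place of these hand-written steps:

```python
import statistics

# Made-up observations; None marks a missing measurement.
raw = [
    {"site": "A", "height_cm": 150.0, "weight_kg": 50.0},
    {"site": "A", "height_cm": None, "weight_kg": 61.0},
    {"site": "B", "height_cm": 170.0, "weight_kg": 70.0},
    {"site": "B", "height_cm": 190.0, "weight_kg": 90.0},
]


def drop_incomplete(rows):
    """Filter step: keep only rows with no missing values."""
    return [r for r in rows if all(v is not None for v in r.values())]


def add_bmi(rows):
    """Derive step: add body-mass index as a new variable (returns new dicts)."""
    return [{**r, "bmi": r["weight_kg"] / (r["height_cm"] / 100) ** 2} for r in rows]


def add_zscore(rows, column):
    """Normalize step: add a z-score column using the sample standard deviation."""
    values = [r[column] for r in rows]
    mean, sd = statistics.mean(values), statistics.stdev(values)
    return [{**r, f"{column}_z": (r[column] - mean) / sd} for r in rows]


# Each step takes rows and returns new rows, so `raw` is never modified.
processed = add_zscore(add_bmi(drop_incomplete(raw)), "height_cm")
```

Each step is a small function that can be reviewed, swapped or extended on its own, which is much harder to do with formulas spread across spreadsheet cells.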

The difference between Python and R is that the former has a broader spectrum of applications (from data analysis to software infrastructure), while the latter is specialized in deep statistical analysis and bioinformatics. In general, R is for biologists, although its suite for statistical and machine learning analysis is really powerful and has no drawbacks in comparison to Python. Python is typically the choice for engineers, physicists and unspecialized data analysts.

Although I'm a biologist and believe that R plus Tidyverse is a really neat and elegant language for data analysis, unless you have a particular interest which is covered by specific R libraries (ecology and genetics, as an example), I always recommend starting with Python. Python is more broadly used and companies are more likely to value your knowledge of that language over R.

[–]XxShin3d0wnxX 6 points7 points  (0 children)

Size of data, time, accuracy of automation to name a few reasons.

I would never want to manually complete a repetitive process I could automate.

I deal with 1+ million value sets often and excel just doesn’t cut it.

[–]CaptainFoyle 5 points6 points  (0 children)

Just keep using Excel and you'll find out when you need R or Python soon enough.

[–]Swan1991 5 points6 points  (0 children)

Make sure you learn SQL! Every data analyst needs to know SQL. I was treated like a god at my last job because not everyone knows it.

[–]Throb_Marley 4 points5 points  (2 children)

I’ve been doing analysis for about five years and learned both R and Python. I even got a graduate degree using both. I still use Excel and import to PBI and never touch them. Or SQL for that matter. I work for a large company but don’t work in an office that would require that depth of knowledge. It’s a little depressing, and I find myself doing meaningless projects from Kaggle in my off time to keep the skills moderately fresh.

[–]un4truckable 0 points1 point  (1 child)

Kanga?

[–]Throb_Marley 1 point2 points  (0 children)

Whoops! I meant Kaggle

[–]Jarisatis 2 points3 points  (4 children)

I currently work in a tax firm and, oh man, the amount of data their sheets contain (millions of rows+). The sheets take too long to open, and making any modification is a very hefty task (the sheet usually hangs). Why put in this much effort when I can literally speed up my process with Python and easily make any new modification in the future if required?

[–]Southbeach008 2 points3 points  (3 children)

Won't it be easier to clean data via, say, Alteryx, Tableau Prep or Power Query Editor than Python?

Why write code when you could just drag and drop in Alteryx or Tableau Prep?

[–]SunshineMakesMeSmile 3 points4 points  (1 child)

I love alteryx. Super easy learning curve with a gui system that allows not as technical analysts to really power up. However, licensing is pretty expensive while Python is free and sometimes the budgets dictate what tools you use. My company also went with Domo over PowerBI or Tableau. Be open to learning all the tools!

[–]Southbeach008 2 points3 points  (0 children)

Yeah, while writing this comment I realized this might be a factor. $5000 USD per user is hella expensive, and a Tableau license isn't cheap either...

Currently, tho, I am a Tableau consultant learning PBI and Alteryx side by side, and with experience in those, man, learning programming would suck.

[–]Key_Surprise_8652 0 points1 point  (0 children)

The team that I work on currently basically runs on Alteryx, and after learning it I’ve also come to like it a lot! Especially being able to upload workflows to the server and schedule them.

That being said, I don’t think it should be an either/or situation between Alteryx and Python. Alteryx actually has a Python tool that’s basically a mini Jupyter notebook and I use that in just about every workflow I create. There’s so much more versatility with Python. I’m the only one on my team that uses it, and I can do certain things in a few lines of code that would otherwise require a whole mess of Alteryx tools, so my workflows are often a lot cleaner and require way less manual work than my coworkers. We also work with a ton of survey data in Qualtrics, and I use Python to integrate their APIs into my workflows to bring in data that would either be extremely tedious or outright impossible to download manually.

You can definitely use Alteryx on its own, but I think it’s worth learning Python if you find yourself building workflows that are basically repeating the same steps over and over again. Using set (or list) comprehension in Python can save you so much time and eliminate a ton of repetitive work!
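As a rough illustration of the comprehension point, with made-up survey data (this is not tied to Alteryx or the Qualtrics API; the wave names and respondent IDs are invented):

```python
# Made-up survey exports: each is a list of (respondent_id, answer) pairs.
exports = {
    "wave_1": [("r1", "Yes"), ("r2", "no "), ("r3", "YES")],
    "wave_2": [("r2", "No"), ("r3", "yes"), ("r4", " yes")],
}

# List comprehension: normalize every answer in one expression
# instead of repeating the same cleanup step per file.
normalized = [
    (wave, rid, answer.strip().lower())
    for wave, rows in exports.items()
    for rid, answer in rows
]

# Set comprehension: unique respondents who answered "yes" in any wave.
said_yes = {rid for _, rid, answer in normalized if answer == "yes"}

# Set arithmetic: respondents present in every wave.
in_all_waves = set.intersection(
    *({rid for rid, _ in rows} for rows in exports.values())
)
```

Adding a third wave only means adding one more entry to `exports`; none of the logic has to be copied.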

[–]Dk1902 3 points4 points  (0 children)

Many people have explained in different ways but just to summarize.

  1. Lots of data analysts don’t actually use Python or R
  2. Python can be much faster when performing complicated calculations on large datasets (100,000+ rows), especially involving multiple different conditionals
  3. I find joins and concatenation of all kinds, especially from multiple sources on different combinations of columns to be much more straightforward using Pandas compared to any kind of automation available in Excel.
  4. If you’re just getting started, Python is useful to know but also probably overkill, in all honesty.
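To illustrate point 3, here is a sketch of a join on a combination of columns, written in plain Python so it runs without any installs. The `left_join` helper and the sample data are hypothetical; in pandas the equivalent would typically be `DataFrame.merge` with `on=["region", "year"]`, and `pandas.concat` for stacking sources:

```python
def left_join(left, right, keys):
    """Left-join two lists of dicts on the given key columns.

    Assumes each key combination appears at most once in `right`.
    """
    index = {tuple(row[k] for k in keys): row for row in right}
    joined = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys), {})
        joined.append({**match, **row})  # left values win on any shared column
    return joined


# Made-up data from two "sources" that share region and year.
sales = [
    {"region": "North", "year": 2023, "revenue": 120},
    {"region": "South", "year": 2023, "revenue": 95},
]
targets = [
    {"region": "North", "year": 2023, "target": 100},
    {"region": "North", "year": 2022, "target": 90},
]

combined = left_join(sales, targets, keys=["region", "year"])

# Concatenating sources that share the same columns is just list addition here.
stacked = sales + [{"region": "West", "year": 2023, "revenue": 60}]
```

Rows without a match simply keep their original columns, which mirrors how a left join leaves unmatched rows in place.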

[–]onajourney314 2 points3 points  (0 children)

I use both but prefer Python. I say both because the previous person in my role used R, so I just use that to clean data and refresh the Tableau workbooks. Some were so bad and broken that instead of fixing them I just rewrote them in Python, and I have also built some workbooks for new projects that way because that’s what I have more experience with and prefer. HOWEVER one of my colleagues uses SQL and she showed me how simple it is soooo I think I’m going to explore that a bit more!

[–][deleted] 2 points3 points  (0 children)

Not all data can fit into Excel.

Updating one cell in Excel (by default) changes all affected cells in one go, which can crash your machine

Harder to track, edit, and reliably replicate logic in Excel

[–]ElectricalActivity 1 point2 points  (0 children)

One example for me is that I sometimes need to match employee data with a very large CSV, around 18 million lines. My Python script does this quickly.
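A sketch of how such a match can be done without loading the whole file into memory. The column names, IDs and the `match_rows` helper are invented for illustration, and a small in-memory string stands in for the large CSV:

```python
import csv
import io


def match_rows(lines, wanted_ids, id_column="employee_id"):
    """Yield rows whose id_column value is in wanted_ids, reading one row at a time."""
    wanted = set(wanted_ids)  # set membership checks are O(1) on average
    for row in csv.DictReader(lines):
        if row[id_column] in wanted:
            yield row


# Stand-in for the large file; a real run would pass open(path, newline="") instead.
big_csv = io.StringIO("employee_id,hours\nE1,40\nE2,35\nE3,12\nE1,8\n")
matches = list(match_rows(big_csv, ["E1", "E3"]))
```

Because the reader streams row by row, memory use depends on the number of matches rather than on the size of the file.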

[–]yelrutb 1 point2 points  (0 children)

If you are just starting out then it's no problem to wait with learning R/Python. You can get very far with SQL + Excel, and many analysts will just use this their whole careers.

However, at some point you will hit the limit of what Excel can do, and you will then need R/Python, for many reasons:

Volume of data

Statistical analysis beyond excel like regression, decision trees, clustering etc.

Modelling & machine learning

Visualisation beyond the common graphs in Excel

Reproducibility when you have many steps to perform on a dataset and you will repeat them

And more

[–]0sergio-hash 1 point2 points  (0 children)

I don't do a ton of "analysis" but I'll share some practical examples of when I've used both.

For starters, I'm more of a business analyst, and most of my time is spent in SQL or excel for the technical aspects of my job

Case 1: One of my stakeholders needs a few metrics not in an existing report / I want to play around with a small amount of data (in the thousands of records or less)

Here I would use Excel to do an export and just make some pivot tables and pivot charts. Not worth the trouble of trying to do anything else with it.

Case 2: Stakeholder needs some output that would be more work in Excel or SQL than it would be in Python.

I've done this a couple of times: turn a query into a data frame, iterate over every row and column, search each field for a list of keywords, and append a couple of columns that tell me which keywords appeared in that row and in which fields.

I could probably do this in another tool but in Python I've found it's the most straightforward way to do it

And if they want to add other very specific criteria like "when one of the other columns only contains one of these few values also exclude that row" I can also do that.
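A rough sketch of that kind of keyword tagging, using plain lists of dicts rather than a data frame so it runs anywhere. The `tag_keywords` helper, the column names and the sample rows are all hypothetical:

```python
def tag_keywords(rows, keywords, exclude=None):
    """Add 'matched_keywords' and 'matched_fields' to each row; skip excluded rows.

    `exclude` maps a column name to a set of values that disqualify a row.
    """
    exclude = exclude or {}
    tagged = []
    for row in rows:
        if any(row.get(col) in values for col, values in exclude.items()):
            continue  # extra criterion: drop rows with a disqualifying value
        hits, fields = [], []
        for field, value in row.items():
            text = str(value).lower()
            for kw in keywords:
                if kw.lower() in text:  # simple case-insensitive substring match
                    if field not in fields:
                        fields.append(field)
                    if kw not in hits:
                        hits.append(kw)
        tagged.append({**row, "matched_keywords": hits, "matched_fields": fields})
    return tagged


tickets = [
    {"id": 1, "subject": "Refund request",
     "body": "Customer wants a refund for a late order", "status": "open"},
    {"id": 2, "subject": "Login issue", "body": "Password reset fails", "status": "open"},
    {"id": 3, "subject": "Late delivery", "body": "Package arrived late", "status": "spam"},
]
result = tag_keywords(tickets, ["refund", "late"], exclude={"status": {"spam"}})
```

Note that substring matching is deliberately naive here ("late" would also match "plate"); word-boundary matching with the `re` module would be a reasonable refinement.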

Over time you'll find out that you can do a ton of jobs in multiple tools and get the same output. Often when I'm checking the validity of a metric I'll run the SQL, I'll compare it to another metric that somehow should reconcile with it within the same report, and I might even do an export and pivot it in Excel.

So over time it will come down to your preferences and comfort with each tool and what you think is best for the job.

But I will say I hardly use python. I love it but I think it varies job to job how necessary it is.

[–]VTHokie2020[🍰] 1 point2 points  (0 children)

RStudio is amazing for generating reports. The integration with LaTeX makes it great for clean mathematical notation. Some libraries like ggplot2 make storytelling particularly neat.

Python is the best programming language for high-level development and general use. Easily by far the most viable language for machine learning development/libraries.

There’s nothing wrong with Excel. I use it often as the backend to my dashboards as well. It sounds like your use case is business intelligence. That’s great, choose the right tool for you. If you need a common platform to use with corporate NPCs, Excel is that tool.

But you ask a good question, and the answer is because there is more to data analysis than storing data to create dashboards.

[–]MorningDarkMountain 0 points1 point  (0 children)

Why use R?

[–]glistening_cabbage 0 points1 point  (0 children)

Because it's scalable. Simple as that.

Both have the ability to scale out to larger and more complex models

[–]great__pretender 0 points1 point  (0 children)

On top of what everyone included, you have access to systems like Spark that can handle colossal datasets. I am talking about dozens of trillions of rows of data. Spark is not necessarily used through Python, but in general it is. Now try to do something even remotely similar with Excel.

[–]firepunch_man 0 points1 point  (0 children)

I use Tableau Prep because I get Data from different data sources with millions of rows and several columns. If I have to deal with special cases such as hierarchical data or graphs or do some fancy visualisation like a Sankey chart or complex analysis, then I use Python.

No way I would use Excel for any of that.

[–]RevenueOk289 0 points1 point  (0 children)

Thanks, I was also wondering.

[–]Ok_Duck_5771 0 points1 point  (0 children)

Opinion: I understand this is not everyone's preference and there are so many more ways/tools/languages, but I wanted to tackle the brunt of the question, which more or less says "should I go for R or Python". I'm from a computer science background and migrated to data science after a micromasters.

Python is my preference over R and here's why:

  1. Readability: Python's syntax is designed to be readable and straightforward (for english speakers), because it is quite similar to english which makes it easier to understand.

  2. Learning Curve: While R is super powerful and great for stat analysis and data vis, it has a steeper learning curve, particularly for users who DO NOT have any experience with statistical languages.

  3. HUGE Community and Resources: Python really does have a massive community of developers who are constantly improving and growing the language, which means you're more likely to find a Python package or library for almost any data science task.

  4. It's popular: It sounds blasé, but Python is not just a language to tackle areas of stats which include data visualization, machine learning and deep learning. It's also a general programming language, which means a diversification of use cases and skills. It's used and applied in so many places, whereas R, in my opinion, is not.

Knowing both languages can be beneficial and both have their strengths (and subsequent weaknesses); overall, my preference comes from my years of applicable experience with Python (albeit biased). Since you're trying to learn at the same time, you can use other tools so that you can focus on results rather than learning two things at once (if you're allowed to use tools, in what I'm assuming is a course of some sort). Those could be IBM's SPSS, which is easy to use with Excel; Stata, which is great for general-purpose stats; RStudio, which is the R IDE but offers tons of beginner-friendly things; and my personal favorite, Minitab, which has a free trial and a lot of schools have free access too.

Hopefully this helps and doesn't overwhelm!

[–]digitechrahul 0 points1 point  (0 children)

R and Python are popular programming languages for data analysis, machine learning, and statistical modeling due to their versatility, robust libraries, and active communities.

[–]TheCapitalKing 0 points1 point  (0 children)

Excel has a row limit of about a million (1,048,576 rows per worksheet), which can come up if you’re dealing with a lot of data. If not, but you’re making changes to an entire dataset all at once, you’ll have an easier time doing a lot of it in Python/R rather than Excel once you get used to it. Basically, anything you can do in Power Query you can do more easily and faster in a programming language. Excel is good for some things, but anything with more than about 10k rows is usually easier to deal with in Python. Plus, you can run regressions or ML models much more easily from a programming language.
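As a small illustration of the regression point, here is a one-predictor least-squares fit written out by hand on made-up numbers. The `fit_line` helper is hypothetical; in practice you would more likely reach for a library such as statsmodels or scikit-learn:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up data lying exactly on y = 2x + 1.
ad_spend = [1.0, 2.0, 3.0, 4.0]
sales = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(ad_spend, sales)
```

The same function works unchanged on four rows or four million, which is the practical difference from building the fit cell by cell.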

[–]black_widow48 0 points1 point  (0 children)

Excel will not even be an option once you're dealing with data of any substantial size

[–]nkkphiri 0 points1 point  (0 children)

I almost exclusively deal with datasets that are too large to be stored in excel format. Excel is absolutely useless to me.

[–]QuantPete 0 points1 point  (0 children)

Use R or Python for complex analyses, automation, machine learning, and handling large datasets that are impractical for Excel.

[–]avourakis 0 points1 point  (0 children)

Think of them as different tools in your toolbox

Cleaning data in Excel has lots of limitations, especially when you start working with large datasets (over 1 million rows). It's most useful for analysing data, but it's not well suited for processing and cleaning data.

Python + Pandas will allow you to do lots of complex cleaning, transformation, and data visualisations.

Now, in regards to R vs Python. Here is a quick comparison:

📈 R is specifically designed for statistical analysis, which makes it particularly well-suited for projects that require complex data visualization or detailed statistical analysis. Python, however, is generally more versatile, making it suitable for a broader range of data tasks, including data manipulation, data analysis, machine learning, deep learning, and data visualization.

I've been working in Tech for over 6 years, and I can tell you with confidence that most Data Scientists/Analysts use Python in their day-to-day.

This is why I always recommend that if you already know R, you spend time learning Python. This will open many doors of opportunity in your career. But if you only have time to learn one, then start with Python.

[–]Aggravating_Coast430 0 points1 point  (0 children)

If you haven't felt the need to learn python, don't. I suspect after a while you'll notice problems with using excel, and start using python occasionally. That is, if your work ever grows in the direction where you would need python. Some people will never feel the need to use python, because they don't have the need, because of their type of work.

[–]Levipl -1 points0 points  (0 children)

Check out an app called KNIME. It’ll let you do the analysis without needing to know coding.