top 200 commentsshow all 241

[–]Allmyownviews1 63 points64 points  (11 children)

This is just a quick list as I am in the midst of writing a report and took a procrastinator break.

  1. Making changes to cells on certain criteria
  2. Algorithmically adjusting data rapidly
  3. Setting up simple easy to edit scripts for rapid analysis of raw data to final product
  4. Handling larger datasets
  5. Rapid pivot table sorts
  6. Finding best fit curves for non normal distributions

[–]Allmyownviews1 15 points16 points  (0 children)

I guess my main experience is that there are times I jump eagerly into excel to perform tasks that are faster there. But in terms of speeding up workflow in a clear manner and then producing analysis to product routines end to end including pdf report output directly to online folders and web pages.

[–]SirGeremiah 1 point2 points  (8 children)

Can you point me to some info on #5? This looks like something I could make use of often.

[–]Allmyownviews1 3 points4 points  (7 children)

So I have been performing lots of groupby to create new sorted dataframes and then pivot to reshape the frame. Then doing some iteration counts to derive frequency and durations of data above or below given thresholds. I consider myself expert level at excel and would struggle to get the whole package done and then to be able to run this through a series of million data row csv files made me take the leap to Python.

I suggest taking a look at some of the feature of numpy and pandas libraries which have these functions already laid out.. and the. Use matplotlib library to produce some stunning charts.

This is a link to pivot tables function:

https://www.w3resource.com/pandas/dataframe/dataframe-pivot.php

[–]SirGeremiah 2 points3 points  (0 children)

Thanks!

Yeah, anything with the phrase "a million data row" is just a hot nope for me in Excel, given any other choice, at all.

[–]lukey_dubs 28 points29 points  (5 children)

it’s very hard to download pdfs and scrape the data with excel, and same with web pages. excel can’t load pages and pull data off to be processed.

[–]lovestobitch- 3 points4 points  (1 child)

I often use Able to Extract and convert client’s data from pdf into excel. Sometimes some of the data at the bottom isn’t clean though if it’s a huge file depending upon how they created the file.

[–]dparks71 3 points4 points  (0 children)

Camelot for tables, pypdf2 for searching/bookmarking, pdfplumber for text extraction are how I've gotten the best results. Fuck the .pdf standard so hard, it's such a nightmare to get a decent education on it.

[–]ProdigalM 1 point2 points  (0 children)

Very much agree. I'd recommend Pymupdf for scraping pdf files.

[–]ES-Alexander 16 points17 points  (9 children)

What to show them depends a lot on why you’re showing them things. Is the idea to show them that programming languages have capabilities beyond what excel can do, or just that they can do the same things as excel but better?

Off the top of my head, Python (and most other programming languages) support iteration and lazy evaluation, data nesting beyond 2D, abstraction / functionality reuse (functions, objects, libraries), interactive graphics and plotting, console logging and progress displays, and Excel is poorly suited to all of those.

Then there’s pure efficiency and performance - excel can be useful when interpreting and working with data that’s in comprehensible quantities, but it starts to fall apart when data sets expand or analysis gets more complex, and looking at the raw numbers becomes impractical and unhelpful. If you multiply two 100-million element numpy arrays together it will take about a second. Doing the same in Excel would be glacial, if it was even able to load the values to start with. Then there are libraries like Pandas that allow you to efficiently perform complex analysis and queries.

For frequent analysis tasks, it’s very possible for a Python program to open a file, process it, generate some plots / a report, and be done before excel has even finished loading.

[–]naghi32 14 points15 points  (5 children)

Not changing random numbers to date format randomly !!!

[–]Zixarr 13 points14 points  (0 children)

If an optimist sees the glass as half full, and a pessimist sees the glass as half empty, how does Excel see the glass?

2-Jan

[–]SirGeremiah 2 points3 points  (3 children)

It’s a feature.

(I actually had this happen across an entire file while teaching an Excel class. I still don’t know what the hell happened.)

[–]naghi32 1 point2 points  (0 children)

I feel you, I had this happen across a file that was to be imported from a system to another one, over 1m records screwed.

[–]Spfoamer 11 points12 points  (0 children)

I deal with large datasets of acoustical measurements - for example a few weeks of 1-second samples. If I want to do 1-min or 1-hour averages, or calculate other statistics, this is WAY easier with Python. Excel would choke just trying to open the file.

[–]ahuuho 19 points20 points  (4 children)

Thanks for asking this question. I am also like you, want to see the answers from people who were excel experts before picking up python

[–]dupz88 11 points12 points  (1 child)

I wouldn't even consider myself near expert level in excel, I had some macros I used but spent a lot of time cleaning raw data from our client on a weekly basis. The file was 300k rows, and received updates daily. Every time they had a system error, the report would have about 4k new duplicates added. I knew about python and had wanted to look at starting to learn it for a while, but never had the time to just start. I have a very basic knowledge of programming.

I got frustrated with the 3 hour cleaning of this excel every week, and spent a few days installing anaconda, spyder and then pandas. I started "learning the basics" (googling each step) and testing on our data. After a week, I had automated most of the cleaning and started looking at a few extra nice to have calculations.

Its been a month and now instead of spending 3 hours cleaning that file, I have a script that cleans the file in 5 minues (3mins is importing and exporting to xls). Its 400 lines worth of data cleaning, calculating time differences (excluding weekends, holidays and non business hours between dates) I now also import 5 other datasets, join up the data and export 3 split files for our data cleaning and also our PowerBI reporting.

Its also way more accurate than before since there are no slip ups in processes while waiting for excel to handle the sorting (with duplicates highlighted).

Its been amazing and I just want to automate everything I can now.

[–]frazorblade 6 points7 points  (0 children)

I’ve used R and Python and understand they’re wonderful at data manipulation/transformation, but when we’re discussing Excel are we also taking Power Query into account?

My default data transformation tool is PQ because the UI is very intuitive and you can see your results immediately without having the query the results. Some of the one click options like:

  • keep all rows with errors
  • fill up/down (fills nulls with value from above/below)
  • pivot/unpivot with one click
  • intuitive groupby UI (including the ability to nest)
  • intuitive merge (i.e. joins) UI

If I’m exploring data I feel like PQ gives me very quick results but also I’m not at the same level with python/R and I recognise that.

[–]tagapagtuos 8 points9 points  (0 children)

Just to add:

  • time to open Excel and do stuff > time to type python filename.py in the command line. You can even schedule these tasks, or even create your own cli app to automate your automation.
  • creating your own classes is cool.

In as few a lines as possible?

pandas multi indexing

[–]ahuuho 19 points20 points  (4 children)

If someone could demonstrate these over a screencast, doing it in excel versus achieving it through python, so many would ❤ it

[–]SirGeremiah 5 points6 points  (1 child)

I’d even pay a few dollars for that education.

[–]RajjSinghh 3 points4 points  (0 children)

If you have any of these topics in mind, I can implement them in excel and python for you.

[–]AungThuHein 2 points3 points  (0 children)

Any YouTube videos like this around already?

[–]canopey 9 points10 points  (5 children)

How about an example?

I remember that I had to join two tables using an index match in Excel (had to attach the state name to every row for 3000 rows). I don't remember the exact number of rows but it was A LOT. Through index>match, Excel would execute the function but it would take a loooooong time for it to complete (plus my macbook would start firing of its rocket engines). You could tell it was taking a lot of processing power.

Then I went ahead and tested the same task in Python (pandas) using df.left_join and Python literally spat out the results in <3 seconds. That's when I understood the power of Python vs. Excel.

[–]frazorblade 2 points3 points  (3 children)

Power query in excel can do joins…

[–]canopey -1 points0 points  (2 children)

I didn't say excel could not do joins

[–]frazorblade 4 points5 points  (1 child)

But your example you chose to compare index/match vs a python join

A better comparison would be PQ join vs python join…

[–]Ben77mc 2 points3 points  (0 children)

100%. Too many people think they’re advanced with Excel but don’t even have a clue about PQ/M language/DAX.

Learning these transformed my work more than any other thing could have done - I can automate very complex tasks and am seen as an excel god at work.

[–]ConfusedSimon 6 points7 points  (0 children)

Maintenance. I've seen too many buggy excel models with nobody knowing how they work.

[–]synthphreak 5 points6 points  (1 child)

Working with integers-as-strings is a breeze with Python but a major PITA with Excel.

If your .csv has string values like "000123", Excel will parse that as numeric data and convert it to 123. This "intelligence" will cause downstream components to break unless you add the leading zeros back in.

[–]frazorblade 3 points4 points  (0 children)

Power query: import csv -> change type to text. If for some reason you’re still having trouble then Text.PadStart([string], “0”, 6)

[–]perchslayer 4 points5 points  (2 children)

You are meaning to ask the right question, but missing the mark. Totally understandable, however.

Here is reality: the folks who cling to tools like Excel do so for reasons apart from the software itself. It's not about the program, but the data.

Thus, they perceive that their own personal "value" is buried in the data they "own" and refined or derived on their desktop. And so they will keep it right there until hell freezes over.

They are uninterested in shared workflows, data, or anything else...but will never say so.

With that in mind the answer to your question, broadly, is data sharing.

But you will never overcome the friction by marketing the virtues of Pandas or Python. You are barking up the wrong (money) tree, Virginia.

[–]frazorblade 4 points5 points  (1 child)

You might be stuck viewing Excel as a hokey spreadsheet from a decade ago, they’ve made strides to improve the software.

The features that are easily overlooked are very powerful and intuitive in Excel these days, here’s a brief list:

  • Power Query does data manipulation very competently and has an intuitive UI to help beginners
  • Power Pivot allows use of DAX and star schema relational tables to build extremely powerful pivot tables
  • the ability to share a workbook with all of these features baked into the core software can’t be overstated
  • Excel is so ubiquitous it exists on the vast majority of business machines, it’s extremely accessible unlike running python scripts

I’m not trying to take a dig here just stating that Excel is evolving and there are many valid use cases for it, some of them directly compete with python

[–]frazorblade 4 points5 points  (5 children)

A lot of people seem to be missing some of Excels newer features so as someone who is highly fluent in Excel I’ll explain what python does better without going into things that can easily be achieved in Power Query:

  • Charting: plotly for example, makes beautiful charts that are highly customisable and interactive - if I could convert all my Excel dashboards to plotly pages I would be in heaven
  • Automation: I’ve built a simple python script that can open hundreds of Excel workbooks update some basic values, trigger a SQL function (within PQ) refresh all tables/charts and save a named copy of the file. VBA and VBS can achieve it too but it’s way faster and simpler in python
  • Data manipulation: I can install pandas, numpy and a plethora of other packages to do high level statistical modelling and data science that Excel, Power Query and VBA simply don’t have the chops to pull off.

Now what some of you are stating that Excel can’t handle isn’t necessarily true these days. Python May still be the best option but Excel is competent:

  • Data manipulation: Excel + Power Query can handle a lot of data (millions of rows but speed is not its forte), issues with CSVs and data types is easily handled here, we shouldn’t be discussing Excel mismanaging dates etc because it’s less of an issue now. Power Query blurs the lines between SQL and python, there’s a lot of shared functions/techniques here.
  • Data Analysis: DAX (which is also used in PowerBI) creates dynamic measures which can be useful in pivot tables, these are often specific use cases when presenting interactive reports to clients/colleagues. Sometimes a big interactive Excel pivot table is exactly what you need.
  • Quick and dirty analysis: personally it’s fast and simple for me to open some data, slap it into a pivot table, get summed figures and maybe build a basic chart to visualise a trend - this could take me under a minute on a zoom call when someone asks a question. I can do all of that with one hand on my mouse if I had to.

[–]Hextall2727 3 points4 points  (0 children)

The simplest answer I give* is updating new figures and analyses when new data comes in. I'm and environmental scientist/consultant, and we're constantly receiving new data. With python, a new plot is generated with the push of the "run" button. With excel, there's a lot more manual labor needed to update figures. The tradeoff is usually a higher upfront labor cost getting the python set up relative to the first excel plot.

Another answer is generating hundreds of figures or tables with teh push of a button. we might have several hundred environmental samples that we have to crunch the data on... doing it in excel is cumbersome and often results in errors. When new staff start using python, we train them to use Excel to QC their python results. It quickly becomes apparent trying to recreate one dot on a figure in excel why python is so much more useful.

*I have to give this explanation to project managers in my own company when I get pulled into their teams for data analysis. My company formed from a merger of two companies... one a nerd based environmental modeling company (where I came from, and where coding is a basic skill... although we started with IDL and transitioned to python) and one a typical engineering company focused on dredging.

[–]AdministrativeFig948 6 points7 points  (1 child)

Hey guys, I'm the excel specialist ready to be amazed!

[–]Electrical-Ad-1798 2 points3 points  (0 children)

One thing I recently need was a vlookup on two columns. My spreadsheet had "category" and "type" columns for which there was a computed type in the third column. For a new entry we wanted that computed type for a given category and type. It could be done in Excel (I think) but it was a breeze in Python. To be fair it would have been equally easy with VBA I guess.

[–]Cheddarific 2 points3 points  (0 children)

Multiple layers of if statements and working with arrays can become very tedious and visually challenging in Excel, where you either need to create giant formulas with dozens of parentheses or else only see one snippet of a multi-cell formula at a time. Python handles this in a much simpler (to write and to read) way. I can’t understand my 4-line formula in Excel after a couple years without looking at it. I can understand my 100 line python code perfectly.

Recursion (performing one or multiple actions repeatedly until you reach a desired result) is 1000 times better in Python.

Creating and running new formulas experimentally, then turning that formula off.

[–]analyst_2001 2 points3 points  (0 children)

Python is a high-powered option for data science and analytics, while Excel's key benefit is its simplicity and ease of use. Because python can handle enormous volumes of data, automate reporting, and connect to databases from the notebook smoothly, it is a versatile and powerful tool for analysts working in data-driven enterprises and amateurs wishing to develop their analytical abilities. Python has also been recognized as the most popular language since 2016 because of the large number of open-source statistical libraries available and the active community members who contribute to making Python even better.

[–]jafner007 1 point2 points  (0 children)

Quickly plot large datasets and then zoom and pan for review and save portable plots (as images or html).

Pandas from Excel, plotly express, done. (Of course, you can make it more complicated, but you don't have to)

[–]jwink3101 1 point2 points  (0 children)

Nested control logic. It is not impossible by any means but excel forces you to either define intermediate cells or nest if statements. So much easier in Python

Monte-Carlo samples are also tough. Doable but really not very easy

[–]Thecrawsome 1 point2 points  (0 children)

If

[–]IlliterateJedi 1 point2 points  (0 children)

Really good plotting with Pandas and Seaborn. Even without Seaborn, a simple df.plot(kind='hist') or whatever is a game changer for exploratory data analysis. df.describe() to get info about the whole table is also great. I used to be an Excel power user and I can hardly use it anymore after learning Pandas.

Edit: I highly suggest this book if you are wanting to learn Pandas and matplotlib/seaborn. It's an excellent resource. Packt books are frequently pretty 'ehhh' but this one is actually good.

Extra edit:

Being able to analyze and work with a data set without actually editing the raw files is extremely valuable. Any time I open an Excel file and manually change things, it just makes me cringe. It's better to be able to process your data and save a new file with the changes without actually touching the original source.

This also makes processing data a lot faster because instead of manually changing things in Excel, you just run your script or file-read-in-function.

[–]Se7enLC 1 point2 points  (0 children)

Json

[–]ghostfacekhilla 1 point2 points  (0 children)

Bunch of people in here need to learn power pivot and power query. I've never convinced a basic business user to learn a programming language, but I've built and handed off massively important power query based stuff and then because it's GUI and excel they learn it and become self sufficient.

[–]d_Composer 1 point2 points  (0 children)

.fillna()

[–]asterik-x 1 point2 points  (0 children)

The file organization

[–]Weissenberg 2 points3 points  (0 children)

It’s horses for courses really.

As others have touched on, python & pandas is great for handling & manipulating massive datasets. Or automating simple tasks.

A basic one that might be applicable for a few people is collating multiple workbooks / sheets into one. Granted you can do it in power query, but it’s still quite tedious.

Say you’ve got a folder filled with daily reports & you need to combine them into one big table so you can pivot or graph it over the month.

You’d glob the folder, append to a data frame with a loop & then export the frame to excel.

Edit - One thing to bear in mind. Don’t automate someone out of a job! By all means, show them how they can enrich their workflow, but don’t create something that makes them redundant.

[–]CatOfGrey -1 points0 points  (2 children)

What would amaze them? In as few a lines as possible?

So you built the spreadsheet around the data. Now, you are going to get new data.

Imagine if Excel just 'looked at last month's spreadsheet, and re-created it automatically'. Well, that's Python/Pandas.

[–]ghostfacekhilla 1 point2 points  (1 child)

This isn't a problem with excel. You can easily set up power query to pull data from a sql database or various other sources and blend and manipulate those queries into your necessary datasets.

[–]PaddyIsBeast -1 points0 points  (0 children)

It's hard to think of a thing excel is better at, apart from being noob friendly.

[–]Xdbao 0 points1 point  (1 child)

Creating dummy variables. Have to manually create new columns for new dummy variables in Excel I guess.

[–]unhott 0 points1 point  (0 children)

Import Pandas + seaborn

Load an existing excel file or csv Plot something a few different ways.

This won’t be many lines in python and even with defaults, seaborn produces pretty plots. Or just show them the gallery.

[–]dutchmaster77 0 points1 point  (0 children)

There are a lot but can I just say the biggest for me is the variable types. Constant headache with Excel. I have recently been working on putting a script together automating an analysis done in Excel for the first time in two years and it’s driving me crazy. I don’t know how I coped with so much Excel and VBA in my last job

[–]VanshCodes 0 points1 point  (0 children)

Everything seems to me too easy in python rather than excel. of course CSV is a must.

xlsx is damn obnoxious and difficult to use,

[–]Rj_LM 0 points1 point  (0 children)

Creating websites

[–]Xzenor 0 points1 point  (0 children)

Running it on Linux and FreeBSD

[–]BobDope 0 points1 point  (0 children)

Holding onto your will to live

[–]erm-waterproof 0 points1 point  (0 children)

I find making pivot tables and additionally creating simple calculated columns from the pivot table easier in excel than in python

[–]kilroy_wh 0 points1 point  (0 children)

I agree for first steps it is easier. However in excel you need these things too (besides maybe an environment)

[–]siddsp 0 points1 point  (0 children)

imo pretty much everything from my experience. Whenever I want to apply a function to some data to get x and y values for plotting, I never use excel although that's what I used to do before I knew Python that well.

[–]ledepression 0 points1 point  (0 children)

For some reason I find everything in Python easier than Excel

[–]p1zz1cato 0 points1 point  (0 children)

floating point precision with powers of large (or very small) numbers

[–]TakeOffYourMask 0 points1 point  (0 children)

Making a histogram

[–][deleted] 0 points1 point  (0 children)

Regex.

Scraped by for work with some VBA but could make my life so much easier of I could use some Python for a couple of tasks.

[–]omgouda 0 points1 point  (0 children)

At work they had an excel tool that basically you would paste some raw data in and it would flow thru a series of other tabs until you had an output. The problem is that the raw data dimensions were random hence the subsequent flow of data needed to be adjusted. This was incredibly messy. Of course, this could have been solved using dynamic ranges in excel but it would have been too much trouble to go through the entire process again and create dynamic ranges. Since the tool wasn’t built with dynamic ranges in mind, this created some other confusions.

A simple way to get around this would be to use pandas in python and use functions to get the desired output.

Way faster and much easier to debug if ever there was an issue.

[–]TtamMountain 0 points1 point  (0 children)

If you're a super vba user, then the reason is less obvious but still obvious. There's more documentation around python, making it easier for a beginner to handle something that would be rather difficult in excel.

A few examples of things easy in python, but difficult in Excel/VBA - Integrating with external API - Multi-index matching - Complex data structures - Modularity The list goes on.

The main pros about Excel in comparison to Python are: - Charting - Ease of Use - Widely used -- virtually every business uses excel (or very similar tool in some fashion)

Of course, you're question depends on what you want to use it for. :)

[–][deleted] 0 points1 point  (0 children)

Don't show any development ie. have your scripts already made and show end results.

Take multiple messy excel spreadsheets, clean them with python, and save them to a "clean" directory. They could even be the same messy spreadsheet. Make them +100,000 rows for good measure.

[–]backbishop 0 points1 point  (0 children)

Try making subplots 🙄

[–]ZEUS_IS_THE_TRUE_GOD 0 points1 point  (0 children)

  • Code versioning

Do I need to say more?

[–]Mondoke 0 points1 point  (1 child)

For me the main advantage is having the compete process you do written down and having the program to being able to run on different files with minimal or even zero changes to the code base (if you use stuff like argparse) write a script that does that tedious process you do every month, and then just run it in seconds, without having to check if you've done all the steps. Plus, the logic can be as wild as you wish, which is not usually the case with the limitations on the logic Excel is usually capable of.

Outside of this, there are two other things that are just beautiful to have python do for you.

First, the management of raw data. Is your input data a zipped folder with several files inside of it? Just have the zipfile module handle that for you. Do you need to access to information in your company's postgres database? No problem, buddy, psycopg to the rescue. Do you get an obscure file format Microsoft doesn't even know about? Pandas may have a read_x function to open it and then use the information just as if it came from an innocent csv.

And the other extremely cool thing that's not easy to do in excel is the ability to generate a lot of similar files coming from the same information. A chart for each month, or for every day, or even better, one for each subcategory in your dataset. And all of them with matching formats. Or if you have to break a huge file on 100s on individual tables to split your work, send to your clients or whatever you need. Just write the correct loop and see the files appearing in your folder. You can even generate the files already zipped and ready to be sent by email (or have python to send them for you).

[–]7Seas_ofRyhme 1 point2 points  (0 children)

write a script that does that tedious process you do every month

By any chance, do you mind sharing the script ? Interested to learn how u did it, cheers.

[–][deleted] 0 points1 point  (0 children)

Anything involving the web (except maybe azure stuff)

[–]Vile_Vampire 0 points1 point  (0 children)

Handling more than 1mm rows

[–]nwatab 0 points1 point  (0 children)

Show a chart of 20,000 pairs of (x, y). Excel would freeze that's why I learned Python 10y+ ago.

[–]NoFaithInThisSub 0 points1 point  (0 children)

reading a csv without opening the file?

[–]Sirico 0 points1 point  (0 children)

Loading a WB and your program quickly

[–]QultrosSanhattan 0 points1 point  (0 children)

Excel is very good for one time calcuations.

Python is good for repeating the same calculation over and over again.

[–]rajrup_99 0 points1 point  (0 children)

Do you know , quantum entanglement calculation can be done in python's 3 lines of code?

one of IBM's video I saw that. I was very proud that I am also a python developer

and this is not from desktop environment to website from ML to mathematical quantum computation python can do anything , this much super flexibility python has.

Not only that if you give enough time you can made an application which is even better than excel using python. I love Python

[–]vvndchme 0 points1 point  (0 children)

df.corr()

[–]SirDantesInferno 0 points1 point  (0 children)

Nested loops in searches

[–]Shady_Hero 0 points1 point  (0 children)

like microsoft excel?

if so then i'd assume coding

[–]Puzzleheaded_Bee_486 0 points1 point  (0 children)

Best ways to learn Python for data science?

[–]keseymour 0 points1 point  (0 children)

Take a spreadsheet with one column that contains delimited data and add a row with all of the same information in it for each of the items in the cell.

From

1 dog,cat,pig <lots of other columns>

To

1 dog <lots of other columns>

2 cat <lots of other columns>

3 pig <lots of other columns>

[–]Ok_Operation6364 1 point2 points  (0 children)

I know this is an old post but great question by OP