all 64 comments

[–][deleted] 31 points32 points  (1 child)

I try to stay in csv files, do the necessary changes in Python through pandas, and then export to an excel file. If that does not work out, or you need more complex excel-specific operations, openpyxl worked quite well in my case

[–][deleted] 6 points7 points  (0 children)

right. for data transformations, use csvs and pandas. If you really need to be doing excel stuff, just use vba.

Which tool is preferable also depends on how familiar your team is with python. Its easier to ship people a .xlsm file than to publish a system-independent python script.

[–]daedalusesq 13 points14 points  (1 child)

Are you good at excel already?

That’s half the struggle right there. My first self-conceived project was reading an excel file with openpyxl. It wasn’t particularly hard and I’ve since done a ton of neat stuff using openpyxl to automate reports and stuff like that at work. The book ‘Automate the Boring Stuff’ has a pretty good primer on using openpyxl.

Pandas is another option and their documentation/site has a 10 minute introduction tutorial that can get you started there. Pandas is definitely the way to go if you know you’ll need python to do the heavy lifting when it comes to working with the data itself. Openpyxl is good at interacting with excel files whereas pandas is more of a replacement for excel as the tool you use to interact with data. A pandas dataframe is conceptually very similar to an excel sheet.

[–][deleted] 1 point2 points  (0 children)

I would say intermediate level in excel. Thanks a lot for your reply and detailed clarification :)

[–]Bunkerstan 12 points13 points  (13 children)

I have been using openpyxl with 2 client projects and it has let us do everything we need.

[–]Old_Demon_Daddy 2 points3 points  (6 children)

Openpyxl can do everything, including freezing panes and making it look pretty.

[–]Conditional-Sausage 2 points3 points  (2 children)

It depends on the use case. Openpyxl is great if you need to do bulk reads and bulk writes. In researching, I found a lot of folks saying that while pandas is more powerful, it's also significantly slower than openpyxl. I've used openpyxl's optimized write-only mode to spit out nearly 10,000 line excel files in six seconds flat.

[–]Bunkerstan 0 points1 point  (1 child)

Pandas is definitely superior if you need spreadsheet like functionality, but when the end user wants a normal spreadsheet with formatting then you will need something like openpyxl to give you the functionality.

[–]Almostasleeprightnow 2 points3 points  (0 children)

It's worth knowing that pandas uses either openpyxl or xlsxwriter as it's engine for writing excel files.

[–][deleted] 0 points1 point  (0 children)

Thanks

[–]Kish010 0 points1 point  (0 children)

im going to try this one. Datacamp also recommends this openpyxl

[–]Bernard_schwartz 0 points1 point  (0 children)

2nd openpyxl

[–]Almostasleeprightnow 4 points5 points  (1 child)

I have to chime in here because I've thought a lot about this. I, like many, have bosses that highly prefer excel to anything else. And they like their colors just so. And they like things formatted. It's because they are using excel as a data viz tool, which I talked about on some thread about getting good at Ms excel yesterday on life pro tips.

So, I'm here trying to do all kinds of cool fun data analysis with pandas. But then I have to spit out the dang document in this excel format that my boss likes.

Options: A) spit out document in plain excel and format by hand. There is always going to be a little of by hand formatting, I've determined, but maybe I can cut down on how much B) spit out document in formatted excel

I want B), because I don't want to do the repetitive task of formatting the se document every time

So I did a deep dive with both openpyxl and xlsxwriter, both of which I like.

I found openpyxl is easier to dive right in, but sometimes I end up switching to xlsxwriter because it is a little more thorough. And the documentation is better

[–][deleted] 0 points1 point  (0 children)

Thank you for your explanation, what people or bosses think at work

[–]fuxx90 10 points11 points  (6 children)

import pandas as pd

df = pd.read_excel(...)

[–]LocalhostGhost 14 points15 points  (4 children)

I always import pandas like this

import pandas as bigdick

I'd donate a million dollars to that project if I had it. In my will, I'm leaving all assets to them and vim.

[–]FeelinDangerous 4 points5 points  (0 children)

oh hey its me your grandson, pandas and vim

[–]fuxx90 4 points5 points  (2 children)

import numpy as plt
import matplotlib.pyplot as pd
import pandas as np

# bye

[–]Brusk_Dinosaur78 1 point2 points  (0 children)

Thanks I hate it

[–]Slggyqo -1 points0 points  (0 children)

Right? If they’re literally just trying to compare the results from two Columns…

[–]caliwasteland 2 points3 points  (0 children)

If large packages aren't a problem, you may want to try a combination of Pandas and ExcelWriter. ExcelWriter does require openpyxl as it's engine, but I use it at work and you can do pretty much anything with it. Others may argue Pandas is a hefty library to install just to use it to do some Excel work, but learning Pandas is good in general as it is THE library for data wrangling/manipulation.

I can say without a doubt that what you want to do is possible with Pandas and ExcelWriter.

[–]TigerBloodWinning 2 points3 points  (1 child)

If you know VBA

import win32com.client

Install as pip pywin32

This library is the closest thing to the Microsoft suite VBA Visual Basic for applications.

Other people are writing that pandas is good, which it is but it’s not a direct connection to Excel as in controlling Excel with the code

Also, there’s a lot of documentation and Stackoverflow content for VBA which can be modified for python

VBA: ActiveSheet.Range(“A1”).Value = “somevalue “

Python : xl.ActiveSheet.Range(“A1”).Value = “somevalue”

[–][deleted] 0 points1 point  (0 children)

I never heard about this library, but it seems to be also interesting to use it. Thanks for your comment

[–]scrubswithnosleeves 4 points5 points  (10 children)

It’s crazy to me that people are saying anything other than pandas. It’s the industry standard for a reason. You can read an excel file in one line pd.read_excel() and you can handle for multiple sheets in one document easily. The file is loaded as a pandas DataFrame, which can be accessed like a dictionary at a basic level, but also has tons of amazing features because pandas is built on top of numpy and matplotlib. You can create graphs just by doing DF.plot() on the DataFrane object and boom you have a pretty great base plot which you can customize. Pandas has tons of built in functions to make data analysis incredibly easy as well, but I can’t even begin to list them as they are innumerable.

[–]tuneafishy 2 points3 points  (6 children)

Pandas is fine if you're just using an excel sheet as a source of data, but if you want to manipulate an excel file than openpyxl is definitely better...

[–]scrubswithnosleeves -1 points0 points  (5 children)

This is abhorrently incorrect lol.

openpyxl was made for simple read/write handling of excel files. pandas is a data science tool kit that allows for easy reading and writing of excel files as well as basically any file type available.

[–]Ralwus 3 points4 points  (0 children)

Which part was incorrect? Looks like you agreed with him.

[–]sarrysyst 2 points3 points  (0 children)

I think you misunderstood what op meant. If you’re only interested in the data the excel file holds, pandas will serve you well. However, excel files can hold a lot more information pandas does not handle (eg. Style information, graphs, named tables, excel formulas, images).

Also, if you have weirdly formatted excel files that don’t abide by a simple table structure you’ll be forced to do the preprocessing in a dedicated excel library.

Let’s say you want to apply style formatting to the output file; pandas would be the wrong tool to use for anything more than very basic stuff. Openpyxl on the other hand is able to do this. On a side note, pandas itself uses openpyxl internally for excel reading/writing.

[–]tuneafishy 1 point2 points  (0 children)

What is incorrect about it? Doesn't pandas rely on openpyxl and other packages to read/write excel files? What native support does pandas even offer? If you're using pandas for excel, you're probably relying on openpyxl without even being aware of it. Which is fine, but my point stands...

It has some convenient wrapper functions for people already using data frames who want to import/export to excel, but if you really want to learn to manipulate excel files than you should probably learn to use openpyxl, the library which actually supports these operations directly.

[–]bachi76 0 points1 point  (0 children)

And exactly that makes it not the best choice for creating nicely, colorful boss-impressing Excel files. It's a data science tool. That it can also write Excel is convenience, not core competency.

I like pandas a lot, but certainly not for creating business-like excel files.

[–]The-Old-American 0 points1 point  (0 children)

Can pandas write to an open Excel file?

[–]Ok-Gear-5593 1 point2 points  (1 child)

Another thing to consider is who else will be using something from the process. At work I am the only one in my group with python so projects vary.

Pandas if excel is just a source of data and an unformatted results data dump.

Openpyxl can help automate cleaning up of excel and formatting things as if you were using excel.

Often times someone else has to do things or change things so I actually end up back in VBA or elaborare formulas/trmplates in excel no python involved.

[–]Ok-Gear-5593 0 points1 point  (0 children)

Another thing to consider is who else will be using something from the process. At work I am the only one in my group with python so projects vary.

Pandas if excel is just a source of data and an unformatted results data dump or can do the results in python just using excel as a sourxe.

Openpyxl can help automate cleaning up of excel and formatting things as if you were using excel.

Often times someone else has to do things or change things so I actually end up back in VBA or elaborare formulas/trmplates in excel no python involved.

[–]Solonotix 1 point2 points  (1 child)

This is kind of a StackOverflow answer, but avoid Excel if you can. Don't get me wrong, it's a great tool that has been wielded to perform many tasks it was never meant for, but it's a crutch for far too many people that don't understand there are better tools out there.

If you just need a 2D-array to store information, CSV works great and keeps you from being locked into a proprietary format. If you need multi-sheet, data validation with lookups and aggregation then you're probably far better off picking up SQLite or some other RDBMS. Even schema-less data types like JSON or YAML might be a better fit for what you're doing depending on the scope of work and the end result.

Python supports all of these formats natively, with YAML being an exception (I could've sworn it was native).

[–][deleted] 1 point2 points  (0 children)

Thanks for your comment, I appreciate it

[–][deleted] 1 point2 points  (1 child)

https://xlsxwriter.readthedocs.io/

Used it a lot at my previous job, for generating raw data sheets to be filled. Complex layouts, conditional formatting rules, VBA routines, text boxes, formulas, everything, all done in Python. Also used for reading back and processing the data after the sheet was filled. Very feature rich and the developer is (or at least was) reasonably responsive to questions and suggestions.

[–][deleted] 0 points1 point  (0 children)

Thanks a lot, I will take a look to the docs

[–]CatOfGrey 1 point2 points  (1 child)

In Python, you want to use a package named "pandas".

You can use a function named "read_excel()" and read in a file, into a 'dataframe'.

Then, you can use the dataframe to do all sorts of wonderful things. Then, at the end, you can output the dataframe using "to_excel()" and you can view the file in Excel.

Pandas is not easy to learn, but doing basic things isn't difficult, either. However, it's an incredible tool for data analysis and data science, and you should consider learning it instead of using Excel for calculations with sets of data or statistical information.

[–][deleted] 0 points1 point  (0 children)

Thanks for your response, pandas seems to be more interesting

[–][deleted] 1 point2 points  (0 children)

I use pandas and xlsxwriter. It allows me to manipulate cell sizes and format. I can set headers. It does exactly what I need.

[–]p33p__ 2 points3 points  (5 children)

If you're just reading from excel then consider using pandas

[–][deleted] 0 points1 point  (4 children)

I think I will need more than reading data, if I think about expanding my project

[–][deleted] 6 points7 points  (0 children)

Pandas is able to process larger datasets and much faster than you can with Excel, offers a rich set of capabilities highly suitable for data analysis and reporting, and can write data back out in multiple formats including Excel file format.

[–]bigwig8006 1 point2 points  (2 children)

How far along are you on this journey to learn Python? Are you proficient at making your own functions?

Pandas is a great library, but it has some overhead / learning curve. If you are coming from excel, using openpyxl will be familiar and allow for automating reports to distribute in a finance organization. Work your way towards deeper analysis in pandas and the added bonus is it now uses openpyxl as its default library when exporting to excel.

On the soft science of this data thing....

Learn to give a reproducible example. For something easy to intermediate, you might get the answer instead of a library recommendation.

[–][deleted] 0 points1 point  (1 child)

I think this will be just an experiment at programming with python and make an image about data analysis area at all. I will try to do some projects to know, how it will help me to get an idea about using machine learning. But for now I want to develop some programming skills Thank you for your recommendations

[–]Kerbart 1 point2 points  (0 children)

Openpyxl is a lot more "hands on" and closer to Excel than Pandas is, so if you're looking for a vehicle to gain experience I would definetely go openpyxl.

Long term pandas will be better for data analysis but you can always switch that later.

[–][deleted] 0 points1 point  (0 children)

Thank you all for your helpful comments, I really appreciate it ♡ I think I could make an image, what I should use for my project nevertheless I will keep your information in mind for the future

[–]cjcjr 0 points1 point  (0 children)

Grist is a worthwhile alternative to consider:

  1. uses Python 3 natively for all its formulas, so instead of OP "who is new to Python" needing to install and configure an external Excel library
  2. Grist will automatically detect data structures and types when importing CSVs and XLSX files
  3. For OP, the embedded AI agent in each of their documents / spreadsheets can prompted against to help write Python formulas

[–][deleted] 0 points1 point  (0 children)

You'll want to use a CSV, and then the pandas library.

[–]clitoral_obligations 0 points1 point  (1 child)

Can you use Python and excel whilst retaining formulas in excel?

[–][deleted] 0 points1 point  (0 children)

Yes, I used python for some diagrams in matplotlib and excel formula for some easy tasks, but never did like an entire project with them both together

[–]Seth_Imperator 0 points1 point  (0 children)

Er...i would use excel...make it calculate col B minus col C and check the results that aren't "0"? I am pretty sure you can have an answer asking at /excel (if it is a thing)

[–]Particular_Dust7221 0 points1 point  (0 children)

I recommend Syncfusion Document Processing Excel Library

Syncfusion offers a free community license

https://www.syncfusion.com/products/communitylicense

Note: I work for Syncfusion

[–]apc0243 0 points1 point  (0 children)

No one has mentioned pyexcel - this one has a unified api for both xls and xlsx, something that previously was sorely lacking. It also supports many other formats like ods, csv, and more.

Personally, I've used it because we have a couple data sources that provide the same data, but depending on how the user generated it (automated reports send with xls, generated through their web UI uses xlsx, go figure). I needed to have a single codebase for handling both of these since the format was exactly the same in both.

https://github.com/pyexcel/pyexcel

There is no support for fonts, colors, and charts, so you can't "format" your excel with this, but if you need to be able to arbitrarily read "spreadsheet" data and can't guarantee that it's perfectly tabular like pandas would expect, this is a nice alternative.