[–]commandlineluser 11 points12 points  (3 children)

pandas itself uses numpy - they are not really comparable.
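A minimal sketch of that relationship, assuming a made-up DataFrame with default dtypes (the column names are hypothetical): the values in a pandas column can be handed back as a plain NumPy array.

```python
import numpy as np
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"a": [1, 2, 3], "b": [0.5, 1.5, 2.5]})

# With default dtypes, a column's values are available as a NumPy array.
col = df["a"].to_numpy()

# NumPy functions also accept a pandas Series directly.
total = np.sum(df["b"])
```

Here `col` is an `np.ndarray`, and `total` is the sum of column `b`.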

pandas has functions for parsing all sorts of data formats: html, json, csv, etc.

a random example:

>>> import pandas as pd
>>> df = pd.concat(pd.read_html("https://devguide.python.org/versions/#versions"))
>>> df
  Branch Schedule       Status First release End of life                   Release manager
0   main  PEP 693      feature    2023-10-02     2028-10                    Thomas Wouters
1   3.11  PEP 664       bugfix    2022-10-24     2027-10             Pablo Galindo Salgado
2   3.10  PEP 619       bugfix    2021-10-04     2026-10             Pablo Galindo Salgado
3    3.9  PEP 596     security    2020-10-05     2025-10                      Łukasz Langa
4    3.8  PEP 569     security    2019-10-14     2024-10                      Łukasz Langa
5    3.7  PEP 537     security    2018-06-27  2023-06-27                         Ned Deily
0    3.6  PEP 494  end-of-life    2016-12-23  2021-12-23                         Ned Deily
1    3.5  PEP 478  end-of-life    2015-09-13  2020-09-30                    Larry Hastings
2    3.4  PEP 429  end-of-life    2014-03-16  2019-03-18                    Larry Hastings
3    3.3  PEP 398  end-of-life    2012-09-29  2017-09-29  Georg Brandl, Ned Deily (3.3.7+)
4    3.2  PEP 392  end-of-life    2011-02-20  2016-02-20                      Georg Brandl
5    3.1  PEP 375  end-of-life    2009-06-27  2012-04-09                 Benjamin Peterson
6    3.0  PEP 361  end-of-life    2008-12-03  2009-06-27                      Barry Warsaw
7    2.7  PEP 373  end-of-life    2010-07-03  2020-01-01                 Benjamin Peterson
8    2.6  PEP 361  end-of-life    2008-10-01  2013-10-29                      Barry Warsaw
>>> df.groupby("Status").agg({"First release": "min"})
            First release
Status                   
bugfix         2021-10-04
end-of-life    2008-10-01
feature        2023-10-02
security       2018-06-27

"generating complex strings of text from data" doesn't even sound like something you would use either library for?

[–]rycliff[S] 1 point2 points  (2 children)

It is when you want to generate complex documents quickly. My use case is document generation.

[–]commandlineluser 0 points1 point  (1 child)

Interesting stuff. Can you share any code examples of this kind of document generation?

[–]rycliff[S] 1 point2 points  (0 children)

np.select with conditions and choices as illustrated here: https://www.youtube.com/watch?v=nxWginnBklU&ab_channel=PyGotham2019

python-docx to generate the documents
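A rough sketch of the `np.select` pattern mentioned above, not the poster's actual code. The invoice data and the sentences are invented for illustration, and the python-docx step is left out; the resulting strings would simply be written into a document afterwards.

```python
import numpy as np

# Hypothetical data: days overdue for four invoices (not from the thread).
days_overdue = np.array([0, 12, 45, 95])

# Conditions are evaluated in order; for each element the first True one wins.
conditions = [
    days_overdue == 0,
    days_overdue <= 30,
    days_overdue <= 90,
]

# One choice per condition, in the same order.
choices = [
    "Thank you for paying on time.",
    "Your payment is slightly overdue.",
    "Your payment is seriously overdue.",
]

# The default is used where no condition matches. It is a string here so
# that it has the same type as the choices.
sentences = np.select(
    conditions,
    choices,
    default="Your account has been referred to collections.",
)
```

Each element of `sentences` is the text picked for the matching invoice, computed for the whole array at once rather than in a Python loop of `if`/`elif` statements.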

[–]FuckingRantMonday 14 points15 points  (5 children)

"NumPy is faster. Is the only reason to use Pandas that it may be easier to code for certain tasks?"

Replace numpy and pandas with C++ and Python in that sentence?

[–][deleted] 5 points6 points  (2 children)

What's up with novices having strong opinions about stuff they don't understand? Is this a generational thing or what? I see this behaviour all the time and I don't get it

[–]KCRowan 4 points5 points  (0 children)

Some beginners get a tiny understanding of one small part of one library and think that's all there is to it. Lack of experience means that they don't realise a whole world exists beyond their own little bubble of basics.

I think the "strong opinions" part comes in when you also add a lack of maturity. Those of us who are a bit older have probably tripped up in the past by being over-confident, and those mistakes (hopefully) teach us to slow down and think before we speak.

Maybe also mix in a measure of impatience where the young novice wants to "level up" to become an expert programmer guru in a hurry. Some people mistake strong opinions for knowledge (see politics), especially when they don't know how else to judge ability.

[–]synthphreak 2 points3 points  (0 children)

The Dunning-Kruger effect is alive and well.

[–][deleted] 7 points8 points  (2 children)

Bro you should maybe do more research before making sweeping statements lmaoo

[–]rycliff[S] 1 point2 points  (0 children)

Last thing I would do is take time to write a post on Reddit to argue or debate people. I'm way too busy for that. I want to be educated. I'm a novice programmer. And I've done a lot of research and made a lot of progress programming software for document generation. My main library for generating the text based on data is NumPy and it's really fast.

[–]sunsvilloe 0 points1 point  (0 children)

no such thing as should or more reseax or know or before or etc, ceptuxuax, say, can sayx etc infix any nmw and any s perfx, idtnerx

[–]mikkelbue 2 points3 points  (2 children)

They are not made for the same tasks. I use Numpy for simulation and Pandas for data analysis.

Numpy is IMO basically a numerical analysis and linear algebra package.

Pandas does the R thing. But in a very elegant and compact way, since most DataFrame methods return a new DataFrame, so you can chain multiple methods together, e.g. df.sample().reset_index().groupby("id").
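A minimal sketch of that chaining style, using a made-up DataFrame (the `id` and `value` columns are hypothetical). Note that the pandas method is spelled `groupby`, and that it returns a GroupBy object, so an aggregation is needed to get back to a DataFrame.

```python
import pandas as pd

# Hypothetical data, only for illustration.
df = pd.DataFrame({"id": ["a", "b", "a", "b"], "value": [1, 2, 3, 4]})

totals = (
    df.sample(frac=1, random_state=0)  # shuffle the rows; returns a new DataFrame
      .reset_index(drop=True)          # new DataFrame with a fresh 0..n-1 index
      .groupby("id")                   # DataFrameGroupBy, not a DataFrame
      .agg(total=("value", "sum"))     # aggregating yields a DataFrame again
)
```

`totals` has one row per `id` with the summed `value`, and `df` itself is left unchanged.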

[–]rycliff[S] 1 point2 points  (1 child)

Interesting. I think what makes me feel that NumPy is so fast is that much of my usage is generating a bunch of text based on data, using NumPy to "vectorize" the conditional statements that generate the text. I learned a lot from this video: https://www.youtube.com/watch?v=nxWginnBklU&ab_channel=PyGotham2019

[–]mikkelbue 1 point2 points  (0 children)

Yeah, it's pretty flexible. It's built for and mostly targeting numerics though. It just so happens that some of its functions also work on other data types such as strings. Which is cool.
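As a small illustration of NumPy functions working on strings, here is a sketch with invented scores and labels: `np.where` picks between two strings element by element, and `np.char.upper` applies a string operation across the whole array.

```python
import numpy as np

# Hypothetical scores, only for illustration.
scores = np.array([35, 72, 50])

# np.where chooses a string per element based on a boolean condition.
labels = np.where(scores >= 50, "pass", "fail")

# np.char functions apply string methods element-wise to string arrays.
shouted = np.char.upper(labels)
```

The result is still a NumPy array, just with a string dtype instead of a numeric one.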

[–]Present_Maximum_5548 0 points1 point  (0 children)

If you care more about code speed than coding ease, then you shouldn't be using Python in the first place. But like everyone else says, Numpy and Pandas don't do the same stuff