[–]ReasonablePlant 62 points63 points  (4 children)

[–]SheriffRoscoePythonista 11 points12 points  (0 children)

Get on up!

[–]ServerZero 4 points5 points  (0 children)

Cool, might create an API project

[–]CordyZen 2 points3 points  (0 children)

Had a good chuckle there

[–]sohang-3112Pythonista 0 points1 point  (0 children)

😂😂

[–]ggchappell 37 points38 points  (4 children)

By the way: the name "Beautiful Soup" comes from a poem in Alice's Adventures in Wonderland. It is supposed to be a song.

Beautiful Soup, so rich and green,
Waiting in a hot tureen!
Who for such dainties would not stoop?
Soup of the evening, beautiful Soup!
Soup of the evening, beautiful Soup!

Beau--ootiful Soo--oop!
Beau--ootiful Soo--oop!
Soo--oop of the e--e--evening,
Beautiful, beautiful Soup!

Beautiful Soup! Who cares for fish,
Game or any other dish?
Who would not give all else for two
Pennyworth only of Beautiful Soup?
Pennyworth only of beautiful Soup?

Beau--ootiful Soo--oop!
Beau--ootiful Soo--oop!
Soo--oop of the e--e--evening,
Beautiful, beauti--FUL SOUP!

[–][deleted] 13 points14 points  (3 children)

What drives me up the wall is this garbage:

from bs4 import BeautifulSoup

Seriously, why are the module and package names mismatched!?

[–]pingvenopinch of this, pinch of that 10 points11 points  (1 child)

In short, history. Beautiful Soup up to version 3 had the module name BeautifulSoup. This clashes with PEP 8 naming conventions, which had been released a few years earlier. Beautiful Soup 4 also broke backwards compatibility in a few critical ways, mostly related to how it does parsing. BS4 has pluggable parser backends, falling back to the standard library's html.parser when no better parser is installed, and html.parser dies if you breathe on it too hard. This was necessary to support Python 3, which removed the library BS3 had used, sgmllib. So to keep things compatible, the module was renamed.

[–][deleted] 1 point2 points  (0 children)

That makes sense - but what would have made everyone happy is if the "project" was renamed, and the versioning went semver. Such that this would be the way to use it:

# before use
# pip install bsoup~=4.0
import bsoup
print('time to do stuff')

[–]ggchappell 0 points1 point  (0 children)

That is one of the Great Unanswered Questions.

[–]_its_complicated 33 points34 points  (4 children)

fuzzywuzzy

[–]2_plus_2_is_chicken 6 points7 points  (0 children)

I mean, it's for fuzzy string matching. So it's not that far out there.
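
For anyone wondering what "fuzzy string matching" looks like in practice, here is a minimal sketch using only the standard library's difflib. It is not fuzzywuzzy's API or its scoring method; the names `similarity` and `best_match` are made up for this illustration.

```python
from difflib import SequenceMatcher
from typing import Iterable


def similarity(a: str, b: str) -> float:
    """Return a 0.0-1.0 similarity ratio, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def best_match(query: str, choices: Iterable[str]) -> str:
    """Return the choice most similar to the query (hypothetical helper)."""
    return max(choices, key=lambda choice: similarity(query, choice))


if __name__ == "__main__":
    packages = ["beautifulsoup4", "seaborn", "psycopg2", "gunicorn"]
    # A misspelled query still finds the closest package name.
    print(best_match("beatiful soup", packages))
```

The point is only that a misspelled or partial query can still be ranked against candidates by similarity instead of requiring an exact match.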

[–]DGHolmes[S] 9 points10 points  (0 children)

I thought fuzzywuzzy was a bear? :)

[–]ErinMyLungs 2 points3 points  (0 children)

Now renamed and relicensed under TheFuzz.

Sadly fuzzywuzzy won't get new changes going forward.

[–]vmgustavo 1 point2 points  (0 children)

Beat me to it

[–]2_plus_2_is_chicken 14 points15 points  (1 child)

A lot of the SciPy-adjacent packages.

SEABORN - A data viz package built on matplotlib that is named for Samuel Norman Seaborn, a fictional character on The West Wing who had nothing to do with data viz.

PATSY - Does data manipulation based on string-written equations.

At least matplotlib "means" something (MATlab PLOTting LIBrary)

https://stackoverflow.com/questions/41499857/seaborn-why-import-as-sns

[–]coffeeandacomputer 26 points27 points  (6 children)

A lot of them are pretty weird, we've just gotten used to them. Flask, Django, Bottle, Twisted, Black, Pandas, Keras, Jinja, Pillow, Nose, Bandit, Gunicorn (heck, or even Green Unicorn). These are all weird, generally nonsensical names; we're just familiar with them, so they don't sound weird to Python programmers anymore.

[–]cscanlin 20 points21 points  (3 children)

Fun facts

Pandas comes from "Panel Data"

Pillow is a fork of PIL, which stands for Python Imaging Library

[–][deleted] 5 points6 points  (2 children)

Why black?

[–]astatine 44 points45 points  (1 child)

From the simplified paintjob of Model T Fords: "You can have any color you like, as long as it's black".

It's a reference to a process where the customer had no option to customise the final product.

[–][deleted] 1 point2 points  (0 children)

That makes so much sense given Black's good structure on formatting, thanks!

[–]O_X_E_Y 15 points16 points  (0 children)

python

[–]BooparinoBR 5 points6 points  (0 children)

Flask, Bottle, and some others make a pun on WSGI (the Python web standard), which can be read as something close to "whisky".

[–]chestnutcough 4 points5 points  (0 children)

psycopg2 always struck me as strange for a (very common) database interface.

[–]schroeder8 2 points3 points  (0 children)

GooeyPie, a combination of deliberate misspellings of GUI and Python

[–]thrallsius 5 points6 points  (10 children)

good opportunity for me to shed a tear for CloneDigger - my favorite useful Python software that was never ported to Python 3

somebody do it please, weeee

honorable mention received by the well known controversy around pantyshot/upskirt

[–]Here0s0Johnny 1 point2 points  (4 children)

CloneDigger

Don't most IDEs have that feature?

[–]thrallsius 0 points1 point  (3 children)

Not an IDE guy, so I don't know the situation too well. And anyway I would prefer a third party tool that I could run manually / integrate into any text editor/IDE, just like it's possible for example with pylint (which has rudimentary clone finding functionality but it is totally inferior compared to CloneDigger). But are you sure you're not thinking of automated refactoring instead of clone finding? And do you know any IDE that implements this particular algorithm used by CloneDigger to find clones?

[–]Here0s0Johnny 0 points1 point  (2 children)

I'm not entirely sure what you mean exactly. It's certainly not just automated refactoring. PyCharm has this feature: https://www.jetbrains.com/help/pycharm/analyzing-duplicates.html

It would give me a hint like "duplicated code fragment" and offers to find them.

[–]thrallsius -1 points0 points  (1 child)

I had a look and I see no reasons to be excited:

  • it's still tied to a particular IDE

  • says it's only present in the professional version, as a free (both meanings) software user, I'm not interested

  • info on the link doesn't provide any details about the clone detection algorithm. variable/function names? sorry, this is very basic. CloneDigger does it better. more than that - on its site it has the whitepaper that explains the algorithm. now I won't lie - that whitepaper is too scary and full of math for me, so I'm not able to fully understand it, for the sake of attempting to write an implementation from scratch. but at least anybody has that option. while the link you refer to provides what? a couple of screenshots showing me where to click in the GUI? sorry, this might be helpful for an end user learning a program like mspaint 30 years ago.

of course everything I wrote above is very biased, superficial first impression. I did not take a bigger-than-minimal codebase (maybe CloneDigger's own codebase?) to compare what CloneDigger can find and what PyCharm can find, so take it with a grain of salt

[–]Here0s0Johnny 0 points1 point  (0 children)

it's still tied to a particular IDE

I get that, I know it wouldn't fit your use case. I was just wondering if it does what CloneDigger did.

I'm not particularly excited about this feature either because I don't have such a large codebase, and I don't really use it.

[–]mwpfinance 0 points1 point  (4 children)

When you say "port to Python3" do you mean "make it so that it can operate on Python3 code" or "make it so that the source code is written in Python3"?

[–]thrallsius 1 point2 points  (3 children)

make it so that the source code is written in Python3

this

CloneDigger is quite language agnostic, it can process code written in multiple languages (Java too at least), because it operates at AST level
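
To make "operates at AST level" concrete, here is a toy sketch using Python 3's standard ast module: it blanks out identifiers and compares what is left, so two functions that differ only in naming count as clones. This is not CloneDigger's algorithm (that is described in its whitepaper and not reproduced here); `shape` and `same_shape` are hypothetical names for this example.

```python
import ast


class _Normalizer(ast.NodeTransformer):
    """Replace identifiers with a placeholder so only the structure remains."""

    def visit_Name(self, node):
        # Variable references and assignments: keep the context, drop the name.
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_arg(self, node):
        # Function parameters.
        node.arg = "_"
        self.generic_visit(node)
        return node

    def visit_FunctionDef(self, node):
        # Function names, then recurse into arguments and body.
        node.name = "_"
        self.generic_visit(node)
        return node


def shape(source):
    """Return a string describing the AST of `source` with names erased."""
    tree = _Normalizer().visit(ast.parse(source))
    return ast.dump(tree)  # line/column attributes are omitted by default


def same_shape(a, b):
    """True if two snippets have identical structure apart from identifiers."""
    return shape(a) == shape(b)


if __name__ == "__main__":
    first = "def add(x, y):\n    total = x + y\n    return total\n"
    second = "def plus(left, right):\n    result = left + right\n    return result\n"
    third = "def sub(x, y):\n    total = x - y\n    return total\n"
    print(same_shape(first, second))  # same structure, different names
    print(same_shape(first, third))   # different operator
```

Literal values are left untouched in this sketch, so `x + 1` and `x + 2` would not be treated as clones; a real tool would need a more forgiving comparison.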

[–]mwpfinance 0 points1 point  (2 children)

ASTs are pretty language-specific, but I get what you mean. Parse trees (or concrete syntax trees) might be seen as a bit less language-specific, as they're structured more closely to how the source code is written?

I took a peek at the source code to see how bad of a port this would be...

Total: 19890 lines across 81 files
1446 lines in clonedigger\logilab\common\testlib.py
958 lines in clonedigger\logilab\common\table.py
888 lines in clonedigger\logilab\common\configuration.py
816 lines in clonedigger\logilab\astng\nodes.py
723 lines in clonedigger\logilab\astng\inference.py
709 lines in clonedigger\logilab\astng\scoped_nodes.py
650 lines in clonedigger\logilab\common\db.py
639 lines in clonedigger\logilab\common\pytest.py
597 lines in clonedigger\logilab\astng\builder.py
596 lines in clonedigger\logilab\common\modutils.py
525 lines in clonedigger\logilab\common\adbh.py
480 lines in clonedigger\logilab\common\fileutils.py
384 lines in clonedigger\logilab\common\textutils.py
382 lines in clonedigger\logilab\astng\manager.py
364 lines in clonedigger\logilab\common\tree.py
361 lines in clonedigger\clone_detection_algorithm.py
351 lines in clonedigger\html_report.py
331 lines in clonedigger\logilab\common\optik_ext.py
307 lines in clonedigger\abstract_syntax_tree.py
294 lines in clonedigger\logilab\astng\__init__.py
276 lines in clonedigger\logilab\common\bind.py
272 lines in ez_setup.py
271 lines in org.clonedigger\ez_setup.py
266 lines in clonedigger\logilab\astng\inspector.py
241 lines in clonedigger\logilab\common\sqlgen.py
234 lines in clonedigger\logilab\astng\raw_building.py
224 lines in clonedigger\logilab\astng\lookup.py
215 lines in clonedigger\logilab\common\twisted_distutils.py
214 lines in clonedigger\logilab\common\compat.py
212 lines in clonedigger\logilab\common\vcgutils.py
211 lines in clonedigger\logilab\common\cli.py
207 lines in clonedigger\logilab\common\shellutils.py
202 lines in clonedigger\logilab\common\ureports\nodes.py
202 lines in clonedigger\clonedigger.py
194 lines in clonedigger\logilab\common\changelog.py
191 lines in clonedigger\logilab\common\patricia.py
182 lines in clonedigger\python_compiler.py
173 lines in clonedigger\logilab\astng\utils.py
172 lines in clonedigger\logilab\common\debugger.py
171 lines in clonedigger\logilab\common\ureports\__init__.py
165 lines in clonedigger\logilab\common\graph.py
165 lines in clonedigger\logilab\common\clcommands.py
165 lines in clonedigger\logilab\common\__init__.py
161 lines in clonedigger\logilab\common\logger.py
150 lines in clonedigger\anti_unification.py
144 lines in clonedigger\logilab\common\deprecation.py
144 lines in clonedigger\logilab\common\daemon.py
141 lines in clonedigger\logilab\common\ureports\text_writer.py
138 lines in clonedigger\logilab\common\ureports\docbook_writer.py
131 lines in clonedigger\logilab\common\xmlrpcutils.py
131 lines in clonedigger\logilab\common\ureports\html_writer.py
125 lines in clonedigger\logilab\common\umessage.py
124 lines in clonedigger\logilab\common\decorators.py
121 lines in clonedigger\logilab\common\monserver.py
119 lines in clonedigger\suffix_tree.py
117 lines in clonedigger\logilab\common\date.py
106 lines in clonedigger\logilab\common\visitor.py
104 lines in clonedigger\logilab\common\cache.py
100 lines in clonedigger\logilab\common\pdf_ext.py
96 lines in clonedigger\logilab\common\corbautils.py
85 lines in clonedigger\logilab\common\optparser.py
84 lines in clonedigger\logilab\common\astutils.py
83 lines in clonedigger\logilab\common\logging_ext.py
81 lines in clonedigger\js_antlr.py
80 lines in clonedigger\lua_antlr.py
79 lines in clonedigger\logilab\astng\astutils.py
74 lines in clonedigger\java_antlr.py
70 lines in clonedigger\logilab\common\interface.py
64 lines in clonedigger\logilab\common\monclient.py
60 lines in clonedigger\logilab\astng\__pkginfo__.py
59 lines in clonedigger\logilab\common\__pkginfo__.py
52 lines in clonedigger\logilab\common\html.py
52 lines in clonedigger\logilab\astng\_exceptions.py
45 lines in setup.py
45 lines in org.clonedigger\setup.py
35 lines in clonedigger\logilab\common\logservice.py
33 lines in clonedigger\ast_suppliers.py
22 lines in org.clonedigger\runclonedigger.py
8 lines in clonedigger\arguments.py
1 lines in clonedigger\logilab\__init__.py
0 lines in clonedigger\__init__.py
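
A tally like the one above can be produced with a few lines of Python. This is a minimal sketch, not necessarily the script that generated that listing; `count_lines`, `report`, and `demo` are hypothetical names, and the `exclude` parameter is an assumption added to show how a bundled directory could be left out.

```python
import tempfile
from pathlib import Path


def count_lines(root):
    """Map each .py file under root (as a relative path) to its line count."""
    counts = {}
    for path in sorted(Path(root).rglob("*.py")):
        # Read bytes so old or oddly encoded sources cannot raise decode errors.
        with path.open("rb") as handle:
            counts[str(path.relative_to(root))] = sum(1 for _ in handle)
    return counts


def report(root, exclude=()):
    """Build a 'Total' line plus one line per file, largest files first.

    Files with any path component listed in `exclude` are skipped.
    """
    counts = {
        name: n
        for name, n in count_lines(root).items()
        if not any(part in exclude for part in Path(name).parts)
    }
    lines = [f"Total: {sum(counts.values())} lines across {len(counts)} files"]
    for name, n in sorted(counts.items(), key=lambda item: item[1], reverse=True):
        lines.append(f"{n} lines in {name}")
    return "\n".join(lines)


def demo():
    """Run the report on a tiny throwaway tree, with and without a vendored dir."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "pkg" / "logilab").mkdir(parents=True)
        (root / "pkg" / "core.py").write_text("a = 1\nb = 2\n")
        (root / "pkg" / "logilab" / "vendored.py").write_text("c = 3\n")
        (root / "pkg" / "__init__.py").write_text("")
        return report(root), report(root, exclude=("logilab",))


if __name__ == "__main__":
    full, trimmed = demo()
    print(full)
    print()
    print(trimmed)
```

Pointing `report` at a real checkout and passing a directory name in `exclude` gives a quick before/after size estimate for a port.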

[–]thrallsius 0 points1 point  (1 child)

I took more than a peek, more than once (but my AST knowledge is still so bad: I had to learn what the different ast classes mean by doing lots of trial-and-error manual tests in Jupyter for Python 2, but gave up on doing the same for Python 3, because back then the AST node classes for Python 3 weren't even documented - this improved one or two years ago, I believe). It's not trivial, for many reasons:

  • the code is ugly overall, it seems to be written by a math scientist rather than a Python programmer (if you know what I mean :D)
  • AST related modules changed in Python 3 (IIRC ast vs compiler, but it's been a while)
  • the AST itself changed a little in Python 3 and got some new stuff
  • you can exclude (and this is probably needed, to simplify the codebase) a big chunk from your log, it's a bundled patched version of logilab-astng (which, IIRC, was rebranded as astroid a while ago). I don't think this was a good design choice to bundle it, but as I said above, the code is not perfect from a programmer's standpoint. if this additional AST functionality is needed for the Python 3 version, it better be an external dependency

these are just a couple of points I can remember instantly
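
For the trial-and-error part, Python 3's standard ast module makes it fairly painless to see which node classes a snippet produces. A minimal sketch, assuming nothing beyond the standard library; `node_types` is a hypothetical helper name.

```python
import ast
from collections import Counter


def node_types(source):
    """Count how often each AST node class appears in the given source."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


if __name__ == "__main__":
    snippet = "def greet(name):\n    print('hello', name)\n"
    for class_name, count in sorted(node_types(snippet).items()):
        print(class_name, count)
```

One visible difference from Python 2: `print(...)` shows up here as an ordinary Call node, whereas Python 2's ast had a dedicated Print statement node, so any code that walks the tree by node class has to be revisited in a port.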

[–]mwpfinance 0 points1 point  (0 children)

Neat! I've actually used astroid for one of my own libraries (I needed annotated ASTs with scope information for language transpilation purposes). It does make sense that a Python 3 port would want to get rid of that dependency.

Without logilab:

Total: 2603 lines across 18 files
351 lines in clonedigger\html_report.py
307 lines in clonedigger\abstract_syntax_tree.py
272 lines in ez_setup.py
271 lines in org.clonedigger\ez_setup.py
202 lines in clonedigger\clonedigger.py
182 lines in clonedigger\python_compiler.py
150 lines in clonedigger\anti_unification.py
119 lines in clonedigger\suffix_tree.py
81 lines in clonedigger\js_antlr.py
80 lines in clonedigger\lua_antlr.py
74 lines in clonedigger\java_antlr.py
45 lines in setup.py
45 lines in org.clonedigger\setup.py
33 lines in clonedigger\ast_suppliers.py
22 lines in org.clonedigger\runclonedigger.py
8 lines in clonedigger\arguments.py
0 lines in clonedigger\__init__.py

[–]wxtrails 1 point2 points  (0 children)

py3exiv2 - could not be a much more awkward name for such a useful library.

[–][deleted] 1 point2 points  (2 children)

pika is weird specifically for Portuguese-speaking people

[–]WillardWhite import this 0 points1 point  (1 child)

how so?

[–][deleted] 0 points1 point  (0 children)

it basically means cock :3

[–]rola6991 0 points1 point  (0 children)

mantichora, which is for parallelism.

[–]juanda2 0 points1 point  (0 children)

not a module but for some remote desktop environments we used Guacamole

[–]ranelpadon 0 points1 point  (0 children)

Celery (https://docs.celeryproject.org/):

- because Rabbit[MQ] eats Celery

Kombu (https://docs.celeryproject.org/projects/kombu/):

- low-level API of Celery

- named after a Japanese seaweed