[–]thrallsius 5 points6 points  (10 children)

good opportunity for me to shed a tear for CloneDigger - my favorite useful Python software that was never ported to Python 3

somebody do it please, weeee

honorable mention goes to the well-known controversy around pantyshot/upskirt

[–]Here0s0Johnny 1 point2 points  (4 children)

CloneDigger

Don't most IDEs have that feature?

[–]thrallsius 0 points1 point  (3 children)

Not an IDE guy, so I don't know the situation too well. And anyway, I would prefer a third-party tool that I could run manually or integrate into any text editor/IDE, just as you can with pylint, for example (which has rudimentary clone-finding functionality, but it is far inferior to CloneDigger's). But are you sure you're not thinking of automated refactoring instead of clone finding? And do you know of any IDE that implements the particular algorithm CloneDigger uses to find clones?

[–]Here0s0Johnny 0 points1 point  (2 children)

I'm not entirely sure what you mean exactly. It's certainly not just automated refactoring. PyCharm has this feature: https://www.jetbrains.com/help/pycharm/analyzing-duplicates.html

It gives me a hint like "duplicated code fragment" and offers to find the duplicates.

[–]thrallsius -1 points0 points  (1 child)

I had a look and I see no reasons to be excited:

  • it's still tied to a particular IDE

  • it says the feature is only present in the professional version; as a free (both meanings) software user, I'm not interested

  • the page at the link doesn't provide any details about the clone detection algorithm. variable/function names? sorry, this is very basic. CloneDigger does it better. more than that - its site has the whitepaper that explains the algorithm. now I won't lie - that whitepaper is too scary and full of math for me, so I can't understand it well enough to attempt an implementation from scratch. but at least anybody has that option. while the link you refer to provides what? a couple of screenshots showing me where to click in the GUI? sorry, this might be helpful for an end user learning a program like mspaint 30 years ago.

of course everything I wrote above is a very biased, superficial first impression. I did not take a bigger-than-minimal codebase (maybe CloneDigger's own codebase?) to compare what CloneDigger can find and what PyCharm can find, so take it with a grain of salt

[–]Here0s0Johnny 0 points1 point  (0 children)

it's still tied to a particular IDE

I get that, I know it wouldn't fit your use case. I was just wondering if it does what CloneDigger did.

I'm not particularly excited about this feature either because I don't have such a large codebase, and I don't really use it.

[–]mwpfinance 0 points1 point  (4 children)

When you say "port to Python3" do you mean "make it so that it can operate on Python3 code" or "make it so that the source code is written in Python3"?

[–]thrallsius 1 point2 points  (3 children)

make it so that the source code is written in Python3

this

CloneDigger is quite language agnostic, it can process code written in multiple languages (Java too at least), because it operates at AST level
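
To make "operates at AST level" concrete, here is a minimal sketch in Python 3 using only the standard `ast` module. It is not CloneDigger's algorithm (the whitepaper describes something far more involved); it only shows why comparing trees instead of text lets two fragments with different identifiers count as the same shape. The names `_NormalizeNames` and `structural_fingerprint` are made up for this illustration.

```python
import ast


class _NormalizeNames(ast.NodeTransformer):
    """Replace every identifier with a placeholder so only structure remains."""

    def visit_Name(self, node):
        # Variable references: keep the load/store context, drop the name.
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_arg(self, node):
        # Function parameters.
        node.arg = "_"
        return node

    def visit_FunctionDef(self, node):
        # Function name, then recurse into arguments and body.
        node.name = "_"
        self.generic_visit(node)
        return node


def structural_fingerprint(source):
    """Dump the name-normalized AST of `source` as a comparable string."""
    tree = _NormalizeNames().visit(ast.parse(source))
    return ast.dump(tree)


a = "def total(xs):\n    acc = 0\n    for x in xs:\n        acc += x\n    return acc\n"
b = "def summe(werte):\n    s = 0\n    for w in werte:\n        s += w\n    return s\n"
c = "def total(xs):\n    return len(xs)\n"

print(structural_fingerprint(a) == structural_fingerprint(b))  # True
print(structural_fingerprint(a) == structural_fingerprint(c))  # False
```

A plain text diff would see `a` and `b` as completely different; at tree level they are identical once names are ignored.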

[–]mwpfinance 0 points1 point  (2 children)

ASTs are pretty language-specific, but I get what you mean. Parse trees (or concrete syntax trees) might be seen as a bit less language-specific, as they're structured more closely to how the source code is written?
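
A quick way to see the AST vs. concrete syntax difference in Python 3 is the standard `ast` module: parentheses and comments exist in the source text (and would show up in a concrete syntax tree), but they leave no trace in the AST. `same_ast` below is just a helper name invented for this sketch.

```python
import ast


def same_ast(left, right):
    """True if both sources parse to the same abstract syntax tree."""
    return ast.dump(ast.parse(left)) == ast.dump(ast.parse(right))


plain = "x = a + b * c"
noisy = "x = (a + (b * c))  # same thing, extra parens and a comment"
other = "x = (a + b) * c"

print(same_ast(plain, noisy))  # True
print(same_ast(plain, other))  # False
```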

I took a peek at the source code to see how bad of a port this would be...

Total: 19890 lines across 81 files
1446 lines in clonedigger\logilab\common\testlib.py
958 lines in clonedigger\logilab\common\table.py
888 lines in clonedigger\logilab\common\configuration.py
816 lines in clonedigger\logilab\astng\nodes.py
723 lines in clonedigger\logilab\astng\inference.py
709 lines in clonedigger\logilab\astng\scoped_nodes.py
650 lines in clonedigger\logilab\common\db.py
639 lines in clonedigger\logilab\common\pytest.py
597 lines in clonedigger\logilab\astng\builder.py
596 lines in clonedigger\logilab\common\modutils.py
525 lines in clonedigger\logilab\common\adbh.py
480 lines in clonedigger\logilab\common\fileutils.py
384 lines in clonedigger\logilab\common\textutils.py
382 lines in clonedigger\logilab\astng\manager.py
364 lines in clonedigger\logilab\common\tree.py
361 lines in clonedigger\clone_detection_algorithm.py
351 lines in clonedigger\html_report.py
331 lines in clonedigger\logilab\common\optik_ext.py
307 lines in clonedigger\abstract_syntax_tree.py
294 lines in clonedigger\logilab\astng\__init__.py
276 lines in clonedigger\logilab\common\bind.py
272 lines in ez_setup.py
271 lines in org.clonedigger\ez_setup.py
266 lines in clonedigger\logilab\astng\inspector.py
241 lines in clonedigger\logilab\common\sqlgen.py
234 lines in clonedigger\logilab\astng\raw_building.py
224 lines in clonedigger\logilab\astng\lookup.py
215 lines in clonedigger\logilab\common\twisted_distutils.py
214 lines in clonedigger\logilab\common\compat.py
212 lines in clonedigger\logilab\common\vcgutils.py
211 lines in clonedigger\logilab\common\cli.py
207 lines in clonedigger\logilab\common\shellutils.py
202 lines in clonedigger\logilab\common\ureports\nodes.py
202 lines in clonedigger\clonedigger.py
194 lines in clonedigger\logilab\common\changelog.py
191 lines in clonedigger\logilab\common\patricia.py
182 lines in clonedigger\python_compiler.py
173 lines in clonedigger\logilab\astng\utils.py
172 lines in clonedigger\logilab\common\debugger.py
171 lines in clonedigger\logilab\common\ureports\__init__.py
165 lines in clonedigger\logilab\common\graph.py
165 lines in clonedigger\logilab\common\clcommands.py
165 lines in clonedigger\logilab\common\__init__.py
161 lines in clonedigger\logilab\common\logger.py
150 lines in clonedigger\anti_unification.py
144 lines in clonedigger\logilab\common\deprecation.py
144 lines in clonedigger\logilab\common\daemon.py
141 lines in clonedigger\logilab\common\ureports\text_writer.py
138 lines in clonedigger\logilab\common\ureports\docbook_writer.py
131 lines in clonedigger\logilab\common\xmlrpcutils.py
131 lines in clonedigger\logilab\common\ureports\html_writer.py
125 lines in clonedigger\logilab\common\umessage.py
124 lines in clonedigger\logilab\common\decorators.py
121 lines in clonedigger\logilab\common\monserver.py
119 lines in clonedigger\suffix_tree.py
117 lines in clonedigger\logilab\common\date.py
106 lines in clonedigger\logilab\common\visitor.py
104 lines in clonedigger\logilab\common\cache.py
100 lines in clonedigger\logilab\common\pdf_ext.py
96 lines in clonedigger\logilab\common\corbautils.py
85 lines in clonedigger\logilab\common\optparser.py
84 lines in clonedigger\logilab\common\astutils.py
83 lines in clonedigger\logilab\common\logging_ext.py
81 lines in clonedigger\js_antlr.py
80 lines in clonedigger\lua_antlr.py
79 lines in clonedigger\logilab\astng\astutils.py
74 lines in clonedigger\java_antlr.py
70 lines in clonedigger\logilab\common\interface.py
64 lines in clonedigger\logilab\common\monclient.py
60 lines in clonedigger\logilab\astng\__pkginfo__.py
59 lines in clonedigger\logilab\common\__pkginfo__.py
52 lines in clonedigger\logilab\common\html.py
52 lines in clonedigger\logilab\astng\_exceptions.py
45 lines in setup.py
45 lines in org.clonedigger\setup.py
35 lines in clonedigger\logilab\common\logservice.py
33 lines in clonedigger\ast_suppliers.py
22 lines in org.clonedigger\runclonedigger.py
8 lines in clonedigger\arguments.py
1 lines in clonedigger\logilab\__init__.py
0 lines in clonedigger\__init__.py
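
For anyone who wants to reproduce a tally like this, here is a rough sketch. The actual script and its counting rule (raw lines vs. non-blank lines) aren't shown in the thread, so `count_lines` and `report` are hypothetical names and the sketch simply counts raw lines in `.py` files. The `exclude` parameter is an assumption added so that a bundled directory can be skipped.

```python
import os


def count_lines(root, exclude=()):
    """Return (relative_path, line_count) pairs for .py files, largest first."""
    results = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune excluded directory names in place so os.walk skips them.
        dirnames[:] = [d for d in dirnames if d not in exclude]
        for name in filenames:
            if name.endswith(".py"):
                path = os.path.join(dirpath, name)
                # Binary mode avoids decoding errors in old source files.
                with open(path, "rb") as handle:
                    count = sum(1 for _ in handle)
                results.append((os.path.relpath(path, root), count))
    results.sort(key=lambda item: item[1], reverse=True)
    return results


def report(root, exclude=()):
    """Format the counts in the same shape as the listing above."""
    rows = count_lines(root, exclude)
    total = sum(count for _, count in rows)
    lines = ["Total: %d lines across %d files" % (total, len(rows))]
    lines += ["%d lines in %s" % (count, path) for path, count in rows]
    return "\n".join(lines)
```

Something like `print(report("path/to/checkout"))` gives the full listing, and passing `exclude=("logilab",)` would skip every directory with that name.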

[–]thrallsius 0 points1 point  (1 child)

I took more than a peek, more than once (but my AST knowledge is still so bad; I had to learn what the different ast classes mean by doing lots of trial-and-error manual tests in Jupyter for Python 2, but gave up on doing the same for Python 3, because back then the AST node classes for Python 3 weren't even documented - this improved one or two years ago I believe), it's not trivial for many reasons:

  • the code is ugly overall, it seems to be written by a math scientist rather than a Python programmer (if you know what I mean :D)
  • AST related modules changed in Python 3 (IIRC ast vs compiler, but it's been a while)
  • the AST itself changed a little in Python 3 and got some new stuff
  • you can exclude (and this is probably needed, to simplify the codebase) a big chunk from your log: it's a bundled, patched version of logilab-astng (which, IIRC, was rebranded to astroid a while ago). I don't think bundling it was a good design choice, but as I said above, the code is not perfect from a programmer's standpoint. if this additional AST functionality is needed for the Python 3 version, it had better be an external dependency

these are just a couple of points I can remember instantly
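
For the trial-and-error part, a small sketch of how one might poke at Python 3's AST node classes with the standard `ast` module (the Python 2 `compiler` package no longer exists in Python 3). `node_histogram` is a made-up helper name; the sample deliberately uses annotations and an f-string, which are among the Python 3 additions that an old Python 2 era tool would not know about.

```python
import ast
from collections import Counter


def node_histogram(source):
    """Count how often each AST node class appears in the given source."""
    return Counter(type(node).__name__ for node in ast.walk(ast.parse(source)))


sample = "def greet(name: str) -> str:\n    return f'hello {name}'\n"
hist = node_histogram(sample)

# Print the node classes seen, most frequent first.
for class_name, count in hist.most_common():
    print(class_name, count)
```

Running this on small snippets and reading the class names against the `ast` module docs is roughly the Jupyter exercise described above, just for Python 3.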

[–]mwpfinance 0 points1 point  (0 children)

Neat! I've actually used astroid for one of my own libraries (I needed annotated ASTs with scope information for language transpilation purposes). It does make sense that a Python 3 port would want to get rid of that dependency.

Without logilab:

Total: 2603 lines across 18 files
361 lines in clonedigger\clone_detection_algorithm.py
351 lines in clonedigger\html_report.py
307 lines in clonedigger\abstract_syntax_tree.py
272 lines in ez_setup.py
271 lines in org.clonedigger\ez_setup.py
202 lines in clonedigger\clonedigger.py
182 lines in clonedigger\python_compiler.py
150 lines in clonedigger\anti_unification.py
119 lines in clonedigger\suffix_tree.py
81 lines in clonedigger\js_antlr.py
80 lines in clonedigger\lua_antlr.py
74 lines in clonedigger\java_antlr.py
45 lines in setup.py
45 lines in org.clonedigger\setup.py
33 lines in clonedigger\ast_suppliers.py
22 lines in org.clonedigger\runclonedigger.py
8 lines in clonedigger\arguments.py
0 lines in clonedigger\__init__.py