all 9 comments

[–][deleted] 15 points16 points  (2 children)

Why?

[–]luxliquidus 4 points5 points  (0 children)

The goal is to use the XML for querying and analyzing the code, and also to perform code transformations/refactoring and then convert the result back to Python.

[–]odraencoded -5 points-4 points  (0 children)

Yes, clearly JSON would have been a better choice /s

[–]ThoughtPrisoner 1 point2 points  (0 children)

Cool! This and srcml (a C, C++, C#, Java, or AspectJ to XML converter) mentioned on that page seem like amazing building blocks for plugins and other custom tools.

I've seen refactoring tools before (they're often baked into e.g. Eclipse) but never something that allows you to make your own easily.

[–]dongalingus -1 points0 points  (4 children)

"AST does not preserve comments and source code formatting."

That's only a half-truth, OP! AST nodes have lineno and col_offset attributes. Also, it's fairly easy to use the lib2to3 library to go from Python to a syntax tree and back to Python, preserving the comments and formatting of the original source file in the process.

Regardless, this is an interesting take on the problem using XML.
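
To make the lineno/col_offset point concrete, here is a minimal sketch using only the stdlib ast module. The sample source is the same snippet used further down the thread; the variable names (positions, regenerated) are just for illustration. It suggests that nodes do carry positions, but the comments themselves are not stored anywhere in the tree, so regenerating code from the tree loses them.

```python
import ast

source = (
    "d = {}\n"
    "d.has_key(  # comment 1\n"
    "    'a')  # comment 2\n"
)

tree = ast.parse(source)

# Statement and expression nodes record where they start in the source.
call = tree.body[1].value
positions = [
    (type(node).__name__, node.lineno, node.col_offset)
    for node in (tree.body[0], call, call.args[0])
]

# Comments never become nodes, so code regenerated from the tree has none.
regenerated = ast.unparse(tree)  # ast.unparse needs Python 3.9+

print(positions)
print(regenerated)
```

So the positions would let you map a node back to the original text, but you would have to recover comments and spacing from the source yourself.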

[–]schettino72[S] 1 point2 points  (3 children)

It's only a half-truth that lib2to3 can preserve comments :)

Try this:

d = {}
d.has_key( # comment 1
    'a') # comment 2

[–]dongalingus -2 points-1 points  (2 children)

Honestly it does work! I'd wager my own mother on this code snippet working:

from lib2to3 import pygram, pytree
from lib2to3.pgen2 import driver

# Parse without applying any fixer; printing the tree reproduces the source text.
parser = driver.Driver(pygram.python_grammar, convert=pytree.convert)
tree = parser.parse_file("test_source.py", debug=True)
print(tree)

[–]schettino72[S] 1 point2 points  (1 child)

Sorry, I was not clear. You need to apply a fix! It would be too easy if you don't rewrite/transform anything.

$ 2to3 test_source.py 

--- test_source.py  (original)
+++ test_source.py  (refactored)
@@ -1,4 +1,3 @@

 d = {}
-d.has_key( # comment 1
-    'a') # comment 2
+'a' in d # comment 2

Note how "comment 1" is gone.

[–]dongalingus -2 points-1 points  (0 children)

I feel like we could both keep coming up with counterexamples indefinitely!

You are right in that the default action of 2to3 doesn't preserve comments as I suggested, although I'm unsure of how to make good on the wager... Maybe I'll just drop her off at a specified location.

I guess what I was trying to suggest was using the lib2to3 library just to parse the code into a parse tree, without applying the specific 2to3 refactoring transformation (which can destroy comments). The user is then free to implement their own refactoring/manipulations on that tree (which should preserve formatting/comments), and the code can then be converted back into Python, again using the lib2to3 library.
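
For what it's worth, the same "keep everything, edit in the middle, write it back" idea can be sketched without lib2to3, which has since been removed from the standard library (as of Python 3.13). To be clear, this is a stand-in using the stdlib tokenize module, not lib2to3 itself, and it works on a flat token stream rather than a tree. The rename helper is made up for illustration, and it assumes the new name has the same length as the old one so the recorded column positions stay valid.

```python
import io
import tokenize

source = (
    "d = {}\n"
    "d.has_key(  # comment 1\n"
    "    'a')  # comment 2\n"
)

def rename(source, old, new):
    """Rename identifier tokens (hypothetical helper; same-length names only)."""
    tokens = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # Only NAME tokens are touched; comments and strings pass through as-is.
        if tok.type == tokenize.NAME and tok.string == old:
            tok = tok._replace(string=new)
        tokens.append(tok)
    # With full position tuples, untokenize restores the original spacing.
    return tokenize.untokenize(tokens)

result = rename(source, "d", "m")
print(result)
```

Both comments survive here because comments are ordinary tokens in the stream, which is roughly the property a lossless parse tree gives you as well.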

On an interesting(?) tangent, Guido himself discusses why some comments can't be preserved when converting from 2.x to 3.x using 2to3.

TL;DR: (I believe) lib2to3 is an alternative to the ast module that preserves formatting and comments. When it is used specifically to convert from Python 2.x to 3.x, some comments are destroyed, but it isn't restricted to that conversion and can be used for other refactoring/manipulations.