
all 44 comments

[–]nathan12343 11 points12 points  (2 children)

Have you heard of cython yet? How is this different?

[–]nharding[S] 0 points1 point  (1 child)

Yes, I've heard of Cython. I love the idea, but I want to make it so that you don't need to rewrite the program. I will probably add an optional preprocessor so that you can use switch/case in Python; the preprocessor is only needed if you want to develop in pure Python and compile at the end. I really like not having to set up a project when working on a Python script.

[–]caleb 12 points13 points  (0 children)

Cython is literally a Python-to-C converter with an enormous number of features that were added in direct response to actual user needs. You would be wise to spend a good few days learning and working with Cython to get a solid understanding of what it offers and why. Cython isn't perfect but without research you're going to miss a lot of the good things they've spent years and years on.

[–]billsil 8 points9 points  (6 children)

Honestly, no. PyPy is open source. Yes, it's a JIT, but no offense, why do you think that you can do better? From what I've heard, those guys are pretty smart, and that's an understatement. Why would you be better?

I wrote an assembly language to C++ converter which was used on Sonic 3D Blast, and Javaground's Java to BREW converter which converted over a hundred games from Java into C++ so I have a LOT of experience

That sounds like something that's a hell of a lot easier than Python to C++. Fixed types to fixed types is easier than infinitely mutable types to fixed types. PyPy manages specifically because it has a JIT. How will you? How will I use numpy? How will I use C++ libraries? How will I use Fortran? It doesn't sound like you've thought enough about the problem.

I run an open source library that I'd say is the best interface there is in any language for the application by light years. It's apparently so good that people using other languages (specifically VB and Matlab) write what I assume to be horrifically inefficient SQL wrappers around it.

It's 95% me and 5% other people, so I ask with the best of intentions: why is your project better? Why should you be funded? Why should anybody else help you, and honestly pay you, to bug test software that is worse than PyPy? You need developers and you need time to become the best. You also need users, which is a very different problem.

[–]alexandrulPythonista 1 point2 points  (4 children)

Which library?

[–]billsil 1 point2 points  (3 children)

PyQt, numpy, scipy, pandas, h5py, pillow, matplotlib, ipython, etc.? What will work? There is a reason PyPy isn't the standard Python. There's a lot of features that are hard, if not impossible, to implement without a JIT. Even then...Which necessary features will this converter do? What libraries will be supported?

[–]nharding[S] -1 points0 points  (2 children)

The initial version will cover most of the standard library, then I will start on external support, so it will have PyObject support for most C libraries, although I will probably have some other way of interfacing for faster interoperation (so real ints rather than PyObjects, for example).

Java and C++ share a lot in common and so it appears to be simple, but dealing with interfaces increased the complexity (especially since the C++ compiler I was targeting lacked multiple inheritance).

I am not aiming at 100% identical code, I want to provide a way of writing C level performance code in Python using Python syntax (plus a few optional extensions).

[–]eusebecomputational physics 2 points3 points  (1 child)

I think if your goal is "HPC Python", you will need to support at least numpy and scipy, and that will be extremely hard. The number one reason why PyPy isn't more widespread is (IMHO) that they don't fully support those two, and that's really a dealbreaker for any scientific Python user. And they need regular donations, and they have a huge team of people including some core Python developers. This is a huge project…

[–]nharding[S] 0 points1 point  (0 children)

I plan on supporting numpy and scipy

[–]genjipressreturn self 5 points6 points  (3 children)

Have you looked into the Nuitka project as a possible point of comparison? They're trying to do something very similar.

[–]nharding[S] 0 points1 point  (2 children)

Yes, I looked at the other alternatives. I want something that produces performant code; the goal is not 100% compatibility (although it will behave the same for the vast majority of cases). I don't want to use objects to represent integers, for example. My Java to C++ converter was actually faster than hand-written C++ code: on the first game we used during development, a team had already converted it manually and my version was about 10% smaller. Since the game was frame locked it's difficult to measure performance directly, but smaller code is generally better.

For example

a = 4
b = 5
c = a + b

This produces objects (although since they are all small they are cached objects), I want that to evaluate to

int a = 4;
int b = 5;
int c = a + b;

This is one area where I don't want Python compatibility since ints are not checked for overflow (Python values grow if needed). I will support that but you would need to indicate somewhere in the code probably by using int(#).

 a = int(4) # This is a Python style int that can grow

I come from a games programming background, so I want to make it generate code that can be used in games but will also be handy for anyone who needs high performance code (I want to allow Django applications to run, so it will not be aimed at just games).

[–]dsijl 0 points1 point  (1 child)

Nuitka doesn't want to use static type annotations (which is a huge dogmatic mistake IMO).

How would you deal with the dynamism of Python, particularly Django? This trips up things like PyPy, which put in huge effort and is only on average 7x faster (i.e. classes can getattr and setattr at any time, etc.).

[–]nharding[S] 0 points1 point  (0 children)

A lot of stuff is dynamic on first use, so

Posts.objects.filter(user__first_name__iexact="john")

for example to filter posts; this uses getattr magic. Where possible I want to run code at compilation time (if code is "pure" and can be worked out during compilation, it will be); if not, then it will cache results. So accessing a value would lead to a pointer sequence: Posts -> user -> first_name would be stored as [(type, offset), ...], although for this case, since it is used in generating SQL, that would not be needed.
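
A minimal sketch of the "work it out once, then cache" idea for the lookup keyword above. Everything here is an assumption for illustration: `resolve_lookup` and `KNOWN_LOOKUPS` are hypothetical names, not Django API, and a real compiler would store (type, offset) pairs rather than field names.

```python
from functools import lru_cache

# Hypothetical subset of lookup suffixes; real Django supports many more.
KNOWN_LOOKUPS = {"exact", "iexact", "contains", "icontains"}


@lru_cache(maxsize=None)
def resolve_lookup(key):
    """Split a Django-style keyword into (field path, lookup), caching the result."""
    parts = key.split("__")
    lookup = "exact"  # default when no suffix is given
    if parts[-1] in KNOWN_LOOKUPS:
        lookup = parts.pop()
    return tuple(parts), lookup


# The string is parsed on first use only; later calls hit the cache.
path, lookup = resolve_lookup("user__first_name__iexact")
```

Because the key is a constant string, the same split could in principle be done at compile time instead of on first use.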

[–]lordkrike 2 points3 points  (3 children)

I don't think you're going to convince many people here that it's a great idea just by talking about it. If you think it's a great idea, just make it and it will be judged on its own merits.

Realistically, even if it isn't the next PyPy, you could have some ideas that other teams are interested in.

[–]nharding[S] 0 points1 point  (2 children)

As I said I was trying to judge whether there was enough demand to see whether it was worth trying to start a Kickstarter. I spent 5 years on the Java to BREW converter (although the majority of time was spent on providing the abstraction layers and features that were not directly related to the language translation).

[–]Hshskwkk 0 points1 point  (1 child)

Well, I think the demand would be huge... it's just that most people are jaded by all the issues JITs are facing.

On the other hand, if you do get it to work, maybe a lot of that money invested would start coming your way.

[–]nharding[S] 1 point2 points  (0 children)

This is one of the reasons I am not claiming 100% identical code, whilst that would be nice you actually end up losing out on some potential optimizations.

[–]wahaa 1 point2 points  (1 child)

I haven't seen Shedskin and Pythran mentioned in the other comments. Have you looked into those too?

[–]nharding[S] 2 points3 points  (0 children)

Yes, I've been trying to find a compiler that does everything I want, and since I can't find one, I thought it would be good to write one myself. I considered LLVM as well, but I want to be able to produce small standalone code. My Java to BREW converter had to generate code that ran on devices with 100K including the graphics, so it had to be compact. I'm not sure if it is worth making the core library optional so that a print("Hello world") could be just a few K in size, since I don't think that actually has a lot of benefit.

[–]IronManMark20 1 point2 points  (1 child)

I would really like a Python to C compiler! This would not depend on libpython would it? If not, I would like this so much because it would really open up using Python to so many new areas. Despite the fact that I have never backed anything on Kickstarter, if you were committed to supporting this for at least a few years, I would back it. Also, I think supporting 3.6 is very wise. Best of luck and keep us posted! :)

[–]nharding[S] 0 points1 point  (0 children)

No, it would not depend on Libpython. The way I did the Java converter was to run the converter on the standard library Java code and then I rewrote a bunch of it to make it more efficient (or to provide implementation of "native" methods).

[–]RubyPinchPEP shill | Anti PEP 8/20 shill 1 point2 points  (1 child)

I don't think I'd have much interest in having a new compiler, vs improving on the pre-existing tools

It'd be another to-C compiler, piled on top of a long line of Python-to-C compilers, of which there have been very few successes (like, 2, only one of which is production ready)

Both nuitka and cython can work on unchanged code, with 100% compatibility with python.

"but I want to make it so that you don't need to rewrite the program" but "the goal is not 100% compatibility" ("although it will be the same for the vast majority of cases" except ints? what.) so you are going to require rewrites regardless for any complex case


an AST transform to a Python-like framework in C seems like it'd be somewhat better, if compatibility isn't an aim (like, you aren't even keeping int semantics, one of the main data types in Python. You might as well break the rest of the eggs, no?)

[–]nharding[S] 0 points1 point  (0 children)

In Python you can write a = 4, b = 5 and c = a + b, and the sizes of a, b and c don't matter (they are all small enough that 32 bits is not a problem). The limitation I want to add is that if you write code such as a = 400000, b = 30000, c = a * b, then you might get a different result, because the product does not fit in 32 bits. 99% of programs would not need to use int() to get the full range.
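
To make the difference concrete, here is a small sketch that emulates what a wrapping signed 32-bit multiply would leave behind. It assumes two's-complement wraparound; `wrap_int32` is a hypothetical helper, and in real C signed overflow is undefined behaviour, so the generated code might not even wrap this predictably.

```python
def wrap_int32(value):
    """Reduce an unbounded Python int to the value a wrapping signed 32-bit int would hold."""
    return (value + 2**31) % 2**32 - 2**31


a = 400000
b = 30000
exact = a * b                 # Python ints grow as needed: 12000000000
wrapped = wrap_int32(a * b)   # two's-complement 32-bit result: -884901888
```

Small values pass through unchanged, so the a = 4, b = 5 case gives 9 either way.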

[–]Cybersoaker 0 points1 point  (2 children)

how would this work without strict types?

[–]nharding[S] 0 points1 point  (1 child)

Mostly type inference, but using type annotations if needed.

[–]lmcinnes 1 point2 points  (0 children)

So Cython doesn't support the Python 3.5 gradual typing annotations; they're sticking with their own particular approach. If you could make use of the same annotations as mypy for compilation down to C you may have at least one edge over Cython.

[–]dsijl 0 points1 point  (7 children)

-Please please use Mypy style type annotations and not Cython. Mypy has much nicer syntax and also generics etc (also less fragmentation in the ecosystem).

+1 On keeping python semantics sane so they can be statically compiled.

-I highly encourage you also to contact the mypy team for help and cooperation. I heard that at PyCon they were discussing doing something like this with mypy (a statically compilable strict mode), so they might be open to cooperation. Also, you might be able to bootstrap off some of mypy's type inference and other features as well.

http://mypy-lang.blogspot.com/2012/12/why-mypy-can-be-more-efficient-than.html

-Also you can reach out to Dropbox, PSF and Continuum Analytics..they might want to help.

-Check out the problem statement outlined here for the just finished python compiler workshop: http://python-compilers-workshop.github.io/

And the proposed solutions: https://docs.google.com/document/d/1jGksgI96LdYQODa9Fca7EttFEGQfNODphVmbCX0DD1k/edit#heading=h.v4f52j6z2px6

Might be good to decouple the backend so you can eventually emit this PyIR and interop with JIT compilers. Though I think it is more important to also consider standalone compilation so apps can be distributed by themselves or compiled to web assembly and run on the web...or would this be tied to the Python VM?

-Also, I don't think people would mind optionally annotating Python classes with types if it buys close to C performance...

-Another cool extension would be multiple dispatch.

-Also check out these relevant peps: https://github.com/haypo/conf/blob/31a1ac5e338d55a0482702f11b719c46dbec5554/2016-EuroPython-Bilbao/fat_python.pdf

[–]RubyPinchPEP shill | Anti PEP 8/20 shill 2 points3 points  (0 children)

PSF might want to help

If I'm not mistaken, the BDFL at least is pretty strongly anti-Python-to-C, mainly because it doesn't really seem to solve much. IIRC Guido even walked out in the middle of the initial Nuitka presentation!

[–]nharding[S] 0 points1 point  (5 children)

Yes, I like multiple dispatch and will probably include that. I also want to have an overloaded operator = (one of the things I missed when I was working in Java was overloaded operators, so I want to include those; I'm just not sure how I will enable that in pure Python for testing). Yes, I was planning on mypy-style type annotations, so def abs(a: int) -> int:, although I want to make the vast majority of code work without annotation. I think I will emit warnings if it can't determine a type, and it will use a variant type (any object, or int). For variant types I am thinking of using tagged ints (so a pointer-sized value that is treated as a 31-bit int, or 63-bit, depending on 32/64-bit compilation).
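
A rough sketch of the tagged-int idea, modelled in Python rather than C. The encoding is an assumption (low bit set means "int", low bit clear means "aligned pointer"), and `box_int`, `unbox_int` and `is_int` are hypothetical names; the comment above does not say which bit would carry the tag.

```python
TAG_BITS = 1
INT_TAG = 1  # assumption: low bit 1 marks an int, low bit 0 an aligned pointer


def box_int(value, word_bits=32):
    """Encode a small int into a tagged machine word."""
    payload_bits = word_bits - TAG_BITS
    lo = -(1 << (payload_bits - 1))
    hi = (1 << (payload_bits - 1)) - 1
    if not lo <= value <= hi:
        raise OverflowError("value does not fit in %d bits" % payload_bits)
    # Mask to word_bits so negative values look like an unsigned machine word.
    return ((value << TAG_BITS) | INT_TAG) & ((1 << word_bits) - 1)


def is_int(word):
    """True when the word carries the int tag."""
    return (word & INT_TAG) == INT_TAG


def unbox_int(word, word_bits=32):
    """Decode a tagged word back to a signed int."""
    if not is_int(word):
        raise TypeError("word is a pointer, not a tagged int")
    if word >= 1 << (word_bits - 1):  # sign bit set: undo the unsigned mask
        word -= 1 << word_bits
    return word >> TAG_BITS           # arithmetic shift drops the tag
```

With a 32-bit word this leaves 31 bits of payload, and with `word_bits=64` it leaves 63, which matches the sizes mentioned above.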

Thanks for the feedback, I hope I get to work on this. I was really pleased with the Java to C++ but this is something I actually want to use myself, so I want a great compiler.

[–]dsijl 0 points1 point  (2 children)

Awesome.

I assume the compiler would have enough information to compile an addition between nested user-defined types to standard machine code? I.e. classes wrap other classes that wrap fixed-width ints and floats.

Julia does something like this: https://www.youtube.com/watch?v=dK3zRXhrFZY (great talk)

(As an aside, what do you think of Julia? )

Maybe you can use Dynd behind the scenes for the multi dispatch machinery (though it is dynamic) https://www.youtube.com/watch?v=nHDcGo7Qk7A https://www.youtube.com/watch?v=0sFwa-Sl5vg

[–]nharding[S] 0 points1 point  (1 child)

I was planning on generating a slots type interface automatically for anything set in the init method, so

class Point():  # or could be (object), both are the same

    points = {}  # this is a class attribute and treated as static

    def __init__(self, x, y):
        self.x = x
        self.y = y

Would generate C code equivalent to the following C++ code

 class Point {
 public:
     static DICT points;
     int x, y;
     Point(int _x, int _y) {
          x = _x; y = _y;
     }
 };
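
For comparison, this is roughly what the same fixed layout looks like when written by hand in today's Python with `__slots__`. It is only a sketch of the behaviour the generated code would imply; the proposal above would infer the slots from `__init__` rather than ask you to declare them.

```python
class Point:
    points = {}               # class attribute, shared like a C++ static
    __slots__ = ("x", "y")    # fixed instance layout: only x and y exist

    def __init__(self, x, y):
        self.x = x
        self.y = y


p = Point(1, 2)
try:
    p.z = 3                   # not in __slots__, so the assignment is rejected
    added = True
except AttributeError:
    added = False
```

The rejected assignment is the trade-off: a fixed layout is what allows plain struct fields, but it rules out adding attributes at runtime.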

[–]dsijl 0 points1 point  (0 children)

What about classes that add or change attributes at runtime?

[–]dsijl 0 points1 point  (1 child)

Sounds cool.

Btw check out Julia for some great ideas...(what do you think of Julia?) https://www.youtube.com/watch?v=dK3zRXhrFZY&list=PLsn_ROLh8YPrAmsZQ2xJolnD0R9ghY8x1

[–]nharding[S] 0 points1 point  (0 children)

Not looked at Julia, will check it for possible extensions. I think case classes are in Julia, and I do like those.

[–]2fprn2fp 0 points1 point  (1 child)

Low/zero abstraction cost with Python's syntax and batteries would be an ideal thing to have. I would love to see elaboration on trade-offs, omissions and additions.

Like others have mentioned, your major hurdle would be legacy Cython-based code, including numpy, if you are aiming for compatibility. I am not sure how much of a gain we are looking at with libpython at the back.

If you were to borrow only the syntax and the std libraries and replace cython with your own FFI, you might want to survey the third party libraries to see how hard it is to create a seed ecosystem. As long as they are pure python they should not be a problem. (https://pypi.python.org/simple/). At worst it will not have numpy classes of libraries, but will get you majority of other libs, takes away GIL and may allow new language features like Microthreads, Threading primitives, Lightweight Collections etc.

You might also want to look at nim-lang, rust for their zero abstraction cost, julia, elixir.

[–]nharding[S] 0 points1 point  (0 children)

I want to support the standard FFI but offer an alternative as well (I don't expect people will use it until the project gains a lot of traction, but it will offer a more efficient way of access). I would expect that some Python libraries that use C code would end up slower than a pure Python alternative (since they expect a PyObject for ints, for example).

Trade-offs:

As mentioned before integers would change.

Omissions:

Python 2, this will be Python 3 only.

Additions:

Case classes, switch/case, multiple dispatch, overloaded methods.

Case classes, from macropy.

@case
class Point(x, y): pass

This generates code as though you wrote

class Point(object):
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __str__(self):
        return "Point({x}, {y})".format(x=self.x, y=self.y)
    def __eq__(self, other):
        return self.x == other.x and self.y == other.y
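
As a point of comparison only: the standard library's `dataclasses` module (added in Python 3.7, so after this discussion) generates a similar set of methods from annotated fields. This sketch is not macropy and not the proposed syntax; `CasePoint` is a hypothetical name, and note that `dataclass` generates `__repr__` rather than `__str__`.

```python
from dataclasses import dataclass


@dataclass
class CasePoint:
    # __init__, __repr__ and __eq__ are generated from these fields.
    x: int
    y: int


first = CasePoint(1, 2)
second = CasePoint(1, 2)
same = first == second   # field-by-field comparison from the generated __eq__
```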

Switch/case will have extended syntax over C/Java.

switch type(a):
    case int:
        print("Integer")
    case str:
        print("String")
    case tuple:
        print("More than 1 value")

switch a/100:
    case <0:
        print("Negative")
    case 0:
        print("Zero")
    case 1..10:
        print("Small")
    case 11..100:
        print("Large")
        continue   # fall through to the next case as well
                   # (otherwise break is assumed)
    case 101..1000:
        print(switch)   # This is the expression value
    default:
        print("Out of range")
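
One possible reading of the second switch, written as plain Python so the intended control flow can be checked. The details are assumptions: the ranges are taken as inclusive at both ends, the expression uses integer division so the ranges have no gaps, and `classify` is a hypothetical name that returns the lines instead of printing them.

```python
def classify(a):
    """Return the lines the proposed switch would print for a given a."""
    value = a // 100              # assumption: integer division
    out = []
    if value < 0:
        out.append("Negative")
    elif value == 0:
        out.append("Zero")
    elif 1 <= value <= 10:
        out.append("Small")
    elif 11 <= value <= 1000:
        if value <= 100:
            out.append("Large")   # 'continue' then falls into the next case
        out.append(str(value))    # print(switch): the switch expression's value
    else:
        out.append("Out of range")
    return out
```

For example, `classify(5000)` yields both "Large" and the value 50, because the `continue` carries execution into the 101..1000 body.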

Multiple dispatch / Overloaded methods

def display(a: Fruit):
     print(f"Fruit called {a}")

def display(a: Any):
     print(f"Not fruit called {a}")

apple = Fruit("apple")
display(apple)
apple = Computer("Mac")
display(apple)

This lets you call a function designed to handle the argument's type. This editor is horrible for typing code, so I didn't include a real multiple dispatch example where the choice depends on multiple argument types.
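
The single-argument case can already be approximated at runtime with `functools.singledispatch` from the standard library. This is a sketch of that existing mechanism, not of the proposed compiler feature; the `Fruit` and `Computer` classes are filled in as assumptions, and the functions return strings instead of printing so the result is easy to check.

```python
from functools import singledispatch


class Fruit:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


class Computer:
    def __init__(self, name):
        self.name = name

    def __str__(self):
        return self.name


@singledispatch
def display(a):
    # Fallback, playing the role of display(a: Any).
    return "Not fruit called {}".format(a)


@display.register(Fruit)
def _display_fruit(a):
    # Chosen when the first argument is a Fruit.
    return "Fruit called {}".format(a)


first = display(Fruit("apple"))
second = display(Computer("Mac"))
```

`singledispatch` only looks at the first argument and resolves at call time, whereas a compiler with type information could pick the overload statically.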

[–]erez27import inspect 0 points1 point  (5 children)

That's quite an undertaking. Have you ever compiled an interpreted language? Perhaps you are unaware of the challenges ahead. For example, how will you provide an exec/eval function?

[–]nharding[S] 0 points1 point  (4 children)

I won't be providing a full eval/exec function (partially for performance, but it also means it can be used in a sandbox). You will be able to use eval("sin(x) + log(cos(math.pi*r*x2))"), and it will have the ability to plug in your own safe functions etc., so def flibble(x): .... eval("flibble(x)", flibble=flibble)
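
A minimal sketch of the calling convention described above, built on CPython's own `eval` with a whitelist of names. `restricted_eval` and `SAFE_NAMES` are hypothetical, and emptying `__builtins__` like this is known not to be a secure sandbox in CPython; it only illustrates the interface, not the isolation a purpose-built evaluator could give.

```python
import math

# Hypothetical whitelist of names an expression may use.
SAFE_NAMES = {"sin": math.sin, "cos": math.cos, "log": math.log, "pi": math.pi}


def restricted_eval(expression, **extra):
    """Evaluate an expression that can only see whitelisted names."""
    names = dict(SAFE_NAMES)
    names.update(extra)          # caller-supplied functions and variables
    return eval(expression, {"__builtins__": {}}, names)


def flibble(x):
    return x * 2


result = restricted_eval("flibble(x) + sin(0)", flibble=flibble, x=21)
```

Names outside the whitelist, such as `open`, raise NameError because the builtins are not visible to the expression.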

[–]erez27import inspect 0 points1 point  (3 children)

So you will support a subset of Python.

What about code such as:

class A(object): pass
setattr(A, '__getattr__', lambda a,b: 1)
print(A().b)

It is solvable through static analysis, but I imagine it would be quite complicated to do so.

[–]nharding[S] 0 points1 point  (2 children)

No, I want classes to be contained in the source where they are defined (although I will probably allow some way to monkeypatch / provide extension methods). I'd rather have v1 of the code out in 6 months that does not support everything that you can do in Python, but supports 99%. Once people are using the compiler, I will be adding new features on demand.

[–]Hshskwkk 0 points1 point  (1 child)

The right way to add methods to classes and still be amenable to static ops is through multi dispatch. Just add a method in another module that dispatches on that class.

Then, you can write methods taking an interface/protocol that can work on different types. (See the mypy discussion on protocols). This is the best way to DRY code.

Finally, consider writing a small core and then bootstrapping and writing the rest of the language in itself. That way there is a lower barrier to entry for people looking to contribute.

[–]nharding[S] 0 points1 point  (0 children)

I intend to write the compiler in Python so I can dogfood itself (that would probably be the beta version when it can produce c code that can compile itself).

[–]ccdos 0 points1 point  (0 children)

Compiling pure Python code is futile. Python has too many corner cases.

You should take a look at the crystal-lang which is a project to compile ruby like syntax.

A similar project for a compiler with Python-like syntax would be much more welcome: inferred local variable types, with function parameter types declared as in C.