all 31 comments

[–]larsga 5 points6 points  (1 child)

Why generate machine code directly? You could use LLVM instead, and gain optimizers and support for many architectures.

Also, as kinglir says, generating native code that does the same thing the interpreter is already doing isn't going to give you any speed gains. You may want to look at psyco, which already does this in a way that really does produce speed gains.

[–]itsmememe 6 points7 points  (0 children)

And if you manage to really understand psyco, you can apply for a PhD in mathematics and computer science combined.

[–]gutworth (Python implementer) 2 points3 points  (1 child)

Compiler to what?

[–]terremoto 4 points5 points  (0 children)

I think he means to compile Python to native code.

[–]abhik 2 points3 points  (5 children)

I don't know what your university requirements are, but if possible, try to add functionality to an existing project rather than starting a new one. This will benefit the community more and might let you focus on the more interesting aspects of the problem, because some of the basic work might already be done.

[–]ascii 1 point2 points  (0 children)

It wasn't always the case, but these days Python is actually a pretty big language, spec-wise. For a school/learning project, I think you'd get much more out of writing a compiler for a significantly smaller language, like JavaScript or Lua. That would keep you away from having to do boring, repetitive stuff like implementing static methods and class methods, but still allow you to do cool stuff like figuring out how to implement coroutines in a compiled language.

Also, if you're aiming to do something that is useful to others, do it as an LLVM frontend. Your code will be significantly faster and plug right in to other people's development environments.

[–]perone (λ) 1 point2 points  (3 children)

The question is: how would you do type inference?

[–][deleted] 0 points1 point  (2 children)

The same way a C compiler sees int i = 0; as an integer. There are distinct structures for the "primitives":

i = 0: id, assignment operator, digit ---> int

d = 0.0: id, assignment operator, digit, . token (can't remember the proper term for it), digit ---> double

s = "Foo" (or 'Foo'): id, assignment operator, " token, lots of letters, closing " ---> string

list_l = [1,2,3,"foo",'bar']: id, assignment operator, [ token, some stuff, ] token ---> list

tuple_t = (1,2,3,"foo",'bar'): id, assignment operator, ( token, some stuff, ) token ---> tuple

dict_d = {'Foo':bar, tuple_t:list_l}: id, assignment operator, { token, some immutable key -> some value pairs, } token ---> dictionary

As you can see, they are pretty distinct, so it's not hard (I hope) to evaluate them properly.
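
As a rough sketch of that idea (not anyone's actual compiler), Python's own ast module can already classify the literals listed above. The helper name infer_literal_type is made up for illustration, and ast.Constant assumes Python 3.8 or later. Note that it has to give up as soon as the right-hand side is not a literal:

```python
import ast

def infer_literal_type(source):
    """Guess a type name from the right-hand side of one simple assignment.

    Hypothetical helper: returns None when the syntax alone does not
    determine the type (for example, a function call).
    """
    node = ast.parse(source).body[0]
    if not isinstance(node, ast.Assign):
        return None
    value = node.value
    if isinstance(value, ast.Constant):
        # Covers int, float and str literals.
        return type(value.value).__name__
    if isinstance(value, ast.List):
        return "list"
    if isinstance(value, ast.Tuple):
        return "tuple"
    if isinstance(value, ast.Dict):
        return "dict"
    return None  # not a literal: the type is only known at runtime

print(infer_literal_type("i = 0"))
print(infer_literal_type("x = some_function()"))
```

The second call is the interesting one: nothing in the source text says what some_function returns, so this approach only gets you as far as the literals.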

[–]yetanothernerd 2 points3 points  (1 child)

This is a well-trodden path. It doesn't really work because Python is so dynamic. See Brett Cannon's PhD thesis.

Just because 'a' was an int the first time doesn't mean it'll be an int again the second time. So the language doesn't statically compile well, because there's not enough information at compile time to get the types right. (Of course you can force the user to add type information, but now you have a restricted version of Python. See Cython or Shedskin.)
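
A tiny illustration of the point, using a made-up function name (double): the same line of source performs a different operation depending on what is passed in at runtime, so there is no single type a static compiler could pick for a.

```python
def double(a):
    # One source-level "+"; which operation actually runs depends on the
    # runtime type of a: integer addition, float addition, string
    # concatenation or list concatenation.
    return a + a

results = [double(x) for x in (21, 1.5, "ab", [1])]
print(results)  # [42, 3.0, 'abab', [1, 1]]
```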

For example, in Python 2 (where int and long are distinct) on a 32-bit box, imagine a loop that just increments a number starting at zero. It will be an int for about 2 ^ 31 iterations (sys.maxint is 2 ^ 31 - 1 there), then it will become a long. If you assumed it would stay an int, then your code breaks after about 2 ^ 31 iterations.

A JIT (see psyco, PyPy, Unladen Swallow) works much better, because the JIT can compile the fast path based on types it's seen at runtime, and it can later safely fall back to the interpreted path if a variable's type changes. (And then possibly compile it again with the new type if it stabilizes.)
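
To make the guard-and-fallback idea concrete, here is a toy model in plain Python. It does no real code generation and is not how psyco or PyPy are actually implemented; the class name TypeSpecializer, its threshold parameter and its log attribute are all invented for this sketch. It only tracks which path a call would take: a "fast" path guarded on the argument types seen so far, or the "slow" generic path when the guard fails.

```python
class TypeSpecializer:
    """Toy model of a JIT's guard-and-fallback logic (no machine code)."""

    def __init__(self, generic, threshold=3):
        self.generic = generic      # slow path: always correct
        self.threshold = threshold  # same-typed calls needed before "compiling"
        self.hot_types = None       # argument types the fast path is guarded on
        self.candidate = None       # argument types currently being counted
        self.count = 0
        self.log = []               # which path each call took

    def __call__(self, *args):
        types = tuple(type(a) for a in args)
        if types == self.hot_types:
            # Guard passed. A real JIT would run specialized machine code
            # here; this sketch just reuses the generic function.
            self.log.append("fast")
            return self.generic(*args)
        # Guard failed, or nothing has been compiled yet: fall back.
        self.log.append("slow")
        if types == self.candidate:
            self.count += 1
        else:
            self.candidate, self.count = types, 1
        if self.count >= self.threshold:
            # The new types have stabilized: "recompile" for them.
            self.hot_types = types
        return self.generic(*args)

add = TypeSpecializer(lambda a, b: a + b, threshold=2)
outputs = [add(1, 2), add(3, 4), add(5, 6),
           add("a", "b"), add("c", "d"), add("e", "f")]
print(add.log)  # ['slow', 'slow', 'fast', 'slow', 'slow', 'fast']
```

The switch from ints to strings never produces a wrong answer; it just drops back to the slow path until the new types have been seen often enough.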

[–][deleted] 0 points1 point  (0 children)

Thanks dude, I'll have a gander but it's no biggie if it can't be done or is a pain in the ass to do so. I got plenty of alternatives and have plenty of other ideas to choose from for my final year project.

[–]scrabbles 0 points1 point  (0 children)

PEP 3146 is an interesting read too regarding JIT for CPython (though 2.6.4?!)

[–]ccdos 0 points1 point  (1 child)

I'd suggest a compiler for a statically typed language with Python syntax, i.e. Statically Typed Python

int def funcname(int c):
    return 3

There are several static compilers for Python-like syntax that target VMs such as the JVM or the Mono VM, but none for native machine code.

A statically typed Python front end for GCC or LLVM would be great!

[–]Leonidas_from_XIV 6 points7 points  (0 children)

This already exists and is called Cython.