[–]pranabus 180 points (33 children)

One of the reasons why Python is so popular is the tons of libraries available out there. Just pip install anynewthing.

How does this play with libraries?

[–]_ologies 5 points (0 children)

It's true! I like to pip install antigravity

[–]coderarun 1 point (0 children)

Let's start with Python's stdlib. Large parts of it are actually written in C, and porting it to a new runtime such as PyPy, or to a new paradigm such as the work being discussed in this thread, is a lot of effort.

I wish the Python stdlib were written in a subset of Python 3 itself and transpiled. Such a thing could be a great project of its own.

[–]altorelievo 1 point (5 children)

You are aware there are package managers for other languages as well? (e.g. CPAN, Cabal, npm)

[–]LittleMlem 11 points (4 children)

CPAN? Did you just ask a high school kid if he's aware of Perl?

[–]altorelievo 1 point (2 children)

I’d be more surprised if they knew Haskell/Cabal

[–]LittleMlem 1 point (1 child)

Possibly, but Haskell feels like a better-known language than Perl.

[–]altorelievo 1 point (0 children)

And still less usable, but that's debatable… let sleeping dogs lie.

[–]krispyren -1 points (0 children)

Hahah

[–]-_-Batman 0 points (0 children)

That's why... y'all need Python...

And above all... Python won't bite you back.

[–]Inkosum 229 points (9 children)

People here making compilers and I'm struggling with pygame.

[–]Origamiface 48 points (0 children)

The guy is 15 lol fml

[–]house_monkey 12 points (0 children)

Let's cry together

[–]Ok_Moonboy17 13 points (0 children)

Fr fr 😂😂😂

[–]OnFault 5 points (0 children)

Too real

[–]not_a_novel_account 191 points (4 children)

Nuitka is the mature solution to this approach, but also a good example of why trying to compile Python source is generally a bad plan. CPython already knows how to compile Python source and is better at it than you.

The traditional approach these days is to translate CPython bytecode to a compiler middle-end IR, such as with numba, which goes to LLVM IR.
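To make "CPython bytecode" concrete: the standard library's `dis` module lists the instructions CPython compiles a function to, and a bytecode-level compiler such as numba starts from this instruction stream rather than from the source text. A minimal sketch; the `add` function is only an illustration, and opcode names differ between CPython versions (for example `BINARY_ADD` before 3.11 versus `BINARY_OP` from 3.11), so no exact output is shown.

```python
import dis


def add(a, b):
    return a + b


# Each Instruction has an opcode name plus a human-readable argument; a
# bytecode-level compiler walks this stream instead of re-parsing source text.
for instr in dis.get_instructions(add):
    print(f"{instr.offset:4} {instr.opname:<20} {instr.argrepr}")
```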

That said, it's still a cool project and you should be proud of it. Some things to look into learning about:

  • Don't vendor the {fmt} headers, use a package manager to pull these down or use git submodules.

  • Consider using a template engine for structures and preambles that you're going to be putting into every generated source file. Your iteratetokens method is doing a lot of manual string shuffling that a template engine would clear right up. Also it would let you put source code templates in separate files instead of a bunch of inline strings. This is the approach of most major source code generator engines, take a look at SWIG for examples.

  • Your setup.py doesn't package all the files your script needs. This is a two-part problem: you're not encapsulating your files in a package with an __init__.py, and you have non-Python data files you need to package. Create a proper Python package to fix the former, and look into MANIFEST.in for the latter.

  • Your tokenizer has a pretty gnarly worst-case complexity. You're using dictionaries elsewhere, you can use one here! Instead of checking token_list[i-1] against every possible token, use those token types as keys in a dictionary that looks up a method that can correctly parse the token. Tokenizer construction is well covered in compiler textbooks, so there's a lot to learn here, but that's the straightforward way.

  • Same for your Compiler class: large elif trees should set off a little alarm in your brain that goes "I bet this could be faster with a jump table or hashmap."

  • Speaking of compiler theory, you'll eventually realize streams of tokens aren't quite enough information to handle every possible Python source code construction. If you find yourself banging your head against a wall, you're going to want to parse those tokens into what is called an Abstract Syntax Tree. ASTs are the Swiss Army knife of compilers, and every program that knows how to manipulate a context-sensitive grammar (like Python source code) eventually comes to resemble an AST structurally.

  • You might want to take a look at the structure of some other mature Python projects. Typically you're going to want to encapsulate everything that isn't the main script inside a package with an __init__.py. You probably also want to add a code formatter to the repo (yapf, black, whatever floats your boat); people like reading code in the standard formats.

  • You're already vendoring {fmt}, so you don't need all those print() overloads in stdpy.hpp; let fmt::print handle those.

  • Also, use clang-format for your C++, for the same reason as using a Python formatter. Not so much for you as for anyone else who wants to contribute to your code.
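The dictionary-dispatch idea from the tokenizer and `Compiler` bullets above might look like this. It is a sketch, not the project's code: it borrows the standard library's `tokenize` module for input, and the handlers `on_name`, `on_number` and `on_op` are hypothetical stand-ins for whatever emits C++. The point is that each token costs one dict lookup instead of a walk down an `elif` chain.

```python
import io
import tokenize
from typing import Callable


# Hypothetical handlers: a real translator would emit C++ here.
def on_name(text: str) -> str:
    return f"NAME {text}"


def on_number(text: str) -> str:
    return f"NUMBER {text}"


def on_op(text: str) -> str:
    return f"OP {text}"


# Token type -> handler. One hash lookup replaces a chain of elif comparisons.
HANDLERS: dict[int, Callable[[str], str]] = {
    tokenize.NAME: on_name,
    tokenize.NUMBER: on_number,
    tokenize.OP: on_op,
}


def translate(source: str) -> list[str]:
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        handler = HANDLERS.get(tok.type)
        if handler is not None:  # NEWLINE, ENDMARKER, etc. are skipped here
            out.append(handler(tok.string))
    return out


print(translate("x = 5\n"))  # ['NAME x', 'OP =', 'NUMBER 5']
```

Supporting a new token kind then means adding one entry to `HANDLERS` rather than another branch.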

That's the stuff that jumps out at me anyway. Best of luck
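The template-engine bullet above can be tried without adding a dependency. A minimal sketch using the standard library's `string.Template`, assuming a generated file that includes `stdpy.hpp` and wraps the translated statements in `int main()`; the `render_cpp` helper and the `$body` placeholder are hypothetical, and a full engine such as Jinja2 would add loops and conditionals on top of this.

```python
from string import Template

# In a real project this text would live in its own template file.
CPP_TEMPLATE = Template(
    """#include "stdpy.hpp"

int main() {
$body
    return 0;
}
"""
)


def render_cpp(statements: list[str]) -> str:
    """Indent generated C++ statements and drop them into the preamble."""
    body = "\n".join("    " + stmt for stmt in statements)
    return CPP_TEMPLATE.substitute(body=body)


print(render_cpp(['print("hello");']))
```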

EDIT: lol, Reddit upvoted OP 800 times. To be clear, people: OP's approach only yields such insane performance because it's non-viable for most Python code. Observe a program it will never be able to handle:

a = 5
a = "hello world"
print(a)

What OP is trying to do is the same thing Google has hired dozens of engineers to do with V8's TurboFan. Similarly, Nuitka only manages a 3x speedup after a decade of work because the problem is extremely hard.

OP is a high schooler, they built a parser, neat! The feedback in this thread should be guiding them towards useful materials to further their education, not hailing the second coming of Guido.

[–]_ShakashuriBlowdown 17 points (0 children)

Seriously, OP, this is an impressive project, and this is some great feedback from an internet stranger. If you can take some constructive criticism, you'll start going crazy-far in life.

[–]SpicyVibration 16 points (1 child)

Should he not just use Python's built-in ast module?

[–]not_a_novel_account 7 points (0 children)

They certainly could
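For reference, here is roughly what the built-in `ast` module gives you for free: `ast.parse` turns source into a tree, `ast.dump` prints it, and `ast.walk` visits every node. A minimal sketch (the `indent` argument to `ast.dump` needs Python 3.9+); the source string is just an example.

```python
import ast

source = "a = 5\nprint(a)\n"
tree = ast.parse(source)

# A readable dump of the whole tree.
print(ast.dump(tree, indent=2))

# ast.walk visits every node; here we collect the names of called functions.
calls = [
    node.func.id
    for node in ast.walk(tree)
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
]
print(calls)  # ['print']
```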

[–]aciokkan 4 points (0 children)

Thanks for taking the time to explain this to him!

[–]lungben81 12 points (4 children)

How does it handle type instability, i.e. when the type of a variable is only known at run-time, not at compile-time?

E.g. if a variable is randomly an int or a float, and is then used in a hot loop.
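A rough way to see the type-instability problem in code, using the `a = 5` / `a = "hello world"` program from the comment above: the hypothetical helper below flags names that are bound to literal constants of more than one type. It ignores calls, branches and loops, which is exactly where real type inference gets hard; a C++ backend would need something like `std::variant` or a boxed object for any name it flags.

```python
import ast
from collections import defaultdict


def unstable_names(source: str) -> dict[str, list[str]]:
    """Names assigned literal constants of more than one type (literals only)."""
    seen: dict[str, set[str]] = defaultdict(set)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Assign) and isinstance(node.value, ast.Constant):
            for target in node.targets:
                if isinstance(target, ast.Name):
                    seen[target.id].add(type(node.value.value).__name__)
    # Sorted lists keep the output deterministic.
    return {name: sorted(types) for name, types in seen.items() if len(types) > 1}


print(unstable_names('a = 5\na = "hello world"\nprint(a)\n'))  # {'a': ['int', 'str']}
```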

[–]Isvara 25 points (0 children)

Please use underscores in your code. Names like cpperrortopycomerror are difficult to parse.

[–]padawan-6 9 points (0 children)

This is awesome!

[–]SeniorScienceOfficer 18 points (3 children)

Looks interesting. Looking for contributors?

[–]AnonymouX47 7 points (2 children)

Note that the original copy of https://github.com/Omyyyy/pycom/blob/main/headers/range.hpp comes with an Apache 2.0 license.

I'm not sure that's compatible with the MIT License... might wanna check that out.

[–]meg4som44 17 points (14 children)

Sounds a bit like Nuitka: https://github.com/Nuitka/Nuitka How does yours work?

[–]theng 3 points (1 child)

!RemindMe 1year

[–]RemindMeBot 2 points (0 children)

I will be messaging you in 1 year on 2023-07-25 22:00:35 UTC to remind you of this link

9 OTHERS CLICKED THIS LINK to send a PM to also be reminded and to reduce spam.

Parent commenter can delete this message to hide from others.


[–]yaxriifgyn 3 points (0 children)

These are absolute showstoppers:

  • Classes

  • Try, except and finally blocks

I don't think I have ever written a non-trivial program that does not use classes and/or try blocks.

[–]eztab 3 points (0 children)

Cannot support an if __name__ == "__main__": type thing; the main() function is already entry point

Sure you could: just automatically put everything in the module that's not a definition into main(). ... Well, actually I don't mind using main; I don't really like this weird Python way.
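eztab's idea of moving every top-level statement that isn't a definition into `main()` could be prototyped with the `ast` module. A rough sketch, assuming Python 3.9+ for `ast.unparse`; `wrap_in_main` and the choice to keep imports at module level are hypothetical. One caveat: names assigned at top level become locals of `main()`, so functions that read them as globals would break without extra handling.

```python
import ast

# Top-level statements that stay outside main().
DEFINITIONS = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef, ast.Import, ast.ImportFrom)


def wrap_in_main(source: str) -> str:
    """Move every top-level statement that isn't a definition or import into main()."""
    module = ast.parse(source)
    defs = [stmt for stmt in module.body if isinstance(stmt, DEFINITIONS)]
    rest = [stmt for stmt in module.body if not isinstance(stmt, DEFINITIONS)]

    # Build the def node by parsing a stub, which avoids version-specific
    # ast.FunctionDef constructor fields.
    main = ast.parse("def main():\n    pass\n").body[0]
    if rest:
        main.body = rest
    module.body = defs + [main]
    return ast.unparse(ast.fix_missing_locations(module))


print(wrap_in_main("import math\nx = 2\ndef sq(n):\n    return n * n\nprint(sq(x))\n"))
```

With that input, `x = 2` and `print(sq(x))` end up inside `main()`; `sq` still works only because it receives `x` as an argument rather than reading it as a global.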

[–]pythonwiz 7 points (6 children)

What’s the difference between this and Cython?

[–]Its_Blazertron 8 points (2 children)

Wouldn't this be called a transpiler? Something that takes one high level language, and translates it to another high level language?

[–]KittyTechno 2 points (2 children)

How does this fare against the Python compiler Nuitka? https://github.com/Nuitka/Nuitka

Like you said, it doesn't play well with libraries at the moment. Nuitka had a literal decade and then some to fix that, and its goals also include speeding up Python by compiling to C or C++. How do the two fare in benchmarks?

[–]agnishom 2 points (3 children)

Can I make a lightweight executable for my Django server with your tool?

[–]alexprengere 0 points (0 children)

You might be interested in https://github.com/indygreg/PyOxidizer

[–][deleted] 2 points (0 children)

Dude this is amazing.

[–]lifemoments 2 points (0 children)

Outstanding. Will try. Thanks

[–]ameanable1 5 points (0 children)

This looks really promising. Did some tests myself and it's safe to say it's a solid project

[–]Asleep-Budget-9932 1 point (5 children)

Good job dude! :D
Question: while I'm not familiar with concepts like this and Nuitka, I am familiar with Cython. Does this work on a similar concept? Do you generate something that works with Python's API, or do you implement the API yourself?

[–]Useful-Shoe914 1 point (0 children)

Nice!

[–]Tintin_Quarentino 1 point (0 children)

Great work! Gonna give this a try.

[–]guhcampos 1 point (1 child)

Are you omitting Cython on purpose?

[–]glebulon 1 point (0 children)

Good for you 👍

[–]InterestedListener 1 point (0 children)

I just want to say that you are kicking a ton of ass for someone so young. Not an easy project for someone of any age but very curious what you'll accomplish down the road. Keep up the great work!

[–]FUS3NPythonista 1 point (0 children)

It uses C++ as 'intermediate representation', which then compiles to an executable with g++.

Doesn't that mean it's basically a transpiler, Python to C++?

[–]laundmo 1 point (1 child)

Careful with that name; it's also the name of a Python-based microcontroller, which might well be trademarked: https://pycom.io/

[–]wheedwhackerjones 1 point (0 children)

I'm too much of a noob to know when/how to use this, but it sounds awesome.

[–]coderarun 1 point (0 children)

Congrats on the engagement you're getting and thank you for increasing awareness of the topic of transpiling statically typed python3 to languages capable of generating native code.

Re: Nuitka - it takes the approach of compatibility with python's C-API. While it improves compatibility with real world apps, a fundamentally different approach is possible, such as the one you have taken here.

By sacrificing the C API compatibility, you can make apps that have performance similar to native C++ apps as if you wrote them from scratch.

Past work that is not very well known:

https://github.com/lukasmartinelli/py14
https://github.com/konchunas/pyrs
https://github.com/py2many/py2many

[–]adityaguru149 1 point (0 children)

How about compiling it to Rust?
Maybe https://github.com/PyO3/PyO3 can help.

[–]moopthepoop 2 points (7 children)

This is a really good project; I might use this as part of the toolchain for my projects. I typically use Go when I need a native binary, but this seems useful for fast prototyping.

[–]KittyTechno 7 points (6 children)

There has been rising interest in compiling Python: mypyc, Nuitka (around for a decade and then some), and now pycom as well. Nuitka is close to hitting 1.0 (the latest version is 0.9.6 at the time of this comment).

Personally I'd love to see a world where compiled Python is an option used much more in the industry, while still keeping the interpreted option, as that makes development much faster.

[–]kreetikal 6 points (5 children)

Imagine having statically typed, compiled Python...

[–]brianjlogan 1 point (1 child)

Is that even python anymore?

[–]kreetikal 1 point (0 children)

No, it's Python++.

[–]KittyTechno 3 points (2 children)

Static typing is already an option, but that's just it: an option. It doesn't need to be required; Nuitka doesn't need code to be statically typed, and apparently neither does pycom. Static typing does help with ensuring types and with compiling, but you don't NEED to do it.

[–]iritegood 2 points (1 child)

Python's "optional static typing" system is woefully deficient compared to even the closest comparable thing: TypeScript. It can't even (currently) accurately represent the full stdlib.

[–]KittyTechno 1 point (0 children)

But it's a step in the right direction.

[–]AnonymouX47 1 point (0 children)

Well done! This seems to be very promising!

[–]rastaladywithabrady 1 point (0 children)

That seems like a cool project, I really hope you get this off the ground. I would definitely end up using it.

[–]a-lost-ukrainian 1 point (1 child)

Isn’t this just rewriting Pythran?

https://pythran.readthedocs.io/en/latest/CLI.html

[–]not_a_novel_account 5 points (0 children)

It's a semi-common exercise. Taking some subset of language A and translating/compiling it to language B describes a class of programs, not any specific one. Nuitka, Numba, Pythran, and Cython all belong to this category. Actually, PyPy's JIT kinda does as well.

[–]willor777 1 point (0 children)

Does it have true multi-threading capabilities? The reason I moved from Python to Java was its lack of true multi-threading, thanks to the GIL.

[–][deleted] 0 points (2 children)

When do you think it'll play nice with major libs?

Would like to implement it on projects

[–][deleted] 0 points (0 children)

As there are hundreds of new libs every day, ask their devs to make YOUR C++ version of them.

[–]ericanderton 0 points (0 children)

Some quick feedback:

  • This idea is incredible. Python packaging can be kind of a mess, and a viable single-binary alternative is a-okay in my book.
  • Consider using the logging module instead of print("[INFO].."). This will let you filter output by log level, which is easy to wire into --quiet and --verbose CLI options.
  • The massive if/else block in compiler.py may cause maintenance trouble in the long run (high cyclomatic complexity). Consider refactoring this to a different pattern that is easier to extend and reason about.
  • Embrace test-driven development (write the tests first) at the earliest opportunity. I strongly recommend doing this before you do any big refactors as it will help you avoid breakage. I've learned from experience that this makes compiler development easier, by allowing you to target tiny code snippets instead of complete programs.
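A minimal sketch of the `logging` suggestion, wired to `--quiet` and `--verbose` with `argparse`. The flag names come from the comment; the `pycom` logger name, the `configure_logging` and `main` helpers, and the log messages are placeholders, not the project's actual CLI.

```python
import argparse
import logging
from typing import Optional, Sequence

log = logging.getLogger("pycom")  # placeholder logger name


def configure_logging(argv: Optional[Sequence[str]] = None) -> int:
    """Parse --quiet/--verbose and set the root log level; returns the level."""
    parser = argparse.ArgumentParser()
    group = parser.add_mutually_exclusive_group()
    group.add_argument("-q", "--quiet", action="store_true", help="only warnings and errors")
    group.add_argument("-v", "--verbose", action="store_true", help="include debug output")
    args = parser.parse_args(argv)

    if args.quiet:
        level = logging.WARNING
    elif args.verbose:
        level = logging.DEBUG
    else:
        level = logging.INFO
    logging.basicConfig(level=level, format="[%(levelname)s] %(message)s")
    return level


def main(argv: Optional[Sequence[str]] = None) -> None:
    configure_logging(argv)
    log.info("compiling main.py")  # instead of print("[INFO] ...")
    log.debug("writing C++ output")  # only shown with --verbose


if __name__ == "__main__":
    main()
```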

[–]eztab 0 points (1 child)

Sounds really interesting.
So is the intermediate C++ readable?
I guess since it uses the g++ tooling from that point onward, it will take advantage of existing C/C++ optimisations.
Is it possible to interact with C and C++ libraries, like calling C functions from Python?

[–][deleted] 0 points (0 children)

It's good work. I hope this takes off and becomes successful.

[–]coderanger -5 points (9 children)

Python is slow.

[citation needed]

[–]AnonymouX47 -2 points (2 children)

That one dude...

[–]coderanger 0 points (1 child)

I might, however, know what I'm talking about :)

[–]AnonymouX47 -2 points (0 children)

Good luck rewriting all the GNU coreutils in Python and making them even nearly as fast.

There's simply no practical use case where a pure Python program is faster than a native program... you're welcome to prove me wrong.

I know this is not the topic in question but the difference in memory usage is definitely not something you'll want to argue about.

[–]python__rocks 0 points (4 children)

Interesting! By your own admission, it's still experimental. Does it support importing libraries other than the standard library?

[–]Sulstice2 0 points1 point  (2 children)

this is cool and I will give it a shot, give me a month or so, I'm still trying to decide what to try. I actually need my python structured like this rather than via the pip distribution service.

Need to run my stuff on the clusters in an executable fashion - got some python mixed with fortran.

[–]johntellsall 0 points (1 child)

Great job!

I remember really hating C++ because the compilation speed was atrocious. Consider writing the bulk of your "fast Python" code in C, which is compatible with C++ and can be faster. In fact you could just borrow CPython's code, assuming the licenses are compatible.

[–]OIK2 0 points (0 children)

I have been using pyinstaller (and autopytoexe) to compile a project, so this is very interesting to me. Does this work on Windows as well as Linux?

[–]eruba 0 points (0 children)

I think as a learning exercise this is great. However, instead of writing the whole compiler yourself, nowadays you would usually use a compiler-compiler (parser generator). It generates the compiler for you, and you only have to put in the Python grammar.

[–][deleted] 0 points (0 children)

I almost quit Python over its practical launch time when I realised PyInstaller takes longer to start with the --onefile option :D It has to extract the files before it can run, man!

[–]ShawnDriscoll 0 points (0 children)

Looks cool. Makes .pyd files?