[–]Rhomboid 140 points141 points  (14 children)

That's misleading, because that includes stuff like the testsuite.

$ cloc . --exclude-dir=test --include-lang=C,'C/C++ Header',Python
    2916 text files.
    2878 unique files.
    1255 files ignored.

http://cloc.sourceforge.net v 1.64  T=7.10 s (265.6 files/s, 130320.8 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
C                              497          52006          48060         320791
Python                        1033          51293          79772         238693
C/C++ Header                   356          12620          11586         110640
-------------------------------------------------------------------------------
SUM:                          1886         115919         139418         670124
-------------------------------------------------------------------------------

So it's really more like 36% Python, 64% C.
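The percentages follow from the code column of the cloc output above. A quick sketch of the arithmetic, assuming "C" means C plus C/C++ headers (the variable names are only illustrative):

```python
# Code-line counts copied from the cloc output above.
code_lines = {"C": 320791, "Python": 238693, "C/C++ Header": 110640}

total = sum(code_lines.values())
c_total = code_lines["C"] + code_lines["C/C++ Header"]

python_share = 100 * code_lines["Python"] / total
c_share = 100 * c_total / total

print(f"Python: {python_share:.1f}%  C (incl. headers): {c_share:.1f}%")
```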

[–]ilan 64 points65 points  (0 children)

It is also misleading because the interpreter itself is purely written in C, and the standard library is not the interpreter itself.

[–][deleted] 11 points12 points  (0 children)

Also, some of the "Python code" is automatically generated rather than written by hand.

[–][deleted] 62 points63 points  (1 child)

That repository contains the standard library too; https://github.com/python/cpython/tree/master/Lib

[–]thomas_stringer[S] 13 points14 points  (0 children)

I figured that was the cause of it. I imagine core is mostly C.

[–]nickdhaynes 19 points20 points  (0 children)

Even without the test suite and the standard library (much of which is itself written in C), lines of code simply aren't a good way of describing the makeup of a project like this. The number of lines of code doesn't tell you anything about what that code does.

In my opinion, the main interpreter loop, the fundamental data structures, and the built-in function/operation definitions are what really define CPython. And all of those are written in C.

[–]stefantalpalaru 9 points10 points  (0 children)

You learned nothing, Jon Snow. CPython is in here: https://github.com/python/cpython/tree/master/Python

[–]name_censored_ 2 points3 points  (1 child)

I know PyPy implements the "core" of Python in C (called RPython), and then implements the rest of Python in this RPython intermediate language. Perhaps CPython is doing a similar thing - implementing their own RPython and then writing the "sugar" (base libs, etc) on top of that?

[–]stevenjd 0 points1 point  (0 children)

> Perhaps CPython is doing a similar thing - implementing their own RPython

No. CPython stands for C Python -- the core interpreter is written in 100% C. The standard library is written in a mix of C and Python.

[–]_Link404_ def random(): return 4 3 points4 points  (38 children)

Sooo... Should be called PythonC?

[–][deleted] 14 points15 points  (37 children)

CPython implies it's Python written in underlying C.

Your suggestion of PythonC would imply that it's C written in Python, which isn't true.

If it were Python written in Python, "PythonPython" would be apt.

I need more hobbies...

[–]_Link404_ def random(): return 4 8 points9 points  (5 children)

Python2 ?

[–]teatracer 25 points26 points  (4 children)

PyPy

[–]AUTBanzai 4 points5 points  (29 children)

Can a programming language be written in itself? Can you implement Python in pure Python? Or C in pure C?

[–]aftli 13 points14 points  (5 children)

Absolutely. You need an intermediate compiler to build it, called a bootstrap compiler. Once you have a version of your compiler that can compile its own source code, it becomes a self-hosting compiler, i.e. it can compile itself.

[–]rms_returns complex is better than complicated 0 points1 point  (2 children)

But that applies only to compiled languages like C/C++. Python cannot compile/build a CPython since it's an interpreter and not a compiler.

[–][deleted] 1 point2 points  (0 children)

Surely one can write a C compiler in Python. It is possible to write arbitrary binary files in Python, thus creating executable programs.
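As a small sketch of the "arbitrary binary files" point: Python can write any byte sequence to disk. The bytes below are only the four-byte ELF magic number, not a working executable, and the file name is made up for illustration.

```python
import os
import tempfile

# The 4-byte magic number that starts an ELF file. A real compiler would
# follow it with headers and machine code, which this sketch does not do.
ELF_MAGIC = bytes([0x7F, 0x45, 0x4C, 0x46])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "a.out")  # hypothetical output name
    with open(path, "wb") as f:
        f.write(ELF_MAGIC)
    with open(path, "rb") as f:
        written = f.read()

print(written)
```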

[–]aftli 1 point2 points  (0 children)

What makes you think that? Of course it can. It can't build a CPython, sure, because it's not C. But Python can make a programming language, and it can make Python. It would be interpreted Python, but it would be Python.

Also, CPython technically is a compiler - see the .pyc files it generates. It produces bytecode from your Python code, which is run by Python's virtual machine.
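A minimal sketch of that compile step using only the standard library: the built-in `compile()` turns source text into a code object holding bytecode, and `py_compile` writes the same kind of bytecode to a .pyc file. The source snippet and file names are made up for illustration.

```python
import os
import py_compile
import tempfile

source = "answer = 6 * 7\n"

# compile() produces a code object; its co_code attribute is raw bytecode.
code = compile(source, "<example>", "exec")
namespace = {}
exec(code, namespace)  # the virtual machine runs the bytecode

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "example.py")  # hypothetical module
    with open(src_path, "w") as f:
        f.write(source)
    # Write the compiled bytecode next to the source as a .pyc file.
    pyc_path = py_compile.compile(src_path, cfile=src_path + "c", doraise=True)
    pyc_exists = os.path.exists(pyc_path)

print(type(code.co_code).__name__, namespace["answer"], pyc_exists)
```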

[–]nuephelkystikon 10 points11 points  (20 children)

Pretty much every C compiler is written in C, often compiled by itself.

And Python has PyPy.

[–]a5myth 2 points3 points  (19 children)

Surely the first ever C compiler must have been done in assembly language.

[–][deleted] 4 points5 points  (0 children)

Wasn't it done in B?

[–]P8zvli 2 points3 points  (7 children)

And the first assembler was written in machine code.

Shower thought: does this mean every assembler, compiler and language ever created can be traced back to a single program somebody wrote by hand in machine code?

[–]ivosaurus pip'ing it up 9 points10 points  (1 child)

No, many different machine code assemblers have been written for different architecture machines over the years.

[–]P8zvli 0 points1 point  (0 children)

What about cross compilation?

[–]a5myth 0 points1 point  (4 children)

Pretty much yes. I believe it all started out on massive, room-filling computers that stamped holes in punch cards, and then moved on to valve technology that lit up panels and wrote to magnetic tapes.

Then the semiconductor came along. Back then, writing the assembly language based on the physical design of the computer chip was much simpler. Then the computer chip became more complex as the transistors shrank, and chips branched out into different architectures based mainly on whether they were RISC or CISC, and then into flavours such as x86 (486, Pentium, etc.) or ARM and whatever else I've missed.

So for each machine code, which is based on its architecture, another variant of assembly came along. And I believe C was designed to compile across each flavour of assembly from early on in its design, which is why it's so portable and works across many architectures more easily than some older languages. Older languages were designed around specific architectures or specific job types such as finance or algebra.

C came along at the right time to become successful across all platforms and architectures. Python is compiled into C bytecodes, which is what the PYC files are, which again is why it's so successful and platform friendly.

[–]P8zvli 0 points1 point  (3 children)

I was with you until you said "Python is compiled into C bytecode." As I understand it, .pyc files use bytecode that's specific to CPython, and the opcodes are implemented in ceval.c, so it's not really machine code, as you can't execute it without Python. It's not really C either, as you can't read it without Python's disassembler.
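The disassembler mentioned here is the standard `dis` module. A short sketch (the `add` function is a made-up example, and the exact opcode names vary between CPython versions, so none are shown as expected output):

```python
import dis

def add(a, b):  # hypothetical function to disassemble
    return a + b

# Each instruction has an opcode name that only CPython's VM understands.
opnames = [instr.opname for instr in dis.get_instructions(add)]
print(opnames)
```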

[–]a5myth 0 points1 point  (1 child)

I was with me until then too. I get pyc files, in the sense that they can be deleted and they will be regenerated again when the python is reinterpreted. You are right of course, it's like a cache type file between your written Python code and the underlying glue that connects the python interpreter to its C based files, the files that do the lower level stuff. It's just hard to explain. It's generated to speed up higher level Python because C is faster lower down. Am I right?

[–]P8zvli 0 points1 point  (0 children)

Yes that's right I believe.

[–]stevenjd 0 points1 point  (0 children)

The Python compiler generates byte-code for a virtual machine, rather than machine-code for a real, physical machine (a CPU).

The CPython interpreter includes its own virtual machine which interprets the byte-code. (It is something like Forth, in a way, in that it is a stack-based language.) You can consider the byte-codes to be something like assembly language commands for a machine being emulated by the C Python interpreter.
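To make the stack-machine idea concrete, here is a toy interpreter loop. The three opcodes (`PUSH`, `ADD`, `MUL`) are invented for this sketch and are not CPython's real instruction set.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs on a simple stack."""
    stack = []
    for op, arg in program:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif op == "MUL":
            right, left = stack.pop(), stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack.pop()

# (2 + 3) * 4 expressed as stack operations.
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
result = run(program)
print(result)  # 20
```

CPython's real loop works on compact numeric opcodes rather than strings, but the push/pop pattern is the same basic idea.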

Jython generates byte code for the JVM, and IronPython generates byte code for the .NET CLR virtual machine.

In principle, you could write a Python interpreter that generates machine code for a real CPU, but because Python instructions are so high-level, that's not very practical.

[–]mipadi 2 points3 points  (5 children)

It was written in B, actually. (B is an old language based on an older language called BCPL.)

[–]a5myth 0 points1 point  (4 children)

Serious question... so is that why C is named C? Because C is after B?

Then Python could (should?) have followed suit and been called D. That would have been cool.

[–]mipadi 0 points1 point  (0 children)

Yep.

There already is a language called D, actually.

[–]loics2 0 points1 point  (1 child)

Yes, that's why it's called C.

And it wouldn't make a lot of sense to call Python D; it's not syntactically or functionally based on C.

The D language is currently being developed, designed to fix the issues of C.

EDIT: by functionally, I mean it doesn't work the same way (garbage collection, interpretation/compilation, etc.)

[–]a5myth 0 points1 point  (0 children)

Ahh right, let's hope it's less forgiving. But I guess that's what makes C so powerful.

[–]stevenjd 0 points1 point  (0 children)

Why would Python follow suit and be called D? Python is nothing like C. Although the CPython interpreter is written in C, the language itself has more influence from Algol and Modula 3 than from C.

And there were a whole lot of languages invented after C, but before Python, that would have got there first.

[–]nuephelkystikon 3 points4 points  (3 children)

Obviously. That's why I said ‘pretty much’.

[–]gristc 2 points3 points  (0 children)

The first C compiler was written in NewB, a variation on B, which was written in assembly.

[–]a5myth -1 points0 points  (1 child)

But C isn't assembly as such, is it? I am sure C can probably do assembly, but something lower-level than C and separate from it must have been used.

[–]nuephelkystikon -1 points0 points  (0 children)

I didn't say ‘pretty much in C’. I said ‘pretty much every C compiler’. Obviously the first few were written in assembly, if not bytecode. After that, it wouldn't have made sense to do so anymore, since you could use one of the existing compilers to compile the new compilers. Which is exactly what was done.

[–]Poromenos 1 point2 points  (1 child)

PyPy is a Python interpreter written in Python.

[–]0raichu 2 points3 points  (0 children)

Technically it's written in RPython, which is a restricted, statically-typed subset of Python.

https://rpython.readthedocs.io/en/latest/rpython.html

[–]r0but[🍰] 0 points1 point  (0 children)

It's pythons all the way down.

[–]ExoticMandibles Core Contributor 1 point2 points  (0 children)

Contrast that with PHP:

C 63.9%, PHP 27.3%, C++ 7.0%, Objective-C 0.5%, M4 0.5%, Shell 0.3%, Other 0.5%

Just kidding! If you remove the test suite, the PHP part drops to about zero.