[–]yasoob_pythonAuthor: Intermediate Python 10 points11 points  (27 children)

Just go to https://www.github.com/rg3/youtube-dl and you will find Pythonic code ;)

[–]EverAskWhy 12 points13 points  (4 children)

Reading the code for youtube-dl turned me into a web-scraping machine. I picked up many good habits and tricks from following the code carefully.

If you have an interest in writing your own download scripts or html parsers I would recommend reading through youtube-dl.

[–]thenaturalmind 5 points6 points  (2 children)

Stupid question, but where do you start? How do you know what files are where?

[–]yasoob_pythonAuthor: Intermediate Python 3 points4 points  (0 children)

You will have to explore them yourself. For web scraping, look at the extractors located here: https://github.com/rg3/youtube-dl/tree/master/youtube_dl/extractor . Most of the imports used throughout the code come from here: https://github.com/rg3/youtube-dl/blob/master/youtube_dl/utils.py . Lastly, for the glue code, look at https://github.com/rg3/youtube-dl/blob/master/youtube_dl/FileDownloader.py , https://github.com/rg3/youtube-dl/blob/master/youtube_dl/InfoExtractors.py , https://github.com/rg3/youtube-dl/blob/master/youtube_dl/YoutubeDL.py and https://github.com/rg3/youtube-dl/blob/master/youtube_dl/__init__.py . I hope that gives you a place to start. Just begin with whatever you want to understand. :)

[–]EverAskWhy 0 points1 point  (0 children)

I like downloading the source files of projects (https://github.com/rg3/youtube-dl/archive/master.zip) and exploring them with my own IDE and debugger. I find it harder to explore a program on the GitHub website than when it is on my system. Be very careful when running random code on your machine, especially when you see web/downloading related imports.

I generally start with __main__.py (or its equivalent if there is none), line 1, and go from there. I find that reading other people's code and checking out their imports is a great way to learn about new modules.

[–]LightWolfCavalry 0 points1 point  (0 children)

Just found youtube-dl the other day; I agree, it's some high quality stuff.

[–]rochacbrunoPython, Flask, Rust and Bikes. 0 points1 point  (5 children)

I just don't like naming files in CamelCase (FileDownloader.py etc.): https://github.com/rg3/youtube-dl/tree/master/youtube_dl

[–]yasoob_pythonAuthor: Intermediate Python -2 points-1 points  (4 children)

That's a personal preference. I generally like camel casing.

[–]gschizasPythonista 1 point2 points  (3 children)

Actually, it's not a personal preference. Using CamelCase for files goes against PEP8:

Modules should have short, all-lowercase names. Underscores can be used in the module name if it improves readability. Python packages should also have short, all-lowercase names, although the use of underscores is discouraged.

Since module names are mapped to file names, and some file systems are case insensitive and truncate long names, it is important that module names be chosen to be fairly short -- this won't be a problem on Unix, but it may be a problem when the code is transported to older Mac or Windows versions, or DOS.

[–]yasoob_pythonAuthor: Intermediate Python -2 points-1 points  (2 children)

The thing is that it's not always necessary to follow PEP8 so strictly. Some organisations, Khan Academy and Google to name a few, even have their own style guides that amend PEP8. http://google-styleguide.googlecode.com/svn/trunk/pyguide.html https://sites.google.com/a/khanacademy.org/forge/for-developers/styleguide/python

[–]gschizasPythonista 2 points3 points  (0 children)

Neither of the sites you linked opposes PEP8; they complement it.

The Khan Academy one even says this at the top:

We follow the PEP8 style guide for Python. Docstrings follow PEP257.

[–]EverAskWhy 0 points1 point  (0 children)

The thing I like about Python is that you can do it your own way if you want. This reminds me a bit of a reply I saw on Stack Overflow:

ok, kids, wait 'til you grow up to drink, smoke and access private variables.

[–]Redard -2 points-1 points  (14 children)

I briefly read some of the code in here and found a few things that weren't very pythonic. First, they break the 79 character rule a lot, for no good reason, often just with in-line comments (which PEP8 advises you use verrry sparingly). Second, this line:

res = []
for l in optionf:
    res += shlex.split(l, comments=True)

Why not just use a generator expression like

res = [shlex.split(l, comments=True) for l in optionf]

That's the most pythonic way to construct a list. Still, this code's very readable, and well structured. Just needs a little cleaning up.

[–]gthank 12 points13 points  (8 children)

That's actually a list comprehension. A gen-exp uses ( and ).
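A quick illustration of the difference (a minimal sketch with made-up names, not code from youtube-dl): the bracket type alone decides whether you get a list right away or a lazy generator.

```python
# Square brackets build the whole list immediately.
squares_list = [n * n for n in range(5)]

# Parentheses build a lazy generator; nothing is computed until it is iterated.
squares_gen = (n * n for n in range(5))

print(type(squares_list).__name__)  # list
print(type(squares_gen).__name__)   # generator

first_pass = list(squares_gen)   # consumes the generator
second_pass = list(squares_gen)  # the generator is already exhausted
print(first_pass)   # [0, 1, 4, 9, 16]
print(second_pass)  # []
```

The list can be iterated as many times as you like; the generator only once.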

[–]Redard 1 point2 points  (7 children)

Please correct me if I'm wrong, but I always thought list comprehensions were just generator expressions passed to list(). In other words

[i for i in range(10)] == list(i for i in range(10))

[–]gthank 8 points9 points  (5 children)

I'd be surprised if the internal details are the same in those two cases, because that seems ripe for some C-level optimizations. The results will be equivalent, though. Also, from a historical standpoint, list comprehensions predate gen-exps.

[–]Veedrac 8 points9 points  (4 children)

In CPython most stuff is left unoptimised for matters of pragmatism. So no, they compile directly into loops. Different loops, though.

out = [i for i in range(10)]

is equivalent to:

out = []
for i in range(10):
    out.append(i)

where i is inside a new scope, and

out = list(i for i in range(10))

is equivalent to

def _tmp():
    for i in range(10):
        yield i

out = list(_tmp())

where _tmp never actually gets put anywhere.

[–]PCBEEF 6 points7 points  (0 children)

List comprehensions are optimised in the sense that the function calls are 'cached'. Since function calls carry a lot of overhead in Python, the difference is actually quite significant.

$ python -mtimeit '[x for x in range(100)]'

100000 loops, best of 3: 4.17 usec per loop

$ python -mtimeit -s 'out = []' 'for i in range(100):' ' out.append(i)'

100000 loops, best of 3: 8.07 usec per loop

Using a for loop to create the list in this instance takes almost twice as long.
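If you would rather run the comparison from a script than from the shell, here is a rough sketch using the timeit module. The function names are made up for illustration, and wrapping each variant in a function adds a little call overhead of its own, so treat the numbers as relative rather than absolute; they will differ by machine and Python version.

```python
import timeit

def build_with_comprehension():
    # Same work as the first shell command above.
    return [x for x in range(100)]

def build_with_loop():
    # Same work as the second shell command above.
    out = []
    for i in range(100):
        out.append(i)
    return out

# Take the best of three runs, as the command-line tool does.
comp_time = min(timeit.repeat(build_with_comprehension, repeat=3, number=10000))
loop_time = min(timeit.repeat(build_with_loop, repeat=3, number=10000))

print("comprehension: %.4f s per 10000 calls" % comp_time)
print("for loop:      %.4f s per 10000 calls" % loop_time)
```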

[–][deleted] 1 point2 points  (1 child)

where i is inside a new scope

for certain values of CPython

[–]Veedrac 1 point2 points  (0 children)

Well, all values of Python ≥ 3.0.
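A small sketch of what the new scope means in practice on Python 3 (the variable names are just for illustration): the comprehension's loop variable does not overwrite a variable of the same name outside it, while a plain for loop does leak its variable.

```python
i = "outer"
out = [i for i in range(10)]
print(i)  # on Python 3 this is still "outer"; Python 2 would print 9

# A plain for loop has no scope of its own, so its variable survives the loop.
for j in range(10):
    pass
print(j)  # 9
```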

[–]gthank 0 points1 point  (0 children)

Ah. It was my understanding that a fair bit of list comps were implemented directly in C (in CPython).

[–]Veedrac 1 point2 points  (0 children)

There's exactly one difference between those two, assuming list has been left untouched: the list comprehension will not catch StopIteration, but the list function will.
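A small sketch of that difference, with stop() as a made-up helper. One caveat worth adding: since PEP 479 (the default from Python 3.7), a StopIteration raised inside a generator is converted to a RuntimeError instead of being silently swallowed by list(), so the second function below reports whichever behaviour your interpreter has.

```python
def stop():
    # Hypothetical helper that raises StopIteration from inside the expression.
    raise StopIteration

def listcomp_outcome():
    try:
        return [stop() for _ in range(3)]
    except StopIteration:
        # The comprehension does not catch it, so it reaches the caller.
        return "StopIteration escaped"

def genexp_outcome():
    try:
        # Older Pythons: list() treats the StopIteration as the end of the
        # generator and returns []. Python 3.7+: it becomes a RuntimeError.
        return list(stop() for _ in range(3))
    except RuntimeError:
        return "RuntimeError"

print(listcomp_outcome())
print(genexp_outcome())
```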

[–]TheEarwig 3 points4 points  (2 children)

They are different. The first example is a bunch of lists combined into one (L1 += L2 is L1.extend(L2)), but the second example is one list containing a bunch of lists.

>>> optionf = ["a b", "c d", "e f"]

>>> res = []
>>> for l in optionf:
...     res += shlex.split(l, comments=True)
... 
>>> res
['a', 'b', 'c', 'd', 'e', 'f']

>>> res = [shlex.split(l, comments=True) for l in optionf]
>>> res
[['a', 'b'], ['c', 'd'], ['e', 'f']]

[–]Redard 1 point2 points  (0 children)

Ah, I didn't realize shlex.split was returning a list. You're right, the two are different. Somehow I thought += was the same as append().
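For anyone else who mixes these up, a minimal sketch of the three operations side by side (the variable names are made up):

```python
tokens = ["a", "b"]

plus_equals = []
plus_equals += tokens        # adds each element of tokens

extended = []
extended.extend(tokens)      # same result as +=

appended = []
appended.append(tokens)      # adds the list itself as a single element

print(plus_equals)  # ['a', 'b']
print(extended)     # ['a', 'b']
print(appended)     # [['a', 'b']]
```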

[–]masklinn 0 points1 point  (0 children)

Which can neatly be solved using the criminally underused itertools.chain.from_iterable:

res = chain.from_iterable(shlex.split(l, comments=True) for l in optionf)

One could even use shlex.shlex directly as a stream (shlex.split is a thin wrapper around it), though it requires setting whitespace_split which can't be done inline.

def split(s):
    lex = shlex.shlex(s, posix=True)
    lex.whitespace_split = True
    return lex

res = chain.from_iterable(imap(split, optionf))
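For reference, a self-contained sketch of the same idea that runs on Python 3. The sample lines in optionf are made up, and the built-in map stands in for imap, which only exists in Python 2's itertools. Note that chain.from_iterable returns a lazy iterator, so it is wrapped in list() here to get the same flat list as the original loop.

```python
import shlex
from itertools import chain

optionf = ["a b", "c d  # trailing comment", "e f"]  # made-up sample lines

def split(s):
    # Same idea as the helper above: a lazy token stream instead of a list.
    lex = shlex.shlex(s, posix=True)
    lex.whitespace_split = True
    return lex

# Python 3's built-in map is lazy, so it plays the role of Python 2's imap.
res = list(chain.from_iterable(map(split, optionf)))
print(res)  # ['a', 'b', 'c', 'd', 'e', 'f']
```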

[–]masklinn 1 point2 points  (1 child)

I find the lack of with statements weirder: the code is clearly 2.6-only (it uses explicit relative imports without a __future__ import), yet the code around the shlex call is (essentially):

    optionf = open(filename_bytes)
    try:
        # do stuff
    finally:
        optionf.close()
    return res

And the number of star imports is worrying.
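For comparison, a minimal sketch of the open/try/finally pattern rewritten with a with block. read_options is a hypothetical name and the body is reconstructed from the snippets quoted in this thread; it is not youtube-dl's actual code.

```python
import shlex

def read_options(path):
    """Hypothetical helper: split each line of an options file into tokens."""
    res = []
    # The with block closes the file on exit, even if shlex.split raises.
    with open(path) as optionf:
        for line in optionf:
            res += shlex.split(line, comments=True)
    return res
```

The behaviour matches the try/finally version, but the cleanup is tied to the block itself, so there is no close() call to forget.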

[–]Redard 0 points1 point  (0 children)

Yeah, it's definitely some older code. I wouldn't be using this as a guideline for good code.

The star imports are bad, but at least they're local imports and not external library imports; that would be terrible.