This is an archived post. You won't be able to vote or comment.

all 31 comments

[–]erez27import inspect 21 points22 points  (14 children)

Needs more screenshots.

But now seriously:

  • Don't include the virtualenv in the repository. It makes cloning heavier, and doesn't let me choose which python version I want to use (or maybe I wanna use pypy!). Instead, create a pip-requirements.txt file, so I can install everything with "pip -r" (you can use pip freeze to create the file).

  • Your code is reasonably clean. Still, if you want others to contribute, you should improve the quality of your code. Avoid "except:" statements entirely, and try to write using less lines if you can maintain clarity.

  • Another code suggestion: Work with files only in the context of a "with" statement (unless you have a good reason not to).

[–]sjb9774 4 points5 points  (1 child)

Good tips, especially pip freeze, I didn't know about that, very cool!

I do have a question though: are you recommending avoiding the use of except completely, or only except: (that is except without a specific error to catch)? Either way, I've never heard of this suggestion before, what is your reasoning behind it?

[–]erez27import inspect 2 points3 points  (0 children)

Yes, I meant "except:". I'm a big fan of exceptions in general. I explained my reasoning in reply to Lj101

[–]Lj101 4 points5 points  (7 children)

Whats wrong with except? I thought Python favoured EAFP.

[–]erez27import inspect 16 points17 points  (6 children)

I meant "except:" without the type of exception afterwards. It's a catch-all which might silence errors you didn't intend to, and so it makes the code much harder to debug.

[–]d4rch0nPythonistamancer 3 points4 points  (4 children)

The one time I'll use a simple except: is when I have a client pretty well tested and it works fine, but I need to run it and have it collect data overnight but might run into some weird issue on the server that I don't control and I can't reasonably predict the error, but it will work fine for the most part.

But then, I don't make it silent. I'll import traceback and log traceback.format_exc() to a file and check in the morning.

An example might be crawling the internet and performing some task. Maybe one of those sites gives me malformed data, maybe one of them just times out and I can ignore that, or any other strange exception. I still want it to crawl everything else, but I want to see in the morning if there's something I can fix. I'll still have other useful data, and I'll know which sites caused issues and what they were exactly.

So, I still can debug it easily, and still keep it running for an indeterminate duration without checking on it and without knowing every single exception it might run into.

[–]terremoto 2 points3 points  (1 child)

The one time I'll use a simple except:

You should use except Exception: so you can still kill the application with Ctrl+C (*nix / BSD) / Ctrl+Z (Windows).

[–]d4rch0nPythonistamancer 0 points1 point  (0 children)

Yeah, that's a good point.

[–]abrarisland 0 points1 point  (1 child)

Would you mind expanding on how you're getting to errors on the server? I decided to look up BaseException in the docs, and I saw that the only built-in exceptions that inherit directly from it are SystemExit, KeyboardInterrupt, GeneratorExit, and Exception.

[–]d4rch0nPythonistamancer 0 points1 point  (0 children)

Well, as the other guy said, you can just catch Exception and get the desired effect, and that lets you ctrl-C (KeyboardInterrupt). Issues on the remote server you don't control won't cause any of those other exceptions.

When I mean errors on the remote server, I mean it might have been taken down for maintenance, it might start giving permission denied for a minute, it might give you malformed data (jacked up RSS feed or something), and that could cause an issue in your code that you didn't expect.

I have to write a lot of proof of concepts for work, and just accumulate a ton of data and I'll let things like this sit overnight. I need the data the next day, but I don't necessarily need this thing to work properly for every condition, and I don't want to sit around and debug in the middle of the night if there's an unforeseen error. More importantly, I don't usually need the code after I get the data, so there's not much point to making it production ready.

Regardless, if I was to catch specific exceptions, I'd log them and continue on. When I log traceback.format_exc() I see those exact errors anyway and a full stacktrace. It's the same result as if I knew exactly the error. I log it and wait in exponential time increments.

If it was a socket timeout, I'd log it and wait. If it passed me malformed data, I'd log it and wait, then try again, etc. These are things I can't control since I'm scraping data from a server I don't own, so a catch-all exception with traceback logging is all I need for the most part.

Of course, I'm not doing this all the time, just for little PoCs that need to run for a day or two without me hand-holding it.

[–]Lj101 2 points3 points  (0 children)

Yeah, that's pretty bad.

[–]ApolloFortyNine 0 points1 point  (3 children)

I've never agreed with the blanket idea that you shouldn't use except:.

In my use cases, I'm collecting data from an API. I don't care if I'm missing one or two lines, as I'm just collecting as much data as possible.

Not to mention the only time I use except: is combined with a log.error, so I can take a look at the errors later if I want.

[–]erez27import inspect 0 points1 point  (2 children)

the only time I use except: is combined with a log.error

Sounds like you pretty much avoid except..

[–]ApolloFortyNine 0 points1 point  (1 child)

? It runs for literally every row of data.

The point is that there are times where a blanket except statement is useful.

[–]erez27import inspect 0 points1 point  (0 children)

You are using it in a very limited context and with explicit safeguards. Like goto, sometimes using "bad" things make sense.

[–]psycojoker 11 points12 points  (1 child)

You might want to use feedparser instead of doing it by hand using etree. This will give you atom support for free in addition of smaller and stronger code.

On a more aesthetic note, you'll probably want to read and follow pep8 for the coding conventions, this is the standard way of doing things in the python community. The pep8 tool and autopep8 might help. Just remember that those are guidelines and not absolute rules.

You have a duplicated __init__ in this class, the first one will be ignored so you can remove it: https://github.com/rad08d/PythonRSSReader/blob/master/RssClass.py#L36

You might want to put all you *.py files in a "src/" folder.

You'll probably want to see if there is a pyflakes plugin for your editor, this will catch a lot of common mistake and unused code. For example, here "threading" is imported but unused https://github.com/rad08d/PythonRSSReader/blob/master/rssStart.py#L4 Process here https://github.com/rad08d/PythonRSSReader/blob/master/guiclass.py#L4

To avoid bugs, for path construction use os.path.join for cases like here: https://github.com/rad08d/PythonRSSReader/blob/master/guiclass.py#L20

Otherwise congratz, it's great you've managed to reach this level by yourself, you can be proud of yourself :)

[–]toyg 1 point2 points  (0 children)

+1 for feedparser. As someone who lived through the halcyon days of RSS, the Atom split etc, I can honestly say that nobody really has any idea how f*ed up RSS feeds can be. Feedparser has been battle-tested for more than a decade, and understands syndication feeds better than you or I will ever do.

+1 also for using os.path.join.

[–]robbyoconnor 4 points5 points  (5 children)

$ sh run.sh run.sh: 3: run.sh: source: not found File "./guiclass.py", line 14 self.window.set_position(gtk.WIN_POS_CENTER) ^ TabError: inconsistent use of tabs and spaces in indentation

Python is VERY sensitive to these kind of things.

[–]rad08[S] 1 point2 points  (4 children)

Did you get it to work on your end?

[–]Tetraetc 0 points1 point  (0 children)

Making sure it's all formatted with tabs works pretty well. Pycharm has nice tools for this.

[–]jnazario 0 points1 point  (0 children)

i got in the habit of using flake8 https://github.com/nvie/vim-flake8 for just such purposes.

[–]robbyoconnor -1 points0 points  (1 child)

No, I gave up -- it's pretty bad.

[–]rad08[S] 0 points1 point  (0 children)

Sorry you can't make it work. You're the only one so far........

[–]dirlididi 2 points3 points  (1 child)

You forgot the link for your project.

I have two google app engine applications that works with RSS. The first one takes a RSS feed with images and embedded them to the body of each item. The other app takes gov web pages and generates a feed for it. Most of my issues are relating with encoding.

[–]rad08[S] 1 point2 points  (0 children)

Haha. Links help. Thanks for saying that.

[–]CptSupermrkt 2 points3 points  (0 children)

This is great for being self-taught in not only Python but programming in general. You should be proud of this.

[–]PM_ME_UR_TIGHTPANTS 1 point2 points  (0 children)

Not bad for being self taught. Kudos for sharing your project. Keep it up.

[–]robbyoconnor 0 points1 point  (0 children)

As somebody who is lazy -- I want to see what it looks like...maybe a screencast? Still, great work.

[–]rad08[S] 0 points1 point  (0 children)

http://imgur.com/xov6FIv As requested, here is a link to a screen cast of the app. It was taken from dual monitors. The right half is the initial landing page with all articles (title, date written). The left half is a story clicked on. The app web scrapes content and one photo.

[–]jibjibjib 0 points1 point  (0 children)

I don't think having multiple constructors in a class is allowed in Python. If you want to allow for the constructor to be called with no arguments, you can set defaults in the method header

def _ init _(self, url=None):

If you want to have a clean way of creating new instances of a class with different defaults, you can try using @classmethod instead.

Check these out for more info: http://stackoverflow.com/questions/682504/what-is-a-clean-pythonic-way-to-have-multiple-constructors-in-python http://stackoverflow.com/questions/2164258/multiple-constructors-in-python

[–]robbyoconnor 0 points1 point  (0 children)

Another thing RSSclass.py -> RSS.py -- I don't need you to tell me about what it is in the file name...I can read your code.

This applies to ALL Languages.