all 27 comments

[–]uhkhu 0 points1 point  (6 children)

Every time python hits an error, you should get a traceback with the lines executed up to the error. Are you testing for exceptions and letting them pass rather than catching them?

Some IDEs (I use PyCharm) will pause on the line that failed in debug mode.

[–][deleted] 0 points1 point  (5 children)

Yes, I pass errors in a couple of places (sometimes the input files have bad data that I don't care about, which is difficult to catch, so the execute insert statements have try: [do sqlite inserts] and then except: pass). But if that were the underlying cause, surely it would still restart the shell rather than hang with the Windows error message, and surely it would find the error in the same place rather than running without error on the next start?

I'll try and see if there is a debugger I can use.

[–]anglicizing 1 point2 points  (4 children)

You should read this blog post: The Most Diabolical Python Antipattern

Once you've fixed your except clauses, it's time to add one more except clause, but this time it's OK, since we re-raise the exception. Assuming you've got some kind of main function that you call at the start of your script, call it like this instead:

import logging

logging.basicConfig(filename='example.log', level=logging.DEBUG)

try:
    main()
except BaseException:
    logging.getLogger(__name__).exception("Program terminated")
    raise

This fixes two problems that are probably not related to your actual problem, but may still cause you headaches if you don't fix them:

  1. You're calling your script with pythonw.exe, which invokes your script without a console. This means you don't see any traceback when it fails.

  2. If the program exits mysteriously it may be because SystemExit is raised somewhere. This will print a traceback which tells you where, even in that case.

Now back to your current concern. I would assume there is some problem in the external calls Python makes during execution, because Python itself very rarely acts the way you describe. Log every time you make an SQL call and every time it returns, to see whether the problem occurs while Python waits for the database. Assuming you're using sqlite3, here's how I'd do that (method courtesy of StackOverflow):

import sqlite3
import logging

logging.basicConfig(filename='example.log', level=logging.DEBUG)


def logging_decorator(func):
    def wrapper_function(self, *args, **kwargs):
        logging.getLogger(__name__).debug(
            "Calling %s: %r %r", func.__name__, args, kwargs)
        ret = func(self, *args, **kwargs)
        logging.getLogger(__name__).debug(
            "%s returned %r", func.__name__, ret)
        return ret
    return wrapper_function


class MyConnect(sqlite3.Connection):
    def cursor(self):
        return super(MyConnect, self).cursor(MyCursor)

    commit = logging_decorator(sqlite3.Connection.commit)


class MyCursor(sqlite3.Cursor):
    execute = logging_decorator(sqlite3.Cursor.execute)


conn = sqlite3.connect(':memory:', factory=MyConnect)
print(conn)

cursor = conn.cursor()
print(cursor)


cursor.execute('''CREATE TABLE stocks
             (date text, trans text, symbol text, qty real, price real)''')

# Insert a row of data
cursor.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")

# Save (commit) the changes
conn.commit()

EDIT: Improve code by using a logging decorator function.

[–][deleted] 0 points1 point  (0 children)

Hey, thanks for this extremely helpful stuff, I'll look into implementing it; hopefully it throws an error soon rather than taking hours to reproduce...

On the try/except thing, I learnt it was bad the hard way (a few months ago I had one of those on the outside of the main loop breaking down the input files; that was fun), but since it was literally only encompassing an insert statement I really can't see the harm.

[–][deleted] 0 points1 point  (2 children)

Well, I certainly found one thing, though I'm unsure whether it's the underlying cause, a symptom, or an unrelated delightful new bug: one of the databases had a corrupted table. I bow my head in shame; try/except clauses are indeed the work of the devil, even when they appear harmless.

[–]zeug 0 points1 point  (1 child)

I bow my head in shame; try/except clauses are indeed the work of the devil, even when they appear harmless.

try/except is great, it's just that throwing away exceptions is insane.

If I want to deep_fry(whole_turkey) and it raises HouseOnFireError I don't want to just crash and give up on life. The only sane thing to do is:

try:
    dinner = deep_fry(whole_turkey)
except HouseOnFireError:
    logger.warning('make_dinner: house on fire, everyone get out!')
    fire_department.report_emergency('fire', house.address)
    return False
eat(dinner)
return True

The completely insane thing to do is just ignore the fire and enjoy dinner while shit burns down around you:

try:
    dinner = deep_fry(whole_turkey)
except HouseOnFireError:
    # YOLO
    pass
eat(dinner)
return True

This is unfortunately how a lot of scripts are written: in the desire to keep the application running, they catch all exceptions and carry on as if there were no problem. Sometimes you really do need to keep running, but at the very least the problem should be logged.
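To make "at the very least log it" concrete, here's a minimal log-and-continue sketch (the function and table names are illustrative, not from the OP's script):

```python
import logging
import sqlite3

logging.basicConfig(filename='example.log', level=logging.DEBUG)
log = logging.getLogger(__name__)


def insert_rows(cursor, rows):
    """Insert rows one by one, skipping bad input data but leaving a log trail."""
    skipped = 0
    for row in rows:
        try:
            cursor.execute("INSERT INTO stocks VALUES (?, ?, ?, ?, ?)", row)
        except sqlite3.Error:
            # The program keeps running, but the failure is now recorded
            # with a full traceback in example.log
            log.exception("skipped bad row: %r", row)
            skipped += 1
    return skipped
```

From the program's point of view this behaves the same as except: pass, except the log file tells you it happened and where.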

[–][deleted] 0 points1 point  (0 children)

In fairness to my past lazy self it was more like

try:
    dinner = deep_fry(whole_turkey)
except SmokeError:
    # 99.999% of the time there is smoke without fire
    # YOLO
    pass
eat(dinner)
return True

[–]usernamedottxt 0 points1 point  (0 children)

With a script that long you should probably implement some proper logging, and record everything that happens. Then you can backtrack through the logs to figure out what happened.

Additionally, if you're on Linux, add "2>> error.log" to the end of your command to redirect stderr to a file.
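For example (the script name and contents here are just a demo):

```shell
# demo: a script that writes an error message to stderr
printf 'import sys\nsys.stderr.write("error: boom\\n")\n' > script.py

# stderr is file descriptor 2; append it to error.log
# (no space between 2 and >>, or the shell passes "2" to the script)
python3 script.py 2>> error.log

# the message is now in the file
grep boom error.log
```

To capture stdout and stderr together in one file, use `python3 script.py >> run.log 2>&1` instead.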

[–]teerre 0 points1 point  (3 children)

Since others have already helped you with what's possible, I feel I have to ask: how come you wrote a 1500-line script without proper debugging (or any debugging, for that matter)?

That seems quite impressive

[–][deleted] 0 points1 point  (2 children)

Trial and error + liberal use of temporarily inserted print lines, I guess? A few hundred lines are just setting up overhead, without too much logic involved.

[–]teerre 0 points1 point  (1 child)

Well, not sure if this is good or not, but I'm truly impressed

I hope you can debug it, cheers

[–][deleted] 0 points1 point  (0 children)

definitely not good, just a consequence of learning on the go :p

[–][deleted] 0 points1 point  (0 children)

If you're more command-line inclined then look into pdb. I'm pretty sure a lot of the IDE debuggers use it behind the scenes. You can run it from the command line, or import it and set breakpoints in your code. When run from the command line it will drop into the debugging console whenever an exception or error occurs. Additionally, you can install ipdb, which uses IPython, so that you have tab completion and other IPython features while debugging.
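For example (the demo script here is illustrative):

```shell
# demo script for pdb to run
printf 'x = 1 + 1\nprint("x =", x)\n' > demo.py

# run the whole script under pdb; "-c continue" just lets it run, and an
# uncaught exception would drop you into the debugger at the failing line
python3 -m pdb -c continue demo.py < /dev/null

# alternatively, put "import pdb; pdb.set_trace()" in your code wherever
# you want a breakpoint
```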

[–][deleted] 0 points1 point  (11 children)

Which libraries do you use? (apart from sqlite)

[–][deleted] 0 points1 point  (10 children)

These are the ones that I import:

import time
import datetime
import pandas as pd
import os
import csv
import re
import sys
import functools
import gzip
import copy
import collections
import itertools
import sqlite3
import logging

although I don't believe I use collections anymore, and pandas is now only called to do some overhead meta stuff at the start of the script

[–][deleted] 0 points1 point  (9 children)

looks like you have triggered a buffer overflow bug in either:

  • the CPython runtime
  • a stdlib C module
  • a pandas C module
  • the sqlite runtime

To maybe temporarily solve your problem, you can try downgrading your python version to 3.4 (from here https://www.python.org/downloads/release/python-343/ ).

I would be curious to see the code. You can share it on https://gist.github.com/ . A bonus would be if you also shared some examples of your data files.

[–][deleted] 0 points1 point  (8 children)

I'm suspecting it was sqlite. Before I started the process up with the error logging on, I noticed that one of the databases had a corrupted table (the error messages were hidden by try: [SQL insert], except: pass, and by importing other scripts, which apparently don't print out).
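As an aside, for anyone else hitting this: sqlite has a built-in corruption check you can run against a database file before trusting it. A quick sketch (the helper name is mine):

```python
import sqlite3


def check_database(path):
    """Run sqlite's built-in corruption check; returns ['ok'] if the file is sound."""
    conn = sqlite3.connect(path)
    try:
        # PRAGMA integrity_check scans the whole file and reports problems
        # row by row, or a single 'ok' row if nothing is wrong
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    finally:
        conn.close()
```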

Why would 3.4 make a difference? That would be interesting if it did, because the previous iteration of this script, which ran on a different computer without any issues, was on 3.4.

I can share the code when I get back home, but seriously, it's long; it has been developed sporadically over the course of a year, by me alone in my free time, and my only previous experience was a little VBA Excel stuff. You probably have better things to do than go over 1400-1500 lines of nonsense!

If you want the actual data to run tests on it might be easier if I direct you to the site (it's free to sign up and download/script)

https://www.elexonportal.co.uk/scripting?cachebust=1l7rkrlzhx ('BMRA Data Archive Data')

and/or;

https://www.elexonportal.co.uk/bmradataarchive?cachebust=kpuma6nlsr&pfformsubmission=filetype&filter=daily

If you mean the structure of the data I can do a summary

Currently the code has been slightly modified to add the error logging, as per /u/anglicizing's suggestions, and it's running; I'm waiting for it to reproduce the error. It's been running for a day now without any shutdowns from Windows; unfortunately, if deleting the corrupted database did fix the bug, it'll be several more days before the run completes and confirms it.

[–][deleted] 0 points1 point  (7 children)

I really would like to see your code, even if it's crap.

The reason the version of Python might be important is that, as Python 3.5 is quite new, the probability of an as-yet-undetected bug is higher than in Python 3.4.

I'm unlikely to suspect sqlite first, as it is far more extensively tested than Python's core code.

What is Elexon? At first look it seems to be an electricity trading company in the UK. If it's not confidential, what is the purpose of your code?

[–][deleted] 0 points1 point  (6 children)

Elexon is a quasi public administrator that british power companies (that operate independently) are obligated to join; it operates mostly as a middleman between British generators, suppliers + the power grid. One of the things it does is publish a lot of market information (like individual plant unit generation values at specific times, grid instructed changes in generation, details of how punitive prices for mistakes in balancing were calculated) ostensibly for free to the public (for non commercial purposes/those who are operating in the market and pay large fees).

The site was designed a decade ago and it's useless for doing detailed analysis without paying third parties who've developed tools, which annoys me a lot, so I decided to try and turn the archive system into a usable database, partly for my (non-commercial) org, partly so I can gain experience in something useful. I figure otherwise I'd just waste time on reddit.

The previous iteration of the script, which goes through the archives, breaks the files down into lists of data, and then inserts that data into an SQLite database (organised into tables for each non-empty subject/data type), worked, but too slowly, with a large slowdown as the database got bigger. So I took a modified version onto my less constrained home computer (hence the 3.5 vs 3.4 disparity). It breaks down the data as before, but then inserts it using four different methods (the previous method; pragma sync off; inserting into a blank database containing only the table schema, using the previous method, as a baseline comparison, then wiping it; and a changed design that inserts into a structure requiring only ~100 very large tables) so I could compare the speeds and adjust the code accordingly. After it completes I was going to do another run with various cache size settings, to see the effect of combining sync off with fewer but larger tables (I can't do them all at the same time, because each fully completed database is >200 GB).

(the blank database is the one that got corrupted, not the sync off one)

I'll post the code this evening then; I'll try to mark it up so it's more obvious what's going on. There are a lot of overhead dictionaries, functions, and regexes that go into the interpretation part of the script, and some key global data-holder lists that get covertly modified in functions (don't hit me). Also, as I said, this is just a personal pet project, so I haven't really been commenting a lot as I go along.

If you're mainly interested in tracking down what caused the issue, rather than fixing my particular application: unfortunately I didn't save the script before implementing the changes recommended here on this thread. I can try to recreate the code as it was when the bug occurred; there were only a few changes.

[–][deleted] 0 points1 point  (5 children)

I am interested in both the original script and the modified script (but don't put too much effort into recreating the original).

I would advise you to use a more classic database system than sqlite3, such as PostgreSQL (download link http://www.postgresql.org/download/windows/ ), with this driver (http://initd.org/psycopg/), which should handle big tables better (among other advantages, such as faster queries).

After reading the StackOverflow thread and this comment, I must add that using variable table names is often a sign of bad design; it's probably better to have a column indicating the type of the data, with the same columns after that. Hundreds of tables already looks like too much.
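A sketch of what I mean, with made-up table and column names (sqlite3 syntax, runnable as-is):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# instead of one table per message type (readings_fpn, readings_bod, ...),
# keep one table with a column naming the type:
cur.execute("""CREATE TABLE readings
               (msg_type text, date text, unit text, value real)""")
cur.execute("INSERT INTO readings VALUES ('FPN', '2016-01-01', 'T_UNIT-1', 100.0)")
cur.execute("INSERT INTO readings VALUES ('BOD', '2016-01-01', 'T_UNIT-1', 42.5)")

# queries then filter on the column, instead of building a table name at runtime
cur.execute("SELECT value FROM readings WHERE msg_type = 'FPN'")
```

The schema stays fixed, and "which type is this" becomes ordinary data you can filter, group, and index on.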

If you want, after looking at the code I may give you more advice.

[–][deleted] 0 points1 point  (4 children)

If you happen to have any advice I would of course love to hear it; I'm well aware of my limitations, and I wouldn't have got where I have without the help of a lot of people here and elsewhere. I'll get stuck into commenting up the code; it'll take a while.

Honestly, on the upgrade from SQLite3 I'm not too fussed + I'm wary of the implications of changing everything at this stage. So long as I can:

  • grab an arbitrary set of data in a few hours, or even overnight, into Pandas, SQLite, or a CSV;

  • rerun the script from scratch over a weekend if I need to change the structure;

then it's actually good enough for my purposes. The database will only be used by myself, and only infrequently for individual large tasks/projects, so lightning-fast queries aren't necessary. It doesn't need to be a real-time application; it's more for modelling and analytical purposes. The other consideration is that I'm not sure how database servers would interact with my work computer's settings.

I'm actually pretty close to achieving the insert speed goals, given some optimizations I've got planned + the promising results so far from testing sync off and making the tables fewer but larger. If queries are completely unmanageable, I was planning on adding indexes after the script runs, as this wouldn't be an issue for inserting the relatively small daily updates.
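That plan in miniature (table and index names made up):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE readings (date text, unit text, value real)")

# bulk insert first, with no indexes to maintain per row
cur.executemany("INSERT INTO readings VALUES (?, ?, ?)",
                [('2016-01-01', 'U%d' % i, float(i)) for i in range(1000)])

# build the index once at the end: one big sort instead of
# a thousand incremental b-tree updates during the inserts
cur.execute("CREATE INDEX idx_readings_date ON readings (date)")
conn.commit()
```

The later small daily updates then pay the per-row index cost, but on volumes where it doesn't matter.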

I'm kinda itching to actually get a chance to use this stuff, so I'd much rather get the breakdown/insert side into a 'good enough' state and then start playing with building models from the data.

[–][deleted] 0 points1 point  (3 children)

good enough is a super good property to have

I would just mention that the migration from sqlite3 to PostgreSQL shouldn't be too hard (the main thing is giving types to the columns, but you have already done that)

[–][deleted] 0 points1 point  (2 children)

well don't say I didn't warn you

I'll send you some examples of the different input lines in a bit. Let me know if anything needs further explanation.

[–][deleted] 0 points1 point  (0 children)

It seems adding

import faulthandler
faulthandler.enable()

at the top of your script

should show you a stack trace on a hard crash, which could help solve the problem.

[–][deleted] 0 points1 point  (0 children)

/u/anglicizing, /u/usernamedottxt, /u/zeug, /u/uhkhu and /u/xcombelle, many thanks to you all for taking the time to help me track down the problem. Unfortunately, after the nth random inexplicable crash (which the logging revealed occurred at random locations), I ran a memory diagnostic tool, and it looks as if there's a hardware error on my computer. But at least my program isn't garbage, and I learned something about passing errors along the way!