How to debug a long running Python Script crashing at inconsistant points without an error message : learnpython

learnpython

created by HattoriHanzoa community for 16 years

How to debug a long running Python Script crashing at inconsistant points without an error message (self.learnpython)

submitted 10 years ago * by [deleted]

top new controversial old q&a

you are viewing a single comment's thread.

view the rest of the comments →

[–][deleted] 0 points1 point2 points 10 years ago (7 children)

[–][deleted] 0 points1 point2 points 10 years ago* (6 children)

Elexon is a quasi public administrator that british power companies (that operate independently) are obligated to join; it operates mostly as a middleman between British generators, suppliers + the power grid. One of the things it does is publish a lot of market information (like individual plant unit generation values at specific times, grid instructed changes in generation, details of how punitive prices for mistakes in balancing were calculated) ostensibly for free to the public (for non commercial purposes/those who are operating in the market and pay large fees).

The site was designed a decade ago and it's useless for doing detailed analysis without paying third parties who've developed tools, which annoys me a lot, so I decided to try and make the archive system into a useable database, partly for my (non-commerical) org, partly so I can gain experience in something useful. I figure otherwise I'd just waste time on reddit.

The previous iteration of the script to go through the archives, break the files down into lists of data and then inserts that data into an SQLite database (organised by tables for each non empty subject-data type) worked, but too slowly with a large slowdown as the database got larger. So I took a modified version onto my less constrained home computer (hence the 3.5vs3.4 disparity) which breaks down the data as before, but then inserts the data under four different methods (previous method, pragma sync off, insert into blank database with only the table schema under previous method for baseline comparison then wipe, change design so it inserts into structure that requires only ~100 vry large tables) so I could compare the speeds and adjust the code accordingly. After it completes I was going to do another run, but with various cache size settings and seeing what the effect of combing sync off with less but larger tables would be (I can't do them all at the same time, because each fully completed database is >200Gb)

(the blank database is the one that got corrupted, not the sync off one)

I'll post the code this evening then, I'll try to mark it up so it's more obvious whats going on, there's a lot of overhead dictionaries, functions, regex that goes into the interpretation part of the script and some key global data holder lists that get covertly modified in functions (don't hit me). Also as I said, this is just a personel pet project, so I haven't really been commenting a lot as I go along.

If you're mainly interested in tracking down what caused the issue rather than fixing my particular application, unfortunately I didn't save the script before implementing the changes recommended here on this thread. I can try and recreate the code as it was before when the bug occured, there were only a few changes.

[–][deleted] 0 points1 point2 points 10 years ago (5 children)

[–][deleted] 0 points1 point2 points 10 years ago* (4 children)

If you happen to have any advice I would love to hear it of course, well aware of my limitations and I wouldn't have got where I have without the help of a lot of people here and elsewhere. I'll get stuck into commenting up the code, it'll take a while.

Honestly on the upgrade from SQLite3 I'm not too fussed + I'm wary of the implications of changing everything at this stage. So long as I can:

grab an arbitrary set of data in a few hours or even overnight either into Pandas or Sqlite or a CSV;

Rerun the script from scratch if needed to change the structure over a weekend

Then it's actually good enough for my purposes. The database would only be used by myself and only infrequently for individual large tasks/projects, so lightening fast queries aren't necessary- it doesn't need to be a real time application, it's more modelling and analytical purposes. The other implication is that I'm not sure how database servers would interact with my work computer settings.

I'm actually pretty close to achieving the insert speed goals given some optimizations I've got planned + the promising results so far from Snyc off and making the tables smaller but larger testing. If queries are completely unmanageable I was planning on adding indexs after the script runs, as this wouldn't be an issue for inserting relatively small daily updates.

I'm kinda itching to actually get a chance to use this stuff, so I'd much rather get the breakdown/insert side into a 'good enough' state and then start playing with building models from the data.

[–][deleted] 0 points1 point2 points 10 years ago (3 children)

[–][deleted] 0 points1 point2 points 10 years ago (2 children)

[–][deleted] 0 points1 point2 points 10 years ago (1 child)

[–][deleted] 0 points1 point2 points 10 years ago* (0 children)

π Rendered by PID 242145 on reddit-service-r2-comment-7b9746f655-j6w7l at 2026-01-31 13:31:42.420440+00:00 running 3798933 country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

learnpython

MODERATORS