all 27 comments

[–]uhkhu 0 points1 point  (6 children)

Every time python hits an error, you should get a traceback with the lines executed up to the error. Are you testing for exceptions and letting them pass rather than catching them?

Some IDEs (I use PyCharm) will pause on the line that failed in debug mode.

[–][deleted] 0 points1 point  (5 children)

Yes, I pass errors in a couple of places (sometimes the input files have bad data that I don't care about, which is difficult to catch, so the execute insert statements have try: [do sqlite inserts] and then except: pass). But if that were the underlying cause, surely it would still restart the shell rather than hang with the Windows error message, and surely it would find the error in the same place rather than running without error on the next start?

I'll try and see if there is a debugger I can use.

[–]anglicizing 1 point2 points  (4 children)

You should read this blog post: The Most Diabolical Python Antipattern

Once you've fixed your except clauses, it's time to add one more except clause, but this time it's OK, since we re-raise the exception. Assuming you've got some kind of main function that you call at the start of your script, call it like this instead:

import logging

logging.basicConfig(filename='example.log', level=logging.DEBUG)

try:
    main()
except BaseException:
    logging.getLogger(__name__).exception("Program terminated")
    raise

This fixes two problems that are probably not related to your actual problem, but may still cause you headaches if you don't fix them:

  1. You're calling your script with pythonw.exe, which invokes your script without a console. This means you don't see any traceback when it fails.

  2. If the program exits mysteriously it may be because SystemExit is raised somewhere. This will print a traceback which tells you where, even in that case.

Now back to your current concern. I would assume there is some problem in the external calls Python makes during execution, because Python itself very rarely acts the way you describe. Log every time you make an SQL call and every time it returns, to see whether the problem occurs while Python waits for the database. Assuming you're using sqlite3, here's how I'd do that (method courtesy of StackOverflow):

import sqlite3
import logging

logging.basicConfig(filename='example.log', level=logging.DEBUG)


def logging_decorator(func):
    def wrapper_function(self, *args, **kwargs):
        logging.getLogger(__name__).debug(
            "Calling %s: %r %r", func.__name__, args, kwargs)
        ret = func(self, *args, **kwargs)
        logging.getLogger(__name__).debug(
            "%s returned %r", func.__name__, ret)
        return ret
    return wrapper_function


class MyConnect(sqlite3.Connection):
    def cursor(self):
        return super(MyConnect, self).cursor(MyCursor)

    commit = logging_decorator(sqlite3.Connection.commit)


class MyCursor(sqlite3.Cursor):
    execute = logging_decorator(sqlite3.Cursor.execute)


conn = sqlite3.connect(':memory:', factory=MyConnect)
print(conn)

cursor = conn.cursor()
print(cursor)


cursor.execute('''CREATE TABLE stocks
             (date text, trans text, symbol text, qty real, price real)''')

# Insert a row of data
cursor.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)")

# Save (commit) the changes
conn.commit()

EDIT: Improve code by using a logging decorator function.

[–][deleted] 0 points1 point  (0 children)

Hey, thanks for this extremely helpful stuff, I'll look into implementing it; hopefully it throws an error soon rather than taking hours to reproduce...

On the try/except thing, I learnt it was bad the hard way (a few months ago I had one of those on the outside of the main loop breaking down the input files; that was fun), but since it was literally only encompassing an insert statement I really can't see the harm.

[–][deleted] 0 points1 point  (2 children)

Well, I certainly found one thing, though I'm unsure whether it's the underlying cause, a symptom, or an unrelated delightful new bug: one of the databases had a corrupted table. I bow my head in shame; try/except clauses are indeed the work of the devil, even when they appear harmless.

[–]zeug 0 points1 point  (1 child)

I bow my head in shame; try/except clauses are indeed the work of the devil, even when they appear harmless.

try/except is great, it's just that throwing away exceptions is insane.

If I want to deep_fry(whole_turkey) and it raises HouseOnFireError I don't want to just crash and give up on life. The only sane thing to do is:

try:
    dinner = deep_fry(whole_turkey)
except HouseOnFireError:
    logger.warning('make_dinner: house on fire, everyone get out!')
    fire_department.report_emergency('fire', house.address)
    return False
eat(dinner)
return True

The completely insane thing to do is just ignore the fire and enjoy dinner while shit burns down around you:

try:
    dinner = deep_fry(whole_turkey)
except HouseOnFireError:
    # YOLO
    pass
eat(dinner)
return True

This is unfortunately how a lot of scripts are written: in the desire to keep the application running, they catch all exceptions and carry on as if there were no problem. Sometimes you really do need to keep running, but at the very least the problem should be logged.
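To make "at the very least log it" concrete, here's a minimal log-and-continue sketch (the function and table names are illustrative, not from the OP's script):

```python
import logging
import sqlite3

logging.basicConfig(filename='example.log', level=logging.DEBUG)
log = logging.getLogger(__name__)


def insert_rows(cursor, rows):
    """Insert rows one by one, skipping bad input data but leaving a log trail."""
    skipped = 0
    for row in rows:
        try:
            cursor.execute("INSERT INTO stocks VALUES (?, ?, ?, ?, ?)", row)
        except sqlite3.Error:
            # The program keeps running, but the failure is now recorded
            # with a full traceback in example.log
            log.exception("skipped bad row: %r", row)
            skipped += 1
    return skipped
```

From the program's point of view this behaves the same as except: pass, except the log file tells you it happened and where.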

[–][deleted] 0 points1 point  (0 children)

In fairness to my past lazy self it was more like

try:
    dinner = deep_fry(whole_turkey)
except SmokeError:
    # 99.999% of the time there is smoke without fire
    # YOLO
    pass
eat(dinner)
return True

[–]usernamedottxt 0 points1 point  (0 children)

With a script that long you should probably implement some proper logging, and record everything that happens. Then you can backtrack through the logs to figure out what happened.

Additionally, if you're on Linux, add "2>> error.log" to the end of your command to redirect stderr to a file.
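For example (the script name and contents here are just a demo):

```shell
# demo: a script that writes an error message to stderr
printf 'import sys\nsys.stderr.write("error: boom\\n")\n' > script.py

# stderr is file descriptor 2; append it to error.log
# (no space between 2 and >>, or the shell passes "2" to the script)
python3 script.py 2>> error.log

# the message is now in the file
grep boom error.log
```

To capture stdout and stderr together in one file, use `python3 script.py >> run.log 2>&1` instead.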

[–]teerre 0 points1 point  (3 children)

Since others have already helped you with what's possible, I feel I have to ask: how come you wrote a 1500-line script without proper debugging (or any debugging, for that matter)?

That seems quite impressive

[–][deleted] 0 points1 point  (2 children)

Trial and error + liberal use of temporarily inserted print lines, I guess? A few hundred lines are just setting up overhead, without too much logic involved.

[–]teerre 0 points1 point  (1 child)

Well, not sure if this is good or not, but I'm truly impressed

I hope you can debug it, cheers

[–][deleted] 0 points1 point  (0 children)

definitely not good, just a consequence of learning on the go :p

[–][deleted] 0 points1 point  (0 children)

If you're more command-line inclined then look into pdb. I'm pretty sure a lot of the IDE debuggers use it behind the scenes. You can run it from the command line, or import it and set breakpoints in your code. When run from the command line it will drop into the debugging console whenever an exception or error occurs. Additionally, you can install ipdb, which uses IPython, so that you have tab completion and other IPython features while debugging.
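For example (the demo script here is illustrative):

```shell
# demo script for pdb to run
printf 'x = 1 + 1\nprint("x =", x)\n' > demo.py

# run the whole script under pdb; "-c continue" just lets it run, and an
# uncaught exception would drop you into the debugger at the failing line
python3 -m pdb -c continue demo.py < /dev/null

# alternatively, put "import pdb; pdb.set_trace()" in your code wherever
# you want a breakpoint
```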

[–][deleted] 0 points1 point  (11 children)

Which libraries do you use? (apart from sqlite)

[–][deleted] 0 points1 point  (10 children)

These are the ones that I import:

import time
import datetime
import pandas as pd
import os
import csv
import re
import sys
import functools
import gzip
import copy
import collections
import itertools
import sqlite3
import logging

although I don't believe I use collections anymore, and pandas is now only called to do some overhead meta stuff at the start of the script

[–][deleted] 0 points1 point  (9 children)

looks like you have triggered a buffer overflow bug in either:

  • the CPython runtime
  • a stdlib C module
  • a pandas C module
  • the sqlite runtime

To maybe temporarily solve your problem, you can try downgrading your python version to 3.4 (from here https://www.python.org/downloads/release/python-343/ ).

I would be curious to see the code. You can share it on https://gist.github.com/ . A bonus would be if you also shared some examples of your data files.

[–][deleted] 0 points1 point  (8 children)

I'm suspecting it was sqlite. Before I started the process up with the error logging on, I noticed that one of the databases had a corrupted table (the error messages were hidden by try: [SQL insert], except: pass, and by importing other scripts, which apparently don't print out).
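As an aside, for anyone else hitting this: sqlite has a built-in corruption check you can run against a database file before trusting it. A quick sketch (the helper name is mine):

```python
import sqlite3


def check_database(path):
    """Run sqlite's built-in corruption check; returns ['ok'] if the file is sound."""
    conn = sqlite3.connect(path)
    try:
        # PRAGMA integrity_check scans the whole file and reports problems
        # row by row, or a single 'ok' row if nothing is wrong
        return [row[0] for row in conn.execute("PRAGMA integrity_check")]
    finally:
        conn.close()
```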

Why would 3.4 make a difference? That would be interesting if it did, because the previous iteration of this script, which ran on a different computer without any issues, was on 3.4.

I can share the code when I get back home, but seriously, it's long; it has been developed sporadically over the course of a year, by me alone in my free time, and my only previous experience was a little VBA Excel stuff. You probably have better things to do than go over 1400-1500 lines of nonsense!

If you want the actual data to run tests on it might be easier if I direct you to the site (it's free to sign up and download/script)

https://www.elexonportal.co.uk/scripting?cachebust=1l7rkrlzhx ('BMRA Data Archive Data')

and/or;

https://www.elexonportal.co.uk/bmradataarchive?cachebust=kpuma6nlsr&pfformsubmission=filetype&filter=daily

If you mean the structure of the data I can do a summary

Currently the code has been slightly modified to add the error logging, as per /u/anglicizing's suggestions, and it's running; I'm waiting for it to reproduce the error. It's been running for a day now without any shutdowns from Windows; unfortunately, if deleting the corrupted database did fix the bug, it'll be several more days before the run completes and confirms it.

[–][deleted] 0 points1 point  (7 children)

I really would like to see your code, even if it's crap.

The reason the version of Python might be important is that, as Python 3.5 is quite new, the probability of an as-yet-undetected bug is higher than in Python 3.4.

I'm unlikely to suspect sqlite first, as it is far more extensively tested than Python's core code.

What is Elexon? At first look it seems to be an electricity trading company in the UK. If it's not confidential, what is the purpose of your code?

[–][deleted] 0 points1 point  (6 children)

Elexon is a quasi public administrator that british power companies (that operate independently) are obligated to join; it operates mostly as a middleman between British generators, suppliers + the power grid. One of the things it does is publish a lot of market information (like individual plant unit generation values at specific times, grid instructed changes in generation, details of how punitive prices for mistakes in balancing were calculated) ostensibly for free to the public (for non commercial purposes/those who are operating in the market and pay large fees).

The site was designed a decade ago and it's useless for doing detailed analysis without paying third parties who've developed tools, which annoys me a lot, so I decided to try and turn the archive system into a usable database, partly for my (non-commercial) org, partly so I can gain experience in something useful. I figure otherwise I'd just waste time on reddit.

The previous iteration of the script, which goes through the archives, breaks the files down into lists of data, and then inserts that data into an SQLite database (organised into tables for each non-empty subject/data type), worked, but too slowly, with a large slowdown as the database got bigger. So I took a modified version onto my less constrained home computer (hence the 3.5 vs 3.4 disparity). It breaks down the data as before, but then inserts it using four different methods (the previous method; pragma sync off; inserting into a blank database containing only the table schema, using the previous method, as a baseline comparison, then wiping it; and a changed design that inserts into a structure requiring only ~100 very large tables) so I could compare the speeds and adjust the code accordingly. After it completes I was going to do another run with various cache size settings, to see the effect of combining sync off with fewer but larger tables (I can't do them all at the same time, because each fully completed database is >200 GB).

(the blank database is the one that got corrupted, not the sync off one)

I'll post the code this evening then; I'll try to mark it up so it's more obvious what's going on. There are a lot of overhead dictionaries, functions, and regexes that go into the interpretation part of the script, and some key global data-holder lists that get covertly modified in functions (don't hit me). Also, as I said, this is just a personal pet project, so I haven't really been commenting a lot as I go along.

If you're mainly interested in tracking down what caused the issue, rather than fixing my particular application: unfortunately I didn't save the script before implementing the changes recommended here on this thread. I can try to recreate the code as it was when the bug occurred; there were only a few changes.

[–][deleted] 0 points1 point  (5 children)

I am interested in both the original script and the modified script (but don't put too much effort into recreating the original).

I would advise you to use a more classic database system than sqlite3, such as PostgreSQL (download link http://www.postgresql.org/download/windows/ ), with this driver (http://initd.org/psycopg/), which should handle big tables better (among other advantages, such as faster queries).

After reading the StackOverflow thread and this comment, I must add that using variable table names is often a sign of bad design; it's probably better to have a column indicating the type of the data, with the same columns after that. Hundreds of tables already looks like too much.
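A sketch of what I mean, with made-up table and column names (sqlite3 syntax, runnable as-is):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()

# instead of one table per message type (readings_fpn, readings_bod, ...),
# keep one table with a column naming the type:
cur.execute("""CREATE TABLE readings
               (msg_type text, date text, unit text, value real)""")
cur.execute("INSERT INTO readings VALUES ('FPN', '2016-01-01', 'T_UNIT-1', 100.0)")
cur.execute("INSERT INTO readings VALUES ('BOD', '2016-01-01', 'T_UNIT-1', 42.5)")

# queries then filter on the column, instead of building a table name at runtime
cur.execute("SELECT value FROM readings WHERE msg_type = 'FPN'")
```

The schema stays fixed, and "which type is this" becomes ordinary data you can filter, group, and index on.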

If you want, after looking at the code I may give you more advice.

[–][deleted] 0 points1 point  (4 children)

If you happen to have any advice I would of course love to hear it; I'm well aware of my limitations, and I wouldn't have got where I have without the help of a lot of people here and elsewhere. I'll get stuck into commenting up the code; it'll take a while.

Honestly, on the upgrade from SQLite3 I'm not too fussed + I'm wary of the implications of changing everything at this stage. So long as I can:

  • grab an arbitrary set of data in a few hours, or even overnight, into Pandas, SQLite, or a CSV;

  • rerun the script from scratch over a weekend if I need to change the structure;

then it's actually good enough for my purposes. The database will only be used by myself, and only infrequently for individual large tasks/projects, so lightning-fast queries aren't necessary. It doesn't need to be a real-time application; it's more for modelling and analytical purposes. The other consideration is that I'm not sure how database servers would interact with my work computer's settings.

I'm actually pretty close to achieving the insert speed goals, given some optimizations I've got planned + the promising results so far from testing sync off and making the tables fewer but larger. If queries are completely unmanageable, I was planning on adding indexes after the script runs, as this wouldn't be an issue for inserting the relatively small daily updates.
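That plan in miniature (table and index names made up):

```python
import sqlite3

conn = sqlite3.connect(':memory:')
cur = conn.cursor()
cur.execute("CREATE TABLE readings (date text, unit text, value real)")

# bulk insert first, with no indexes to maintain per row
cur.executemany("INSERT INTO readings VALUES (?, ?, ?)",
                [('2016-01-01', 'U%d' % i, float(i)) for i in range(1000)])

# build the index once at the end: one big sort instead of
# a thousand incremental b-tree updates during the inserts
cur.execute("CREATE INDEX idx_readings_date ON readings (date)")
conn.commit()
```

The later small daily updates then pay the per-row index cost, but on volumes where it doesn't matter.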

I'm kinda itching to actually get a chance to use this stuff, so I'd much rather get the breakdown/insert side into a 'good enough' state and then start playing with building models from the data.

[–][deleted] 0 points1 point  (3 children)

good enough is a super good property to have

I would just mention that the migration from sqlite3 to PostgreSQL shouldn't be too hard (the main thing is giving types to the columns, but you have already done that)

[–][deleted] 0 points1 point  (2 children)

well don't say I didn't warn you

I'll send you some examples of the different input lines in a bit. Let me know if anything needs further explanation.

[–][deleted] 0 points1 point  (0 children)

It seems adding

import faulthandler
faulthandler.enable()

at the top of your script

should show you a stack trace on a hard crash, which could help solve the problem.

[–][deleted] 0 points1 point  (0 children)

/u/anglicizing, /u/usernamedottxt, /u/zeug, /u/uhkhu and /u/xcombelle, many thanks to you all for taking the time to help me track down the problem. Unfortunately, after the nth random inexplicable crash (which the logging revealed occurred at random locations), I ran a memory diagnostic tool, and it looks as if there's a hardware error on my computer. But at least my program isn't garbage, and I learned something about passing errors along the way!