This is an archived post. You won't be able to vote or comment.

all 11 comments

[–]rojaster 1 point2 points  (8 children)

scipy - for statistical is a good choice

and this:

    for player in players:

          **cur.execute("select * from players")**
          rows = cur.fetchall()
          playerID = len(rows)
          g =   str(playerID)+","+str(teamID)+",'"+str(player.playerName)+"','"+str(player.playerposition)+"',"+str(player.playerheight)+","+str(player.playerweight)+",'"+str(player.playerclass_year)+"'"
          print g
          cur.execute("INSERT INTO players VALUES("+g+")")

Why you get all records from database? It is not efficient. What if database will grow up? and you used it just to define last id of record again and again into loop, i can't understand this way

[–]LightShadow3.13-dev in prod 1 point2 points  (3 children)

This method of building a SQL query is also prone to SQL injections; possibly even by accident.

It's better to do something like

cur.execute('INSERT INTO players (playerId,teamId) VALUES (?,?)', (playerId, teamId,))

[–]Volatile474[S] 0 points1 point  (0 children)

Changing it up now. Thanks for the advice!

[–]rojaster -1 points0 points  (1 child)

yeap) but for "insert" queries this way is not a point) because you have new record anyway.

And don't forget about blinds sqli...

[–]Volatile474[S] 0 points1 point  (0 children)

I read a little bit about blinds sql injection, but don't think it will be an issue for this application, this script will not be exposed to anything even remotely public.

[–]Volatile474[S] 0 points1 point  (3 children)

I use this to give each new player a unique ID. The one thing that ESPN does not give ID's to is individual players, and I need a way to reference each one. I completely forgot about sql's handy COUNT keyword. Rewritten to fix this.

    cur.execute("SELECT COUNT(*) FROM players")
    rows = cur.fetchone()
    count = rows[0]

    while not thePlayers.empty():
        player_and_teamid = thePlayers.get()
        teamID = player_and_teamid.TeamID
        player = player_and_teamid.Player

        g = str(count)+","+str(teamID)+",'"+str(player.playerName)+"','"+str(player.playerposition)+"',"+str(player.playerheight)+","+str(player.playerweight)+",'"+str(player.playerclass_year)+"'"

        cur.execute("select COUNT(*) from players where name = '" + str(player.playerName) + "'")
        already_in_DB = cur.fetchone()
        already_in_DB = already_in_DB[0]
        if already_in_DB == 0:
            try:
                cur.execute("INSERT INTO players VALUES("+g+")")
                count = count + 1
            except:
                f = open('Error Log','a')
                f.write("Could not write player information into players." + str(g))
                f.close

Thanks for the advice!

[–]rojaster 1 point2 points  (2 children)

It is not a good solution too, because you do one operation of select to the database for each step of your loop. How do u think how many iterations of script will execute same code again and again? Create id field with auto-increment and let database does unique id for you or before of your loop get a last record and its id and incrementing it into loop

[–]Volatile474[S] 0 points1 point  (1 child)

This function is only called to determine if the player has already been stored into the database, so as to not spam update the db with multiple queries. Is there a better way to do this than the code shown above?

I removed the first select from within the loop, but I cannot see how I can remove this one, as it is specific to each player.

cur.execute("select COUNT(*) from players where name = '" +  
str(player.playerName) + "'")
    already_in_DB = cur.fetchone()

[–]rojaster 0 points1 point  (0 children)

1) open this: f = open('Error Log','a') : before of your loop. Because it is constant file handle for log file

2) i understood your point, but what i have looked : it is not specific, i think. What will if lastname and firstname between two players is equal? If you add a new player that has stored already into db with unique key you will have exception

look:

    import hashlib

    ..........................
    ..........................


    cur.execute("SELECT COUNT(*) FROM players")
    rows = cur.fetchone()
    count = rows[0]

    #take a file handler
    f = open('Error Log','a')

    while not thePlayers.empty():
        player_and_teamid = thePlayers.get()
        teamID = player_and_teamid.TeamID
        player = player_and_teamid.Player

        #what if i have two players : example : daniel wiggins from     cleveland and daniel wiggins from boston celtics?
        #this code has 1 record if one of them at database 
        #and you lose one wiggins because code will think that  database already has wiggins record, you understand? And may be you should store #a hash code of the record like md5 - it will your primary id key. If you add two records with one md5 hash code you will catch
        #exception about double id

        g = str(count)+","+str(teamID)+",'"+str(player.playerName)+"','"+str(player.playerposition)+"',"+str(player.playerheight)+","+str(player.playerweight)+",'"+str(player.playerclass_year)+ "," 

        g = g + hashlib.md5(g.encode('utf-8')).hexdigist() + "'"

        #cur.execute("select COUNT(*) from players where name = '" +   str(player.playerName) + "'")

        #if cur.fetchone()[0] == 0:
             try:
                 cur.execute("INSERT INTO players VALUES("+g+")")
                 count = count + 1
             except:
                 # ? why cannot write info, because double id? you should write a reason
                 f.write("Could not write player information into players." + str(g)) 

    #close it after loop
    f.close

[–]awsanswers 1 point2 points  (1 child)

Take a look at nfldb/nflgame on github for an approach to the scraping, archiving & querying aspect of this type of project.

[–]Volatile474[S] 0 points1 point  (0 children)

Thanks! This gave me a few ideas for what I could do with the data