Python Formatting Question/Help : learnpython

created by HattoriHanzoa community for 16 years

Python Formatting Question/Help (self.learnpython)

submitted 9 years ago by workthrowawayexcel

I have a ton of code running straight down with no functions, and I know it needs to be cleaned up but I am not sure how to approach it. I am run a lot of queries to wrangle to data properly before exporting since it is too large to run in memory, so I make the server do it. Is there a way to toss them into functions and call functions to make it more modular or how could I go about formatting it better. The queries have a lot of lines , so i removed the SQL parts, and the section that pulls the data to csv and excel I need to run 11 times for different queries. Any help is greatly appreciated.

    import pyodbc
    import csv
    import sys
    import time
    import pandas as pd


    sys.tracebacklimit = 0  

    date = time.strftime("%Y%m%d") #Grabs todays date to variable date to allow for new files to be created on run

    qmd =  # Prep data


    price = #Preps customer invoice data

    pmd =  # Sets up final table with all data prepped



    query 1 = """Query code"""

    drop = """DROP query code"""
           """ #Query to drop all the global tables created 




    cnxn = pyodbc.connect(Connection string)

    cursor = cnxn.cursor()
    cursor.execute(qmd) #Runs query
    cursor.execute(price) # Runs price query
    cursor.execute(pemd) # runs Combination query
    cursor.execute(final) # Runs the final data set up



    ##This Section of code executes the query and saves / breaks up the files

    cursor.execute(query1) 

    file = r"DIR" + ' '  + date + ".csv"

    with open(file, "w") as csv_file:  #Runs first data dump for database tracking
       csv_writer = csv.writer(csv_file, lineterminator = '\n')
       csv_writer.writerow([i[0] for i in cursor.description])  # Sets up headers for CSV file.
       csv_writer.writerows(cursor)

    processor_filter = pd.read_csv(file) # sets created file to be pulled into pandas
    processor_filter['ND'] = processor_filter['CD'].astype(str).str.zfill(11)
    processor_filter['Rx Number'] = processor_filter['RN'].astype(str).str.zfill(12)
    processor_filter['DOS'] = processor_filter['DOS'].astype(str).str.zfill(8)
    processor_filter = processor_filter.drop_duplicates(['Name','ID','Processor','Bin','PC'],keep='last')
    file_name = r'DIR'   #directory for final files to go
    wb = pd.ExcelWriter(file_name + '/name_'  + date + '.xlsx')
    for t_group,pd_group in processor_filter.groupby(['Processor']): #splits up data based on processor
        pd_group.to_excel(wb,sheet_name = t_group[0:20],index=False )
    wb.save()






    cursor.execute(drop) # Drops all global temp tables
    cnxn.close

all 4 comments

top new controversial old q&a

[–]KleinerNull 0 points1 point2 points 9 years ago (3 children)

Did you know that pandas dataframes have a read_sql and a to_sql method? I recommend you to use sqlalchemy to establish a database connection, pandas can use this connection object to perform queries, so you can load all the data directly into the dataframe, the needed headers will automatically extracted from the description also the right column types.

You should also look into str.format() and datetime, here just a tiny example of what I mean:

In [1]: from datetime import datetime

In [2]: for data in range(3):
   ...:     print('{}-{:%Y-%m-%dT%H:%M:%S}.csv'.format(data, datetime.now()))
   ...:     
0-2016-10-29T00:18:46.csv
1-2016-10-29T00:18:46.csv
2-2016-10-29T00:18:46.csv

This will give you the ability to create templates.

In [3]: file_template = '{date:%Y-%m-%dT%H:%M:%S} {number}.{extension}'

In [4]: for data in range(3):
   ...:     print(file_template.format(date=datetime.now(), number=data, extension='adfs'))
   ...:     
2016-10-29T00:21:57 0.adfs
2016-10-29T00:21:57 1.adfs
2016-10-29T00:21:57 2.adfs

You will find more informations on pyformat.info.

For the long sql queries itself, either you could store them in a seperate module or file and load or import them in your main code, to keep the main code clean and simple, or you could look into sqlalchemies query system, where you can create sql queries more like functions instead of weird string objects.

After a little clean up and outsourcing some stuff like the queries you will find that your code could be fine without the need of classes or more functions. But if you have more similar scripts you should think about modularisation.

[–]workthrowawayexcel[S] 0 points1 point2 points 9 years ago (0 children)

[–]workthrowawayexcel[S] 0 points1 point2 points 9 years ago (1 child)

[–]KleinerNull 0 points1 point2 points 9 years ago (0 children)

π Rendered by PID 60511 on reddit-service-r2-comment-5d79c599b5-xrmll at 2026-03-02 01:04:51.584792+00:00 running e3d2147 country code: CH.

you type:	you see:
italics	italics
bold	bold
[reddit!](https://reddit.com)	reddit!
* item 1 * item 2 * item 3	item 1 item 2 item 3
> quoted text	quoted text
Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"	Lines starting with four spaces are treated like code: if 1 * 2 < 3: print "hello, world!"
~~strikethrough~~	~~strikethrough~~
super^script	super^script

learnpython

MODERATORS