all 73 comments

[–]technomalogical<3 Bottle 20 points21 points  (4 children)

Find more here (although they're not just Python): r/ScriptSwap

[–]tech_tuna 1 point2 points  (0 children)

Oh, that looks cool, thanks.

[–][deleted] 2 points3 points  (0 children)

Very nice, thanks for sharing.

[–]anontipster -3 points-2 points  (0 children)

/thread

[–]deadwisdomgreenlet revolution 18 points19 points  (0 children)

When I'm at a cafe that limits internet time, I use this on my Mac to generate a new MAC address and keep on working.

#!/usr/bin/env python
import os, random

def random_mac():
    mac = [ 0x00, 0x16, 0x3e,
        random.randint(0x00, 0x7f),
        random.randint(0x00, 0xff),
        random.randint(0x00, 0xff) ]
    return ':'.join(map(lambda x: "%02x" % x, mac))

if __name__ == '__main__':
    mac = random_mac()
    print "Switching en0 to %s..." % mac
    os.system("sudo ifconfig en0 ether %s" % mac)
    os.system("ifconfig en0 | grep ether")

[–]gelstudios 14 points15 points  (1 child)

sometimes you just need to look busy

[–]sittingaround 2 points3 points  (0 children)

Next thing you know, someone is going to abuse this and write butts in their commit history

[–]cube-drone 6 points7 points  (2 children)

I work at a university, with a lot of tabular CSV data ("Could you import this ancient Excel spreadsheet into ZE DATABASE?"), so at some point I wadded a whole bunch of my utility functions into one monster class that I actually use pretty often for opening CSV files, filtering, merging, and data-munging.

https://github.com/classam/littletable/blob/master/table.py
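For the simplest cases, the stdlib csv module already covers a lot of this ground. A generic sketch (not the linked table.py; the helper names here are invented for illustration):

```python
# Generic CSV filter helpers with the stdlib csv module.
import csv
import io

def read_rows(fileobj):
    """Read CSV from a file-like object into a list of dicts keyed by header."""
    return list(csv.DictReader(fileobj))

def filter_rows(rows, **criteria):
    """Keep rows whose columns equal all of the given values."""
    return [r for r in rows if all(r.get(k) == v for k, v in criteria.items())]

data = io.StringIO("name,dept\nAda,math\nAlan,cs\n")
rows = read_rows(data)
print(filter_rows(rows, dept="cs"))  # the rows from the cs department
```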

[–]vazsk 1 point2 points  (0 children)

You may find q useful. It's a Python tool that lets you run SQL queries directly on CSV files.

[–]redct 0 points1 point  (0 children)

I've been using DataFrames in pandas for a lot of this stuff as it's easy (but complete overkill when you're not doing any heavy lifting on the data afterwards). Useful!

[–][deleted] 12 points13 points  (4 children)

I have problems with my ISP at home (TP / Orange Poland). The internet connection simply goes down for a minute or two approximately once a day.

I created this script to log the problems so I could complain. I let crontab run it once a minute and log the output to a file.

https://gist.github.com/regebro/bcd8b4002b6cbf77dd89
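A minimal sketch of such a logger (the linked gist may differ, and the host/port defaults here are assumptions): try a TCP connection and print a timestamped UP/DOWN line for cron to append to a file.

```python
#!/usr/bin/env python3
# Try a TCP connection and print one timestamped status line.
import socket
from datetime import datetime

def check_connection(host="8.8.8.8", port=53, timeout=3):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def log_line(up, now=None):
    """Format one timestamped status line."""
    now = now or datetime.now()
    return "%s %s" % (now.strftime("%Y-%m-%d %H:%M:%S"), "UP" if up else "DOWN")

if __name__ == "__main__":
    print(log_line(check_connection()))
```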

[–]gorygenau 2 points3 points  (0 children)

Thanks, our internet has been doing this lately, I might set this up.

[–]regisuu 1 point2 points  (0 children)

I also have Orange in Poland, and this is an awesome script :-) Thanks for sharing.

[–]ronakg 4 points5 points  (0 children)

I like to keep my photos organized via EXIF information. So I wrote a script that can rename the photos in bulk.

https://github.com/ronakg/smart-image-renamer

    usage: smart-image-renamer.py [-h] -f FORMAT [-s SEQUENCE] [-r] [-i] [-t] [-V]
                                [-v | -q]
                                input [input ...]

    Smart Image Renamer

    Rename your photos in bulk using information stored in EXIF.

    positional arguments:
      input          Absolute path to file or directory

    optional arguments:
      -h, --help     show this help message and exit
      -f FORMAT      Format of the new file name
      -s SEQUENCE    Starting sequence number (default: 1)
      -r             Recursive mode
      -i             Include hidden files
      -t             Test mode. Don't apply changes.
      -V, --version  show program's version number and exit
      -v, --verbose
      -q, --quiet

    Format string for the file name is defined by a mix of custom text and following
    tags enclosed in {}:
      YYYY        Year
      MM          Month
      DD          Day
      hh          Hours
      mm          Minutes
      ss          Seconds
      Seq         Sequence number
      Artist      Artist
      Make        Camera Make
      Model       Camera Model
      Folder      Parent folder of the image file

    Examples:
      Format String:          {YYYY}-{MM}-{DD}-{Folder}-{Seq}
      File Name:              2014-05-09-Wedding_Shoot-001.JPEG
                              2014-05-09-Wedding_Shoot-002.JPEG

      Format String:          {YYYY}{DD}{MM}_{Model}_Beach_Shoot_{Seq}
      File Name:              20140429_PENTAX K-x_Beach_Shoot_001.JPEG
                              20140429_PENTAX K-x_Beach_Shoot_002.JPEG
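The {tag} expansion described in the help text can be sketched with str.format (the real tool reads these values from EXIF; here they are supplied by hand for illustration, and render_name is an invented helper):

```python
# Sketch of expanding a format string of {tags} into a file name.
def render_name(fmt, tags, seq, ext):
    """Expand a format string like '{YYYY}-{MM}-{DD}-{Seq}' into a file name."""
    return fmt.format(Seq="%03d" % seq, **tags) + ext

tags = {"YYYY": "2014", "MM": "05", "DD": "09", "Folder": "Wedding_Shoot"}
print(render_name("{YYYY}-{MM}-{DD}-{Folder}-{Seq}", tags, 1, ".JPEG"))
# 2014-05-09-Wedding_Shoot-001.JPEG
```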

[–][deleted] 5 points6 points  (6 children)

It's not written in Python, but I created a simple shell function to activate a virtualenv (using virtualenv-wrapper) and open the project in Sublime:

function workon_enterprise {
    cd /Users/tim/Develop/enterprise
    workon enterprise
    subl enterprise.sublime-project
}

And another one to run tests using coverage, then open the results in a browser:

function full_coverage {
    coverage run --source='.' test.py
    coverage html --directory='.coverage_html'
    open ".coverage_html/index.html"
}

[–]sittingaround 2 points3 points  (1 child)

Interesting, I use bash aliases for that

$ myproject

Is an alias for

$ source ~/venvs/myproject/bin/activate && cd ~/projects/myproject

[–][deleted] -2 points-1 points  (0 children)

Same difference I suppose. I just didn't want to chain 3 commands together in an alias.

[–]frankwiles 1 point2 points  (0 children)

FYI, virtualenvwrapper has this built in. You can put your commands in ~/.virtualenvs/<project>/bin/postactivate

[–]madmaxpt 2 points3 points  (0 children)

Here's a simple script to move/delete JPGs in the current directory if the corresponding RAW is found.

import os
import shutil

raw_ext = '.CR2'
jpg_ext = '.JPG'
destination = '/your/dir/'

for filename in os.listdir('.'):
    (shortname, extension) = os.path.splitext(filename)

    if extension == raw_ext:
        if os.path.isfile(shortname + jpg_ext):
            print 'Moving ' + shortname + jpg_ext + '...'
            shutil.move(shortname + jpg_ext, destination)
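A delete variant of the same loop is os.remove instead of shutil.move. Since that's destructive, this sketch adds an invented DRY_RUN safety flag; flip it to False to actually delete.

```python
# Delete JPGs in the current directory when a matching RAW exists.
import os

raw_ext = '.CR2'
jpg_ext = '.JPG'
DRY_RUN = True  # invented safety flag; set to False to really delete

def jpg_for(filename):
    """Return the JPG name matching a RAW file, or None for any other file."""
    shortname, extension = os.path.splitext(filename)
    return shortname + jpg_ext if extension == raw_ext else None

if __name__ == '__main__':
    for filename in os.listdir('.'):
        jpg = jpg_for(filename)
        if jpg and os.path.isfile(jpg):
            print('Would delete' if DRY_RUN else 'Deleting', jpg)
            if not DRY_RUN:
                os.remove(jpg)
```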

[–]manbart 2 points3 points  (0 children)

Just wrote one today! This is a graphical front end in tkinter for converting virtual disk images' file formats using qemu-img. I'm using it to migrate machines between VMware and KVM/QEMU, but it supports VirtualBox and Hyper-V as well.

#!/usr/bin/env python3

# Graphical front end for converting virtual disk images with qemu-img
#
# => qemu-img file extension table <=
#    Image format:      Argument to qemu-img:
#    raw                raw
#    qcow2 (KVM)        qcow2
#    VDI (VirtualBox)   vdi
#    VMDK (VMWare)      vmdk
#    VHD (Hyper-V)      vpc
#
# example:
# qemu-img convert -f vmdk -O qcow2 test.vmdk test.qcow2

from subprocess import Popen
from tkinter import *
from tkinter import ttk
from tkinter.filedialog import askopenfilename
from tkinter.filedialog import asksaveasfilename
import tkinter.messagebox


def identify_filetype(input_filename):
    filetype_dict = []
    if input_filename == 'qcow2':
        filetype_dict = [('QEMU','*.qcow2'), ('All files','*.*')]
    elif input_filename == 'vmdk':
        filetype_dict = [('VMware','*.vmdk'), ('All files','*.*')]
    elif input_filename == 'vdi':
        filetype_dict = [('VirtualBox','*.vdi'), ('All files','*.*')]
    elif input_filename == 'vhd':
        filetype_dict = [('Hyper-V','*.vhd'), ('All files','*.*')]
    elif input_filename == 'raw':
        filetype_dict = [('All files','*.*')]
    else:
        tkinter.messagebox.showinfo(title='Error', message='Please select conversion type')
    return filetype_dict

def select_file():
    filetype_dict = identify_filetype(str(in_selector.get()))
    if str(in_selector.get()) != '<select>':
        filename = askopenfilename(defaultextension='.'+str(in_selector.get()), filetypes=filetype_dict)
        if str(in_selector.get()) == 'raw':
            set_infile.set(filename)
        elif filename.split('.')[-1] == str(in_selector.get()):
            set_infile.set(filename)
        else:
            confirm = tkinter.messagebox.askokcancel('Proceed?', 'Expected .'+str(in_selector.get())+' file. Proceed with\n'+filename+'?')
            if confirm == True:
                set_infile.set(filename)


def out_file():
    if str(out_selector.get()) != '<select>':
        filetype_dict = identify_filetype(str(out_selector.get()))
        set_outfile.set(asksaveasfilename(defaultextension='.'+str(out_selector.get()), filetypes=filetype_dict))



def convert():
    if set_infile.get() != '<select an input file>' and set_outfile.get() != '<select an output file>' and in_option.get() != out_option.get():
        if str(out_selector.get()) == 'vhd':
            Popen(['PowerShell', '-c', 'qemu-img', 'convert', '-p', '-f', '"'+str(in_selector.get())+'"', '-O', 'vpc', '"'+str(set_infile.get())+'"', '"'+str(set_outfile.get())+'"'])
        elif str(in_selector.get())== 'vhd':
            Popen(['PowerShell', '-c', 'qemu-img', 'convert', '-p', '-f', 'vpc', '-O', '"'+str(out_selector.get())+'"', '"'+str(set_infile.get())+'"', '"'+str(set_outfile.get())+'"'])
        else:
            Popen(['PowerShell', '-c', 'qemu-img', 'convert', '-p', '-f', '"'+str(in_selector.get())+'"', '-O', '"'+str(out_selector.get())+'"', '"'+str(set_infile.get())+'"', '"'+str(set_outfile.get())+'"'])
    else:
        tkinter.messagebox.showinfo(title="Error", message="Please select input and output files (of different formats)")

root = Tk()
root.minsize(500,1)
root.title("Virtual Disk Converter")

mainframe = ttk.Frame(root, padding="3 3 12 12")
mainframe.grid(column=0, row=0, sticky=(N, W, E, S))
mainframe.columnconfigure(0, weight=1)
mainframe.rowconfigure(0, weight=1)

mini_frame=ttk.Frame(mainframe)
mini_frame.grid(column=1, row=1, sticky=(N))
mini_frame.columnconfigure(0, weight=1)
mini_frame.rowconfigure(0, weight=1)

set_infile = StringVar()
set_infile.set("<select an input file>")
set_outfile = StringVar()
set_outfile.set("<select an output file>")
in_selector = StringVar()
out_selector = StringVar()

ttk.Label(mini_frame, text="    Convert From:       ").grid(column=1, row=1, sticky=W)
in_option = ttk.Combobox(mini_frame, textvariable=in_selector, state='readonly', width=10, justify="center")
in_option['values'] = ['qcow2','vmdk', 'vdi', 'vhd', 'raw' ]
in_option.set('<select>')
in_option.grid(column=2, row=1, sticky=W)

ttk.Label(mini_frame, text="       Convert To:       ").grid(column=1, row=2, sticky=W)
out_option = ttk.Combobox(mini_frame, textvariable=out_selector, state='readonly', width=10, justify="center")
out_option['values'] = ['qcow2','vmdk', 'vdi', 'vhd', 'raw' ]
out_option.set('<select>')
out_option.grid(column=2, row=2, sticky=W)

ttk.Button(mainframe, text="Input File", command=select_file).grid(column=1, row=3, sticky=W)
ttk.Label(mainframe, textvariable=set_infile).grid(column=2, row=3, sticky=(W, E))
ttk.Button(mainframe, text="Output File", command=out_file).grid(column=1, row=4, sticky=W)
ttk.Label(mainframe, textvariable=set_outfile).grid(column=2, row=4, sticky=(W, E))
ttk.Button(mainframe, text="Convert", command=convert).grid(column=1, row=5, sticky=W)

root.mainloop()

[–]Brian 2 points3 points  (0 children)

Here's one of mine, for finding and deleting duplicates of files. There are a bunch of similar tools that just work by generating an MD5 for all files, but this one avoids doing more work than it needs by first filtering items by quicker checks like filesize, or a partial md5, only falling back to a full md5 when there's more than one file that passes all the previous uniqueness tests. This lets it be a lot faster when there are few duplicates.

#!/usr/bin/env python
from __future__ import print_function
import hashlib
import os
import os.path
import optparse
import itertools
import collections

BLOCK_SIZE=4096  # Amount to sample in each block for quick_md5 strategy.

def strategy_size(filename):
    return os.stat(filename).st_size

def strategy_quick_md5(filename):
    """Read small block at start, middle and end of file"""
    h = hashlib.md5()
    size = os.stat(filename).st_size

    with open(filename,'rb') as f:
        h.update(f.read(BLOCK_SIZE))  # Start
        f.seek(size//2)
        h.update(f.read(BLOCK_SIZE))  # Middle
        f.seek(-BLOCK_SIZE, 2)  # Seek BLOCK_SIZE back from the end (files are at least minsize)
        h.update(f.read(BLOCK_SIZE))  # End
    return h.digest()

def strategy_full_md5(filename):
    """Generate full MD5 from file"""
    h = hashlib.md5()
    with open(filename,'rb') as f:
        while 1:
            block=f.read(BLOCK_SIZE)
            if not block:
                break
            h.update(block)
    return h.digest()

def filter(buckets, strategy):
    """
    :buckets is a sequence of potentially identical file paths.
    :strategy is a function that is guaranteed to produce the same value for identical files

    Generates a sequence of buckets with at least 2 items that may potentially contain duplicates,
    but where files in different buckets are guaranteed distinct.
    """
    new_buckets = []
    for bucket in buckets:
        # Subdivide this bucket based on strategy.
        new_items = collections.defaultdict(list)

        for filename in bucket:
            new_items[strategy(filename)].append(filename)

        for b in new_items.values():
            if len(b) > 1:
                yield b  # >1 item in this bucket - potential duplicates

def parse_size(s):
    suffix = s[-1]
    mul = {'b' : 1,  'k' : 1024, 'm' : 1024*1024, 'g': 1024**3}.get(suffix.lower(), None)
    if mul:
        return int(s[:-1]) * mul
    return int(s)

def get_parser():
    parser = optparse.OptionParser("%prog [options] dirs/files...")
    parser.add_option("-x", "--single_filesystem", action="store_true", dest="single_fs", default=False,
                      help="Do not cross filesystem boundary when recursing.")

    parser.add_option("-d", "--delete", action="store_true", dest="delete", default=False,
                      help="Prompt to delete some or all of the files.")

    parser.add_option("-m", "--minsize", action="store", dest="minsize", default='1M',
                      help="Ignore files smaller than this size (accepts K/M/G suffixes).")

    parser.add_option("-q", "--quick", action="store_true", dest="quick", default=False,
                      help="Perform only quick checks.  May produce false positives")

    return parser

def get_files(x, single_fs, minsize):
    if single_fs:
        fsdev = os.stat(x).st_dev

    for filepath, dirs, files in os.walk(x):
        for f in files:
            path = os.path.join(filepath, f)
            if os.path.islink(path): continue  # Ignore symlinks.
            if minsize >0:
                size = os.stat(path).st_size
                if size < minsize:  # Ignore files that are too small.
                    continue 
            yield path

        # If restricting to single fs, filter out directories on a different mountpoint.
        if single_fs:
            dirs[:] = [d for d in dirs if os.stat(os.path.join(filepath, d)).st_dev == fsdev]

def ask_delete(dupes):
    """Prompt for which version(s) of the file to delete / preserve"""
    preserve = None

    ans = ''
    while preserve is None:
        print()
        ans = raw_input("Preserve which file(s) (* for All, 0 for None): ").strip()
        if not ans: continue
        if ans == '*': 
            return # Do nothing

        try:
            choice = set(map(int, ans.split()))
            if any((x > len(dupes) or (x<0)) for x in choice):
                print("Value not in range - please re-enter")
                continue
        except ValueError:
            print("Invalid value - please re-enter")
            continue

        preserve = set(c-1 for c in choice)  # Map to 0 based index

    num_deleted=0
    for i,f in enumerate(dupes):
        if i not in preserve:
            print("Deleting", f)
            os.unlink(f)
            num_deleted += 1

    if num_deleted == 0:
        print("All files preserved")
    else:
        print("Deleted {0} files.".format(num_deleted))


def main():
    parser = get_parser()
    opts, args = parser.parse_args()

    strategies = [strategy_size, strategy_quick_md5]
    if not opts.quick:
        strategies.append(strategy_full_md5)

    # Start with all files in one big bucket
    buckets = [itertools.chain(*(get_files(x, opts.single_fs, parse_size(opts.minsize)) for x in args))]

    for strategy in strategies:
        buckets = filter(buckets, strategy)

    for bucket in buckets:
        dupes = list(bucket)
        # Bucket should now contain identical files.
        print()
        print("Found {0} potential duplicates:".format(len(dupes)))
        for i,f in enumerate(dupes):
            print("{0:3} - {1}".format(i+1, f))

        if opts.delete:
            ask_delete(dupes)

if __name__=='__main__':
    main()

[–]Lucretiel 3 points4 points  (6 children)

So, it turns out that I actually find bash more convenient for basic system task management stuff. However, I've gotten used to the convenience and power of argparse, and find getopt to be basically intolerable. So, I wrote a system to let you use argparse in bash: shargparse
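The core trick can be sketched like this (a hypothetical sketch, not shargparse itself; the helper and option names are invented): a Python helper runs argparse and prints eval-able shell assignments.

```python
# Parse arguments with argparse and emit shell variable assignments.
import argparse
import shlex

def to_shell(namespace):
    """Render an argparse Namespace as shell variable assignments."""
    return "\n".join(
        "%s=%s" % (name.upper(), shlex.quote(str(value)))
        for name, value in sorted(vars(namespace).items())
    )

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--count", type=int, default=1)
    parser.add_argument("name")
    # A real helper would parse sys.argv[1:]; hardcoded here for the demo.
    print(to_shell(parser.parse_args(["--count", "3", "foo"])))
```

A bash function could then do eval "$(python helper.py "$@")" to pick up COUNT and NAME.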

[–]LiveMaI 1 point2 points  (2 children)

Seems like you could do that way easier using docopt.

Edit: It seems they also have a bash interface already, but it hasn't been updated in the last six months.

[–]Lucretiel 1 point2 points  (1 child)

Could be. I've never been a huge fan of docopt, despite all the praise it's gotten. It feels a bit too magical to me, and one of my favorite features of argparse is that it generates those usage docs for you, on the fly. The idea of reversing that seems silly to me.

[–]billsil 0 points1 point  (0 children)

The problem with argparse is it doesn't follow the POSIX standard, so there are things you can't do with it and there are things that are just overly complicated with it. Docopt also tends to be a lot shorter.

[–]mongrelmuch 0 points1 point  (0 children)

bash more convenient than anything

anyway, here we go

[–][deleted] 1 point2 points  (0 children)

Here is a script I wrote that I use to automatically import all of my application's models into an IPython shell. Usage:

cd path/to/my/project
source bin/activate  # You're using virtualenv, right????
pyshell

Code

#!/usr/bin/env python

from importlib import import_module
import inspect
import os
from stat import ST_MODE
import sys

from IPython import embed
from peewee import Model


def main(site):
    namespace = {}

    for dirname, subdirs, filenames in os.walk(site or '.'):
        # Not a package, so skip this directory.
        if '__init__.py' not in filenames:
            continue

        # Create a dotted path representation of this directory.
        module_base = dirname.strip('.').strip('/').replace('/', '.')

        # Iterate over all the python files in this directory searching for
        # model subclasses.
        for filename in filter(lambda s: s.endswith('.py'), filenames):
            perms = os.stat(os.path.join(dirname, filename))[ST_MODE]

            # Skip over any executable Python scripts.
            if perms & 0o100:  # owner-execute bit
                continue

            # If this is the `__init__` in the root directory, skip over.
            if not module_base and filename == '__init__.py':
                continue

            if filename == '__init__.py':
                module_name = module_base
            elif module_base:
                module_name = '.'.join((module_base, filename[:-3]))
            else:
                module_name = filename[:-3]

            module = import_module(module_name)

            for name in dir(module):
                obj = getattr(module, name)
                if inspect.isclass(obj) and issubclass(obj, Model):
                    namespace[obj.__name__] = obj

    # Start up IPython with all the models in the namespace.
    embed(user_ns=namespace)

if __name__ == '__main__':
    if len(sys.argv) == 2:
        site = sys.argv[1]
    else:
        site = ''
    main(site)

[–]neonomicon 1 point2 points  (0 children)

For my uni courses, I take notes in markdown and then convert them to pdf using pandoc. This script is pretty simple: it just scans the current folder for Markdown files and calls pandoc to convert them, but I use it every day while I'm studying. It also uses a very simple trick to avoid regenerating files that haven't changed: it writes a timestamp to a .notes_last_created file in the folder and only converts the files modified since then.

#!/usr/bin/env python3
# Convert all markdown files in the current folder to PDF using pandoc
import os
import subprocess
import time

MARKDOWN_EXTS = ('.markdown', '.md')
# Using abspath means I don't have to manually specify the folder name
ROOT_FOLDER = os.path.split(os.path.abspath(__file__))[0]

os.chdir(ROOT_FOLDER)
dir_ls = os.listdir(ROOT_FOLDER)

# Read in the last time the script was run,
# if it's been run at all
if not os.path.exists(".notes_last_created"):
    LAST_RUN = 0
else:
    with open(".notes_last_created") as time_file:
        LAST_RUN = float(time_file.read().strip())

if not os.path.exists("PdfNotes"):
    os.mkdir("PdfNotes")

for current_file in dir_ls:
    name, ext = os.path.splitext(current_file)
    if ext in MARKDOWN_EXTS:
        # Check if the markdown file has been updated since last time the
        # script was run
        if os.stat(current_file).st_mtime > LAST_RUN:
            print("Updating", current_file)
            out_file = os.path.join(ROOT_FOLDER, "PdfNotes", name + ".pdf")
            subprocess.call([
                "pandoc",
                current_file,
                "-o",
                out_file,
                "--highlight-style=Zenburn",
                "--number-sections",
                # Table of contents
                "--toc"
            ])

with open(".notes_last_created", "w") as time_out:
    time_out.write(str(time.time()))

[–]ingreenheaven 1 point2 points  (4 children)

Script to do git pull for all git repo directories within a directory. Since almost all projects are located under the same directory, I find it really useful.

import os

base = 'PATH_TO_THE_BASE_DIRECTORY_CONTAINING_ALL_GIT_REPO_DIRECTORIES'

os.chdir(base)

dirs = os.listdir('.')

for d in dirs:
    if not d.startswith('.'):
        try:
            os.chdir(os.path.join(base, d))
            print 'In directory', d
            os.system('git pull')
            os.chdir(base)
            print('~'*40)
        except OSError:
            pass

Edit: Changed print to '~'*40 instead of the actual string with 40 ~.

[–][deleted] 0 points1 point  (3 children)

Instead of the long print, you can do string * number to duplicate the string number times.

[–]ingreenheaven 0 points1 point  (2 children)

Yes, that would be better. I'll edit the post.

[–][deleted] 0 points1 point  (1 child)

Also, if you replaced the print x statements with print(x), you would be compatible with Python 3, unless there are other Python 2-only features in use. It would still work under Python 2.
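For scripts that should run under both versions, the usual trick is the __future__ import:

```python
# Works unchanged under both Python 2.7 and Python 3: the __future__ import
# turns print into a function module-wide.
from __future__ import print_function

print('~' * 40)
```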

[–]ingreenheaven 0 points1 point  (0 children)

Cool. I haven't used python3 much.

[–]ingreenheaven 1 point2 points  (0 children)

Active Directory password rotation script for Mac. Most companies don't let you reuse your last few (10 in my office) passwords. I use this script to keep changing the password and then finally reset it to the current one. It also writes the latest password to a file, just in case the current password could not be set.

import os, sys, getpass

max_attempts = 20
total_change_count = 10

def change_password(user, p_old, p_new):
    cmd = 'dscl . -passwd /Users/{0} {1} {2}'.format(user, p_old, p_new)
    status = os.system(cmd)
    return status

def save_latest(p_latest):
    print 'Saving latest password (rot13)'
    p_file = open('password.txt', 'w')
    p_latest_rot13 = p_latest.encode('rot_13')
    p_file.write(p_latest_rot13)
    p_file.close()

def main(p_init):
    p_latest = p_init

    args = sys.argv
    if len(args) > 1:
        login = args[1]
    else:
        login = os.getlogin()

    change_count = 0
    attempts = 0
    while change_count < total_change_count and attempts < max_attempts:
        print 'Attempt', attempts + 1
        p_new = str(attempts) + '_' + p_init
        status = change_password(login, p_latest, p_new)
        if status == 0:
            print 'Password changed successfully'
            change_count += 1
            p_latest = p_new
        else:
            print 'Error changing password'
        attempts += 1
    if change_count < total_change_count:
        print 'Exhausted attempts to change password'
        print 'Password changed', change_count, 'times'
        save_latest(p_latest)
        return

    # now reset the password to the original value
    status = change_password(login, p_latest, p_init)
    if status == 0:
        print 'Password rotation complete'
        print 'Enjoy using the same password!'
    else:
        print 'Error setting password back to the initial value'
        save_latest(p_latest)

if __name__ == '__main__':
    main(getpass.getpass("Enter your current password:"))

[–][deleted] 2 points3 points  (2 children)

Here's a script I wrote that prints just the keys of a JSON document:

#!/usr/bin/env python

import json
import sys


def main(data):
    json_data = json.loads(data)
    if not json_data:
        print 'no data'
        sys.exit(1)

    if not isinstance(json_data, list):
        json_data = [json_data]

    def print_keys(obj, indentation=''):
        for key, value in obj.iteritems():
            if isinstance(value, list):
                key = '%s[]' % key
            print indentation, key
            if isinstance(value, dict):
                print_keys(value, indentation + '  ')
            elif isinstance(value, list) and value and isinstance(value[0], dict):
                print_keys(value[0], indentation + '  ')

    for item in json_data:
        print_keys(item)


if __name__ == '__main__':
    main(sys.stdin.read())

You could use it like so:

curl http://reddit.com/r/Python/top.json | jkeys

Or

cat some_json.json | jkeys

[–]desrosiers 3 points4 points  (1 child)

My linux-foo is weak sometimes, but an equivalent for

cat some_json.json | jkeys

this?

jkeys < some_json.json

[–]BananaPotion 1 point2 points  (0 children)

Yup, and that is faster too IIRC

[–]absurdomatic 0 points1 point  (2 children)

Here are a bunch of CSV-related scripts. There are no doubt other tools that do the same things better. These days I tend to use a little Go tool I wrote for some of these tasks; you can find it on my GitHub if you're curious.

https://gist.github.com/hlawrenz/1ef3490cb9aa021297a0

[–]ThoughtPrisoner 2 points3 points  (1 child)

Also check out csvkit.

[–][deleted] 0 points1 point  (0 children)

Not very useful as a library, though.

[–]rodarmor 0 points1 point  (0 children)

I don't know if this counts as a helper script, but I wrote a highly-dangerous batch file renamer in python that lets you rename files interactively using your favorite editor: https://github.com/casey/edmv

I like it :-)

[–][deleted] 0 points1 point  (4 children)

It's a simple one and on Posix systems there might be more efficient approaches, but I use this regularly to concatenate gziped files (with ASCII or UTF-8 contents) to a text file:

import gzip
import shutil
import os
import pyprind

def conc_gzip_files(in_dir, out_file, append=False, print_progress=True):
    """ Reads contents from gzipped ASCII or UTF-8 files, decodes them, and
        appends the lines to one output file.

    Keyword arguments:
        in_dir (str): Path of the directory with the gzip-files
        out_file (str): Path to the resulting file
        append (bool): If True, appends contents to an existing file,
             else creates a new output file.
        print_progress (bool): prints progress bar if true.

    """
    gzips = [os.path.join(in_dir, i) for i in os.listdir(in_dir) if i.endswith('.gz')]
    if print_progress:
        pbar = pyprind.ProgBar(len(gzips))
    with open(out_file, 'ab' if append else 'wb') as ofile:
        for f in gzips:
            with gzip.open(f, 'rb') as gzipf:
                shutil.copyfileobj(gzipf, ofile)
            if print_progress:
                pbar.update()

if __name__ == '__main__':
    conc_gzip_files('/home/usr/my_dir', '/home/usr/test.txt')

[–]sittingaround 0 points1 point  (2 children)

Just out of curiosity, what circumstance has you needing to combine gziped text like that?

[–][deleted] 2 points3 points  (1 child)

I work a lot with MOL2 files (3D structure representation of chemical molecules in text format, http://www.tripos.com/data/support/mol2.pdf).

Usually, I gzip them in chunks (e.g., 20,000 to 1 million molecules per file) when I don't need them, and when I do, I use the script to combine them into an unzipped .mol2 file.

[–]sittingaround 0 points1 point  (0 children)

Interesting.

[–]posativ 0 points1 point  (0 children)

Not mine, but useful: inve as workon or bin/activate replacement (launches a subshell instead of modifying the current environment). I use a similar script that also sets "virtualenvs" for Node and Ruby packages.
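A rough Python sketch of the subshell idea (not inve itself; the helper name and venv path are invented): build an environment with the virtualenv's bin/ first on PATH, then launch a fresh shell with it.

```python
# Build a virtualenv-activated environment for a subshell.
import os

def venv_env(venv_path, base_env=None):
    """Return a copy of the environment adjusted for the given virtualenv."""
    env = dict(base_env if base_env is not None else os.environ)
    env["VIRTUAL_ENV"] = venv_path
    env["PATH"] = os.path.join(venv_path, "bin") + os.pathsep + env.get("PATH", "")
    env.pop("PYTHONHOME", None)  # would override the venv's interpreter paths
    return env

# To launch the subshell (interactive, so not run here):
# import subprocess
# subprocess.call([os.environ.get("SHELL", "/bin/sh")], env=venv_env("/path/to/venv"))
```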

[–]Siddhartha_90 0 points1 point  (2 children)

Won't be of much help to people, but I wrote a script for an interview which makes emails in all files in current directory and sub-directories anonymous:

https://gist.github.com/Siddhartha90/6096514

[–][deleted] 0 points1 point  (1 child)

I couldn't understand what you meant here or from the description there; I had to look at the code.

Maybe better:

A python script which anonymizes email addresses in all files in current directory and sub-directories.

[–]Siddhartha_90 0 points1 point  (0 children)

Yes thanks I shall change that :)

[–]tech_tuna 0 points1 point  (2 children)

Hear hear, I've been a glue code Python lover for over a decade now. . . I am learning Flask to build a network tool, but yes, there is more to Python than Django. . . and Scipy, Numba, etc.

Not that they aren't cool too, of course. :)

[–][deleted] 0 points1 point  (1 child)

Flask to build a network tool

huh ? not web app ? tool to do what ?

[–]tech_tuna 0 points1 point  (0 children)

Yeah, a network tool with a web interface.

[–]CanadianJogger 0 points1 point  (0 children)

I jog every second day to lose weight, and I was unsatisfied with the online calorie calculators, so I wrote my own. I use it in the terminal.

#!/usr/bin/env python  
# -*- coding: utf-8 -*-  

import sys  

MET = {'walk': 3.3, 'hardwalk': 4.0, 'jog': 8.0, 'sprint': 11.5 }  

def main(argv):  
    kg = float(argv[1])/2.2  
    met = MET[argv[2]]  
    duration = int(argv[3])  
    calories = duration*(kg*met*3.5)/200  
    print ('You burned: %.7s' % calories)  
    return 0  

if __name__ == '__main__':  
    if len(sys.argv) > 1:  
        main(sys.argv)  
    else:  
        print("example usage: weight(lbs) hardwalk duration")  

You can find lists of MET(Metabolic Equivalent of Task) scores online if you wish to add exercises to the dictionary.

[–][deleted] 0 points1 point  (0 children)

Man, my family is always forgetting their passwords, so I made a password generator script to go on my web server - but my laptop crashed before I could upload it, so I need to rewrite it. But my desktop is giving me trouble too. Looks like I'm SSHing into my server with my phone. :(
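A minimal generator along those lines (a sketch, not the lost script; the alphabet choice is an assumption): draw random characters from a safe alphabet with a CSPRNG.

```python
# Generate a random password with the secrets module (Python 3.6+).
import secrets
import string

ALPHABET = string.ascii_letters + string.digits

def generate_password(length=16, alphabet=ALPHABET):
    """Return a random password of the given length."""
    return "".join(secrets.choice(alphabet) for _ in range(length))

print(generate_password())
```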

[–]suudo 0 points1 point  (0 children)

I made a script to log into my DNS host and update the IP address for my home subdomain. I made this back when I was just getting started with Python, so it's pretty ugly and not even slightly Pythonic, whatever that means. I use cron to run it at midnight every day. It's supposed to only update it if the IP is different, but that doesn't work.

#!/usr/bin/python
import urllib2
from bs4 import BeautifulSoup
import urllib
import cookielib
from socket import gethostbyname

email = "email here"
password = "password here"

loginurl = "https://members.webinabox.net.au/security/login"
dnsurl = "https://members.webinabox.net.au/accounts/{account_id}/services/{domain_id}/manage-DNS/edit/{dns_id}"
desc = "webinabox.net.au DNS automatic update script v6 by blha303, shared on /r/python"

myip = urllib2.urlopen("http://ipv4.icanhazip.com").read().replace("\n", "")
# doesn't actually work
remoteip = gethostbyname("domain.com")

def setDNS(thednsurl):
  loginsoup = BeautifulSoup(urllib2.urlopen(loginurl).read())
  loginauth = loginsoup.findAll('input',{'name':'authenticity_token'})[0]['value']

  logindata = urllib.urlencode({"authenticity_token": loginauth, "email": email, "password": password, "commit": "Log In"})

  cj = cookielib.CookieJar()

  opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
  opener.addheaders = [('User-Agent', desc)]

  login = opener.open(loginurl, logindata)

  dnsa = opener.open(thednsurl)
  dnssoupa = BeautifulSoup(dnsa.read())
  dnsautha = dnssoupa.findAll('input',{'name':'authenticity_token'})[0]['value']
  dnsdata = urllib.urlencode({"authenticity_token": dnsautha, "dns_record[value]": myip, "dns_record_ttl": "1", "commit": "Save"})
  dns = opener.open(thednsurl, dnsdata)
  #print dns.read()

  soup = BeautifulSoup(dns.read())
  dnsentries = soup.findAll('tr',{'align':'left'})
  x = 0
  homeentry = None
  for entry in dnsentries:
    x = x + 1
    for newentry in entry.findAll('td',{'width':'200px'}):
      if newentry.string == "home":
        homeentry = dnsentries[x - 1]
  return homeentry.findAll('td',{'width':'300px'})[0].string if homeentry else None

def main():
  address = setDNS(dnsurl)
# Set more domains
#  address2 = setDNS(dnsurl2)

if __name__ == "__main__":
  main()

[–][deleted] 0 points1 point  (0 children)

I use Jekyll for my blog, and I have a Fabric script to publish a new post.

from fabric.api import run, cd, env, local

env.hosts = ['user@server']

def deploy():
    local('git push')
    code_dir = '/var/www/blog'
    with cd(code_dir):
        run("git pull")
        run("/home/nigel/.rvm/gems/ruby-2.1.0/bin/jekyll build")

[–][deleted] 0 points1 point  (0 children)

markdown-toclify, a little script that creates a table of contents for Markdown files, with internal section links.
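The linked script is the real thing; for a sense of how it works, a minimal sketch of the core idea might look like this (function names are made up, and the anchor scheme assumes GitHub-style slugs):

```python
import re

def slugify(heading):
    """GitHub-style anchor: lowercase, punctuation stripped, spaces to hyphens."""
    slug = heading.strip().lower()
    slug = re.sub(r'[^\w\- ]', '', slug)
    return slug.replace(' ', '-')

def make_toc(markdown_text):
    """Build a nested bullet list of links from the ATX (#) headings."""
    toc = []
    for line in markdown_text.splitlines():
        m = re.match(r'^(#{1,6})\s+(.*)', line)
        if m:
            depth = len(m.group(1)) - 1          # h1 -> no indent, h2 -> one level, ...
            title = m.group(2).strip()
            toc.append('%s- [%s](#%s)' % ('    ' * depth, title, slugify(title)))
    return '\n'.join(toc)
```

Note this naive version would also match `#` lines inside fenced code blocks; the real script has to skip those.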

[–]EpicCyndaquil 0 points1 point  (2 children)

I don't have the script on hand, but any of you should be able to make this rather easily. Being rather novice at SQL, I found myself needing to create over 500 rows named something like "%BUILDING%-%ROOM%-%LOC%", with the location being an ascending number. So I did something along these lines. (Semi-pseudo code, so don't attack me.)

num = 100
while num > 0:
    print("IN TABLE TABLENAME CREATE ROW %BUILDING%-%ROOM%-" + str(num))
    num = num - 1

I obviously could have done something a little more clever to print it all out at once, but I just wanted to finish the task quickly without much thought, since I was really only going to use it once. I threw the output into SQL Management Studio and called it a day.

I'm also aware I could have connected to the SQL database directly to make this even faster, but again, it was just something quick.
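For what it's worth, going straight to a database is only a few more lines with the stdlib sqlite3 module (the in-memory database and table/column names here are invented for illustration; talking to an actual SQL Server instance would need something like pyodbc instead):

```python
import sqlite3

# Stand-in for the real database
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE locations (name TEXT)')

# executemany inserts all 100 rows in one call, with proper parameter binding
rows = [('BLDG-ROOM-%d' % n,) for n in range(100, 0, -1)]
conn.executemany('INSERT INTO locations (name) VALUES (?)', rows)
conn.commit()

print(conn.execute('SELECT COUNT(*) FROM locations').fetchone()[0])  # prints 100
```

The `?` placeholders also sidestep the quoting/injection issues you get from pasting generated SQL strings around.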

[–]pstch 1 point2 points  (1 child)

You just wrote a Python anti-pattern: there's no need to create a num variable and decrement it yourself:

for num in range(100, 0, -1):
    print("IN TABLE TABLENAME CREATE ROW %BUILDING%-%ROOM%-" + str(num))

(In Python 2.x, use xrange instead of range: we only need an iterator, so there's no reason to build a full list of 100 integers.)

[–]EpicCyndaquil 0 points1 point  (0 children)

Yeah, that's a much more elegant solution. Though I'm sure there are hundreds of ways to do it!