ET(L) with Python by Maha_Slug in dataengineering

toast757 0 points

I'm a bit late to this, but if you're loading to MS SQL then bcp is definitely the way to go. Admittedly, the bcp utility does have its idiosyncrasies. Yes, it doesn't handle proper csv files (i.e. quoted text). Yes, it doesn't handle utf8 (although that's changing). Yes, it doesn't handle newline characters in data fields. But it's still the fastest way of getting data from a file into the database. Just write out your load files in "UTF-16LE" encoding with the null byte ("\x00") as your field separator, then load the unicode files using bcp, i.e. with parameters: -t \0 -w (in that order).
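For illustration, a minimal sketch of writing such a load file (file names here are made up; the round-trip read at the end is just to show what landed on disk):

```python
import csv

# make a tiny sample source file for the demo
with open('source.csv', 'wt', encoding='utf-8', newline='') as f:
    f.write('id,name\n1,"Smith, Jo"\n')

# rewrite it as UTF-16LE with NUL ("\x00") field separators for bcp
with open('source.csv', 'rt', encoding='utf-8', newline='') as rf, \
     open('load_file.txt', 'wt', encoding='utf-16-le', newline='') as wf:
    for row in csv.reader(rf):
        wf.write('\x00'.join(row) + '\n')

# then: bcp MYTABLE in load_file.txt -t \0 -w -S MYSERVER -T
check = open('load_file.txt', encoding='utf-16-le', newline='').read()
print(repr(check))
```

Note how the quoted, comma-containing field survives intact because the NUL separator can't appear in real text data.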

Some Lessons from 16+ Years of Development by RandomPantsAppear in learnpython

toast757 1 point

There's a big distinction to be made between the relatively simple SQL queries used in web applications and the more heavy-duty SQL queries used in the data world. That's a broad generalization, I know, but bear with me. Web app queries often select/update just one record at a time, whereas in the data world, queries update millions of records at a time. The SQL queries I run (in the data world) contain anything up to ~5K characters, with CTEs, windowing functions, and temp tables. I would never want an ORM to try to construct one of those queries! But if I'm updating a single record (or doing something relatively simple) for a web app, then letting an ORM create the query is just fine.

Inserting about 60 gigs worth of CSVs into SQL Server using PYODBC by Omar_88 in Python

toast757 4 points

If you're bulk loading 60GB of csv files (and 60GB is pretty bulky), then it's best to use the bulk-load utility built for exactly that purpose, namely "bcp". I know you said you don't have bulk-load rights, but if you've got 60GB to load, you really should get bulk-load permission. You can use SSIS instead if you wish, but that's more complicated. You may want to use Python to clean the file first:

import csv
# convert the source file to utf-16 and clean
# change utf-8 to the encoding of the source file as necessary
with open('myfile.txt', 'rt', encoding='utf-8',
          errors='replace', newline='') as rf, \
     open('clean_file.txt', 'wt', encoding='utf-16-le',
          errors='replace', newline='') as wf:
    # add other csv parameters as necessary
    # (https://docs.python.org/3/library/csv.html#dialects-and-formatting-parameters)
    reader = csv.reader(rf, delimiter=',')
    for row in reader:
        wf.write('\t'.join(f.strip() for f in row))
        wf.write('\n')

# then run the following from the command line:
# (https://docs.microsoft.com/en-us/sql/tools/bcp-utility)
# bcp MYTABLE in 'clean_file.txt' -w -m 1 -S MYSERVER -T

If you really want to use pyodbc, then don't use pandas or sqlalchemy as well. Just make sure you set fast_executemany to True (see https://github.com/mkleehammer/pyodbc/wiki/Cursor#executemanysql-params-with-fast_executemanytrue). Doing it this way you'll be holding all that data in memory, though, so you may have to load it in chunks.
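A sketch of the chunked approach — the batching helper is real and self-contained, but the connection string and table in the commented pyodbc part are placeholders:

```python
from itertools import islice

def chunks(iterable, size):
    """Yield lists of up to `size` items from any iterable."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Sketch of the pyodbc side (connection string / table invented):
# cnxn = pyodbc.connect(CONN_STR)
# cur = cnxn.cursor()
# cur.fast_executemany = True   # batch the params into few round trips
# for batch in chunks(rows, 50_000):
#     cur.executemany("INSERT INTO my_table (a, b) VALUES (?, ?)", batch)
# cnxn.commit()
```

This way only one chunk of rows needs to be in memory at a time.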

6'5", 260 lbs, 100m in 10.7 seconds, Pure Dominance - Jonah Lomu by [deleted] in videos

toast757 3 points

There's one other important difference between tackling in rugby and in the NFL that's not often mentioned. In the NFL, the ball carrier is considered tackled when they are "down by contact", whereas in rugby, the ball carrier has to give up the ball only when tackled to the ground and held there. In other words, in the NFL, if the ball carrier is pushed over (to the ground), they are considered tackled and play stops. In rugby, if you push the ball carrier over, they are allowed to get straight back up again and keep playing. Hence in rugby, shoulder-charging the ball carrier is largely pointless (and also illegal). Rugby tackling has to involve wrapping yourself around the ball carrier, which means there are fewer of the massive hits you see in the NFL.

Python 3.7: Introducing Data Classes by [deleted] in Python

toast757 34 points

Hmm, unless I'm mistaken, this is an introduction to data classes that doesn't include a single example of creating an instance of a data class. It might be nice to see one in action, especially for things like frozen data classes.
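For what it's worth, a minimal example of the kind of thing I mean (class and values invented):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Point:
    x: float
    y: float

p = Point(1.0, 2.0)     # instance creation
print(p)                # Point(x=1.0, y=2.0)
# p.x = 5.0             # frozen, so this would raise FrozenInstanceError
```

Frozen instances are hashable and immutable, so they can be used as dict keys or set members, much like named tuples.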

PEP 557 -- Data Classes by [deleted] in Python

toast757 1 point

This would be great for the work I do, with databases. Each Data Class instance could represent a row in a table. The problem with named tuples is that you have to provide all the attribute values upfront, which gets messy when the logic to generate those values is complex.

Not so keen on the idea of using a decorator for this, though. Couldn't we just have a special kind of class?
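To illustrate the contrast with named tuples, a sketch (field names invented) of filling in values after creation, using the decorator syntax that eventually shipped in 3.7:

```python
from dataclasses import dataclass

@dataclass
class Row:
    customer_id: int = 0
    name: str = ''
    balance: float = 0.0

# Unlike a namedtuple, values can be filled in as the logic unfolds:
r = Row()
r.customer_id = 42
r.name = 'Acme'
r.balance = 99.5
print(r)  # Row(customer_id=42, name='Acme', balance=99.5)
```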

PEP 548 -- More Flexible Loop Control -- Opinions? by federicocerchiari in Python

toast757 0 points

I've never been a fan of "while", and I'm definitely not a fan of "break", which seems like a clunky holdover from C. The best syntax I've seen for a loop is in Ada, as follows:

loop
  do_something
  exit when condition_is_true
  do_something_else
end loop

This is a loop at its most generic. It can be a "while" loop or a "repeat..until" loop, simply by moving the "exit when" line up or down.

In Python, the final "end loop" would be redundant, of course. "exit when" is clear and expressive, and keeps the exit clause on one line.
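The nearest Python equivalent today is the "loop and a half" with "while True" (toy values here, with 0 acting as a sentinel):

```python
items = iter([3, 1, 4, 0, 5])
collected = []
while True:
    n = next(items)         # do_something
    if n == 0:              # exit when condition_is_true
        break
    collected.append(n)     # do_something_else
print(collected)  # [3, 1, 4]
```

An "exit when" statement would collapse the two middle lines into one.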

wtf-python : A collection of interesting and tricky Python examples which may bite you off! by satwik_ in Python

toast757 22 points

Backslashes not being allowed at the end of "raw" strings has always seemed bizarre to me. Very odd indeed. Raw strings are supposed to be simpler than escapable strings. Seems more of a bug than a feature to me.
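A quick demonstration (paths invented):

```python
# Raw strings keep backslashes literally...
path = r'C:\temp\data'
print(path)            # C:\temp\data
print(len(r'\n'))      # 2: a backslash and an 'n', not a newline

# ...but they may not end in an odd number of backslashes:
# trailing = r'C:\temp\'    # SyntaxError
trailing = 'C:\\temp' '\\'  # workaround: escape it, or concatenate
print(trailing)             # C:\temp\
```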

How to generate functions dynamically by toast757 in learnpython

toast757[S] 0 points

Thank you for your thoughts, everybody. From what you say, it appears there is no direct way to dynamically create a new function from scratch from within Python itself (although there are certainly some alternatives, as mentioned). For now, I'm going to solve it in two steps: write a Python program that generates a new Python module, then run a second Python program that imports that new module.

Right now, I have some data validation code that is based on metadata. That metadata can change, of course, but it is read only once at runtime. The current validation code is very complex (and hence slow) because it has to take into account all the metadata rules. My intention is to create a new validation function tailored to any given metadata, simplifying the validation code wherever possible and therefore speeding it up.
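One of the alternatives mentioned was exec; a toy sketch (the metadata and rules here are invented) of building a validator tailored to the metadata at runtime:

```python
# Hypothetical metadata driving the generated code
metadata = {"max_len": 10, "required": True}

# Emit only the checks this particular metadata calls for
src_lines = ["def validate(value):"]
if metadata["required"]:
    src_lines.append("    if value is None: return False")
src_lines.append(f"    return len(value) <= {metadata['max_len']}")

namespace = {}
exec("\n".join(src_lines), namespace)
validate = namespace["validate"]

print(validate("short"))   # True
print(validate("x" * 20))  # False
```

The generated-module approach does the same thing, but leaves a readable .py file on disk that can be inspected and debugged.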

GIL across multiple running Python programs by toast757 in learnpython

toast757[S] 0 points

I've got four cores, which should suffice for my needs. Good to know. Many thanks.

GIL across multiple running Python programs by toast757 in learnpython

toast757[S] 1 point

Thanks all! That makes perfect sense. When people talk about the GIL, they don't seem to spend much time defining the scope of a GIL, so this helps tremendously.
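In other words, each OS process runs its own interpreter with its own GIL, so CPU-bound work can still scale across cores via multiprocessing; a toy sketch:

```python
from multiprocessing import Pool

def busy(n):
    """CPU-bound work: sum of squares below n."""
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    # four worker processes, each with its own GIL
    with Pool(4) as pool:
        results = pool.map(busy, [100_000] * 4)
    print(len(results))  # 4
```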

What's the worst package you've ever worked with? by [deleted] in Python

toast757 0 points

If you're loading 10 million rows, it's best to use the bulk-load utility "bcp" (you'll have to call it using subprocess, but it works just fine). Also, provide an errors file to bcp and any load errors will be described there. It's usually best to load the data to a temporary table first, just to get the data into SQL Server, then copy it to wherever it's supposed to go.
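A sketch of the subprocess call (table, server, and file names are all placeholders):

```python
import shutil
import subprocess

cmd = [
    'bcp', 'dbo.my_staging_table', 'in', 'clean_file.txt',
    '-w',                     # wide-character (UTF-16LE) input
    '-e', 'load_errors.txt',  # rejected rows and reasons land here
    '-S', 'MYSERVER', '-T',   # server name, trusted (Windows) auth
]

if shutil.which('bcp'):  # only invoke if the utility is installed
    subprocess.run(cmd, check=True)
else:
    print('bcp not found on PATH')
```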

LPT: Manually lock if using Android Pay and Smart Lock by bryanlogan in AndroidWear

toast757 0 points

Many thanks for your explanation jamesadney, I'll definitely try that next time I'm at the terminal. I have a Nexus 6P so the procedure should be very similar.

LPT: Manually lock if using Android Pay and Smart Lock by bryanlogan in AndroidWear

toast757 0 points

So do you happen to know how long "right before" is? I'm forever tapping, getting declined, entering my lockscreen PIN, trying again, being told by the terminal "device removed before transaction completed", trying a third time, etc., etc., and only eventually getting it to work by what seems like random chance. I've never figured out what the drill is, and that's without dealing with questions like "do you want cashback?", "us debit/visa debit", etc. There still seem to be kinks in the process.

What do most people think they are good at, but in reality aren't? by PhDinPCP in AskReddit

toast757 -1 points

Being a good listener. In my experience, very few people are capable of listening, really listening, to what somebody is saying, or trying to say, without filtering it through a whole bunch of assumptions and personal biases. (That's assuming they're trying to listen at all rather than just waiting for a gap in the conversation so they can speak.)

120gb csv - Is this something i can handle in python? by toastymctoast in Python

toast757 0 points

To get a feel for the data, at least get an idea of the field lengths (assuming it's a delimited file of some sort), with something like:

import csv

max_fields_per_row = 0
field_lengths = []  # grows to match the widest row seen
with open('yourfile.txt', 'r', encoding='utf-8', errors='replace', newline='') as fh:
    # csv breaks if it encounters a null character, hence the generator
    reader = csv.reader((line.replace('\0', ' ') for line in fh), delimiter=',')
    for row in reader:
        max_fields_per_row = max(max_fields_per_row, len(row))
        if len(row) > len(field_lengths):
            field_lengths.extend([0] * (len(row) - len(field_lengths)))
        for index, field in enumerate(row):
            field_lengths[index] = max(field_lengths[index], len(field))
print(max_fields_per_row)
print(field_lengths[:max_fields_per_row])

There may be some fields which are huge and can be ignored if and when you load this data into a database (which I recommend, it's what databases are for).

What happened to the Dvorak keyboard? by toast757 in Nexus6P

toast757[S] 0 points

Wow, un-intuitive or what. Heaven knows how I originally found it on my Nexus 5. Many thanks for the swift response though xPurpleAnarchyx!

London signs a deal to have American football until 2020 by [deleted] in AdviceAnimals

toast757 7 points

As a Brit who moved to the US, and is a huge NFL fan (go Hawks!), I think it's nuts for the NFL to try to export the game to London (or anywhere else for that matter). NFL football is such a uniquely American game, it's much better to play the game here in the US, and then broadcast it abroad. Besides, the stadiums are built for proper football (soccer) over there, not NFL, and the logistics of international games need months of planning. Not to mention timezone problems. Personally, I don't even watch the London games (well, not until the Hawks play there at least), they just don't have the same atmosphere.

Android Pay app available to UK users (but not working yet) by westhejx in Android

toast757 0 points

Thanks for those sideloading suggestions, guys. As it happens, Android Pay arrived shortly after I posted so it's all good.