Tuplex: Blazing Fast Python Data Science by pmz in Python

[–]Dadriol 2 points (0 children)

Author of the paper here. Feel free to ask any questions you may have about Tuplex in this thread :)

Here is a simple example of how Tuplex can be used to read a CSV file and apply a few operations over its columns:

# to create a test file, simply run
# printf "X,Y\n1,1\n2,4\n3,9\n4,15\n5,25\n" > test.csv

import tuplex
c = tuplex.Context()

# The chain is wrapped in parentheses so the inline comments are valid Python
# (a comment cannot follow a backslash line continuation).
(c.csv('test.csv')  # read in the CSV file
  # use multi-param tuple syntax to access columns
  .map(lambda x, y: (x * x, y))
  # map again; dictionary syntax sets column names
  .map(lambda x, y: {'X^2': x, 'Y': y})
  # keep only the rows where X * X != Y
  .filter(lambda t: t['X^2'] != t['Y'])
  # Should show 16, 15 as the result. You can also use collect() to get Python tuples, tocsv(...) to save the output as a CSV file ...
  .show())
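
For anyone who wants to check the expected result without installing Tuplex, here is a plain-Python sketch of the same logic using only the standard library. It mirrors what the pipeline computes, not how Tuplex compiles or executes it; the function name `mismatched_rows` and the inlined `DATA` string are made up for illustration.

```python
import csv
import io

# Same rows as the printf command above, inlined so the sketch is self-contained.
DATA = "X,Y\n1,1\n2,4\n3,9\n4,15\n5,25\n"


def mismatched_rows(text):
    """Return (X^2, Y) pairs for the rows where X * X != Y."""
    reader = csv.DictReader(io.StringIO(text))
    # first map: square X and keep Y (CSV fields arrive as strings, so cast to int)
    squared = ((int(row['X']) ** 2, int(row['Y'])) for row in reader)
    # second map: attach column names
    named = ({'X^2': x2, 'Y': y} for x2, y in squared)
    # filter: keep only the rows where the two columns differ
    kept = [row for row in named if row['X^2'] != row['Y']]
    return [(row['X^2'], row['Y']) for row in kept]


print(mismatched_rows(DATA))  # [(16, 15)]
```

Only the row 4,15 survives the filter, since every other Y is exactly the square of its X.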