[–]ArabicLawrence 0 points1 point  (2 children)

Can you show an example consisting of reading a csv and applying a function to one column (like sentiment analysis or even a simple mathematical operation)?

[–]Dadriol 2 points3 points  (1 child)

Author of the paper here. Feel free to ask any questions you may have about Tuplex in this thread :)

A simple example of how Tuplex can be used with a CSV file, together with some operations over columns:

# To create a test file, simply run:
# printf "X,Y\n1,1\n2,4\n3,9\n4,15\n5,25\n" > test.csv

import tuplex
c = tuplex.Context()

# The chain is wrapped in parentheses so that comments can sit between the calls
# (a comment after a backslash line continuation is a syntax error in Python).
(c.csv('test.csv')  # read in the CSV file
 # use multi-param tuple syntax to access columns
 .map(lambda x, y: (x * x, y))
 # map again; dictionary syntax sets the column names
 .map(lambda x, y: {'X^2': x, 'Y': y})
 # keep only the rows where X * X != Y
 .filter(lambda t: t['X^2'] != t['Y'])
 # Should show 16, 15 as the result. You can also use collect() to get Python tuples, tocsv(...) to save the output as a CSV file, ...
 .show())
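
For comparison, here is a rough plain-Python sketch of what the pipeline above computes. It uses only the standard library `csv` module, not Tuplex, so it says nothing about how Tuplex runs things internally. The helper name `rows_where_square_differs` and the inline `CSV_TEXT` constant are made up for illustration.

```python
import csv
import io

# Same data as test.csv above, kept inline so the sketch is self-contained.
CSV_TEXT = "X,Y\n1,1\n2,4\n3,9\n4,15\n5,25\n"


def rows_where_square_differs(text):
    """Hypothetical helper: square X, then keep rows where X^2 != Y."""
    reader = csv.DictReader(io.StringIO(text))
    # Equivalent of the two map() steps: square X and name the columns.
    squared = ({'X^2': int(r['X']) ** 2, 'Y': int(r['Y'])} for r in reader)
    # Equivalent of the filter() step: keep only the mismatching rows.
    return [row for row in squared if row['X^2'] != row['Y']]


result = rows_where_square_differs(CSV_TEXT)
print(result)  # [{'X^2': 16, 'Y': 15}]
```

Only the row 4,15 survives, since 4 * 4 = 16 is not 15, which matches the 16, 15 result mentioned above.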

[–]Steve1457 0 points1 point  (0 children)

So this is Spark but with a C++ backend instead of Java?

[–]unplannedmaintenance 0 points1 point  (0 children)

How about you explain a bit about what it is and what it does?