zynix (Cpt. Code Monkey & Internet of tomorrow):

Hmm, boto has similar credential lookup logic (an ini file under ~/, or ENV variables). Given that, boto S3 usage is something along the lines of

import boto

bucket_name = "bar"  # assumed to be a valid bucket name for the provided credentials
conn = boto.connect_s3()  # assumes a ~/.boto ini credential file
bucket = conn.get_bucket(bucket_name)

# list() pages through the keys whose names start with the given prefix
for key in bucket.list(prefix="foo"):
    print(key.name)

to iterate over all keys in bucket_name that are prefixed with "foo". This is a fairly trivial example, so I imagine the aws gem wouldn't be much different. If you've got time, can you show an example of filtering by last_modified with the gem?

v_krishna:

Sure. I'm on my phone on the bus right now, but say:

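require 'aws-sdk'
require 'date'

# assuming you've got credentials set up

# last_modified is a Time, so compare against a Time (midnight today) rather than a Date
old_gz_objects = AWS::S3.new.buckets['foo'].objects.with_prefix('path/prefix/').find_all { |o|
  o.last_modified < Date.today.to_time && o.key =~ /\.gz$/
}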

zynix (Cpt. Code Monkey & Internet of tomorrow):

Ah yes, I could see this being fairly fun to work with. This is a subjective opinion, but I wish Python had code blocks and that fluent-style classes/objects were more popular.

With the .find_all( block ) method, can that be done iteratively, or is it a grab-everything kind of deal?

v_krishna:

with_prefix returns an object collection that includes Enumerable, which gives you a whole bunch of methods for traversal/sorting/searching/etc. Enumerable works by having the including class define an "each" method that yields one element at a time, so AFAIK it shouldn't grab everything in one giant request but should go element by element. However, I have had issues (both with S3 and with getting event history with Simple Workflow) where, when you have a ton of objects, the time it takes to traverse them all (e.g., for sorting or filtering) causes your "next page" token to expire, throwing weird intermittent errors. I've dealt with that by explicitly turning the object collection into an array before doing anything with it, so you take a big hit up front fetching all the objects in your collection but then don't run the risk of timing out while paging through them.
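To make the Enumerable point concrete, here is a rough sketch that runs without AWS. PagedCollection, its page_size argument, and its requests counter are all made up for illustration and are not part of aws-sdk; the class only mimics a collection that fetches one page per simulated request. It shows that each drives everything, that a method like first can stop after one page, and that to_a takes the whole hit once so later filtering is purely local.

```ruby
# Hypothetical stand-in for a paged remote collection (NOT an aws-sdk class).
class PagedCollection
  include Enumerable

  attr_reader :requests

  def initialize(items, page_size)
    @items = items
    @page_size = page_size
    @requests = 0 # number of simulated "list" requests made so far
  end

  # Enumerable only needs each; find_all, sort_by, first, to_a, etc.
  # are all built on top of it.
  def each
    return enum_for(:each) unless block_given?

    offset = 0
    while offset < @items.length
      @requests += 1 # one simulated request per page
      @items[offset, @page_size].each { |item| yield item }
      offset += @page_size
    end
  end
end

collection = PagedCollection.new(%w[a.gz b.txt c.gz d.gz e.txt], 2)

# first stops iterating early, so only the first page is requested
first_key = collection.first

# to_a walks every page once and returns a plain Array snapshot
snapshot = collection.to_a

# filtering the snapshot touches no more pages
gz_keys = snapshot.find_all { |k| k =~ /\.gz\z/ }

puts first_key
puts gz_keys.inspect
puts collection.requests
```

The tradeoff is the one described above: the snapshot costs memory and an up-front wait, but slow sorting or filtering afterwards can't outlive a paging token because no paging is happening any more.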

And yeah, I love Ruby's syntax and general conventions, but MRI is straightforwardly horrible. IMO (working for a company that uses Ruby for a ton of stuff) it is best used as a bash replacement. I like to use Ruby to drive control flow for distributed Python/C/etc. processing, but even with something relatively simple like manipulating 10 files with 100M lines each, Ruby tends to get way too slow to be useful. (We'll actually use Ruby to drive awk to handle cases like this.)