Downloading all files from an Amazon S3 bucket

I was trying to download all files from an Amazon S3 bucket and did not feel like clicking through each of them. Here is the little Python 2 script I came up with:

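import os
import urllib
import urllib2
import xml.etree.ElementTree as ET

BUCKET_URL = "https://s3.amazonaws.com/tripdata"
NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"

print("Retrieving file list ...")
# S3 returns at most 1000 keys per listing, however large max-keys is,
# so request page after page, passing the last key seen as the marker,
# until IsTruncated is no longer "true".
keys = []
marker = ""
while True:
    url = urllib2.urlopen(BUCKET_URL + "?marker=" + urllib.quote(marker))
    data = url.read()
    url.close()
    e = ET.fromstring(data)
    page = [k.text for k in e.findall(NS + "Contents/" + NS + "Key")]
    keys.extend(page)
    if e.findtext(NS + "IsTruncated") != "true" or not page:
        break
    marker = page[-1]

# Keep only the .zip archives
keys = [k for k in keys if k.endswith(".zip")]
print("Found files: " + str(len(keys)))

print("Start downloading ...")
for k in keys:
    print("Downloading: " + k)
    path = os.path.join("data", k)
    # Create "data" (and any sub-folders contained in the key) if missing
    if not os.path.isdir(os.path.dirname(path)):
        os.makedirs(os.path.dirname(path))
    # The with-block closes the file, so no explicit f.close() is needed
    with open(path, "wb") as f:
        f.write(urllib2.urlopen(BUCKET_URL + "/" + urllib.quote(k)).read())

print("Done downloading.")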
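Python 2 reached end of life in 2020, so here is a rough Python 3 sketch of the same two steps, assuming the bucket still allows anonymous listing and reads. It lists keys through the ListObjectsV2 API (`list-type=2`), which returns at most 1,000 keys per response and pages with a continuation token, and it streams each archive to disk with `shutil.copyfileobj` instead of reading the whole file into memory first. The names `http_get`, `list_keys` and `download` are my own (hypothetical), and the `fetch` parameter exists only so the listing logic can be tried without network access.

```python
import os
import shutil
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# XML namespace used in S3 ListBucketResult documents
NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def http_get(url):
    """Return the body of a GET request as bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def list_keys(bucket_url, fetch=http_get):
    """Yield every key in a public bucket, following ListObjectsV2 pages."""
    token = None
    while True:
        params = {"list-type": "2"}
        if token:
            params["continuation-token"] = token
        root = ET.fromstring(fetch(bucket_url + "?" + urllib.parse.urlencode(params)))
        for key in root.findall(NS + "Contents/" + NS + "Key"):
            yield key.text
        # IsTruncated stays "true" while more pages remain
        if root.findtext(NS + "IsTruncated") != "true":
            return
        token = root.findtext(NS + "NextContinuationToken")


def download(url, dest):
    """Stream url to the local file dest, creating parent folders as needed."""
    parent = os.path.dirname(dest)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with urllib.request.urlopen(url) as resp, open(dest, "wb") as f:
        shutil.copyfileobj(resp, f)
```

To reproduce the original script, iterate over `list_keys("https://s3.amazonaws.com/tripdata")`, keep the keys ending in `.zip`, and call `download(bucket_url + "/" + urllib.parse.quote(key), os.path.join("data", key))` for each one. Quoting the key keeps names with spaces or other special characters valid in the URL.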

If anybody knows an easier way, please let me know.

Sources:

  • http://dabase.com/e/14003/
  • http://stackoverflow.com/questions/4028697/how-do-i-download-a-zip-file-in-python-using-urllib2