How To: Download files from Amazon S3 using Perl and Amazon::S3
I had a hard time finding (m)any good examples online, so this article explains how to use Perl to connect to Amazon's S3 service and download files. I am by no means a Perl expert, so take the code below with a pound of salt. This example assumes that you already have a working Amazon S3 service and the appropriate keys.
This code is used to download files which our customers upload to our support site. The files are downloaded and stored in a directory by date (YYYYMMDD). We have a number of different users, so we only want to download the files for a specific user ($prefix).
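To make the date-based layout concrete, here is a minimal sketch of how an uploaded key could map to a local path. It assumes that the last path segment of each key starts with a 14-digit timestamp and a hyphen (as in 20110224110053-file.tar.gz). The helper name local_path_for is hypothetical and not part of Amazon::S3, and it uses a regex where the full script uses fixed substr offsets.

```perl
use strict;
use warnings;

# Hypothetical helper: work out where a given S3 key should be saved locally.
# Assumes the key's last path segment looks like YYYYMMDDhhmmss-original-name.
sub local_path_for {
    my ($directory, $key) = @_;

    # Capture the 8-digit date and the original name from the last segment.
    my ($date, $true_name) = $key =~ m{(?:\A|/)(\d{8})\d{6}-([^/]+)\z}
        or return;    # the key does not follow the assumed naming scheme

    return $directory . $date . '/' . $true_name;
}

print local_path_for('/zstore/uploads/',
    'uploads/user@domain.tld/uuid/20110224110053-file.tar.gz'), "\n";
# prints /zstore/uploads/20110224/file.tar.gz
```

A regex like this keeps working if the length of the prefix changes, at the cost of assuming the timestamp format stays the same.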
The first thing that you will need to do is to install the Amazon::S3 CPAN module with a command similar to:
cpan -i Amazon::S3
#!/usr/bin/perl

use strict;
use warnings;
use Amazon::S3;

## Amazon S3 Stuff ##
my $aws_access_key_id     = 'aws_access_key_id';
my $aws_secret_access_key = 'aws_secret_access_key';
my $bucket      = 'bucket-name';                               # The bucket name that files are uploaded to.
my $prefix      = 'uploads/bucket-name@domain.tld/uuid';       # We only want to get the files that were uploaded for a specific user.
my $http_prefix = 'https://uploads.domain.tld/download/uuid/'; # I use this string to build the URL
## Amazon S3 Stuff ##

## Script Variables ##
my $directory = '/zstore/uploads/';        # Top level of where to save files to
my $logfile   = $directory . "scrape.log"; # Where to log the file downloads to.
## Script Variables ##

open(my $fh, "+>>", $logfile)              # Open the file handle for writing logs, or die with the reason.
    or die "Cannot open $logfile: $!";

my $s3 = Amazon::S3->new({                 # This sets up the Amazon S3 connection.
    aws_access_key_id     => $aws_access_key_id,
    aws_secret_access_key => $aws_secret_access_key
});

print $fh localtime() . ": Getting file list...\n";

# Get the list of files under the $prefix in the $bucket, or die with an error.
# $files is a hash reference; $files->{keys} is an array reference holding one hash reference per file.
# Note: S3 returns at most 1000 keys per request, so a large listing may be truncated;
# the module also provides list_bucket_all for that case.
my $files = $s3->list_bucket({
    bucket => $bucket,
    prefix => $prefix
}) or die $s3->err . ": " . $s3->errstr;

print $fh localtime() . ": Got file list...\n"; # If we have made it this far, we now have the list of files.

foreach my $file ( @{ $files->{keys} } ) { # $file is a hash reference describing one file; its name is in $file->{key}.
    # The substr offsets below depend on the length of the real $prefix.
    my $file_name      = substr($file->{key}, 69);               # 20110224110053-file.tar.gz
    my $file_true_name = substr($file->{key}, 84);               # file.tar.gz
    my $file_date      = substr($file->{key}, 69, 8);            # 20110224
    my $sub_directory  = $directory . $file_date;                # /zstore/uploads/20110224
    my $file_path      = $sub_directory . '/' . $file_true_name; # /zstore/uploads/20110224/file.tar.gz
    my $url            = $http_prefix . $file_name;              # $url is the actual URL to the file

    unless (-d $sub_directory) {                                  # Unless the directory is already created
        print $fh localtime() . ": Creating $sub_directory...\n"; # Log that we are creating the directory
        mkdir($sub_directory)                                     # And create that directory
            or die "Cannot create $sub_directory: $!";
    }

    unless (-e $file_path) {                                      # Unless the file has already been downloaded
        print $fh localtime() . ": Fetching $file_name...\n";     # Log that we are fetching the file
        # Actually download the file. The list form of system() bypasses the shell,
        # so quotes in a file name cannot break the command. $status holds the exit status of `fetch`.
        my $status = system('fetch', '-mq', '-o', $file_path, $url);
        if ($status == 0) {                                       # If the file is successfully downloaded
            print $fh localtime() . ": Successfully got $file_path...\n"; # Log it
        } else {                                                  # Else
            print $fh localtime() . ": Problem getting $file_path...\n";  # Could not fetch it for whatever reason. Move on.
        }
    }
}
print $fh localtime() . ": Exiting!\n\n"; # We've gone through the entire file list. Log it.
close($fh);                               # Close the file handle.
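The script above lists keys through Amazon::S3 but downloads them over HTTP with fetch. As an alternative, the module's bucket object can write a key straight to a file. The following is only a sketch: it assumes your credentials are allowed to read the objects, download_key is a hypothetical wrapper, and the behavior of get_key_filename (a false value for a missing key, an exception on server errors) is as I understand the module's documentation, so check it against your installed version.

```perl
use strict;
use warnings;

# Hypothetical wrapper: save one S3 key to $file_path using the bucket object
# returned by $s3->bucket($bucket). Returns 1 on success, 0 on failure.
sub download_key {
    my ($bucket_obj, $key, $file_path) = @_;

    # get_key_filename(key, HTTP method, local filename) writes the object to disk.
    # The eval turns an exception from the module into a plain failure.
    my $response = eval { $bucket_obj->get_key_filename($key, 'GET', $file_path) };

    return $response ? 1 : 0;
}
```

Inside the loop this could stand in for the system() call, for example download_key($s3->bucket($bucket), $file->{key}, $file_path), which would also remove the need for $http_prefix.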
If you spot any errors or have any questions, comments or feedback, please post a comment!