Set AWS Credentials in Cloudera Quickstart Docker Container

Cloudera’s Quickstart image is a fantastic way to get started quickly with the big data ecosystem. With software such as Hadoop, Spark, Hive, Pig, Impala, and Hue already set up, this Docker image is a must in your big data toolkit. One thing the Cloudera Quickstart container is lacking, however, is an easy way to

read more

Copy all Files in S3 Bucket to Local with AWS CLI

The AWS CLI makes working with files in S3 very easy. However, the file globbing available on most Unix/Linux systems is not quite as easy to use with the AWS CLI. S3 doesn’t have folders, but it does use the concept of folders by using the “/” character in S3 object keys as a folder

read more
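
As a rough illustration of the idea in this excerpt, the sketch below copies every object in a bucket to a local directory with `aws s3 cp --recursive`. The bucket name, destination path, the `copy_bucket` helper, and the `DRY_RUN` variable are all assumptions made for this example, not part of the original post. By default the helper only prints the command it would run.

```shell
# Hypothetical helper: copy every object under an S3 URI to a local directory.
# $1 = S3 URI (for example s3://my-example-bucket), $2 = local destination.
copy_bucket() {
  # Reuse the positional parameters to hold the full command line.
  set -- aws s3 cp "$1" "$2" --recursive

  # DRY_RUN is an assumption of this sketch; it defaults to 1 (print only).
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Placeholder bucket and destination; replace with your own.
copy_bucket "s3://my-example-bucket" "./my-example-bucket"
```

Setting `DRY_RUN=0` runs the command for real, which requires the AWS CLI to be installed and credentials to be configured.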

Get List of Objects in S3 Bucket with Java

Often when working with files in S3, you need information about all the items in a particular S3 bucket. Below is an example class that extends the AmazonS3Client class to provide this functionality. For the most part, this class has been adapted from the sample in this AWS post. Aside from some additional methods, one

read more

Using UNIX Wildcards with AWS S3 (AWS CLI)

Currently, the AWS CLI doesn’t support UNIX wildcards in a command’s “path” argument. However, it is quite easy to replicate this functionality using the --exclude and --include parameters available on several aws s3 commands. The wildcards available for use are: “*” – Matches everything; “?” – Matches any single character; “[]” – Matches any

read more
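
The --exclude and --include approach described in this excerpt can be sketched as follows. The filters are applied in the order given and later filters take precedence, so the usual pattern is to exclude everything first and then include only the keys that match. The bucket, prefix, local path, the `copy_matching` helper, and the `DRY_RUN` variable are assumptions made for this example.

```shell
# Hypothetical helper: copy only the objects whose keys match a pattern.
# $1 = source S3 URI, $2 = local destination, $3 = pattern such as "*.log".
copy_matching() {
  # Exclude everything, then re-include the pattern; order matters.
  set -- aws s3 cp "$1" "$2" --recursive --exclude "*" --include "$3"

  # DRY_RUN is an assumption of this sketch; it defaults to 1 (print only).
  # The printed preview is not shell-quoted, so quote the patterns
  # yourself if you paste it into a terminal.
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Placeholder bucket, prefix, and pattern; replace with your own.
copy_matching "s3://my-example-bucket/logs/" "./logs" "*.log"
```

Quoting the patterns keeps the local shell from expanding them before the AWS CLI sees them.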