How to Load a Text File into Spark

Loading text files in Spark is a very common task, and luckily it is easy to do. Below are a few examples of loading a text file (located on the Big Datums GitHub repo) into an RDD in Spark. If you have looked at the Spark Documentation you will notice that they do not include

Get the MD5 Hash Code of a File with Java

Getting the hash code of a file is a common programming task, and MD5 is a widely used hashing algorithm. Getting the MD5 hash code of a file in Java is straightforward, and is shown in the code below:
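
The original listing is not reproduced in this excerpt, so what follows is only a minimal sketch of one common approach, assuming the standard java.security.MessageDigest API and a file small or large enough to be streamed in chunks. The class name Md5FileExample and the method md5Hex are hypothetical names chosen for illustration, not the post's own.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical example class; the name is an assumption, not from the original post.
final class Md5FileExample {

    // Streams the file through a MessageDigest and returns the digest as lowercase hex.
    static String md5Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");

        // Read in fixed-size chunks so large files are never held fully in memory.
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                digest.update(buffer, 0, read);
            }
        }

        // Convert the 16 digest bytes into a 32-character hex string.
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: Md5FileExample <path-to-file>");
            return;
        }
        System.out.println(md5Hex(Paths.get(args[0])));
    }
}
```

Streaming with update() rather than reading the whole file at once keeps memory use constant regardless of file size.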

The code above does several things: Creates a MessageDigest object that

Create an MD5 Hash Code from a String in Java

Creating hash codes from strings is a common programming task, and MD5 is a widely used hashing algorithm. Creating an MD5 hash code from a String in Java is straightforward, and is shown in the code below:
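
The original listing is not reproduced in this excerpt, so the following is only a minimal sketch of one way to do it, assuming UTF-8 encoding for the input string and the standard java.security.MessageDigest API. The class name Md5StringExample and the method md5Hex are hypothetical names chosen for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical example class; the name is an assumption, not from the original post.
final class Md5StringExample {

    // Hashes the UTF-8 bytes of the input and returns the digest as lowercase hex.
    static String md5Hex(String input) throws NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance("MD5");
        byte[] hash = digest.digest(input.getBytes(StandardCharsets.UTF_8));

        // A StringBuilder collects two hex characters per digest byte.
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            hex.append(String.format("%02x", b));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(md5Hex("hello")); // prints 5d41402abc4b2a76b9719d911017c592
    }
}
```

Naming the charset explicitly matters here: the same string encoded with a different charset can produce different bytes and therefore a different hash.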

The code above does several things: Creates a StringBuilder object to

Generating Fake Data in Java with jFairy

Generating fake data can be a common need when developing applications or loading test data into a database. jFairy is a great fake data generator library built in Java that is very easy to use. jFairy allows you to build data sets containing diverse types of data including names, addresses, telephone numbers, dates, large integers,

How to Create a Fat Jar with Maven

A fat jar or uber jar is a jar that contains the classes of your current project as well as all of the classes on which it depends. For example, if your application requires Joda-Time, your jar file will contain all the classes of your current project, as well as all the classes of Joda-Time.
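
As a rough illustration, one common way to build such a jar is the Maven Shade Plugin. The fragment below is a sketch only: the excerpt does not say which plugin the post uses, so the choice of maven-shade-plugin and the version number shown are assumptions.

```xml
<!-- Goes inside the <project> element of pom.xml. -->
<build>
  <plugins>
    <plugin>
      <groupId>org.apache.maven.plugins</groupId>
      <artifactId>maven-shade-plugin</artifactId>
      <!-- Version is an assumption; use whichever release suits your build. -->
      <version>3.5.1</version>
      <executions>
        <execution>
          <!-- Bundle the project classes and its dependencies during "mvn package". -->
          <phase>package</phase>
          <goals>
            <goal>shade</goal>
          </goals>
        </execution>
      </executions>
    </plugin>
  </plugins>
</build>
```

With this in place, running mvn package writes the combined jar to the target directory.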

Get List of Objects in S3 Bucket with Java

Often when working with files in S3, you need information about all the items in a particular S3 bucket. Below is an example class that extends the AmazonS3Client class to provide this functionality. For the most part this class has been adapted from the sample in this AWS post. Aside from some additional methods, one
