Compressing Intermediate Map Output in Hadoop

Compressing intermediate map output is generally recommended. Disk I/O and network transfer are major bottlenecks in Hadoop, and compression reduces both: map output is written to local disk and then transferred (shuffled) across the network to the reducer nodes. At this point in a MapReduce job,

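As a rough illustration of what turning on map output compression can look like, here is a minimal configuration sketch. It assumes Hadoop 2.x or later property names (older releases used mapred.compress.map.output and mapred.map.output.compression.codec) and placement in mapred-site.xml. The choice of Snappy is an assumption made for the example, not something the post prescribes; it depends on Snappy support being available on the cluster, and another codec such as org.apache.hadoop.io.compress.DefaultCodec can be substituted.

```xml
<!-- Sketch for mapred-site.xml, assuming Hadoop 2.x+ property names -->
<property>
  <!-- Compress the intermediate output written by map tasks -->
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <!-- Codec used for map output; SnappyCodec is an assumed choice -->
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```

The same two properties can also be set per job instead of cluster-wide, which is useful when only some jobs have large shuffles.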

How to GZip a File in Java

Gzip is one of the most common compression algorithms, so you are likely to need to compress files with it at some point. Below is an example of doing this in Java. First create a FileInputStream from the file to be compressed. The data is read, and a compressed version of

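The excerpt above describes reading a file through a FileInputStream and writing a compressed copy. A minimal sketch of that approach, using only java.util.zip from the standard library, might look like the following. The class name GzipExample, the method name gzipFile, and the 8 KB buffer size are hypothetical choices for illustration and are not taken from the original post.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper class; the name is illustrative only.
class GzipExample {

    // Reads sourcePath and writes a gzip-compressed copy to targetPath.
    static void gzipFile(String sourcePath, String targetPath) throws IOException {
        // try-with-resources closes both streams; closing the
        // GZIPOutputStream also writes the gzip trailer.
        try (InputStream in = new FileInputStream(sourcePath);
             OutputStream out = new GZIPOutputStream(new FileOutputStream(targetPath))) {
            byte[] buffer = new byte[8192]; // assumed buffer size
            int bytesRead;
            // Copy the file in chunks until end of stream (-1).
            while ((bytesRead = in.read(buffer)) != -1) {
                out.write(buffer, 0, bytesRead);
            }
        }
    }
}
```

Calling GzipExample.gzipFile("input.txt", "input.txt.gz") would then produce a file that standard gzip tools can decompress. Streaming through a fixed-size buffer keeps memory use constant regardless of file size.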