## How to Combine Multiple Commits in Git

The ability to combine multiple commits in Git is a huge plus. Combining commits requires “rebasing”, which essentially rewrites the project history by appending commits onto the last commit (by default) of a different branch, or onto an earlier commit in the same branch. Rebasing can have some damaging effects, so be careful when

## How to Create a Max Heap using an Array in Java

A heap (or binary heap) is a data structure that takes the form of a binary tree. Heaps are commonly used to implement priority queues (check out the PriorityQueue class in Java). Priority queues are a great way to identify the highest (or lowest) priority items in a collection. A max heap is a binary tree data structure in
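
As a rough illustration of the array-backed idea, here is a minimal sketch of a max heap of ints. It assumes the usual layout, where the children of index i sit at 2i + 1 and 2i + 2, and the usual max-heap rule that a parent is never smaller than its children. The class and method names (ArrayMaxHeap, insert, extractMax) are hypothetical and are not taken from the full post.

```java
import java.util.Arrays;

// Minimal array-backed max heap of ints (hypothetical class name).
class ArrayMaxHeap {
    private int[] items = new int[8];
    private int size = 0;

    boolean isEmpty() {
        return size == 0;
    }

    // Returns the largest value without removing it.
    int peek() {
        if (size == 0) throw new IllegalStateException("heap is empty");
        return items[0];
    }

    // Appends the value, then sifts it up while it is larger than its parent.
    void insert(int value) {
        if (size == items.length) items = Arrays.copyOf(items, size * 2);
        items[size] = value;
        int i = size++;
        while (i > 0 && items[(i - 1) / 2] < items[i]) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    // Removes and returns the largest value, then sifts the new root down.
    int extractMax() {
        int max = peek();
        items[0] = items[--size];
        int i = 0;
        while (true) {
            int left = 2 * i + 1, right = 2 * i + 2, largest = i;
            if (left < size && items[left] > items[largest]) largest = left;
            if (right < size && items[right] > items[largest]) largest = right;
            if (largest == i) break;
            swap(i, largest);
            i = largest;
        }
        return max;
    }

    private void swap(int a, int b) {
        int tmp = items[a];
        items[a] = items[b];
        items[b] = tmp;
    }
}
```

Calling extractMax repeatedly returns the stored values from largest to smallest, which is the behavior a priority queue relies on.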

## How to Create a Hash Table in Java (Chaining Example)

Knowing how to create a hash table is helpful when using the built-in Hashtable and HashMap implementations found in various languages. Questions about hash tables are commonly asked in programming interviews, and candidates are often asked to create an implementation from scratch. Below is an example of how to create a hash table in Java using “chaining”
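
To make the chaining idea concrete, here is a minimal sketch in which each bucket holds a linked list of entries, so keys that hash to the same index share a chain. The names (ChainedHashTable, Entry, indexFor) are hypothetical, the table never resizes, and null keys are not supported; it is not the implementation from the full post.

```java
import java.util.LinkedList;

// Minimal hash table with separate chaining (hypothetical class name).
class ChainedHashTable<K, V> {
    private static class Entry<K, V> {
        final K key;
        V value;

        Entry(K key, V value) {
            this.key = key;
            this.value = value;
        }
    }

    private final LinkedList<Entry<K, V>>[] buckets;
    private int size = 0;

    @SuppressWarnings("unchecked")
    ChainedHashTable(int capacity) {
        buckets = new LinkedList[capacity];
    }

    // Maps a key to a bucket index; floorMod keeps negative hash codes in range.
    private int indexFor(K key) {
        return Math.floorMod(key.hashCode(), buckets.length);
    }

    // Inserts a new entry, or overwrites the value if the key already exists.
    void put(K key, V value) {
        int i = indexFor(key);
        if (buckets[i] == null) buckets[i] = new LinkedList<>();
        for (Entry<K, V> e : buckets[i]) {
            if (e.key.equals(key)) {
                e.value = value;
                return;
            }
        }
        buckets[i].add(new Entry<>(key, value));
        size++;
    }

    // Walks the chain for the key's bucket; returns null if the key is absent.
    V get(K key) {
        LinkedList<Entry<K, V>> chain = buckets[indexFor(key)];
        if (chain == null) return null;
        for (Entry<K, V> e : chain) {
            if (e.key.equals(key)) return e.value;
        }
        return null;
    }

    // Removes the entry for the key, reporting whether anything was removed.
    boolean remove(K key) {
        LinkedList<Entry<K, V>> chain = buckets[indexFor(key)];
        if (chain != null && chain.removeIf(e -> e.key.equals(key))) {
            size--;
            return true;
        }
        return false;
    }

    int size() {
        return size;
    }
}
```

Because the bucket count is fixed here, chains grow as more keys are added; a fuller implementation would typically resize and rehash once the load gets high.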

## How to Code a Recursive Fibonacci Sequence

A Fibonacci Sequence is a sequence of numbers in which the first and second numbers in the sequence are 0 and 1 respectively, and additional numbers in the sequence are calculated by adding the previous two. The first few numbers in the Fibonacci Sequence look like this: 0, 1, 1, 2, 3, 5, 8, 13,
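
The definition above translates almost directly into a recursive method. The following is a minimal sketch, assuming zero-based indexing (fib(0) = 0, fib(1) = 1); the class name RecursiveFibonacci is hypothetical.

```java
// Naive recursive Fibonacci (hypothetical class name).
class RecursiveFibonacci {
    // Returns the n-th Fibonacci number, counting from fib(0) = 0 and fib(1) = 1.
    static long fib(int n) {
        if (n < 0) throw new IllegalArgumentException("n must be non-negative");
        if (n < 2) return n; // base cases: 0 and 1
        return fib(n - 1) + fib(n - 2); // each term is the sum of the previous two
    }
}
```

This version recomputes the same subproblems many times, so its running time grows exponentially with n; it is best suited to small inputs.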

## How to Code an Iterative Fibonacci Sequence

A Fibonacci Sequence is a sequence of numbers in which the first and second numbers in the sequence are 0 and 1 respectively, and additional numbers in the sequence are calculated by adding the previous two. The first few numbers in the Fibonacci Sequence look like this: 0, 1, 1, 2, 3, 5, 8, 13,
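
An iterative version only needs to remember the last two values as it walks forward. The following is a minimal sketch, again assuming zero-based indexing (fib(0) = 0, fib(1) = 1); the class name IterativeFibonacci is hypothetical.

```java
// Iterative Fibonacci using two running values (hypothetical class name).
class IterativeFibonacci {
    // Returns the n-th Fibonacci number, counting from fib(0) = 0 and fib(1) = 1.
    // The result fits in a long only up to n = 92; larger n overflows.
    static long fib(int n) {
        if (n < 0) throw new IllegalArgumentException("n must be non-negative");
        long prev = 0; // fib(i)
        long curr = 1; // fib(i + 1)
        for (int i = 0; i < n; i++) {
            long next = prev + curr;
            prev = curr;
            curr = next;
        }
        return prev;
    }
}
```

The loop runs n times and uses constant extra space, so it stays practical for inputs where naive recursion would be far too slow.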

## How to Do Total Order Sorting in Hadoop MapReduce

Being able to sort by all keys in a data set is a common need in the world of big data. Those familiar with Hive or relational databases know that this can easily be done with a simple SQL statement. For example, sorting an entire data set by “first_name” would look something like this: SELECT

## How to Create a Custom Writable for Hadoop

If you have gone through other Hadoop MapReduce examples, you will have noticed the use of “Writable” data types such as LongWritable, IntWritable, Text, etc. All values used in Hadoop MapReduce must implement the Writable interface. Although we can do a lot with the primitive Writables already available with Hadoop, there are often times

## How to Get Distinct Values with Hadoop MapReduce

Getting the distinct values from a dataset is a very common task, and actually very easy to do in MapReduce. In pseudocode, your mapper and reducer will look something like this:

The mapper above will emit each record as the key, and null as the value. The reducer will take the key and
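
The general idea can be simulated in plain Java without any Hadoop classes. The sketch below is an in-memory stand-in, not Hadoop API code: a sorted map plays the role of the shuffle that groups identical keys, and it assumes the reduce step simply emits each key once. The names DistinctSketch and distinct are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory simulation of a distinct-values job (hypothetical class name; no Hadoop API used).
class DistinctSketch {
    static List<String> distinct(List<String> records) {
        // "Map" + "shuffle": each record is emitted as a key with a null value,
        // and identical keys are grouped together (in sorted key order).
        Map<String, List<Object>> grouped = new TreeMap<>();
        for (String record : records) {
            grouped.computeIfAbsent(record, k -> new ArrayList<>()).add(null);
        }
        // "Reduce" (assumed behavior): emit each key once and ignore its values.
        return new ArrayList<>(grouped.keySet());
    }
}
```

For example, the input records b, a, b, c, a come out as a, b, c, because grouping by key collapses the duplicates before the reduce step sees them.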

## Hadoop – Setting Configuration Parameters on Command Line

Often when running MapReduce jobs, people prefer setting configuration parameters from the command line. This avoids the need to hard-code settings such as the number of mappers, the number of reducers, or the max split size. Parsing options from the command line can be done easily by implementing Tool and extending Configured. Below is a simple

## How to GZip a File in Java

One of the most common compression algorithms out there is gzip, so you are likely to need to compress files with it at some point. Below is an example of doing this in Java. First create a FileInputStream from the file to be compressed. The data is read, and a compressed version of
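
The steps described above can be sketched roughly as follows, using GZIPOutputStream from the standard java.util.zip package. The class name GzipFileSketch, the method name gzip, and its two path parameters are hypothetical, and writing the compressed data to a second file is an assumption; the full post may differ.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

// Compresses one file into another using gzip (hypothetical class name).
class GzipFileSketch {
    static void gzip(String sourcePath, String targetPath) throws IOException {
        // try-with-resources closes both streams; closing the GZIPOutputStream
        // also finishes the gzip trailer so the output file is complete.
        try (InputStream in = new FileInputStream(sourcePath);
             OutputStream out = new GZIPOutputStream(new FileOutputStream(targetPath))) {
            byte[] buffer = new byte[8192];
            int read;
            // Read the source in chunks and write each chunk through the compressor.
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        }
    }
}
```

Reading the result back through GZIPInputStream should reproduce the original bytes, which is a simple way to check the round trip.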