How to Change Hadoop Output Delimiter

Hadoop’s default output delimiter (the character separating the output key and value) is a tab ("\t"). This post explains how to change it.

Output Delimiter Configuration Property

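The output delimiter of a Hadoop job can be changed by setting the mapred.textoutputformat.separator configuration property, which is read by TextOutputFormat, the default output format. The property can be set in the job code or on the command line. In Hadoop 2 and later the property is named mapreduce.output.textoutputformat.separator; the older name is deprecated but still recognized.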

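Setting the delimiter in the job class: call conf.set("mapred.textoutputformat.separator", ",") on the job's Configuration object before the job is created from it.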

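Setting the delimiter from the command line: pass -D mapred.textoutputformat.separator="," as a generic option, placed before the job's own arguments such as the input and output paths. This works only when the job's driver parses generic options, for example through ToolRunner or GenericOptionsParser.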
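To show why the position of a -D option on the command line matters, here is a minimal plain-Java sketch of generic option parsing. It is a simplified stand-in written for this post, not Hadoop's GenericOptionsParser: the class name GenericOptionsSketch and its fields are hypothetical, and it assumes that parsing stops at the first argument that is not a -D option.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Simplified, hypothetical stand-in for generic option parsing; not Hadoop code. */
public class GenericOptionsSketch {

    /** Properties collected from the leading -D options. */
    public final Map<String, String> properties = new LinkedHashMap<>();

    /** Arguments left over for the job itself (for example input and output paths). */
    public final List<String> remainingArgs = new ArrayList<>();

    public GenericOptionsSketch(String[] args) {
        int i = 0;
        while (i < args.length) {
            String arg = args[i];
            String pair;
            if (arg.equals("-D") && i + 1 < args.length) {
                // Form: -D key=value (two separate arguments)
                pair = args[i + 1];
                i += 2;
            } else if (arg.startsWith("-D") && arg.length() > 2) {
                // Form: -Dkey=value (one argument)
                pair = arg.substring(2);
                i += 1;
            } else {
                // Assumption: stop at the first argument that is not a -D option.
                break;
            }
            int eq = pair.indexOf('=');
            if (eq > 0) {
                properties.put(pair.substring(0, eq), pair.substring(eq + 1));
            }
        }
        // Everything from the first non-option argument onward belongs to the job.
        remainingArgs.addAll(Arrays.asList(args).subList(i, args.length));
    }

    public static void main(String[] args) {
        GenericOptionsSketch parsed = new GenericOptionsSketch(new String[] {
            "-D", "mapred.textoutputformat.separator=,", "input", "output"});
        System.out.println(parsed.properties);
        System.out.println(parsed.remainingArgs);
    }
}
```

In this sketch, a -D option placed after the input and output paths is not treated as a property and ends up among the job's own arguments, which is why the option belongs before them.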

Example

We will use the word count example that comes packaged with Hadoop to show how to set a custom output delimiter from the command line.

Running word count with the default delimiter produces one output line per word, with the word and its count separated by a tab.


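Running word count with a custom delimiter, for example by adding -D mapred.textoutputformat.separator="," before the input and output paths, produces the same lines with a comma in place of the tab.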
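Since the delimiter only affects how each key/value pair is written, the difference between the two runs can be illustrated without a cluster. The sketch below is plain Java with no Hadoop dependency: the class SeparatorSketch, the formatLine helper, and the word counts are all made up for illustration, and a Map stands in for the job configuration. Only the property name and the tab default come from the discussion above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Plain-Java illustration of how a text output line is assembled; not Hadoop code. */
public class SeparatorSketch {

    static final String SEPARATOR_KEY = "mapred.textoutputformat.separator";

    /** Joins key and value with the configured separator, falling back to a tab. */
    public static String formatLine(Map<String, String> conf, String key, Object value) {
        String separator = conf.getOrDefault(SEPARATOR_KEY, "\t");
        return key + separator + value;
    }

    public static void main(String[] args) {
        // Made-up word counts, for illustration only.
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("hadoop", 3);
        counts.put("delimiter", 2);

        // Empty configuration: the tab default applies.
        Map<String, String> defaults = new LinkedHashMap<>();

        // Configuration with a comma as the custom separator.
        Map<String, String> custom = new LinkedHashMap<>();
        custom.put(SEPARATOR_KEY, ",");

        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            System.out.println(formatLine(defaults, e.getKey(), e.getValue()));
            System.out.println(formatLine(custom, e.getKey(), e.getValue()));
        }
    }
}
```

Each word is printed twice, first in the tab-separated form and then in the comma-separated form, which mirrors the difference between the two word count runs.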
