Get Input File Name in Hive Query

It is often useful to know the name of the input file you are processing in a Hive query. This is common when useful metadata is stored in the file name. For example, logs from many different servers can be stored in S3, and the file names could contain the names or IP addresses of those servers.

Luckily, this is very easy in Hive using the INPUT__FILE__NAME “virtual column”, which gives the name of the input file for a mapper task. It can be selected, filtered on, and grouped by like an ordinary column.
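As a minimal sketch of how the virtual column can be used, the queries below assume a hypothetical table called server_logs whose files are named after the server that wrote them (for instance web-01.log). The table name, the column aliases, and the file naming scheme are all assumptions made for illustration; only INPUT__FILE__NAME and regexp_extract come from Hive itself.

```sql
-- Hypothetical table: server_logs (one row per log line).
-- INPUT__FILE__NAME is a Hive virtual column holding the full path
-- of the file each row was read from.

-- Count the rows contributed by each input file.
SELECT
  INPUT__FILE__NAME AS source_file,
  COUNT(*)          AS line_count
FROM server_logs
GROUP BY INPUT__FILE__NAME;

-- Assumed naming scheme: .../<server-name>.log
-- Pull the server name out of the path with a regular expression.
SELECT
  regexp_extract(INPUT__FILE__NAME, '([^/]+)\\.log$', 1) AS server_name,
  COUNT(*)                                               AS line_count
FROM server_logs
GROUP BY regexp_extract(INPUT__FILE__NAME, '([^/]+)\\.log$', 1);
```

If your files follow a different naming convention, only the regular expression in the second query should need to change.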
