Sorting JSON by Value with JQ (Command Line JSON Processor)

jq is a lightweight command line JSON processor that is very easy to use. Sometimes it is helpful to see your data sorted by a particular field value. Luckily jq makes this easy to do. Here are some sample JSON records we will be working with in this post:

Sorting JSON by value with


Using Variables in JQ (Command Line JSON Parser)

jq is a lightweight command line JSON processor that is very easy to use. Sometimes being able to use variables within a jq script is very useful. Below are various examples of doing this. Here is a sample record from the JSON file we use in most examples:

Using Simple Variables in jq To


Compressing Intermediate Map Output in Hadoop

It is generally recommended to always compress intermediate map output. Disk I/O and network transfer are major bottlenecks in Hadoop, and compression reduces the cost of both. Map output is written to local disk and then transferred (shuffled) across the network to reducer nodes. At this point in a MapReduce job,


How to filter JSON records by value with jq

Often when working with JSON in a Linux/Unix environment, it is nice to be able to filter records based on the values of certain fields. jq is a lightweight command line JSON processor that is very easy to use. jq offers an easy way to filter JSON records based on field values with the select()
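As a rough illustration of filtering with select(), the sketch below wraps jq in a small shell function. The function name filter_by_field and the sample field names are hypothetical and not taken from the original post. It assumes jq is installed and that the input is a stream of JSON objects, one per line.

```shell
# filter_by_field FIELD VALUE
# Hypothetical helper: reads JSON objects on stdin and prints only those
# whose FIELD equals VALUE (compared as a string, because --arg passes strings).
filter_by_field() {
  jq -c --arg f "$1" --arg v "$2" 'select(.[$f] == $v)'
}

# Usage (sample records are made up for illustration):
#   printf '%s\n' '{"name":"a","color":"red"}' '{"name":"b","color":"blue"}' \
#     | filter_by_field color red
# prints {"name":"a","color":"red"}
```

To compare against a number rather than a string, jq's --argjson option can be used in place of --arg.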


How to Decode URLs in Hive

Decoding URLs and strings can be a common task, especially when working with web data. This is easy to do in a language like Java or Python, but what about in Hive? Luckily, this is fairly easy as well.

Decoding URLs in Hive with Reflection

The first and easiest approach is to use the reflect()


How to Left Pad Numbers in Linux

There are many ways to left pad numbers in Linux. The most common form of padding is adding leading zeros in front of the number. A few examples of doing this can be seen below:

Using printf

Using AWK

Looping in Bash
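
The three approaches named above (printf, awk, and a Bash loop) can be sketched as follows. This is a minimal illustration rather than the original post's code. The helper name left_pad is hypothetical, and the inputs are assumed to be plain decimal integers without leading zeros.

```shell
# left_pad NUMBER WIDTH
# Hypothetical helper: prints NUMBER padded with leading zeros to WIDTH digits.
left_pad() {
  printf "%0${2}d\n" "$1"
}

left_pad 42 5                              # prints 00042

# The same padding using awk's printf
echo 42 | awk '{ printf "%05d\n", $1 }'    # prints 00042

# Padding each number inside a loop
for i in 1 2 3; do
  left_pad "$i" 3                          # prints 001, then 002, then 003
done
```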

How to Select Random Records in MySQL

The ability to select random records from a table in MySQL can be helpful. Luckily this is easy to do with the RAND() function. RAND() returns a random floating point value v in the range 0 <= v < 1. You can select random records in MySQL by using the RAND() function together with ORDER BY and LIMIT clauses. Here is
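
A minimal sketch of the ORDER BY RAND() plus LIMIT pattern, written as a shell helper that only builds the query text. The function name random_rows_query, the table name users, and the database name my_database are hypothetical. Table names are interpolated without escaping, so this is only suitable for trusted input.

```shell
# random_rows_query TABLE COUNT
# Hypothetical helper: prints a query that returns COUNT random rows from TABLE.
random_rows_query() {
  printf 'SELECT * FROM %s ORDER BY RAND() LIMIT %d;\n' "$1" "$2"
}

random_rows_query users 5
# prints SELECT * FROM users ORDER BY RAND() LIMIT 5;

# To run it against a server (assumes the mysql client and a database named my_database):
#   mysql my_database -e "$(random_rows_query users 5)"
```

Because every row gets a random sort key, this approach can be slow on large tables.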


How to Pretty Print JSON on the Command Line

JSON is a very popular platform-independent data format. One of the great benefits of working with JSON is that it is generally easy to read. However, reading JSON objects becomes more difficult as the objects become large, especially on the command line. Pretty printing JSON records on the command line makes reading them much


How to Stop and Delete all Docker Containers

People who work with Docker know it is easy to create a large number of containers. Occasionally it becomes necessary to delete unused and unneeded containers. Below is an example of stopping and deleting all existing Docker containers. This is accomplished by using docker stop and docker rm together with docker ps (using command substitution).
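
The command substitution approach described above might look like the sketch below. The function name stop_and_remove_all_containers is hypothetical, and the check for an empty container list is an addition for safety rather than something the original post states. Note that calling it removes every container on the host, running or not.

```shell
# Hypothetical helper: stops and then removes every container known to Docker.
stop_and_remove_all_containers() {
  local ids
  # docker ps -a lists all containers (not just running ones); -q prints only IDs
  ids=$(docker ps -aq)
  if [ -z "$ids" ]; then
    echo "No containers found."
    return 0
  fi
  # $ids is intentionally unquoted so each ID becomes a separate argument
  docker stop $ids
  docker rm $ids
}

# The equivalent one-liners, using command substitution directly:
#   docker stop $(docker ps -aq)
#   docker rm $(docker ps -aq)
```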


Count Number of Occurrences of Characters in Line with AWK

Being able to count the number of occurrences of characters or words in text is a handy trick. Fortunately this is very easy to do in awk with the gsub() function. The syntax for using gsub() looks like this:

gsub(regexp, replacement [, target])

gsub() will search target for substrings matching the provided regular expression and
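
A minimal sketch of counting with gsub(), relying on the fact that gsub() returns the number of substitutions it made. Replacing each match with "&" (the matched text itself) leaves the line unchanged while still producing the count. The helper name count_matches is hypothetical.

```shell
# count_matches REGEXP
# Hypothetical helper: for each line on stdin, prints how many times REGEXP matches.
count_matches() {
  awk -v re="$1" '{ n = gsub(re, "&"); print n }'
}

echo "banana" | count_matches a      # prints 3
echo "a,b,c"  | count_matches ,      # prints 2
```

When target is omitted, as here, gsub() operates on the whole input line ($0).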
