Delete all Nodes and Relationships in a Neo4j Database

Deleting Nodes and Relationships

Deleting all nodes and relationships in a Neo4j database is very simple. Here is an example that does just that: MATCH (n) DETACH DELETE n; The DETACH keyword tells Neo4j to remove, or “detach”, all relationships from a node before deleting it. If relationships exist on a node at the time deletion


Load a Tab Delimited File into Neo4j

Neo4j is a popular graph database which provides an easy way to import text files using the Cypher query language and a LOAD CSV clause. LOAD CSV by default expects files to be delimited by commas. In order to specify another field delimiter, add FIELDTERMINATOR to the LOAD CSV clause. For example:
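As a minimal sketch of a tab-delimited import (the file name people.tsv, the script name load_tsv.cypher, and the RETURN clause are illustrative assumptions, not taken from the post), the statement can be written to a script file and later passed to cypher-shell:

```shell
# Write a Cypher script that reads a tab-delimited file.
# people.tsv is a hypothetical file placed in Neo4j's import directory.
cat > load_tsv.cypher <<'EOF'
LOAD CSV FROM 'file:///people.tsv' AS row FIELDTERMINATOR '\t'
RETURN row;
EOF

# To run it against a live database (assumes cypher-shell is installed
# and can connect):
#   cypher-shell < load_tsv.cypher
```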

The statement


Running Neo4j 3.x on Docker

Neo4j is the world’s leading graph database. Fortunately, Neo Technology publishes official Neo4j Docker images, which makes it very easy to get started. Once you have Docker running, simply use the following command to start running Neo4j in a new container (adjust the Neo4j version as necessary):
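A minimal sketch of such a command, wrapped in a function so it can be reused. The 3.0 image tag and the host data directory are assumptions; ports 7474 (HTTP) and 7687 (Bolt) are the Neo4j 3.x defaults.

```shell
# Sketch: start Neo4j 3.x in a new container.
# -p publishes the HTTP (7474) and Bolt (7687) ports on the host.
# -v mounts a host directory (assumed path) so data survives restarts.
start_neo4j() {
  docker run \
    -p 7474:7474 \
    -p 7687:7687 \
    -v "$HOME/neo4j/data:/data" \
    "neo4j:${1:-3.0}"
}

# Usage (requires a running Docker daemon):
#   start_neo4j 3.0
```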

The -p (or --publish) option above binds


Sorting JSON by Value with JQ (Command Line JSON Processor)

jq is a lightweight command line JSON processor that is very easy to use. Sometimes it is helpful to see your data sorted by a particular field value. Luckily jq makes this easy to do. Here are some sample JSON records we will be working with in this post:

Sorting JSON by value with

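A minimal sketch of sorting by a field value with jq's sort_by. The records below are hypothetical stand-ins (the post's own sample data is not shown here), and the function name is an illustrative choice.

```shell
# Hypothetical sample records, one JSON object per line.
cat > people.json <<'EOF'
{"name":"Carol","age":41}
{"name":"Alice","age":30}
{"name":"Bob","age":25}
EOF

# Slurp (-s) the stream into one array, sort it by .age, then emit the
# records one per line again (-c keeps each record compact).
sort_by_age() {
  jq -s -c 'sort_by(.age) | .[]' "$1"
}

# Usage:
#   sort_by_age people.json
```

sort_by needs an array as input, which is why the line-delimited records are slurped first.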

Using Variables in JQ (Command Line JSON Parser)

jq is a lightweight command line JSON processor that is very easy to use. Sometimes being able to use variables within a jq script is very useful. Below are various examples of doing this. Here is a sample record from the JSON file we use in most examples:

Using Simple Variables in jq To

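Two common ways to use variables in jq can be sketched as follows: passing a shell value in with --arg, and binding an intermediate value with "as". The sample records and function names are hypothetical.

```shell
# Hypothetical sample records, one JSON object per line.
cat > people.json <<'EOF'
{"name":"Carol","age":41}
{"name":"Alice","age":30}
{"name":"Bob","age":25}
EOF

# --arg binds a shell value to a jq variable. The value arrives as a
# string, so it is converted with tonumber before the comparison.
older_than() {
  jq -r --arg min "$1" 'select(.age > ($min | tonumber)) | .name' "$2"
}

# "as" binds an intermediate value to a variable inside the filter.
describe() {
  jq -r '.age as $age | "\(.name) is \($age)"' "$1"
}

# Usage:
#   older_than 28 people.json
#   describe people.json
```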

Compressing Intermediate Map Output in Hadoop

It is generally recommended to always compress intermediate map output. This is because IO and network transfer are big bottlenecks in Hadoop, and compression can help with both of these issues. Map output is written to local disk, and then transferred (shuffled) across the network to reducer nodes. At this point in a MapReduce job,

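One way to enable this per job is through -D options on the command line. The sketch below uses the Hadoop 2.x property names; the Snappy codec, the helper name, and the jar and class names in the usage line are assumptions.

```shell
# Print the -D options that turn on map output compression.
# An optional first argument overrides the codec class (Snappy is an
# assumed default and must be available on the cluster).
map_compress_opts() {
  printf '%s ' \
    "-Dmapreduce.map.output.compress=true" \
    "-Dmapreduce.map.output.compress.codec=${1:-org.apache.hadoop.io.compress.SnappyCodec}"
}

# Usage (my-job.jar and MyJob are hypothetical; the job must parse
# generic options, for example via ToolRunner):
#   hadoop jar my-job.jar MyJob $(map_compress_opts) input/ output/
```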

How to filter JSON records by value with jq

Often when working with JSON in a Linux/Unix environment, it is nice to be able to filter records based on the values of certain fields. jq is a lightweight command line JSON processor that is very easy to use. jq offers an easy way to filter JSON records based on field values with the select()

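A minimal sketch of filtering with select(), using hypothetical records and function names. select() passes a record through only when its condition is true.

```shell
# Hypothetical sample records, one JSON object per line.
cat > people.json <<'EOF'
{"name":"Carol","age":41}
{"name":"Alice","age":30}
{"name":"Bob","age":25}
EOF

# Keep records whose numeric field meets a threshold.
aged_30_or_more() {
  jq -c 'select(.age >= 30)' "$1"
}

# Keep records whose string field matches exactly.
named_bob() {
  jq -c 'select(.name == "Bob")' "$1"
}

# Usage:
#   aged_30_or_more people.json
#   named_bob people.json
```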

How to Decode URLs in Hive

Decoding URLs and strings can be a common task, especially when working with web data. This is easy to do in a language like Java or Python, but what about in Hive? Luckily, this is fairly easy as well.

Decoding URLs in Hive with Reflection

The first and easiest approach is to use the reflect()

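A minimal sketch of the reflect() approach, which calls Java's static URLDecoder.decode method from a Hive query. The table web_logs, its url column, and the script name are hypothetical.

```shell
# Write a Hive script that decodes a URL column via Java reflection.
cat > decode_urls.hql <<'EOF'
-- web_logs and its url column are hypothetical names.
SELECT reflect('java.net.URLDecoder', 'decode', url, 'UTF-8') AS decoded_url
FROM web_logs
LIMIT 10;
EOF

# To run it (assumes the hive CLI is installed and the table exists):
#   hive -f decode_urls.hql
```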

How to Left Pad Numbers in Linux

There are many ways to left pad numbers in Linux. Usually this is done by adding leading zeros in front of the number. A few examples can be seen below:

Using printf

Using AWK

Looping in Bash
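A minimal sketch of the three approaches named above, assuming zero as the pad character and widths of 5 and 3 (both illustrative choices):

```shell
# Using printf: pad 42 to five digits with leading zeros (prints 00042).
printf '%05d\n' 42

# Using awk: pad each number read from standard input to five digits.
printf '7\n42\n' | awk '{ printf "%05d\n", $1 }'

# Looping: pad a sequence of numbers to three digits.
for i in 1 2 3; do
  printf '%03d\n' "$i"
done
```

Note that the shell's printf treats input with a leading zero (such as 08) as octal, so it is safest to pad numbers that are not already zero-prefixed.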

How to Select Random Records in MySQL

The ability to select random records from a table in MySQL can be helpful. Luckily this is easy to do with the RAND() function. RAND() returns a random floating point value between 0 and 1. You can select random records in MySQL by using the RAND() function together with ORDER BY and LIMIT clauses. Here is

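A minimal sketch of the ORDER BY RAND() with LIMIT pattern described above. The table name my_table, the database name, the row count of 5, and the script name are hypothetical.

```shell
# Write a SQL script that returns five rows in random order.
cat > random_rows.sql <<'EOF'
-- my_table is a hypothetical table name.
SELECT * FROM my_table ORDER BY RAND() LIMIT 5;
EOF

# To run it (assumes the mysql client is installed and can connect):
#   mysql my_database < random_rows.sql
```

Because a random value is generated for every row before sorting, this pattern can be slow on large tables.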