Loading Data from a Database into Spark

One of the great features of Spark is the variety of data sources it can read from. Loading data from a database into Spark using JDBC requires three major steps. First, you need a running database that supports JDBC connections. Next, you will need to download and use the JDBC driver for that database. Finally

read more

How to Load Text Files into MySQL

MySQL databases are often populated by loading text files directly into tables. MySQL makes this very easy to do with the LOAD DATA INFILE statement. For example:

LOAD DATA INFILE Statements: LOAD DATA INFILE statements can read data into MySQL tables at very high speeds. This will be much faster than running many single

read more

Execute MySQL Scripts on the Command Line

When working with MySQL, it is often necessary to execute SQL statements or scripts programmatically from the command line. SQL statements can be included in a SQL script (text) file and executed by the MySQL client. This can be done in a few different ways. Execute MySQL Script File from Standard Input: Statements in a

read more

Connecting Docker Containers with Networking

Docker containers are isolated environments, often running a single service. Usually, individual containers need to be connected to others to accomplish more complex tasks. Starting in Docker 1.9, this has become very easy to do with Networking. Docker Networking: Docker Networking is a feature that allows you to create a virtual network and attach containers

read more

Select Random Lines from a File in Linux

Being able to select random lines from a file in Linux can be very helpful and convenient. There are a few easy ways to do this, including using the shuf utility. shuf is included on many Linux/Unix systems as part of GNU coreutils. GNU coreutils can be easily installed if it is not already present. Selecting Random Lines

read more

Set AWS Credentials in Cloudera Quickstart Docker Container

Cloudera’s Quickstart image is a fantastic way to get started quickly with the big data ecosystem. With software such as Hadoop, Spark, Hive, Pig, Impala, and Hue already set up, this Docker image is a must in your big data toolkit. One thing the Cloudera Quickstart container lacks, however, is an easy way to

read more

The Difference between GUID and UUID

GUID (Globally Unique Identifier) and UUID (Universally Unique Identifier) are different implementations of the same idea. GUIDs and UUIDs are used as IDs to uniquely identify objects or records. These are very common in a big data environment where coordinating unique IDs in a central location is difficult to do. For practical considerations and nearly

read more

How to Generate a UUID in Java

UUID stands for Universally Unique Identifier. UUIDs are used as IDs to uniquely identify objects or records. An easy way to generate UUIDs in Java is to use the java.util.UUID class. Different variants and variant-versions exist for UUID objects. The methods of this class generally manipulate the Leach-Salz variant, although the constructors allow the creation

read more
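As a minimal sketch of the java.util.UUID approach mentioned in the excerpt above: the class name UuidDemo and the helper newRandomUuid are hypothetical names chosen for illustration, while UUID.randomUUID(), version(), variant(), and fromString() are standard library methods. This is one way to generate and inspect a UUID, not necessarily the one used in the full post.

```java
import java.util.UUID;

class UuidDemo {
    // Hypothetical helper: returns a randomly generated (version 4) UUID.
    static UUID newRandomUuid() {
        return UUID.randomUUID();
    }

    public static void main(String[] args) {
        UUID id = newRandomUuid();

        // Canonical 36-character text form: 8-4-4-4-12 hexadecimal digits.
        System.out.println("UUID:    " + id);

        // randomUUID() produces version 4 UUIDs of the Leach-Salz variant (2).
        System.out.println("version: " + id.version());
        System.out.println("variant: " + id.variant());

        // A UUID can be rebuilt from its string form.
        UUID parsed = UUID.fromString(id.toString());
        System.out.println("round trip equal: " + parsed.equals(id));
    }
}
```

Calling the helper repeatedly yields a different UUID each time, which is why no central coordination of IDs is needed.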

How to Generate a UUID in Linux

UUID stands for Universally Unique Identifier. UUIDs are used as IDs to uniquely identify objects or records. An easy way to generate UUIDs in Linux is to use the uuidgen utility on the Linux/Unix command line. Generating UUID with uuidgen: Simply executing uuidgen will generate a random UUID. Using the -t option will create a

read more

What is a UUID?

UUID stands for Universally Unique Identifier. UUIDs are used as IDs to uniquely identify objects or records. These are very common in a big data environment where coordinating unique IDs in a central location is difficult to do. Most, if not all, of the values in a UUID are generated randomly, depending on the UUID version. UUID Format

read more
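To illustrate the point above that randomness depends on the UUID version, here is a small sketch using Java's standard java.util.UUID class. The class name UuidVersions and the helper nameBased are hypothetical. A version 4 UUID is random, while a version 3 (name-based) UUID is derived from a hash of its input and is therefore repeatable.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

class UuidVersions {
    // Hypothetical helper: name-based (version 3) UUID computed from the
    // bytes of the given name, so the same name always gives the same UUID.
    static UUID nameBased(String name) {
        return UUID.nameUUIDFromBytes(name.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        UUID random = UUID.randomUUID();          // version 4: random
        UUID named1 = nameBased("example-record"); // version 3: name-based
        UUID named2 = nameBased("example-record");

        System.out.println("random version: " + random.version());
        System.out.println("named version:  " + named1.version());

        // Name-based UUIDs are deterministic for identical input.
        System.out.println("named UUIDs equal: " + named1.equals(named2));
    }
}
```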