Writing Text File contents to Kafka with Kafka Connect

When working with Kafka you might need to write data from a local file to a Kafka topic. This is actually very easy to do with Kafka Connect. Kafka Connect is a framework that provides scalable and reliable streaming of data to and from Apache Kafka. With Kafka Connect, writing a file’s content to a topic requires only a few simple steps.

Starting Kafka and Zookeeper

The first step is to start the Kafka and Zookeeper servers. Check out our Kafka Quickstart Tutorial to get up and running quickly.

Creating a Topic to Write to

Creating a topic from the command line is very easy to do. In this example we create the my-connect-test topic.

Creating a Source Config File

Since we are reading the contents of a local file and writing to Kafka, this file is considered our “source”. Therefore we will use the FileSource connector. We must create a configuration file to use with this connector. For this most part you can copy the example available in $KAFKA_HOME/config/connect-file-source.properties. Below is an example of our my-file-source.properties file.

This file indicates that we will use the FileStreamSource connector class, read data from the /tmp.my-test.txt file, and publish records to the my-connect-test Kafka topic. We are also only using 1 task to push this data to Kafka, since we are reading/publishing a single file.

Creating a Worker Config File

Processes that execute Kafka Connect connectors and tasks are called workers. Since we are reading data from a single machine and publishing to Kafka, we can use the simpler of the two types, standalone workers (as opposed to distributed workers). You can find a sample config file for standalone workers in $KAFKA_HOME/config/connect-standalone.properties. We will call our file my-standalone.properties.

The main change in this example in comparison to the default is the key.converter and value.converter settings. Since our file contains simple text, we use the StringConverter types.

Running Kafka Connect

Now it is time to run Kafka Connect with our worker and source configuration files. As mentioned before we will be running Kafka Connect in standalone mode. Here is an example of doing this with our custom config files:

Our input file /tmp/my-test.txt will be read in a single process to the Kafka my-connect-test topic. Here is a look at the file contents:

Reading from the Kafka Topic

If we read from the Kafka topic that we created earlier, we should see the 3 lines in the source file that were written to Kafka:

* More about Kafka Connect can be found at http://docs.confluent.io/2.0.0/connect/intro.html,

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code class="" title="" data-url=""> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong> <pre class="" title="" data-url=""> <span class="" title="" data-url="">