Writing Data from Apache Kafka to Text File

When working with Apache Kafka, you might want to write data from a Kafka topic to a local text file. Kafka Connect, a framework for scalable and reliable streaming of data to and from Apache Kafka, makes this straightforward: writing a topic’s content to a local text file requires only a few steps.

Starting Kafka and ZooKeeper

The first step is to start the Kafka and ZooKeeper servers. Check out our Kafka Quickstart Tutorial to get up and running quickly.

Creating a Topic to Write to

Creating a topic from the command line is very easy to do. In this example we create the my-connect-test topic.
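A topic could be created along these lines. This is a sketch, assuming an older, ZooKeeper-based Kafka release with ZooKeeper listening on localhost:2181 and Kafka unpacked under $KAFKA_HOME (the /opt/kafka fallback is only a placeholder). The single partition and replication factor of 1 are also assumptions suited to a one-broker test setup.

```shell
# KAFKA_HOME fallback is a placeholder; point it at your own installation.
KAFKA_HOME="${KAFKA_HOME:-/opt/kafka}"
TOPIC="my-connect-test"

if [ -x "$KAFKA_HOME/bin/kafka-topics.sh" ]; then
  # On newer Kafka releases, replace "--zookeeper localhost:2181"
  # with "--bootstrap-server localhost:9092".
  "$KAFKA_HOME/bin/kafka-topics.sh" --create \
    --zookeeper localhost:2181 \
    --replication-factor 1 \
    --partitions 1 \
    --topic "$TOPIC"
else
  echo "kafka-topics.sh not found under $KAFKA_HOME/bin" >&2
fi
```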

Creating a Sink Config File

Since we are reading from a Kafka topic and writing to a local text file, the file is our “sink”, so we will use the FileStreamSink connector. This connector needs a configuration file. For the most part you can copy the example available in $KAFKA_HOME/config/connect-file-sink.properties. We will call our copy my-file-sink.properties.
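As a rough sketch, my-file-sink.properties could look like the following. The connector class, topic, output path, and task count are the ones used in this tutorial. The connector name my-file-sink is an assumption, and the heredoc is just one convenient way to create the file from a shell.

```shell
# Write the sink connector configuration into the current directory.
cat > my-file-sink.properties <<'EOF'
# Unique name for this connector instance (assumed name)
name=my-file-sink
# Connector that writes records from Kafka to a local file
connector.class=FileStreamSink
# Number of tasks reading from Kafka
tasks.max=1
# Local file that records are appended to
file=/tmp/my-file-sink.txt
# Kafka topic(s) to read from
topics=my-connect-test
EOF
```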

This file indicates that we will use the FileStreamSink connector class, read data from the my-connect-test Kafka topic, and write records to /tmp/my-file-sink.txt. We are also only using 1 task to read this data from Kafka.

Creating a Worker Config File

Processes that execute Kafka Connect connectors and tasks are called workers. In this example we can use the simpler of the two worker types, standalone workers (as opposed to distributed workers). You can find a sample config file for standalone workers in $KAFKA_HOME/config/connect-standalone.properties. We will call our file my-standalone.properties.

The main change in this example compared to the default is in the key.converter and value.converter settings. Since our data is simple text, we use the StringConverter type for both.
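A minimal sketch of my-standalone.properties might look like this. It assumes a single broker on localhost:9092 and keeps the offset settings from the sample file. Older Kafka releases may also expect the internal.key.converter and internal.value.converter entries found in the sample file, so starting from a copy of that file and changing only the two converters is the safer route.

```shell
# Write the standalone worker configuration into the current directory.
cat > my-standalone.properties <<'EOF'
# Kafka broker the worker connects to (assumes a local broker on the default port)
bootstrap.servers=localhost:9092

# Our records are plain text, so keys and values are handled as strings
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter

# File where the standalone worker keeps connector offsets
offset.storage.file.filename=/tmp/connect.offsets
# How often offsets are flushed, in milliseconds
offset.flush.interval.ms=10000
EOF
```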

Writing Data to a Kafka Topic

We now need to write some sample data to our Kafka topic. This can easily be done with the kafka-console-producer, which reads data from STDIN and writes it to Kafka.
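One way to do this non-interactively is to redirect a small file into the producer. In the sketch below, the file name sample-input.txt and the three messages are made up for illustration, and the broker address localhost:9092 is an assumption. Older releases use --broker-list as shown; newer ones use --bootstrap-server instead.

```shell
# KAFKA_HOME fallback is a placeholder; point it at your own installation.
KAFKA_HOME="${KAFKA_HOME:-/opt/kafka}"
TOPIC="my-connect-test"
SAMPLE_FILE="sample-input.txt"   # hypothetical file name

# Each line becomes one Kafka record.
printf 'first test message\nsecond test message\nthird test message\n' > "$SAMPLE_FILE"

if [ -x "$KAFKA_HOME/bin/kafka-console-producer.sh" ]; then
  # The producer reads STDIN line by line and exits at end of input.
  "$KAFKA_HOME/bin/kafka-console-producer.sh" \
    --broker-list localhost:9092 \
    --topic "$TOPIC" < "$SAMPLE_FILE"
else
  echo "kafka-console-producer.sh not found under $KAFKA_HOME/bin" >&2
fi
```

Running the producer without the redirect lets you type messages by hand, one per line.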

Running Kafka Connect

Now it is time to run Kafka Connect with our worker and sink configuration files. As mentioned before we will be running Kafka Connect in standalone mode. Here is an example of doing this with our custom configuration files:
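The invocation could be sketched as follows, assuming both configuration files sit in the current directory and Kafka is unpacked under $KAFKA_HOME (the /opt/kafka fallback is only a placeholder). connect-standalone.sh takes the worker configuration first, followed by one or more connector configurations.

```shell
# KAFKA_HOME fallback is a placeholder; point it at your own installation.
KAFKA_HOME="${KAFKA_HOME:-/opt/kafka}"
WORKER_CONFIG="my-standalone.properties"
SINK_CONFIG="my-file-sink.properties"

if [ -x "$KAFKA_HOME/bin/connect-standalone.sh" ]; then
  # Runs in the foreground until interrupted with Ctrl-C.
  "$KAFKA_HOME/bin/connect-standalone.sh" "$WORKER_CONFIG" "$SINK_CONFIG"
else
  echo "connect-standalone.sh not found under $KAFKA_HOME/bin" >&2
fi
```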

At this point all the data available in the Kafka topic should be written to our local text file. We can confirm this by reading the file contents.
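A quick check from a second terminal might look like this. The path is the one set in the sink configuration; the existence check is only there because the connector may need a moment to create the file.

```shell
SINK_FILE="/tmp/my-file-sink.txt"

if [ -f "$SINK_FILE" ]; then
  # Print every record the sink connector has written so far.
  cat "$SINK_FILE"
else
  echo "$SINK_FILE does not exist yet" >&2
fi
```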

* More about Kafka Connect can be found at http://docs.confluent.io/2.0.0/connect/intro.html.
