Simple Apache Avro Example using Java

Apache Avro is a popular data serialization system that relies on schemas. The official Apache Avro documentation covers the format and its APIs in full.

This post walks through an example of serializing and deserializing data using Avro in Java. Maven is not necessary for working with Avro in Java, but we will be using Maven in this post.

Step 1 – Update pom.xml
Add the Avro dependency and the avro-maven-plugin build plugin to your Maven pom.xml file (versions might need updating). The plugin section, with its sourceDirectory and outputDirectory settings, is what gives us the convenience of code generation (discussed below).

Step 2 – Define your schema
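You will need to create an Avro schema file in the location given by the sourceDirectory field in your pom.xml file (${project.basedir}/src/main/avro/ in this example). Our schema file is named person.avsc and defines a BdPerson record in the com.bigdatums.avro namespace.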
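As a rough illustration of what such a schema looks like, the sketch below embeds a schema as a JSON string and parses it with Avro's Schema.Parser. Only the record name (BdPerson) and namespace (com.bigdatums.avro) come from this post. The fields id, username and email_address are hypothetical stand-ins, and the class name BdPersonSchemaCheck is made up for the example. It assumes the org.apache.avro:avro library is on the classpath.

```java
import org.apache.avro.Schema;

public class BdPersonSchemaCheck {

    // Hypothetical schema: name and namespace follow the post,
    // the three fields are placeholders chosen for illustration.
    static final String SCHEMA_JSON =
          "{\"namespace\": \"com.bigdatums.avro\","
        + " \"type\": \"record\","
        + " \"name\": \"BdPerson\","
        + " \"fields\": ["
        + "   {\"name\": \"id\", \"type\": \"long\"},"
        + "   {\"name\": \"username\", \"type\": \"string\"},"
        + "   {\"name\": \"email_address\", \"type\": [\"null\", \"string\"], \"default\": null}"
        + " ]}";

    // Parsing fails with an exception if the JSON is not a valid Avro schema.
    static Schema parse() {
        return new Schema.Parser().parse(SCHEMA_JSON);
    }

    public static void main(String[] args) {
        Schema schema = parse();
        System.out.println("Record: " + schema.getFullName());
        for (Schema.Field field : schema.getFields()) {
            System.out.println("  " + field.name() + " -> " + field.schema());
        }
    }
}
```

In a real project the same JSON would live in the .avsc file rather than in a Java string. Parsing it this way is just a quick check that the schema is well formed before running code generation.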

Step 3 – Compile your project
Compiling your project triggers code generation, which automatically creates the classes we need to work with BdPerson objects, based on the schema from the previous step. After compiling, the BdPerson class will appear in the com.bigdatums.avro package (the namespace defined in the schema). There are various ways to compile the project, including running mvn compile from the command line in the project directory.

Step 4 – Create Schema Objects

Use the newly generated BdPerson class to create objects that conform to the schema.
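The generated BdPerson class only exists after code generation has run, so the sketch below uses Avro's generic API (GenericRecordBuilder) as a self-contained stand-in. It is not the generated-class approach this post uses. The fields id, username and email_address and the class name CreatePersonRecords are hypothetical. Generated classes usually offer a similar builder through a static newBuilder() method, with setter names derived from your own schema fields.

```java
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.generic.GenericRecordBuilder;

public class CreatePersonRecords {

    // Hypothetical schema; only the name and namespace come from the post.
    static final Schema SCHEMA = new Schema.Parser().parse(
          "{\"namespace\": \"com.bigdatums.avro\", \"type\": \"record\", \"name\": \"BdPerson\","
        + " \"fields\": ["
        + "   {\"name\": \"id\", \"type\": \"long\"},"
        + "   {\"name\": \"username\", \"type\": \"string\"},"
        + "   {\"name\": \"email_address\", \"type\": [\"null\", \"string\"], \"default\": null}"
        + " ]}");

    // Builds one record. email may be null because the field is a ["null", "string"] union.
    static GenericRecord person(long id, String username, String email) {
        return new GenericRecordBuilder(SCHEMA)
            .set("id", id)
            .set("username", username)
            .set("email_address", email)
            .build();
    }

    public static void main(String[] args) {
        GenericRecord first = person(1L, "mrscarter", "mrscarter@example.com");
        GenericRecord second = person(2L, "jdoe", null);
        System.out.println(first);
        System.out.println(second);
    }
}
```

The builder rejects unknown field names and, at build time, fields that are unset and have no default, which catches many mistakes before anything is written to disk.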

Step 5 – Serialize Data to Disk
Use Avro classes and the BdPerson schema to serialize the objects and write them to bdperson-test.avro.
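A minimal sketch of this step, again using the generic API so that it runs without generated code: DataFileWriter writes an Avro container file that stores the schema in its header, followed by the records. The schema fields and the class name WritePersons are hypothetical. With the generated class you would typically swap GenericDatumWriter for SpecificDatumWriter and GenericRecord for BdPerson.

```java
import java.io.File;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class WritePersons {

    // Hypothetical schema; only the name and namespace come from the post.
    static final Schema SCHEMA = new Schema.Parser().parse(
          "{\"namespace\": \"com.bigdatums.avro\", \"type\": \"record\", \"name\": \"BdPerson\","
        + " \"fields\": ["
        + "   {\"name\": \"id\", \"type\": \"long\"},"
        + "   {\"name\": \"username\", \"type\": \"string\"},"
        + "   {\"name\": \"email_address\", \"type\": [\"null\", \"string\"], \"default\": null}"
        + " ]}");

    static GenericRecord person(long id, String username) {
        GenericRecord record = new GenericData.Record(SCHEMA);
        record.put("id", id);
        record.put("username", username);
        // email_address is left unset (null), which the nullable union allows.
        return record;
    }

    // Writes the records to an Avro container file, replacing any existing file.
    static File write(File out, GenericRecord... records) throws IOException {
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(SCHEMA))) {
            writer.create(SCHEMA, out);
            for (GenericRecord record : records) {
                writer.append(record);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        File out = write(new File("bdperson-test.avro"),
                         person(1L, "mrscarter"), person(2L, "jdoe"));
        System.out.println("Wrote " + out.length() + " bytes to " + out.getPath());
    }
}
```

The try-with-resources block closes the writer, which flushes the final data block to disk.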

Step 6 – Deserialize Avro file and Print Contents
Use Avro classes and the BdPerson schema to deserialize the Avro file and print its contents.
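Reading the file back can be sketched as follows. Because an Avro container file carries its own schema, the generic reader needs no schema argument here. The class name ReadPersons is made up for the example, and the default path assumes a file written in the previous step. With the generated class you would typically use SpecificDatumReader with BdPerson instead.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericRecord;

public class ReadPersons {

    // Reads every record in the file and returns each one rendered as text.
    static List<String> read(File in) throws IOException {
        List<String> rows = new ArrayList<>();
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(in, new GenericDatumReader<GenericRecord>())) {
            // DataFileReader is Iterable, so records can be streamed one at a time.
            for (GenericRecord record : reader) {
                rows.add(record.toString());
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Path is an assumption: pass a file name, or default to the post's file.
        File in = new File(args.length > 0 ? args[0] : "bdperson-test.avro");
        for (String row : read(in)) {
            System.out.println(row);
        }
    }
}
```

Note that string fields come back from the generic reader as Avro Utf8 objects rather than java.lang.String, so call toString() on a value before comparing it with a Java string.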

Deserialized output:

Complete code available in the Big Datums GitHub repo.

2 thoughts on “Simple Apache Avro Example using Java”

  1. Tristan

    Please add absolute paths for the files you mention. Why does the Avro schema file need to be within the outputDirectory? It seems like the Avro schema would be the source that the code generation is based on, so it’s confusing that it’s considered an output. Or am I reading this wrongly? Isn’t the output the generated code, and the input the schema? Absolute paths for every file you mention would really clear this up for me.

    • Big Datums Post Author

      Thank you for pointing this out. This was a mistake in the post, and we have corrected it. Your Avro schema should be placed in the sourceDirectory, which is ${project.basedir}/src/main/avro/ in this example.
