
Alternative to XML - Protocol Buffers



What are the benefits of Protocol Buffers? How can they be used with a REST service? Do they offer performance benefits over XML?

XML is widespread

Today in the enterprise world, XML is the de facto message format for exchanging data between applications. XML has been widely adopted because of its simplicity and readability, and numerous software infrastructures have been built with XML at their core. Web services are used ubiquitously to communicate between applications. The last few years have seen the adoption of Service Oriented Architecture (SOA) across enterprises. SOA has enabled organizations to abstract the underlying logic and infrastructure away from the consumers of a service and to expose well-defined interfaces, which are generally expressed in XML. Web services, both SOAP- and REST-based, are in widespread use, further spreading XML.

Data transfer has costs

But as organizations continue to grow and larger volumes of data are exchanged between applications, XML serialization and deserialization have started to become bottlenecks. Applications find it hard to scale significantly, and latency has become a concern in many enterprises. Many are increasingly relying on FTP or SCP processes to transfer files, usually in CSV format. This process is cumbersome and error-prone, and the interface is not clearly defined.

If the cost of exchanging messages between applications can be brought down to an acceptable level, organizations will be better able to meet the demands of their growing business. To this end, there has been growing interest in alternative technologies across the IT industry, and the last few years have seen an explosion of binary protocols. Google released Protocol Buffers, Facebook contributed Thrift to the Apache Software Foundation, and Avro and BSON are further options. The common theme among all of these is a reduction in the size of the data sent over the wire, together with faster serialization and deserialization.

This article will explore Protocol Buffers in more detail and show how they can fit seamlessly into existing infrastructure. More specifically, this author will delve into how Protocol Buffers can be used over HTTP in a REST-style pattern.

Protocol Buffers

Protocol Buffers (ProtoBuf) is a language-neutral way of serializing structured data. The serialized messages can be transported over the wire using any network protocol. Since HTTP is already widely used in organizations, Protocol Buffers can be sent over HTTP in place of XML, and REST-style services seem like an ideal candidate for this. This is especially true for services that are consumed internally within an organization, since the organization then controls both the service design and its consumers.

In addition to the reduced message size over the wire, there are two other aspects present in ProtoBuf that make it a very attractive replacement for XML.

Interface Definition

Similar to XSD, Protocol Buffers use well-defined structures to describe the message format. The message structure is specified in a .proto file; the details of the syntax are available on the Google web site, and examples of .proto messages can be found in the next section. The proto compiler (protoc) generates source files in the desired language. A number of languages are supported, Java, C++ and Python to name a few. A programmer fills the generated objects with data and calls a method to write the serialized bytes to a stream. On the receiving side, the bytes are read and the object is reconstructed from them.
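To make the write-and-read cycle concrete without depending on protoc output, the sketch below hand-encodes three fields of the Person message from the next section (name = 1, id = 2, email = 3) in the protobuf binary wire format and decodes them again. The class PersonWireSketch and its methods are hypothetical; generated classes do the equivalent work through writeTo(OutputStream) and the static parseFrom(InputStream), with far more validation than shown here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical hand-written codec for part of Person (Fig 2): name = 1, id = 2, email = 3.
// It mimics what protoc-generated writeTo/parseFrom do; it is not generated code.
final class PersonWireSketch {
    String name;
    int id;
    String email; // optional: null means "not set"

    void writeTo(OutputStream out) throws IOException {
        writeString(out, 1, name);
        writeVarint(out, (2 << 3) | 0); // key: tag 2, wire type 0 (varint)
        writeVarint(out, id);           // negative int32 values are sign-extended to 10 bytes
        if (email != null) {
            writeString(out, 3, email); // unset optional fields are simply left out
        }
    }

    static PersonWireSketch parseFrom(InputStream in) throws IOException {
        PersonWireSketch p = new PersonWireSketch();
        long key;
        while ((key = readVarint(in, true)) != -1) {
            int tag = (int) (key >>> 3);
            switch (tag) {
                case 1: p.name = readString(in); break;
                case 2: p.id = (int) readVarint(in, false); break;
                case 3: p.email = readString(in); break;
                default: throw new IOException("tag " + tag + " is not handled in this sketch");
            }
        }
        return p;
    }

    static byte[] encode(PersonWireSketch p) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try { p.writeTo(out); } catch (IOException e) { throw new UncheckedIOException(e); }
        return out.toByteArray();
    }

    static PersonWireSketch decode(byte[] bytes) {
        try { return parseFrom(new ByteArrayInputStream(bytes)); }
        catch (IOException e) { throw new UncheckedIOException(e); }
    }

    // Varint: 7 bits per byte, least significant group first, high bit set on all but the last byte.
    static void writeVarint(OutputStream out, long value) throws IOException {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // Returns -1 only when eofAllowed is true and the stream ends before the first byte.
    static long readVarint(InputStream in, boolean eofAllowed) throws IOException {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            int b = in.read();
            if (b < 0) {
                if (eofAllowed && shift == 0) return -1;
                throw new EOFException("truncated varint");
            }
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new IOException("malformed varint");
    }

    static void writeString(OutputStream out, int tag, String s) throws IOException {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (tag << 3) | 2); // key: wire type 2 (length-delimited)
        writeVarint(out, utf8.length);
        out.write(utf8);
    }

    static String readString(InputStream in) throws IOException {
        byte[] utf8 = new byte[(int) readVarint(in, false)];
        int off = 0;
        while (off < utf8.length) {
            int n = in.read(utf8, off, utf8.length - off);
            if (n < 0) throw new EOFException("truncated string");
            off += n;
        }
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```

For name "Ann", id 42 and email "ann@example.com", encode returns 24 bytes: two bytes of overhead per string (key and length) plus the UTF-8 text, and two bytes for the id.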

Versioning

Message versioning can be explained in the context of the diagram in Fig 1. Let us assume that there are two applications, a client and a service, and that the Person message is exchanged between them.

Fig 1 Consumer Producer Configuration with message Person being exchanged

In ProtoBuf the message is defined as follows:

message Person {
    required string name = 1;
    required int32 id = 2;
    optional string email = 3;
    optional string gender = 4;
    optional string age = 5;
}
Fig 2 A Proto Message Structure

The numbers to the right of the fields, called tags, identify the fields when the message is serialized in binary format. Once the client and the service start using this message to communicate, it is important that these tags do not change on either side.
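Concretely, each field on the wire is preceded by a key that combines the tag with a wire type describing how the value is encoded. The hypothetical helper below computes and splits that key; the shift by 3 and the wire type numbers 0 and 2 come from the protobuf encoding rules. If email were renumbered from 3 to 4 on one side only, the service would emit key 0x22 instead of 0x1A, and an old client would silently store the email address in gender.

```java
// Hypothetical helper illustrating the key that precedes every field on the wire:
// key = (tag << 3) | wireType, itself written as a varint.
final class FieldKeys {
    static final int VARINT = 0;           // int32, int64, bool, enum
    static final int LENGTH_DELIMITED = 2; // string, bytes, embedded messages

    static int key(int tag, int wireType) {
        return (tag << 3) | wireType;
    }

    static int tagOf(int key) {
        return key >>> 3;
    }

    static int wireTypeOf(int key) {
        return key & 0x7;
    }
}
```

With Fig 2, name (tag 1, string) gets key 0x0A, id (tag 2, int32) gets 0x10 and email (tag 3, string) gets 0x1A. Tags 1 to 15 yield keys below 128, which fit in a single byte.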

When new functionality requires new fields to be added to the message, care must be taken not to break existing clients. In ProtoBuf this is done by adding the new fields with unique tag numbers and making them optional. The new fields must be optional so that messages from existing clients, which do not set them, continue to be accepted by the server.

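message Person {
    required string name = 1;
    required int32 id = 2;
    optional string email = 3;
    optional string gender = 4;
    optional string age = 5;
    optional string nickname = 6; // new optional field with a new tag (illustrative name)
}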
Fig 3 - Proto Message Structure with additional attribute

To indicate that a change has occurred the service’s minor version can be incremented. New clients use the new fields and the old clients are oblivious to the change. If the service sends the new fields to the old client, they will just be ignored.
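Old clients can ignore new fields because every value carries its wire type, so a reader can skip a tag it does not recognize without knowing its schema. The hypothetical TolerantReader below plays the role of an old client that knows only tags 1 to 5; a message carrying an additional field with a hypothetical tag 6 still decodes, and the unknown value is simply dropped. (The generated Java classes keep such values in an UnknownFieldSet rather than discarding them, so they survive re-serialization; this sketch omits that.)

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical "old client" decoder: it keeps only the tags it was written for and
// skips every other field using the wire type, so fields added later are ignored.
final class TolerantReader {
    private final byte[] buf;
    private int pos;

    private TolerantReader(byte[] buf) { this.buf = buf; }

    static Map<Integer, Object> read(byte[] message, Set<Integer> knownTags) {
        TolerantReader r = new TolerantReader(message);
        Map<Integer, Object> fields = new TreeMap<>();
        while (r.pos < message.length) {
            long key = r.varint();
            int tag = (int) (key >>> 3);
            int wireType = (int) (key & 0x7);
            Object value;
            switch (wireType) {
                case 0: value = r.varint(); break;                // varint, boxed as Long
                case 1: value = r.bytes(8); break;                // fixed 64-bit
                case 2: value = r.bytes((int) r.varint()); break; // length-delimited, byte[]
                case 5: value = r.bytes(4); break;                // fixed 32-bit
                default: throw new IllegalArgumentException("unsupported wire type " + wireType);
            }
            if (knownTags.contains(tag)) {
                fields.put(tag, value); // unknown tags were still consumed, then dropped
            }
        }
        return fields;
    }

    private long varint() {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos >= buf.length) throw new IllegalArgumentException("truncated varint");
            int b = buf[pos++] & 0xFF;
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new IllegalArgumentException("malformed varint");
    }

    private byte[] bytes(int length) {
        if (length < 0 || pos + length > buf.length) throw new IllegalArgumentException("truncated field");
        byte[] out = Arrays.copyOfRange(buf, pos, pos + length);
        pos += length;
        return out;
    }

    static String asString(Object value) {
        return new String((byte[]) value, StandardCharsets.UTF_8);
    }
}
```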

Deleting a required field, such as id, is not possible without affecting all clients, because existing clients will reject messages in which it is missing. At that point it may be better to create a new major version of the service along with a new version of the message, as shown below.

message PersonV2 {
    required string name = 1;
    optional string email = 3;
    optional string gender = 4;
    optional string age = 5;
}
Fig 4 - Field "id" is deleted, which requires a new version of the message and also a new major version of the service. Older clients continue using version 1 until they can upgrade.
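The break happens because Protocol Buffers parsing enforces required fields: a version 1 client that receives a message without id fails to parse it (generated code throws InvalidProtocolBufferException). The hypothetical RequiredFieldCheck below imitates that check for the required tags of Fig 2, handling only the two wire types Person uses.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical imitation of the required-field check a version 1 client performs on Person
// (Fig 2). Only wire types 0 (varint) and 2 (length-delimited) are handled, which covers Person.
final class RequiredFieldCheck {
    static final Set<Integer> REQUIRED_V1 = new TreeSet<>(Arrays.asList(1, 2)); // name, id

    // Returns the required tags absent from the encoded message; empty means v1 can parse it.
    static Set<Integer> missingRequired(byte[] message) {
        Set<Integer> missing = new TreeSet<>(REQUIRED_V1);
        int[] pos = {0};
        while (pos[0] < message.length) {
            long key = varint(message, pos);
            int wireType = (int) (key & 0x7);
            if (wireType == 0) {
                varint(message, pos);                  // skip the value
            } else if (wireType == 2) {
                long length = varint(message, pos);
                if (length < 0 || pos[0] + length > message.length) {
                    throw new IllegalArgumentException("truncated field");
                }
                pos[0] += (int) length;                // skip the payload
            } else {
                throw new IllegalArgumentException("wire type " + wireType + " not handled here");
            }
            missing.remove((int) (key >>> 3));         // this tag was present
        }
        return missing;
    }

    // Decodes one varint at pos[0] and advances pos[0] past it.
    static long varint(byte[] buf, int[] pos) {
        long result = 0;
        for (int shift = 0; shift < 64; shift += 7) {
            if (pos[0] >= buf.length) throw new IllegalArgumentException("truncated varint");
            int b = buf[pos[0]++] & 0xFF;
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new IllegalArgumentException("malformed varint");
    }
}
```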

Using Protocol Buffers with JAX-RS in a RESTful manner

This section gives a brief overview of using Protocol Buffers in a REST service. Protocol Buffers can be downloaded from the project's download page. Download protobuf-2.3.0.zip and the Proto compiler, protoc-2.3.0-win32.zip, and make sure the protobuf Java runtime classes (under src/main/java in the distribution) are on the class path.

Define a proto message. For this example, the message looks as follows:

// Package names are plain identifiers in .proto syntax, not quoted strings.
package demo.rest.proto.customer;

option java_outer_classname = "CustomerProtos";

enum PhoneType {
    MOBILE = 0;
    HOME = 1;
    WORK = 2;
}

// Tag numbers only need to be unique within a message; gaps are allowed.
message PhoneNumber {
    required string phoneNumber = 1;
    optional PhoneType phoneType = 3;
}

message Address {
    required string addressLine1 = 2;
    optional string zipCode = 4;
    optional string country = 5;
    optional string state = 6;
}

message CreditCard {
    required string cardNumber = 1;
    optional string expMonth = 2;
    optional string expYear = 3;
    required Address billingAddress = 5;
    required string type = 7;
}

message Customer {
    required int32 id = 1;
    required string lastName = 2;
    required string firstName = 3;
    required string email = 4;
    repeated PhoneNumber phoneNumber = 6;
    repeated Address addresses = 8;
    repeated CreditCard creditCards = 9;
}
Fig 5: customer.proto

It is not necessary to define all message structures in a single .proto file. One could just as well have multiple .proto files, with messages in one file referencing those in another through the import directive. A customer has basic attributes like name and email, and can also have multiple phone numbers, addresses and credit cards, designated by the repeated keyword. Further details can be found on the Google developer site.
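To see what repeated and nested fields look like on the wire, the hypothetical encoder below writes only the repeated phoneNumber field (tag 6) of Customer from Fig 5. Each element becomes its own length-delimited entry with the same key, and the payload of each entry is the encoded PhoneNumber message, with the PhoneType enum written as its numeric value. Generated code does all of this automatically; the class and method names here are illustrative.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

// Hypothetical encoder for the repeated "phoneNumber = 6" field of Customer (Fig 5).
// Each element is written as its own length-delimited entry wrapping an embedded
// PhoneNumber message; generated code does this automatically.
final class RepeatedFieldSketch {
    static final int HOME = 1; // PhoneType.HOME in Fig 5; enums travel as their numbers

    static void varint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // One PhoneNumber message: phoneNumber = 1 (string), phoneType = 3 (enum as varint).
    static byte[] phoneNumber(String number, int phoneType) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] utf8 = number.getBytes(StandardCharsets.UTF_8);
        varint(out, (1 << 3) | 2);
        varint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
        varint(out, (3 << 3) | 0);
        varint(out, phoneType);
        return out.toByteArray();
    }

    // The repeated field: the same key (tag 6, wire type 2) precedes every element.
    static byte[] phoneNumberField(List<byte[]> encodedPhones) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte[] phone : encodedPhones) {
            varint(out, (6 << 3) | 2);
            varint(out, phone.length); // embedded messages are length-prefixed
            out.write(phone, 0, phone.length);
        }
        return out.toByteArray();
    }
}
```

Encoding "415 510 5100" as a HOME number yields a 16-byte PhoneNumber; two such numbers produce a 36-byte field, each entry starting with key 0x32 and length 16.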

Running the protoc compiler on customer.proto creates a file called CustomerProtos.java. This file contains all the message structures represented as Java inner classes. These inner classes all extend ProtoBuf’s abstract class GeneratedMessage, which in turn implements the Message interface.

A REST service, CustomerService, can then be created as shown in Fig 6. Annotations describe how the service will be accessed. The @Consumes and @Produces annotations declare the MIME types that the service accepts and produces, and the @Path annotation specifies the URL path at which the service can be found.

import javax.ws.rs.Consumes;
import javax.ws.rs.GET;
import javax.ws.rs.POST;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;

// Generated by protoc from Fig 5 (assumes the default Java package taken from the .proto package).
import demo.rest.proto.customer.CustomerProtos;

@Path("/customer")
public class CustomerService {

    // Accepts a Customer in binary form and echoes it back unchanged.
    @POST
    @Consumes("application/x-protobuf")
    @Produces("application/x-protobuf")
    public CustomerProtos.Customer process(CustomerProtos.Customer person) {
        return person;
    }

    // Returns a sample Customer assembled with the generated builders.
    @GET
    @Produces("application/x-protobuf")
    public CustomerProtos.Customer getCust() {
        return CustomerProtos.Customer.newBuilder()
                .setEmail("a@bdc.com")
                .setFirstName("Test")
                .setLastName("Customer")
                .setId(1)
                .addAddresses(CustomerProtos.Address.newBuilder()
                        .setAddressLine1("123 Main St")
                        .setCountry("US")
                        .setState("CA")
                        .setZipCode("12345"))
                .addPhoneNumber(CustomerProtos.PhoneNumber.newBuilder()
                        .setPhoneNumber("415 510 5100")
                        .setPhoneType(CustomerProtos.PhoneType.HOME))
                .addCreditCards(CustomerProtos.CreditCard.newBuilder()
                        .setCardNumber("54111111111111")
                        .setExpMonth("10")
                        .setExpYear("2010")
                        .setType("VISA")
                        .setBillingAddress(CustomerProtos.Address.newBuilder()
                                .setAddressLine1("456 Main St")
                                .setCountry("US")
                                .setState("AZ")
                                .setZipCode("12345")))
                .build(); // build() throws if any required field is unset
    }
}
Fig 6 - Sample Customer Service which consumes and produces application/x-protobuf. Notice the newBuilder method used to generate new objects. Also notice the build method on the builder object that generates the final Message object that will be serialized.

This service uses the Jersey implementation of JAX-RS. This article will not delve into details of a REST service implementation. A good reference can be found here.
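To try the HTTP side without a JAX-RS container, the sketch below is deliberately not Jersey: it uses the JDK's built-in com.sun.net.httpserver.HttpServer to stand in for the echoing POST method of CustomerService, and HttpURLConnection as the client. The class name, loopback address and ephemeral port are assumptions; the payload is treated as opaque bytes, as it would be after Message.writeTo.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;

// Not Jersey: the JDK's built-in HttpServer stands in for CustomerService's POST method,
// echoing the request body back, and HttpURLConnection plays the client.
final class ProtobufOverHttpSketch {
    static final String MIME = "application/x-protobuf";

    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        for (int n; (n = in.read(buf)) != -1; ) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    // Echo endpoint on an ephemeral loopback port, mirroring the /customer path of Fig 6.
    static HttpServer startEchoServer() throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/customer", exchange -> {
            byte[] body = readAll(exchange.getRequestBody());
            exchange.getResponseHeaders().set("Content-Type", MIME);
            exchange.sendResponseHeaders(200, body.length == 0 ? -1 : body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();
        return server;
    }

    // Client: POST already-serialized message bytes and return the response bytes.
    static byte[] post(String url, byte[] message) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", MIME);
        conn.setRequestProperty("Accept", MIME);
        try (OutputStream os = conn.getOutputStream()) {
            os.write(message);
        }
        try (InputStream is = conn.getInputStream()) {
            return readAll(is);
        } finally {
            conn.disconnect();
        }
    }

    // Starts the server, posts once, and always stops the server again.
    static byte[] roundTrip(byte[] message) {
        HttpServer server = null;
        try {
            server = startEchoServer();
            int port = server.getAddress().getPort();
            return post("http://127.0.0.1:" + port + "/customer", message);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            if (server != null) {
                server.stop(0);
            }
        }
    }
}
```

The same request shape (POST, Content-Type and Accept set to application/x-protobuf) is what a client of the Jersey service would send; only the server side differs.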

JAX-RS provides handy hooks, the MessageBodyReader and MessageBodyWriter interfaces, that describe how to deserialize a message stream into a Java type and serialize it back. The provider must be discoverable by the JAX-RS runtime; in this setup that means placing it in the same package as the service, since Jersey finds @Provider classes by scanning its configured packages. Both interfaces are parameterized here with ProtoBuf's Message type. Here is the implementation for this service:

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.lang.annotation.Annotation;
import java.lang.reflect.Method;
import java.lang.reflect.Type;

import javax.ws.rs.Consumes;
import javax.ws.rs.Produces;
import javax.ws.rs.WebApplicationException;
import javax.ws.rs.core.MediaType;
import javax.ws.rs.core.MultivaluedMap;
import javax.ws.rs.ext.MessageBodyReader;
import javax.ws.rs.ext.MessageBodyWriter;
import javax.ws.rs.ext.Provider;

import com.google.protobuf.Message;

@Provider
@Consumes("application/x-protobuf")
@Produces("application/x-protobuf")
public class ProtoBufMimeProvider implements MessageBodyWriter<Message>, MessageBodyReader<Message> {

    // MessageBodyWriter implementation

    @Override
    public long getSize(Message message, Class<?> type, Type genericType,
            Annotation[] annotations, MediaType mediaType) {
        // The serialized size is known up front, so it can be sent as Content-Length.
        return message.getSerializedSize();
    }

    @Override
    public boolean isWriteable(Class<?> type, Type genericType,
            Annotation[] annotations, MediaType mediaType) {
        return Message.class.isAssignableFrom(type);
    }

    @Override
    public void writeTo(Message message, Class<?> type, Type genericType,
            Annotation[] annotations, MediaType mediaType,
            MultivaluedMap<String, Object> httpHeaders,
            OutputStream ostream) throws IOException, WebApplicationException {
        // Write the binary encoding straight to the response stream.
        message.writeTo(ostream);
    }

    // MessageBodyReader implementation

    @Override
    public boolean isReadable(Class<?> type, Type genericType,
            Annotation[] annotations, MediaType mediaType) {
        return Message.class.isAssignableFrom(type);
    }

    @Override
    public Message readFrom(Class<Message> type, Type genericType,
            Annotation[] annotations, MediaType mediaType,
            MultivaluedMap<String, String> httpHeaders,
            InputStream istream) throws IOException, WebApplicationException {
        try {
            // Every generated message class has a static newBuilder() method;
            // invoke it reflectively to get a builder for the requested type.
            Method builderMethod = type.getMethod("newBuilder");
            Message.Builder builder = (Message.Builder) builderMethod.invoke(null);
            return builder.mergeFrom(istream).build();
        } catch (Exception e) {
            throw new WebApplicationException(e);
        }
    }
}
Fig 7 - Provider implementation for translating the application/x-protobuf MIME type to Java

Performance gains

So much for the possibilities that binary protocols open up. How much of a gain can actually be achieved? Google claims an increase in efficiency anywhere from 20% to 200%, depending on the size of the data. To gain better insight, performance tests were run on two services: one written in traditional REST style returning XML, and the other returning the same data in binary format over HTTP. The second service was essentially the one described in the previous section. The test used the POST verb: data was posted to the service and the same data was returned. JMeter, a load-testing tool from the Apache Software Foundation, was used to generate load and measure response times. The chart below summarizes the results.

Fig 8 - The chart overlays two data series. The bars show the message size over the wire, and the line shows the throughput achieved. Note that the Y-axis represents size for the bar graph and throughput for the line graph.

This chart illustrates that as the object size increases, its XML representation grows significantly faster than its ProtoBuf representation. Also, as the size increases, XML throughput falls dramatically; in the test that was run, there was an almost 40% drop in XML throughput when the data size was about 47 KB. Another interesting point is that when the data size is relatively small, about 5 KB in XML, the throughput of ProtoBuf and XML is almost the same. From this it can be inferred that for smaller data sets there is no need to abandon XML. If the data being exchanged grows well beyond 50 KB, however, an organization can reap tremendous benefits by moving to Protocol Buffers.
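Part of the size gap comes from how each format spells out field names. The hypothetical sketch below encodes the same three Person fields as element-wrapped XML and in the protobuf wire format and reports the byte counts. It compares size only; it is not a throughput benchmark and does not reproduce the measurements above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical size comparison for three Person fields (name = 1, id = 2, email = 3):
// element-wrapped XML versus the protobuf wire format encoded by hand.
final class EncodedSizeSketch {
    // XML form; assumes the values contain no characters that need escaping.
    static int xmlSize(String name, int id, String email) {
        String xml = "<person><name>" + name + "</name><id>" + id + "</id><email>"
                + email + "</email></person>";
        return xml.getBytes(StandardCharsets.UTF_8).length;
    }

    static int protobufSize(String name, int id, String email) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeString(out, 1, name);
        writeVarint(out, (2 << 3) | 0); // key: tag 2, wire type 0 (varint)
        writeVarint(out, id);
        writeString(out, 3, email);
        return out.size();
    }

    static void writeString(ByteArrayOutputStream out, int tag, String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        writeVarint(out, (tag << 3) | 2); // key: wire type 2 (length-delimited)
        writeVarint(out, utf8.length);
        out.write(utf8, 0, utf8.length);
    }

    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }
}
```

For name "Ann", id 42 and email "ann@example.com" it reports 74 bytes of XML against 24 bytes of protobuf; most of the difference is the element names, which the binary form replaces with one-byte keys.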

Conclusion

There are viable options available today to replace XML as the data exchange format over existing infrastructure. Binary protocols like ProtoBuf offer significantly higher throughput for large data sets without sacrificing much. In fact, the only drawback of these binary protocols seems to be the lack of message readability. That is a small price to pay for the benefits they offer, and organizations will certainly stand to gain from adopting them.
