Monday, February 16, 2015

Storing Data in OpenTSDB Using Java

There aren't many examples showing how to use Java to store a time series in OpenTSDB.  It isn't complicated, but there are a couple of noteworthy items to keep in mind.

The OpenTSDB instance I'm using runs on a 3-node distributed cluster built on the following:

CentOS 6
Apache Zookeeper 3.4.6
Hadoop 2.4.0
HBase 0.98.8
OpenTSDB 2.0.1
Java 7

This post assumes you have a working OpenTSDB instance.

The first thing to understand is that the OpenTSDB HTTP API is what makes it universal: any language that can make an HTTP request can talk to the database.  Ultimately, what gets sent to the OpenTSDB URL is a JSON array containing all of the data points to be stored.

In this example, we're storing ECG time series data for an anonymous patient.  Due to HIPAA regulations, even the specific date the data was gathered is off limits, so we will arbitrarily choose 12:00 AM on 1 January 2015 as the start time for this time series.

Step 1: Create an ArrayList of dumb data objects (DDOs) to represent the time series.  Each instance of the DDO represents a single time point.  There is an example of a suitable class, IncomingDataPoint, in OpenTSDB's GitHub repository.  The class should have, at minimum, a field for the metric, a field for the timestamp, a field for the data value and a set of tags.
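
As a rough illustration of such a class, here is a minimal sketch.  The class name EcgDataPoint is hypothetical and this is not OpenTSDB's own IncomingDataPoint; the field names are chosen to match the keys the put API expects (metric, timestamp, value, tags), on the assumption that the serializer emits field names as JSON keys.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical data object for one time point; not OpenTSDB's own class.
final class EcgDataPoint {
    private final String metric;             // e.g. "ecg.V6.uv"
    private final long timestamp;            // epoch time in milliseconds
    private final int value;                 // sample value in microvolts
    private final Map<String, String> tags;  // tag name -> tag value

    EcgDataPoint(String metric, long timestamp, int value, Map<String, String> tags) {
        this.metric = metric;
        this.timestamp = timestamp;
        this.value = value;
        // Copy the map so later changes by the caller do not alter this point.
        this.tags = new HashMap<String, String>(tags);
    }

    String getMetric() { return metric; }
    long getTimestamp() { return timestamp; }
    int getValue() { return value; }
    Map<String, String> getTags() { return tags; }
}
```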

The metric names the quantity being stored.  In an ECG time series, for example, the values are measured in microvolts.  Since ECGs record data in multiple simultaneous channels, I am creating a separate metric for each channel.

NOTE:  Before you can store values using a particular metric, you must register that metric in the database, unless your TSD is configured to create metrics automatically (the tsd.core.auto_create_metrics setting)!

Use the command mkmetric to register your new metric.  For example:

./tsdb mkmetric ecg.V6.uv

where ecg.V6.uv is the new metric being created.

In my own version of IncomingDataPoint, I changed the type of the "value" field to an int.  With that change, the generated JSON array looks the same as the one in OpenTSDB's documentation.

Timestamps should be expressed as Unix epoch time when stored in the IncomingDataPoint.  In the case of my time series, the data comes in at variable sampling rates, usually around 500 Hz (one sample every 2 ms).  OpenTSDB can store time series at intervals as small as one millisecond, which is sufficient for this rate.  The epoch value for 12:00 AM on 1 January 2015, US Eastern time, is 1420088400000 in milliseconds (midnight UTC would be 1420070400000).

Tags add details to individual time points that can be used in searches for data and also serve as metadata.  OpenTSDB expects at least one tag on every data point.  They are stored in the dumb data object as a HashMap.
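
The timestamp arithmetic can be sketched as follows.  This assumes a fixed sampling rate, which is a simplification of the variable rates mentioned above, and the class and method names are made up for the example.

```java
import java.util.Calendar;
import java.util.TimeZone;

// Hypothetical helper for computing millisecond epoch timestamps.
final class EcgTimestamps {
    // Epoch milliseconds for midnight on 1 January 2015 in the given time zone.
    static long startOf2015(String zoneId) {
        Calendar cal = Calendar.getInstance(TimeZone.getTimeZone(zoneId));
        cal.clear();                                  // zero out every field, including milliseconds
        cal.set(2015, Calendar.JANUARY, 1, 0, 0, 0);
        return cal.getTimeInMillis();
    }

    // Timestamp of the sample at sampleIndex, assuming a fixed rate.
    // Integer division rounds down to the millisecond, so rates above
    // 1000 Hz would map several samples onto the same timestamp.
    static long sampleTimestamp(long startMillis, long sampleIndex, int samplesPerSecond) {
        return startMillis + (sampleIndex * 1000L) / samplesPerSecond;
    }
}
```

At 500 Hz, consecutive samples land 2 ms apart, so sample 1 gets the start time plus 2 and sample 500 gets the start time plus 1000.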


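ArrayList<IncomingDataPoint> dataPoints = new ArrayList<>();
HashMap<String, String> tags = new HashMap<>();
tags.put("lead", "V6");  // example tag only; OpenTSDB expects at least one tag per data point
// The L suffix is required: the timestamp is too large for an int literal.
dataPoints.add(new IncomingDataPoint("ecg.V6.uv", 1420088400000L, 5, tags));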

Once your ArrayList contains all of the data points, it needs to be serialized to a JSON array using Gson.

Gson gson = new Gson();
String json = gson.toJson(dataPoints);
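
To make the shape of the payload concrete, the sketch below builds the JSON for one data point by hand.  It is only an illustration of the expected output, not a replacement for Gson: the class name is hypothetical, and it does no string escaping, which assumes metric and tag names stick to plain letters, digits and punctuation such as dots and dashes.

```java
import java.util.Map;

// Hypothetical illustration of the JSON shape expected by the put API.
final class PutPayloadSketch {
    // Builds one JSON object; no escaping is done, so names must be plain ASCII
    // without quotes or backslashes.
    static String pointToJson(String metric, long timestamp, int value, Map<String, String> tags) {
        StringBuilder sb = new StringBuilder();
        sb.append("{\"metric\":\"").append(metric).append("\",");
        sb.append("\"timestamp\":").append(timestamp).append(',');
        sb.append("\"value\":").append(value).append(',');
        sb.append("\"tags\":{");
        boolean first = true;
        for (Map.Entry<String, String> e : tags.entrySet()) {
            if (!first) {
                sb.append(',');
            }
            sb.append('"').append(e.getKey()).append("\":\"").append(e.getValue()).append('"');
            first = false;
        }
        return sb.append("}}").toString();
    }
}
```

The full request body is a JSON array of such objects, i.e. the objects joined with commas and wrapped in square brackets.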

Now all that remains is to open the URL connection and send the data.  Since we're using the OpenTSDB HTTP API, we'll send an HTTP POST to the put endpoint (/api/put) on port 4242.  The URL string takes the form http://<your-tsd-host>:4242/api/put; fill in your own host here:

String urlString = "";

Next, we open the connection and get a writer for the request body:

HttpURLConnection httpConnection = TimeSeriesUtility.openHTTPConnection(urlString);
OutputStreamWriter wr = new OutputStreamWriter(httpConnection.getOutputStream());
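
TimeSeriesUtility.openHTTPConnection is not shown in this post.  A helper like it might look roughly like the sketch below; the class name, the timeouts and the Content-Type header are assumptions for illustration, not the actual implementation.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

// Hypothetical stand-in for a helper such as TimeSeriesUtility.openHTTPConnection.
final class OpenConnectionSketch {
    static HttpURLConnection openHttpConnection(String urlString) throws IOException {
        // openConnection() only creates the object; no network traffic happens yet.
        HttpURLConnection conn =
                (HttpURLConnection) URI.create(urlString).toURL().openConnection();
        conn.setRequestMethod("POST");   // the put endpoint is called with POST
        conn.setDoOutput(true);          // we intend to write a request body
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setConnectTimeout(5000);    // assumed timeouts, in milliseconds
        conn.setReadTimeout(10000);
        return conn;
    }
}
```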

And we write our JSON array to it:
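
A minimal sketch of the write step, assuming the output stream comes from httpConnection.getOutputStream() as above.  The helper name is hypothetical, and it names UTF-8 explicitly rather than relying on the platform default charset.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that sends the JSON string as the request body.
final class JsonWriteSketch {
    static void writeJson(OutputStream out, String json) throws IOException {
        // try-with-resources closes the writer (and the underlying stream) when done.
        try (Writer wr = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            wr.write(json);
            wr.flush();
        }
    }
}
```

With the variables from this post, the call would be JsonWriteSketch.writeJson(httpConnection.getOutputStream(), json).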


We should also check the response code, since OpenTSDB will provide feedback that may be useful.

int httpResult = httpConnection.getResponseCode();
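
One way to act on the response code is sketched below.  As far as I understand the put endpoint, a successful request normally returns 204 with no body and rejected data points produce a 400 with a JSON error message, but treat those specifics as assumptions and check them against your OpenTSDB version.  The class and method names are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helpers for interpreting the response to a put request.
final class PutResponseSketch {
    // Any 2xx status is treated as success.
    static boolean isSuccess(int code) {
        return code >= 200 && code < 300;
    }

    // Reads a stream fully as UTF-8 text; returns "" when there is no stream.
    static String readAll(InputStream in) throws IOException {
        if (in == null) {
            return "";
        }
        try (InputStream stream = in) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = stream.read(chunk)) != -1) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    // The error body, if any; getErrorStream() returns null when there is none.
    static String errorDetails(HttpURLConnection conn) throws IOException {
        return readAll(conn.getErrorStream());
    }
}
```

If isSuccess(httpResult) is false, logging errorDetails(httpConnection) usually shows why the data points were rejected.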

Next, we talk about how to query the data back...