Metrics

The Genesis Platform provides metrics at both the framework level and the application level.

Framework metrics are provided out of the box. Once enabled, they give you baseline information. You can build on top of this to provide application-specific metrics.

Metrics allow in-depth monitoring of running applications and early detection of issues.

Tip: always enable metrics in production applications. This helps you detect problems early and gives you valuable context if an incident occurs.

Enabling metrics

To enable metrics on your application, add the following settings to your system definition file:

item(name = "MetricsEnabled", value = "true")
item(name = "MetricsReportType", value = "{see below for details}")
item(name = "MetricsClassLoaderStatsEnabled", value = "true")
item(name = "MetricsProcessorStatsEnabled", value = "true")

| Property | Default | Explanation |
| --- | --- | --- |
| MetricsEnabled | false | enables metrics for your application |
| MetricsReportType | n/a | a comma-separated list of metrics reporters |
| MetricsProcessorStatsEnabled | false | enables additional process metrics |
| MetricsClassLoaderStatsEnabled | false | enables additional classloader metrics |

Metric names

The framework provides an abstraction on top of the reporters. A single interface is used, regardless of how the metrics are reported.

Each metric has an identifier that consists of a name and a series of tags.

The framework provides these tags for every metric:

| Tag | Explanation |
| --- | --- |
| groupName | process group, e.g. GENESIS or AUTH |
| processName | the name of the process, e.g. GENESIS_ROUTER |
| hostname | the hostname, e.g. TAM_PROD1 |

In addition to these tags, each metric can provide additional tags. How these tags are reported depends on the metric reporter.

Metric reporters

You can specify multiple reporters as a comma-separated list, e.g. GRAPHITE,DATADOG. Within the framework, every metric has a name and tags; the reporter determines how these are combined into the reported metric name.

The available reporters are:

| Reporter | Description |
| --- | --- |
| GRAPHITE | Publishes metrics to a Graphite server |
| DATADOG | Publishes metrics to a Datadog server |
| SLF4J | Publishes metrics to the service's log file |

Graphite

Graphite is an open-source sink that you can use for metrics in Genesis applications.

Graphite is often paired with Grafana, which can source data from Graphite and provides the means to set up and display alerts.

Graphite reports metrics in a hierarchical (tree) structure.

Genesis can generate the metric id for Graphite in one of two ways: hierarchical or dimensional.

Hierarchical metric structure

When the hierarchical structure is used, the metric id is built from the following components, separated by dots (.):

  1. genesis
  2. the value of the group name tag
  3. the value of the process name tag
  4. the value of the host name tag
  5. additional tags, including the key and value
  6. the metric name

e.g. genesis.genesis.genesis_router.tam_prod1.active_connections
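
The component list above can be sketched as a small function. This is only an illustration: hierarchicalId is a hypothetical helper, not part of the Genesis API, and the way additional tags are flattened (key, then value) and the lowercasing are assumptions inferred from the list and the example id.

```kotlin
// Hypothetical helper, not part of the Genesis API: builds a hierarchical metric id.
fun hierarchicalId(
    groupName: String,
    processName: String,
    hostName: String,
    metricName: String,
    extraTags: Map<String, String> = emptyMap(),
): String {
    val parts = mutableListOf("genesis", groupName, processName, hostName)
    // Assumption: each additional tag contributes its key, then its value.
    extraTags.forEach { (key, value) ->
        parts += key
        parts += value
    }
    parts += metricName
    // Lowercased to match the example id in the text.
    return parts.joinToString(".") { it.lowercase() }
}
```

Calling hierarchicalId("GENESIS", "GENESIS_ROUTER", "TAM_PROD1", "active_connections") reproduces the example id shown above.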

Dimensional metric structure

When the dimensional metric structure is used, the metric id begins with the metric name, followed by all tag keys and values joined together.

e.g. active_connections{hostName=tam_prod1.processName=genesis_router.groupName=genesis}
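
A minimal sketch of the dimensional form follows. dimensionalId is a hypothetical helper, not part of the Genesis API; the dot separator between tag pairs and the lowercased values are assumptions taken from the example id.

```kotlin
// Hypothetical helper, not part of the Genesis API: builds a dimensional metric id.
fun dimensionalId(metricName: String, tags: Map<String, String>): String {
    // Tag pairs are joined with dots and values are lowercased, as in the example id.
    val joined = tags.entries.joinToString(".") { (key, value) -> "$key=${value.lowercase()}" }
    return "${metricName}{${joined}}"
}
```

Tags are emitted in the iteration order of the map, so pass an ordered map (such as the one returned by mapOf) if the order matters.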

Settings

To report to Graphite, specify the URL and port in your system definition file:

item(name = "MetricsGraphiteURL", value = "localhost")
item(name = "MetricsGraphitePort", value = "2003")
item(name = "MetricsStructureType", value = "hierarchical")

MetricsStructureType accepts either hierarchical or dimensional.

Datadog

Datadog is a proprietary cloud-based metrics sink that you can use with Genesis.

Datadog supports tags, so the framework sends them along with the metric name.

To use Datadog, add the following three settings to your application's system definition file:

item(name = "MetricsDatadogApiKey", value = "YOUR_API_KEY")
item(name = "MetricsDatadogApplicationKey", value = "YOUR_APP_KEY")
item(name = "MetricsDatadogUri", value = "https://api.datadoghq.com")

SLF4J

Metrics can be reported straight to the log file.

This is an easy way to test metrics during development and testing; however, it is not recommended in a production environment.

There are three optional properties for the logger reporter:

item(name = "MetricsReportIntervalSecs", value = "60")
item(name = "Slf4jReporterLoggingLevel", value = "DEBUG")
item(name = "Slf4jReporterLogInactive", value = "true")

| Property | Default | Explanation |
| --- | --- | --- |
| MetricsReportIntervalSecs | 60 | interval between metrics log messages |
| Slf4jReporterLoggingLevel | DEBUG | the log level to use |
| Slf4jReporterLogInactive | true | when set to true, metrics with a value of zero (0) will not be reported |
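
As an illustrative sketch, the fragment below combines settings from the sections above to report to both Graphite and the log file. The reporter combination and the values are examples only, not recommendations.

```kotlin
// System definition fragment: enable metrics and list two reporters.
item(name = "MetricsEnabled", value = "true")
item(name = "MetricsReportType", value = "GRAPHITE,SLF4J")

// Graphite connection and id structure (example values).
item(name = "MetricsGraphiteURL", value = "localhost")
item(name = "MetricsGraphitePort", value = "2003")
item(name = "MetricsStructureType", value = "hierarchical")

// Optional logger reporter setting (example value).
item(name = "MetricsReportIntervalSecs", value = "60")
```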

Adding custom metrics

The MetricService class can be injected into any application code. This class is the entry point for registering and tracking metrics.

No additional dependencies are needed to use the metric service. For example:

class MetricsSample @Inject constructor(
    private val metricService: MetricService,
) {
    // more class here...
}

Metrics names and tags

When registering metrics, you should provide a metric name and, optionally, you can provide tags.

The way that the metric is then displayed depends on the reporter that you select.

In addition to the provided tags, the Genesis Platform automatically adds the following tags to all metrics:

  • process name
  • host name
  • process group name

Make your metric names clear and distinct; for example, processing_latency rather than just latency.

Each tag is a key-value pair that is reported alongside the metric name.

The syntax for registering all metric types is the same, so the example below uses a counter.

Tags can be provided as a vararg or as a map.

val counter = metricService.counter(
    "update_queue.message_rate",
    "topic" to topic,
)

Types of metric

The Genesis Platform provides support for four types of metric.

| Type | Explanation | Example |
| --- | --- | --- |
| counter | Tracks a rate, monotonically increasing | Messages received |
| timer | Measures timing | Message latency |
| gauge | Tracks a numerical value | Number of messages queued |
| gaugeCounter | Gauge with a counter-like interface | Number of messages queued |

Gauge or counter?

Use a counter when the rate is interesting. Use a gauge when the number is interesting.

Let us consider an example. When you track the number of messages received, you are usually interested in the number of messages a system handles over an interval, not the total number of messages received since the last restart.

An increase in the rate indicates a higher load on the system.

Conversely, with gauges, the number itself is meaningful; for example, the number of connections available in the connection pool. You don't want to measure how often connections are requested and released in the application. However, you do want to track that connections are always available during application runtime.

Note that a timer also counts the occurrences, so you never need to use a counter and a timer in the same place.
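
To make the counter case concrete, the sketch below derives a rate from two readings of an ever-increasing total. CounterSample and ratePerSecond are hypothetical names used only for illustration; in practice the metrics backend usually performs this calculation for you.

```kotlin
// Hypothetical illustration: one reading of a monotonically increasing total.
data class CounterSample(val epochSeconds: Long, val total: Long)

// Events per second between two readings of the same counter.
fun ratePerSecond(earlier: CounterSample, later: CounterSample): Double {
    val elapsed = later.epochSeconds - earlier.epochSeconds
    require(elapsed > 0) { "samples must be in chronological order" }
    return (later.total - earlier.total).toDouble() / elapsed
}
```

The totals themselves (for example 1000 and 1600) say little on their own; the difference over the 60 seconds between them, 10 messages per second, is the figure that reflects load.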

Counter

A counter is used to track an ever-increasing number of specific events. You can think of this as providing a rate over time.

It is generally more interesting to see how many messages have been processed in the last 5 minutes, than it is to see how many messages have been processed since a service was last started.

Examples of counters:

  • update queue counter - how many database updates on a table are published every interval
  • message counter - how many messages a service receives every interval

Usage

Once a counter has been declared, call its increment() function to record an event. This example assumes that a counter named myCounter has already been declared.

    myCounter.increment()

Timer

Timers are used for two things:

  • to time specific events
  • to count the number of events

So you never need to use both a timer and a counter to track the same event.

Examples of timers are:

  • message latency - how long a service takes to process a specific message
  • query latency - how long a database operation takes

Usage

To use a timer, start a sample first, and then record it against the timer on completion.

The example below assumes that a timer named myTimer has already been declared.

val sample = Timer.start()
doLotsOfWork()
sample.stop(myTimer)
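
The sketch below illustrates why a timer makes a separate counter unnecessary: each recorded sample updates both an event count and a total duration. SimpleTimer is a hypothetical stand-in written for this illustration, not the Genesis timer, and the try/finally is a design choice so that work which throws is still recorded.

```kotlin
// Hypothetical stand-in, not the Genesis timer: tracks how many samples were
// recorded and their total elapsed time.
class SimpleTimer {
    var count = 0L
        private set
    var totalNanos = 0L
        private set

    // Times the block and records the sample even if the block throws.
    fun <T> record(block: () -> T): T {
        val start = System.nanoTime()
        try {
            return block()
        } finally {
            totalNanos += System.nanoTime() - start
            count++
        }
    }
}
```

The same try/finally shape can be applied to the start-and-stop pattern above, so that the sample is stopped on every exit path.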

Gauge

Gauges are useful when you are not interested in a rate or a latency, but you need to measure an absolute value that can go either up or down.

For example:

  • queued messages - how many messages are waiting to be processed
  • memory usage - how much memory is available to a service

There is a clear distinction between counters and gauges:

  • Counters are concerned with the rate of events, e.g. the number of messages in an interval
  • Gauges are concerned with an absolute value, e.g. the number of outstanding messages

Usage

Gauges require a more verbose syntax. You need to register an object, along with a way to extract a value from it:

// register the gauge when initialising the class
val myTags = HashMap<String, String>()
myTags["TAG"] = "VALUE"
val myGauge = metricService.gauge(
    "message_rate",
    myTags,
    myClass,
    MyClass::value
)

// update the value when required
myClass.value = 12.0
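
The registration pattern above pairs an object with a value extractor. The sketch below shows that idea in isolation, assuming the gauge reads the value only when it is asked for. SimpleGauge and QueueState are hypothetical names, not part of the Genesis API, and the read-on-demand behaviour is an assumption based on the example rather than a description of the framework's internals.

```kotlin
// Hypothetical object whose current value we want to expose.
class QueueState(var value: Double = 0.0)

// Hypothetical stand-in, not the Genesis gauge: holds an object and an
// extractor, and reads the current value only when read() is called.
class SimpleGauge<T>(private val source: T, private val extractor: (T) -> Double) {
    fun read(): Double = extractor(source)
}
```

With val state = QueueState() and val gauge = SimpleGauge(state, QueueState::value), setting state.value = 12.0 is all that is needed for the next gauge.read() to return 12.0; the gauge itself is never updated directly.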

GaugeCounter

A gauge counter is a gauge that is set up to behave like a counter. Unlike a plain counter, its value can decrease as well as increase.

Usage

// register the gauge when initialising the class
val myGaugeCounter = metricService.gaugeCounter(
    "message_rate",
    "topic" to "topic",
)

// update the value when required
myGaugeCounter.increment()
// or
myGaugeCounter.decrement()
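
A typical use of this increment/decrement interface is tracking work in flight. The sketch below shows one way to keep the two calls balanced. SimpleGaugeCounter and tracked are hypothetical stand-ins written for this illustration, not part of the Genesis API.

```kotlin
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for a gauge counter: an absolute value that can move
// up and down through increment() and decrement().
class SimpleGaugeCounter {
    private val current = AtomicLong()
    fun increment() { current.incrementAndGet() }
    fun decrement() { current.decrementAndGet() }
    fun value(): Long = current.get()
}

// Counts work in flight: up on entry, down on exit, even if the block throws.
fun <T> tracked(inFlight: SimpleGaugeCounter, block: () -> T): T {
    inFlight.increment()
    try {
        return block()
    } finally {
        inFlight.decrement()
    }
}
```

Pairing the calls in try/finally prevents the value from drifting upwards when an exception skips the decrement.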