Metrics
The Genesis Platform provides metrics at both the framework and the application level.
Framework metrics are provided out of the box. Once enabled, they provide baseline information, which you can build on to provide application-specific metrics.
The platform supports multiple metrics collection methods, including built-in reporters (Graphite, Datadog, SLF4J) and OpenTelemetry Java Agent integration, for enhanced observability and integration with monitoring tools such as Prometheus.
Metrics allow in-depth monitoring of running applications and early detection of issues.
It is recommended that you always enable metrics in production applications. This helps early detection of problems and gives you invaluable context if an incident occurs.
Enabling metrics
To enable metrics on your application, add the following settings to your system definition file:
item(name = "MetricsEnabled", value = "true")
item(name = "MetricsReportType", value = "{see below for details}") // Not required when using OpenTelemetry
item(name = "MetricsClassLoaderStatsEnabled", value = "true")
item(name = "MetricsProcessorStatsEnabled", value = "true")
item(name = "MetricsExecutorEnabled", value = "true")
item(name = "MetricsDbCacheEnabled", value = "false")
| Property | Default | Explanation |
|---|---|---|
| MetricsEnabled | false | enables metrics for your application |
| MetricsReportType | n/a | a comma-separated list of metrics reporters (not required when using OpenTelemetry) |
| MetricsProcessorStatsEnabled | false | enables additional process metrics |
| MetricsClassLoaderStatsEnabled | false | enables additional classloader metrics |
| MetricsExecutorEnabled | true | enables executor service metrics (thread pools, connection pools) |
| MetricsDbCacheEnabled | false | enables database cache metrics (Caffeine cache statistics) |
Framework metrics
The Genesis Platform automatically collects metrics for various framework components when metrics are enabled. These metrics provide insight into the internal operations of the platform without requiring any additional configuration. Some of the collected metrics are detailed below.
JVM Memory Metrics
When metrics are enabled, the platform automatically collects JVM memory metrics using Micrometer's JvmMemoryMetrics. These metrics provide comprehensive visibility into memory usage and include heap memory, non-heap memory and memory pool metrics.
- Heap memory metrics:
- Used heap memory
- Committed heap memory
- Maximum heap memory
- Non-heap memory metrics:
- Used non-heap memory
- Committed non-heap memory
- Maximum non-heap memory
- Memory pool metrics (varies by JVM):
- Eden space (used, committed, max)
- Survivor space (used, committed, max)
- Old generation (used, committed, max)
- Metaspace (used, committed, max)
- Code cache (used, committed, max)
- Compressed class space (used, committed, max)
These metrics help monitor memory consumption patterns, identify potential memory leaks, and optimize heap sizing.
JVM Garbage Collection Metrics
When metrics are enabled, the platform automatically collects garbage collection metrics using Micrometer's JvmGcMetrics. These metrics provide insight into GC performance and include:
- GC pause times: Duration of garbage collection pauses
- GC counts: Number of garbage collection cycles
- GC memory reclaimed: Amount of memory freed during GC cycles
- Per-collector metrics: Specific metrics for each GC collector.
- Young generation GC metrics
- Old generation GC metrics
- Concurrent GC metrics (where applicable)
These metrics are essential for understanding GC behavior, tuning GC settings, and identifying performance bottlenecks related to garbage collection.
JVM Thread Metrics
When metrics are enabled, the platform automatically collects JVM thread metrics using Micrometer's JvmThreadMetrics. These metrics provide visibility into thread usage and include:
- Live thread count: Current number of live threads
- Daemon thread count: Number of daemon threads
- Peak thread count: Maximum number of threads that have been alive since JVM startup
- Thread states: Distribution of threads across different states (runnable, blocked, waiting, timed waiting)
These metrics help monitor thread pool health, detect thread leaks, and optimize thread usage patterns.
ClassLoader Metrics
When MetricsClassLoaderStatsEnabled is set to true, the platform collects classloader metrics using Micrometer's ClassLoaderMetrics. These metrics include:
- Loaded classes count: Total number of classes currently loaded
- Unloaded classes count: Total number of classes that have been unloaded since JVM startup
ClassLoader metrics help monitor class loading behavior and can be useful for detecting classloader leaks or excessive class loading activity.
Processor Metrics
When MetricsProcessorStatsEnabled is set to true, the platform collects processor metrics using Micrometer's ProcessorMetrics. These metrics include:
- CPU usage: Current CPU utilization percentage
- System load average: Average system load over different time periods (1, 5, 15 minutes)
Processor metrics provide insight into CPU resource utilization and can help identify CPU-bound performance issues.
Executor service metrics
When MetricsExecutorEnabled is set to true (the default), the platform automatically tracks metrics for executor services and thread pools used throughout the framework. This includes:
- SQL coroutine dispatcher thread pool: Metrics for the thread pool used by the SQL database layer for coroutine execution, including active threads, queued tasks, and completed tasks.
- Metrics are collected using Micrometer's ExecutorServiceMetrics and include standard executor metrics such as:
- Active threads
- Queued tasks
- Completed tasks
- Thread pool size
Hikari connection pool metrics
When metrics are enabled, the platform automatically instruments HikariCP connection pools used for SQL database connections. These metrics are collected via Micrometer's MicrometerMetricsTrackerFactory and provide visibility into:
- Connection pool size
- Active connections
- Idle connections
- Pending threads waiting for connections
- Connection creation and usage statistics
- Connection timeout metrics
Database cache metrics
When MetricsDbCacheEnabled is set to true, the platform collects metrics for the Genesis database cache (Caffeine cache). These metrics include:
- Cache size (estimated)
- Cache hit rate
- Cache miss rate
- Eviction count
- Load statistics
Cache metrics are automatically registered when the cache is created and provide insight into cache performance and effectiveness.
Metric names
The framework provides an abstraction on top of the reporters. A single interface is used, regardless of how the metrics are reported.
Each metric has an identifier that consists of a name and a series of tags.
The framework provides these tags for every metric:
| Tag | Explanation |
|---|---|
| groupName | process group, e.g. GENESIS or AUTH |
| processName | the name of the process e.g. GENESIS_ROUTER |
| hostname | the hostname e.g. TAM_PROD1 |
In addition to these tags, each metric can provide additional tags. How these tags are reported depends on the metric reporter.
Metric reporters
The Genesis Platform provides built-in metric reporters that can be configured via the MetricsReportType property. These reporters are separate from OpenTelemetry (which is configured independently - see the OpenTelemetry section below).
You can specify multiple reporters. Your list must be comma-separated, e.g. GRAPHITE,DATADOG.
Within the framework, metrics have a name and tags; the reporter you choose determines how these are combined into the reported metric name.
The available built-in reporters are:
| Reporter | Description |
|---|---|
| GRAPHITE | Publishes metrics to a Graphite server |
| DATADOG | Publishes metrics to a Datadog server |
| SLF4J | Publishes metrics to the service's log file |
Graphite
Graphite is an open-source sink that you can use for metrics in Genesis applications.
Graphite is often paired with Grafana, which can source data from Graphite and provides the means to set up and display alerts.
Graphite reports metrics in a hierarchical (tree) structure.
There are two ways for Genesis to generate the metric id for Graphite: hierarchical or dimensional.
Hierarchical metric structure
When the hierarchical structure is used, the metric id is built from the following components, separated by dots (.):
- genesis
- the value of the group name tag
- the value of the process name tag
- the value of the host name tag
- additional tags, including the key and value
- the metric name
e.g. genesis.genesis.genesis_router.tam_prod1.active_connections
Dimensional metric structure
When the dimensional metric structure is used, the metric id begins with the metric name, with all tag keys and values appended.
e.g. active_connections{hostName=tam_prod1.processName=genesis_router.groupName=genesis}.
Settings
Genesis supports the Pickle protocol, defaulting to port 2004. If your Graphite instance exposes a different port for the Pickle protocol, update your system definition as shown below:
item(name = "MetricsGraphiteURL", value = "localhost")
item(name = "MetricsGraphitePort", value = "2004")
item(name = "MetricsStructureType", value = "hierarchical")
Currently, the Genesis Application Platform does not support the UDP and PLAINTEXT protocols.
For MetricsStructureType, Genesis supports both hierarchical and dimensional metrics.
Datadog
Datadog is a proprietary cloud-based metrics sink that you can use with Genesis.
Datadog supports tags, and these are provided, along with the metric name.
To use Datadog, add the following three settings to your application's system definition file:
item(name = "MetricsDatadogApiKey", value = "YOUR_API_KEY")
item(name = "MetricsDatadogApplicationKey", value = "YOUR_APP_KEY")
item(name = "MetricsDatadogUri", value = "https://api.datadoghq.com")
SLF4J
Metrics can be reported straight to the log file.
This is an easy way to test metrics during development and testing; however, it is not recommended in a production environment.
There are three optional properties for the logger reporter:
item(name = "MetricsReportIntervalSecs", value = "60")
item(name = "Slf4jReporterLoggingLevel", value = "DEBUG")
item(name = "Slf4jReporterLogInactive", value = "true")
| Property | Default | Explanation |
|---|---|---|
| MetricsReportIntervalSecs | 60 | interval between metrics log messages |
| Slf4jReporterLoggingLevel | DEBUG | the log level to use |
| Slf4jReporterLogInactive | true | when set to true, metrics with a value of zero will not be reported |
OpenTelemetry
OpenTelemetry is an open-source observability framework that provides standardized APIs, libraries, agents, and instrumentation to collect telemetry data (metrics, traces, and logs) from applications. By integrating the OpenTelemetry Java Agent into your Genesis application, you can automatically capture comprehensive telemetry data without modifying your codebase.
OpenTelemetry is configured independently of the built-in metric reporters; when using it, the MetricsReportType property is not required, because OpenTelemetry handles metric reporting through its own configuration.
OpenTelemetry also provides distributed tracing capabilities. For more information about using OpenTelemetry for tracing in Genesis applications, see the Tracing documentation.
The OpenTelemetry Java Agent offers several key benefits:
- Automatic Instrumentation: Captures metrics, traces, and logs with minimal configuration, automatically instrumenting popular Java libraries and frameworks
- Vendor-Neutral: Supports integration with multiple observability backends, including Prometheus, Grafana, Jaeger, and many others
- Standardized Data: Ensures consistent telemetry data collection across services, making it easier to correlate metrics, traces, and logs
Using the OpenTelemetry Java Agent complements the existing Genesis metrics reporters, allowing you to leverage both the platform's native metrics and OpenTelemetry's broader observability ecosystem.
Enabling OpenTelemetry
To enable the OpenTelemetry Java Agent for your application, add the following settings to your system definition file:
item(name = "OpenTelemetryJavaAgentEnabled", value = "true")
item(name = "OpenTelemetryServiceNameEnabled", value = "true")
item(name = "OpenTelemetryJavaAgentPropertiesFilePath", value = "/path/to/otel-config.properties")
| Property | Default | Explanation |
|---|---|---|
| OpenTelemetryJavaAgentEnabled | false | enables the OpenTelemetry Java Agent for your application |
| OpenTelemetryServiceNameEnabled | true | automatically sets the service name to the process name (when enabled) |
| OpenTelemetryJavaAgentPropertiesFilePath | n/a | path to an OpenTelemetry Java Agent configuration properties file (optional) |
When enabled, the platform automatically:
- Attaches the OpenTelemetry Java Agent JAR from $GENESIS_HOME/genesis/lib/opentelemetry-javaagent.jar
- Sets the service name to the process name using -Dotel.service.name={processName} (when OpenTelemetryServiceNameEnabled is true)
- Configures the agent using the provided properties file path (if specified) via -Dotel.javaagent.configuration-file={path}
OpenTelemetry Configuration
You can provide additional OpenTelemetry configuration through a properties file. This file allows you to configure exporters, sampling rates, resource attributes, and many other OpenTelemetry options.
Example otel-config.properties file:
otel.metrics.exporter=otlp
otel.logs.exporter=none
otel.traces.exporter=none
otel.instrumentation.micrometer.enabled=true
otel.instrumentation.log4j-appender.enabled=true
otel.instrumentation.log4j-appender.experimental.capture-mdc-attributes=*
otel.exporter.otlp.metrics.enabled=true
otel.exporter.otlp.traces.enabled=false
otel.exporter.otlp.logging.enabled=false
otel.exporter.otlp.protocol=http/protobuf
otel.exporter.otlp.endpoint=http://localhost:4318
Alternatively, you can configure the OpenTelemetry agent by adding the options to the JVM_OPTIONS system definition property, alongside your existing JVM arguments. The platform appends this value to every process defined in the environment.
item(
name = "JVM_OPTIONS",
value = "-Dotel.metrics.exporter=otlp -Dotel.traces.exporter=none -Dotel.exporter.otlp.metrics.enabled=true"
)
For a complete list of available configuration options, refer to the OpenTelemetry Java Instrumentation documentation.
Gradle Configuration (Local Development)
For local development, you can configure OpenTelemetry in your build.gradle.kts file using the genesisExec extension:
genesisExec {
openTelemetry {
enabled = true
serviceNameEnabled = true
extraOptions = "-Dotel.metrics.exporter=prometheus"
propertiesFilePath = "/path/to/otel-config.properties"
}
}
See the Exec Plugin documentation for more details on local development configuration.
Adding custom metrics
The MetricService class can be injected into any application code.
This class is the entry point for registering and tracking metrics.
No additional dependencies are needed to use the metric service. For example:
- Kotlin
- Java
class MetricsSample @Inject constructor(
private val metricService: MetricService,
) {
// more class here...
}
public class MetricsSample {
private final MetricService metricService;
@Inject public MetricsSample(MetricService metricService) {
this.metricService = metricService;
}
}
Metrics names and tags
When registering metrics, you should provide a metric name and, optionally, you can provide tags.
The way that the metric is then displayed depends on the reporter that you select.
In addition to the provided tags, the Genesis Platform automatically adds the following tags to all metrics:
- process name
- host name
- process group name
It is recommended that you make your metric names clear and distinct; for example, processing_latency rather than just latency.
In addition to a name, tags can also be provided.
Each tag is a key-value pair that is reported alongside the metric name.
The syntax for registering all metric types is the same, so we will use a counter as the example here.
Tags can be provided as a vararg or as a map.
- Kotlin
- Java
val counter = metricService.counter(
"update_queue.message_rate",
"topic" to topic,
)
var counter = metricService.counter(
"update_queue.message_rate",
new Pair<>("topic", topic)
);
Types of metric
The Genesis Platform provides support for the following types of metric.
| Type | Explanation | Example |
|---|---|---|
| counter | Tracks a rate, monotonically increasing | Messages received |
| timer | Measures timing | Message latency |
| gauge | Tracks a numerical value | Number of messages queued |
| gaugeCounter | Gauge with a counter-like interface | Number of messages queued |
Use a counter when the rate is interesting. Use a gauge when the number is interesting.
Let us consider an example. When you track the number of messages received, you are usually interested in the number of messages a system handles over an interval, not the total number of messages received since the last restart.
An increase in the rate indicates a higher load on the system.
Conversely, with gauges, the number itself is meaningful; for example, the number of connections available in the connection pool. You don't want to measure how often connections are requested and released in the application. However, you do want to track that connections are always available during application runtime.
Note that a timer also counts the occurrences, so you never need to use a counter and a timer in the same place.
Counter
A counter is used to track an ever-increasing number of specific events. You can think of this as providing a rate over time.
It is generally more interesting to see how many messages have been processed in the last 5 minutes, than it is to see how many messages have been processed since a service was last started.
Examples of counters:
- update queue counter - how many database updates on a table are published every interval
- message counter - how many messages does a service receive at every interval
Usage
Once a counter has been declared, you use it by calling the increment() function.
In this example, we assume that a counter named myCounter has been declared.
- Kotlin
- Java
myCounter.increment()
myCounter.increment();
Timer
Timers are used for two things:
- to time specific events
- to count the number of events
So you never need to use both a timer and a counter to track the same event.
Examples of timers are:
- message latency - how long a service takes to process a specific message
- query latency - how long a database operation takes
Usage
To use a timer, start a sample first and then stop it against the timer on completion.
The example below assumes a timer named myTimer has been registered.
- Kotlin
- Java
val sample = Timer.start()
doLotsOfWork()
sample.stop(myTimer)
var sample = Timer.start();
doLotsOfWork();
sample.stop(myTimer);
Gauge
Gauges are useful when you are not interested in a rate or a latency, but you need to measure an absolute value that can go either up or down.
For example:
- queued messages - how many messages are waiting to be processed
- memory usage - how much memory is available to a service
There is a clear distinction between counters and gauges:
- Counters are concerned with the rate of events, e.g. the number of messages in an interval
- With a gauge we are concerned with the absolute value, e.g. what is the number of outstanding messages
Usage
Gauges require a more verbose syntax. You need to register a class, along with a way to extract a value:
- Kotlin
- Java
// register the gauge when initialising the class
val myTags = HashMap<String, String>()
myTags["TAG"] = "VALUE"
val myGauge = metricService.gauge(
"message_rate",
myTags,
myClass,
MyClass::value
)
// update the value when required
myClass.value = 12.0
// register the gauge when initialising the class
var myTags = new HashMap<String, String>();
myTags.put("TAG", "VALUE");
var myGauge = metricService.gauge(
"message_rate",
myTags,
myClass,
MyClass::getValue
);
// update the value when required
myClass.setValue(12.0);
GaugeCounter
A gauge counter is a gauge that is set up to behave like a counter. The benefit is that its value can decrease as well as increase.
Usage
- Kotlin
- Java
// register the gauge counter when initialising the class
val myGaugeCounter = metricService.gaugeCounter(
"message_rate",
"topic" to "topic",
)
// update the value when required
myGaugeCounter.increment()
// or
myGaugeCounter.decrement()
// register the gauge counter when initialising the class
var myGaugeCounter = metricService.gaugeCounter(
"message_rate",
new Pair<>("topic", "topic")
);
// update the value when required
myGaugeCounter.increment();
// or
myGaugeCounter.decrement();