Metrics
The Genesis Metrics module enables you to capture metrics for specific components of your application. You achieve this by inserting programmatic calls into appropriate places in your code.
Any metrics system will have an effect on the performance of the application it is monitoring. The extra code (log statements, metrics, etc) will have an impact in some way.
Genesis uses the well-known metrics library, which is commonly used in Java apps. In all known Genesis applications, the impact is negligible; the benefits far outweigh the very tiny impact on performance.
To make use of the metric calls:
Set
MetricsEnabled
totrue
in the system-definition file.Define the
MetricsReportType
to include a comma-separated list ofMetricsReportType
outputs, which should include at least one of the following:
Set-up (example using SLF4J and GRAPHITE)
In this example, we use a SLF4J log, a Graphite server to capture metrics, and a Datadog reporter. Detailed set-up of a Graphite server is beyond the scope of this document, but it can be run in a Docker container
item(name = "MetricsEnabled", value = "true")
item(name = "MetricsReportType", value = "GRAPHITE,SLF4J,DATADOG")
item(name = "MetricsStructureType", value = "hierarchical") // The choices are 'hierarchical' or 'dimensional' for Graphite, it's dimensional only for Datadog
item(name = "MetricsGraphiteURL", value = "localhost")
item(name = "MetricsGraphitePort", value = "2003")
item(name = "MetricsDatadogApiKey", value = "YOUR_API_KEY")
item(name = "MetricsDatadogApplicationKey", value = "YOUR_APP_KEY")
item(name = "MetricsDatadogUri", value = "https://api.datadoghq.com") // Just an example
item(name = "MetricsReportIntervalSecs", value = "60") // Optional, defaults to 10 seconds if not specified
item(name = "Slf4jReporterLoggingLevel", value = "DEBUG") // Optional, defaults to DEBUG, options include {DEBUG, INFO, TRACE, WARN, ERROR}
Metrics API
To use the API, include the following in your project dependencies:
- Gradle
- Maven
implementation("global.genesis:genesis-metrics")
<dependency>
<groupId>global.genesis</groupId>
<artifactId>genesis-metrics</artifactId>
</dependency>
The object MetricService
is available via the dependency injection mechanism. Simply mark a constructor with the @Inject
annotation and the parameters will be resolved automatically.
Once the MetricService
object is in scope, you can invoke methods on it to retrieve the appropriate metric objects.
The MetricService
distinguishes between two different kind of metrics; process and resource.
Process metrics are values that will exist once and only once per process, current memory utilisation, GC stats etc.
Resource metrics represent data about a named resource that may be one or more resources hosted within a process. Dataservers, event handlers, reqreps and consolidators are all examples of named resources.
For a consistent approach and to make things easier for operations and support staff, the Genesis framework enforces a standard convention for the naming of hierarchical metrics.
For process level metrics:
genesis.$groupName.$processName.$hostname.process.$metricClassifier.$metricName
For resource level metrics:
genesis.$groupName.$processName.$hostname.$resourceClassifier.$resourceName.$metricClassifier.$metricName
Where:
$groupName
is the value in the groupId field of the process definition (or unknown if missing)
$processName
is the name defined in the processes definition
$hostname
is the machine hostname
$resourceClassifier
is the type of resource being monitored in the plural form, e.g. dataservers
$resourceName
is the name given to the resource in its XML or GPAL definition
$metricClassifier
is a qualifying name for the metric value, used to organise the resulting dir structure
$metricName
is the thing actually being measured, such as 'latency', 'rate', 'count'
The path separator is a period '.'
Any of the above parameters can contain any number of periods to further sub-divide the classifiers as required.
If you are using Datadog or the system definition property item(name = "MetricsStructureType", value = "dimensional")
with Graphite, then the metrics will be sent in a dimensional format: $metricName{tag1=value1,tag2=value2,...}
.
Path sanitisation is performed in order to ensure path consistency. Any character in the following set:
*@/\’”;:|[]{}()&^%$,
will be replaced with an underscore.
To create metrics, use the [counter], [timer], [summary], [gauge] and [gaugeCounter] functions.
The metric functions [counter], [timer], [summary], [gauge] and [gaugeCounter] are also lookup functions into the metric registry. As such, path sanitisation only occurs if the non-sanitised path does not already have a mapping. This is to avoid string scans and regex evaluation on every lookup to retrieve a metric object. In order to maximise performance (for example when counting messages in a high volume stream) do not use any upper-case or forbidden characters in your classifiers or names, or store the metric object in a local variable and do not perform a lookup each time it needs to be used.
Counters
Counters are simple metrics that only go up (monotonically increasing). They are typically used to count requests, tasks, errors, etc.
- Kotlin
- Java
class UserAuthentication @Inject constructor(
val metricService: MetricService
) {
fun login(user: User) {
val userLoginCounter = metricService.counter("user_login.count", "username" to user.userName)
userLoginCounter.increment()
// functional code would go here
}
fun logout(user: User) {
val userLogoutCounter = metricService.counter("user_logout.count", "username" to user.userName)
userLogoutCounter.increment()
// functional code would go here
}
}
class UserAuthentication {
private final MetricService metricService;
@Inject
public UserAuthentication(MetricService metricService) {
this.metricService= metricService;
}
void login(User user) {
Counter userLoginCounter = metricService.counter("user_login.count", "username", user.getUserName());
userLoginCounter.increment();
// functional code would go here
}
void logout(User user) {
Counter userLogoutCounter = metricService.counter("user_logout.count", "username", user.getUserName());
userLogoutCounter.increment();
// functional code would go here
}
}
Gauges and gauge counters
- Kotlin
- Java
class UserAuthentication @Inject constructor(
val metricService: MetricService
) {
private val users = mutableSetOf<User>()
val userLoginGaugeCounter = metricService.gaugeCounter("user_count", mapOf())
metricService.gauge("user.count",mapOf(),users)
{
it.size.toDouble() // this is the value that will be reported
}
fun login(user: User) {
// functional code would go here
userLoginGaugeCounter.increment()
}
fun logout(user: User) {
// functional code would go here
userLoginGaugeCounter.decrement()
}
}
public class UserAuthentication {
private final MetricService metricService;
private final HashSet<User> users = new HashSet<>();
private final GaugeCounter userLoginGaugeCounter;
@Inject
public UserAuthentication(MetricService metricService) {
this.metricService = metricService;
this.userLoginGaugeCounter = metricService.gaugeCounter("user_count");
metricService.gauge("user.count", new HashMap<>(), users, HashSet::size);
}
public void login(User user) {
// functional code would go here
userLoginGaugeCounter.increment();
}
public void logout(User user) {
// functional code would go here
userLoginGaugeCounter.decrement();
}
}
Timers
Timers measure how long a piece of code takes to execute. There are several ways we can record the time taken.
Note: Timers and Distribution Summaries support data collection for observing percentile distributions and always publish a count of events in addition to other measurements
- Kotlin
- Java
class UserAuthentication @Inject constructor(
val metricService: MetricService
) {
fun login(user: User) {
val userLoginTime = metricService.timer(
"user.processing_latency",
"username" to user.userName
)
timer {
// functional code would go here
}
}
}
----------------------------------------------
class UserAuthentication @Inject constructor(
val metricService: MetricService
) {
fun login(user: User) {
val userLoginTime = metricService.timer(
"user.processing_latency",
"username" to user.userName
)
val typeTimer = Timer.start()
// functional code would go here
typeTimer.stop(userLoginTime)
}
}
class UserAuthentication {
private final MetricService metricService;
@Inject
public UserAuthentication(MetricService metricService) {
this.metricService= metricService;
}
void login(User user) {
LatencyContext userLoginTime = metricService.timer(
"user.processing_latency",
"username", user.getUserName());
Timer.Sample timer = Timer.start();
// functional code would go here
timer.stop(userLoginTime);
}
}
Summaries
Histograms are built into Distribution Summaries. These track the distribution of events in a similar way to timers, but they don't represent units of time. For instance, Distribution Summaries can measure payload sizes of incoming requests.
- Kotlin
- Java
class Queue @Inject constructor(val name: String, val metricService: MetricService) {
fun queueRequest(request: MetricUtilsTest.Request) {
val summary = metricService.summary("queues.summary", "name" to name)
summary.record(request.size.toLong())
// functional code would go here
}
}
class Queue {
private String name;
private final MetricService metricService;
@Inject
public Queue(String name, MetricService metricService)
void queueRequest(Request request) {
var summary = metricService.summary("queues", "name", name);
summary.record(request.getSize());
// functional code would go here
}
}