Monitoring tools provide insights into the health and performance of your IT infrastructure. They also help you spot warning signs such as impending failures, unusual traffic patterns, slow load times, and more.
The increasing complexity of networks makes monitoring a necessity for organizations of all sizes today. However, manually monitoring multiple devices and users round the clock is impractical, which is why automated monitoring tools are increasingly used for the job.
One such monitoring tool is Riemann. In this article, we'll talk about what Riemann is, its benefits and limitations, and how you can use it to gain insights into your infrastructure.
What is Riemann?
Riemann is a monitoring tool that gathers and processes events from different devices and applications. It then routes this data through streams, which you define in a stream processing language, to other applications that can analyze it further. In other words, Riemann acts as a pipeline: it collates data from different devices, filters it as needed, and sends it to different applications based on those filters. Riemann doesn't analyze the data itself; it simply forwards it according to your configuration.
At first glance, Riemann may seem like a redundant tool, as many apps like collectd that gather information can also send it to other applications. Now, let's imagine a scenario. Say you have two or three apps collecting data from different sources and environments, and you want to send this data to different destinations. For example, you want to filter the events and send the relevant ones as an email, route certain data like CPU utilization and memory usage for further analysis, and display the rest as graphs and charts. Implementing this scenario can be a nightmare without a tool like Riemann.
In the above example, Riemann takes all the data from different sources like collectd, SNMP traps, and so on, puts it together, and applies your filters. Based on what you've written in the config file, Riemann sends the pertinent data to email servers, analysis software, and display apps, respectively. The entire process takes just a few lines of code and stays clean and simple. The whole pipeline is event-driven: you can configure Riemann to take a specific action when a certain event occurs.
Now, this brings up an important question. What is an event in this context?
What are Events?
An event is anything important that happens in your network, and you decide what counts as important. Some examples of events include an HTTP request, an exception, or a spike in activity beyond a threshold level. Events can come from the CPU, memory, or disk of a device, as well as from applications, services, and databases.
Next, let's talk about data streams.
What are Data Streams?
The Riemann tool is based on the idea of data streams, a concept that emerged as organizations moved towards heterogeneous cloud environments. The idea behind streams is to quantify the availability and performance of different applications and services by gathering data from multiple sources and filtering it against certain criteria. The result is information grouped into categories that can be used for further processing and analysis.
In practice, every application or service runs a data forwarder that sends data to monitoring systems in the form of streams.
With these basics, let's understand how Riemann works.
An Overview of How Riemann Works
Riemann gathers events from all these different sources and takes actions as per your configuration. Some sample actions include:
- Emailing the details.
- Generating a notification on tools like PagerDuty.
- Transmitting data to tools like Graphite that, in turn, generate charts and reports based on them.
- Sending the metrics to online monitoring platforms like Librato for further analysis and interpretation.
These are just a few examples of what you can do with the events Riemann captures. The subsequent actions depend on your requirements; Riemann simply follows your instructions and configuration.
Next, let's look at some benefits and limitations of Riemann, so you can better decide where this tool fits into your infrastructure.
Advantages and Disadvantages of Riemann
Advantages of Riemann
The advantages of Riemann are:
- Simple to understand and use, as everything is based on streams. It's also built to mimic real-world plumbing systems, which makes the flow of data easy to picture.
- Works well even for those with a limited technical background.
- Supports sophisticated stream processing when needed.
- Enables you to see the state of your stream at a glance.
- You can throttle or send multiple messages in the same stream.
- Lets you quickly query the current states of different services.
Disadvantages of Riemann
Below are some of the disadvantages or shortcomings of Riemann.
- Limited integrations.
- It almost always needs another service, like Kafka or Grafana, to make sense of the data streams.
In all, Riemann is handy for integrating data from different sources and sending it to different apps or services based on preconfigured filters.
Now that you have a broad idea of what Riemann is and how it works, let's get down to the technical details.
Technical Details
As explained earlier, Riemann combines the flow of different events and takes preconfigured actions to help you understand the state of your systems and intervene when needed. Here are some terms you should know before you can start making the most of Riemann.
Events
Earlier, we saw what events are; now let's look at them from a technical implementation standpoint.
An event is a data structure, or struct, that combines many related variables of different data types into a single block of memory. It's a composite data type that can be passed between functions. In Riemann, an event has the following fields, all of which are optional; you can include one or more of them to form your event struct.
- Host: The hostname of the device or app that generated the event.
- Service: The API or other service the event relates to.
- State: A string describing the criticality of an event, such as “warning”, “error”, or “critical”. It can be at most 255 bytes.
- Time: The time the event occurred, in UNIX epoch seconds.
- Description: A freeform description of the event.
- Tags: Any relevant tags you want to add to easily identify the event in a log search.
- Metric: A number for whatever you're measuring with this event. For example, it could be a value in MB if you're measuring memory utilization.
- TTL: The time, in seconds, for which this event is valid; the event expires after this period.
Other than these fields, you can also add custom fields and attributes to pass pertinent information.
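Concretely, a Riemann event is just a Clojure map. Here's an illustrative sketch; the values and the custom “datacenter” field are made up for demonstration.

{:host        "web-01"
 :service     "memory/used"
 :state       "warning"
 :time        1700000000            ; UNIX epoch seconds
 :description "Memory usage above 80%"
 :tags        ["prod" "memory"]
 :metric      3276.8                ; e.g., MB of memory in use
 :ttl         60                    ; valid for 60 seconds
 :datacenter  "eu-west"}            ; a custom field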
Streams
Streams are the paths along which events flow. Every event that enters Riemann is mapped to a stream, and these streams are defined in your config file. A stream can also have child streams, to which it passes events as parameters.
Here's a sample stream definition. Note that the email stream needs a mailer to be defined first; the from-address below is a placeholder.

(def email (mailer {:from "riemann@example.com"}))

(streams
  (where (and (service #"^app_fail")
              (state "critical"))
    (email "support@appteam.com")))
Streams are treated as conditions. In the above example, if an event matches, the program goes on to execute the remaining code, which sends an email to the support team. If the event doesn't match, the entire block is skipped and the program moves on to the next “where” block.
You can further filter these streams using the keyword “by”, especially if you don't want to flood the support team with emails in the above example. You can partition, or segment, streams based on hosts or services. You can also limit the number of events for which actions are taken; for example, you can restrict it to 2 events every 3600 seconds, as the sketch below shows.
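Here's a minimal sketch of both ideas, reusing the email stream defined earlier; the service pattern and thresholds are illustrative.

(streams
  (where (service #"^app_fail")
    ; partition into one sub-stream per host/service pair
    (by [:host :service]
      ; let at most 2 events through per 3600 seconds; the rest are dropped
      (throttle 2 3600
        (email "support@appteam.com")))))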
In many ways, streams are like real-world plumbing systems. Water enters the plumbing system at one point, flows through it for some distance, and finally flows out into another pipe or a tank for storage. Similarly, events enter your streams, and while they're inside, you can filter them into multiple streams, perform a specific action on each event, and much more.
When you start using these streams, you'll better understand their flexibility.
Riemann Index
The Riemann index is a table that houses all the services, and their states, that Riemann is tracking. You can explicitly ask Riemann to index specific events, and it will index them keyed by their host and service fields. The index stores the current state of each service, which you can look up at any time. In this sense, the Riemann index acts as a single source of truth.
This is why Riemann clients and dashboards query the index table to get the data they want. You can also set a validity period (TTL) on an event; when it expires, the event is removed from the index and sent back into the streams with its state set to “expired.”
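Here's a minimal config sketch of how indexing and expiry fit together; the TTL and expiry interval are illustrative.

; scan the index and expire stale events every 5 seconds
(periodically-expire 5)

(let [index (index)]
  (streams
    ; give events a default TTL of 60 seconds, then index them by host/service
    (default :ttl 60
      index)

    ; expired events re-enter the streams with :state "expired"
    (expired
      (fn [event] (info "expired:" event)))))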
Queries
Queries are used to look for particular events. Riemann dashboard views are built from queries, which can be applied to both past and real-time events. The query language is simple: anything that matches your query expression is returned. You can use wildcards and standard operator precedence to query your data.
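For illustration, here are a few queries in Riemann's query language; in these, the =~ operator matches with % as the wildcard, and the field values are made up.

state = "critical"
host =~ "web%" and service = "cpu"
metric > 0.9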
Protocols
A protocol is the common “language” that enables communication with other tools. TCP is the preferred and recommended protocol for Riemann, and it works on the default port 5555. You can also use UDP for high-volume data where occasional loss is acceptable, since UDP datagrams are capped in size and aren't acknowledged.
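On the server side, both transports are enabled in riemann.config; here's a sketch, with the host binding an assumption for your environment.

; listen for events over TCP (recommended) and UDP on the default port
(tcp-server {:host "127.0.0.1" :port 5555})
(udp-server {:host "127.0.0.1" :port 5555})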
In all, these are the basics of working with Riemann. Understanding these concepts will help you to make the most of this versatile tool.
Next, we'll talk about how you can use Riemann for specific actions.
Implementing Actions
As mentioned earlier, you can ask Riemann to perform a host of actions, such as sending emails, forwarding events to other tools, shipping data to storage systems, and more. Below is a brief description of the different actions you can ask Riemann to take and how to go about them.
Send Email
You can send a single event or a cluster of events by email. Often, it's a good idea to batch similar events, especially if they're not critical, to avoid overloading an inbox. Email support lives in the riemann.email namespace, which uses the Postal library to send mail.
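Here's a sketch of batched email, assuming a local SMTP relay and placeholder addresses; rollup caps how many emails go out in a window and batches the overflow.

(def email (mailer {:from "riemann@example.com"}))

(streams
  (where (state "warning")
    ; allow roughly 10 emails per hour; excess events are rolled
    ; into a single batched email at the end of the window
    (rollup 10 3600
      (email "support@appteam.com"))))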
Notification through PagerDuty
You can ask Riemann to send notifications through PagerDuty based on the state of specific events. Use the :trigger and :resolve streams to open and close incidents, respectively, and point the integration at the PagerDuty v2 API. Once you've identified the event and its associated notification, the rest is a few lines of config.
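Here's a sketch, assuming a recent Riemann release where the pagerduty integration takes an options map; the service key is a placeholder for your PagerDuty integration key, and the service name is illustrative.

(def pd (pagerduty {:service-key "your-pd-service-key"}))

(streams
  (where (service "app_fail")
    ; act only when the state actually changes, to avoid duplicate pages
    (changed :state
      (where (state "critical")
        ; open an incident on critical, close it when the state recovers
        (:trigger pd)
        (else (:resolve pd))))))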
Messages to Slack
You can send Slack messages to select individuals or channels when certain events occur. To do this, you must use a Slack webhook; typically, a webhook is a way to post messages from any app to Slack.
As soon as you create a webhook, you'll get a unique URL, and all you have to do is send a JSON payload to it. The payload controls the message's content, formatting, and layout.
Once you have your webhook token, update this value in your configuration file.
Here's a sample of what your configuration file will look like.
(def credentials {:account "your_org", :token "your_token"})

(def slacker (slack credentials {:username "Riemann bot"
                                 :channel "#channel1"}))

(streams
  (where (state "critical")
    slacker))
Integrate with Kafka
Apache Kafka is an open-source platform that processes streams and provides high-throughput, low-latency handling of real-time data.
You can integrate with Kafka by creating a Kafka client. Simply use the “kafka-output” function in your streams with a topic name and an optional message key.
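Here's a sketch, assuming a broker at localhost:9092 and an illustrative topic name.

(def kafka-output (kafka {:bootstrap.servers "localhost:9092"}))

(streams
  ; publish every event to the "riemann-events" topic
  (kafka-output "riemann-events"))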
Send Messages to Other Riemann Servers
You can also send messages between Riemann servers. If you're wondering about a use case, here's one.
Say you have one Riemann server per data center. To avoid overloading a central aggregator, you might want to forward only state changes to the master server, and you can communicate between servers with some simple configuration.
Below is a sample code to implement the above-explained scenario.
(streams
  (let [client (tcp-client :host "aggregator")]
    (by [:host :service]
      (changed :state
        (forward client)))))
Forward to Librato
Librato is an analysis and monitoring platform that takes inputs from Riemann streams and displays them in easy-to-understand charts and graphs. Forwarding to Librato from Riemann is simple: create a Librato client with your username and API key, then use it in your stream.
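Here's a sketch; the email address and API key are placeholders, and the service filter is illustrative.

(def librato (librato-metrics "user@example.com" "your-api-key"))

(streams
  (where (service "cpu")
    ; forward matching events to Librato as gauges
    (librato :gauge)))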
Send Data to Graphite
Graphite is a platform that takes data and converts it into easily digestible charts and graphs. As with Librato, define a Graphite client and use it as a stream to send data. An advantage of the Graphite client is that it creates and maintains a connection pool.
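Here's a minimal sketch; the hostname is a placeholder.

(def graph (graphite {:host "graphite.example.com"}))

(streams
  ; forward every event; the client keeps a pool of open connections
  graph)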
Other than these pre-existing integrations, you can also create custom connections by writing a client for the service you want; a client sketch follows the list below. Before you do, here are some aspects to keep in mind.
- Messages are encoded as Protocol Buffers, and the protocol definition has four core structures: event, query, msg, and attribute. Over TCP, each message is preceded by a four-byte length header.
- Any TCP connection is nothing but a stream of messages.
- The length header is omitted in a UDP message.
- All UDP datagrams have a maximum size of 16384 bytes.
- You can send a list of repeated events and the server will accept it.
- Every event is uniquely identified by its host and service parameters.
- It's possible to query events using a basic query language.
- Leverage Riemann's test suite to write high-quality code.
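As an example of what a client looks like, here's a sketch using the riemann-clojure-client library; the host, port, and event fields are illustrative.

(require '[riemann.client :as client])

; connect over TCP (the recommended transport) on the default port
(def c (client/tcp-client {:host "127.0.0.1" :port 5555}))

; send one event and wait up to 5 seconds for the server's acknowledgement
(-> (client/send-event c {:service "api latency"
                          :state   "ok"
                          :metric  0.12
                          :ttl     60})
    (deref 5000 ::timeout))

; query the index using the basic query language
@(client/query c "service = \"api latency\"")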
In all, Riemann is highly configurable and easy to use. As you start creating streams to move your data, you'll get more familiar with filtering events and sending the relevant ones to another service for further processing. Once you have the basics down, you can better appreciate the power of Riemann: it can take raw data from different sources, including collectd, filter it, and pipeline the filtered data to your chosen applications or services. We hope the above information gives you a good starting point.
Before we end, here's a quick recap of all that we discussed in this guide.
A Quick Recap
To conclude, Riemann sits as an intermediary, with data collection tools on one side and data processing and analysis tools on the other. Essentially, it collates data, filters it based on your configuration, and transmits it to end applications as a data stream for easy processing, triggering the configured action when a specified event occurs. You control the events and their resulting actions through code in your config file. In this article, we talked in depth about the building blocks of Riemann and how you can forward data streams to other services. We hope this acts as a good starting point for leveraging the capabilities of Riemann to improve the overall monitoring of your infrastructure.