Making sense of data collected over time is essential to understanding the progress made during that period. Based on these insights, you can take the necessary action to move closer to achieving your business goals. That said, you need appropriate tools to accurately measure time-series data and discern patterns or movements in it.
One such tool that effectively measures time-series data is Graphite. This is an open-source tool that Chris Davis created at Orbitz in 2006 and that was released as open source in 2008. Today, it's an enterprise-grade monitoring tool that takes in data and displays it in the form of time-series graphs and charts.
Read on as we talk all about Graphite, how you can use it, what tools it integrates with, and more. At the end of this article, you'll be all set to use Graphite to understand the patterns based on your data.
Graphite's Capabilities
Before getting into how you can use Graphite, it's important to know what this tool can and can't do, so you can better understand its fit within your infrastructure.
Essentially, Graphite stores your data and displays it in the form of time-series graphs and charts on demand. Note that Graphite doesn't collect data from any source. This means you need one or more tools, such as collectd, to collect data from different sources and send it to Graphite for storage and display.
Graphite's capabilities are powered by a simple architecture that consists of three components:
- Carbon: A daemon that listens for time-series data. It is based on Twisted, an event-driven networking engine written in Python and licensed as open source. Twisted supports many common protocols such as SMTP, IMAP, POP3, and DNS.
- Whisper: The database where Graphite stores all the time-series data. It's a fixed-size database that's similar to RRDtool. Whisper is fast, reliable, and ideal for storing numeric data. It can also downgrade high-resolution data to lower resolutions for storing historical data.
- Graphite webapp: A Django-based web application that takes the time-series data and renders it as graphs and dashboards. It uses a 2D graphics library called Cairo to render graphs on demand. As a result, Graphite delivers consistent output on all devices and makes the most of hardware acceleration when available.
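To make the idea of downgrading resolution more concrete, here is a minimal Python sketch that averages groups of one-minute values into five-minute values. It only illustrates the concept and is not Whisper's actual code; the function name and the sample values are made up for this example, and averaging is assumed as the aggregation method.

```python
from statistics import mean


def downsample(points, factor):
    """Average each consecutive group of `factor` points into one point."""
    # Only complete groups are rolled up; a trailing partial group is ignored.
    usable = len(points) - len(points) % factor
    return [mean(points[i:i + factor]) for i in range(0, usable, factor)]


# Ten hypothetical one-minute readings become two five-minute readings.
one_minute_values = [10, 12, 14, 16, 18, 20, 22, 24, 26, 28]
five_minute_values = downsample(one_minute_values, 5)
```

The trade-off is the usual one: the rolled-up series takes a fifth of the space, but short spikes inside each five-minute window are smoothed away.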
Here's a peek into Graphite's architecture.
Now that you have a good idea of what Graphite is and how its architecture looks, let's move on to understanding how you can use it. But before that, a check on the system requirements.
Graphite's System Requirements
Below are the basic requirements you need to run Graphite.
- A UNIX-like operating system
- Python 2.7 or higher
- A WSGI server and a web server. You can choose from Apache with mod_wsgi, Nginx with gunicorn, or Nginx with uWSGI.
- Django 1.8 to 2.2
- Cairo graphics library – cairocffi
- The pytz library
- Whisper database library.
Other than these requirements, Graphite has some dependencies for additional features. Some of them include Memcached, python-rrdtool, python-LDAP, and Django database.
With all these requirements in place, you're now all set to install Graphite.
Install Graphite
There are several ways to install Graphite. Let's look at the different options.
Source
Head to the download page to install Carbon, Graphite webapp, and Whisper. Alternatively, you can clone the latest releases from the GitHub page.
git clone https://github.com/graphite-project/graphite-web.git
To install, execute the following command:
python setup.py install
This will install Graphite in its default location, which is /opt/graphite.
You can also install Graphite and its components in a custom location. For example, if you want to install them in /mydrive/graphite, use the following command.
python setup.py install --prefix=/mydrive/graphite --install-lib=/mydrive/graphite/lib
Pip
You can download and install Graphite via Pip, the installer for Python packages. During this installation, Python will automatically identify and install the dependent packages.
Here's the code to install Graphite in the default location.
export PYTHONPATH="/opt/graphite/lib/:/opt/graphite/webapp/"
pip install --no-binary=:all: https://github.com/graphite-project/whisper/tarball/master
pip install --no-binary=:all: https://github.com/graphite-project/carbon/tarball/master
pip install --no-binary=:all: https://github.com/graphite-project/graphite-web/tarball/master
For a custom installation, use the below code.
pip install https://github.com/graphite-project/carbon/tarball/master --install-option="--prefix=/mydrive/graphite" --install-option="--install-lib=/mydrive/graphite/lib"
Repeat the command for the other Graphite components (Whisper and graphite-web), adjusting the paths as needed.
Virtualenv
Virtualenv is a tool used for creating isolated Python environments. It is mainly used to avoid dependency and permission conflicts, as the environment doesn't share libraries or access the global libraries directly. However, virtualenv is slow, not extensible, and doesn't have a rich API. If you decide to use virtualenv, here's how to create and activate an environment for Graphite.
virtualenv /opt/graphite
source /opt/graphite/bin/activate
Remember to activate the virtual environment each time before you start Carbon.
The above three options are the most popular choices, though you can also use Synthesize and RESynthesize for installing Graphite.
Next, let's talk about configuring Graphite.
Configure Graphite
The configuration is a bit complex, and not as straightforward as changing the config file.
As a first step, tell Django to create the database tables that Graphite will use. The code is:
PYTHONPATH=$GRAPHITE_ROOT/webapp django-admin.py migrate --settings=graphite.settings
Note that Graphite uses an SQLite database by default, but consider changing it to MySQL or PostgreSQL if you want to run multiple Graphite webapp instances. To change the database, edit the following file.
$GRAPHITE_ROOT/webapp/graphite/local_settings.py
The settings in this file override the defaults found in settings.py. However, you have to create this Python file yourself. The easiest way is to mimic the original settings file and change just the parameters you want.
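As an illustration, a local_settings.py that points the webapp at PostgreSQL might look like the sketch below. DATABASES is Django's standard database setting, but the database name, user, password, host, and port shown here are placeholders rather than values from this article, and older Django releases may expect a different engine name.

```python
# Hypothetical contents of $GRAPHITE_ROOT/webapp/graphite/local_settings.py.
# Every value below is a placeholder; replace it with your own.
DATABASES = {
    "default": {
        "NAME": "graphite",                         # assumed database name
        "ENGINE": "django.db.backends.postgresql",  # Django's PostgreSQL backend
        "USER": "graphite",                         # assumed database user
        "PASSWORD": "change-me",
        "HOST": "127.0.0.1",
        "PORT": "5432",
    }
}
```

After changing the database, run the migrate command shown earlier again so that Django creates its tables in the new database.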
Next, let's talk about how you can configure Carbon.
Configure Carbon
You can find Carbon's config files in /opt/graphite/conf/.
Note that installing Graphite will NOT automatically create Carbon's config files, but you will find an example file for each one with the “.conf.example” extension. Copy these files, remove the “.example” part of the name from the copies, and customize the settings.
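If you'd rather not rename each file by hand, a small Python helper along these lines could do the copying. This is only a sketch: the function name is made up for this example, and it assumes every example file in the directory ends in “.conf.example”.

```python
import shutil
from pathlib import Path


def activate_example_configs(conf_dir):
    """Copy each *.conf.example file to *.conf unless the target already exists."""
    created = []
    for example in sorted(Path(conf_dir).glob("*.conf.example")):
        target = example.with_suffix("")  # drops the trailing ".example"
        if not target.exists():
            # Copy rather than rename, so the original example stays for reference.
            shutil.copy(example, target)
            created.append(target.name)
    return created
```

Calling activate_example_configs("/opt/graphite/conf") returns the names of the files it created. Existing config files are never overwritten, so it is safe to run more than once.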
Next, let's talk about configuring webapp.
Configure Webapp
Start by installing your chosen WSGI and web servers. Here's how you can do it for the three combinations listed earlier.
Nginx + gunicorn
On Debian systems, run
sudo apt install nginx
sudo apt install gunicorn
Otherwise, you can use pip to install gunicorn. Next, create an nginx site configuration at /etc/nginx/sites-available/graphite, then enable it and disable the default site.
sudo ln -s /etc/nginx/sites-available/graphite /etc/nginx/sites-enabled
sudo rm -f /etc/nginx/sites-enabled/default
Apache + mod_wsgi
Start by installing mod_wsgi. Next, create graphite.wsgi and finally configure the Apache vhost. You can find sample code for these configurations at https://github.com/graphite-project/graphite-web/blob/master/conf/graphite.wsgi.example and https://github.com/graphite-project/graphite-web/blob/master/examples/example-graphite-vhost.conf respectively.
Nginx + uWSGI
Install uWSGI on Debian with the following command:
sudo apt install uwsgi uwsgi-plugin-python
Next, create a uWSGI file in /etc/uwsgi/apps-available/graphite.ini
Then, create the wsgi.py file, enable graphite.ini, and restart uWSGI.
Lastly, configure the nginx host and restart nginx.
With this, your Graphite is configured.
Now, let's get down to using Graphite, and we'll start with how you can input data into Graphite.
Input Data Formats
Graphite is highly flexible when it comes to taking data, and you can send data in any of three formats: Plaintext, Pickle, and AMQP. Essentially, the data that you send to Graphite goes to Carbon, which is responsible for managing all the data that comes into the application.
Let's talk a bit about each of these formats.
Plaintext
Needless to say, Plaintext is the most straightforward option to input data to Graphite. Any data that you send as plaintext must be in the following format:
<metric path> <metric value> <metric timestamp>
As soon as Carbon receives data in this format, it will translate the contents into a format that both Whisper and webapp can understand. The downside is that it's hard to send data in batches.
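Here's a minimal Python sketch of the plaintext format in practice. The function names are made up for this example, and the host and port (localhost and 2003) are assumptions about where your Carbon plaintext listener runs, so check them against your own configuration.

```python
import socket
import time


def format_plaintext(path, value, timestamp=None):
    """Build one line in the '<metric path> <metric value> <metric timestamp>' format."""
    if timestamp is None:
        timestamp = int(time.time())  # seconds since the Unix epoch
    # Each metric goes on its own line, so the line ends with a newline.
    return f"{path} {value} {int(timestamp)}\n"


def send_plaintext(line, host="localhost", port=2003):
    """Send one line to Carbon over TCP. The host and port are assumptions."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))
```

For example, send_plaintext(format_plaintext("ittsystems.emailcampaign.pageclicks", 42)) would record 42 page clicks at the current time, provided Carbon is listening on that address.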
Pickle
Pickle is way more efficient than plaintext because you can send data in batches. You can accumulate the metrics and send them in one go, in the pickle format. However, you'll have to add a header that could look something like this:
import pickle
import struct

payload = pickle.dumps(listOfMetricTuples, protocol=2)
header = struct.pack("!L", len(payload))  # 4-byte big-endian payload length
message = header + payload
All the metrics are packed along with this header and sent as a single message through the network socket.
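To see how the header and payload fit together, the sketch below builds a message from a small batch and then unpacks it again, the way a receiver would. The function names and sample metrics are made up for this example, and each entry is assumed to be a (path, (timestamp, value)) tuple.

```python
import pickle
import struct


def build_pickle_message(metrics):
    """Pack a list of (path, (timestamp, value)) tuples into header + payload."""
    payload = pickle.dumps(metrics, protocol=2)
    header = struct.pack("!L", len(payload))  # 4-byte big-endian payload length
    return header + payload


def read_pickle_message(message):
    """Reverse of build_pickle_message, shown only to illustrate the framing."""
    (length,) = struct.unpack("!L", message[:4])
    return pickle.loads(message[4:4 + length])


# A hypothetical batch of two data points.
batch = [
    ("ittsystems.emailcampaign.pageclicks", (1700000000, 42)),
    ("ittsystems.emailcampaign.pageclicks", (1700000300, 57)),
]
message = build_pickle_message(batch)
```

The resulting bytes can be written to a TCP socket connected to Carbon's pickle listener. Check your Carbon config file for the port, as it differs from the plaintext port.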
AMQP
Advanced Message Queuing Protocol (AMQP) is an application-layer protocol for passing messages between systems across IP networks. When the AMQP_METRIC_NAME_IN_BODY setting is True in the Carbon config file, the message body uses the same layout as the plaintext format.
Now that you know the different input formats, make sure your data adheres to one of them. Next, let's look at the input process.
Sending Data to Graphite
As a first step, decide what kind of data you want to send to Graphite. Know that Graphite specializes in helping you see and understand patterns in time-series data.
Here are the steps to follow while sending data to Graphite.
Create a Hierarchy
Every time series that's stored in Graphite has a unique identifier consisting of the name of the metric and a few other parameters as needed. As the data can get overwhelming quickly, it makes sense to create some kind of structure or hierarchy for all the time-series data.
For example, you can say “ittsystems.emailcampaign.pageclicks” to know the number of page clicks of your email marketing campaign done for ittsystems. Though you don't have to follow the same naming convention as the above example, consider having names and hierarchies that will make it easy to search and understand.
Graphite also supports using tags to describe your data, so you can take that option as well.
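One way to keep names consistent is to build them with a small helper instead of typing them by hand. The Python sketch below is only a suggestion: the function names and the clean-up rules (lowercase, underscores for unsafe characters) are conventions made up for this example, while the “name;tag=value” layout is how tagged series are written.

```python
import re


def metric_path(*parts):
    """Join name parts into a dotted path, replacing unsafe characters."""
    cleaned = [re.sub(r"[^A-Za-z0-9_-]+", "_", part.strip().lower()) for part in parts]
    return ".".join(cleaned)


def tagged_metric(name, **tags):
    """Append tags in 'name;tag=value' form, sorted so the result is stable."""
    return name + "".join(f";{key}={value}" for key, value in sorted(tags.items()))
```

For instance, metric_path("ITTSystems", "Email Campaign", "pageclicks") gives "ittsystems.email_campaign.pageclicks", and tagged_metric("pageclicks", org="ittsystems", campaign="email") gives "pageclicks;campaign=email;org=ittsystems".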
Decide the Retention Period
All the data sent to Graphite is stored in the Whisper database, and you must configure this database before you start sending data to it. As a part of the configuration process, decide how long you want to store the data and at what precision. For example, 5-minute precision for 2 days means Graphite will store one data point for every five minutes, and each point will be retained for two days.
Know that the storage costs increase with more data points and longer retention periods, so keep your budgets in mind while deciding on these two factors. In general, you can tone down the precision if you want to store data for long periods and vice-versa.
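A quick calculation shows how precision and retention translate into stored data points. The Python sketch below uses made-up helper names, and the figure of 12 bytes per data point is an assumption used only for a rough size estimate.

```python
# Seconds per unit; a year is treated as 365 days.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}

POINT_BYTES = 12  # assumption: approximate size of one stored data point


def to_seconds(spec):
    """Convert a value such as '5m' or '2d' into seconds."""
    return int(spec[:-1]) * UNIT_SECONDS[spec[-1]]


def points_for(retention):
    """Number of data points kept for a 'precision:duration' pair such as '5m:2d'."""
    precision, duration = retention.split(":")
    return to_seconds(duration) // to_seconds(precision)


points = points_for("5m:2d")          # 5-minute precision kept for 2 days
approx_bytes = points * POINT_BYTES   # rough storage for one metric
```

With 5-minute precision for 2 days, that works out to 576 points per metric. Multiply the estimate by the number of metrics you plan to send to get a feel for the total footprint.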
Create a Schema
After making these decisions, it's time to create the schema. Open /opt/graphite/conf/storage-schemas.conf, or the equivalent file under the custom location you configured earlier. Essentially, this schema tells Whisper the frequency of data points and their retention period. In the file, add a section for each group of metrics, with a name and a pattern that matches the metric paths.
Some rules to keep in mind:
- The section name must be enclosed in square brackets.
- The retention must be given after “retentions =” in the form frequency:history, for example 30s:7d. Each value uses one of the following units:
  - s: second
  - m: minute
  - h: hour
  - d: day
  - w: week
  - y: year
- If you want to specify a regex that matches metric names, put it after “pattern =”.
Here's an example.
[page_visits]
pattern = pagevisits$
retentions = 30s:7d
You must use the above format to configure all the metrics you want to send to Graphite.
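To check which section a metric would fall under, you can read the file the same way you'd read any INI-style file. The Python sketch below is not Carbon's own loader: the function names, the second section, and the sample metric names are made up for this example, and it assumes sections are tried in file order with the first match winning.

```python
import configparser
import re

# Sample schema text; the second section is a hypothetical catch-all.
SCHEMA_TEXT = """
[page_visits]
pattern = pagevisits$
retentions = 30s:7d

[everything_else]
pattern = .*
retentions = 60s:1d
"""


def load_schemas(text):
    """Return (section name, compiled pattern, retentions) in file order."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return [
        (name, re.compile(parser[name]["pattern"]), parser[name]["retentions"])
        for name in parser.sections()
    ]


def retention_for(metric, schemas):
    """Return the first section whose pattern matches the metric name."""
    for name, pattern, retentions in schemas:
        if pattern.search(metric):
            return name, retentions
    return None
```

Here, retention_for("ittsystems.web.pagevisits", load_schemas(SCHEMA_TEXT)) returns ("page_visits", "30s:7d"), while a metric that doesn't end in “pagevisits” falls through to the catch-all section. Because order matters under this assumption, specific sections belong above general ones.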
Send Data to Graphite
Finally, you can send data to Graphite in any supported format. Choose the data format in which you want to send the data.
With the plaintext format, for example, each data point is sent as one line in the below format.
metric_path value timestamp\n
Here, metric_path is the namespace, value is what you want to assign to this metric, and timestamp is the number of seconds since Unix epoch time.
When you send data in this format, Carbon writes it to Whisper for storage, and the webapp reads it from there for rendering.
Next, let's look at the tools from which you can send data to Graphite.
Graphite's Supported Tools
Graphite can take inputs from a wide range of tools. Let's look at a few popular ones.
- collectd: A daemon that gathers performance metrics from different sources and sends them to Graphite for further processing. You can use the Write-Graphite plugin in collectd to send data to Graphite.
- Ganglia: A distributed system for monitoring high-performance systems such as grids. This tool collects data and stores it in the RRD format. Later, it sends this information to Graphite for storage and rendering. At the time of writing this article, Ganglia is working on an add-on that can send data directly to Graphite, so this is something to keep an eye on.
- Host sFlow: This is the open-source version of the sFlow protocol that collects and sends information about CPU, memory, and disk usage.
- Logster: A handy utility for reading log files and sending them to Graphite. This workflow can help you to better understand the event trends happening in your applications and systems.
- Netdata: An efficient monitoring agent that can collect metrics from different systems through plugins. It can seamlessly send this data to Graphite for visualization.
- Sensu: Another popular monitoring framework that works well with Graphite. It sends metrics at predetermined intervals to Graphite.
- Statusengine: A PHP daemon that can send data about the performance of Nagios and Naemon to Graphite.
The above list is not an exhaustive one, but rather intended to give you an idea of the vast integrations, tools, and backends that support Graphite. Check the official documentation if you want to use a specific tool with Graphite. Many developers and the open-source community are building integrations for Graphite because of its versatility, so even if you don't find an integration today, it could be available soon.
Lastly, let's briefly understand the Ceres database, as it could replace Whisper as the default storage format.
What's the Ceres Database?
The Ceres database is a time-series database like Whisper, but has the capabilities to overcome the limitations of Whisper. In particular, Ceres is not a fixed-size database, so it can support a wider range of data. More importantly, this flexibility allows Graphite to distribute its data across multiple servers.
Ceres databases have a simple naming convention as well. They consist of a single tree where the nesting directories and metrics are stored as nodes. This way, you'll have to configure just a single path and create nodes for each subdirectory and metric, respectively.
Due to these benefits, Ceres could replace Whisper in the future, though it is not being developed actively as of now. Still, it may be something to keep in mind for the future.
Before we end, here's a quick recap of all that we have learned so far.
A Quick Recap
Graphite is a popular tool for storing time-series data and displaying it as graphs, and it is widely used to understand how metrics have changed over a given period. Graphite is supported on Unix systems and consists of three components: Carbon receives the incoming data, Whisper stores it, and the webapp renders it.
In this article, we looked at how to install, configure, and use Graphite. We also talked about a few of the hundreds of tools that integrate seamlessly with Graphite to send information to it.
We hope it acts as a useful guide to getting started with Graphite. For more guides, browse www.ittsystems.com