From Unified Logging with Fluentd by Phil Wilkins

This article describes how and when to deploy Fluentd.




Deployment considerations

When planning a Fluentd deployment for production, we need to consider volume metrics, i.e. the amount of log data that needs to be captured, filtered, routed, and stored. To start with, let’s assume we’re working in an environment that doesn’t demand hyperscaling. Even so, we’re dealing with some processing demand and, more importantly, log entries are being consumed and stored, which creates a lot of I/O activity involving either the network or physical storage. If you’re familiar with the low-level workings of computers, you’ll appreciate that every operation comes with an overhead, such as:

  • Every network message is topped and tailed with routing, verification, and details such as the size of the message.
  • Every file write requires the hardware to locate a chunk of physical storage that can be used, record the details of the block of storage used, and, for physical media, mechanically position the writing device.

The more log events we can group in a cache and transmit as a block, the more efficiently resources are used. Like all things in life, there’s a tradeoff. Caching before storage means the data is slower to reach the end of the processes it is subject to, and the greater the likelihood that a sudden power loss or component failure results in data loss. For this article, we don’t need to do anything (other than making sure our environment has enough resources to run).
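The caching tradeoff can be illustrated with a small shell sketch. This is not how Fluentd implements its buffering; batch_lines is a hypothetical helper that simply groups lines from standard input into blocks of a chosen size, so that each "flush" stands in for one network send or file write.

```shell
#!/bin/sh
# Illustrative sketch only: group stdin lines into blocks of a given size.
# batch_lines is a hypothetical helper, not part of Fluentd.
batch_lines() {
  batch_size="$1"
  count=0
  batch=""
  while IFS= read -r line; do
    # Events held here exist only in memory until the next flush.
    batch="${batch}${line}
"
    count=$((count + 1))
    if [ "$count" -ge "$batch_size" ]; then
      printf '== flush of %d events ==\n%s' "$count" "$batch"
      batch=""
      count=0
    fi
  done
  # Flush whatever is left when the input ends.
  if [ "$count" -gt 0 ]; then
    printf '== flush of %d events ==\n%s' "$count" "$batch"
  fi
}

# Five events with a block size of 2 give three flushes (2, 2, and 1 events).
printf 'e1\ne2\ne3\ne4\ne5\n' | batch_lines 2
```

A larger block size means fewer flushes and less per-operation overhead, but more events sit in memory waiting, and those are the events that would be lost if the process died before the next flush.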

Minimum footprint

Fluentd’s resource requirements are pretty minimal by modern machine specs, but they are worth noting when dealing with small-footprint setups.

Table 1. Fluentd minimum footprint

RubyInstaller size                            130MB
Ruby installed storage needs (with DevKit)    1GB (80MB for basic Ruby, plus 820MB for the DevKit)
Memory required                               ~20MB
Fluentd additional storage                    300KB
Minimum Ruby version                          Ruby 2.1 (against Fluentd v1)

Simple deployment of Fluentd

To get ready to run Fluentd, we first need to install Ruby. This is best done by installing the latest stable version of Ruby using your operating system’s package framework. Links to the different installation packages can be found via www.ruby-lang.org, where the Downloads page links to the relevant artefact. For Windows, we get taken to https://rubyinstaller.org to retrieve the RubyInstaller, which has been produced to work with a Windows installation tool.

Once downloaded, run the installer. It takes you through the steps to define the preferred location, and it also asks whether you want to install MSYS2; say yes to this. MSYS2 is needed for Ruby gems that have a low-level C dependency, such as plugins that interact with the OS. With it come several development-related tools such as MinGW, which allows Ruby development to make use of native Windows C libraries. We therefore recommend taking the full MSYS2 installation with MinGW to support any possible development.

The installer should add Ruby to the Windows PATH environment variable. This can be checked in a Windows shell using the command:

 
 echo %PATH%
  

This displays the PATH environment variable, which should include the \bin folder of your Ruby installation, for example, C:\Ruby27-x64\bin. If it doesn’t appear, it needs to be added, which we can do within the Windows shell with the command:

 
 setx path "%path%;c:\dir1\dir2"
  

The c:\dir1\dir2 needs to be replaced with the full path to the bin folder of your Ruby install, for example, C:\Ruby27-x64\bin. Linux has its equivalent settings.
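For Linux, the equivalent check and fix might look like the following sketch. The RUBY_BIN location and the path_contains helper are assumptions made for illustration; if Ruby was installed through a package manager, it is normally already on the PATH and none of this is needed.

```shell
#!/bin/sh
# RUBY_BIN is a hypothetical location; replace it with your own Ruby bin folder.
RUBY_BIN="${RUBY_BIN:-/opt/ruby/bin}"

# path_contains DIR [PATH_STRING]: succeeds if DIR is an entry in PATH_STRING
# (defaults to the current PATH).
path_contains() {
  case ":${2:-$PATH}:" in
    *":$1:"*) return 0 ;;
    *) return 1 ;;
  esac
}

if path_contains "$RUBY_BIN"; then
  echo "$RUBY_BIN is already on the PATH"
else
  # Affects the current shell only; add the same line to ~/.profile
  # (or your shell's startup file) to make it permanent.
  PATH="$PATH:$RUBY_BIN"
  export PATH
  echo "added $RUBY_BIN to the PATH"
fi
```

Wrapping the PATH in colons before matching avoids false positives where the folder is only a prefix of a longer entry.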

With a fresh Windows shell, it should be possible to execute the command ruby --version, and Ruby displays the installed version. The next step is to get Fluentd installed.

Fluentd can be installed in a variety of ways. Treasure Data provides a Windows installer for Fluentd, which introduces a prefix of td into file and folder names. To avoid confusion when taking configuration from a Windows development environment to a Linux production environment, we aren’t going to use this route. Instead, we install Fluentd using RubyGems, ensuring we don’t get any of these differences. Installing via gems is easy, and with Ruby on our PATH we need only issue the command:

 
 gem install fluentd
  

As long as you have connectivity to https://rubygems.org/, the relevant gems, including dependencies, download and install. In enterprise and production environments, this site may need to be accessed via a proxy tier. The installation can be tested by running the following command:

 
 fluentd --help
  

This displays the help information for Fluentd. It should also be possible to see Fluentd and the other installed gems in the deployment location lib\ruby\gems\2.7.0\gems\.

In addition to the core Fluentd, the installation also provides some secondary tools.

Table 2. Fluentd support tools

Fluentd Tool

Tool Description

fluent-binlog-reader

Fluentd can create binary log files, for example, for file caching. This utility can be used to read such a file and generate readable content.

fluent-ca-generate

Utility for creating basic (self-signed) certificates that can be used to encrypt communications between Fluentd / Fluent Bit nodes

fluent-cat

fluent-cat provides a means to inject a single log message into Fluentd using the forward (in_forward) endpoint. For example (in a Linux environment):

echo '{"message":"hello"}' | fluent-cat debug.log --host localhost --port 18080

This sends a log event to the local Fluentd instance listening on port 18080. This is ideal for quick verification of the routing, filtering, and output steps, but crucially it doesn’t allow us to check the input plugin configurations (hence the LogSimulator).

fluent-debug

A utility to help with remote debugging, used in conjunction with the dRuby tooling.

fluent-gem

This is an alias for the Ruby gem command, which can, for example, list all the gems available.

fluent-plugin-config-format

This provides the means to interrogate a plugin to obtain details of the configuration parameters the plugin supports. As some plugins may work as several different types (e.g. input and output), it’s necessary to specify the type. For example, try:

fluent-plugin-config-format -f txt input tail

This retrieves, in a text format, the tail input plugin’s configuration parameters.

This utility is ideal for including in a continuous integration pipeline, as it can generate documentation in several formats for a custom-built plugin.

fluent-plugin-generate

This generates a code skeleton for plugin development. The template includes a Gemfile, a README, stubbed Ruby code for the plugin, and test code.

A couple of O/S Differences

When working with Linux-based operating systems, there’s the concept of interrupt signals, perhaps the most commonly known of which is SIGHUP, familiar from the nohup command. Fluentd can make use of these signals in a Linux environment to trigger operations such as reloading the configuration file without the process needing to restart. The following table summarizes the key signals and their impact.

Table 3. Linux Signals and impact

Linux Signal

Effect on Fluentd

SIGINT or SIGTERM

This tells Fluentd to shut down gracefully: it clears down everything in memory, and any file buffering is left in a clean state. If another process calls Fluentd, it’s better to stop that process first so that its log events are processed completely.

SIGUSR1

This tells Fluentd to flush all of its cached values, including its own log entries, to storage and then refresh the file handles to the file storage. This is repeated based on a system value called flush_interval.

SIGUSR2

Secure and graceful handling of reloading the configuration, creating the relevant process lines. It can be considered graceful as it ensures any cache is safely stored before reloading.

SIGHUP

This signal is best known for forcing a configuration reload. It performs the same operations as SIGUSR2 but also flushes Fluentd’s internal logs.

SIGCONT

This signal gets Fluentd to record its internal status – thread information, memory allocation etc.

These signals can be sent by issuing the Linux kill command against the correct process. For example, kill -s USR1 3699 sends SIGUSR1 to Fluentd if the process has an ID of 3699. At present, there isn’t an equivalent way to send these signals on Windows, although several change requests have been submitted to the project for such features.
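As a minimal sketch, the kill command can be wrapped in a small helper that first confirms the target process exists. The send_signal function is hypothetical and written for illustration only; the process ID 3699 is simply the one used in the example above.

```shell
#!/bin/sh
# send_signal SIGNAL PID: hypothetical helper that checks the process exists
# and can be signalled (kill -0 sends no signal, it only tests) before
# delivering the requested signal.
send_signal() {
  if kill -0 "$2" 2>/dev/null; then
    kill -s "$1" "$2"
  else
    echo "no running process with ID $2" >&2
    return 1
  fi
}

# Usage, assuming Fluentd runs with process ID 3699 as in the text:
#   send_signal USR1 3699   # flush cached values and refresh file handles
#   send_signal TERM 3699   # graceful shutdown
```

Checking first gives a clear message when the ID is stale, for example after Fluentd has been restarted and been given a new process ID.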

File Handles

Within a Linux file system, the number of file handles that can be used at any one time can be controlled, unlike Windows, where these limits are driven entirely by OS version and architecture, e.g. 32 or 64 bit. Additionally, Linux uses file handles not only for real files but also for things like network connections. For several reasons, the default number of file handles can be restrictive. It isn’t unusual to need to adjust the number of file handles that can be held open in production environments. For running the use cases in this article on a Linux environment, this restriction should not be an issue, but when ramping up the volume in a production context it’s something to be aware of.
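The current limits can be inspected from a shell before deciding whether anything needs adjusting. The sketch below uses the shell’s built-in ulimit; the open_handles helper is hypothetical and relies on the Linux /proc file system, so it reports nothing useful on other platforms.

```shell
#!/bin/sh
# Report the per-process open-file limits for the current shell.
soft_limit="$(ulimit -S -n)"   # limit currently enforced
hard_limit="$(ulimit -H -n)"   # ceiling the soft limit can be raised to
echo "open files: soft limit ${soft_limit}, hard limit ${hard_limit}"

# Count the handles a process currently holds (Linux only, relies on /proc).
open_handles() {
  ls "/proc/$1/fd" 2>/dev/null | wc -l
}
echo "this shell holds $(open_handles $$) handles"

# To raise the soft limit for this shell and its children (the value is an
# example only and cannot exceed the hard limit):
#   ulimit -n 4096
```

Comparing the number of handles a running process holds against the soft limit gives an early warning before the limit is actually hit.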

Docker Image

It is also possible to download a prepared Docker image, which is made available via Docker Hub (https://hub.docker.com/r/fluent/fluentd/) or directly from the Fluentd GitHub site (https://github.com/fluent/fluentd-docker-image). Running Docker, and particularly Linux-based Docker images, can be problematic for some Windows environments: you need to get the Windows Subsystem for Linux (WSL) working, which is only available for Windows 10; otherwise you need to work through Docker Toolbox for Windows (https://docs.docker.com/toolbox/toolbox_install_windows/), which requires Windows 10 Pro, or the older Docker Windows tooling, which requires using VirtualBox (www.virtualbox.org). The more native installation may demand BIOS changes.

If you’re comfortable with addressing these challenges and adapting the instructions as we go through the examples and exercises then go ahead.

Deploying a log generator

Ideally, we want to prove our configuration for input plugins and confirm configuration for things like log rotation, so we want a configuration-driven utility that can continuously generate log events. We have one available at http://bit.ly/LogGenerator.

The utility is written using Groovy, which means that at its heart is Java and the use of standard Java classes and libraries. Groovy adds several conveniences over Java. Specifically, it includes some libraries that make it easy to perform REST-based activities, and it executes as a script, meaning that tweaking it for your own future needs is quick and easy.

Java Installation

To install Java, you can either use a package manager or download it from https://www.java.com/en/download/. The tool has been implemented to work with Java 8 or later, but you need the Java Development Kit (JDK) rather than the Java Runtime Environment (JRE). Once downloaded and installed, you need to ensure that the right version of Java is set up in your PATH environment variable and JAVA_HOME. You can check which version of Java is in use with the command java -version. As you may have other applications also using Java, multiple Java versions can be installed, depending on how they’re set up. If this is the case, then it’s worthwhile creating a simple script like the one that follows the Groovy installation steps below, which can be run in the command shell before running the LogSimulator. This ensures that all the correct settings are in place (the most common cause of problems is 32-bit and 64-bit installations of Groovy and Java trying to interact).

Groovy installation

With Java installed, the next step is to install Groovy, which can be downloaded from https://groovy.apache.org/download.html or, like Java, installed using a package manager. As with Java, you also want Groovy to be on the PATH environment variable and GROOVY_HOME to be set up. Like Java, you can confirm whether Groovy is suitably installed using the command groovy --version.

 
 set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_221
 set PATH=%JAVA_HOME%\bin;%PATH%
 echo Set Shell to Java
 java -version
 set GROOVY_HOME=C:\Program Files\Groovy-3.0.2
 set PATH=%GROOVY_HOME%\bin;%PATH%
 echo Set Shell to Groovy
 groovy --version
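
On Linux, an equivalent of the Windows script above might look like the following sketch. Both install locations are hypothetical defaults chosen for illustration and need to be replaced with the locations on your own machine.

```shell
#!/bin/sh
# Hypothetical install locations; adjust them to match your own system.
JAVA_HOME="${JAVA_HOME:-/usr/lib/jvm/jdk1.8.0_221}"
GROOVY_HOME="${GROOVY_HOME:-/opt/groovy-3.0.2}"
export JAVA_HOME GROOVY_HOME

# Put both bin folders ahead of anything else already on the PATH.
PATH="$JAVA_HOME/bin:$GROOVY_HOME/bin:$PATH"
export PATH

# Report the versions only if the commands can actually be found.
if command -v java >/dev/null 2>&1; then java -version; fi
if command -v groovy >/dev/null 2>&1; then groovy --version; fi
```

The script needs to be sourced into the current shell (for example, with the . command) rather than executed, otherwise the settings disappear when the script’s own shell exits.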
  

The simulator uses a properties file to control its behavior and a second file that describes the series of log entries to transmit.

If you’d like to know what’s going on in more depth, then edit the test-tool.properties file and change the verbose property from false to true. This should display on the console the log entries defined in the file small-source.txt. All the properties for the simulator are explained in the documentation at http://bit.ly/LogGenerator.


Figure 1. LogSimulator output when in verbose mode


Installing Postman

To exercise the Fluentd configuration in our ‘Hello World’ scenario, we need a tool that allows us to send single log events. Whilst utilities such as cURL can be used, we’ve elected to use Postman as it’s a well-known tool that supports the majority of environments (Windows, macOS, Linux). Postman is free for individual use, and the binary can be retrieved from https://www.postman.com/downloads/.

For Windows, this is an installer that resolves the relevant file locations. For Linux, the download is a gzipped tar file that needs to be unpacked. Once installed or untarred, it’s best to ensure that Postman can be started; for Windows this can be done with the installed links.

The next step is to send a log event using Postman. With Postman started we need to configure it to send a simple JSON payload to Fluentd. The next screenshots show the settings highlighted:


Figure 2. Defined JSON payload to send to Fluentd using Postman


Click on the Send button, and we’ll see the following result:


Figure 3. Fluentd output after sending the REST event


You may have noticed that in the REST API invocation we haven’t defined the time for the log event; therefore, the Fluentd instance applies the current time.
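For readers who prefer the command line, the same kind of request can be sketched with cURL. The send_event helper is hypothetical, and both the port (18080) and the tag (test) are assumptions that need to match the HTTP source in your own Fluentd configuration.

```shell
#!/bin/sh
# FLUENTD_URL and the tag are assumptions: an HTTP source on port 18080
# and a tag of "test". Change both to match your own configuration.
FLUENTD_URL="${FLUENTD_URL:-http://localhost:18080}"

# send_event TAG MESSAGE: hypothetical helper that posts a one-field JSON
# log event. MESSAGE must not contain characters needing JSON escaping.
send_event() {
  curl --silent --show-error --fail \
    --header 'Content-Type: application/json' \
    --data "{\"message\":\"$2\"}" \
    "$FLUENTD_URL/$1"
}

# Usage (requires a running Fluentd instance with an HTTP source):
#   send_event test "hello"
```

As with the Postman request, no time is supplied, so Fluentd would stamp the event with the current time.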

This configuration is about as useful as a ‘chocolate teapot’, as the expression goes, but it illustrates the basic idea of Fluentd: the ability to take log events and direct them. Let’s finish this illustration by using the LogSimulator to create a stream of log events.
