
Custom SQL Metrics Gathering With Telegraf (Part 2)

In Part 1, we covered how to get Telegraf to execute a stored procedure on a schedule. This time we’ll take a look at what that procedure looks like.

What is Influx Line Protocol format

Telegraf expects the data to be returned in InfluxDB Line Protocol format. It’s a very simple text format made up of four parts:

[measurement],[tags] [fields] [timestamp]

Measurement

This is the name of the metric in Influx. Think of it as the equivalent of a table name.

Tags

Tags are what the data is indexed on. Tag values shouldn’t change over time, because too many tag series (high cardinality) causes Influx to perform poorly.

Fields

These are the measurements you want to track over time. A good example of this is the time it takes for a backup to occur or the size of a backup.

Timestamp

Influx is a time-series database, so everything must be timestamped. If you exclude the timestamp, Influx will insert the current UTC epoch value for you. I prefer to include this information explicitly. There could be scenarios (e.g. a DMZ environment) where we write this data to a file and then load it into Influx via a batch process. The timestamp is always a UTC Unix epoch value, in nanoseconds by default.
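
If you want to build that epoch value yourself in T-SQL, here’s a minimal sketch. It uses the same pad-seconds-out-to-nanoseconds trick as the procedure later in this post, and GetUtcDate() is available all the way back to SQL 2005:

/* Build a nanosecond-precision Unix epoch string from the current UTC time. */
DECLARE @utc DATETIME;
SET @utc = GetUtcDate();

SELECT Cast(DateDiff( SECOND, '1970-01-01 00:00:00', @utc ) AS NVARCHAR(20))
       + N'000000000' AS epoch_ns;  /* seconds since 1970, padded out to nanoseconds */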

Here’s an example of a backup measurement in line protocol format:

sqlserver_backup,sql_instance=SQL2005,database=master,backuptype=FullDatabase duration_sec=0,size_bytes=8486400 1545951601000000000

Measurement: sqlserver_backup

Tags: 3

  • sql_instance=SQL2005
  • database=master
  • backuptype=FullDatabase

Fields: 2

  • duration_sec=0
  • size_bytes=8486400

Timestamp: 1545951601000000000 ( Thursday, December 27, 2018 11:00:01 PM UTC )
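
If you ever want to sanity-check one of these epoch values from T-SQL, strip the nine trailing zeros (nanoseconds back down to seconds) and add the remainder to the Unix epoch:

/* Returns 2018-12-27 23:00:01 -- the UTC time shown above. */
SELECT DateAdd( SECOND, 1545951601, '1970-01-01 00:00:00' ) AS utc_time;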

Creating a stored procedure to output Influx line protocol

With this knowledge in hand, we can easily create a procedure that outputs each row of data as an Influx line. The key thing to remember is that we’re returning a single column containing one long string concatenated together in line protocol format.

SELECT N'sqlserver_backup' 
     /* TAGS */
     + N',sql_instance=' + Replace( @@serverName, '\', ':' ) 
     + N',database=' + bup.database_name
     + N',backuptype=' + CASE bup.type
                           WHEN N'D' THEN N'FullDatabase'
                           WHEN N'I' THEN N'DiffDatabase'
                           WHEN N'L' THEN N'Log'
                           WHEN N'F' THEN N'Filegroup'
                           WHEN N'G' THEN N'DiffFilegroup'
                           WHEN N'P' THEN N'Partial'
                           WHEN N'Q' THEN N'DiffPartial'
                           ELSE bup.type
                         END 
     /* FIELDS */
     + N' duration_sec=' + Cast(Cast(DateDiff( SECOND, bup.backup_start_date, bup.backup_finish_date ) AS INT) AS NVARCHAR(50))
     + N',size_bytes=' + Cast(backup_size AS NVARCHAR) + N' '

     /* UTC Time Epoch */
     + Cast(DateDiff( s, '1970-01-01 00:00:00', DateAdd( hh, DateDiff( hh, GetDate(), GetUtcDate()), bup.backup_start_date )) AS NVARCHAR) + N'000000000' AS lineprotocol

FROM   msdb.dbo.backupset AS bup
WHERE  bup.backup_finish_date > IsNull( @FromDateTime, DateAdd( DAY, -1, GetDate()))

This will output a line for each backup, which Telegraf then feeds directly into InfluxDB.
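
The @FromDateTime reference in the WHERE clause is a parameter of the wrapping stored procedure. The real procedures are in the GitHub repo linked below; as a rough sketch (the procedure name here is made up, and the body is trimmed to a single tag and field for brevity), the wrapper Telegraf calls looks something like this:

CREATE PROCEDURE dbo.TelegrafBackupMetrics
    @FromDateTime DATETIME = NULL  /* NULL = only backups from the last day */
AS
BEGIN
    SET NOCOUNT ON;

    /* One line-protocol string per backup; Telegraf reads the single column. */
    SELECT N'sqlserver_backup'
         + N',sql_instance=' + Replace( @@serverName, '\', ':' )
         + N' size_bytes=' + Cast(bup.backup_size AS NVARCHAR(30)) + N' '
         + Cast(DateDiff( s, '1970-01-01 00:00:00',
                DateAdd( hh, DateDiff( hh, GetDate(), GetUtcDate()), bup.backup_start_date )) AS NVARCHAR(20))
         + N'000000000' AS lineprotocol
    FROM   msdb.dbo.backupset AS bup
    WHERE  bup.backup_finish_date > IsNull( @FromDateTime, DateAdd( DAY, -1, GetDate()));
END

Running EXEC dbo.TelegrafBackupMetrics; then returns one line per recent backup, which is exactly what Telegraf consumes on each collection interval.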

Followup

You can find the procedures I created for this on my GitHub. The repo also contains the Telegraf config files. With this method, Tracy's dashboards will now show SQL 2005 instances.

I wouldn’t recommend setting up backup metrics gathering if you haven’t been maintaining your backup history. You could run into timeout issues if there is too much backup data. If you use Ola's maintenance scripts, there’s a job specifically to clean up backup history that you can enable.

Next time we’ll set up a new Grafana dashboard to display the collected backup metrics.