Custom Sql Metrics Gathering With Telegraf (Part 2)
In Part 1, we covered how to get telegraf to execute a stored procedure on a schedule. This time we’ll take a look at what that procedure looks like.
What is Influx Line Protocol format
Telegraf is expecting the data to be returned in InfluxDB Line Protocol format. It’s a very simple text format that contains 4 parts.
[measurement],[tags] [fields] [timestamp]
This is the name of the metric in influx. Think of it as the equivalent of a table name.
Tags are what the data is indexed upon. The values in tags shouldn’t change over time because when you have too many tag series influx starts to perform poorly.
These are the measurements you want to track over time. A good example of this is the time it takes for a backup to occur or the size of a backup.
Influx is a timeseries database so everything must be timestamped. If you exclude the timestamp then Influx will insert the current UTC epoch value for you. I prefer to explicitly include this information. There could be scenarios (ie. a DMZ environment) where we write this data to a file and then load into Influx via a batch process. This timestamp is always in UTC Epoch format.
Here’s an example of a backup measurement in line protocol format
sqlserver_backup,sql_instance=SQL2005,database=master,backuptype=FullDatabase duration_sec=0,size_bytes=8486400 1545951601000000000
Fields: 2 *duration_sec=0 *size_bytes=8486400
Timestamp: 1545951601000000000 ( Thursday, December 27, 2018 11:00:01 PM UTC )
Creating a stored procedure to output Influx line protocol
With this knowledge in hand we can easily create a procedure to output each row of data as an Influx line. The key thing to remember is that we’re returning a single column with a long string of data concatenated together into line protocol format.
SELECT N'sqlserver_backup' /* TAGS */ + N',sql_instance=' + Replace( @@serverName, '\', ':' ) + N',database=' + bup.database_name + N',backuptype=' + CASE bup.type WHEN N'D' THEN N'FullDatabase' WHEN N'I' THEN N'DiffDatabase' WHEN N'L' THEN N'Log' WHEN N'F' THEN N'Filegroup' WHEN N'G' THEN N'DiffFilegroup' WHEN N'P' THEN N'Partial' WHEN N'Q' THEN N'DiffPartial' ELSE bup.type END /* FIELDS */ + N' duration_sec=' + Cast(Cast(DateDiff( SECOND, bup. backup_start_date, bup.backup_finish_date ) AS INT) AS NVARCHAR(50)) + N',size_bytes=' + Cast(backup_size AS NVARCHAR) + N' ' /* UTC Time Epoch */ + Cast(DateDiff( s, '1970-01-01 00:00:00', DateAdd( hh, DateDiff( hh, GetDate(), GetUtcDate()), bup. backup_start_date )) AS NVARCHAR) + N'000000000' AS lineprotocol FROM msdb.dbo.backupset AS bup WHERE bup.backup_finish_date > IsNull( @FromDateTime, DateAdd ( DAY, -1, GetDate()))
This will output a line for each backup that telegraf then feeds into influx directly.
I wouldn’t recommend setting up backup metrics gathering if you haven’t been maintaining your backup history. You could run into timeout issues if there is too much backup data. If you’re use Ola‘s maintenance scripts, there’s a job specifically to cleanup backup history that you can enable.
Next time we’ll setup a new Grafana dashboard to display the collected backup metrics.