Log parsing is the process of translating unstructured log data into attributes (key:value pairs) based on rules you define. You can use these attributes in your NRQL queries to facet or filter logs in useful ways.
New Relic parses log data automatically according to certain parsing rules. In this doc, you'll learn how log parsing works, and how to create your own custom parsing rules.
You can also create, query, and manage your log parsing rules by using NerdGraph, our GraphQL API. A helpful tool for this is our NerdGraph API explorer. For more information, see our NerdGraph tutorial for parsing.
Parsing example
A good example is a default NGINX access log containing unstructured text. It's useful for searching but not much else. Here's an example of a typical line:
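For instance, a line in NGINX's default combined log format looks like this (the values below are illustrative):

```
93.180.71.3 - - [10/May/2024:08:05:32 +0000] "GET /downloads/product_1 HTTP/1.1" 304 0 "-" "Debian APT-HTTP/1.3"
```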
In an unparsed format, you would need to do a full text search to answer most questions. After parsing, the log is organized into attributes, like response code and request URL:
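For example, a parsed record might contain attributes like these (the attribute names are illustrative; the exact names depend on the parsing rule):

```json
{
  "remote_addr": "93.180.71.3",
  "request": "GET /downloads/product_1 HTTP/1.1",
  "status": 304,
  "bytes_sent": 0
}
```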
Parsing makes it easier to create custom queries that facet on those values. This helps you understand the distribution of response codes per request URL and quickly find problematic pages.
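As a sketch, a query along these lines (attribute names depend on your parsing rule) breaks down response codes per request URL:

```sql
SELECT count(*) FROM Log WHERE logtype = 'nginx' FACET request, status
```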
How log parsing works
Here's an overview of how New Relic implements parsing of logs:
What
Parsing is applied to a specific selected field. By default, the message field is used. However, any field/attribute can be chosen, even one that doesn't currently exist in your data.
Each parsing rule is created by using a NRQL WHERE clause that determines which logs the rule will attempt to parse.
To simplify the matching process, we recommend adding a logtype attribute to your logs. However, you are not limited to using logtype; one or more attributes can be used as matching criteria in the NRQL WHERE clause.
When
Parsing is applied only once to each log message. If multiple parsing rules match the log, only the first that succeeds is applied.
Parsing rules are unordered, so if more than one parsing rule matches a log, the rule applied is effectively chosen at random. Be sure to build your parsing rules so that they don't match the same logs.
Parsing takes place during log ingestion, before data is written to NRDB. Once data has been written to storage, it can no longer be parsed.
Parsing occurs in the pipeline before data enrichments take place. Be careful when defining the matching criteria for a parsing rule: if the criteria are based on an attribute that doesn't exist until after parsing or enrichment has taken place, that data won't be present in the logs when matching occurs, and no parsing will happen.
How
Rules can be written in Grok, regex, or a mixture of the two. Grok is a collection of patterns that abstract away complicated regular expressions.
We support the Java Regex syntax in our Parsing UI. For attribute or field names in capture groups, Java Regex only allows for [A-Za-z0-9].
Parse attributes using Grok
Parsing patterns are specified using Grok, an industry standard for parsing log messages. Any incoming log with a logtype field will be checked against our built-in parsing rules, and if possible, the associated Grok pattern is applied to the log.
Grok is a superset of regular expressions that adds built-in named patterns to be used in place of literal complex regular expressions. For instance, instead of having to remember that an integer can be matched with the regular expression (?:[+-]?(?:[0-9]+)), you can just write %{INT} to use the Grok pattern INT, which represents the same regular expression.
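A Grok matching expression takes the following general form, where the bracketed parts are optional:

```grok
%{PATTERN_NAME[:OPTIONAL_EXTRACTED_ATTRIBUTE_NAME[:OPTIONAL_TYPE[:OPTIONAL_PARAMETER]]]}
```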
PATTERN_NAME is one of the supported Grok patterns. The pattern name is just a user-friendly label representing a regular expression, and it is exactly equivalent to that regular expression.
OPTIONAL_EXTRACTED_ATTRIBUTE_NAME, if provided, is the name of the attribute that will be added to your log message with the value matched by the pattern name. It's equivalent to using a named capture group using regular expressions. If this is not provided, then the parsing rule will just match a region of your string, but not extract an attribute with its value.
OPTIONAL_TYPE specifies the type of attribute value to extract. If omitted, values are extracted as strings. For instance, to extract the value 123 from "File Size: 123" as a number into the attribute file_size, use the matching string File Size: %{INT:file_size:int}.
OPTIONAL_PARAMETER specifies an optional parameter for certain types. Currently only the datetime type takes a parameter, see below for details.
You can also use a mix of regular expressions and Grok pattern names in your matching string.
See our list of supported Grok patterns and our list of supported Grok types.
Note that variable names must be explicitly set and be lowercase like %{URI:uri}. Expressions such as %{URI} or %{URI:URI} would not work.
A log record could look something like this:
{
"message":"54.3.120.2 2048 0"
}
This information is accurate, but it's not exactly intuitive what it means. Grok patterns help you extract and understand the telemetry data you want. For example, a log record like this is much easier to use:
{
"host_ip":"54.3.120.2",
"bytes_received":2048,
"bytes_sent":0
}
To do this, create a Grok pattern that extracts these three fields; for example:
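A sketch of such a pattern, using the built-in IP and INT patterns with the int type so the byte counts are stored as numbers:

```grok
%{IP:host_ip} %{INT:bytes_received:int} %{INT:bytes_sent:int}
```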
After processing, your log record will include the fields host_ip, bytes_received, and bytes_sent. Now you can use these fields in New Relic to filter, facet, and perform statistical operations on your log data. For more details about how to parse logs with Grok patterns in New Relic, see our blog post.
If you have the correct permissions, you can create parsing rules in our UI to create, test, and enable Grok parsing. For example, to get a specific type of error message for one of your microservices called Inventory Services, you would create a Grok parsing rule that looks for a specific error message and product. To do this:
Give the rule a name; for example, Inventory Services error parsing.
Select an existing field to parse (default = message), or enter a new field name.
Identify the NRQL WHERE clause that acts as a pre-filter for the incoming logs; for example, entity.name='Inventory Service'. This pre-filter narrows down the number of logs that need to be processed by your rule, removing unnecessary processing.
Select a matching log if one exists, or click on the Paste log tab to paste in a sample log.
Add the Grok parsing rule; for example:
Inventory error: %{DATA:error_message} for product %{INT:product_id}
Where:
Inventory error: Your parsing rule's name
error_message: The error message you want to select
product_id: The product ID for Inventory Service
Enable and save the parsing rule.
Soon you will see that your Inventory Service logs are enriched with two new fields: error_message and product_id. From here, you can query on these fields, create charts and dashboards, set alerts, etc.
The OPTIONAL_TYPE field specifies the type of attribute value to extract. If omitted, values are extracted as strings.
Supported types are:

| Type specified in Grok | Type stored in the New Relic database |
| --- | --- |
| boolean | boolean |
| byte, short, int, integer | integer |
| long | long |
| float | float |
| double | double |
| string (default), text | string |
| date, datetime | Time as a long. By default it is interpreted as ISO 8601. If OPTIONAL_PARAMETER is present, it specifies the date and time pattern string to use to interpret the datetime. |
Note that this is only available during parsing. We have an additional, separate timestamp interpretation step that occurs for all logs later in the ingestion pipeline.
If you have multiline logs, be aware that the GREEDYDATA Grok pattern does not match newlines (it is equivalent to .*).
So instead of using %{GREEDYDATA:some_attribute} directly, you will need to add the multiline flag in front of it: (?s)%{GREEDYDATA:some_attribute}
The New Relic logs pipeline parses your JSON log messages by default, but sometimes you have JSON log messages that are mixed with plain text. In this situation, you may want to parse them and then be able to filter using the JSON attributes.
If that is the case, you can use the jsonGrok type, which will parse the JSON captured by the Grok pattern. This format relies on three main parts: the Grok syntax, the prefix you would like to assign to the parsed JSON attributes, and the jsonGrok type. Using the jsonGrok type, you can extract and parse JSON from logs that are not properly formatted; for example, if your logs are prefixed with a date/time string:
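As a sketch, a rule like the following (my_attribute_prefix is an arbitrary name for the prefix) matches a timestamp-prefixed line and parses the trailing JSON:

```grok
%{TIMESTAMP_ISO8601:timestamp} %{GREEDYDATA:my_attribute_prefix:json}
```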
You can define the list of attributes to extract or drop with the options keepAttributes or dropAttributes.
For example, with the following Grok expression:
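A sketch (the attribute names here are illustrative) that keeps only selected attributes from the parsed JSON, each stored under the my_attribute_prefix prefix:

```grok
%{GREEDYDATA:my_attribute_prefix:json({"keepAttributes": ["status", "method", "path"]})}
```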
If you want to omit the my_attribute_prefix prefix and only keep the status attribute, you can include "noPrefix": true and "keepAttributes": ["status"] in the configuration.
If your JSON has been escaped, you can use the isEscaped option to be able to parse it.
If your JSON has been escaped and then quoted, you need to match the quotes as well, as shown below.
For example, with the following Grok expression:
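A sketch for a message whose JSON payload is escaped and wrapped in quotes, such as `"{\"status\": \"ok\"}"`; note the literal quotes placed around the Grok pattern:

```grok
"%{GREEDYDATA:my_attribute_prefix:json({"isEscaped": true})}"
```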
To configure the jsonGrok type, use :json(_CONFIG_):
json({"dropOriginal": true}): Drop the JSON snippet that was used in parsing. When set to true (default value), the parsing rule will drop the original JSON snippet. Note the JSON attributes will remain in the message field.
json({"dropOriginal": false}): This shows the JSON payload that was extracted. When set to false, the full JSON-only payload is displayed under the attribute named in my_attribute_prefix above. Note that the JSON attributes remain in the message field here as well, giving the user three different views of the JSON data. If storage of all three versions is a concern, it's recommended to use the default of true here.
json({"depth": 62}): The number of levels of depth to which you want to parse the JSON value (defaults to 62).
json({"keepAttributes": ["attr1", "attr2", ..., "attrN"]}): Specifies which attributes will be extracted from the JSON. The provided list cannot be empty. If this configuration option is not set, all attributes are extracted.
json({"dropAttributes": ["attr1", "attr2", ..., "attrN"]}): Specifies which attributes to be dropped from the JSON. If this configuration option is not set, no attributes are dropped.
json({"noPrefix": true}): Set this option to true to remove the prefix from the attributes extracted from the JSON.
json({"isEscaped": true}): Set this option to true to parse JSON that has been escaped (which you typically see when JSON is stringified, for example {\"key\": \"value\"})
If your system sends comma-separated values (CSV) logs and you need to parse them in New Relic, you can use the csvGrok type, which parses the CSV captured by the Grok pattern.
This format relies on 3 main parts: the Grok syntax, the prefix you would like to assign to the parsed CSV attributes, and the csvGrok type. Using the csvGrok type, you can extract and parse CSV from logs.
It's mandatory to indicate the columns in the CSV Grok type configuration (which must be valid JSON).
You can ignore any column by setting "_" (underscore) as the column name to drop it from the resulting object.
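For example, this sketch (column names are illustrative) parses a four-column CSV payload, dropping the third column:

```grok
%{GREEDYDATA:log:csv({"columns": ["timestamp", "status", "_", "url"]})}
```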
Optional configuration options:
While the "columns" configuration is mandatory, it's possible to change the parsing of the CSV with the following settings.
dropOriginal: (Defaults to true) Drop the CSV snippet used in parsing. When set to true (default value), the parsing rule drops the original field.
noPrefix: (Defaults to false) Doesn't include the Grok field name as prefix on the resulting object.
separator: (Defaults to ,) Defines the character or string that splits each column.
Another common scenario is tab-separated values (TSV); for that, specify \t as the separator, for example: %{GREEDYDATA:log:csv({"columns": ["timestamp", "status", "method", "url", "time", "bytes"], "separator": "\t"})}
quoteChar: (Defaults to ") Defines the character that optionally surrounds a column's content.
If your system sends logs containing IPv4 addresses, New Relic can locate them geographically and enrich log events with the specified attributes. You can use the geoGrok type, which finds the position of an IP address captured by the Grok pattern. This format can be configured to return one or more fields related to the address, such as the city, country, and latitude/longitude of the IP.
It's mandatory to specify the desired lookup fields returned by the geo action. At least one item is required from the following options.
city: Name of city
countryCode: Abbreviation of country
countryName: Name of country
latitude: Latitude
longitude: Longitude
postalCode: Postal code, zip code, or similar
region: Abbreviation of state, province, or territory
regionName: Name of state, province, or territory
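As a sketch (assuming the lookup configuration key takes the list of desired fields), a rule that geolocates a captured IP address might look like this:

```grok
%{IP:client_ip:geo({"lookup": ["city", "countryCode", "latitude", "longitude"]})}
```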
Organizing by logtype
New Relic's log ingestion pipeline can parse data by matching a log event to a rule that describes how the log should be parsed. There are two ways log events can be parsed: with built-in rules for common log formats, or with custom parsing rules that you define.
Rules are a combination of matching logic and parsing logic. Matching is done by defining a query match on an attribute of the logs. Rules aren't applied retroactively. Logs collected before a rule is created aren't parsed by that rule.
The simplest way to organize your logs and how they're parsed is to include the logtype field in your log event. This tells New Relic what built-in rule to apply to the logs.
Important
Once a parsing rule is active, data parsed by the rule is permanently changed. This can't be reverted.
Limits
Parsing is computationally expensive, which introduces risk. Parsing is done for custom rules defined in an account and for matching patterns to a log. A large number of patterns or poorly defined custom rules will consume a huge amount of memory and CPU resources while also taking a very long time to complete.
In order to prevent problems, we apply two parsing limits: per-message-per-rule and per-account.
Limit
Description
Per-message-per-rule
The per-message-per-rule limit prevents the time spent parsing any single message from being greater than 100 ms. If that limit is reached, the system will cease attempting to parse the log message with that rule.
The ingestion pipeline will attempt to run any other applicable rules on that message, and the message will still be passed through the ingestion pipeline and stored in NRDB. The log message will be in its original, unparsed format.
Per-account
The per-account limit exists to prevent accounts from using more than their fair share of resources. The limit considers the total time spent processing all log messages for an account per-minute.
Tip
To easily check if your rate limits have been reached, go to the Limits page in the New Relic UI.
Built-in parsing rules
Common log formats have well-established parsing rules already created for them. To get the benefit of built-in parsing rules, add the logtype attribute when forwarding logs. Set the value to something listed in the following table, and the rules for that type of log will be applied automatically.
List of built-in rules
The following logtype attribute values map to a predefined parsing rule. For example, to query Application Load Balancer logs:
From the New Relic UI, use the format logtype:"alb".
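From the query builder, an equivalent NRQL query (a sketch) would be:

```sql
SELECT * FROM Log WHERE logtype = 'alb' SINCE 1 hour ago
```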
When aggregating logs, it's important to provide metadata that makes it easy to organize, search, and parse those logs. One simple way of doing this is to add the attribute logtype to the log messages when they're shipped. Built-in parsing rules are applied by default to certain logtype values.
Tip
The fields logType, logtype, and LOGTYPE are all supported for built-in rules. For ease of searching, we recommend that you align on a single syntax in your organization.
Add logtype as an attribute. You must set the logtype for each named source.
logs:
  - name: file-simple
    file: /path/to/file
    attributes:
      logtype: fileRaw
  - name: nginx-example
    file: /var/log/nginx.log
    attributes:
      logtype: nginx
Add a filter block to the .conf file, which uses a record_transformer to add a new field. In this example we use a logtype of nginx to trigger the built-in NGINX parsing rule. Check out other Fluentd examples.
<filter containers>
  @type record_transformer
  enable_ruby true
  <record>
    # Add logtype to trigger a built-in parsing rule for nginx access logs
    logtype nginx
    # Set timestamp from the value contained in the field "time"
    timestamp record["time"]
    # Add hostname and tag fields to all records
    hostname "#{Socket.gethostname}"
    tag ${tag}
  </record>
</filter>
Add a filter block to the .conf file that uses a record_modifier to add a new field. In this example we use a logtype of nginx to trigger the built-in NGINX parsing rule. Check out other Fluent Bit examples.
[FILTER]
    Name    record_modifier
    Match   *
    Record  logtype nginx
    Record  hostname ${HOSTNAME}
    Record  service_name Sample-App-Name
Add a filter block to the Logstash configuration, which uses an add_field mutate filter to add a new field. In this example we use a logtype of nginx to trigger the built-in NGINX parsing rule. Check out other Logstash examples.
filter {
  mutate {
    add_field => {
      "logtype" => "nginx"
      "service_name" => "myservicename"
      "hostname" => "%{host}"
    }
  }
}
You can add attributes to the JSON request sent to New Relic. In this example we add a logtype attribute of value nginx to trigger the built-in NGINX parsing rule. Learn more about using the Logs API.
POST /log/v1 HTTP/1.1
Host: log-api.newrelic.com
Content-Type: application/json
X-License-Key: YOUR_LICENSE_KEY
Accept: */*
Content-Length: 133
{
"timestamp": TIMESTAMP_IN_UNIX_EPOCH,
"message": "User 'xyz' logged in",
"logtype": "nginx",
"service": "login-service",
"hostname": "login.example.com"
}
Create and view custom parsing rules
Many logs are formatted or structured in a unique way. In order to parse them, custom logic must be built and applied.
From the left nav in the logs UI, select Parsing, then create your own custom parsing rule with a valid NRQL WHERE clause and Grok pattern.
To create and manage your own custom parsing rules:
From Manage data on the left nav of the logs UI, click Parsing, then click Create parsing rule.
Enter a name for the new parsing rule.
Select an existing field to parse (default = message), or enter a new field name.
Enter a valid NRQL WHERE clause to match the logs you want to parse.
Select a matching log if one exists, or click on the Paste log tab to paste in a sample log. Note that if you copy text from the logs UI or the query builder to paste into the parsing UI, ensure that it's the Unformatted version.
To view your existing parsing rules, go to Manage data on the left nav of the logs UI and click Parsing.
Troubleshooting
If parsing isn't working the way you intended, it may be due to:
Logic: The parsing rule matching logic doesn't match the logs you want.
Timing: If your parsing rule's matching logic targets a value that doesn't exist yet, it will fail. This can occur if the value is added later in the pipeline as part of the enrichment process.
Limits: There is a fixed amount of time available every minute to process logs via parsing, patterns, drop filters, etc. If the maximum amount of time has been spent, parsing will be skipped for additional log event records.