This guide helps you troubleshoot common issues with the Pipeline Control gateway. Problems are organized by symptom so you can quickly identify and resolve them.
Installation issues
Insufficient user capabilities
Problem: You do not have the permissions associated with the Org Product Admin and Organization Manager roles.
Symptoms:
- Error message stating "You don't have the required organization-level capabilities to set up the agent authentication"
- Unable to complete gateway installation process
Solution:
- Ask your account administrators to grant you a role that includes the capabilities required for system identity creation
- Refer to user permissions documentation for guidance
Outdated Helm version
Problem: The command line script to install the Helm chart fails because your local version of Helm is outdated.
Symptoms:
- Helm installation script fails with version compatibility errors
- Error messages indicating the need for a Helm update
Solution:
- Update your local Helm installation to the latest version to ensure compatibility with the installation script
- Follow the instructions in the error message to upgrade Helm
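For example, you can confirm your local Helm client version before re-running the installation script. This is a minimal check; the minimum supported version depends on the chart release you're installing, and the upgrade command shown uses Homebrew as one example.

```bash
# Show the installed Helm client version
helm version --short

# If the version is older than the chart requires, upgrade Helm using
# your platform's package manager (Homebrew shown here as one example)
brew upgrade helm
```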
Data transmission issues
These issues occur when data cannot flow from your agents or telemetry producers to the gateway, or from the gateway to New Relic.
DNS resolution failures
Problem: Agents are unable to connect to the gateway due to DNS resolution errors.
Symptoms:
- Agents cannot reach the gateway endpoint
- Connection errors in agent logs
Solution:
- Access agent logs (via UI or directly) and search for connection errors
- Consult your network administrator to adjust DNS configurations based on your infrastructure and network topology
- Refer to agent documentation for more details
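As a quick check, you can test DNS resolution of the gateway hostname from inside the cluster. The hostname below is a placeholder for your actual gateway endpoint.

```bash
# Resolve the gateway hostname from a temporary pod inside the cluster
# (replace gateway.example.internal with your gateway endpoint)
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup gateway.example.internal
```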
SSL certificate issues
Problem: There is a problem with the SSL certificate configuration between the agent and gateway.
Symptoms:
- SSL connection errors in agent logs
- Certificate validation failures
- TLS handshake errors
Solution:
- Check agent logs for SSL connection errors
- Ensure SSL certificates are correctly configured and valid, considering your infrastructure and network topology
- Verify certificate expiration dates and certificate chain
- Refer to DNS and certificate configuration for more details
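One way to inspect the certificate the gateway presents is with openssl. The host and port below are placeholders for your gateway endpoint.

```bash
# Inspect the certificate subject, issuer, and expiration dates presented by the gateway
# (replace gateway.example.internal:443 with your gateway endpoint)
openssl s_client -connect gateway.example.internal:443 \
  -servername gateway.example.internal </dev/null 2>/dev/null \
  | openssl x509 -noout -dates -issuer -subject
```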
Gateway misconfiguration
Problem: Data reaches the gateway but fails to publish to New Relic.
Symptoms:
- Gateway receives data from agents but nothing appears in New Relic
- Pods failing to start or restarting repeatedly
Solution:
- Check outgoing request and error metrics from the gateway
- Review failed rule metrics to identify configuration issues
- Inspect logs of pods that are not starting
- Correct gateway configuration settings and ensure all pods are operational
- Verify New Relic license key is correctly configured
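For example, you can scan the gateway pod logs for publish errors and confirm that a license key secret exists. The secret name varies by installation, so treat this as a sketch.

```bash
# Scan gateway pod logs for publish errors (401/403 often indicate license key problems)
kubectl logs <pod-name> -n newrelic | grep -iE "error|401|403"

# List secrets in the namespace to confirm the license key secret exists
# (the exact secret name depends on how you installed the chart)
kubectl get secrets -n newrelic
```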
Unsupported telemetry producer or protocol
Problem: Data is sent from an unsupported API or protocol.
Symptoms:
- Gateway returns a 501 (Not Implemented) status code
- No data appears in New Relic despite successful agent connection
Solution:
- Verify compatibility with New Relic's supported protocols (OTLP, New Relic agent protocols)
- If using an unsupported protocol, file a feature request for support
- Configure the telemetry producer to send data directly to New Relic as a temporary workaround
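To confirm the 501 response, you can send a minimal request to the endpoint in question and print only the status code. The URL below is a placeholder; the exact path depends on the protocol you're testing.

```bash
# Print only the HTTP status code returned by the gateway for a test request
# (replace the URL with the endpoint your producer is configured to use)
curl -s -o /dev/null -w "%{http_code}\n" -X POST https://gateway.example.internal/v1/unsupported
```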
Destructive rule dropping all data
Problem: A rule is dropping all data, preventing it from reaching New Relic.
Symptoms:
- Data stops appearing in New Relic after rule deployment
- Drop data metrics show high volumes being filtered
Solution:
- Check drop data metrics in the gateway monitoring dashboard
- Review your filter and sampling processor configurations
- Modify or remove the destructive rule to allow data flow
- Test rules in a non-production environment before deploying
Data missing after ingestion
Problem: Data is missing in the New Relic backend after ingestion.
Symptoms:
- Gaps in telemetry data
- Incomplete traces or log records
Solution:
- Review error metrics and check for client-side timeouts
- Assess resource exhaustion signs (CPU, memory, network)
- Check New Relic status for platform issues
- Examine gateway logs during the affected time period
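When examining logs for the affected time period, kubectl's --since and --since-time flags narrow the output. The timestamp below is a placeholder; adjust it to the window where data is missing.

```bash
# Logs from the last hour
kubectl logs <pod-name> -n newrelic --since=1h

# Logs starting at a specific RFC3339 timestamp
kubectl logs <pod-name> -n newrelic --since-time=2024-01-01T00:00:00Z
```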
Data reception issues
These issues occur when the gateway is running but not receiving data from telemetry producers.
Telemetry producer misconfiguration
Problem: The telemetry producer is misconfigured, resulting in no telemetry data being sent to the gateway.
Symptoms:
- Gateway is running and healthy but receives no data
- Gateway monitoring data is present but no application telemetry
Solution:
- Access the producer logs to identify configuration errors
- Verify the gateway endpoint URL is correctly configured in the agent or producer
- Ensure the gateway port is reachable from the producer
- Refer to the appropriate API, agent, or telemetry producer documentation for correct configuration steps
- See Modify agent configuration for guidance
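A quick reachability test from the host (or a pod on the same network) where the producer runs can rule out basic connectivity problems. The endpoint and port below are placeholders for your configured gateway endpoint.

```bash
# Verify the gateway endpoint is reachable from where the producer runs
# (replace host and port with your configured gateway endpoint)
curl -sv --connect-timeout 5 https://gateway.example.internal:443 -o /dev/null
```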
Rate limiting from New Relic
Problem: You're hitting rate limits on your telemetry data.
Symptoms:
- Gateway HTTP client receives 429 status codes from the New Relic API
- Events created in your account indicating rate limiting
- Data appears intermittently or with delays
Solution:
- Check gateway HTTP client response codes for 429 status codes
- Review events created in your account indicating rate limiting
- Refer to rate limiting documentation for guidance on managing and adjusting telemetry data rates
- Consider using sampling processors to reduce data volume
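A rough way to gauge how often the gateway is being rate limited is to count 429 responses in its logs. The exact log format depends on your gateway version, so treat this as a sketch rather than a precise measurement.

```bash
# Count log lines mentioning HTTP 429 responses from the New Relic API
kubectl logs <pod-name> -n newrelic | grep -c "429"
```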
Performance and health issues
These issues affect gateway performance, resource utilization, and data latency.
Resource exhaustion
Problem: The cluster has exhausted its CPU or memory resources.
Symptoms:
- Cluster shows as unhealthy on the gateway page
- Pods are pending or failing to start
- Pod crashes or restarts
- Out of memory (OOM) errors in logs
Solution:
- Use the Kubernetes UI to view pod events and pending jobs to identify resource constraints
- Increase node pool sizes or adjust the CPU and memory requests and limits for pods
- Verify cloud provider limits for the active number of nodes and adjust configurations as needed
- Review sizing and scaling guidance to right-size your deployment
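For example, to see where the cluster is short on resources, compare actual pod usage against what each node has allocated.

```bash
# Current CPU and memory usage per pod
kubectl top pods -n newrelic

# Allocatable capacity and scheduled requests/limits per node
kubectl describe nodes | grep -A 6 "Allocated resources"
```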
Data arrival delays (latency beyond SLA)
Problem: Data is not being received within the expected time frame.
Symptoms:
- Data arrives in New Relic but with significant delay
- Latency metrics show high values
- Processing queues are backing up
Solution:
- Check latency metrics to identify delays in data transmission
- Increase the minimum number of gateway pods to enhance processing capacity and reduce latency
- Review autoscaling configuration to ensure it responds to load appropriately
- Consider implementing sampling to reduce data volume during peak periods
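How you raise the minimum pod count depends on your deployment: if the chart manages a HorizontalPodAutoscaler, change its minimum replica count in your Helm values; otherwise you can scale the deployment directly. A minimal sketch, assuming the deployment name below is a placeholder for your gateway deployment:

```bash
# Check whether a HorizontalPodAutoscaler manages the gateway pods
kubectl get hpa -n newrelic

# If no autoscaler is in place, scale the deployment manually
kubectl scale deployment/<deployment-name> -n newrelic --replicas=4
```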
Monitoring and diagnostics issues
These issues affect the gateway's ability to send its own monitoring data to New Relic.
Invalid license key
Problem: The gateway is configured with an invalid license key or one that has been rotated out.
Symptoms:
- No gateway monitoring data appears in New Relic
- 403 errors from the internal monitoring pipeline and usage exporter in gateway logs
- Agent data reaches New Relic successfully, but gateway metrics do not
Solution:
- Access gateway logs directly to verify the issue: 403 errors appear for the internal monitoring pipeline and usage exporter, while requests carrying agent data still succeed (see the example after this list)
- Ensure the license key is valid and correctly configured
- Update the key in your gateway configuration if necessary
- Redeploy the gateway after updating the license key
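For example, you can confirm the 403 responses and, after correcting the key, restart the gateway pods so they pick it up. How you store the key (Helm values or a secret) depends on your installation method, so the update step itself is not shown here.

```bash
# Look for 403 responses from the gateway's own monitoring pipeline
kubectl logs <pod-name> -n newrelic | grep "403"

# After updating the license key in your configuration, restart the gateway pods
kubectl rollout restart deployment/<deployment-name> -n newrelic
```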
Cloud rule affecting gateway monitoring data
Problem: Metrics produced by the gateway are being unintentionally dropped by a cloud rule.
Symptoms:
- Gateway monitoring data appears initially but then stops
- Cloud rule usage data shows gateway metrics being dropped
Solution:
- Review usage data from cloud rules to identify any unintended drops
- Modify cloud rule configuration to exclude gateway metrics from being dropped
- Ensure cloud rules have appropriate conditions to avoid dropping infrastructure metrics
Rate limiting on metrics API
Problem: You may have exceeded the request limit against the metrics API, causing subsequent requests from the gateway to fail with 429 response codes.
Symptoms:
- 429 response codes in gateway logs
- Gateway monitoring data appears intermittently
- Rate limiting events in your account
Solution:
- Check for rate limiting events related to the OpenTelemetry metrics API in your account
- Review your account's metric cardinality and volume
- Refer to rate limiting documentation for guidance on managing and adjusting request rates
Configuration and deployment workflow issues
These issues affect the Pipeline Control UI workflow and deployment process for gateway configurations.
Pipeline Control UI shows no data
Problem: You cannot see any gateway data in the Pipeline Control UI.
Symptoms:
- Gateway is operational and sending monitoring data to New Relic
- Pipeline Control UI appears empty or shows no gateway information
- Unable to view or edit gateway configuration
Solution:
- Check the account dropdown in the Pipeline Control UI to ensure the correct account is selected
- Ensure you're viewing All accounts or the specific account associated with the gateway license key
- Verify the license key used by the gateway matches the account you're viewing in the UI
Configuration changes not taking effect
Problem: You've made changes to gateway configuration in the UI but they're not being applied.
Symptoms:
- Modified pipeline configuration or processor settings but data processing hasn't changed
- Expected rules aren't filtering or transforming data
- Changes appear in UI but not in gateway behavior
Solution:
- Check the updates page in Pipeline Control UI for pending deployments
- Remember that changes are staged until explicitly deployed via Fleet Control/Agent Control
- Click Deploy to push pending configuration changes to your gateway clusters
- Verify deployment completes successfully and pods restart with new configuration
- Check gateway pod logs for configuration validation errors during deployment
Configuration changes disappeared
Problem: Configuration changes vanished from the UI after saving.
Symptoms:
- Made changes to pipelines or processors but they don't appear in the UI
- Updates list doesn't show recent modifications
- Changes seem to have been lost
Solution:
- Check if multiple users are editing the gateway configuration simultaneously
- Be aware of the API race condition: when multiple users send configuration updates concurrently, changes can overwrite each other
- Review the updates page to see which changes were actually saved
- Coordinate with team members to avoid simultaneous edits to the same gateway configuration
- Redo any lost changes
- Contact New Relic support if race conditions occur frequently
Data schema mismatch
Problem: Your filter or transform processor isn't matching or modifying data as expected because the attribute doesn't exist at the gateway level.
Symptoms:
- Filter conditions don't match data you expect them to match
- Transform statements don't find attributes to modify
- Processor works in testing with NRDB data but not at the gateway
- Attributes like entity.guid, appName, or entityGuid aren't accessible
Solution:
- Understand that attributes available in NRDB may not exist at the gateway before enrichment
- Review gateway data schema differences to see which attributes are unavailable at the gateway
- Use attributes that exist in raw telemetry sent by your agents or collectors
- For filtering based on enriched attributes (like entity.guid or appName), consider using cloud rules instead, which process data after enrichment
- Verify your OTTL syntax is correct for accessing attributes (e.g., attributes["key"] vs. direct field access)
ConfigMap deployment errors
Problem: A Kubernetes ConfigMap was updated with an error, preventing gateway pods from starting.
Symptoms:
- Gateway pods fail to restart after configuration deployment
- Pods are in CrashLoopBackOff or Error state
- Gateway becomes unhealthy after pushing configuration changes
- Configuration validation errors in pod logs
Solution:
- Check pod status and logs for configuration errors:

```bash
kubectl get pods -n newrelic
kubectl logs <pod-name> -n newrelic
```
- Look for YAML syntax errors or invalid processor configurations
- Verify ConfigMap content matches the expected schema:

```bash
kubectl get configmap -n newrelic -o yaml
```
- Roll back to the previous working configuration:

```bash
kubectl rollout undo deployment/<deployment-name> -n newrelic
```
- Fix the configuration error in the Pipeline Control UI or directly in the ConfigMap
- Re-deploy the corrected configuration
- Verify pods restart successfully after applying the fix
Diagnostic commands
Use these commands to gather diagnostic information when troubleshooting gateway issues:
Check pod status

```bash
kubectl get pods -n newrelic
```

View pod logs

```bash
kubectl logs <pod-name> -n newrelic
```

Check pod resource usage

```bash
kubectl top pods -n newrelic
```

View pod events

```bash
kubectl describe pod <pod-name> -n newrelic
```

Check gateway configuration

```bash
kubectl get configmap -n newrelic -o yaml
```

Check deployment status

```bash
kubectl rollout status deployment/<deployment-name> -n newrelic
```

Next steps
If you continue to experience issues after following this troubleshooting guide:
- Review gateway setup documentation to verify your configuration
- Check sizing and scaling guidance to ensure appropriate resource allocation
- Verify your load balancer configuration if using one
- Contact New Relic Support with diagnostic information gathered from the commands above