This guide helps you troubleshoot common issues with the Pipeline Control gateway. Problems are organized by symptom so you can quickly identify and resolve them.
Installation issues
Insufficient user capabilities
Problem: You do not have the permissions associated with the Org Product Admin and Organization Manager roles.
Symptoms:
- Error message stating "You don't have the required organization-level capabilities to set up the agent authentication"
- Unable to complete gateway installation process
Solution:
- Ask your account administrators to grant you a role that includes the capabilities required for system identity creation
- Refer to user permissions documentation for guidance
Outdated Helm version
Problem: The command line script to install the Helm chart fails because your local version of Helm is outdated.
Symptoms:
- Helm installation script fails with version compatibility errors
- Error messages indicating the need for a Helm update
Solution:
- Update your local Helm installation to the latest version to ensure compatibility with the installation script
- Follow the instructions in the error message to upgrade Helm
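For example, you can confirm your local Helm client version before re-running the installation script. This is a minimal check; the minimum supported version depends on the chart release you're installing, and the upgrade command shown uses Homebrew as one example.

```bash
# Show the installed Helm client version
helm version --short

# If the version is older than the chart requires, upgrade Helm using
# your platform's package manager (Homebrew shown here as one example)
brew upgrade helm
```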
Data transmission issues
These issues occur when data cannot flow from your agents or telemetry producers to the gateway, or from the gateway to New Relic.
DNS resolution failures
Problem: Agents are unable to connect to the gateway due to DNS resolution errors.
Symptoms:
- Agents cannot reach the gateway endpoint
- Connection errors in agent logs
Solution:
- Access agent logs (via UI or directly) and search for connection errors
- Consult your network administrator to adjust DNS configurations based on your infrastructure and network topology
- Refer to agent documentation for more details
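As a quick check, you can test DNS resolution of the gateway hostname from inside the cluster. The hostname below is a placeholder for your actual gateway endpoint.

```bash
# Resolve the gateway hostname from a temporary pod inside the cluster
# (replace gateway.example.internal with your gateway endpoint)
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup gateway.example.internal
```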
SSL certificate issues
Problem: There is a problem with the SSL certificate configuration between the agent and gateway.
Symptoms:
- SSL connection errors in agent logs
- Certificate validation failures
- TLS handshake errors
Solution:
- Check agent logs for SSL connection errors
- Ensure SSL certificates are correctly configured and valid, considering your infrastructure and network topology
- Verify certificate expiration dates and certificate chain
- Refer to DNS and certificate configuration for more details
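One way to inspect the certificate the gateway presents is with openssl. The host and port below are placeholders for your gateway endpoint.

```bash
# Inspect the certificate subject, issuer, and expiration dates presented by the gateway
# (replace gateway.example.internal:443 with your gateway endpoint)
openssl s_client -connect gateway.example.internal:443 \
  -servername gateway.example.internal </dev/null 2>/dev/null \
  | openssl x509 -noout -dates -issuer -subject
```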
Gateway misconfiguration
Problem: Data reaches the gateway but fails to publish to New Relic.
Symptoms:
- Gateway receives data from agents but nothing appears in New Relic
- Pods failing to start or restarting repeatedly
Solution:
- Check outgoing request and error metrics from the gateway
- Review failed rule metrics to identify configuration issues
- Inspect logs of pods that are not starting
- Correct gateway configuration settings and ensure all pods are operational
- Verify New Relic license key is correctly configured
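For example, you can scan the gateway pod logs for publish errors and confirm that a license key secret exists. The secret name varies by installation, so treat this as a sketch.

```bash
# Scan gateway pod logs for publish errors (401/403 often indicate license key problems)
kubectl logs <pod-name> -n newrelic | grep -iE "error|401|403"

# List secrets in the namespace to confirm the license key secret exists
# (the exact secret name depends on how you installed the chart)
kubectl get secrets -n newrelic
```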
Unsupported telemetry producer or protocol
Problem: Data is sent from an unsupported API or protocol.
Symptoms:
- Gateway returns a 501 (Not Implemented) status code
- No data appears in New Relic despite successful agent connection
Solution:
- Verify compatibility with New Relic's supported protocols (OTLP, New Relic agent protocols)
- If using an unsupported protocol, file a feature request for support
- Configure the telemetry producer to send data directly to New Relic as a temporary workaround
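To confirm the 501 response, you can send a minimal request to the endpoint in question and print only the status code. The URL below is a placeholder; the exact path depends on the protocol you're testing.

```bash
# Print only the HTTP status code returned by the gateway for a test request
# (replace the URL with the endpoint your producer is configured to use)
curl -s -o /dev/null -w "%{http_code}\n" -X POST https://gateway.example.internal/v1/unsupported
```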
Destructive rule dropping all data
Problem: A rule is dropping all data, preventing it from reaching New Relic.
Symptoms:
- Data stops appearing in New Relic after rule deployment
- Drop data metrics show high volumes being filtered
Solution:
- Check drop data metrics in the gateway monitoring dashboard
- Review your filter and sampling processor configurations
- Modify or remove the destructive rule to allow data flow
- Test rules in a non-production environment before deploying
Data missing after ingestion
Problem: Data is missing in the New Relic backend after ingestion.
Symptoms:
- Gaps in telemetry data
- Incomplete traces or log records
Solution:
- Review error metrics and check for client-side timeouts
- Assess resource exhaustion signs (CPU, memory, network)
- Check New Relic status for platform issues
- Examine gateway logs during the affected time period
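When examining logs for the affected time period, kubectl's --since and --since-time flags narrow the output. The timestamp below is a placeholder; adjust it to the window where data is missing.

```bash
# Logs from the last hour
kubectl logs <pod-name> -n newrelic --since=1h

# Logs starting at a specific RFC3339 timestamp
kubectl logs <pod-name> -n newrelic --since-time=2024-01-01T00:00:00Z
```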
Data reception issues
These issues occur when the gateway is running but not receiving data from telemetry producers.
Telemetry producer misconfiguration
Problem: The telemetry producer is misconfigured, resulting in no telemetry data being sent to the gateway.
Symptoms:
- Gateway is running and healthy but receives no data
- Gateway monitoring data is present but no application telemetry
Solution:
- Access the producer logs to identify configuration errors
- Verify the gateway endpoint URL is correctly configured in the agent or producer
- Ensure the gateway port is reachable from the producer
- Refer to the appropriate API, agent, or telemetry producer documentation for correct configuration steps
- See Modify agent configuration for guidance
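A quick reachability test from the host (or a pod on the same network) where the producer runs can rule out basic connectivity problems. The endpoint and port below are placeholders for your configured gateway endpoint.

```bash
# Verify the gateway endpoint is reachable from where the producer runs
# (replace host and port with your configured gateway endpoint)
curl -sv --connect-timeout 5 https://gateway.example.internal:443 -o /dev/null
```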
Rate limiting from New Relic
Problem: You're hitting rate limits on your telemetry data.
Symptoms:
- Gateway HTTP client receives 429 status codes from the New Relic API
- Events created in your account indicating rate limiting
- Data appears intermittently or with delays
Solution:
- Check gateway HTTP client response codes for 429 status codes
- Review events created in your account indicating rate limiting
- Refer to rate limiting documentation for guidance on managing and adjusting telemetry data rates
- Consider using sampling processors to reduce data volume
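A rough way to gauge how often the gateway is being rate limited is to count 429 responses in its logs. The exact log format depends on your gateway version, so treat this as a sketch rather than a precise measurement.

```bash
# Count log lines mentioning HTTP 429 responses from the New Relic API
kubectl logs <pod-name> -n newrelic | grep -c "429"
```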
Performance and health issues
These issues affect gateway performance, resource utilization, and data latency.
Resource exhaustion
Problem: The cluster has exhausted its CPU or memory resources.
Symptoms:
- Cluster shows as unhealthy on the gateway page
- Pods are pending or failing to start
- Pod crashes or restarts
- Out of memory (OOM) errors in logs
Solution:
- Use the Kubernetes UI to view pod events and pending jobs to identify resource constraints
- Increase node pool sizes or adjust the CPU and memory requests and limits for pods
- Verify cloud provider limits for the active number of nodes and adjust configurations as needed
- Review sizing and scaling guidance to right-size your deployment
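For example, to see where the cluster is short on resources, compare actual pod usage against what each node has allocated.

```bash
# Current CPU and memory usage per pod
kubectl top pods -n newrelic

# Allocatable capacity and scheduled requests/limits per node
kubectl describe nodes | grep -A 6 "Allocated resources"
```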
Data arrival delays (latency beyond SLA)
Problem: Data is not being received within the expected time frame.
Symptoms:
- Data arrives in New Relic but with significant delay
- Latency metrics show high values
- Processing queues are backing up
Solution:
- Check latency metrics to identify delays in data transmission
- Increase the minimum number of gateway pods to enhance processing capacity and reduce latency
- Review autoscaling configuration to ensure it responds to load appropriately
- Consider implementing sampling to reduce data volume during peak periods
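How you raise the minimum pod count depends on your deployment: if the chart manages a HorizontalPodAutoscaler, change its minimum replica count in your Helm values; otherwise you can scale the deployment directly. A minimal sketch, assuming the deployment name below is a placeholder for your gateway deployment:

```bash
# Check whether a HorizontalPodAutoscaler manages the gateway pods
kubectl get hpa -n newrelic

# If no autoscaler is in place, scale the deployment manually
kubectl scale deployment/<deployment-name> -n newrelic --replicas=4
```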
Monitoring and diagnostics issues
These issues affect the gateway's ability to send its own monitoring data to New Relic.
Invalid license key
Problem: The gateway is configured with an invalid license key or one that has been rotated out.
Symptoms:
- No gateway monitoring data appears in New Relic
- 403 errors from the internal monitoring pipeline and usage exporter in gateway logs
- Agent data reaches New Relic successfully, but gateway metrics do not
Solution:
- Access gateway logs directly to verify the issue: 403 errors appear for the internal monitoring pipeline and usage exporter, while requests carrying agent data still succeed (see the example after this list)
- Ensure the license key is valid and correctly configured
- Update the key in your gateway configuration if necessary
- Redeploy the gateway after updating the license key
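For example, you can confirm the 403 responses and, after correcting the key, restart the gateway pods so they pick it up. How you store the key (Helm values or a secret) depends on your installation method, so the update step itself is not shown here.

```bash
# Look for 403 responses from the gateway's own monitoring pipeline
kubectl logs <pod-name> -n newrelic | grep "403"

# After updating the license key in your configuration, restart the gateway pods
kubectl rollout restart deployment/<deployment-name> -n newrelic
```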
Cloud rule affecting gateway monitoring data
Problem: Metrics produced by the gateway are being unintentionally dropped by a cloud rule.
Symptoms:
- Gateway monitoring data appears initially but then stops
- Cloud rule usage data shows gateway metrics being dropped
Solution:
- Review usage data from cloud rules to identify any unintended drops
- Modify cloud rule configuration to exclude gateway metrics from being dropped
- Ensure cloud rules have appropriate conditions to avoid dropping infrastructure metrics
Rate limiting on metrics API
Problem: You may have exceeded the request limit against the metrics API, causing subsequent requests from the gateway to fail with 429 response codes.
Symptoms:
- 429 response codes in gateway logs
- Gateway monitoring data appears intermittently
- Rate limiting events in your account
Solution:
- Check for rate limiting events related to the OpenTelemetry metrics API in your account
- Review your account's metric cardinality and volume
- Refer to rate limiting documentation for guidance on managing and adjusting request rates
Configuration and deployment workflow issues
These issues affect the Pipeline Control UI workflow and deployment process for gateway configurations.
Pipeline Control UI shows no data
Problem: You cannot see any gateway data in the Pipeline Control UI.
Symptoms:
- Gateway is operational and sending monitoring data to New Relic
- Pipeline Control UI appears empty or shows no gateway information
- Unable to view or edit gateway configuration
Solution:
- Check the account dropdown in the Pipeline Control UI to ensure the correct account is selected
- Ensure you're viewing All accounts or the specific account associated with the gateway license key
- Verify the license key used by the gateway matches the account you're viewing in the UI
Configuration changes not taking effect
Problem: You've made changes to gateway configuration in the UI but they're not being applied.
Symptoms:
- Modified pipeline configuration or processor settings but data processing hasn't changed
- Expected rules aren't filtering or transforming data
- Changes appear in UI but not in gateway behavior
Solution:
- Check the updates page in Pipeline Control UI for pending deployments
- Remember that changes are staged until explicitly deployed via Fleet Control/Agent Control
- Click Deploy to push pending configuration changes to your gateway clusters
- Verify deployment completes successfully and pods restart with new configuration
- Check gateway pod logs for configuration validation errors during deployment
Configuration changes disappeared
Problem: Configuration changes vanished from the UI after saving.
Symptoms:
- Made changes to pipelines or processors but they don't appear in the UI
- Updates list doesn't show recent modifications
- Changes seem to have been lost
Solution:
- Check if multiple users are editing the gateway configuration simultaneously
- Be aware of the API race condition: when multiple users send configuration updates concurrently, changes can overwrite each other
- Review the updates page to see which changes were actually saved
- Coordinate with team members to avoid simultaneous edits to the same gateway configuration
- Redo any lost changes
- Contact New Relic support if race conditions occur frequently
Data schema mismatch
Problem: Your filter or transform processor isn't matching or modifying data as expected because the attribute doesn't exist at the gateway level.
Symptoms:
- Filter conditions don't match data you expect them to match
- Transform statements don't find attributes to modify
- Processor works in testing with NRDB data but not at the gateway
- Attributes like entity.guid, appName, or entityGuid aren't accessible
Solution:
- Understand that attributes available in NRDB may not exist at the gateway before enrichment
- Review gateway data schema differences to see which attributes are unavailable at the gateway
- Use attributes that exist in raw telemetry sent by your agents or collectors
- For filtering based on enriched attributes (like entity.guid or appName), consider using cloud rules instead, which process data after enrichment
- Verify your OTTL syntax is correct for accessing attributes (e.g., attributes["key"] vs. direct field access)
ConfigMap deployment errors
Problem: A Kubernetes ConfigMap was updated with an error, preventing gateway pods from starting.
Symptoms:
- Gateway pods fail to restart after configuration deployment
- Pods are in CrashLoopBackOff or Error state
- Gateway becomes unhealthy after pushing configuration changes
- Configuration validation errors in pod logs
Solution:
- Check pod status and logs for configuration errors:

```bash
kubectl get pods -n newrelic
kubectl logs <pod-name> -n newrelic
```
- Look for YAML syntax errors or invalid processor configurations
- Verify ConfigMap content matches the expected schema:

```bash
kubectl get configmap -n newrelic -o yaml
```
- Roll back to the previous working configuration:

```bash
kubectl rollout undo deployment/<deployment-name> -n newrelic
```
- Fix the configuration error in the Pipeline Control UI or directly in the ConfigMap
- Re-deploy the corrected configuration
- Verify pods restart successfully after applying the fix
Diagnostic commands
Use these commands to gather diagnostic information when troubleshooting gateway issues:
Check pod status

```bash
kubectl get pods -n newrelic
```

View pod logs

```bash
kubectl logs <pod-name> -n newrelic
```

Check pod resource usage

```bash
kubectl top pods -n newrelic
```

View pod events

```bash
kubectl describe pod <pod-name> -n newrelic
```

Check gateway configuration

```bash
kubectl get configmap -n newrelic -o yaml
```

Check deployment status

```bash
kubectl rollout status deployment/<deployment-name> -n newrelic
```

Next steps
If you continue to experience issues after following this troubleshooting guide:
- Review gateway setup documentation to verify your configuration
- Check sizing and scaling guidance to ensure appropriate resource allocation
- Verify your load balancer configuration if using one
- Contact New Relic Support with diagnostic information gathered from the commands above