Problem
You installed our on-host ECS integration and waited a few minutes, but your cluster is not showing in the entity list.
Important
We have two ECS integrations: a cloud-based integration and an on-host integration. This document is about the on-host integration.
Solution
If you'd previously installed the infrastructure agent or an infrastructure on-host integration, your data should appear in the UI within a few minutes.
If you hadn't done either of those before installing the on-host ECS integration, it may take tens of minutes for data to appear in the UI. In that case, we recommend waiting up to an hour before trying the following troubleshooting steps or contacting support.
There are two options for troubleshooting when no data appears:
- Troubleshoot via the awscli tool (recommended when talking to New Relic technical support)
- Troubleshoot via the UI
For information about stopped tasks, see Reasons for stopped tasks.
Troubleshoot via awscli
When interacting with New Relic support, use this method and send the generated files with your support request:
Retrieve the information related to the newrelic-infra service or the Fargate service that contains a task with a newrelic-infra sidecar:

    $ aws ecs describe-services --cluster YOUR_CLUSTER_NAME --services newrelic-infra > newrelic-infra-service.json

    $ aws ecs describe-services --cluster YOUR_CLUSTER_NAME --services YOUR_FARGATE_SERVICE_WITH_NEW_RELIC_SIDECAR > newrelic-infra-sidecar-service.json

The failures attribute details any errors for the services.
Under services is the status attribute. It says ACTIVE if the service has no issues.
The desiredCount should match the runningCount. This is the number of tasks the service is handling. Because we use the daemon service type, there should be one task per container instance in your cluster. The pendingCount attribute should be zero, because all tasks should be running.
Inspect the events attribute of services to check for issues with scheduling or starting the tasks. For example, if the service is unable to start tasks successfully, it will display a message like:

    {
      "id": "5295a13c-34e6-41e1-96dd-8364c42cc7a9",
      "createdAt": "2020-04-06T15:28:18.298000+02:00",
      "message": "(service newrelic-ifnra) is unable to consistently start tasks successfully. For more information, see the Troubleshooting section of the Amazon ECS Developer Guide."
    }

In the same section, you can also see which tasks were started by the service from the events:

    {
      "id": "1c0a6ce2-de2e-49b2-b0ac-6458a804d0f0",
      "createdAt": "2020-04-06T15:27:49.614000+02:00",
      "message": "(service fargate-fail) has started 1 tasks: (task YOUR_TASK_ID)."
    }

Retrieve the information related to the task with this command:

    $ aws ecs describe-tasks --tasks YOUR_TASK_ID --cluster YOUR_CLUSTER_NAME > newrelic-infra-task.json

The desiredStatus and lastStatus should be RUNNING. If the task couldn't start normally, it will have a STOPPED status.
Inspect the stopCode and stoppedReason. One example: a task that couldn't be started because the task execution role doesn't have the appropriate permissions to download the license-key-containing secret would have the following output:

    "stopCode": "TaskFailedToStart",
    "stoppedAt": "2020-04-06T15:28:54.725000+02:00",
    "stoppedReason": "Fetching secret data from AWS Secrets Manager in region YOUR_AWS_REGION: secret arn:aws:secretsmanager:YOUR_AWS_REGION:YOUR_AWS_ACCOUNT:secret:NewRelicLicenseKeySecret-Dh2dLkgV8VyJ-80RAHS-fail: AccessDeniedException: User: arn:aws:sts::YOUR_AWS_ACCOUNT:assumed-role/NewRelicECSIntegration-Ne-NewRelicECSTaskExecution-1C0ODHVT4HDNT/8637b461f0f94d649e9247e2f14c3803 is not authorized to perform: secretsmanager:GetSecretValue on resource: arn:aws:secretsmanager:YOUR_AWS_REGION:YOUR_AWS_ACCOUNT:secret:NewRelicLicenseKeySecret-Dh2dLkgV8VyJ-80RAHS-fail-DmLHfs status code: 400, request id: 9cf1881e-14d7-4257-b4a8-be9b56e09e3c",
    "stoppingAt": "2020-04-06T15:28:10.953000+02:00",

If the task is running but you’re still not seeing data, generate verbose logs and examine them for errors.
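The service checks described above can also be run against the saved newrelic-infra-service.json file. The following is a minimal Python sketch, not part of the integration: the function names `service_problems` and `started_task_ids` are hypothetical, and the code assumes only the attributes named in this document (failures, services, status, desiredCount, runningCount, pendingCount, events) plus the event message format shown in the example.

```python
import re

# Matches each "(task <id>)" fragment in a service event message.
TASK_ID = re.compile(r"\(task ([^)]+)\)")


def service_problems(output):
    """List deviations from the healthy state described in this document.

    `output` is the parsed JSON written by `aws ecs describe-services`.
    An empty list means none of the documented checks failed.
    """
    problems = []
    # Top-level "failures" holds errors for services that could not be described.
    for failure in output.get("failures", []):
        problems.append(f"failure for {failure.get('arn')}: {failure.get('reason')}")
    for service in output.get("services", []):
        name = service.get("serviceName", "<unnamed>")
        status = service.get("status")
        desired = service.get("desiredCount", 0)
        running = service.get("runningCount", 0)
        pending = service.get("pendingCount", 0)
        if status != "ACTIVE":
            problems.append(f"{name}: status is {status}, expected ACTIVE")
        if desired != running:
            problems.append(f"{name}: desiredCount {desired} != runningCount {running}")
        if pending != 0:
            problems.append(f"{name}: pendingCount is {pending}, expected 0")
    return problems


def started_task_ids(service):
    """Collect task IDs from "has started N tasks" events of one service."""
    ids = []
    for event in service.get("events", []):
        message = event.get("message", "")
        if "has started" in message:
            ids.extend(TASK_ID.findall(message))
    return ids
```

To use it, load the file with `json.load(open("newrelic-infra-service.json"))` and pass the result to `service_problems`. The IDs returned by `started_task_ids` can be used as YOUR_TASK_ID in the describe-tasks command.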
For details about reasons for stopped tasks, see Stopped tasks.
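The task checks from the awscli steps (desiredStatus, lastStatus, stopCode, stoppedReason) can be applied to the saved newrelic-infra-task.json in the same way. This is a minimal Python sketch under the assumption that the file contains the usual top-level tasks list; `unhealthy_tasks` is a hypothetical helper name, not part of any New Relic or AWS tool.

```python
def unhealthy_tasks(output):
    """Return details for every task that is not RUNNING as desired.

    `output` is the parsed JSON written by `aws ecs describe-tasks`.
    """
    findings = []
    for task in output.get("tasks", []):
        desired = task.get("desiredStatus")
        last = task.get("lastStatus")
        if desired == "RUNNING" and last == "RUNNING":
            continue  # healthy according to the checks in this document
        findings.append({
            # Keep only the trailing ID segment of the task ARN for readability.
            "task": task.get("taskArn", "<unknown>").rsplit("/", 1)[-1],
            "desiredStatus": desired,
            "lastStatus": last,
            "stopCode": task.get("stopCode"),
            "stoppedReason": task.get("stoppedReason"),
        })
    return findings
```

Load the file with `json.load(open("newrelic-infra-task.json"))` and pass the result to `unhealthy_tasks`. An empty list means every described task is running; otherwise, compare each stoppedReason against the reasons for stopped tasks.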
Troubleshoot in the UI
To use the UI to troubleshoot:
- Log in to your AWS Console and navigate to the Elastic Container Service (ECS) section.
- Click on the cluster where you installed the New Relic ECS integration.
- On the Services tab, use the filter to search for the integration service. If you used the automatic install script, the name of the service will be newrelic-infra. If you are using Fargate, it will be the name of your monitored service. Once found, click on the name.
- The service page shows the Status of the service. It says ACTIVE if the service has no issues.
- On the same page, the Desired count should match the Running count. This is the number of tasks the service is handling. Because we use the daemon service type, there should be one task per container instance in your cluster. Pending count should be zero, because all tasks should be running.
- Inspect the Events tab to check for issues with scheduling or starting the tasks.
- In the Tasks tab of your service, you can inspect the running tasks and the stopped tasks by clicking on the Task status selector. Containers that failed to start are shown when you select the Stopped status.
- Click on a task to go to the task details page. Under Stopped reason, it displays a message explaining why the task was stopped.
- If the task is running but you’re still not seeing data, generate verbose logs and examine them for errors.
For details about reasons for stopped tasks, see Stopped tasks.
Reasons for stopped tasks
In the AWS ECS troubleshooting documentation you can find information on common causes of errors related to running tasks and services. See below for details about some reasons for stopped tasks.
Task stopped with reason:
Fetching secret data from AWS Secrets Manager in region YOUR_AWS_REGION: secret arn:aws:secretsmanager:YOUR_AWS_REGION:YOUR_AWS_ACCOUNT:secret:YOUR_SECRET_NAME: AccessDeniedException: User: arn:aws:sts::YOUR_AWS_ACCOUNT:assumed-role/YOUR_ROLE_NAME is not authorized to perform: secretsmanager:GetSecretValue on resource: arn:aws:secretsmanager:YOUR_AWS_REGION:YOUR_AWS_ACCOUNT:secret:YOUR_SECRET_NAME status code: 400, request id: 9cf1881e-14d7-4257-b4a8-be9b56e09e3c
This means that the IAM role specified using executionRoleArn in the task definition doesn't have access to the secret used for the NRIA_LICENSE_KEY. The execution role should have a policy attached that grants it access to read the secret.
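When the stopped reason follows the format shown above, the role, the denied action, and the secret can be pulled out of the message instead of being read by eye. The sketch below is illustrative only: `parse_access_denied` is a hypothetical helper, and the regular expression assumes the exact wording of the example message, which AWS may change.

```python
import re

# Assumes the wording of the AccessDeniedException message quoted above.
ACCESS_DENIED = re.compile(
    r"User: (?P<user>\S+) is not authorized to perform: "
    r"(?P<action>\S+) on resource: (?P<resource>\S+)"
)


def parse_access_denied(stopped_reason):
    """Extract role name, denied action, and resource from a stoppedReason.

    Returns None when the message does not match the expected format.
    """
    match = ACCESS_DENIED.search(stopped_reason)
    if match is None:
        return None
    user_arn = match.group("user")
    role_name = None
    # Assumed-role ARNs look like ...:assumed-role/ROLE_NAME[/SESSION_NAME].
    if ":assumed-role/" in user_arn:
        role_name = user_arn.split(":assumed-role/", 1)[1].split("/", 1)[0]
    return {
        "role_name": role_name,
        "action": match.group("action"),
        "resource": match.group("resource"),
    }
```

The returned role_name is the execution role whose attached policies need checking, and action tells you which permission (here secretsmanager:GetSecretValue) the policy is missing.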
Get the execution role of your task:

    $ aws ecs describe-task-definition --task-definition newrelic-infra --output text --query taskDefinition.executionRoleArn

You can replace the --task-definition newrelic-infra with the name of your Fargate task that includes the sidecar container.

    $ aws ecs describe-task-definition --task-definition YOUR_FARGATE_TASK_NAME --output text --query taskDefinition.executionRoleArn

List the policies attached to the role:

    $ aws iam list-attached-role-policies --role-name YOUR_EXECUTION_ROLE_NAME

This should return three policies: AmazonECSTaskExecutionRolePolicy, AmazonEC2ContainerServiceforEC2Role, and a third one that should grant read access to the license key. In the following example, the third policy is named NewRelicLicenseKeySecretReadAccess (shown as YOUR_POLICY_NAME in the output).

    {
      "AttachedPolicies": [
        {
          "PolicyName": "AmazonECSTaskExecutionRolePolicy",
          "PolicyArn": "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"
        },
        {
          "PolicyName": "AmazonEC2ContainerServiceforEC2Role",
          "PolicyArn": "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
        },
        {
          "PolicyName": "YOUR_POLICY_NAME",
          "PolicyArn": "arn:aws:iam::YOUR_AWS_ACCOUNT:policy/YOUR_POLICY_NAME"
        }
      ]
    }

Retrieve the default policy version:

    $ aws iam get-policy-version --policy-arn arn:aws:iam::YOUR_AWS_ACCOUNT:policy/YOUR_POLICY_NAME --version-id $(aws iam get-policy --policy-arn arn:aws:iam::YOUR_AWS_ACCOUNT:policy/YOUR_POLICY_NAME --output text --query Policy.DefaultVersionId)

This retrieves the policy permissions. There should be an entry for Action secretsmanager:GetSecretValue if you used AWS Secrets Manager to store your license key, or an entry for ssm:GetParameters if you used AWS Systems Manager Parameter Store: