
Synthetics job manager configuration

This doc guides you through configuring your synthetics job manager, covering environment variables, custom node modules, permanent data storage, and sizing considerations.

Configuration using environment variables

Environment variables allow you to fine-tune the synthetics job manager configuration to meet your specific environmental and functional needs.

User-defined variables for scripted monitors

Private synthetics job managers let you configure environment variables for scripted monitors. These variables are managed locally on the SJM and can be accessed via $env.USER_DEFINED_VARIABLES. You can set user-defined variables in two ways: by mounting a JSON file or by supplying an environment variable to the SJM at launch. If both are provided, the SJM uses only the values provided by the environment.
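
For example, a Docker launch might supply the variables in either way, as in this minimal sketch. The USER_DEFINED_VARIABLES variable name, the JSON contents, and the /var/lib/newrelic/synthetics/variables mount path are illustrative assumptions; confirm the exact names and paths for your deployment type.

$ # Option 1 (assumed variable name): pass the variables as a JSON string
$ docker run ... -e USER_DEFINED_VARIABLES='{"MY_VARIABLE":"my-value"}' ...

$ # Option 2 (assumed mount path): mount a JSON file into the container
$ docker run ... -v /path/on/host/user_defined_variables.json:/var/lib/newrelic/synthetics/variables/user_defined_variables.json:rw ...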

Accessing user-defined environment variables from scripts

To reference a configured user-defined environment variable, use the reserved $env.USER_DEFINED_VARIABLES followed by the name of a given variable with dot notation (for example, $env.USER_DEFINED_VARIABLES.MY_VARIABLE).

Note

User-defined environment variables are not sanitized from logs. Consider using the secure credentials feature for sensitive information.

Custom node modules

Custom node modules are supported in both the CPM and the SJM. They allow you to create a customized set of node modules and use them in scripted monitors (scripted API and scripted browser) for synthetic monitoring.

Set up your custom modules directory

Create a directory with a package.json file, following official npm guidelines, in its root folder. The SJM will install any dependencies listed in the package.json's dependencies field. These dependencies will be available when running monitors on the private synthetics job manager. See the example below.

Example

In this example, a custom module directory is used with the following structure:

/example-custom-modules-dir/
├── counter
│   ├── index.js
│   └── package.json
└── package.json ⇦ the only mandatory file

The package.json defines dependencies as both a local module (for example, counter) and any hosted modules (for example, smallest version 1.0.1):

{
  "name": "custom-modules",
  "version": "1.0.0", ⇦ optional
  "description": "example custom modules directory", ⇦ optional
  "dependencies": {
    "smallest": "1.0.1", ⇦ hosted module
    "counter": "file:./counter" ⇦ local module
  }
}

Add your custom modules directory to the SJM for Docker, Podman, or Kubernetes
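
For Docker, mounting the directory at launch might look like the sketch below. The /var/lib/newrelic/synthetics/modules target path is an assumption for illustration; check the mounting instructions for your deployment type (Docker, Podman, or Kubernetes).

$ # Assumed target path for custom modules; verify for your environment
$ docker run ... -v /example-custom-modules-dir:/var/lib/newrelic/synthetics/modules:rw ...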

To check if the modules were installed correctly or if any errors occurred, look for the following lines in the synthetics-job-manager container or pod logs:

2024-06-29 03:51:28,407{UTC} [main] INFO c.n.s.j.p.options.CustomModules - Detected mounted path for custom node modules
2024-06-29 03:51:28,408{UTC} [main] INFO c.n.s.j.p.options.CustomModules - Validating permission for custom node modules package.json file
2024-06-29 03:51:28,409{UTC} [main] INFO c.n.s.j.p.options.CustomModules - Installing custom node modules...
2024-06-29 03:51:44,670{UTC} [main] INFO c.n.s.j.p.options.CustomModules - Custom node modules installed successfully.

Now you can add "require('smallest');" into the script of monitors you send to this private location.

Change package.json for custom modules

In addition to local and hosted modules, you can also use Node.js modules. To update the custom modules used by your SJM, make changes to the package.json file and restart the SJM. During the restart, the SJM detects the configuration change and automatically cleans up and reinstalls the modules so the updates are applied.
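
For example, with a Docker-based SJM, the update-and-restart cycle might look like this sketch. The container name synthetics-job-manager is an assumption; use your actual container name.

$ # Edit the mounted package.json, then restart the SJM container
$ docker restart synthetics-job-manager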

Note

Local modules: While your package.json can include any local module, these modules must reside inside the tree under your custom module directory. If stored outside the tree, the initialization process will fail and you will see an error message in the Docker logs after launching the SJM.

Permanent data storage

Users may want to use permanent data storage to provide the user_defined_variables.json file or support custom node modules.

Docker

To set permanent data storage on Docker:

  1. Create a directory on the host where you are launching the Job Manager. This is your source directory.

  2. Launch the Job Manager, mounting the source directory to the target directory /var/lib/newrelic/synthetics.

    Example:

    $ docker run ... -v /sjm-volume:/var/lib/newrelic/synthetics:rw ...
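
As a sketch, you might seed the source directory before launching the Job Manager. The idea that the SJM picks up user_defined_variables.json and a custom modules directory from this volume follows from the sections above, but the exact subpaths it expects are assumptions to verify for your SJM version.

$ mkdir -p /sjm-volume
$ cp user_defined_variables.json /sjm-volume/    # assumed location within the volume
$ cp -r example-custom-modules-dir /sjm-volume/  # assumed location within the volume
$ docker run ... -v /sjm-volume:/var/lib/newrelic/synthetics:rw ...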

Podman

To set permanent data storage on Podman:

  1. Create a directory on the host where you are launching the Job Manager. This is your source directory.
  2. Launch the Job Manager, mounting the source directory to the target directory /var/lib/newrelic/synthetics.

Example:

$ podman run ... -v /sjm-volume:/var/lib/newrelic/synthetics:rw,z ...

Kubernetes

To set permanent data storage on Kubernetes, you have two options:

  1. Provide an existing PersistentVolumeClaim (PVC) for an existing PersistentVolume (PV), setting the synthetics.persistence.existingClaimName configuration value. Example:

    $ helm install ... --set synthetics.persistence.existingClaimName=sjm-claim ...
  2. Provide an existing PersistentVolume (PV) name, setting the synthetics.persistence.existingVolumeName configuration value. Helm will generate a PVC for you. You may optionally set the following values as well:

  • synthetics.persistence.storageClass: The storage class of the existing PV. If not provided, Kubernetes will use the default storage class.

  • synthetics.persistence.size: The size for the claim. If not set, the default is currently 2Gi.

    $ helm install ... --set synthetics.persistence.existingVolumeName=sjm-volume --set synthetics.persistence.storageClass=standard ...

Sizing considerations for Docker and Podman

To ensure your private location runs efficiently, you must provision enough CPU resources on your host to handle your monitoring workload. Many factors impact sizing, but you can quickly estimate your needs. You'll need 1 CPU core for each heavyweight monitor (i.e., simple browser, scripted browser, or scripted API monitor). Below are two formulas to help you calculate the number of cores you need, whether you're diagnosing a current setup or planning for a future one.

Formula 1: Diagnosing an Existing Location

If your current private location is struggling to keep up and you suspect jobs are queuing, use this formula to find out how many cores you actually need. It's based on the observable performance of your system.

C_{est} = (R_{proc} + R_{growth}) \cdot D_{avg,m}

  • C_{est} = Estimated CPU cores.
  • R_{proc} = The rate of heavyweight jobs being processed per minute.
  • R_{growth} = The rate your jobManagerHeavyweightJobs queue is growing per minute.
  • D_{avg,m} = The average duration of heavyweight jobs in minutes.

This formula calculates your true job arrival rate by adding the jobs your system is processing to the jobs that are piling up in the queue. Multiplying this total load by the average job duration tells you exactly how many cores you need to clear all the work without queuing.
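
For example (illustrative numbers only): if your location processes 10 heavyweight jobs per minute (R_{proc} = 10), the jobManagerHeavyweightJobs queue grows by 2 jobs per minute (R_{growth} = 2), and the average job takes 2 minutes (D_{avg,m} = 2), then C_{est} = (10 + 2) \cdot 2 = 24 cores.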

Formula 2: Forecasting a New or Future Location

If you're setting up a new private location or planning to add more monitors, use this formula to forecast your needs ahead of time.

C_{est} = N_{mon} \cdot D_{avg,m} \cdot \frac{1}{P_{avg,m}}

  • C_{est} = Estimated CPU cores.
  • N_{mon} = The total number of heavyweight monitors you plan to run.
  • D_{avg,m} = The average duration of a heavyweight job in minutes.
  • P_{avg,m} = The average period for heavyweight monitors in minutes (e.g., a monitor that runs every 5 minutes has P_{avg,m} = 5).

This calculates your expected workload from first principles: how many monitors you have, how often they run, and how long they take.
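
For example (illustrative numbers only): 200 heavyweight monitors (N_{mon} = 200) that each take an average of 3 minutes (D_{avg,m} = 3) and run every 10 minutes (P_{avg,m} = 10) require roughly C_{est} = 200 \cdot 3 \cdot \frac{1}{10} = 60 cores.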

Important sizing factors

When using these formulas, remember to account for these factors:

  • Job duration (D_{avg,m}): Your average should include jobs that time out (often ~3 minutes), as these hold a core for their entire duration.
  • Job failures and retries: When a monitor fails, it's automatically retried. These retries are additional jobs that add to the total load. A monitor that consistently fails and retries effectively multiplies its period, significantly impacting throughput.
  • Scaling out: In addition to adding more cores to a host (scaling up), you can deploy additional synthetics job managers with the same private location key to load balance jobs across multiple environments (scaling out).

It's important to note that a single Synthetics Job Manager (SJM) has a throughput limit of approximately 15 heavyweight jobs per minute. This is due to an internal threading strategy that favors efficient competition for jobs across multiple SJMs over the raw number of jobs processed per SJM. If your calculations indicate a need for higher throughput, you must scale out by deploying additional SJMs. You can check whether your job queue is growing to determine if more SJMs are needed.

Adding more SJMs with the same private location key provides several advantages:

  • Load balancing: Jobs for the private location are distributed across all available SJMs.
  • Failover protection: If one SJM instance goes down, others can continue processing jobs.
  • Higher total throughput: The total throughput for your private location becomes the sum of the throughput from each SJM (e.g., two SJMs provide up to ~30 jobs/minute).

NRQL queries for diagnosis

You can run these queries in the query builder to get the inputs for the diagnostic formula. Make sure to set the time range to a long enough period to get a stable average.

1. Find the rate of jobs processed per minute (R_{proc}): This query counts the number of non-ping (heavyweight) jobs completed over the last day and shows the average rate per minute.

FROM SyntheticCheck
SELECT rate(uniqueCount(id), 1 minute) AS 'job rate per minute'
WHERE location = 'YOUR_PRIVATE_LOCATION' AND type != 'SIMPLE'
SINCE 1 day ago

2. Find the rate of queue growth per minute (R_{growth}): This query calculates the average per-minute growth of the jobManagerHeavyweightJobs queue on a time series chart. A line above zero indicates the queue is growing, while a line below zero means it's shrinking.

FROM SyntheticsPrivateLocationStatus
SELECT derivative(jobManagerHeavyweightJobs, 1 minute) AS 'queue growth rate per minute'
WHERE name = 'YOUR_PRIVATE_LOCATION'
TIMESERIES SINCE 1 day ago

Tip

Make sure to select the account where the private location exists. It's best to view this query as a time series because the derivative function can vary wildly. The goal is to get an estimate of the rate of queue growth per minute. Play with different time ranges to see what works best.

3. Find total number of heavyweight monitors (N_{mon}): This query finds the unique count of heavyweight monitors.

FROM SyntheticCheck
SELECT uniqueCount(monitorId) AS 'monitor count'
WHERE location = 'YOUR_PRIVATE_LOCATION' AND type != 'SIMPLE'
SINCE 1 day ago

4. Find average job duration in minutes (D_{avg,m}): This query finds the average execution duration of completed non-ping jobs and converts the result from milliseconds to minutes. executionDuration represents the time the job took to execute on the host.

FROM SyntheticCheck
SELECT average(executionDuration)/60e3 AS 'avg job duration (m)'
WHERE location = 'YOUR_PRIVATE_LOCATION' AND type != 'SIMPLE'
SINCE 1 day ago

5. Find average heavyweight monitor period (P_{avg,m}): If the private location's jobManagerHeavyweightJobs queue is growing, it isn't accurate to calculate the average monitor period from existing results. This will need to be estimated from the list of monitors on the Synthetic Monitors page. Make sure to select the correct New Relic account, and you may need to filter by privateLocation.

Tip

Synthetic monitors may exist in multiple sub accounts. If you have more sub accounts than can be selected in the query builder, choose the accounts with the most monitors.

Note about ping monitors and the pingJobs queue

Ping monitors are different. They are lightweight jobs that do not consume a full CPU core each. Instead, they use a separate queue (pingJobs) and run on a pool of worker threads.

While they are less resource-intensive, a high volume of ping jobs, especially failing ones, can still cause performance issues. Keep these points in mind:

  • Resource model: Ping jobs utilize worker threads, not dedicated CPU cores. The core-per-job calculation does not apply to them.
  • Timeout and retry: A failing ping job can occupy a worker thread for up to 60 seconds. It first attempts an HTTP HEAD request (30-second timeout). If that fails, it immediately retries with an HTTP GET request (another 30-second timeout).
  • Scaling: Although the sizing formula is different, the same principles apply. To handle a large volume of ping jobs and keep the pingJobs queue from growing, you may need to scale up and/or scale out. Scaling up means increasing the CPU and memory resources per host or namespace. Scaling out means adding more instances of the ping runtime, either by deploying more job managers on more hosts, in more namespaces, or even within the same namespace, or, in Kubernetes, by setting a larger number of ping-runtime replicas per deployment (see the example below).
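
For example, on Kubernetes you could raise the ping worker pool at deploy time. This is a sketch assuming a Helm release named synthetics-job-manager in the newrelic namespace with a values.yaml file; the ping-runtime.replicaCount value is the one referenced elsewhere in this doc.

$ helm upgrade --install synthetics-job-manager newrelic/synthetics-job-manager \
    -n newrelic \
    -f values.yaml \
    --set ping-runtime.replicaCount=2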

Sizing considerations for Kubernetes and OpenShift

Each runtime used by the Kubernetes and OpenShift synthetic job manager can be sized independently by setting values in the helm chart. The node-api-runtime and node-browser-runtime are sized independently using a combination of the parallelism and completions settings.

  • The parallelism setting controls how many pods of a particular runtime run concurrently.
  • The completions setting controls how many pods must complete before the CronJob starts another Kubernetes Job for that runtime.

Best practices for sizing your deployment

It's often not possible to precisely calculate the needed parallelism and completions values because the average duration as seen in New Relic might not be accurate, especially if the existing private location is not working well. Follow this practical approach to dial in parallelism and completions. The equations below can be used to get ballpark values to start from.

1. Estimate completions and parallelism

Do your best to estimate the average execution duration and number of jobs per 5 minutes. This provides you with a ballpark starting point for the next step, which will involve trial and error to tune the parallelism and completions values in a working cluster. Make sure to scale them proportionally, for example, going from the defaults of 1 and 6 to 10 and 60.

Estimated Completions: This determines how long your 5-minute job load will take to complete.

-- Get average execution duration in minutes
FROM SyntheticCheck
SELECT average(executionDuration / 60e3) AS 'Avg Duration (min)'
WHERE type != 'SIMPLE' AND location = 'YOUR_PRIVATE_LOCATION'
SINCE 1 hour ago

Completions = \frac{5}{D_{avg,m}}

where D_{avg,m} is your average job execution duration in minutes.
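
For example (illustrative numbers only): if your average heavyweight job duration is 0.5 minutes, then Completions = 5 / 0.5 = 10.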

Estimated Parallelism: This determines how many workers (pods) you need running concurrently to handle your 5-minute job load.

-- Get jobs per 5 minutes
FROM SyntheticCheck
SELECT rate(uniqueCount(id), 5 minutes) AS 'Number of monitor jobs per 5 minutes'
WHERE type != 'SIMPLE' AND location = 'YOUR_PRIVATE_LOCATION'
SINCE 1 hour ago

P_{est} = \frac{N_m}{Completions}

where N_m is your number of jobs per 5 minutes. This P_{est} value is your estimated parallelism.
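
Continuing the illustrative example: if the location receives 300 heavyweight jobs per 5 minutes (N_m = 300) and Completions = 10, then P_{est} = 300 / 10 = 30 pods running concurrently.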

2. Perform a Helm deploy

Perform a Helm deploy with the estimated parallelism and completions values, and your best guess for ping-runtime.replicaCount given the number of CPU cores per node and the number of ping monitors that need to run per minute.
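
A sketch of such a deploy is below. The release name, namespace, and numeric values are placeholders, and the synthetics.privateLocationKey and per-runtime parallelism and completions keys are assumptions based on the per-runtime sizing described above; confirm the exact value names in the chart's values.yaml.

$ helm upgrade --install synthetics-job-manager newrelic/synthetics-job-manager \
    -n newrelic \
    --set synthetics.privateLocationKey=YOUR_PRIVATE_LOCATION_KEY \
    --set node-api-runtime.parallelism=10 \
    --set node-api-runtime.completions=60 \
    --set node-browser-runtime.parallelism=10 \
    --set node-browser-runtime.completions=60 \
    --set ping-runtime.replicaCount=2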

3. Monitor queue growth

With the synthetic monitors configured to send jobs to the private location, check for queue growth on a timeseries line chart for pingJobs and jobManagerHeavyweightJobs.

  • If the pingJobs queue has a positive slope, increase ping-runtime.replicaCount and redeploy.
  • If the jobManagerHeavyweightJobs queue has a positive slope, increase parallelism and completions proportionally until the queue is no longer growing (negative slope).

A negative slope indicates that the job manager has enough parallelism to handle the job demand; with a negative slope, the queue will eventually reach zero.

FROM SyntheticsPrivateLocationStatus
SELECT average(jobManagerHeavyweightJobs) AS 'Heavyweight Queue Growth', average(pingJobs) AS 'Ping Queue Growth'
WHERE name = 'YOUR_PRIVATE_LOCATION'
SINCE 1 day ago TIMESERIES

4. Tune based on pod running state

With the queue decreasing or at zero, check for node-api-runtime and node-browser-runtime pods that are in a "running" state for 10+ minutes. This indicates that parallelism is set too high and there are more pods than needed.

To avoid wasting resources unnecessarily, decrease parallelism and completions to reduce the age of each "running" runtime pod. If targeting a Kubernetes Job age of 5 minutes, runtime pods should be in a running state for less than 5 minutes, meaning each pod was created, quickly received a job to run, and completed.
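
To spot long-lived runtime pods, a quick check like the one below can help. The newrelic namespace is an assumption; the grep simply matches the runtime pod names used in this doc.

$ # Check the AGE column for node-api-runtime and node-browser-runtime pods
$ kubectl get pods -n newrelic | grep runtime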

5. Scale out if necessary

If the queue is not decreasing, yet there are many pods in a "running" state for 10+ minutes, it's likely that the job manager is hitting its performance bottleneck. The next thing to do is decrease parallelism and scale out with one or more additional deployments.

For example, if with parallelism: 100 and completions: 600 the queue is still growing, many pods stay in a "running" state for 10+ minutes, and the Kubernetes Job age is 20 minutes, set parallelism: 50 and completions: 200 and scale horizontally (out) by adding 2 additional deployments. This yields a total of 150 parallel pods and should reduce the K8s Job age to less than 20 minutes while also reducing the number of long-lived "running" pods. Aim for a K8s Job age of 5-10 minutes.

For more information on adding deployments, see Scaling out with multiple SJM deployments.

Tip

You can use the following query to help determine if you need to scale out.

Note: Monitors can exist in multiple sub-accounts.

-- monitors per minute per SJM
FROM SyntheticCheck SELECT
round(rate(uniqueCount(id), 1 minute)/uniqueCount(minionId),0.1) AS 'heavy jobs per minute per SJM',
uniqueCount(minionId) AS 'number of SJMs (namespaces)',
round(rate(uniqueCount(id), 1 minute),0.1) AS 'heavy jobs per minute total'
WHERE minionContainerSystem = 'KUBERNETES' AND minionDeploymentMode = 'private' AND location = 'YOUR_PRIVATE_LOCATION' AND type != 'SIMPLE' FACET location SINCE 1 hour ago TIMESERIES

Tip

Reducing the number of K8s job cycles can also improve performance. As each cycle reaches the set number of completions, there are fewer and fewer "running" pods to take on new Synthetics jobs. For example, with completions set to 200 and parallelism set to 50, we initially have 50 running pods, but this starts to decrease as we pass 150 completions. At 199 completions, only 1 running pod remains.

Setting a larger value for completions is not a bad idea, but it can lead to warning events in K8s about TooManyMissedTimes for the cronjob.

Scaling out with multiple SJM deployments

To scale beyond the ~15 jobs/minute throughput of a single SJM, you must install multiple, separate SJM Helm releases.

Important

Do not use replicas to scale the job manager pod. The SJM architecture requires a 1:1 relationship between a runtime pod and its parent SJM pod. If runtime pods send results back to the wrong SJM replica (e.g., through a Kubernetes service), those results will be lost. However, ping-runtime.replicaCount is okay to use.

The correct strategy is to deploy multiple SJM instances, each as its own Helm release. Each SJM will compete for jobs from the same private location, providing load balancing, failover protection, and an increased total job throughput.

Horizontal Scaling Strategy

If you need to scale out, you can simplify maintenance by treating each SJM deployment as a fixed-capacity unit.

  1. Set Parallelism: For each SJM, set parallelism to the same maximum that a single SJM can handle without creating too many long-lived "running" runtime pods. This maximizes the potential throughput of each SJM without wasting resources.
  2. Set Completions: For each SJM, set completions to the same fixed value as well. Adjust as needed to target a 5 minute Kubernetes job age per runtime, i.e., node-browser-runtime and node-api-runtime.
  3. Install Releases: Install as many separate Helm releases as you need to handle your total job demand, i.e., get the queue to zero or line chart to a negative slope.
  4. Monitor and Add: Monitor the private location job queue. If it starts to grow (positive slope), simply install another Helm release (e.g., sjm-delta) using the same fixed configuration.

By fixing parallelism and completions to static values, increasing or decreasing capacity becomes a simpler process of adding or removing Helm releases. This helps to avoid wasting cluster resources on a parallelism value that is higher than the SJM can effectively utilize.

Installation Example

When installing multiple SJM releases, you must provide a unique name for each release. All instances must be configured with the same private location key.

Setting the fullnameOverride is highly recommended to create shorter, more manageable resource names. For example, to install two SJMs named sjm-alpha and sjm-beta into the newrelic namespace (both using the same values.yaml with your fixed parallelism and completions):

# Install the first SJM deployment
$ helm upgrade --install sjm-alpha newrelic/synthetics-job-manager \
    -n newrelic \
    -f values.yaml \
    --set fullnameOverride=sjm-alpha \
    --set ping-runtime.fullnameOverride=sjm-alpha-ping \
    --set node-api-runtime.fullnameOverride=sjm-alpha-api \
    --set node-browser-runtime.fullnameOverride=sjm-alpha-browser

# Install the second SJM deployment to add capacity
$ helm upgrade --install sjm-beta newrelic/synthetics-job-manager \
    -n newrelic \
    -f values.yaml \
    --set fullnameOverride=sjm-beta \
    --set ping-runtime.fullnameOverride=sjm-beta-ping \
    --set node-api-runtime.fullnameOverride=sjm-beta-api \
    --set node-browser-runtime.fullnameOverride=sjm-beta-browser

You can continue this pattern (sjm-charlie, sjm-delta, etc.) for as many SJMs as needed to keep the job queue from growing.
