
Synthetics job manager configuration

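This doc guides you through configuring your synthetics job manager. It covers environment variables, user-defined variables for scripted monitors, custom node modules, permanent data storage, and sizing considerations for Docker, Podman, Kubernetes, and OpenShift.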

Configuration using environment variables

Environment variables let you fine-tune the synthetics job manager configuration to meet your specific environmental and functional needs.

User-defined variables for scripted monitors

Private synthetics job managers let you configure environment variables for scripted monitors. These variables are managed locally on the SJM and can be accessed via $env.USER_DEFINED_VARIABLES. You can set user-defined variables in two ways: mount a JSON file, or supply an environment variable to the SJM on launch. If both are provided, the SJM uses only the values provided by the environment variable.

Accessing user-defined environment variables from scripts

To reference a configured user-defined environment variable, use the reserved $env.USER_DEFINED_VARIABLES followed by the name of a given variable with dot notation (for example, $env.USER_DEFINED_VARIABLES.MY_VARIABLE).
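As a minimal sketch of the dot notation described above, the snippet below reads MY_VARIABLE inside a script. The helper name getUserVariable and the fallback stub are assumptions added for illustration: the stub only exists so the snippet also runs outside the SJM, where $env is not defined.

```javascript
// Inside a scripted monitor, $env is provided by the runtime.
// Outside the SJM it is undefined, so this sketch falls back to a stub (hypothetical value).
const env = typeof $env !== 'undefined'
  ? $env
  : { USER_DEFINED_VARIABLES: { MY_VARIABLE: 'example-value' } };

// Hypothetical helper: fail fast when a variable was not configured on the SJM.
function getUserVariable(name) {
  const vars = env.USER_DEFINED_VARIABLES || {};
  if (!Object.prototype.hasOwnProperty.call(vars, name)) {
    throw new Error(`User-defined variable ${name} is not set`);
  }
  return vars[name];
}

const myVariable = getUserVariable('MY_VARIABLE');
// Log only whether the value exists, since user-defined variables are not sanitized from logs.
console.log('MY_VARIABLE is set:', myVariable !== undefined);
```

Failing fast on a missing variable makes a misconfigured SJM show up as a clear script error rather than as an undefined value later in the monitor.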

Caution

User-defined environment variables are not sanitized from logs. Consider using the secure credentials feature for sensitive information.

Custom node modules

Custom node modules are available in both the containerized private minion (CPM) and the SJM. They allow you to create a customized set of node modules and use them in scripted monitors (scripted API and scripted browser) for synthetic monitoring.

Set up your custom modules directory

Create a directory with a package.json file in its root folder, following the official npm guidelines. The SJM will install any dependencies listed in the package.json's dependencies field. These dependencies will be available when running monitors on the private synthetics job manager. See an example of this below.

Example

In this example, a custom module directory is used with the following structure:

/example-custom-modules-dir/
├── counter
│ ├── index.js
│ └── package.json
└── package.json ⇦ the only mandatory file

The package.json defines dependencies as both a local module (for example, counter) and any hosted modules (for example, smallest version 1.0.1):

{
  "name": "custom-modules",
  "version": "1.0.0", ⇦ optional
  "description": "example custom modules directory", ⇦ optional
  "dependencies": {
    "smallest": "1.0.1", ⇦ hosted module
    "counter": "file:./counter" ⇦ local module
  }
}
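The source does not show what counter/index.js contains. As a purely hypothetical sketch, a local module like counter could export a small helper such as the one below. The function name createCounter and its behavior are assumptions for illustration only.

```javascript
// counter/index.js (hypothetical contents; the original example does not show this file)
function createCounter(start = 0) {
  let value = start;
  return {
    // Increase the counter by one and return the new value
    increment() {
      value += 1;
      return value;
    },
    // Read the current value without changing it
    current() {
      return value;
    },
  };
}

module.exports = { createCounter };
```

Under these assumptions, a scripted monitor could load the helper with require('counter'), using the dependency name declared in package.json.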

Add your custom modules directory to the SJM for Docker, Podman, or Kubernetes

To check if the modules were installed correctly or if any errors occurred, look for the following lines in the synthetics-job-manager container or pod logs:

2024-06-29 03:51:28,407{UTC} [main] INFO c.n.s.j.p.options.CustomModules - Detected mounted path for custom node modules
2024-06-29 03:51:28,408{UTC} [main] INFO c.n.s.j.p.options.CustomModules - Validating permission for custom node modules package.json file
2024-06-29 03:51:28,409{UTC} [main] INFO c.n.s.j.p.options.CustomModules - Installing custom node modules...
2024-06-29 03:51:44,670{UTC} [main] INFO c.n.s.j.p.options.CustomModules - Custom node modules installed successfully.

Now you can add "require('smallest');" into the script of monitors you send to this private location.

Change package.json for custom modules

In addition to local and hosted modules, you can utilize Node.js modules as well. To update the custom modules used by your SJM, make changes to the package.json file, and restart the SJM. During the restart, the SJM will recognize the configuration change and automatically perform cleanup and re-installation operations to ensure the updated modules are applied.

Caution

Local modules: While your package.json can include any local module, these modules must reside inside the tree under your custom module directory. If stored outside the tree, the initialization process will fail and you will see an error message in the docker logs after launching SJM.

Permanent data storage

Users may want to use permanent data storage to provide the user_defined_variables.json file or support custom node modules.

Docker

To set permanent data storage on Docker:

  1. Create a directory on the host where you are launching the Job Manager. This is your source directory.

  2. Launch the Job Manager, mounting the source directory to the target directory /var/lib/newrelic/synthetics.

    Example:

    $ docker run ... -v /sjm-volume:/var/lib/newrelic/synthetics:rw ...

Podman

To set permanent data storage on Podman:

  1. Create a directory on the host where you are launching the Job Manager. This is your source directory.
  2. Launch the Job Manager, mounting the source directory to the target directory /var/lib/newrelic/synthetics.

Example:

$ podman run ... -v /sjm-volume:/var/lib/newrelic/synthetics:rw,z ...

Kubernetes

To set permanent data storage on Kubernetes, the user has two options:

  1. Provide an existing PersistentVolumeClaim (PVC) for an existing PersistentVolume (PV), setting the synthetics.persistence.existingClaimName configuration value. Example:

    $ helm install ... --set synthetics.persistence.existingClaimName=sjm-claim ...
  2. Provide an existing PersistentVolume (PV) name, setting the synthetics.persistence.existingVolumeName configuration value. Helm will generate a PVC for the user. The user may optionally set the following values as well:

  • synthetics.persistence.storageClass: The storage class of the existing PV. If not provided, Kubernetes will use the default storage class.

  • synthetics.persistence.size: The size for the claim. If not set, the default is currently 2Gi.

    $ helm install ... --set synthetics.persistence.existingVolumeName=sjm-volume --set synthetics.persistence.storageClass=standard ...

Sizing considerations for Docker and Podman

To ensure your private location runs efficiently, you must provision enough CPU resources on your host to handle your monitoring workload. Many factors impact sizing, but you can quickly estimate your needs. You'll need 1 CPU core for each heavyweight job (i.e., a simple browser, scripted browser, or scripted API monitor job) that runs concurrently. Below are two formulas to help you calculate the number of cores you need, whether you're diagnosing a current setup or planning for a future one.

Formula 1: Diagnosing an Existing Location

If your current private location is struggling to keep up and you suspect jobs are queuing, use this formula to find out how many cores you actually need. It's based on the observable performance of your system.

$$C_{req} = (R_{proc} + R_{growth}) \cdot D_{avg,m}$$

  • $C_{req}$ = Required CPU Cores.
  • $R_{proc}$ = The rate of heavyweight jobs being processed per minute.
  • $R_{growth}$ = The rate your jobManagerHeavyweightJobs queue is growing per minute.
  • $D_{avg,m}$ = The average duration of heavyweight jobs in minutes.

This formula calculates your true job arrival rate by adding the jobs your system is processing to the jobs that are piling up in the queue. Multiplying this total load by the average job duration tells you exactly how many cores you need to clear all the work without queuing.
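As a rough illustration of Formula 1, the sketch below plugs in made-up inputs: 10 jobs processed per minute, a queue growing by 2 jobs per minute, and a 30-second (0.5-minute) average duration. The function name and the sample values are assumptions; substitute the results of your own queries.

```javascript
// Formula 1: C_req = (R_proc + R_growth) * D_avg,m
// rProc and rGrowth are jobs per minute; dAvgMinutes is the average job duration in minutes.
function requiredCoresDiagnostic(rProc, rGrowth, dAvgMinutes) {
  return (rProc + rGrowth) * dAvgMinutes;
}

// Hypothetical inputs, not measurements from a real private location
const diagnosticCores = requiredCoresDiagnostic(10, 2, 0.5);

// Round up, since cores are provisioned in whole units (rounding is an assumption of this sketch)
console.log(`Estimated cores needed: ${Math.ceil(diagnosticCores)}`);
```

With these inputs the arrival rate is 12 jobs per minute, and each job holds a core for half a minute, so the estimate comes to 6 cores.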

Formula 2: Forecasting a New or Future Location

If you're setting up a new private location or planning to add more monitors, use this formula to forecast your needs ahead of time.

$$C_{req} = N_{mon} \cdot D_{avg,m} \cdot \frac{1}{P_{avg,m}}$$

  • $C_{req}$ = Required CPU Cores.
  • $N_{mon}$ = The total number of heavyweight monitors you plan to run.
  • $D_{avg,m}$ = The average duration of a heavyweight job in minutes.
  • $P_{avg,m}$ = The average period for heavyweight monitors in minutes (e.g., a monitor that runs every 5 minutes has $P_{avg,m} = 5$).

This calculates your expected workload from first principles: how many monitors you have, how often they run, and how long they take.
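A short sketch of Formula 2 with made-up inputs may help: 100 heavyweight monitors, a 0.5-minute average duration, and a 5-minute average period. The function name and the numbers are assumptions for illustration.

```javascript
// Formula 2: C_req = N_mon * D_avg,m * (1 / P_avg,m)
// nMon is the monitor count; dAvgMinutes and pAvgMinutes are both in minutes.
function requiredCoresForecast(nMon, dAvgMinutes, pAvgMinutes) {
  if (pAvgMinutes <= 0) {
    throw new Error('Average period must be greater than zero');
  }
  return (nMon * dAvgMinutes) / pAvgMinutes;
}

// Hypothetical plan: 100 monitors, 30-second jobs, each running every 5 minutes
const forecastCores = requiredCoresForecast(100, 0.5, 5);
console.log(`Forecast cores needed: ${Math.ceil(forecastCores)}`);
```

Here 100 monitors on a 5-minute period produce 20 jobs per minute, and at half a minute each that works out to 10 cores.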

Important sizing factors

When using these formulas, remember to account for these factors:

  • Job duration ($D_{avg,m}$): Your average should include jobs that time out (often ~3 minutes), as these hold a core for their entire duration.
  • Job failures and retries: When a monitor fails, it's automatically retried. These retries are additional jobs that add to the total load. A monitor that consistently fails and retries effectively multiplies its job rate (as if it ran on a shorter period), significantly impacting throughput.
  • Scaling out: In addition to adding more cores to a host (scaling up), you can deploy additional synthetics job managers with the same private location key to load balance jobs across multiple environments (scaling out).

It's important to note that a single Synthetics Job Manager (SJM) has a throughput limit of approximately 15 heavyweight jobs per minute. This is due to an internal threading strategy that favors the efficient competition of jobs across multiple SJMs over the raw number of jobs processed per SJM. If your calculations indicate a need for higher throughput, you must scale out by deploying additional SJMs. You can check if your job queue is growing to determine if more SJMs are needed.

Adding more SJMs with the same private location key provides several advantages:

  • Load balancing: Jobs for the private location are distributed across all available SJMs.
  • Failover protection: If one SJM instance goes down, others can continue processing jobs.
  • Higher total throughput: The total throughput for your private location becomes the sum of the throughput from each SJM (e.g., two SJMs provide up to ~30 jobs/minute).
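To make the scale-out arithmetic concrete, here is a minimal sketch that estimates how many SJMs a given heavyweight job rate calls for, using the approximate limit of 15 jobs per minute per SJM stated above. The function name and the sample rate are assumptions, and the result is only a starting point to confirm by watching the queue.

```javascript
// Approximate per-SJM throughput limit stated in this doc (heavyweight jobs per minute)
const SJM_JOBS_PER_MINUTE = 15;

// Hypothetical helper: number of SJMs needed for a given job arrival rate
function sjmCountForRate(jobsPerMinute, perSjmLimit = SJM_JOBS_PER_MINUTE) {
  // Always run at least one SJM, even for a very low job rate
  return Math.max(1, Math.ceil(jobsPerMinute / perSjmLimit));
}

// Hypothetical arrival rate of 40 heavyweight jobs per minute
console.log(`SJMs needed: ${sjmCountForRate(40)}`);
```

A rate of 40 jobs per minute exceeds what two SJMs can handle (~30 jobs per minute), so the sketch returns 3.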

NRQL queries for diagnosis

You can run these queries in the query builder to get the inputs for the diagnostic formula. Make sure to set the time range to a long enough period to get a stable average.

1. Find the rate of jobs processed per minute ($R_{proc}$): This query counts the number of non-ping (heavyweight) jobs completed over the last day and shows the average rate per minute.

FROM SyntheticCheck
SELECT rate(uniqueCount(id), 1 minute) AS 'job rate per minute'
WHERE location = 'YOUR_PRIVATE_LOCATION' AND typeLabel != 'Ping'
SINCE 1 day ago

2. Find the rate of queue growth per minute ($R_{growth}$): This query calculates the average per-minute growth of the jobManagerHeavyweightJobs queue on a time series chart. A line above zero indicates the queue is growing, while a line below zero means it's shrinking.

FROM SyntheticsPrivateLocationStatus
SELECT derivative(jobManagerHeavyweightJobs, 1 minute) AS 'queue growth rate per minute'
WHERE name = 'YOUR_PRIVATE_LOCATION'
TIMESERIES SINCE 1 day ago

Tip

Make sure to select the account where the private location exists. It's best to view this query as a time series because the derivative function can vary wildly. The goal is to get an estimate of the rate of queue growth per minute. Play with different time ranges to see what works best.

3. Find total number of heavyweight monitors ($N_{mon}$): This query finds the unique count of heavyweight monitors.

FROM SyntheticCheck
SELECT uniqueCount(monitorId) AS 'monitor count'
WHERE location = 'YOUR_PRIVATE_LOCATION' AND typeLabel != 'Ping'
SINCE 1 day ago

4. Find average job duration in minutes ($D_{avg,m}$): This query finds the average execution duration of completed non-ping jobs and converts the result from milliseconds to minutes. executionDuration represents the time the job took to execute on the host.

FROM SyntheticCheck
SELECT average(executionDuration)/60e3 AS 'avg job duration (m)'
WHERE location = 'YOUR_PRIVATE_LOCATION' AND typeLabel != 'Ping'
SINCE 1 day ago

5. Find average heavyweight monitor period ($P_{avg,m}$): If the private location's jobManagerHeavyweightJobs queue is growing, it isn't accurate to calculate the average monitor period from existing results. Instead, estimate it from the list of monitors on the Synthetic Monitors page. Make sure to select the correct New Relic account; you may also need to filter by privateLocation.

Tip

Synthetic monitors may exist in multiple sub accounts. If you have more sub accounts than can be selected in the query builder, choose the accounts with the most monitors.

Note about ping monitors and the pingJobs queue

Ping monitors are different. They are lightweight jobs that do not consume a full CPU core each. Instead, they use a separate queue (pingJobs) and run on a pool of worker threads.

While they are less resource-intensive, a high volume of ping jobs, especially failing ones, can still cause performance issues. Keep these points in mind:

  • Resource model: Ping jobs utilize worker threads, not dedicated CPU cores. The core-per-job calculation does not apply to them.
  • Timeout and retry: A failing ping job can occupy a worker thread for up to 60 seconds. It first attempts an HTTP HEAD request (30-second timeout). If that fails, it immediately retries with an HTTP GET request (another 30-second timeout).
  • Scaling: Although the sizing formula is different, the same principles apply. To handle a large volume of ping jobs and keep the pingJobs queue from growing, you may need to scale up and/or scale out. Scaling up means increasing cpu and memory resources per host or namespace. Scaling out means adding more instances of the ping runtime. This can be done by deploying more job managers on more hosts, in more namespaces, or even within the same namespace. Alternatively, the ping-runtime in Kubernetes allows you to set a larger number of replicas per deployment.

Sizing considerations for Kubernetes and OpenShift

Each runtime used by the Kubernetes and OpenShift synthetics job manager can be sized independently by setting values in the helm chart. The node-api-runtime and node-browser-runtime are sized independently using a combination of the parallelism and completions settings.

  • The parallelism setting controls how many pods of a particular runtime run concurrently.
  • The completions setting controls how many pods must complete before the CronJob starts another Kubernetes Job for that runtime.

How to Size Your Deployment: A Step-by-Step Guide

Your goal is to configure enough parallelism to handle your job load without exceeding the throughput limit of your SJM instances.

Step 1: Estimate Your Required Workload

Completions: This determines how many runtime pods should complete before a new Kubernetes Job is started.

First, determine your private location's average job execution duration and job rate. Use executionDuration as it most accurately reflects the pod's active runtime.

-- Get average job execution duration (in minutes)
FROM SyntheticCheck
SELECT average(executionDuration / 60e3) AS 'D_avg_m'
WHERE typeLabel != 'Ping' AND location = 'YOUR_PRIVATE_LOCATION'
FACET typeLabel SINCE 1 hour ago

$$Completions = \frac{5}{D_{avg,m}}$$

Where $D_{avg,m}$ is your average job execution duration in minutes. The result is the number of jobs a single pod can complete in 5 minutes.

Required Parallelism: This determines how many workers (pods) you need running concurrently to handle your 5-minute job load.

-- Get jobs per 5 minutes
FROM SyntheticCheck
SELECT rate(uniqueCount(id), 5 minutes) AS 'N_m'
WHERE typeLabel != 'Ping' AND location = 'YOUR_PRIVATE_LOCATION'
FACET typeLabel SINCE 1 hour ago

$$P_{req} = \frac{N_m}{Completions}$$

Where $N_m$ is your number of jobs per 5 minutes. This $P_{req}$ value is your target total parallelism.
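The two Step 1 formulas can be sketched together as follows. The function name and the sample inputs (a 0.5-minute average duration and 50 jobs per 5 minutes) are assumptions for illustration; use the values returned by your own queries.

```javascript
// Step 1 formulas:
//   Completions = 5 / D_avg,m     (jobs one pod can finish in 5 minutes)
//   P_req       = N_m / Completions
// dAvgMinutes is the average job duration in minutes; jobsPer5Min is N_m.
function estimateWorkload(dAvgMinutes, jobsPer5Min) {
  if (dAvgMinutes <= 0) {
    throw new Error('Average duration must be greater than zero');
  }
  const completions = 5 / dAvgMinutes;
  const requiredParallelism = jobsPer5Min / completions;
  return { completions, requiredParallelism };
}

// Hypothetical inputs: 30-second jobs, 50 jobs arriving every 5 minutes
const workload = estimateWorkload(0.5, 50);
console.log(`Completions: ${workload.completions}`);
console.log(`Required parallelism: ${workload.requiredParallelism}`);
```

With these inputs, one pod can complete 10 jobs in 5 minutes, so 5 pods running concurrently cover 50 jobs. The sketch leaves the results unrounded; rounding to whole numbers before setting them in a Helm values file is left to you.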

Step 2: Check Against the Single-SJM Throughput Limit

Max Parallelism: This determines how many workers (pods) your SJM can effectively utilize.

$$P_{max} \approx 15 \cdot D_{avg,m}$$

This $P_{max}$ value is your system limit for one SJM Helm deployment.

Tip

The above queries are based on current results. If your private location does not have any results or the job manager is not performing at its best, query results may not be accurate. In that case, start with the examples in the table below and adjust until your queue is stable.

Tip

A key consideration is that a single SJM instance has a maximum throughput of approximately 15 heavyweight jobs per minute. You can calculate the maximum effective parallelism ($P_{max}$) a single SJM can support before hitting this ceiling.

Step 3: Compare, Configure, and Scale

Compare your required parallelism ($P_{req}$) from Step 1 to the maximum parallelism ($P_{max}$) from Step 2.

Scenario A: $P_{req} \le P_{max}$

  • Diagnosis: Your job load is within the limit of a single SJM instance.

  • Action:

    1. You will deploy one SJM Helm release.
    2. In your Helm chart values.yaml, set parallelism to your calculated $P_{req}$.
    3. Set completions to your calculated Completions. For improved efficiency, this value should typically be 6-10x your parallelism setting.

Scenario B: $P_{req} > P_{max}$

  • Diagnosis: Your job load exceeds the ~15 jobs/minute limit of a single SJM.

  • Action:

    1. You must scale out by deploying multiple, separate SJM Helm releases.
    2. See the Scaling Out with Multiple SJM Deployments section below for the correct procedure.
    3. Do not increase the replicaCount in your Helm chart.
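The comparison in Step 3 can be sketched as a small decision helper. The function name and sample inputs are assumptions; the limit of 15 jobs per minute is the approximate figure stated in this doc.

```javascript
// Approximate single-SJM throughput limit stated in this doc (heavyweight jobs per minute)
const SJM_LIMIT_PER_MINUTE = 15;

// Hypothetical helper: compute P_max and pick Scenario A or B.
// pReq is the required parallelism from Step 1; dAvgMinutes is the average job duration in minutes.
function chooseScenario(pReq, dAvgMinutes) {
  const pMax = SJM_LIMIT_PER_MINUTE * dAvgMinutes; // P_max ≈ 15 * D_avg,m
  return { pMax, scenario: pReq <= pMax ? 'A' : 'B' };
}

// Hypothetical inputs: required parallelism of 5 with 30-second jobs
const decision = chooseScenario(5, 0.5);
console.log(`P_max: ${decision.pMax}, scenario: ${decision.scenario}`);
```

In this example P_max is 7.5, so a required parallelism of 5 fits within a single SJM Helm release (Scenario A).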

Step 4: Monitor Your Queue

After applying your changes, you must verify that your job queue is stable and not growing. A consistently growing queue means your location is still under-provisioned.

Run this query to check the queue's growth rate:

-- Check for queue growth (a positive value means the queue is growing)
SELECT derivative(jobManagerHeavyweightJobs, 1 minute) AS 'Heavyweight Queue Growth Rate (per min)'
FROM SyntheticsPrivateLocationStatus
WHERE name = 'YOUR_PRIVATE_LOCATION'
SINCE 1 hour ago TIMESERIES

If the "Queue Growth Rate" is consistently positive, you need to install more SJM Helm deployments (Scenario B) or re-check your parallelism settings (Scenario A).
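If you export the time series values from the query above, a check like the following sketch can flag a consistently growing queue. Treating "consistently positive" as "every sample is above zero" is an assumption of this sketch, as are the function name and the sample data; a less strict rule may suit noisy data better.

```javascript
// Hypothetical helper: true when every growth-rate sample in the window is positive.
// growthSamples holds per-minute queue growth values, such as those charted by the derivative() query.
function isQueueConsistentlyGrowing(growthSamples) {
  return growthSamples.length > 0 && growthSamples.every((value) => value > 0);
}

// Made-up samples for illustration
const steadyGrowth = [0.4, 1.2, 0.8, 2.1];
const mixedGrowth = [0.4, -0.6, 0.8, -1.0];

console.log('Steady growth flagged:', isQueueConsistentlyGrowing(steadyGrowth));
console.log('Mixed growth flagged:', isQueueConsistentlyGrowing(mixedGrowth));
```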

Configuration Examples and Tuning

The parallelism setting directly affects how many synthetics jobs per minute can be run. Too small a value and the queue may grow. Too large a value and nodes may become resource constrained.

| Example | Description |
|---|---|
| parallelism=1 completions=1 | The runtime will execute 1 synthetics job per minute. After 1 job completes, the CronJob configuration will start a new job at the next minute. Throughput will be extremely limited with this configuration. |
| parallelism=1 completions=6 | The runtime will execute 1 synthetics job at a time. After the job completes, a new job will start immediately. After 6 jobs complete, the CronJob configuration will start a new Kubernetes Job. Throughput will be limited. A single long-running synthetics job will block the processing of any other synthetics jobs of this type. |
| parallelism=3 completions=24 | The runtime will execute 3 synthetics jobs at once. After any of these jobs complete, a new job will start immediately. After 24 jobs complete, the CronJob configuration will start a new Kubernetes Job. Throughput is much better with this or similar configurations. |

If your parallelism setting is working well (keeping the queue at zero), setting a higher completions value (e.g., 6-10x parallelism) can improve efficiency by:

  • Accommodating variability in job durations.
  • Reducing the number of completion cycles to minimize the "nearing the end of completions" inefficiency where the next batch can't start until the final job from the current batch completes.

It's important to note that the completions value should not be too large or the CronJob will experience warning events like the following:

8m40s Warning TooManyMissedTimes cronjob/synthetics-node-browser-runtime too many missed start times: 101. Set or decrease .spec.startingDeadlineSeconds or check clock skew

Tip

New Relic is not liable for any modifications you make to the synthetics job manager files.

Scaling out with multiple SJM deployments

To scale beyond the ~15 jobs/minute throughput of a single SJM, you must install multiple, separate SJM Helm releases.

Important

Do not use replicaCount to scale the job manager pod. You cannot scale by increasing the replicaCount for a single Helm release. The SJM architecture requires a 1:1 relationship between a runtime pod and its parent SJM pod. If runtime pods send results back to the wrong SJM replica (e.g., through a Kubernetes service), those results will be lost.

The correct strategy is to deploy multiple SJM instances, each as its own Helm release. Each SJM will compete for jobs from the same private location, providing load balancing, failover protection, and an increased total job throughput.

Simplified Scaling Strategy

Assuming $P_{req} > P_{max}$ and you need to scale out, you can simplify maintenance by treating each SJM deployment as a fixed-capacity unit.

  1. Set Max Parallelism: For each SJM, set parallelism to the same $P_{max}$ value. This maximizes the potential throughput of each SJM.

  2. Set Completions: For each SJM, set completions to a fixed value as well. The $P_{req}$ formula from Step 1 can be modified to estimate completions by substituting in the $P_{max}$ value:

    $$Completions = \frac{N_m}{P_{max}}$$

    Where $N_m$ is your number of jobs per 5 minutes. Adjust as needed after deploying to target a 5-minute Kubernetes job age per runtime, i.e., node-browser-runtime and node-api-runtime.

  3. Install Releases: Install as many separate Helm releases as you need to handle your total $P_{req}$. For example, if your total $P_{req}$ is 60 and you've fixed each SJM's parallelism at 20 ($P_{max}$ from Step 2), you would need three separate Helm deployments to meet the required job demand.

  4. Monitor and Add: Monitor your job queue (see Step 4). If it starts to grow, simply install another Helm release (e.g., sjm-delta) using the same fixed configuration.

By fixing parallelism and completions to static values based on $P_{max}$, increasing or decreasing capacity becomes a simpler process of adding or removing Helm releases. This helps to avoid wasting cluster resources on a parallelism value that is higher than the SJM can effectively utilize.
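The fixed-capacity approach above can be sketched as a small planning helper. It reuses the example of a required parallelism of 60 with each SJM fixed at a parallelism of 20; the job count of 225 per 5 minutes, the function name, and the rounding up of the release count are assumptions for illustration.

```javascript
// Hypothetical helper for the simplified scaling strategy:
//   releases    = ceil(P_req / P_max)
//   completions = N_m / P_max
function planReleases(pReq, pMax, jobsPer5Min) {
  if (pMax <= 0) {
    throw new Error('P_max must be greater than zero');
  }
  return {
    releases: Math.ceil(pReq / pMax), // number of separate Helm releases
    parallelism: pMax,                // same fixed value for every release
    completions: jobsPer5Min / pMax,  // starting estimate, adjust after deploying
  };
}

// P_req = 60 and P_max = 20 come from the example above; N_m = 225 is made up
const plan = planReleases(60, 20, 225);
console.log(`Helm releases: ${plan.releases}`);
console.log(`Per-release parallelism: ${plan.parallelism}, completions: ${plan.completions}`);
```

This yields three releases, matching the example in step 3. The completions value is only a starting estimate to tune toward a 5-minute Kubernetes job age.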

Installation Example

When installing multiple SJM releases, you must provide a unique name for each release. All instances must be configured with the same private location key.

Setting the fullnameOverride is highly recommended to create shorter, more manageable resource names. For example, to install two SJMs named sjm-alpha and sjm-beta into the newrelic namespace (both using the same values.yaml with your fixed parallelism and completions):

# Install the first SJM deployment
$ helm upgrade --install sjm-alpha newrelic/synthetics-job-manager \
    -n newrelic \
    -f values.yaml \
    --set fullnameOverride=sjm-alpha \
    --set ping-runtime.fullnameOverride=sjm-alpha-ping \
    --set node-api-runtime.fullnameOverride=sjm-alpha-api \
    --set node-browser-runtime.fullnameOverride=sjm-alpha-browser

# Install the second SJM deployment to add capacity
$ helm upgrade --install sjm-beta newrelic/synthetics-job-manager \
    -n newrelic \
    -f values.yaml \
    --set fullnameOverride=sjm-beta \
    --set ping-runtime.fullnameOverride=sjm-beta-ping \
    --set node-api-runtime.fullnameOverride=sjm-beta-api \
    --set node-browser-runtime.fullnameOverride=sjm-beta-browser

You can continue this pattern (sjm-charlie, sjm-delta, etc.) for as many SJMs as needed to keep the job queue from growing.
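Because every release follows the same naming pattern, the command text can be generated rather than typed by hand. The sketch below only builds the command strings; it does not run helm. The flags are the ones used in the example above, while the helper name and its default arguments are assumptions.

```javascript
// Hypothetical helper: build the helm command for one SJM release.
// It mirrors the flags from the installation example and derives each
// runtime's fullnameOverride from the release name.
function helmInstallCommand(releaseName, namespace = 'newrelic', valuesFile = 'values.yaml') {
  const parts = [
    `helm upgrade --install ${releaseName} newrelic/synthetics-job-manager`,
    `-n ${namespace}`,
    `-f ${valuesFile}`,
    `--set fullnameOverride=${releaseName}`,
    `--set ping-runtime.fullnameOverride=${releaseName}-ping`,
    `--set node-api-runtime.fullnameOverride=${releaseName}-api`,
    `--set node-browser-runtime.fullnameOverride=${releaseName}-browser`,
  ];
  // Join with a line-continuation backslash after every part except the last
  return parts.join(' \\\n    ');
}

// Print commands for three releases that share one values.yaml
for (const name of ['sjm-alpha', 'sjm-beta', 'sjm-charlie']) {
  console.log(`${helmInstallCommand(name)}\n`);
}
```

Generating the commands from one template keeps the continuation backslashes and the per-runtime names consistent across releases.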

Copyright © 2025 New Relic Inc.
