Problem
When the .NET agent is attached to high-throughput .NET Framework applications, a few customers have reported high thread contention. Process dumps and stack traces from these applications may show many threads blocked by calls to AppDomain.GetData().
Possible solution
In version 9.7 of the .NET agent, a new environment variable was introduced that disables the use of AppDomain storage by the .NET agent:
NEW_RELIC_DISABLE_APPDOMAIN_CACHING=true
Caution
While this environment variable does eliminate lock contention from agent calls to AppDomain.GetData(), the total overhead of the .NET agent increases when it is enabled. In our testing with the .NET agent attached to a test application, this resulted in less locking but lower maximum application throughput.
We are extremely interested in any feedback from customers experiencing this issue who try this new configuration option. If you try this configuration option out, please create an issue in our GitHub repository for .NET describing your experience.
Cause
The .NET agent needs access to method signature information to instrument methods. By default for .NET Framework applications, the agent loads this method information via reflection and caches it in the AppDomain for future use. This is intended to be a general optimization, but several customers have experienced high thread lock contention surrounding this behavior and believe it is the root cause of services becoming slow or non-responsive.
Our inspection of the Microsoft source code confirmed that calls to AppDomain.GetData() do in fact result in a global lock.
Since the .NET agent does not use a method information caching scheme at all for .NET Core applications, and no customers have reported similar thread lock contention issues with those applications, we decided to expose an environment variable to make the agent work the same way in .NET Framework applications.
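To make the two code paths described above easier to picture, here is a minimal sketch of the general pattern. It is not the agent's actual implementation. The class name MethodInfoLookup, the cache key prefix, and the way the variable's value is parsed are all assumptions made for illustration. Only AppDomain.GetData(), AppDomain.SetData(), and the environment variable name come from the platform and from this document.

```csharp
using System;
using System.Reflection;

// Hypothetical helper for illustration only; not part of the .NET agent.
public static class MethodInfoLookup
{
    // Assumed key prefix; the agent's real storage keys are not documented here.
    private const string KeyPrefix = "Example.MethodCache:";

    // Assumption: this sketch treats the value "true" (any casing) as enabled.
    public static bool CachingDisabled()
    {
        string value = Environment.GetEnvironmentVariable("NEW_RELIC_DISABLE_APPDOMAIN_CACHING");
        return string.Equals(value, "true", StringComparison.OrdinalIgnoreCase);
    }

    public static MethodInfo Resolve(Type type, string methodName)
    {
        if (CachingDisabled())
        {
            // No AppDomain storage: pay the reflection cost on every call,
            // but never call AppDomain.GetData().
            return type.GetMethod(methodName, Type.EmptyTypes);
        }

        string key = KeyPrefix + type.FullName + "." + methodName;

        // Every lookup goes through AppDomain.GetData(), which is the call
        // this document identifies as a source of contention under load.
        MethodInfo cached = AppDomain.CurrentDomain.GetData(key) as MethodInfo;
        if (cached == null)
        {
            cached = type.GetMethod(methodName, Type.EmptyTypes);
            AppDomain.CurrentDomain.SetData(key, cached);
        }
        return cached;
    }
}
```

Calling MethodInfoLookup.Resolve(typeof(object), "ToString") repeatedly from many threads exercises whichever path the variable selects. The trade-off matches the caution above: the uncached path avoids the lock but repeats the reflection work on each call.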