Debugging endless loop caused by missing SignalR performance counters

I recently experienced a problem with an IIS website:

After changing and building code, the site would hang indefinitely at startup.

It seemed like an endless loop, because CPU activity was high.

 

One workaround was to reset IIS with:

iisreset

Then the site worked again until the next change and build.

 

However, I decided to find the root cause and a proper solution.

 

I checked the process activity for w3wp.exe with Process Monitor:

These parts of the registry were continually accessed:

(Process Monitor screenshot: w3wp.exe continually reading the registry under the SignalR performance counter key)

HKLM\SYSTEM\CurrentControlSet\Services\SignalR\Performance

 

At this point I should have checked if the system actually had SignalR performance counters installed…

 

However, I decided to examine the situation further by attaching WinDbg (x64) to the w3wp.exe process.
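A note on the commands that follow: !runaway is built into WinDbg, while the managed-debugging commands (!clrstack, !bpmd, !DumpObj) come from the SOS extension, which has to be loaded first, for example with:

.loadby sos clr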

 

Because I suspected an endless loop, I checked per-thread CPU activity with:

!runaway

Part of the result was:

User Mode Time
Thread       Time
6:26cc      0 days 0:05:39.750
15:2e00      0 days 0:00:11.187
13:27b0      0 days 0:00:06.718

 

Thread 6 (OS thread id 26cc) had by far the most user-mode CPU time, so I switched to it with:

~6s

 

Then checked the .NET call stack with:

!clrstack

Part of the result was:

OS Thread Id: 0x26cc (6)
Child SP               IP Call Site
000000f1a0c6b158 00007ff98b60b1a6 [HelperMethodFrame: 000000f1a0c6b158] System.Security.SecurityRuntime.GetSecurityObjectForFrame(System.Threading.StackCrawlMark ByRef, Boolean)
000000f1a0c6b260 00007ff98a4b584c System.Security.SecurityRuntime.RevertAssert(System.Threading.StackCrawlMark ByRef)
000000f1a0c6b290 00007ff98a4b9e3d System.Security.PermissionSet.RevertAssert()
000000f1a0c6b2c0 00007ff989cd5e6d System.Diagnostics.PerformanceCounterLib.FindCustomCategory(System.String, System.Diagnostics.PerformanceCounterCategoryType ByRef)
000000f1a0c6b310 00007ff989cd5db8 System.Diagnostics.PerformanceCounterLib.FindCustomCategory(System.String, System.Diagnostics.PerformanceCounterCategoryType ByRef)
000000f1a0c6b380 00007ff989cd735c System.Diagnostics.PerformanceCounterLib.IsCustomCategory(System.String, System.String)
000000f1a0c6b3e0 00007ff989cdd1af System.Diagnostics.PerformanceCounter.InitializeImpl()
000000f1a0c6b450 00007ff989cdc35f System.Diagnostics.PerformanceCounter..ctor(System.String, System.String, System.String, Boolean)
000000f1a0c6b4a0 00007ff92f81b7b5 Microsoft.AspNet.SignalR.Infrastructure.PerformanceCounterManager.LoadCounter(System.String, System.String, System.String, Boolean)
000000f1a0c6b550 00007ff92f81b606 Microsoft.AspNet.SignalR.Infrastructure.PerformanceCounterManager.SetCounterProperties()
000000f1a0c6b5c0 00007ff92f81b2d7 Microsoft.AspNet.SignalR.Infrastructure.PerformanceCounterManager.Initialize(System.String, System.Threading.CancellationToken)
000000f1a0c6b640 00007ff92f81a23b Microsoft.AspNet.SignalR.Hosting.HostDependencyResolverExtensions.InitializeHost(Microsoft.AspNet.SignalR.IDependencyResolver, System.String, System.Threading.CancellationToken)
000000f1a0c6b680 00007ff92f81859b Owin.OwinExtensions.UseSignalRMiddleware[[System.__Canon, mscorlib]](Owin.IAppBuilder, System.Object[])
000000f1a0c6b770 00007ff92f818147 Owin.OwinExtensions+c__DisplayClass1.b__0(Owin.IAppBuilder)
000000f1a0c6b7b0 00007ff92f816c22 Owin.MapExtensions.Map(Owin.IAppBuilder, Microsoft.Owin.PathString, System.Action`1)
000000f1a0c6b800 00007ff92f816993 Owin.MapExtensions.Map(Owin.IAppBuilder, System.String, System.Action`1)
000000f1a0c6b850 00007ff92f816814 Owin.OwinExtensions.MapSignalR(Owin.IAppBuilder)

 

After continuing and breaking execution a few times, I kept seeing similar call stacks.

I needed to identify the endless loop.

 

Experimented by setting breakpoints at addresses from the call stack above.

 

Set a single breakpoint at:

00007ff989cd735c System.Diagnostics.PerformanceCounterLib.IsCustomCategory(System.String, System.String)

With:

bc *
bp 00007ff989cd735c

Result: Breakpoint was continually hit.

 

Set a single breakpoint at:

00007ff989cdd1af System.Diagnostics.PerformanceCounter.InitializeImpl()

With:

bc *
bp 00007ff989cdd1af

Result: Breakpoint was not hit within 1 minute.

 

This indicated that the endless loop was in:

System.Diagnostics.PerformanceCounterLib.IsCustomCategory(System.String, System.String)

 

I used ILSpy to decompile the method and noticed this loop:

for (CultureInfo cultureInfo = CultureInfo.CurrentCulture; cultureInfo != CultureInfo.InvariantCulture; cultureInfo = cultureInfo.Parent)
{
    performanceCounterLib = PerformanceCounterLib.GetPerformanceCounterLib(machine, cultureInfo);
    if (performanceCounterLib.IsCustomCategory(category))
    {
        return true;
    }
}

 

I noted that the loop can only exit in two ways: either performanceCounterLib.IsCustomCategory(category) returns true, or cultureInfo.Parent eventually reaches CultureInfo.InvariantCulture.

With the SignalR counters missing, IsCustomCategory(category) most likely kept returning false, so if the culture chain also never reached the invariant culture, this was the endless loop I was looking for.

 

I set an additional breakpoint with:

!bpmd   System.dll   System.Diagnostics.PerformanceCounterLib.IsCustomCategory

Messages from WinDbg:

Found 2 methods in module 00007ff989421000...
MethodDesc = 00007ff989524020
MethodDesc = 00007ff989524040
Setting breakpoint: bp 00007FF989CD73EB [System.Diagnostics.PerformanceCounterLib.IsCustomCategory(System.String)]
Setting breakpoint: bp 00007FF989CD727A [System.Diagnostics.PerformanceCounterLib.IsCustomCategory(System.String, System.String)]
Adding pending breakpoints...

 

However, when continuing, only breakpoint 0 was hit, further indicating that the endless loop occurred inside PerformanceCounterLib.IsCustomCategory(…).

 

Decided to look for the CultureInfo object in the loop.

 

Checked the call stack including locals and parameters with:

!clrstack -a

Top part of the call stack was:

OS Thread Id: 0x26cc (6)
Child SP               IP Call Site
000000f1a0c6b380 00007ff989cd735c System.Diagnostics.PerformanceCounterLib.IsCustomCategory(System.String, System.String)
PARAMETERS:
machine (<CLR reg>) = 0x000000f4a155df18
category (<CLR reg>) = 0x000000f4a1584228
LOCALS:
<CLR reg> = 0x000000f1a158f130

 

Examined the local reference with:

!DumpObj /d 000000f1a158f130

Part of the result was:

Name:        System.Globalization.CultureInfo
MethodTable: 00007ff98a701420
EEClass:     00007ff98a111e48
Size:        128(0x80) bytes
File:        C:\Windows\Microsoft.Net\assembly\GAC_64\mscorlib\v4.0_4.0.0.0__b77a5c561934e089\mscorlib.dll
Fields:
MT    Field   Offset                 Type VT     Attr            Value Name
00007ff98a6ed6f8  4001208       74       System.Boolean  1 instance                1 m_isReadOnly
00007ff98a6e0698  4001209        8 ...ation.CompareInfo  0 instance 000000f1a1593b40 compareInfo
...
...
00007ff98a6ed6f8  4001212       75       System.Boolean  1 instance                0 m_isInherited
00007ff98a6ed6f8  4001213       76       System.Boolean  1 instance                1 m_isSafeCrossDomain
00007ff98a7003d0  4001214       70         System.Int32  1 instance                2 m_createdDomainID
00007ff98a701420  4001215       40 ...ation.CultureInfo  0 instance 0000000000000000 m_consoleFallbackCulture
00007ff98a6fda88  4001216       48        System.String  0 instance 000000f1a1551420 m_name
00007ff98a6fda88  4001217       50        System.String  0 instance 000000f1a1551420 m_nonSortName
00007ff98a6fda88  4001218       58        System.String  0 instance 000000f1a1551420 m_sortName
00007ff98a701420  4001223       60 ...ation.CultureInfo  0 instance 000000f1a158f130 m_parent
...
...

 

Noticed that m_parent pointed back to the object itself (000000f1a158f130, the same address that was dumped), so walking cultureInfo.Parent could never reach CultureInfo.InvariantCulture and the loop could never exit that way.
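To show what that means in practice, here is a small standalone sketch (my own illustration, not part of the original investigation) that walks the current culture's parent chain the same way the decompiled loop does and reports whether a culture is its own parent:

using System;
using System.Globalization;

class CultureChainCheck
{
    static void Main()
    {
        // Walk CurrentCulture -> Parent -> ... just like the decompiled loop.
        // On a healthy system this reaches InvariantCulture after a few steps.
        CultureInfo culture = CultureInfo.CurrentCulture;
        for (int i = 0; i < 20; i++)
        {
            // The decompiled loop compares against InvariantCulture by reference (!=).
            if (ReferenceEquals(culture, CultureInfo.InvariantCulture))
            {
                Console.WriteLine("Reached InvariantCulture - the loop would terminate.");
                return;
            }

            // The broken case found above: a culture whose Parent is itself
            // (and which is not InvariantCulture) keeps the loop alive forever.
            if (ReferenceEquals(culture, culture.Parent))
            {
                Console.WriteLine($"'{culture.Name}' is its own parent - the loop could never exit.");
                return;
            }

            Console.WriteLine($"{culture.Name} -> {culture.Parent.Name}");
            culture = culture.Parent;
        }
    }
}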

 

Cleared all breakpoints with:

bc *

Set another breakpoint with:

!bpmd   System.dll   System.Diagnostics.PerformanceCounterLib.FindCustomCategory

 

This breakpoint was continually hit as expected.

 

Then checked the PerformanceCounterLib.FindCustomCategory(…) method with ILSpy.

Noticed that the method reads performance counter information from the registry (the keys seen earlier in Process Monitor) and that its return value depends on what it finds there.
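As a side note, whether the counters are registered at all can also be checked from code (my own check, not from the original investigation; it assumes the category name is "SignalR", which matches the registry key above):

using System;
using System.Diagnostics;

class SignalRCounterCheck
{
    static void Main()
    {
        // PerformanceCounterCategory.Exists consults the same performance counter
        // registration data; false means the "SignalR" category (and its counters)
        // is not installed on this machine.
        Console.WriteLine(PerformanceCounterCategory.Exists("SignalR"));
    }
}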

 

Searched online for: HKLM\System\CurrentControlSet\Services\SignalR\Performance

And found this article: http://www.asp.net/signalr/overview/performance/signalr-performance

 

Based on the article, I decided to install the SignalR performance counters by:

1. Installing the NuGet package Microsoft.AspNet.SignalR.Utils (signalr.utils)

2. Then installing the SignalR performance counters with:

SignalR.exe ipc
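(Installing performance counters writes under HKLM, so SignalR.exe ipc typically needs to be run from an elevated command prompt.)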

 

This was the solution!

After continuing execution, the loop ended and site initialization continued normally.

(The issue occurred with SignalR 2.2.1 on .NET Framework 4.6.1, with the project targeting .NET Framework 4.5.)

Conclusion

A site using SignalR can hang in an endless loop during initialization if the SignalR performance counters are not installed.

A combination of tools like Process Monitor, ILSpy and WinDbg can be used to track down the cause of an endless loop in third-party or framework code.