Debugging #Error in SSRS report caused by attempts to convert non-numeric strings

While testing another modified SSRS report I noticed that certain values were not shown, only the message:

#Error

 

Checked the ReportServerService__*.log and ExecutionLog* database views, but they contained no clues about the error.

 

Then I used procdump to log exception messages from ReportingServicesService.exe with:

procdump.exe -f "" -l -e 1 ReportingServicesService.exe

 

Part of the result was:

[13:15:59] Exception: E0434F4D.System.FormatException ("Input string was not in a correct format.")
[13:15:59] Exception: E0434F4D.System.FormatException ("Input string was not in a correct format.")
[13:15:59] Exception: E0434F4D.System.InvalidCastException ("Conversion from string " norm" to type 'Decimal' is not valid.")

 

I extracted and ran the dataset query in SQL Server Management Studio.

Then looked for “norm” strings and found them in a column normally used to store numbers.

Attempts to convert these caused the problem.

 

My solution was to exclude them from the dataset query by adding:

AND ISNUMERIC(ValueString) = 1

Debugging #Error in SSRS report caused by missing function overload

While testing a modified SSRS report I noticed that certain values were not shown, only the message:

#Error

 

Neither the ReportServerService__*.log or ExecutionLog* database views contained any clues about the cause.

 

I used procdump to log exception messages from ReportingServicesService.exe with:

procdump.exe -f "" -l -e 1 ReportingServicesService.exe

Part of the result was:

[13:46:39] Exception: E0434F4D.System.InvalidOperationException ("Nullable object must have a value.")
[13:46:39] Exception: E0434F4D.System.FormatException ("Input string was not in a correct format.")
[13:46:39] Exception: E0434F4D.System.FormatException ("Input string was not in a correct format.")
[13:46:39] Exception: E0434F4D.System.InvalidCastException ("Conversion from string "Nullable object must have a valu" to type 'Decimal' is not valid.")

 

These exceptions were unexpected and puzzling. Additional examination was required.

 

I attached WinDbg (x64) to ReportingServicesService.exe and configured event filters to break on CLR exceptions.

 

Refreshed the report again and the first exception was thrown.

Verified the exception with:

!PrintException

The result was:

Exception object: 000000030d4c2f90
Exception type: System.InvalidOperationException
Message: Nullable object must have a value.
InnerException: <none>
StackTrace (generated):
<none>
StackTraceString: <none>
HResult: 80131509

 

Checked the .NET call stack with:

!clrstack

Part of the result was:

Child-SP         RetAddr          Call Site
0000000016d1b150 00007ff98347a650 System.ThrowHelper.ThrowInvalidOperationException(System.ExceptionResource)
0000000016d1b190 00007ff9271033ef System.Nullable`1[[System.Double, mscorlib]].op_Explicit(System.Nullable`1<Double>)
0000000016d1b1c0 00007ff927103310 ReportExprHostImpl+CustomCodeProxy.UnitConvertSlim(Double, System.String, System.String, System.String, System.Nullable`1<Double>, System.Nullable`1<Double>)
0000000016d1b290 00007ff92622c452 ReportExprHostImpl+Textbox9_TextBoxExprHost+Paragraph00_ParagraphExprHost+TextRun00_TextRunExprHost.get_ValueExpr()
0000000016d1b360 00007ff92622bedf Microsoft.ReportingServices.RdlExpressions.ReportRuntime.EvaluateTextRunValueExpression(Microsoft.ReportingServices.ReportIntermediateFormat.TextRun)

 

Also checked the .NET call stack including parameters and locals with:

!clrstack -a

However the majority of these were optimized away, only showing: <no data>

 

It’s relevant to mention that the report code in question contains multiple UnitConvert(…) functions with different parameter overloads.

 

At this point I made a mistake by reading UnitConvertSlim as UnitConvert. I mistakenly thought that the right UnitConvert(…) function was called.

 

I wanted to set a breakpoint to inspect the status before the exception occured, but this was complicated by the fact that ReportExprHostImpl+CustomCodeProxy is generated code.

(Managed to set breakpoints, but they were not hit, most likely because they were on previous versions of the generated code)

 

Being challenged by this, I decided to do something else.

I compared the old and new version of the report and realised that UnitConvertSlim(…) was another function than expected.

 

Therefore I decided to run the VB code in a simple, separate VB application, to enable normal debugging.

This required simulating certain SSRS classes and properties, in this case Report.Parameters(…).

 

By debugging the separate VB application I realized that the expected UnitConvert(…) function did not exist!

Instead another UnitConvert(…) function was found and called. Some parameters were converted to other types and this was the root cause of the problem…

 

I implemented the missing UnitConvert(…) function and this solved the problem, both in the separate VB application and in SSRS.

Conclusion

SSRS VB code may fail in unexpected ways, if a required function overload does not exist.

If debugging VB code in SSRS is problematic, try debugging it outside of SSRS in a separate program.

Blank SSRS report caused by ScriptManager timeout

I was testing a fairly complex and long running report for SQL Server Reporting Services, but the results were blank when using the Report Viewer component.

 

Status in the ExecutionLog* views was rsSuccess in the ReportServer database.

 

The ReportServerService*.log showed no errors, only this:

library!ReportServer_0-1!a78!09/23/2016-11:19:31:: i INFO: RenderForNewSession('/Test report')
webserver!ReportServer_0-1!a78!09/23/2016-11:21:59:: i INFO: Processed report. Report='/Test report', Stream=''

 

I used procdump to log exception messages from the IIS worker process with:

procdump.exe -f "" -l -e 1 w3wp.exe

Part of the result was:

[11:21:01] Exception: E0434F4D.System.Net.WebException ("The request was aborted: The request was canceled.")
[11:21:01] Exception: E0434F4D.System.Net.WebException ("The request was aborted: The request was canceled.")
[11:21:01] Exception: E0434F4D.Microsoft.Reporting.WebForms.ReportServerException ("The operation was canceled.")
[11:21:01] Exception: E0434F4D.Microsoft.Reporting.WebForms.ReportServerException ("The operation was canceled.")

 

So the request was actively getting cancelled.

This definitely seemed like a timeout issue, but which timeout setting was causing the problem?

 

Checked the default value for ReportViewer.ServerReport.Timeout, which was 600 seconds.

 

Then I googled for: ReportServerException (“The operation was canceled.”)

And found a solution here: Report Viewer Control returns OperationCanceledException OnError event after 90 seconds

 

The solution was to modify the .aspx page with the ReportViewer control:

From:

<asp:ScriptManager ID="ScriptManager1" runat="server">

To:

<asp:ScriptManager ID="ScriptManager1" runat="server" AsyncPostBackTimeout="600">

Debugging blank SSRS report caused by invalid XML characters

I was testing a modified report for SQL Server Reporting Services, but the results were blank and no relevant error messages were logged by SSRS.

Only messages like these that didn’t reveal the root cause:

ERROR: Throwing Microsoft.ReportingServices.Diagnostics.Utilities.ReportServerHttpRuntimeInternalException: RsWorkerRequest::FlushResponse., Microsoft.ReportingServices.Diagnostics.Utilities.ReportServerHttpRuntimeInternalException: An internal or system error occurred in the HTTP Runtime object for application domain ReportServer_MSSQLSERVER_0-1-131184109058523378.  ---> System.Runtime.InteropServices.COMException (0x800703E3): The I/O operation has been aborted because of either a thread exit or an application request. (Exception from HRESULT: 0x800703E3)

 

Started troubleshooting by comparing the modified report with the original version, but I could see no obvious problems in either version.

 

When testing the report through the Report Viewer component resulted in this exception:

[XmlException: '♫', hexadecimal value 0x0E, is an invalid character. Line 1, position 6512.]
   System.Xml.XmlTextReaderImpl.Throw(Exception e) +88
   System.Xml.XmlTextReaderImpl.ParseNumericCharRefInline(Int32 startPos, Boolean expand, StringBuilder internalSubsetBuilder, Int32& charCount, EntityType& entityType) +817
   System.Xml.XmlTextReaderImpl.ParseCharRefInline(Int32 startPos, Int32& charCount, EntityType& entityType) +98
   System.Xml.XmlTextReaderImpl.ParseText(Int32& startPos, Int32& endPos, Int32& outOrChars) +816
   System.Xml.XmlTextReaderImpl.ParseText() +126
   System.Xml.XmlTextReaderImpl.ParseElementContent() +438
   System.Xml.XmlReader.ReadElementString() +84
   Microsoft.Xml.Serialization.GeneratedAssembly.XmlSerializationReaderServerReportSoapProxy.Read5_ValidValue(Boolean isNullable, Boolean checkType) +489

 

I wanted to verify that it was the same problem in pure SSRS, so I used ProcDump to log exception messages with:

procdump.exe -f "" -l -e 1 ReportingServicesService.exe

 

The result was:

[17:47:58] Exception: E0434F4D.System.Xml.XmlException ("'♫', hexadecimal value 0x0E, is an invalid character. Line 1, position 6508.")
[17:47:58] Exception: E0434F4D.System.InvalidOperationException ("There is an error in XML document (1, 6508).")
[17:47:58] Exception: E0434F4D.MissingEndpointException ("The attempt to connect to the report server failed. Check your connection information and that the report server is a compatible version.")
[17:47:58] Exception: E0434F4D.System.Runtime.InteropServices.COMException ("The I/O operation has been aborted because of either a thread exit or an application request. (Exception from HRESULT: 0x800703E3)")
[17:47:58] Exception: E0434F4D.System.Runtime.InteropServices.COMException ("The I/O operation has been aborted because of either a thread exit or an application request. (Exception from HRESULT: 0x800703E3)")
[17:47:58] Exception: E0434F4D.Microsoft.ReportingServices.Diagnostics.Utilities.ReportServerHttpRuntimeInternalException ("An internal or system error occurred in the HTTP Runtime object for application domain ReportServer_MSSQLSERVER_0-1-131184109058523378. ")

 

That confirmed that the problem was caused by invalid XML characters somewhere.

 

I wanted to see the XML that contained the invalid characters, so I attached WinDbg to ReportingServicesService.exe and enabled the CLR Exception event filter.

 

Refreshed the report to reproduce the problem.

Tried looking at the .NET call stack including parameters and locals with:

!clrstack -a

First part of the stack was:

OS Thread Id: 0x1d84 (26)
        Child SP               IP Call Site
00000011d267d1d8 00007ffef2638a5c [HelperMethodFrame: 00000011d267d1d8] 
00000011d267d2c0 00007ffedc73e488 System.Xml.XmlTextReaderImpl.Throw(System.Exception)
    PARAMETERS:
        this = <no data>
        e = <no data>
    LOCALS:
        <no data>

00000011d267d300 00007ffedc3cfd11 System.Xml.XmlTextReaderImpl.ParseNumericCharRefInline(Int32, Boolean, System.Text.StringBuilder, Int32 ByRef, EntityType ByRef)
    PARAMETERS:
        this (0x00000011d267d380) = 0x0000000fae5d0c00
        startPos = <no data>
        expand (0x00000011d267d390) = 0x0000000000000001
        internalSubsetBuilder (0x00000011d267d398) = 0x0000000000000000
        charCount (0x00000011d267d3a0) = 0x00000011d267d3f0
        entityType = <no data>
    LOCALS:
        <no data>
        0x00000011d267d340 = 0x0000000000000a6b
        0x00000011d267d330 = 0x0000000fae5d1328
        <no data>
        <no data>
        <no data>
        <no data>
        <no data>
        <CLR reg> = 0x000000000000000e
        <no data>
        <no data>

 

I examined the local object 0x0000000fae5d1328 with:

!DumpObj /d 0000000fae5d1328

Which seemed to be used as a buffer for XML data:

Name:        System.Char[]
MethodTable: 00007ffee08debd0
EEClass:     00007ffee02f10d0
Size:        8218(0x201a) bytes
Array:       Rank 1, Number of elements 4097, Type Char (Print Array)
Content:     MultiValue>true</MultiValue><QueryParameter>true</QueryParameter><Prompt>Device Type</Prompt><PromptUser>true</PromptUser><Depen
Fields:
None

 

I decided to find the adresses involved by dumping the array with:

!DumpArray /d 0000000fae5d1328

Part of the result was:

Name:        System.Char[]
MethodTable: 00007ffee08debd0
EEClass:     00007ffee02f10d0
Size:        8218(0x201a) bytes
Array:       Rank 1, Number of elements 4097, Type Char
Element Methodtable: 00007ffee08dec38
[0] 0000000fae5d1338
[1] 0000000fae5d133a
...
...
[4095] 0000000fae5d3336
[4096] 0000000fae5d3338

 

Now knowing the address range the unicode characters in the buffer could be read with:

du 0000000fae5d1338 0000000fae5d3338

Part of the result was:

0000000f`ae5d1338  "MultiValue>true</MultiValue><Que"
0000000f`ae5d1378  "ryParameter>true</QueryParameter"
...
...
0000000f`ae5d27b8  "/Value></ValidValue><ValidValue>"
0000000f`ae5d27f8  "<Label>♫♫113897</Label><"
0000000f`ae5d2838  "Value>52b3bc3a-32a8-49fc-a658-66"
0000000f`ae5d2878  "c67d3b29f6</Value></ValidValue><"
...
...

 

Checked the XML for invalid characters and eventually found some.

In a more readable form:

<ValidValue>
  <Label>♫♫113897</Label>
  <Value>52b3bc3a-32a8-49fc-a658-66c67d3b29f6</Value>
</ValidValue>

 

I thought it would be interesting to see the last query before the exception occured, so I started a trace with SQL Server Profiler and found:

SELECT cast(cast(0 as binary) as uniqueidentifier) AS Id, 'UNKNOWN' AS Username
UNION
SELECT Id, Username FROM Users order by 2

 

Tried running the query in SQL Server Management Studio and part of the result was:

query_result_usernames_with_invalid_xml_characters

 

Certain usernames actually contained characters that are invalid in XML.

This was the underlying cause behind the problem.

Conclusion

When SSRS reports fail without a clear indication of the cause, I recommend using a variety of tools and techniques to debug the problem similar to the described approach.

Using ProcDump itself to debug programs

ProcDump is an excellent tool for capturing memory dumps.

It actually attaches to processes as a debugger and it can also be used to debug certain problems without using other tools.

 

This can be useful when debugging problems on systems that are currently in use.

(Maybe policies prevent installing and attaching a traditional debugger or live debugging will simply cause too much disruption)

 

ProcDump can be used to output exceptions and debug messages from a process with:

procdump.exe -f "" -l -e 1 Process_name_or_id

The options used are:

-f “” Exception filter with empty filter makes ProcDump output exception messages without creating memory dumps.
-l Shows debug messages from process.
-e 1 Monitors both handled and unhandled exceptions.

 

Of course ProcDump can be used normally to generate memory dumps, which can be examined on another system with a debugger.

Be aware that processes are paused when ProcDump takes a memory dump and also be aware of the disk usage, especially when taking full memory dumps.

 

Recently using this technique helped me debug a problem on a fresh installation of SQL Server Reporting Services 2012.

All reports were returning blank results and no error messages were shown or logged.

 

I started monitoring the ReportingServicesService.exe process with:

procdump.exe -f "" -l -e 1 ReportingServicesService.exe

 

After refreshing the report the output from ProcDump was:

ProcDump v8.0 - Writes process dump files
Copyright (C) 2009-2016 Mark Russinovich
Sysinternals - www.sysinternals.com
With contributions from Andrew Richards

Process:               ReportingServicesService.exe (11180)
CPU threshold:         n/a
Performance counter:   n/a
Commit threshold:      n/a
Threshold seconds:     n/a
Hung window check:     Disabled
Log debug strings:     Enabled
Exception monitor:     First Chance+Unhandled
Exception filter:      Display Only
Terminate monitor:     Disabled
Cloning type:          Disabled
Concurrent limit:      n/a
Avoid outage:          n/a
Number of dumps:       1
Dump folder:           C:\bin\Procdump\
Dump filename/mask:    PROCESSNAME_YYMMDD_HHMMSS


Press Ctrl-C to end monitoring without terminating the process.

CLR Version: v2.0.50727

[18:22:24] Exception: E0434F4D.System.IndexOutOfRangeException ("QC")
[18:22:26] Exception: E0434F4D.System.Net.Sockets.SocketException ("No such host is known")
[18:22:26] Exception: E0434F4D.System.Net.WebException ("The remote name could not be resolved: 'reportingutility'")
[18:22:26] Exception: E0434F4D.System.Net.WebException ("The remote name could not be resolved: 'reportingutility'")
[18:22:26] Exception: E0434F4D.System.Net.WebException ("The remote name could not be resolved: 'reportingutility'")
[18:22:26] Exception: E0434F4D.System.Net.WebException ("The remote name could not be resolved: 'reportingutility'")

 

The report was trying to call a web service from VB code, however the host name was unknown.

The problem was resolved by specifying the host name in the system hosts file.

Conclusion

ProcDump can also be used on its own to debug programs.

WinDbg: Problems attaching to process

Sometimes various problems prevent WinDbg from attaching to a process.

I experienced some recently, that I have described here.

 

Trying to attach 32-bit WinDbg (x86) to a 64-bit process

WinDbg_Could_not_attach_to_process_NTSTATUS_0xC00000BB

Could not attach to process 11576, NTSTATUS 0xC00000BB

The request is not supported.

 

If you get this error message, please check if the process is 32-bit or 64-bit.

Then try attaching again with the right version of WinDbg.

 

Attaching 64-bit WinDbg (x64) to a 32-bit process

Attaching 64-bit WinDbg (x64) to a 32-bit process is possible, but it’s usually not very useful. That’s because it shows a WOW64 view of the process, which limits the debugging options.

 

Trying to attach to process with insufficient user permissions

WinDbg_Could_not_attach_to_process_error_0n5

Could not attach to process 4160, Win32 error 0n5

Access is denied.

 

This error may occur if WinDbg has insufficient permissions.

Try running WinDbg “As Administrator” and try attaching to the process again.

 

Trying to attach to process when it is already being debugged

WinDbg_Could_not_attach_to_process_NTSTATUS_0xC0000048

Could not attach to process 6128, NTSTATUS 0xC0000048

An attempt to set a process's DebugPort or ExceptionPort was made, but a port already exists in the process or an attempt to set a file's CompletionPort made, but a port was already set in the file or an attempt to set an ALPC port's associated completion port was made, but it is already set.

 

This problem usually occurs if the process is already being debugged. The other debugger could be Visual Studio, ProcDump or another tool.

Check if other debuggers are attached, detach them and then try attaching to the process again with WinDbg.

Debugging Reporting Services with WinDbg

I recently needed to use classes and methods from MSXML6.dll in Reporting Services reports (to make web service calls).

Tried to instantiate XMLHTTP and DOMDocument from MSXML6.dll with late binding like this:

Request = CreateObject("MSXML2.XMLHTTP.6.0")
Doc = CreateObject("MSXML2.DOMDocument.6.0")

However it failed with a SecurityException, but the exact cause was unknown.

 

I decided to identify the cause by attaching WinDbg (x86) to RsReportHost.exe

 

On each report refresh I noticed a number of exceptions like:

(2a54.34d4): CLR exception - code e0434352 (first chance)
(2a54.34d4): CLR exception - code e0434352 (first chance)
(2a54.34d4): C++ EH exception - code e06d7363 (first chance)
(2a54.34d4): C++ EH exception - code e06d7363 (first chance)

 

I configured WinDbg to break on first chance exceptions under: Debug -> Event Filters…

Debug_Event_Filters

 

And loaded the SOS extension with:

.loadby sos clr

 

Typical (top part) of the call stack was:

0:007> !clrstack
OS Thread Id: 0x55f4 (7)
Child SP       IP Call Site
00bbb28c 7652dae8 [HelperMethodFrame: 00bbb28c]
00bbb33c 72f25d5a System.Security.CodeAccessSecurityEngine.ThrowSecurityException(...)
00bbb36c 72f25cad System.Security.CodeAccessSecurityEngine.ThrowSecurityException(...)
00bbb394 72f2136c System.Security.CodeAccessSecurityEngine.CheckSetHelper(...)
00bbb3e4 73581acf System.Security.CodeAccessSecurityEngine.CheckSetHelper(...)
00bbb50c 73e41376 [GCFrame: 00bbb50c]
00bbbd08 73e41376 [GCFrame: 00bbbd08]
00bbbdc0 73e41376 [GCFrame: 00bbbdc0]
00bbbf58 73e41376 [DebuggerSecurityCodeMarkFrame: 00bbbf58]
00bbbee4 73e41376 [HelperMethodFrame: 00bbbee4]
00bbbf7c 09a16859 Microsoft.VisualBasic.Interaction.CreateObject(System.String, System.String)
00bbbfa4 09a180ba ReportExprHostImpl+CustomCodeProxy.CallServiceTest2()
00bbbfcc 09a1807b ReportExprHostImpl+Description_TextBoxExprHost+Paragraph00_ParagraphExprHost+TextRun00_TextRunExprHost.get_ValueExpr()

(parameters trimmed for readability)

 

I checked the mscorlib.dll code for System.Security.CodeAccessSecurityEngine.CheckSetHelper(…) in ILSpy.

Noticed something interesting: The local variable permThatFailed could show which permission was missing.

This was also used when calling System.Security.SecurityException.MakeSecurityException, so that was an ideal place to set a breakpoint.

 

Set a breakpoint with !bpmd:

0:015> !bpmd mscorlib.dll System.Security.SecurityException.MakeSecurityException
Found 1 methods in module 72bf1000...
MethodDesc = 72c88e84
Setting breakpoint: bp 72F33813 [System.Security.SecurityException.MakeSecurityException(System.Reflection.AssemblyName, System.Security.Policy.Evidence, System.Security.PermissionSet, System.Security.PermissionSet, System.RuntimeMethodHandleInternal, System.Security.Permissions.SecurityAction, System.Object, System.Security.IPermission)]
Adding pending breakpoints...

 

Continued execution, refreshed the report and then checked the call stack with parameters:

0:014> !clrstack -p
OS Thread Id: 0x54c0 (14)
Child SP       IP Call Site
0103bd88 72f33813 System.Security.SecurityException.MakeSecurityException(System.Reflection.AssemblyName, System.Security.Policy.Evidence, System.Security.PermissionSet, System.Security.PermissionSet, System.RuntimeMethodHandleInternal, System.Security.Permissions.SecurityAction, System.Object, System.Security.IPermission)
PARAMETERS:
asmName (<CLR reg>) = 0x036c4bd0
asmEvidence (<CLR reg>) = 0x036c5b64
granted (0x0103bda0) = 0x00000000
refused (0x0103bd9c) = 0x036c4bd0
rmh (0x0103bd98) = 0x036c5b64
action (0x0103bd94) = 0x00000000
demand (0x0103bd90) = 0x00000000
permThatFailed (0x0103bd8c) = 0x00000002

 

The permThatFailed parameter consistently had the value 2.

I looked up the values for the SecurityPermissionFlag enumeration in ILSpy:

And found this:

/// <summary>Ability to call unmanaged code.</summary>
UnmanagedCode = 2,

 

Conclusion

The SecurityException occured because permission to execute unmanaged code was missing.

Examine WHEA_UNCORRECTABLE_ERROR (124) BSOD with WinDbg

One of my computers recently crashed with a BSOD.

This occurs very rarely so I decided to identify the cause.

Troubleshooting

I checked the system event log for a bugcheck and found this:

Log Name:      System
Source:        Microsoft-Windows-WER-SystemErrorReporting
...
Description:
The computer has rebooted from a bugcheck.  The bugcheck was: 0x00000124 (0x0000000000000000, 0xfffffa800d91b038, 0x00000000b2004000, 0x0000000029000175). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 040116-21216-01.

 

I decided to examine the C:\Windows\MEMORY.DMP crash dump with WinDbg. (In this case the x64 version of WinDbg)

WinDbg’s !analyze command usually reveals relevant information about a BSOD, so that’s what I checked first:

Run: !analyze -v

0: kd> !analyze -v

...
WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
Arg1: 0000000000000000, Machine Check Exception
Arg2: fffffa800d91b038, Address of the WHEA_ERROR_RECORD structure.
Arg3: 00000000b2004000, High order 32-bits of the MCi_STATUS value.
Arg4: 0000000029000175, Low order 32-bits of the MCi_STATUS value.

...

PRIMARY_PROBLEM_CLASS:  X64_0x124_AuthenticAMD_PROCESSOR_CACHE

 

It seemed to be a hardware error related to the processor cache.

For more details I looked at the WHEA_ERROR_RECORD information:

(Only section 2 with the actual error shown)

0: kd> !errrec fffffa800d91b038
===============================================================================
Common Platform Error Record @ fffffa800d91b038
-------------------------------------------------------------------------------
Record Id     : 01d17af4b6b560a4
Severity      : Fatal (1)
Length        : 928
Creator       : Microsoft
Notify Type   : Machine Check Exception
Timestamp     : 4/1/2016 18:19:44 (UTC)
Flags         : 0x00000000

...

===============================================================================
Section 2     : x86/x64 MCA
-------------------------------------------------------------------------------
Descriptor    @ fffffa800d91b148
Section       @ fffffa800d91b2d0
Offset        : 664
Length        : 264
Flags         : 0x00000000
Severity      : Fatal

Error         : DCACHEL1_EVICT_ERR (Proc 0 Bank 0)
Status      : 0xb200400029000175

 

Apparently a hardware error releated to the level 1 data cache caused the system crash.

 

The computer in question has an AMD Athlon II X2 280 CPU.

Using CPU-Z I noticed that the core voltage seemed a little low for this CPU.

I remembered that I had undervolted the CPU to save power.

(Did not have reliability problems with it for years until now)

 

I checked the BIOS settings and discovered that the CPU was undervolted by -0,15 volts.

I decided to change it to -0,1 volts.

If any other reliability problems occur, I will change it back to standard voltage.

Conclusion

If hardware is running out of specifications and system crashes occur, then adjust settings closer to specifications.

(Examples of running out of spec: Undervolting, overvolting and overclocking)

Get .NET call stack when files are written with WinDbg

I was interested in learning how Neuron ESB performed logging internally.

(To see if it would be relevant to use the same classes and methods for logging in custom C# process steps and possibly other situations)

 

At first I used Process Monitor.

I found a WriteFile event, where a log file was updated.

Neuron_ESB_WriteFile_Process_Monitor_Event

I checked the stack under event properties.

Neuron_ESB_WriteFile_Process_Monitor_Event_Properties_Stack

The native stack trace didn’t tell me which .NET classes or methods that were in use, but it did give me an idea:

Perhaps I could use WinDbg to get the .NET call stack for each call to WriteFile in kernel32.dll

 

Indeed this was possible by:

1. Starting WinDbg (x64) as Administrator.

2. Attaching to the ESBService.exe process.

3. Loading the SOS extension with:

.loadby sos clr

4. Setting a breakpoint with a command to show the .NET call stack and continue execution with:

bp kernel32!WriteFile "!clrstack; g"

5. Continuing execution with:

g

 

This showed the .NET call stack every time files were written.

Most WriteFile calls were for logging (as expected).

An example:

OS Thread Id: 0x2354 (35)
Child SP               IP Call Site
000000001836e028 00000000775a1fa0 [InlinedCallFrame: 000000001836e028] Microsoft.Win32.Win32Native.WriteFile(Microsoft.Win32.SafeHandles.SafeFileHandle, Byte*, Int32, Int32 ByRef, IntPtr)
000000001836e028 000007feea284cf4 [InlinedCallFrame: 000000001836e028] Microsoft.Win32.Win32Native.WriteFile(Microsoft.Win32.SafeHandles.SafeFileHandle, Byte*, Int32, Int32 ByRef, IntPtr)
000000001836dff0 000007feea284cf4 DomainNeutralILStubClass.IL_STUB_PInvoke(Microsoft.Win32.SafeHandles.SafeFileHandle, Byte*, Int32, Int32 ByRef, IntPtr)
000000001836e0d0 000007feea265984 System.IO.FileStream.WriteFileNative(Microsoft.Win32.SafeHandles.SafeFileHandle, Byte[], Int32, Int32, System.Threading.NativeOverlapped*, Int32 ByRef)
000000001836e130 000007feea2658d2 System.IO.FileStream.WriteCore(Byte[], Int32, Int32)
000000001836e1a0 000007feea265847 System.IO.FileStream.FlushInternalBuffer()
000000001836e1e0 000007feea2657d4 System.IO.FileStream.Flush(Boolean)
000000001836e220 000007feea27128c System.IO.StreamWriter.Flush(Boolean, Boolean)
000000001836e280 000007fec064f2f7 Microsoft.VisualBasic.Logging.FileLogTraceListener+ReferencedStream.Flush()
000000001836e2d0 000007fec064e065 Microsoft.VisualBasic.Logging.FileLogTraceListener.Flush()
000000001836e300 000007fee92e0859 System.Diagnostics.TraceInternal.WriteLine(System.String)
000000001836e370 000007fe8bcc780d Neuron.Esb.ESBMasterService.Trace(System.Diagnostics.TraceListener, System.String)
000000001836e3c0 000007fe8bcc7bec Neuron.Esb.ESBMasterService.TraceInfo(System.Diagnostics.TraceSwitch, System.String, System.Diagnostics.TraceListener)
000000001836e420 000007fe8bfa72a9 Neuron.Esb.ESBMasterService.TraceInfo(System.Diagnostics.TraceSwitch, System.String)
000000001836e460 000007fe8c1c831e Neuron.Esb.EsbService.AmqpService.Trace(System.String, System.Object[])
000000001836e4a0 000007fe8cbbbc2e Neuron.Esb.EsbService.AmqpService.UpdateServiceStatus(Neuron.Esb.EsbService.ServiceState, Newtonsoft.Json.Linq.JToken)
000000001836e530 000007fe8c8baa0b Neuron.Esb.EsbService.AmqpService.UpdateExchangeStatus(System.Object, System.Timers.ElapsedEventArgs)
000000001836e5d0 000007fee9802586 System.Timers.Timer.MyTimerCallback(System.Object)
000000001836e650 000007feea1c2490 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000001836e720 000007feea1c2327 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000001836e750 000007feea1f0b9b System.Threading.TimerQueueTimer.CallCallback()
000000001836e7b0 000007feea1f0998 System.Threading.TimerQueueTimer.Fire()
000000001836e800 000007feea281a9f System.Threading.TimerQueue.FireNextTimers()
000000001836ed08 000007feeb303753 [DebuggerU2MCatchHandlerFrame: 000000001836ed08]
000000001836ee98 000007feeb303753 [ContextTransitionFrame: 000000001836ee98]
000000001836f0b8 000007feeb303753 [DebuggerU2MCatchHandlerFrame: 000000001836f0b8]

 

Having years of experience with log4net, I decided to use and recommend that.

Neuron ESB service may hang due to RabbitMQ service

I recently experienced a situation where the Neuron ESB service would hang when restarting.

ESBService.exe was stuck at Stopping and didn’t finish within 5 minutes, so it seemed to be permanently stuck.

Troubleshooting

Resource Monitor’s “Analyze Wait Chain” feature showed that 3 threads were waiting to finish network I/O.

ESBService_Analyze_Wait_Chain_waiting_to_finish_network_IO

 

I took 5 memory dumps using procdump:

procdump -ma EsbService.exe

 

WinDbg showed similar call stacks for the waiting threads:

0:060> !clrstack
OS Thread Id: 0x4e58 (60)
Child SP IP Call Site
000000001cd6dd88 000000007757d3fa [InlinedCallFrame: 000000001cd6dd88] System.Net.UnsafeNclNativeMethods+OSSOCK.recv(IntPtr, Byte*, Int32, System.Net.Sockets.SocketFlags)
000000001cd6dd88 000007fee9217e61 [InlinedCallFrame: 000000001cd6dd88] System.Net.UnsafeNclNativeMethods+OSSOCK.recv(IntPtr, Byte*, Int32, System.Net.Sockets.SocketFlags)
000000001cd6dd60 000007fee9217e61 DomainNeutralILStubClass.IL_STUB_PInvoke(IntPtr, Byte*, Int32, System.Net.Sockets.SocketFlags)
000000001cd6de10 000007fee919c501 System.Net.Sockets.Socket.Receive(Byte[], Int32, Int32, System.Net.Sockets.SocketFlags, System.Net.Sockets.SocketError ByRef)
000000001cd6de90 000007fee919c3b8 System.Net.Sockets.Socket.Receive(Byte[], Int32, Int32, System.Net.Sockets.SocketFlags)
000000001cd6def0 000007fee919c132 System.Net.Sockets.NetworkStream.Read(Byte[], Int32, Int32)
000000001cd6df50 000007feea1479bf System.IO.BufferedStream.ReadByte()
000000001cd6df90 000007feea13444f System.IO.BinaryReader.ReadByte()
000000001cd6dfc0 000007fe8c0ad406 RabbitMQ.Client.Impl.Frame.ReadFrom(RabbitMQ.Util.NetworkBinaryReader)
000000001cd6e020 000007fe8c0ad374 RabbitMQ.Client.Impl.SocketFrameHandler_0_9.ReadFrame()
000000001cd6e070 000007fe8c0ad27c RabbitMQ.Client.Impl.ConnectionBase.MainLoopIteration()
000000001cd6e0b0 000007fe8c0acfe2 RabbitMQ.Client.Impl.ConnectionBase.MainLoop()
000000001cd6e110 000007feea0c2490 System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000001cd6e1e0 000007feea0c2327 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
000000001cd6e210 000007feea0c22e2 System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
000000001cd6e260 000007feea162362 System.Threading.ThreadHelper.ThreadStart()
000000001cd6e548 000007feeb203753 [GCFrame: 000000001cd6e548]
000000001cd6e898 000007feeb203753 [DebuggerU2MCatchHandlerFrame: 000000001cd6e898]
000000001cd6ea28 000007feeb203753 [ContextTransitionFrame: 000000001cd6ea28]
000000001cd6ec48 000007feeb203753 [DebuggerU2MCatchHandlerFrame: 000000001cd6ec48]

 

The call stacks indicated that the problem was related to RabbitMQ.

 

I decided to forcefully stop the ESBService.exe process.

 

When starting ESBService.exe it was still trying to communicate with RabbitMQ and hanging.

I noticed this error in the Neuron ESB Event Log:

Event Info: An error occurred trying to update the status for RabbitMQ exchange "DEFAULT.Enterprise.FileContent": This request operation sent to net.tcp://localhost:50004/MasterControl/ did not receive a reply within the configured timeout (00:01:00). The time allotted to this operation may have been a portion of a longer timeout. This may be because the service is still processing the operation or because the service was unable to send a reply message. Please consider increasing the operation timeout (by casting the channel/proxy to IContextChannel and setting the OperationTimeout property) and ensure that the service is able to connect to the client.. Attempted to Query 'localhost' on port '15672'.

 

I decided to restart the RabbitMQ service, which solved the problem.

After this the Neuron ESBService.exe was able to start and work normally.

 

If the problem occurs again, I will take and examine memory dumps of the RabbitMQ service, in an attempt to learn the cause of the problem.

This occured with Neuron ESB version 3.5.3.411 and RabbitMQ version 3.2.2 (bundled with Neuron ESB).

 

I notice that the latest RabbitMQ version is currently 3.6.1:

https://www.rabbitmq.com/changelog.html

It seems relevant to update RabbitMQ, but compatibility with the current Neuron ESB version is currently unknown.