Adjust Power Options to prevent DRIVER_POWER_STATE_FAILURE (9f) BSOD

I recently updated the NIC driver on my work computer in an attempt to fix a BSOD problem.

Unfortunately it did not help. The computer would still regularly hang and eventually crash when shutting down to sleep mode with:

The computer has rebooted from a bugcheck. The bugcheck was: 0x0000009f (0x0000000000000003, 0xfffffa800d1eba10, 0xfffff80004d3a3d8, 0xfffffa802eba1c60).
DRIVER_POWER_STATE_FAILURE (9f)
A driver has failed to complete a power IRP within a specific time.

 

Decided to investigate the settings under: Control Panel -> Power Options

 

Noticed multiple settings called Link State Power Management and decided to disable them.

PCI Express -> Link State Power Management: Off

Low Power Active Mode profile -> Link State Power Management: No Power Saving

Idle mode optimization profile -> Link State Power Management: No Power Saving

(Expecting that the first setting under PCI Express was the important one, but disabled the others just in case)

 

It seems that none of the standard power plans disable these settings, not even High performance.

If you switch between multiple power plans, you will have to modify them all.
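
If clicking through every plan by hand gets tedious, the PCI Express setting can probably be scripted with powercfg. The sketch below only builds the command lines and does not run them. It assumes that the plan aliases (SCHEME_BALANCED, SCHEME_MIN, SCHEME_MAX) and the setting aliases (SUB_PCIEXPRESS, ASPM) match what `powercfg /aliases` prints on your system, and that value index 0 means Off. It does not cover the two vendor-specific profile settings listed above.

```python
# Sketch: build powercfg command lines that turn off PCIe Link State Power
# Management in several power plans. Nothing is executed here.

# Assumption: these aliases match the output of `powercfg /aliases`.
PLAN_ALIASES = ["SCHEME_BALANCED", "SCHEME_MIN", "SCHEME_MAX"]
SUBGROUP = "SUB_PCIEXPRESS"  # PCI Express settings group
SETTING = "ASPM"             # Link State Power Management
OFF = 0                      # assumed value index for "Off"


def build_commands(plans):
    """Return one AC and one DC command line per power plan."""
    commands = []
    for plan in plans:
        for verb in ("/setacvalueindex", "/setdcvalueindex"):
            commands.append(f"powercfg {verb} {plan} {SUBGROUP} {SETTING} {OFF}")
    return commands


commands = build_commands(PLAN_ALIASES)
for line in commands:
    print(line)
```

The printed lines are meant to be reviewed and then run from an elevated command prompt. The active plan may need to be re-applied with `powercfg /setactive` before the change takes effect.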

 

After modifying the power options I have put the computer into sleep mode 15 times without a single hang.

Disabling PCIe Link State Power Management seems to have fixed the DRIVER_POWER_STATE_FAILURE (9f) BSOD problems.

Examining DRIVER_POWER_STATE_FAILURE (9f) BSOD

My work computer recently hung when shutting down to sleep mode.

When booting the computer the next day I realized that it had eventually crashed.

This happened again after disabling hyper-threading, so I decided to investigate.

 

Checked Event Viewer and found:

Log Name: System
Source: Microsoft-Windows-WER-SystemErrorReporting
Event ID: 1001
Task Category: None
Level: Error
Keywords: Classic
Description:
The computer has rebooted from a bugcheck. The bugcheck was: 0x0000009f (0x0000000000000003, 0xfffffa800cd5e060, 0xfffff80000b9a3d8, 0xfffffa802e232590). A dump was saved in: C:\Windows\MEMORY.DMP.
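
The bug check code and its four parameters can be pulled out of such an event description with a few lines of Python. This is a minimal sketch that assumes the description always follows the "The bugcheck was: 0x... (0x..., 0x..., 0x..., 0x...)" layout shown above; the function name parse_bugcheck is my own.

```python
import re

# Matches the "bugcheck was: 0x... (0x..., 0x..., 0x..., 0x...)" part.
BUGCHECK_RE = re.compile(
    r"bugcheck was:\s*0x([0-9a-fA-F]+)\s*\(\s*"
    r"0x([0-9a-fA-F]+),\s*0x([0-9a-fA-F]+),\s*"
    r"0x([0-9a-fA-F]+),\s*0x([0-9a-fA-F]+)\s*\)"
)


def parse_bugcheck(description):
    """Return (bug check code, [four parameters]) as integers."""
    match = BUGCHECK_RE.search(description)
    if match is None:
        raise ValueError("no bugcheck found in description")
    code, *args = (int(group, 16) for group in match.groups())
    return code, args


description = (
    "The computer has rebooted from a bugcheck. The bugcheck was: "
    "0x0000009f (0x0000000000000003, 0xfffffa800cd5e060, "
    "0xfffff80000b9a3d8, 0xfffffa802e232590). "
    "A dump was saved in: C:\\Windows\\MEMORY.DMP."
)
code, args = parse_bugcheck(description)
print(f"bug check 0x{code:X}, parameters: {[hex(a) for a in args]}")
```

The fourth parameter is the address that is later passed to !irp.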

 

Searched online for BSOD 0x0000009f and found:

https://docs.microsoft.com/en-us/windows-hardware/drivers/debugger/bug-check-0x9f--driver-power-state-failure

 

Examined the memory dump with WinDbg (x64).

Checked for details about the crash with:

!analyze -v

Part of the result:

DRIVER_POWER_STATE_FAILURE (9f)
A driver has failed to complete a power IRP within a specific time.
Arguments:
Arg1: 0000000000000003, A device object has been blocking an Irp for too long a time
Arg2: fffffa800cd5e060, Physical Device Object of the stack
Arg3: fffff80000b9a3d8, nt!TRIAGE_9F_POWER on Win7 and higher, otherwise the Functional Device Object of the stack
Arg4: fffffa802e232590, The blocked IRP

FAILURE_BUCKET_ID: X64_0x9F_3_POWER_DOWN_Rt64win7_IMAGE_pci.sys

BUCKET_ID: X64_0x9F_3_POWER_DOWN_Rt64win7_IMAGE_pci.sys

FAILURE_ID_HASH_STRING: km:x64_0x9f_3_power_down_rt64win7_image_pci.sys

 

Checked for information about the I/O request packet with:

!irp fffffa802e232590

Result:

Irp is active with 5 stacks 4 is current (= 0xfffffa802e232738)
No Mdl: No System Buffer: Thread 00000000: Irp stack trace.
cmd flg cl Device File Completion-Context
[ 0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000
[ 0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000
[ 0, 0] 0 0 00000000 00000000 00000000-00000000

Args: 00000000 00000000 00000000 00000000
>[ 16, 2] 0 e1 fffffa800da84050 00000000 fffff800032ca230-fffffa800eb3bb50 Success Error Cancel pending
*** ERROR: Module load completed but symbols could not be loaded for Rt64win7.sys
\Driver\RTL8167 nt!PopSystemIrpCompletion
Args: 00015400 00000000 00000005 00000003
[ 0, 0] 0 0 00000000 00000000 00000000-fffffa800eb3bb50

Args: 00000000 00000000 00000000 00000000

 

Examined available information about Rt64win7 with:

lmvm Rt64win7

Result:

start end module name
fffff880`05c3a000 fffff880`05d39000 Rt64win7 (no symbols)
Loaded symbol image file: Rt64win7.sys
Image path: \SystemRoot\system32\DRIVERS\Rt64win7.sys
Image name: Rt64win7.sys
Timestamp: Fri Oct 07 11:27:12 2016 (57F76A70)
CheckSum: 0010CDA5
ImageSize: 000FF000
Translations: 0000.04b0 0000.04e4 0409.04b0 0409.04e4

 

Found the driver file in Windows Explorer under:

C:\Windows\system32\DRIVERS\Rt64win7.sys

Then checked Properties. The details view confirmed that the driver was for the Realtek network interface card, version: 7.103.1007.2016.

 

Decided to look for a newer driver from Realtek.

Found, downloaded and installed the current latest version 7.109.

Unfortunately this was not the solution. The problems continued.

However, later on I discovered that adjusting power options seemed effective.

Disable hyper-threading on Intel Skylake & Kaby Lake CPU

In 4½ months I have experienced 16 BSOD system crashes on a new work computer:

Crash Date  Bug Check String                     Bug Check Code  Caused By Address
21-06-2017  DRIVER_POWER_STATE_FAILURE           0x0000009f      ntoskrnl.exe+70e40
12-06-2017  NTFS_FILE_SYSTEM                     0x00000024      Ntfs.sys+4211
23-05-2017  IRQL_NOT_LESS_OR_EQUAL               0x0000000a      ntoskrnl.exe+6f4c0
10-05-2017  IRQL_NOT_LESS_OR_EQUAL               0x0000000a      ntoskrnl.exe+6f440
01-05-2017  BAD_POOL_HEADER                      0x00000019      win32k.sys+f13b2
24-03-2017  BAD_POOL_CALLER                      0x000000c2      ntoskrnl.exe+6f440
17-03-2017  SYSTEM_SERVICE_EXCEPTION             0x0000003b      afd.sys+41448
14-03-2017  MEMORY_MANAGEMENT                    0x0000001a      ntoskrnl.exe+70400
13-03-2017  PAGE_FAULT_IN_NONPAGED_AREA          0x00000050      VBoxDrv.sys+1f037
10-03-2017  PFN_LIST_CORRUPT                     0x0000004e      ntoskrnl.exe+70400
02-03-2017  SYSTEM_SERVICE_EXCEPTION             0x0000003b      ntoskrnl.exe+70400
22-02-2017  BAD_POOL_CALLER                      0x000000c2      TDI.SYS+10be
17-02-2017  BAD_POOL_HEADER                      0x00000019      ntoskrnl.exe+70400
16-02-2017  SYSTEM_THREAD_EXCEPTION_NOT_HANDLED  0x1000007e      iusb3xhc.sys+7dfb0
08-02-2017  PAGE_FAULT_IN_NONPAGED_AREA          0x00000050      ntoskrnl.exe+70400
07-02-2017  PFN_LIST_CORRUPT                     0x0000004e      ntoskrnl.exe+70400

 

So far I have:

  • Performed multiple memory tests.
  • Checked SSD health.
  • Checked system files.
  • Examined multiple memory dumps with WinDbg.
  • Installed all relevant firmware and driver updates.
  • Scanned for malware.

 

However this has not been successful or revealed the real cause behind the problems.

 

I eventually decided to replace the original memory modules:

2 x 8 GB DDR4-2133 CL15, Kingston KVR21N15D8K2/16

With:

2 x 8 GB DDR4-2133 CL15, Crucial CT8G4DFS8213.C8FDR1

 

This seemed to help somewhat.

System crashes used to be a semiweekly event.

After replacing the memory modules it became a semimonthly event.

 

The system has an Intel Skylake CPU (Core i7-6700)

 

It has recently been discovered that some Intel Skylake and Kaby Lake CPUs have a hardware bug related to hyper-threading.

The bug is described in: 6th Generation Intel® Processor Family – Specification Update

Quote:

“Under complex micro-architectural conditions, short loops of less than 64 instructions that use AH, BH, CH or DH registers as well as their corresponding wider register (e.g. RAX, EAX or AX for AH) may cause unpredictable system behavior. This can only happen when both logical processors on the same physical processor are active.”

 

Until system vendors include microcode fixes in firmware/UEFI updates, the only workaround is to disable hyper-threading.

 

The stability problems I have experienced could be caused by this CPU hardware bug.

So I have disabled hyper-threading in BIOS/UEFI setup and will await firmware updates. I hope that the system will finally be stable and reliable.

Conclusion

If you have an Intel Skylake or Kaby Lake CPU, it is recommended to disable hyper-threading for now.

Examining BAD_POOL_CALLER (c2) BSOD

My work computer recently crashed again with another BSOD.

 

Checked Event Viewer and found:

Log Name:      System
Source:        Microsoft-Windows-WER-SystemErrorReporting
Event ID:      1001
Level:         Error
Keywords:      Classic
Description:
The computer has rebooted from a bugcheck.  The bugcheck was: 0x000000c2 (0x0000000000000007, 0x000000000000109b, 0x0000000000000000, 0xfffffa800cd9d010). A dump was saved in: C:\Windows\MEMORY.DMP.

 

Examined the memory dump with WinDbg (x64).

Checked for details about the crash with:

!analyze -v

Part of the result:

BAD_POOL_CALLER (c2)
The current thread is making a bad pool request.  Typically this is at a bad IRQL level or double freeing the same allocation, etc.
Arguments:
Arg1: 0000000000000007, Attempt to free pool which was already freed
Arg2: 000000000000109b, (reserved)
Arg3: 0000000000000000, Memory contents of the pool block
Arg4: fffffa800cd9d010, Address of the block of pool being deallocated

Debugging Details:
------------------

POOL_ADDRESS:  fffffa800cd9d010 Nonpaged pool

BUGCHECK_STR:  0xc2_7

DEFAULT_BUCKET_ID:  WIN7_DRIVER_FAULT

PROCESS_NAME:  vlc.exe

CURRENT_IRQL:  2

MODULE_NAME: avgtdia

IMAGE_NAME:  avgtdia.sys

 

Examined the call stack with:

kp

Result:

Child-SP          RetAddr           Call Site
fffff880`0db9b1f8 fffff800`031c3bf9 nt!KeBugCheckEx
fffff880`0db9b200 fffff880`01f729c5 nt!ExAllocatePoolWithTag+0x1951
fffff880`0db9b2b0 fffff880`04272775 avgtdia+0xb9c5
fffff880`0db9b330 fffff880`042407bb afd! ?? ::GFJBLGFE::`string'+0xd64c
fffff880`0db9b550 fffff800`033b028e afd!AfdFastIoDeviceControl+0x7ab
fffff880`0db9b8c0 fffff800`033b0896 nt!IopXxxControlFile+0x6be
fffff880`0db9ba00 fffff800`0308c693 nt!NtDeviceIoControlFile+0x56
fffff880`0db9ba70 00000000`73b12e09 nt!KiSystemServiceCopyEnd+0x13
00000000`045af0f8 00000000`00000000 0x73b12e09

 

The driver avgtdia.sys seemed to cause the crash.

 

Examined information about the avgtdia driver with:

lm v m avgtdia

Result:

start             end                 module name
fffff880`01f67000 fffff880`01fad000   avgtdia    (no symbols)
Loaded symbol image file: avgtdia.sys
Image path: \SystemRoot\system32\DRIVERS\avgtdia.sys
Image name: avgtdia.sys
Timestamp:        Wed Jul 27 15:24:36 2016 (5798B614)
CheckSum:         00053AED
ImageSize:        00046000
Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

 

Discovered that avgtdia.sys was: AVG Network connection watcher

 

This made me suspect that other BSOD crashes were also caused by AVG Internet Security:

Examining PFN_LIST_CORRUPT (4e) and PAGE_FAULT_IN_NONPAGED_AREA (50) BSOD

 

I decided to uninstall AVG Internet Security using: AVG Remover

Installed replacement: Avira Antivirus

 

I used to experience 2 BSOD crashes per week on this computer.

After uninstalling AVG Internet Security, the computer has been running for 1 week without any crashes…

I hope that the root cause has been identified and that the computer will finally be stable and reliable.

Conclusion

Common causes for computer stability problems are failing hard disks, defective memory and buggy drivers.

It seems that some antivirus products can also cause stability problems, possibly combined with specific drivers or other system level software.

Examining SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e) BSOD

My work computer recently crashed with a BSOD just after inserting a USB 3.0 memory stick.

Considering the circumstances I suspected that a USB driver bug caused the crash.

 

Checked Event Viewer and found:

Log Name:      System
Source:        Microsoft-Windows-WER-SystemErrorReporting
Event ID:      1001
Task Category: None
Level:         Error
Keywords:      Classic
Description:
The computer has rebooted from a bugcheck.  The bugcheck was: 0x0000007e (0xffffffffc0000005, 0xfffff88001e685fe, 0xfffff8800394e5a8, 0xfffff8800394de00). A dump was saved in: C:\Windows\MEMORY.DMP.

 

Examined the memory dump with WinDbg (x64).

Checked for details about the crash with:

!analyze -v

Part of the result:

SYSTEM_THREAD_EXCEPTION_NOT_HANDLED (7e)
This is a very common bugcheck.  Usually the exception address pinpoints
the driver/function that caused the problem.  Always note this address
as well as the link date of the driver/image that contains this address.
Arguments:
Arg1: ffffffffc0000005, The exception code that was not handled
Arg2: fffff88001e685fe, The address that the exception occurred at
Arg3: fffff8800394e5a8, Exception Record Address
Arg4: fffff8800394de00, Context Record Address

Debugging Details:
------------------

EXCEPTION_CODE: (NTSTATUS) 0xc0000005 - The instruction at 0x%08lx referenced memory at 0x%08lx. The memory could not be %s.

FAULTING_IP:
iusb3hub+235fe
fffff880`01e685fe 4c8b00          mov     r8,qword ptr [rax]

EXCEPTION_RECORD:  fffff8800394e5a8 -- (.exr 0xfffff8800394e5a8)
ExceptionAddress: fffff88001e685fe (iusb3hub+0x00000000000235fe)
ExceptionCode: c0000005 (Access violation)
ExceptionFlags: 00000000
NumberParameters: 2
Parameter[0]: 0000000000000000
Parameter[1]: 0000000000000000
Attempt to read from address 0000000000000000

MODULE_NAME: iusb3hub

IMAGE_NAME:  iusb3hub.sys

 

Examined the call stack with:

kp

Result:

Child-SP          RetAddr           Call Site
fffff880`0394d5d8 fffff800`0344cf24 nt!KeBugCheckEx
fffff880`0394d5e0 fffff800`0340a745 nt!PspUnhandledExceptionInSystemThread+0x24
fffff880`0394d620 fffff800`03101cb4 nt! ?? ::NNGAKEGL::`string'+0x21dc
fffff880`0394d650 fffff800`0310172d nt!_C_specific_handler+0x8c
fffff880`0394d6c0 fffff800`03100505 nt!RtlpExecuteHandlerForException+0xd
fffff880`0394d6f0 fffff800`03111a05 nt!RtlDispatchException+0x415
fffff880`0394ddd0 fffff800`030d5a82 nt!KiDispatchException+0x135
fffff880`0394e470 fffff800`030d45fa nt!KiExceptionDispatch+0xc2
fffff880`0394e650 fffff880`01e685fe nt!KiPageFault+0x23a
fffff880`0394e7e0 fffff880`01e4a2b6 iusb3hub+0x235fe
fffff880`0394e840 fffff880`01e4a055 iusb3hub+0x52b6
fffff880`0394e8b0 fffff880`01e4a7fd iusb3hub+0x5055
fffff880`0394e920 fffff880`01e5c9a7 iusb3hub+0x57fd
fffff880`0394e980 fffff880`01e5c3e4 iusb3hub+0x179a7
fffff880`0394ea90 fffff880`01e69b3b iusb3hub+0x173e4
fffff880`0394eb10 fffff800`033d2413 iusb3hub+0x24b3b
fffff880`0394eb40 fffff800`030df355 nt!IopProcessWorkItem+0x23
fffff880`0394eb70 fffff800`03371236 nt!ExpWorkerThread+0x111
fffff880`0394ec00 fffff800`030c7706 nt!PspSystemThreadStartup+0x5a
fffff880`0394ec40 00000000`00000000 nt!KxStartSystemThread+0x16

 

Apparently iusb3hub.sys caused an access violation by reading from address 0 (null pointer bug).
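
Arg1 of this bug check is the NTSTATUS exception code, shown sign-extended to 64 bits. The sketch below masks it back to 32 bits and reads the exception record parameters from the output above. The lookup table is a tiny illustrative subset, not a complete list of status codes.

```python
# Arg1 of the 0x7E bug check, as shown by !analyze -v.
arg1 = 0xFFFFFFFFC0000005

# The low 32 bits hold the NTSTATUS value.
ntstatus = arg1 & 0xFFFFFFFF

# The top two bits of an NTSTATUS are the severity (3 = error).
severity = ntstatus >> 30

# Small illustrative lookup table, not a complete list.
KNOWN_STATUS = {0xC0000005: "STATUS_ACCESS_VIOLATION"}
name = KNOWN_STATUS.get(ntstatus, "unknown")

# Exception record parameters copied from the output above:
# Parameter[0] = 0 means a read, Parameter[1] is the address accessed.
access_type, address = 0, 0x0
operation = "read" if access_type == 0 else "write or execute"

print(f"0x{ntstatus:08X} {name}, severity {severity}, {operation} at 0x{address:X}")
```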

 

Examined information about the iusb3hub driver with:

lmv m iusb3hub

Result:

start             end                 module name
fffff880`01e45000 fffff880`01eaa000   iusb3hub   (no symbols)
Loaded symbol image file: iusb3hub.sys
Image path: \SystemRoot\system32\DRIVERS\iusb3hub.sys
Image name: iusb3hub.sys
Timestamp:        Fri Dec 18 16:59:07 2015 (56742D4B)
CheckSum:         0006D07A
ImageSize:        00065000
Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

 

Noticed that the driver was more than 1 year old.
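
The hexadecimal value in parentheses after the timestamp is the link time in seconds since 1970-01-01 UTC, so the age of a driver can be worked out directly. A minimal sketch follows. The reference date is an assumption chosen for illustration, and link_time is a hypothetical helper name.

```python
from datetime import datetime, timezone


def link_time(timestamp_hex):
    """Convert the hex timestamp printed by lmv into a UTC datetime."""
    return datetime.fromtimestamp(int(timestamp_hex, 16), tz=timezone.utc)


built = link_time("56742D4B")  # value from the iusb3hub output above

# Assumption: an illustrative reference date, not taken from the dump.
reference = datetime(2017, 2, 16, tzinfo=timezone.utc)
age_days = (reference - built).days

print(f"built {built.date()}, about {age_days} days before {reference.date()}")
```

WinDbg prints the timestamp in local time, so the hour can differ from the UTC value computed here.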

Found details about “Intel(R) USB 3.0 Root Hub” in Device Manager.

 

Decided to search for an updated driver.

Installed and ran Intel Driver Update Utility, which found a newer USB 3.0 driver (5.0.0.32)

Installed the updated driver and rebooted the system.

Hoping that this will prevent the computer from crashing in the future.

Examining PFN_LIST_CORRUPT (4e) and PAGE_FAULT_IN_NONPAGED_AREA (50) BSOD

I recently experienced stability problems on a new work computer, which crashed with a BSOD.

 

I looked for clues in Event Viewer and found:

Log Name:      System
Source:        Microsoft-Windows-WER-SystemErrorReporting
Event ID:      1001
Task Category: None
Level:         Error
Keywords:      Classic
Description:
The computer has rebooted from a bugcheck.  The bugcheck was: 0x0000004e (0x0000000000000099, 0x00000000003def55, 0x0000000000000000, 0x0000000000000001). A dump was saved in: C:\Windows\MEMORY.DMP.

 

Examined the memory dump with WinDbg (x64).

Checked for details about the crash with:

!analyze -v

Part of the result:

PFN_LIST_CORRUPT (4e)
Typically caused by drivers passing bad memory descriptor lists (ie: calling
MmUnlockPages twice with the same list, etc).  If a kernel debugger is
available get the stack trace.
Arguments:
Arg1: 0000000000000099, A PTE or PFN is corrupt
Arg2: 00000000003def55, page frame number
Arg3: 0000000000000000, current page state
Arg4: 0000000000000001, 0

 

Examined the call stack with:

kp

Result:

Child-SP          RetAddr           Call Site
fffff880`030b34f8 fffff800`0311c37c nt!KeBugCheckEx
fffff880`030b3500 fffff800`03038c17 nt!MiBadShareCount+0x4c
fffff880`030b3540 fffff800`030bc057 nt! ?? ::FNODOBFM::`string'+0x2cf6d
fffff880`030b36f0 fffff800`030bda09 nt!MiDeleteVirtualAddresses+0x41f
fffff880`030b38b0 fffff800`033a9f21 nt!MiRemoveMappedView+0xd9
fffff880`030b39d0 fffff800`033aa323 nt!MiUnmapViewOfSection+0x1b1
fffff880`030b3a90 fffff800`03089693 nt!NtUnmapViewOfSection+0x5f
fffff880`030b3ae0 00000000`76febfda nt!KiSystemServiceCopyEnd+0x13
00000000`0a8df5d8 00000000`00000000 0x76febfda

 

Memory problems are typically caused by failing memory modules, so I tested the memory with Memtest86+.

I only had time for a short run, but it completed one pass without errors.

However the next day the computer crashed again with another BSOD…

 

I found this in Event Viewer:

Log Name:      System
Source:        Microsoft-Windows-WER-SystemErrorReporting
Event ID:      1001
Task Category: None
Level:         Error
Keywords:      Classic
Description:
The computer has rebooted from a bugcheck.  The bugcheck was: 0x00000050 (0xfffff8a0384b1280, 0x0000000000000000, 0xfffff800031fe133, 0x0000000000000000). A dump was saved in: C:\Windows\MEMORY.DMP.

 

Examined the new memory dump with:

!analyze -v

Part of the result:

PAGE_FAULT_IN_NONPAGED_AREA (50)
Invalid system memory was referenced.  This cannot be protected by try-except,
it must be protected by a Probe.  Typically the address is just plain bad or it
is pointing at freed memory.
Arguments:
Arg1: fffff8a0384b1280, memory referenced.
Arg2: 0000000000000000, value 0 = read operation, 1 = write operation.
Arg3: fffff800031fe133, If non-zero, the instruction address which referenced the bad memory
address.
Arg4: 0000000000000000, (reserved)

 

Examined the call stack with:

kp

Result:

Child-SP          RetAddr           Call Site
fffff880`031735f8 fffff800`031442be nt!KeBugCheckEx
fffff880`03173600 fffff800`030c552e nt! ?? ::FNODOBFM::`string'+0x3bc5f
fffff880`03173760 fffff800`031fe133 nt!KiPageFault+0x16e
fffff880`031738f0 fffff800`030af3b1 nt!ExFreePoolWithTag+0x43
fffff880`031739a0 fffff880`018450c6 nt!FsRtlUninitializeBaseMcb+0x41
fffff880`031739d0 fffff800`030d0355 Ntfs!NtfsMcbCleanupLruQueue+0xf6
fffff880`03173b70 fffff800`03362236 nt!ExpWorkerThread+0x111
fffff880`03173c00 fffff800`030b8706 nt!PspSystemThreadStartup+0x5a
fffff880`03173c40 00000000`00000000 nt!KxStartSystemThread+0x16

 

Another BSOD related to memory access strongly indicated problems with the memory modules.

Ran Memtest86+ overnight for 15+ hours.

The next day Memtest86+ had found 160 memory errors…

 

I decided to reseat the memory modules.

Then ran Memtest86+ overnight again for almost 16 hours.

The next day no memory errors were found.

Hoping that the cause and solution for the BSOD crashes have been found. Time will tell.

Examining DRIVER_CORRUPTED_EXPOOL (c5) BSOD

My work computer recently crashed with a BSOD, when disconnecting or reconnecting Cisco AnyConnect Secure Mobility Client.

Considering the circumstances I suspected that Cisco AnyConnect was the culprit, but I wanted to confirm this.

 

I started looking for information in Event Viewer and found:

Log Name:      System
Source:        Microsoft-Windows-WER-SystemErrorReporting
Event ID:      1001
Task Category: None
Level:         Error
Keywords:      Classic
Description:
The computer has rebooted from a bugcheck.  The bugcheck was: 0x000000c5 (0x00000000760e0002, 0x0000000000000002, 0x0000000000000000, 0xfffff802fba61850). A dump was saved in: C:\WINDOWS\MEMORY.DMP.

 

Needed to examine the memory dump, so started WinDbg (x64) and opened:

C:\Windows\Memory.dmp

This message was displayed:

BugCheck C5, {760e0002, 2, 0, fffff802fba61850}

*** ERROR: Module load completed but symbols could not be loaded for acsock64.sys
Probably caused by : memory_corruption

 

Checked for more details with:

!analyze -v

Part of the result:

DRIVER_CORRUPTED_EXPOOL (c5)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is
caused by drivers that have corrupted the system pool.  Run the driver
verifier against any new (or suspect) drivers, and if that doesn't turn up
the culprit, then use gflags to enable special pool.
Arguments:
Arg1: 00000000760e0002, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, value 0 = read operation, 1 = write operation
Arg4: fffff802fba61850, address which referenced memory

 

The crash definitely seemed to be caused by a driver bug.

Now I wanted to identify which driver caused the problem.

 

Checked the call stack with:

kc

Result:

# Call Site
00 nt!KeBugCheckEx
01 nt!KiBugCheckDispatch
02 nt!KiPageFault
03 nt!ExDeferredFreePool
04 nt!ExFreePoolWithTag
05 acsock64
06 acsock64
07 acsock64
08 NETIO!ProcessCallout
09 NETIO!ProcessFastCalloutClassify
0a NETIO!KfdClassify
0b tcpip!AleNotifyEndpointTeardown
0c tcpip!UdpCleanupEndpointWorkQueueRoutine
0d tcpip!UdpCloseEndpoint
0e afd!AfdTLCloseEndpoint
0f afd!AfdCloseTransportEndpoint
10 afd!AfdCleanupCore
11 afd!AfdDispatch
12 nt!IopCloseFile
13 nt!ObCloseHandleTableEntry
14 nt!NtClose
15 nt!KiSystemServiceCopyEnd
16 0x0

 

Noticed that the acsock64 calls occurred just before the crash.

 

Examined available information about acsock64 with:

lmv m acsock64

Part of the result:

start             end                 module name
fffff807`453d0000 fffff807`45406000   acsock64   (no symbols)
Loaded symbol image file: acsock64.sys
Image path: \SystemRoot\system32\DRIVERS\acsock64.sys
Image name: acsock64.sys
Timestamp:        Thu Oct 08 17:12:56 2015 (561687F8)
CheckSum:         0003DD8B
ImageSize:        00036000
Translations:     0000.04b0 0000.04e4 0409.04b0 0409.04e4

 

Noticed that the driver was around 14 months old at the time of writing.

 

Found the driver file in Windows Explorer under:

C:\Windows\system32\DRIVERS\acsock64.sys

Then checked Properties. The details view revealed that the driver was:

Cisco AnyConnect Kernel Driver Framework Socket Layer Interceptor

Version: 4.2.1009.0

 

This confirmed my suspicion: Cisco AnyConnect most likely caused the BSOD.

 

At this point I would have updated to the latest version, if updates to Cisco AnyConnect were freely available.

Instead I decided to install and use the Windows port of OpenConnect as an alternative to Cisco AnyConnect.

Examining MEMORY_MANAGEMENT (1a) BSOD

A Lenovo Thinkpad T440p computer recently crashed with a BSOD.

I started looking for clues in Event Viewer and found:

Log Name:      System
Source:        Microsoft-Windows-WER-SystemErrorReporting
Event ID:      1001
Task Category: None
Level:         Error
Keywords:      Classic
Description:
The computer has rebooted from a bugcheck.  The bugcheck was: 0x0000001a (0x0000000000041792, 0xffff808000082f70, 0x0004000000000000, 0x0000000000000000). A dump was saved in: C:\WINDOWS\MEMORY.DMP.

 

Decided to examine the memory dump, so started WinDbg (x64) and opened:

C:\Windows\Memory.dmp

This message was displayed:

BugCheck 1A, {41792, ffff808000082f70, 4000000000000, 0}

Probably caused by : memory_corruption

 

Checked for more details with:

!analyze -v

Part of the result:

*************************************************************
*                                                           *
*                    Bugcheck Analysis                      *
*                                                           *
*************************************************************

MEMORY_MANAGEMENT (1a)
# Any other values for parameter 1 must be individually examined.
Arguments:
Arg1: 0000000000041792, A corrupt PTE has been detected. Parameter 2 contains the address of
the PTE. Parameters 3/4 contain the low/high parts of the PTE.
Arg2: ffff808000082f70
Arg3: 0004000000000000
Arg4: 0000000000000000

 

This issue indicated hardware failure, most likely defective memory.
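
One detail that fits the hardware theory is that the reported PTE value has only a single bit set. The sketch below simply lists the set bits in Arg3. Reading a lone set bit as a possible bit flip is my own interpretation, not something WinDbg states.

```python
# Arg3 of the 0x1A bug check: the PTE contents reported by !analyze -v.
pte = 0x0004000000000000

# Collect the positions of all bits that are set.
set_bits = [bit for bit in range(64) if (pte >> bit) & 1]
single_bit = len(set_bits) == 1

print(f"bits set in PTE: {set_bits}, single bit only: {single_bit}")
```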

So I booted Memtest86+ from a USB drive.

Within a few minutes it found multiple errors.

 

Tried cleaning the contacts on the memory modules, but it had no effect.

 

Then I tested each memory module separately in both sockets.

In every case the memory test found errors.

 

Decided to test another 8 GB memory module.

Ran Memtest86+ all night and it found no errors on the replacement memory module.

Conclusion

A computer that crashes with a MEMORY_MANAGEMENT (1a) BSOD likely has defective memory.

Test the memory with Memtest86+ or another test program.

Then replace any identified defective memory modules.

Examining UNEXPECTED_KERNEL_MODE_TRAP (7f) BSOD with WinDbg

Recently one of my computers was performing slowly and I decided to close some programs.

When I closed all Windows Explorer windows, the computer crashed with a BSOD.

 

After reboot this message was displayed:


Windows has recovered from an unexpected shutdown

Problem signature:
Problem Event Name:    BlueScreen
OS Version:    6.1.7601.2.1.0.768.3
Locale ID:    1030

Additional information about the problem:
BCCode:    7f
BCP1:    0000000000000008
BCP2:    0000000080050031
BCP3:    00000000000006F8
BCP4:    FFFFF9600025C28C
OS Version:    6_1_7601
Service Pack:    1_0
Product:    768_1

 

A similar message was logged in Event Viewer:

Log Name:      System
Source:        Microsoft-Windows-WER-SystemErrorReporting
Event ID:      1001
Task Category: None
Level:         Error
Keywords:      Classic
Description:
The computer has rebooted from a bugcheck.  The bugcheck was: 0x0000007f (0x0000000000000008, 0x0000000080050031, 0x00000000000006f8, 0xfffff9600025c28c). A dump was saved in: C:\Windows\MEMORY.DMP.

 

Decided to examine this further, so started WinDbg (x64) and opened:

C:\Windows\Memory.dmp

This message was displayed:

BugCheck 7F, {8, 80050031, 6f8, fffff9600025c28c}

 

Checked for more details with:

!analyze -v

Part of the result:

*****************************************************************
*                                                               *
*                      Bugcheck Analysis                        *
*                                                               *
*****************************************************************

UNEXPECTED_KERNEL_MODE_TRAP (7f)
This means a trap occurred in kernel mode, and it's a trap of a kind
that the kernel isn't allowed to have/catch (bound trap) or that
is always instant death (double fault).  The first number in the
bugcheck params is the number of the trap (8 = double fault, etc)
Consult an Intel x86 family manual to learn more about what these
traps are. Here is a *portion* of those codes:
If kv shows a taskGate
use .tss on the part before the colon, then kv.
Else if kv shows a trapframe
use .trap on that value
Else
.trap on the appropriate frame will show where the trap was taken
(on x86, this will be the ebp that goes with the procedure KiTrap)
Endif
kb will then show the corrected stack.
Arguments:
Arg1: 0000000000000008, EXCEPTION_DOUBLE_FAULT
Arg2: 0000000080050031
Arg3: 00000000000006f8
Arg4: fffff9600025c28c

 

Decided to search online for: UNEXPECTED_KERNEL_MODE_TRAP (7f) exception double fault

And found this article: Bug Check 0x7F: UNEXPECTED_KERNEL_MODE_TRAP

 

It mentioned that a common cause for this error was kernel stack overflow.

So I checked the stack usage with:

!stackusage

The result was:

Stack Usage By Function
===================================================================

Size     Count  Module
0x00001500        14  win32k!xxxDesktopWndProcWorker
0x00000DD0        13  win32k!xxxInterSendMsgEx
0x00000D20        14  win32k!xxxReceiveMessage
0x000009C0        13  win32k!xxxRealSleepThread
0x000008F0        13  win32k!xxxSendMessageTimeout
0x000004E0        13  win32k!xxxBeginPaint
0x00000410        13  win32k!xxxSendNCPaint
0x00000400         2  win32k!RGNOBJAPI::bSubtractComplex
0x00000380         1  win32k!CalcVisRgnWorker
0x000002A0        14  win32k!xxxDesktopWndProc
0x00000270        13  win32k!xxxSleepThread
0x000000E0         1  win32k!xxxRealInternalGetMessage
0x00000090         1  win32k!xxxHandleDesktopMessages
0x00000090         1  win32k!GetDCEx
0x00000080         1  win32k!xxxDesktopThread
0x00000080         1  win32k!RGNOBJAPI::bSubtract
0x00000060         1  win32k!xxxBeginPaint
0x00000050         1  win32k!GreSubtractRgnRectList
0x00000040         1  win32k!xxxInternalGetMessage
0x00000030         1  win32k!xxxCreateSystemThreads
0x00000030         1  win32k!CalcVisRgn
0x00000030         1  win32k!NtUserCallNoParam

Total Size: 0x00005CA0


Stack Usage By Module
===================================================================

Size     Count  Module
0x00005CA0       134  win32k

Total Size: 0x00005CA0

 

Noticed that 0x5CA0 = 23712 bytes is very close to the 24K kernel stack limit for 64-bit Windows.

The most likely cause for the BSOD was indeed kernel stack overflow.
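
The arithmetic can be double-checked by summing the per-function sizes from the !stackusage output above and comparing the total with the limit. A small sketch, taking the 24K limit as quoted in the text:

```python
# Per-function sizes copied from the "Stack Usage By Function" table above.
sizes = [
    0x1500, 0x0DD0, 0x0D20, 0x09C0, 0x08F0, 0x04E0, 0x0410, 0x0400,
    0x0380, 0x02A0, 0x0270, 0x00E0, 0x0090, 0x0090, 0x0080, 0x0080,
    0x0060, 0x0050, 0x0040, 0x0030, 0x0030, 0x0030,
]

LIMIT = 24 * 1024  # kernel stack limit as quoted in the text

total = sum(sizes)
headroom = LIMIT - total

print(f"total {total} bytes (0x{total:X}), "
      f"headroom {headroom} bytes, {total / LIMIT:.1%} of the limit used")
```

The listed frames alone leave less than 1 KB of headroom, which supports the stack overflow explanation.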

 

Examined the call stack with:

kf

Part of the result:

#   Memory  Child-SP          RetAddr           Call Site
00           fffff800`0449ed28 fffff800`030889e9 nt!KeBugCheckEx
01         8 fffff800`0449ed30 fffff800`03086eb2 nt!KeSynchronizeExecution+0x3d39
02       140 fffff800`0449ee70 fffff960`0025c28c nt!KeSynchronizeExecution+0x2202
03   20f4110 fffff880`06592f80 fffff960`0025c1c1 win32k!RGNOBJAPI::bSubtractComplex+0x44
04       400 fffff880`06593380 fffff960`001d3379 win32k!RGNOBJAPI::bSubtract+0x871
05        80 fffff880`06593400 fffff960`0014d7c4 win32k!GreSubtractRgnRectList+0x49
06        50 fffff880`06593450 fffff960`000c5c47 win32k!CalcVisRgnWorker+0x314
07       380 fffff880`065937d0 fffff960`000d23be win32k!CalcVisRgn+0x97
08        30 fffff880`06593800 fffff960`00056cf6 win32k!GetDCEx+0x50e
09        90 fffff880`06593890 fffff960`0009ede7 win32k!xxxBeginPaint+0x192
0a        60 fffff880`065938f0 fffff960`0009ead5 win32k!xxxDesktopWndProcWorker+0x2f7
0b       180 fffff880`06593a70 fffff960`00094013 win32k!xxxDesktopWndProc+0x55
0c        30 fffff880`06593aa0 fffff960`000def3f win32k!xxxReceiveMessage+0x6cb

 

Noticed the very large stack frame difference at:

win32k!RGNOBJAPI::bSubtractComplex+0x44

 

Examined details about win32k.sys with:

lmDvm win32k

Part of the result:

Image path: \SystemRoot\System32\win32k.sys
Image name: win32k.sys
Timestamp:        Mon Sep 12 22:37:06 2016 (57D711F2)
CheckSum:         00314AE8
ImageSize:        00326000
File version:     6.1.7601.23545
Product version:  6.1.7601.23545
File flags:       0 (Mask 3F)
File OS:          40004 NT Win32
File type:        3.7 Driver
File date:        00000000.00000000
Translations:     0409.04b0
CompanyName:      Microsoft Corporation
ProductName:      Microsoft® Windows® Operating System
InternalName:     win32k.sys
OriginalFilename: win32k.sys
ProductVersion:   6.1.7601.23545
FileVersion:      6.1.7601.23545 (win7sp1_ldr.160912-1137)
FileDescription:  Multi-User Win32 Driver
LegalCopyright:   © Microsoft Corporation. All rights reserved.

 

Noticed that win32k.sys had been updated recently.

Knowing that win32k.sys interacts with the graphics driver, I decided to update this to the latest version.

Examine WHEA_UNCORRECTABLE_ERROR (124) BSOD with WinDbg

One of my computers recently crashed with a BSOD.

This occurs very rarely so I decided to identify the cause.

Troubleshooting

I checked the system event log for a bugcheck and found this:

Log Name:      System
Source:        Microsoft-Windows-WER-SystemErrorReporting
...
Description:
The computer has rebooted from a bugcheck.  The bugcheck was: 0x00000124 (0x0000000000000000, 0xfffffa800d91b038, 0x00000000b2004000, 0x0000000029000175). A dump was saved in: C:\Windows\MEMORY.DMP. Report Id: 040116-21216-01.

 

I decided to examine the C:\Windows\MEMORY.DMP crash dump with WinDbg. (In this case the x64 version of WinDbg)

WinDbg’s !analyze command usually reveals relevant information about a BSOD, so that’s what I checked first:

Run: !analyze -v

0: kd> !analyze -v

...
WHEA_UNCORRECTABLE_ERROR (124)
A fatal hardware error has occurred. Parameter 1 identifies the type of error
source that reported the error. Parameter 2 holds the address of the
WHEA_ERROR_RECORD structure that describes the error conditon.
Arguments:
Arg1: 0000000000000000, Machine Check Exception
Arg2: fffffa800d91b038, Address of the WHEA_ERROR_RECORD structure.
Arg3: 00000000b2004000, High order 32-bits of the MCi_STATUS value.
Arg4: 0000000029000175, Low order 32-bits of the MCi_STATUS value.

...

PRIMARY_PROBLEM_CLASS:  X64_0x124_AuthenticAMD_PROCESSOR_CACHE

 

It seemed to be a hardware error related to the processor cache.

For more details I looked at the WHEA_ERROR_RECORD information:

(Only section 2 with the actual error shown)

0: kd> !errrec fffffa800d91b038
===============================================================================
Common Platform Error Record @ fffffa800d91b038
-------------------------------------------------------------------------------
Record Id     : 01d17af4b6b560a4
Severity      : Fatal (1)
Length        : 928
Creator       : Microsoft
Notify Type   : Machine Check Exception
Timestamp     : 4/1/2016 18:19:44 (UTC)
Flags         : 0x00000000

...

===============================================================================
Section 2     : x86/x64 MCA
-------------------------------------------------------------------------------
Descriptor    @ fffffa800d91b148
Section       @ fffffa800d91b2d0
Offset        : 664
Length        : 264
Flags         : 0x00000000
Severity      : Fatal

Error         : DCACHEL1_EVICT_ERR (Proc 0 Bank 0)
Status      : 0xb200400029000175

 

Apparently a hardware error related to the level 1 data cache caused the system crash.
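
The name DCACHEL1_EVICT_ERR can also be derived by hand from the status value. The sketch below combines Arg3 and Arg4 into the 64-bit MCi_STATUS and decodes it. It assumes the common x86 machine check layout (flag bits 57 to 63, MCA error code in the low 16 bits, memory hierarchy errors encoded as 0000 0001 RRRR TTLL). The lookup tables are abbreviated.

```python
# Arg3 and Arg4 of the bug check: high and low 32 bits of MCi_STATUS.
high, low = 0xB2004000, 0x29000175
status = (high << 32) | low

# Assumed flag layout of the upper status bits.
FLAGS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
         59: "MISCV", 58: "ADDRV", 57: "PCC"}
set_flags = [name for bit, name in FLAGS.items() if (status >> bit) & 1]

# MCA error code: low 16 bits, here assumed to be 0000 0001 RRRR TTLL.
code = status & 0xFFFF
is_memory_hierarchy = (code >> 8) == 0x01

REQUEST = {0: "ERR", 1: "RD", 2: "WR", 3: "DRD", 4: "DWR",
           5: "IRD", 6: "PREFETCH", 7: "EVICT", 8: "SNOOP"}
KIND = {0: "I", 1: "D", 2: "G"}
LEVEL = {0: "L0", 1: "L1", 2: "L2", 3: "LG"}

request = REQUEST[(code >> 4) & 0xF]
kind = KIND[(code >> 2) & 0x3]
level = LEVEL[code & 0x3]
decoded = f"{kind}CACHE{level}_{request}_ERR"

print(f"status 0x{status:016x}, flags {set_flags}, error {decoded}")
```

The UC and PCC flags indicate an uncorrected error with corrupted processor context, which matches the Fatal severity in the error record.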

 

The computer in question has an AMD Athlon II X2 280 CPU.

Using CPU-Z I noticed that the core voltage seemed a little low for this CPU.

I remembered that I had undervolted the CPU to save power.

(Did not have reliability problems with it for years until now)

 

I checked the BIOS settings and discovered that the CPU voltage offset was -0.15 volts.

I decided to change it to -0.1 volts.

If any other reliability problems occur, I will change it back to standard voltage.

Conclusion

If hardware is running outside its specifications and system crashes occur, adjust the settings closer to the specifications.

(Examples of running out of spec: undervolting, overvolting and overclocking)