Examining MEMORY_MANAGEMENT (1a) BSOD

A Lenovo Thinkpad T440p computer recently crashed with a BSOD.

I started looking for clues in Event Viewer and found:

Log Name:      System
Source:        Microsoft-Windows-WER-SystemErrorReporting
Event ID:      1001
Task Category: None
Level:         Error
Keywords:      Classic
Description:
The computer has rebooted from a bugcheck.  The bugcheck was: 0x0000001a (0x0000000000041792, 0xffff808000082f70, 0x0004000000000000, 0x0000000000000000). A dump was saved in: C:\WINDOWS\MEMORY.DMP.

 

Decided to examine the memory dump, so started WinDbg (x64) and opened:

C:\Windows\Memory.dmp

This message was displayed:

BugCheck 1A, {41792, ffff808000082f70, 4000000000000, 0}

Probably caused by : memory_corruption

 

Checked for more details with:

!analyze -v

Part of the result:

*************************************************************
*                                                           *
*                    Bugcheck Analysis                      *
*                                                           *
*************************************************************

MEMORY_MANAGEMENT (1a)
# Any other values for parameter 1 must be individually examined.
Arguments:
Arg1: 0000000000041792, A corrupt PTE has been detected. Parameter 2 contains the address of
the PTE. Parameters 3/4 contain the low/high parts of the PTE.
Arg2: ffff808000082f70
Arg3: 0004000000000000
Arg4: 0000000000000000

 

This issue indicated hardware failure, most likely defective memory.

So I booted Memtest86+ from a USB drive.

Within few minutes it found multiple errors.

 

Tried cleaning the contacts on the memory modules, but it had no effect.

 

Then I tested each memory module separately in both sockets.

In every case the memory test found errors.

 

Decided to test another 8 GB memory module.

Ran Memtest86+ all night and it found no errors on the replacement memory module.

Conclusion

A computer that crashes with a MEMORY_MANAGEMENT (1a) BSOD likely has defective memory.

Test the memory with Memtest86+ or another testprogram.

Then replace any identified defective memory modules.

Cleaning computer cooling system to improve performance

While using a laptop computer I noticed high noise levels, caused by the cooling fan.

I decided to check the CPU temperatures using HWMonitor.

The idle temperatures were around 70° C.

And load temperatures were around 85°-94° C.

 

These temperatures could be high enough to cause thermal throttling, which would affect system performance.

Therefore I measured performance with 7-Zip and CPU-Z benchmarks.

 

I decided to disassemble the laptop computer and clean the cooling system, using compressed air.

(Be aware that most compressed air cans contain harmful chemicals, so use them outside or in a well ventilated area)

 

Cleaning the cooling system had a significant effect.

Now idle temperatures were much lower, around 55° C.

And load temperatures were also lower, around 74°-85° C.

Also noticed much lower fan speeds, so the computer wasn’t as noisy as before.

 

Table of thermal readings:

Idle Load
Before cleaning 70° C 85°-94°
After cleaning 55° C 74°-85° C
Improvement 15° C 9°-11° C

 

The system had been affected by thermal throttling, because the 7-Zip and CPU-Z benchmarks improved.

 

Table of benchmark results:

7-Zip score CPU-Z single CPU-Z multi
Before cleaning 12491 1321 4572
After cleaning 16572 1510 5494
Improvement 32,6% 14,3% 20,2%

 

Conclusion

It can be relevant to clean a computers cooling system.

It may improve noise levels, temperatures and performance.

Examining UNEXPECTED_KERNEL_MODE_TRAP (7f) BSOD with WinDbg

Recently one of my computers was performing slowly and I decided to close some programs.

When closing all Windows explorers the computer crashed with a BSOD

 

After reboot this message was displayed:

windows_has_recovered_from_an_unexpected_shutdown_bsod_7f_08

Windows has recovered from an unexpected shutdown

Problem signature:
Problem Event Name:    BlueScreen
OS Version:    6.1.7601.2.1.0.768.3
Locale ID:    1030

Additional information about the problem:
BCCode:    7f
BCP1:    0000000000000008
BCP2:    0000000080050031
BCP3:    00000000000006F8
BCP4:    FFFFF9600025C28C
OS Version:    6_1_7601
Service Pack:    1_0
Product:    768_1

 

A similar message was logged in Event Viewer:

Log Name:      System
Source:        Microsoft-Windows-WER-SystemErrorReporting
Event ID:      1001
Task Category: None
Level:         Error
Keywords:      Classic
Description:
The computer has rebooted from a bugcheck.  The bugcheck was: 0x0000007f (0x0000000000000008, 0x0000000080050031, 0x00000000000006f8, 0xfffff9600025c28c). A dump was saved in: C:\Windows\MEMORY.DMP.

 

Decided to examine this further, so started WinDbg (x64) and opened:

C:\Windows\Memory.dmp

This message was displayed:

BugCheck 7F, {8, 80050031, 6f8, fffff9600025c28c}

 

Checked for more details with:

!analyze -v

Part of the result:

*****************************************************************
*                                                               *
*                      Bugcheck Analysis                        *
*                                                               *
*****************************************************************

UNEXPECTED_KERNEL_MODE_TRAP (7f)
This means a trap occurred in kernel mode, and it's a trap of a kind
that the kernel isn't allowed to have/catch (bound trap) or that
is always instant death (double fault).  The first number in the
bugcheck params is the number of the trap (8 = double fault, etc)
Consult an Intel x86 family manual to learn more about what these
traps are. Here is a *portion* of those codes:
If kv shows a taskGate
use .tss on the part before the colon, then kv.
Else if kv shows a trapframe
use .trap on that value
Else
.trap on the appropriate frame will show where the trap was taken
(on x86, this will be the ebp that goes with the procedure KiTrap)
Endif
kb will then show the corrected stack.
Arguments:
Arg1: 0000000000000008, EXCEPTION_DOUBLE_FAULT
Arg2: 0000000080050031
Arg3: 00000000000006f8
Arg4: fffff9600025c28c

 

Decided to search online for: UNEXPECTED_KERNEL_MODE_TRAP (7f) exception double fault

And found this article: Bug Check 0x7F: UNEXPECTED_KERNEL_MODE_TRAP

 

It mentioned that a common cause for this error was kernel stack overflow.

So I checked the stack usage with:

!stackusage

The result was:

Stack Usage By Function
===================================================================

Size     Count  Module
0x00001500        14  win32k!xxxDesktopWndProcWorker
0x00000DD0        13  win32k!xxxInterSendMsgEx
0x00000D20        14  win32k!xxxReceiveMessage
0x000009C0        13  win32k!xxxRealSleepThread
0x000008F0        13  win32k!xxxSendMessageTimeout
0x000004E0        13  win32k!xxxBeginPaint
0x00000410        13  win32k!xxxSendNCPaint
0x00000400         2  win32k!RGNOBJAPI::bSubtractComplex
0x00000380         1  win32k!CalcVisRgnWorker
0x000002A0        14  win32k!xxxDesktopWndProc
0x00000270        13  win32k!xxxSleepThread
0x000000E0         1  win32k!xxxRealInternalGetMessage
0x00000090         1  win32k!xxxHandleDesktopMessages
0x00000090         1  win32k!GetDCEx
0x00000080         1  win32k!xxxDesktopThread
0x00000080         1  win32k!RGNOBJAPI::bSubtract
0x00000060         1  win32k!xxxBeginPaint
0x00000050         1  win32k!GreSubtractRgnRectList
0x00000040         1  win32k!xxxInternalGetMessage
0x00000030         1  win32k!xxxCreateSystemThreads
0x00000030         1  win32k!CalcVisRgn
0x00000030         1  win32k!NtUserCallNoParam

Total Size: 0x00005CA0


Stack Usage By Module
===================================================================

Size     Count  Module
0x00005CA0       134  win32k

Total Size: 0x00005CA0

 

Noticed that 0x5CA0 = 23712 bytes is very close to the 24K kernel stack limit for 64-bit Windows.

The most likely cause for the BSOD was indeed kernel stack overflow.

 

Examined the call stack with:

kf

Part of the result:

#   Memory  Child-SP          RetAddr           Call Site
00           fffff800`0449ed28 fffff800`030889e9 nt!KeBugCheckEx
01         8 fffff800`0449ed30 fffff800`03086eb2 nt!KeSynchronizeExecution+0x3d39
02       140 fffff800`0449ee70 fffff960`0025c28c nt!KeSynchronizeExecution+0x2202
03   20f4110 fffff880`06592f80 fffff960`0025c1c1 win32k!RGNOBJAPI::bSubtractComplex+0x44
04       400 fffff880`06593380 fffff960`001d3379 win32k!RGNOBJAPI::bSubtract+0x871
05        80 fffff880`06593400 fffff960`0014d7c4 win32k!GreSubtractRgnRectList+0x49
06        50 fffff880`06593450 fffff960`000c5c47 win32k!CalcVisRgnWorker+0x314
07       380 fffff880`065937d0 fffff960`000d23be win32k!CalcVisRgn+0x97
08        30 fffff880`06593800 fffff960`00056cf6 win32k!GetDCEx+0x50e
09        90 fffff880`06593890 fffff960`0009ede7 win32k!xxxBeginPaint+0x192
0a        60 fffff880`065938f0 fffff960`0009ead5 win32k!xxxDesktopWndProcWorker+0x2f7
0b       180 fffff880`06593a70 fffff960`00094013 win32k!xxxDesktopWndProc+0x55
0c        30 fffff880`06593aa0 fffff960`000def3f win32k!xxxReceiveMessage+0x6cb

 

Noticed the very large stack frame difference at:

win32k!RGNOBJAPI::bSubtractComplex+0x44

 

Examined details about win32k.sys with:

lmDvm win32k

Part of the result:

Image path: \SystemRoot\System32\win32k.sys
Image name: win32k.sys
Timestamp:        Mon Sep 12 22:37:06 2016 (57D711F2)
CheckSum:         00314AE8
ImageSize:        00326000
File version:     6.1.7601.23545
Product version:  6.1.7601.23545
File flags:       0 (Mask 3F)
File OS:          40004 NT Win32
File type:        3.7 Driver
File date:        00000000.00000000
Translations:     0409.04b0
CompanyName:      Microsoft Corporation
ProductName:      Microsoft® Windows® Operating System
InternalName:     win32k.sys
OriginalFilename: win32k.sys
ProductVersion:   6.1.7601.23545
FileVersion:      6.1.7601.23545 (win7sp1_ldr.160912-1137)
FileDescription:  Multi-User Win32 Driver
LegalCopyright:   © Microsoft Corporation. All rights reserved.

 

Noticed that win32k.sys had been updated recently.

Knowing that win32k.sys interacts with the graphics driver, I decided to update this to the latest version.

Recommending heatsink for Raspberry Pi 3

When experimenting with a Raspberry Pi 3 I noticed that the CPU could get pretty warm.

I decided to measure the temperature during idle and load with:

cat /sys/class/thermal/thermal_zone0/temp

And load tested the CPU with 7-zip benchmark:

7zr b

 

Measured these temperatures:

Idle Load
Without heatsink 50-52 °C 77-79 °C
With heatsink 47-49 °C 65-67 °C
Improvement 3 °C 12 °C

 

I used a small aluminium heatsink with thermal tape:

raspberry_pi_3_with_heatsink

 

Conclusion

The 12 degree reduction in load temperature is well worth the effort.

It should prevent thermal throttling and thereby ensure optimal performance.

Linux hang at boot due to authentication problems with Samba mount in /etc/fstab

A Raspberry Pi 3 had a problem after I had performed a number of system changes.

When booting it would hang for around 2 minutes showing:

systemd-hostnamed.service

 

I examined system messages from the boot process with:

dmesg

Part of the result was:

[   98.748722] CIFS VFS: Send error in SessSetup = -13
[   98.749092] CIFS VFS: cifs_mount failed w/return code = -13
[  125.427742] usb 1-1.5.2.3: reset high-speed USB device number 9 using dwc_otg
[  188.946712] Status code returned 0xc000006d NT_STATUS_LOGON_FAILURE

 

I remembered that I had experimented with various ways to mount a Samba share in: /etc/fstab

This didn’t work because of authentication problems and (as a side effect) it delayed startup significantly.

 

I decided to solve the problem by:

1. Removing the non-functional line from /etc/fstab

2. Creating a script to be run on demand:

mount_RPServer_shared.sh

With content similar to this:

#/bin/bash
sudo mount -t cifs -o user=pi //server/shared /mnt/server_shared/

 

The script relies on the mount.cifs command.

For Debian based Linux distributions this is part of the cifs-utils package.

Unattended Windows setup may fail due to long computer name

While testing unattended Windows deployment I encountered another problem.

After the first installation reboot, Windows setup would fail with the error message:

Windows could not parse or process unattend answer file [C:\Windows\Panther\unattend.xml] for pass [specialize]. The answer file is invalid.

 

When closing the dialogbox the computer restarted.

 

Then it failed with this error message:

The computer restarted unexpectedly or encountered an unexpected error. Windows instalation cannot proceed. To install Windows, click "OK" to restart the computer, and then restart the installation.

 

Started troubleshooting by booting into Windows PE.

Started examining the log files under:

C:\Windows\Panther

 

Found these error messages in setupact.log:

IBS    The provided unattend file is not valid; hrResult = 0x80220005
IBS    Callback_Unattend_InitEngine:The provided unattend file [C:\Windows\Panther\unattend.xml] is not a valid unattended Setup answer file; hr = 0x1, hrSearched = 0x1, hrDeserialized = 0x0, hrImplicitCtx = 0x0, hrValidated = 0x1, hrResult = 0x80220005
IBS    UnattendErrorFromResults: Error text = Windows could not parse or process unattend answer file [C:\Windows\Panther\unattend.xml] for pass [specialize]. The answer file is invalid.
IBS    Callback_Unattend_InitEngine:An error occurred while finding/loading the unattend file; hr = 0x1, hrResult = 0x80220005

 

This didn’t reveal the exact cause of the error, but it inspired me to examine unattend.xml.

I transferred unattend.xml to my work computer and then tried opening the unattend.xml file with Windows System Image Manager.

 

However this failed with:

windows_sim_validation_error_waspassprocessed_attribute_is_not_declared

Validation error on D:\ToBeDeleted\Fail_Info\Panther\unattend.xml, line 18, column 32.

Details: The 'wasPassProcessed' attribute is not declared.

 

Opened the unattend.xml file in an editor and removed all instances of:

 wasPassProcessed="true"

 

After that the file could be opened in Windows SIM, which now displayed the real error:

The 'ComputerName' element is invalid - The value 'DeployTestPhysical' is invalid according to its datatype 'ComputerNameType' - The actual length is greater than the MaxLength value.

 

So the problem was that computer name was too long…

 

The unattend.xml file was modified by a custom program just before deployment on the target computer.

This custom program did not prevent computer names longer than 15 characters.

The problem was fixed by updating the custom program.

Conclusion

Special care should be taken when modifying unattend.xml by scripts or programs.

An invalid unattend.xml file may cause problems during Windows setup, which are not immediately obvious.