Examining UNEXPECTED_KERNEL_MODE_TRAP (7f) BSOD with WinDbg

Recently one of my computers was performing slowly and I decided to close some programs.

When I closed all the Windows Explorer windows, the computer crashed with a BSOD.

 

After reboot this message was displayed:


Windows has recovered from an unexpected shutdown

Problem signature:
Problem Event Name:    BlueScreen
OS Version:    6.1.7601.2.1.0.768.3
Locale ID:    1030

Additional information about the problem:
BCCode:    7f
BCP1:    0000000000000008
BCP2:    0000000080050031
BCP3:    00000000000006F8
BCP4:    FFFFF9600025C28C
OS Version:    6_1_7601
Service Pack:    1_0
Product:    768_1

 

A similar message was logged in Event Viewer:

Log Name:      System
Source:        Microsoft-Windows-WER-SystemErrorReporting
Event ID:      1001
Task Category: None
Level:         Error
Keywords:      Classic
Description:
The computer has rebooted from a bugcheck.  The bugcheck was: 0x0000007f (0x0000000000000008, 0x0000000080050031, 0x00000000000006f8, 0xfffff9600025c28c). A dump was saved in: C:\Windows\MEMORY.DMP.

 

Decided to examine this further, so started WinDbg (x64) and opened:

C:\Windows\Memory.dmp

This message was displayed:

BugCheck 7F, {8, 80050031, 6f8, fffff9600025c28c}

 

Checked for more details with:

!analyze -v

Part of the result:

*****************************************************************
*                                                               *
*                      Bugcheck Analysis                        *
*                                                               *
*****************************************************************

UNEXPECTED_KERNEL_MODE_TRAP (7f)
This means a trap occurred in kernel mode, and it's a trap of a kind
that the kernel isn't allowed to have/catch (bound trap) or that
is always instant death (double fault).  The first number in the
bugcheck params is the number of the trap (8 = double fault, etc)
Consult an Intel x86 family manual to learn more about what these
traps are. Here is a *portion* of those codes:
If kv shows a taskGate
use .tss on the part before the colon, then kv.
Else if kv shows a trapframe
use .trap on that value
Else
.trap on the appropriate frame will show where the trap was taken
(on x86, this will be the ebp that goes with the procedure KiTrap)
Endif
kb will then show the corrected stack.
Arguments:
Arg1: 0000000000000008, EXCEPTION_DOUBLE_FAULT
Arg2: 0000000080050031
Arg3: 00000000000006f8
Arg4: fffff9600025c28c

 

Decided to search online for: UNEXPECTED_KERNEL_MODE_TRAP (7f) exception double fault

And found this article: Bug Check 0x7F: UNEXPECTED_KERNEL_MODE_TRAP

 

It mentioned that a common cause of a double fault is kernel stack overflow.

So I checked the stack usage with:

!stackusage

The result was:

Stack Usage By Function
===================================================================

Size     Count  Module
0x00001500        14  win32k!xxxDesktopWndProcWorker
0x00000DD0        13  win32k!xxxInterSendMsgEx
0x00000D20        14  win32k!xxxReceiveMessage
0x000009C0        13  win32k!xxxRealSleepThread
0x000008F0        13  win32k!xxxSendMessageTimeout
0x000004E0        13  win32k!xxxBeginPaint
0x00000410        13  win32k!xxxSendNCPaint
0x00000400         2  win32k!RGNOBJAPI::bSubtractComplex
0x00000380         1  win32k!CalcVisRgnWorker
0x000002A0        14  win32k!xxxDesktopWndProc
0x00000270        13  win32k!xxxSleepThread
0x000000E0         1  win32k!xxxRealInternalGetMessage
0x00000090         1  win32k!xxxHandleDesktopMessages
0x00000090         1  win32k!GetDCEx
0x00000080         1  win32k!xxxDesktopThread
0x00000080         1  win32k!RGNOBJAPI::bSubtract
0x00000060         1  win32k!xxxBeginPaint
0x00000050         1  win32k!GreSubtractRgnRectList
0x00000040         1  win32k!xxxInternalGetMessage
0x00000030         1  win32k!xxxCreateSystemThreads
0x00000030         1  win32k!CalcVisRgn
0x00000030         1  win32k!NtUserCallNoParam

Total Size: 0x00005CA0


Stack Usage By Module
===================================================================

Size     Count  Module
0x00005CA0       134  win32k

Total Size: 0x00005CA0

 

Noticed that 0x5CA0 = 23712 bytes is very close to the 24 KB (0x6000 = 24576 bytes) kernel stack limit for 64-bit Windows, leaving only 864 bytes (0x360) of headroom.
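To double-check the arithmetic, here is a small Python sketch that re-adds the Size column from the !stackusage output and compares the total with a 24 KB (0x6000) kernel stack. It assumes the exact column layout shown above (size, count, function) and uses the 24 KB limit stated above; it does not query the dump itself.

```python
KERNEL_STACK_LIMIT = 0x6000  # 24 KB, the 64-bit kernel stack limit mentioned above

# "Stack Usage By Function" rows copied from the !stackusage output
ROWS = """\
0x00001500        14  win32k!xxxDesktopWndProcWorker
0x00000DD0        13  win32k!xxxInterSendMsgEx
0x00000D20        14  win32k!xxxReceiveMessage
0x000009C0        13  win32k!xxxRealSleepThread
0x000008F0        13  win32k!xxxSendMessageTimeout
0x000004E0        13  win32k!xxxBeginPaint
0x00000410        13  win32k!xxxSendNCPaint
0x00000400         2  win32k!RGNOBJAPI::bSubtractComplex
0x00000380         1  win32k!CalcVisRgnWorker
0x000002A0        14  win32k!xxxDesktopWndProc
0x00000270        13  win32k!xxxSleepThread
0x000000E0         1  win32k!xxxRealInternalGetMessage
0x00000090         1  win32k!xxxHandleDesktopMessages
0x00000090         1  win32k!GetDCEx
0x00000080         1  win32k!xxxDesktopThread
0x00000080         1  win32k!RGNOBJAPI::bSubtract
0x00000060         1  win32k!xxxBeginPaint
0x00000050         1  win32k!GreSubtractRgnRectList
0x00000040         1  win32k!xxxInternalGetMessage
0x00000030         1  win32k!xxxCreateSystemThreads
0x00000030         1  win32k!CalcVisRgn
0x00000030         1  win32k!NtUserCallNoParam
"""


def summarize(rows: str) -> tuple[int, int]:
    """Return (total bytes, total count) for rows of hex size, count and function name."""
    total_size = 0
    total_count = 0
    for line in rows.splitlines():
        size, count, _function = line.split(maxsplit=2)
        total_size += int(size, 16)
        total_count += int(count)
    return total_size, total_count


total, frames = summarize(ROWS)
print(f"Total: {total:#x} ({total} bytes), count {frames}")
print(f"Headroom: {KERNEL_STACK_LIMIT - total} bytes ({KERNEL_STACK_LIMIT - total:#x})")
```

The totals match the Total Size (0x5CA0) and the module Count (134) reported by WinDbg.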

The most likely cause for the BSOD was indeed kernel stack overflow.

 

Examined the call stack with:

kf

Part of the result:

#   Memory  Child-SP          RetAddr           Call Site
00           fffff800`0449ed28 fffff800`030889e9 nt!KeBugCheckEx
01         8 fffff800`0449ed30 fffff800`03086eb2 nt!KeSynchronizeExecution+0x3d39
02       140 fffff800`0449ee70 fffff960`0025c28c nt!KeSynchronizeExecution+0x2202
03   20f4110 fffff880`06592f80 fffff960`0025c1c1 win32k!RGNOBJAPI::bSubtractComplex+0x44
04       400 fffff880`06593380 fffff960`001d3379 win32k!RGNOBJAPI::bSubtract+0x871
05        80 fffff880`06593400 fffff960`0014d7c4 win32k!GreSubtractRgnRectList+0x49
06        50 fffff880`06593450 fffff960`000c5c47 win32k!CalcVisRgnWorker+0x314
07       380 fffff880`065937d0 fffff960`000d23be win32k!CalcVisRgn+0x97
08        30 fffff880`06593800 fffff960`00056cf6 win32k!GetDCEx+0x50e
09        90 fffff880`06593890 fffff960`0009ede7 win32k!xxxBeginPaint+0x192
0a        60 fffff880`065938f0 fffff960`0009ead5 win32k!xxxDesktopWndProcWorker+0x2f7
0b       180 fffff880`06593a70 fffff960`00094013 win32k!xxxDesktopWndProc+0x55
0c        30 fffff880`06593aa0 fffff960`000def3f win32k!xxxReceiveMessage+0x6cb

 

Noticed the very large stack frame difference at:

win32k!RGNOBJAPI::bSubtractComplex+0x44

 

Examined details about win32k.sys with:

lmDvm win32k

Part of the result:

Image path: \SystemRoot\System32\win32k.sys
Image name: win32k.sys
Timestamp:        Mon Sep 12 22:37:06 2016 (57D711F2)
CheckSum:         00314AE8
ImageSize:        00326000
File version:     6.1.7601.23545
Product version:  6.1.7601.23545
File flags:       0 (Mask 3F)
File OS:          40004 NT Win32
File type:        3.7 Driver
File date:        00000000.00000000
Translations:     0409.04b0
CompanyName:      Microsoft Corporation
ProductName:      Microsoft® Windows® Operating System
InternalName:     win32k.sys
OriginalFilename: win32k.sys
ProductVersion:   6.1.7601.23545
FileVersion:      6.1.7601.23545 (win7sp1_ldr.160912-1137)
FileDescription:  Multi-User Win32 Driver
LegalCopyright:   © Microsoft Corporation. All rights reserved.

 

Noticed that win32k.sys had been updated recently (the timestamp above is from September 2016).

Knowing that win32k.sys interacts with the graphics driver, I decided to update the graphics driver to the latest version.

Unattended Windows setup may fail due to long computer name

While testing unattended Windows deployment I encountered another problem.

After the first installation reboot, Windows setup would fail with the error message:

Windows could not parse or process unattend answer file [C:\Windows\Panther\unattend.xml] for pass [specialize]. The answer file is invalid.

 

When I closed the dialog box, the computer restarted.

 

Then it failed with this error message:

The computer restarted unexpectedly or encountered an unexpected error. Windows installation cannot proceed. To install Windows, click "OK" to restart the computer, and then restart the installation.

 

Started troubleshooting by booting into Windows PE.

Started examining the log files under:

C:\Windows\Panther

 

Found these error messages in setupact.log:

IBS    The provided unattend file is not valid; hrResult = 0x80220005
IBS    Callback_Unattend_InitEngine:The provided unattend file [C:\Windows\Panther\unattend.xml] is not a valid unattended Setup answer file; hr = 0x1, hrSearched = 0x1, hrDeserialized = 0x0, hrImplicitCtx = 0x0, hrValidated = 0x1, hrResult = 0x80220005
IBS    UnattendErrorFromResults: Error text = Windows could not parse or process unattend answer file [C:\Windows\Panther\unattend.xml] for pass [specialize]. The answer file is invalid.
IBS    Callback_Unattend_InitEngine:An error occurred while finding/loading the unattend file; hr = 0x1, hrResult = 0x80220005

 

This didn’t reveal the exact cause of the error, but it inspired me to examine unattend.xml.

I transferred unattend.xml to my work computer and then tried opening the unattend.xml file with Windows System Image Manager.

 

However this failed with:


Validation error on D:\ToBeDeleted\Fail_Info\Panther\unattend.xml, line 18, column 32.

Details: The 'wasPassProcessed' attribute is not declared.

 

Opened the unattend.xml file in an editor and removed all instances of:

 wasPassProcessed="true"
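If you need to do this often, the attribute can also be stripped with a few lines of Python. This is a minimal sketch assuming the attribute always appears exactly as wasPassProcessed="true"; the function name is my own, and it is best run on a copy of the file, since Windows setup wrote the original.

```python
import re
from pathlib import Path

# The attribute (with its leading whitespace) that Windows SIM refuses to load.
WAS_PASS_PROCESSED = re.compile(r'\s+wasPassProcessed="true"')


def strip_was_pass_processed(xml_text: str) -> str:
    """Remove every wasPassProcessed="true" attribute from the XML text."""
    return WAS_PASS_PROCESSED.sub("", xml_text)


def clean_file(path: str) -> None:
    """Rewrite a copy of unattend.xml in place (assumes the file is UTF-8)."""
    file = Path(path)
    file.write_text(strip_was_pass_processed(file.read_text(encoding="utf-8")), encoding="utf-8")


# Illustrative element; the real attribute can appear on several elements.
sample = '<settings pass="specialize" wasPassProcessed="true">'
print(strip_was_pass_processed(sample))
```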

 

After that the file could be opened in Windows SIM, which now displayed the real error:

The 'ComputerName' element is invalid - The value 'DeployTestPhysical' is invalid according to its datatype 'ComputerNameType' - The actual length is greater than the MaxLength value.

 

So the problem was that the computer name was too long: 'DeployTestPhysical' is 18 characters, but the limit is 15.

 

The unattend.xml file was modified by a custom program just before deployment on the target computer.

This custom program did not prevent computer names longer than 15 characters.
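A check like the following in the custom program would have caught the problem before deployment. It is a minimal Python sketch, not the actual program: the function name is hypothetical, and it only enforces the 15-character limit without validating which characters are allowed.

```python
MAX_COMPUTER_NAME_LENGTH = 15  # limit for the ComputerName setting, as described above


def check_computer_name(name: str) -> None:
    """Raise ValueError if name is longer than the ComputerName limit.

    Only the length is checked; other naming rules are not enforced here.
    """
    if len(name) > MAX_COMPUTER_NAME_LENGTH:
        raise ValueError(
            f"Computer name {name!r} is {len(name)} characters; "
            f"the maximum is {MAX_COMPUTER_NAME_LENGTH}"
        )


check_computer_name("DeployTest")  # 10 characters: accepted
```

Calling check_computer_name("DeployTestPhysical") raises a ValueError, since that name is 18 characters.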

The problem was fixed by updating the custom program.

Conclusion

Special care should be taken when modifying unattend.xml by scripts or programs.

An invalid unattend.xml file may cause problems during Windows setup, which are not immediately obvious.

Unattended Windows setup may fail due to wrongly encoded password

While testing unattended Windows deployment I encountered a problem.

After the first installation reboot, Windows setup would fail with an empty dialog box.

 

Followed by the error message:


Windows could not complete the installation. To install Windows on this computer, restart the installation.

 

Then the machine would continually reboot and show the last message.

 

Started troubleshooting by booting into Windows PE.

Then examined setuperr.log and setupact.log under:

C:\Windows\Panther\

However these files contained no useful clues.

 

I checked the other files left by Windows setup, and examined the files under:

C:\Windows\Panther\UnattendGC\

Found this in setuperr.log:

[oobeldr.exe] [Action Queue] : Unattend action failed with exit code 4
[oobeldr.exe] Execution of unattend GCs failed; hr = 0x0; pResults->hrResult = 0x8030000b
[oobeldr.exe] User input error was detected in unattend file. Error: [0x0]

[windeploy.exe] Command [%windir%\system32\oobe\oobeldr.exe /system] failed with exit code [0x8030000b]
[windeploy.exe] Failure occured during online installation.  Online installation cannot complete at this time.; hr = 0x8030000b

 

But it was a warning in setupact.log which revealed the cause behind the error:

[Shell Unattend] Failed to decode password (0x8007000d)

 

My initial understanding of the user password format in AutoUnattend.xml turned out to be wrong.

This problem was not discovered by Windows System Image Manager, because AutoUnattend.xml was updated just before deployment by a custom program on the target computer.

 

Local user passwords are encoded like this in AutoUnattend.xml:

1. Text is encoded as Unicode (UTF-16 little-endian).

2. Then a “Password” string is appended to the password.
Example: The password “1234” is represented as “1234Password”

3. Finally the password string is base64 encoded.
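As a sketch of the three steps above (assuming the value goes into a Password element that is marked with PlainText set to false), here is a Python encoder, plus a decoder that rejects values not following this format. The decoder is handy for checking the output of a custom program before deployment. For reference, 0x8007000d in the warning is the HRESULT form of Win32 error 13, ERROR_INVALID_DATA. The function names are my own, not part of any Windows tool.

```python
import base64

SUFFIX = "Password"  # suffix appended to local user passwords, per the steps above


def encode_password(plain: str) -> str:
    """Encode a local user password: append the suffix, UTF-16LE encode, then base64."""
    raw = (plain + SUFFIX).encode("utf-16-le")
    return base64.b64encode(raw).decode("ascii")


def decode_password(encoded: str) -> str:
    """Reverse encode_password.

    Raises ValueError (binascii.Error and UnicodeDecodeError are subclasses)
    if the value is not valid base64, not valid UTF-16LE, or lacks the suffix.
    """
    text = base64.b64decode(encoded, validate=True).decode("utf-16-le")
    if not text.endswith(SUFFIX):
        raise ValueError(f"missing {SUFFIX!r} suffix")
    return text[: -len(SUFFIX)]


print(encode_password("1234"))
```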

Problems when reusing AutoUnattend.xml with new Windows image

I recently experienced problems when reusing AutoUnattend.xml, after having upgraded the Windows image:

From: Windows 10 Enterprise 2015 LTSB

To: Windows 10 Enterprise 2016 LTSB

 

The unattended installation would start and run, but eventually failed with:


Windows could not apply unattend settings during pass [offlineServicing].

 

Examined the installation logfiles under:

\$WINDOWS.~BT\Sources\Panther

 

setuperr.log only contained:

2016-10-20 15:20:20, Error      [0x0606ae] IBS    [SetupCl library] Required profile hive does not exist: [\??\D:\WINDOWS\system32\config\systemprofile\NTUSER.DAT].
2016-10-20 15:20:37, Error      [0x0604a7] IBS    InstantiateCBSUnattendPass: dism.exe returned with failing exit code -2146498555
2016-10-20 15:20:37, Error      [0x060431] IBS    Callback_UnattendInitiatePass: An error occurred while initiating unattend passes; hr = 0x80004005

 

setupact.log contained no additional useful information.

 

However cbs_unattend.log contained an explanation:

2016-10-20 15:20:36, Error                 DISM   DISM Package Manager: PID=2348 TID=2368 Failed opening package Microsoft-Windows-Foundation-Package~31bf3856ad364e35~amd64~~10.0.10240.16384. - CDISMPackageManager::Internal_CreatePackageByName(hr:0x800f0805)
2016-10-20 15:20:36, Error                 DISM   DISM Package Manager: PID=2348 TID=2368 Failed to get the underlying cbs package. - CDISMPackageManager::OpenPackageByName(hr:0x800f0805)
2016-10-20 15:20:36, Error                 DISM   DISM Package Manager: PID=2348 TID=2368 The specified package is not valid Windows package. - GetCbsErrorMsg
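The exit code in setuperr.log and the hr value in cbs_unattend.log are the same error: -2146498555 is 0x800F0805 when read as an unsigned 32-bit value. A small Python sketch (my own helpers, not part of any Windows tool) for that conversion and for splitting an HRESULT into its severity, facility and code fields:

```python
def exit_code_to_hresult(code: int) -> int:
    """Reinterpret a signed 32-bit exit code (as logged by setup) as an unsigned HRESULT."""
    return code & 0xFFFFFFFF


def split_hresult(hr: int) -> tuple[int, int, int]:
    """Return (severity bit, 11-bit facility, 16-bit code) of a 32-bit HRESULT."""
    return (hr >> 31) & 0x1, (hr >> 16) & 0x7FF, hr & 0xFFFF


hr = exit_code_to_hresult(-2146498555)
print(f"{hr:#010x}", split_hresult(hr))
```

This makes it easier to search the other log files for the matching hex value.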

 

I wondered why the Microsoft-Windows-Foundation-Package could not be opened, so I went back to Windows System Image Manager.


There I noticed that the package was unknown, because the version numbers had changed.

Be aware that I had run: Tools -> Validate Answer File

(Which did not show warnings about this)

 

Added the new Microsoft-Windows-Foundation-Package

Then copied all the settings

After verifying that all settings had been copied, I deleted the unknown package.

(It would probably have been easier to update the version number in AutoUnattend.xml, which I recommend trying first)

 

Then ran: Tools -> Validate Answer File

Which now showed these warnings:

Windows Feature is enabled but one or more of its dependencies have not been enabled in the answer file.

Packages/Foundation/amd64_Microsoft-Windows-Foundation-Package_10.0.14393.0__31bf3856ad364e35_/Client-DeviceLockdown/Client-EmbeddedBootExp
Packages/Foundation/amd64_Microsoft-Windows-Foundation-Package_10.0.14393.0__31bf3856ad364e35_/Client-DeviceLockdown/Client-EmbeddedLogon
Packages/Foundation/amd64_Microsoft-Windows-Foundation-Package_10.0.14393.0__31bf3856ad364e35_/Client-DeviceLockdown/Client-EmbeddedShellLauncher

 

I was puzzled by this until I found the answer here:

http://blog.theatticnetwork.net/2014/08/windows-unattend-file-notes/

 

I had to right-click the feature in question and choose: Enable Parent Features


 

After making these changes new configuration sets / deployment images could be installed successfully.

Conclusion

When reusing AutoUnattend.xml with a new Windows image, please look for unknown packages in Windows SIM in addition to validating the answer file.

If any unknown packages are found, please update the version numbers in AutoUnattend.xml.
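Updating the version number can also be scripted. This is a minimal Python sketch assuming the package is referenced by assemblyIdentity elements with a version attribute in the default unattend namespace, as in the files Windows SIM produces; the function name is my own. ElementTree drops comments and the XML declaration, so compare the result with the original (and validate it in Windows SIM) before using it.

```python
import xml.etree.ElementTree as ET

UNATTEND_NS = "urn:schemas-microsoft-com:unattend"
WCM_NS = "http://schemas.microsoft.com/WMIConfig/2002/State"  # assumed URI of the wcm prefix

# Keep the usual prefixes when the tree is serialized again.
ET.register_namespace("", UNATTEND_NS)
ET.register_namespace("wcm", WCM_NS)


def update_package_version(xml_text: str, package: str, new_version: str) -> str:
    """Set the version attribute on every assemblyIdentity element naming `package`."""
    root = ET.fromstring(xml_text)
    for ident in root.iter(f"{{{UNATTEND_NS}}}assemblyIdentity"):
        if ident.get("name") == package:
            ident.set("version", new_version)
    return ET.tostring(root, encoding="unicode")


# Illustrative fragment using the version numbers from this post.
SAMPLE = (
    '<unattend xmlns="urn:schemas-microsoft-com:unattend"><servicing><package action="configure">'
    '<assemblyIdentity name="Microsoft-Windows-Foundation-Package" version="10.0.10240.16384" '
    'processorArchitecture="amd64" publicKeyToken="31bf3856ad364e35" language="" />'
    "</package></servicing></unattend>"
)
print(update_package_version(SAMPLE, "Microsoft-Windows-Foundation-Package", "10.0.14393.0"))
```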

Separate, physical trackpoint buttons on Lenovo Thinkpad T440p

The Lenovo Thinkpad T440p (and other models of that generation) is delivered with a touchpad, without separate physical left, middle and right buttons.

Instead the entire pad clicks and reacts depending on the area touched.

 

In my subjective opinion these buttons feel spongy and imprecise.

In use it’s common to make mistakes by clicking a different button than intended.

This makes the laptop less productive and frustrating to use.

 

However it’s possible to replace the touchpad with the one from the Lenovo Thinkpad T450, which has 3 separate, physical trackpoint buttons.


 

The first challenge is getting the right replacement part, with the dimensions 10 cm x 7.5 cm.

It’s not available as a separate part from Lenovo, but is sold as part of the keyboard bezel.

The part number I found and used was: 00HN550

 

Be careful with online sellers claiming to sell touchpads that fit a wide range of Thinkpad models.

They may fit electrically, but possibly not physically.

If you are considering performing this replacement, please verify that the part fits your particular Thinkpad model.

 

The next challenge is disassembling the laptop and performing the replacement.

For this, please refer to the hardware maintenance manual and online guides.

 

The final challenge is to solve driver problems on Windows.

The touchpad's hardware ID is determined by the motherboard, which remains unchanged, so the hardware ID stays the same after the replacement.

The default Synaptics Pointing Device drivers are not compatible and won’t work.

 

Simplest way to solve the driver problems on Windows:

1. Connect a USB mouse, because the trackpoint won’t work reliably until these steps have been completed.

2. Uninstall the Synaptics Pointing Device drivers using Programs and Features.

3. Restart the computer.

4. Remove any remaining Synaptics components by opening Control Panel -> Mouse

If asked: Do you want to uninstall the Synaptics driver now?

Then answer Yes, and click OK in the dialogs that follow.

5. Restart the computer.

6. Now the trackpoint and 3 physical buttons should work with a default mouse driver.

(Be aware that I have disabled the rest of the touchpad, so I don’t know if it works with the default mouse driver)

 

With Windows 10 extra steps are needed, because it can automatically install incompatible drivers.

This can be prevented by downloading and running the “Show or hide updates” program (wushowhide.diagcab) from:

https://support.microsoft.com/en-us/kb/3073930?utm_source=twitter

 

1. Click: Advanced


2. Deselect: Apply repairs automatically


3. Click: Next

4. Click: Hide updates


5. Select: Synaptics – Pointing Drawing – Synaptics Pointing Device


6. Click: Next

7. Confirm by clicking: Next


8. Click: Close the troubleshooter


 

Be aware that fully compatible drivers can be downloaded and installed from Lenovo, which will enable full touchpad functionality.

However I’m currently satisfied with a trackpoint and 3 physical buttons, so I have not found the correct drivers or procedure yet.

Windows Server may shut down if system time is wrong

I was testing some Reporting Services reports, but the test data was fairly old and some datasets were linked to the date, so I wasn’t getting proper results.

Instead of modifying the test data or the queries, the easy solution was to disable time synchronization and change the system time to 3 years in the past.

 

But changing the system time led to an interesting problem:

Suddenly the server shut down automatically with no warning.

No chance was given to save open documents or to cancel shutdown.

 

I started up the virtual machine again and noticed that the system time had been reset.

Found this explanation for the behavior in Event Viewer:

Log Name:      System
Source:        User32
Date:          27-09-2013 11:14:19
Event ID:      1074
Task Category: None
Level:         Information
Keywords:      Classic
User:          SYSTEM
Computer:      TestServer
Description:
The process C:\Windows\system32\wlms\wlms.exe (TESTSERVER) has initiated the shutdown of computer TESTSERVER on behalf of user NT AUTHORITY\SYSTEM for the following reason: Other (Planned)
Reason Code: 0x80000000
Shutdown Type: shutdown
Comment: The license period for this installation of Windows has expired. The operating system is shutting down.

 

Be aware that Windows was activated and remained activated after it was restarted.

This happened with Windows Server 2012 R2 Standard, but I suspect it also affects other versions and editions.

Conclusion

Apparently Windows doesn’t tolerate running with a wrong system time, even for testing purposes.

Fortunately wlms.exe is only present on evaluation versions of Windows, so this particular issue should never occur on production systems.

In any case, a correct system time is always desired for a number of things like logging dates, scheduling, synchronization, certificate validation and so on.

Laptop computer freezes when power supply is connected

I recently experienced a problem on a Lenovo Thinkpad T440p computer running Windows 10:

If the power supply was connected when the computer was running it would seemingly freeze: Mouse and keyboard became non-responsive.

However it was not a full freeze or crash, because music from a media player would continue.

If the power supply was disconnected, then the computer became responsive again.

 

One way to avoid the problem was to connect the power when the computer was sleeping or turned off.

 

It was an annoying problem and some people would assume the computer had crashed.

So I searched for a solution and found this:

https://www.ifixit.com/Answers/View/70872/How+to+fix+a+notebook+that+freezes+when+plugged+to+AC+power

 

The solution that worked for me was to:

  1. Open Device Manager
  2. Find Batteries -> Microsoft AC Adapter
  3. Right click and disable the Microsoft AC Adapter


Syntax errors due to wrong, but similar looking characters

I recently encountered problems when testing unattended Windows deployment.

A PowerShell script for setting up ShellLauncher did not seem to run.

 

Found this log message:


C:\Windows\system32>Powershell.exe –ExecutionPolicy RemoteSigned -NoProfile -NonInteractive c:\Install\PowerShell\ShellLauncher-setup.ps1
–ExecutionPolicy : The term '–ExecutionPolicy' is not recognized as the name of a cmdlet, function, script file, or operable program. Check the spelling of the name, or if a path was included, verify that the path is correct and try again

 

This puzzled me because –ExecutionPolicy seemed correct.

 

When running the script manually from cmd.exe the cause of the problem became apparent:


ûExecutionPolicy : The term 'ûExecutionPolicy' is not recognized as the name of a cmdlet,
function, script file, or operable program. Check the spelling of the name, or if a path was
included, verify that the path is correct and try again.

 

I examined which character was actually used instead of the minus sign. It was – (en dash, byte value 150 or 0x96) in Windows code page 1252, Latin 1 (ANSI).

PowerShell expects minus for options and interprets en dash as part of a cmdlet, function, script or program name.
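A quick Python check reproduces both observations: the en dash is byte 150 (0x96) in code page 1252, and the same byte shown with an OEM console code page comes out as û, which matches the cmd.exe output above. The post doesn't say which OEM code page the console used; 437 and 850 are assumptions here, and both map 0x96 to û.

```python
EN_DASH = "\u2013"  # the character that ended up in the script instead of "-"

ansi_bytes = EN_DASH.encode("cp1252")  # Windows code page 1252 (ANSI)
print(ansi_bytes, ansi_bytes[0])       # b'\x96' 150

# The same byte interpreted with an OEM console code page (assumed 437 or 850)
print(ansi_bytes.decode("cp437"), ansi_bytes.decode("cp850"))  # û û
```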

 

Conclusion

No matter which code page and encoding are used, many characters look similar to human eyes but not to a computer.

If you encounter syntax problems for code or data that looks correct, I recommend checking the encoding and the actual character bytes with a hex editor.
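Besides a hex editor, a small script can point out suspicious characters directly. This Python sketch (the function name and the list of lookalikes are my own choices, not exhaustive) reports the line, column and Unicode name of every dash-like character that is not the ASCII hyphen-minus:

```python
import unicodedata

# Characters easily mistaken for the ASCII hyphen-minus "-" (not an exhaustive list).
DASH_LOOKALIKES = {"\u2010", "\u2011", "\u2012", "\u2013", "\u2014", "\u2015", "\u2212"}


def find_lookalike_dashes(text: str) -> list[tuple[int, int, str]]:
    """Return (line, column, Unicode name) for each dash lookalike, 1-based."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch in DASH_LOOKALIKES:
                hits.append((line_no, col, unicodedata.name(ch)))
    return hits


command = "Powershell.exe \u2013ExecutionPolicy RemoteSigned -NoProfile"
print(find_lookalike_dashes(command))  # [(1, 16, 'EN DASH')]
```

Note that the file must be decoded with the correct encoding first; otherwise the en dash may already have been turned into a different character, as in the cmd.exe output above.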

Enable ClearType on Windows 10

I noticed jagged text on a computer running Windows 10.

It was particularly noticeable with Firefox and Chrome.

 

I opened Advanced display settings and tried to open ClearType text.


 

However this failed with this error message:


C:\Windows\system32\cttune.exe

Windows cannot access the specified device, path, or file. You may not have the appropriate permissions to access the item.

 

Instead I tried to start cttune.exe directly.

 

This was possible and I noticed that ClearType was disabled (as expected).


 

I selected: Turn on ClearType


 

And completed the guide to adjust ClearType optimally.


 

That solved the problem. After enabling and adjusting ClearType, text in all programs looked much better without jagged edges.

Error when exporting Hyper-V virtual machine to USB memory stick

Recently I needed to export a Hyper-V virtual machine to another computer.

I decided to export it directly to a USB memory stick.

However it failed consistently with this error message:


An error occurred while attempting to export the virtual machine.

Failed to copy file during export.

Failed to copy file from 'C:\Users\Public\Documents\Hyper-V\Virtual Hard Disks\DeploymentTest5New4.vhdx' to 'E:\VM\DeploymentTest5\Virtual Hard Disks\DeploymentTest5New4.vhdx': One or more arguments are invalid (0x80070057).

 

I examined the file system on the USB stick and discovered that it was FAT32.

FAT32 has a maximum file size of 4 GB minus 1 byte (4,294,967,295 bytes).

Most virtual hard disk files are likely to be larger than that, and that was also the case here.
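A quick way to check in advance is to compare the file sizes with the FAT32 limit of 2**32 - 1 bytes. This Python sketch (the function names are my own) lists the files in a folder that would not fit on a FAT32 volume:

```python
from pathlib import Path

FAT32_MAX_FILE_SIZE = 2**32 - 1  # 4 GB minus 1 byte


def too_big_for_fat32(size_in_bytes: int) -> bool:
    """True if a file of this size cannot be stored on a FAT32 volume."""
    return size_in_bytes > FAT32_MAX_FILE_SIZE


def oversized_files(folder: str) -> list[Path]:
    """List files under folder (recursively) that exceed the FAT32 file size limit."""
    return [
        p for p in Path(folder).rglob("*")
        if p.is_file() and too_big_for_fat32(p.stat().st_size)
    ]
```

For example, oversized_files(r"C:\Users\Public\Documents\Hyper-V\Virtual Hard Disks") would have listed the .vhdx file before starting the export.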

 

I decided to empty the USB stick and reformat it to NTFS.

Then the virtual machine could be exported without problems.