Debugging One of the World’s Worst Error Messages

So, for a project for a friend with cancer, I was setting up something leveraging NextCloud (which another friend set up an account for me on their local instance for). It isn’t important who the friends were or what the project was but it’s slight backdrop into why I didn’t just use another program.

So, I installed the NextCloud desktop client on a Win10 (x64) machine and went to fire it up. Then came this gloriously (and mundanely) generic error message:

Markering_022

Very descriptive and actionable, yeah? (hint: LoadLibrary actually hints as to what’s going on, if you’ve run into assembly referencing before, but we’ll cover that later in this post.)

Well, the first thought is to go to the Event Viewer to see what happened. Only problem is: No events were logged. Anywhere. It never happened. It never even existed.

O.k., time to get medieval on this. Since the window never renders for the application and this seems to happen on initialisation, we’ll have to hook into the application to catch the event[s] in question.

Enter ADPlus. ADPlus has been around for hot-minute and isn’t, precisely, my go-to for debugging puposes; however, it’s definitely handy when you want to catch things happening on start-up.

So, we need to tell adplus to start and to monitor for an instance of the nextcloud application to loaded/referenced/scheduled by the kernel. (Littletid-bit here: The load happens before it’s reference is available in the process list, which is why we get the results you’ll see later-on.)

To tell adplus to “listen” for an instance of this application, we’ll do something like the following:

PS C:\Program Files (x86)\Windows Kits\10\Debuggers\x64> .\adplus.exe -Crash -pmn "nextcloud.exe" -o "C:\Dumps\NextCloud\"
Starting ADPlus
********************************************************
* *
* ADPLus Flash V 7.01.007 08/11/2011 *
* *
* For ADPlus documentation see ADPlus.doc *
* New command line options: *
* -pmn  - process monitor *
* waits for a process to start *
* -po  - optional process *
* won't fail if this process isn't running *
* -mss  *
* Sets Microsoft's symbol server *
* -r   *
* Runs -hang multiple times *
* *
* ADPlusManager - an additional tool to facilitate *
* the use of ADPlus in distributed environments like *
* computer clusters. *
* Learn about ADPlusManager in ADPlus.doc *
* *
********************************************************

Logs and memory dumps will be placed in C:\Dumps\NextCloud\\20190303_105927_Crash_Mode
Starting to monitor the following processes:
NEXTCLOUD.EXE
Press Enter to stop Monitoring...

Now, that we’ve set adplus to monitor, we fire-up nextcloud.exe. Adplus will attach and create a dump file for the first-chance exception (you can configure it for second-chance only).

So, now we have our dump file. We crack it open in Windbg and let’s look at the stack:

0:000> kncL
# Child-SP RetAddr Call Site
00 000000e2`704fdcc8 00007ff8`5c056db8 ntdll!NtTerminateProcess+0x14
01 000000e2`704fdcd0 00007ff8`5b35d3ba ntdll!RtlExitUserProcess+0xb8
02 000000e2`704fdd00 00007ff8`39a04e48 kernel32!ExitProcessImplementation+0xa
03 000000e2`704fdd30 00007ff8`39a0526e atig6pxx+0x4e48
04 000000e2`704fdef0 00007ff8`1289866c atig6pxx!DrvValidateVersion+0x16
05 000000e2`704fdf20 00007ff8`12898a7d opengl32!pgldrvLoadAndAllocDriverInfo+0x23c
06 000000e2`704fdfb0 00007ff8`128b2ca9 opengl32!LoadAvailableDrivers+0x39d
07 000000e2`704fe630 00007ff8`128b2055 opengl32!wglDescribePixelFormat+0x139
08 000000e2`704fe760 00007ff8`584d4c79 opengl32!wglChoosePixelFormat+0x85
09 000000e2`704fe810 00007ff8`1588eeb0 gdi32full!ChoosePixelFormat+0x39
0a 000000e2`704fe840 00007ff8`1588d540 qwindows+0x3eeb0
0b 000000e2`704fe990 00007ff8`1588eba6 qwindows+0x3d540
0c 000000e2`704fec50 00007ff8`1585c48b qwindows+0x3eba6
0d 000000e2`704fecf0 00007ff8`1585cdba qwindows+0xc48b
0e 000000e2`704fed50 00007ff8`1585ba5c qwindows+0xcdba
0f 000000e2`704fed80 00007ff8`11724d20 qwindows+0xba5c
10 000000e2`704fedf0 00007ff8`019b7770 Qt5Gui!QOpenGLContext::create+0x30
11 000000e2`704fee20 00007ff8`12d012da Qt5WebEngineCore!QtWebEngineCore::initialize+0x120
12 000000e2`704fee70 00007ff8`116ebd4d Qt5Core!QCoreApplicationPrivate::init+0x52a
13 000000e2`704fef30 00007ff8`131035cf Qt5Gui!QGuiApplicationPrivate::init+0x3d
14 000000e2`704ff190 00007ff8`130ff8bb Qt5Widgets!QApplicationPrivate::init+0xf
15 000000e2`704ff1c0 00007ff7`b1db1aa7 Qt5Widgets!QApplication::QApplication+0x5b
16 000000e2`704ff1f0 00007ff7`b1d039d4 nextcloud+0xc1aa7
17 000000e2`704ff300 00007ff7`b1cf1878 nextcloud+0x139d4
18 000000e2`704ff630 00007ff7`b1dc6524 nextcloud+0x1878
19 000000e2`704ff880 00007ff7`b1dc59c2 nextcloud+0xd6524
1a 000000e2`704ff910 00007ff8`5b3581f4 nextcloud+0xd59c2
1b 000000e2`704ff950 00007ff8`5c05a251 kernel32!BaseThreadInitThunk+0x14
1c 000000e2`704ff980 00000000`00000000 ntdll!RtlUserThreadStart+0x21

Normally, you might have to go frame by frame to understand what’s going on but since we don’t have the symbols for nextcloud or qwindow, we can kind of wing it and still figure out what’s going.

As the frames are read from bottom to top (in terms of processing order), the frames of interest to us will be 09 to 02 (which are in bold above). This shows us that GDI and/or OpenGL are looking for a driver that maps to a specific pixel format. We hit a frame in the ATI driver code that we can’t resolve to a method but then we see the call to terminate the process; so, we can probably discern that this is where the error screen is shown (the one with the LoadLibrary error).

So, if we going perusing around the interwebs, we should be able to confirm our suspicions and we’ll find this on MSDN, which has the following notation: “If the function fails, the return value is NULL. To get extended error information, call GetLastError.” Since this happens in native code and we’re unable to ascertain the method called in the ATI driver, it’s safe for us to assume that the ATI driver attempts to load a specified assembly and that assembly doesn’t exist. Also, since the exception is trapped and throw into the window, this is why the .excr record doesn’t exist on the stack. (Yay for native and scopes.)

Easiest solution: Install the nextcloud client on a different machine and do what I need to do¬†or attempt to update the ATI video driver and hope that it resolves the issue. The former is a much more assured way to resolve it, though, so that’s what I did.

When developing programs, it’s important to consider that exceptions might occur in third-party dependencies. This mightn’t be the most productive error messages for users to figure out what’s going on. In fact, I whole-heartedly believe that if I hadn’t ever dealt with loading libraries or debugging, I’d have no clue as to why the program was broken.

This makes the error message one of the world’s crappiest error messages simply because it is inactionable by a user, if a user hasn’t the necessary tools and experience explained above.

So, then, what is the point of the error message?

I get the point, don’t get me wrong, in that lets someone know something went wrong. However, the general nature of the error message, itself, leads much to be desired. What library was it trying to load? Why did it fail? I’m sure I could eventually find a mapping for the NT Status errors that relate to this but why should I, as an end-user, have to resort to that level of depth just to figure out the problem?

The problem is shared between the application writers (nextcloud), GDI, OpenGL, and ATI. Each layer should expect an exception to be possible and to trap and raise it with the requisite information to make it actionable.

This is just… “Something happened. Good luck!”

Thanks for coming to this TEDTalk and I hope that it hasn’t bored you to death, despite it’s brevity.