How I Shot My Own Self in the Foot with Semaphores….

…and the lessons I’ve learned. (Seemed like it needed a continuation, yeah?)

So, one of my many (now backlogged) personal projects is to write a Windows Service that copies files from a source directory to a target directory (or directories) and then encrypts the files via the LSA account (if so configured in the app config file). The idea was bourne out of two premises:

  1. As someone who doesn’t currently have a cloud provider, I still need disaster recovery and, if I’m in this position, surely there’s at least one other person in the world in the same boat. Maybe.
  2. This is, perhaps, the more crucial part. We take for granted that we have pretty fat pipes with which to fit all of the cats that we could ever want into the tubes and still have room left over. Other areas of the world still don’t have this luxury. What opportunities are available to them, now that we’re in a mobile-first, cloud-first world?

So, since we’re dealing with disks, I wanted to prevent it being an attack surface and/or prevent disk queuing or thrashing. If you’ve ever seen my project for the Windows Service to download the Bing Images, then you’re already familiar with my preferred route of semaphore use, which is the SemaphoreSlim.

So, I write my service, compile it (via Visual Studio), and install on my machine (via InstallUtil). I fire it up and drop a file into the source folder and, when the timer hits, it copies and encrypts the files to the destinations. O.k., so far, so good. I add a second file and nothing happens. Wait a few more minutes, still nothing.

O.k., I shot myself in the foot, somewhere, but nowhere I need to solve the crime, where I’m also the murderer; which isn’t, necessarily, the easiest task in the world.

So, I check task manager. The service is still running, so we didn’t hit a second-chance exception and the service terminated. O.k., I’m not a total idiot, at least (I hope).

I check the Wait Chain Traversal in Task Manager and it says the process is running fine, nothing is blocked.

O.k…. That makes even less sense. Time to debug it. I create a dump of the process in task manager and crack it open in WinDbg.

I’m using a tool called PDE, which you can get from the public OneDrive shared by Andrew Richards here. I’ll be using it to look for a thread with the most amount of frames. Why? Well, just because nothing is happening, it doesn’t mean that nothing is actually happening.

And, wouldn’t you know it, it looks like a dead-lock:

0:000> !pde.deep 50

6 TID:0718 kb kbn kbnL kn knL kpn kPn
# Child-SP RetAddr Call Site
00 00000040`6d9fd158 00007ff8`a198c7ee ntdll!NtWaitForMultipleObjects+0x14
01 00000040`6d9fd160 00007ff8`9720264e KERNELBASE!WaitForMultipleObjectsEx+0xfe
02 00000040`6d9fd460 00007ff8`972024a8 clr!WaitForMultipleObjectsEx_SO_TOLERANT+0x62
03 00000040`6d9fd4c0 00007ff8`972022a1 clr!Thread::DoAppropriateWaitWorker+0x1e4
04 00000040`6d9fd5c0 00007ff8`971ed336 clr!Thread::DoAppropriateWait+0x7d
05 00000040`6d9fd640 00007ff8`9717703f clr!CLREventBase::WaitEx+0xc4
06 00000040`6d9fd6d0 00007ff8`9717700c clr!Thread::Block+0x27
07 00000040`6d9fd700 00007ff8`97176db1 clr!SyncBlock::Wait+0x19d
08 00000040`6d9fd840 00007ff8`8a05bd08 clr!ObjectNative::WaitTimeout+0xe1
09 00000040`6d9fd9c0 00007ff8`8a05bac9 mscorlib_ni!System.Threading.SemaphoreSlim.WaitUntilCountOrTimeout(Int32, UInt32, System.Threading.CancellationToken)+0x68
0a 00000040`6d9fda10 00007ff8`8a05b872 mscorlib_ni!System.Threading.SemaphoreSlim.Wait(Int32, System.Threading.CancellationToken)+0x149
0b 00000040`6d9fdaa0 00007ff8`37a41a12 mscorlib_ni!System.Threading.SemaphoreSlim.Wait()+0x12
0c 00000040`6d9fdad0 00007ff8`8a071333 Felsokning_Services_ShadowService!Felsokning.Services.ShadowService.ShadowService+<>c__DisplayClass35_2.b__4(System.String)+0x3d2
0d 00000040`6d9fdc40 00007ff8`8a060680 mscorlib_ni!System.Threading.Tasks.Parallel+<>c__DisplayClass17_0`1[[System.__Canon, mscorlib]].b__1()+0x293
0e 00000040`6d9fdd00 00007ff8`8a12a540 mscorlib_ni!System.Threading.Tasks.Task.InnerInvokeWithArg(System.Threading.Tasks.Task)+0x20
0f 00000040`6d9fdd30 00007ff8`896a5ab3 mscorlib_ni!System.Threading.Tasks.Task+<>c__DisplayClass176_0.b__0(System.Object)+0x110
10 00000040`6d9fdda0 00007ff8`896a5944 mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x163
11 00000040`6d9fde70 00007ff8`896ef565 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x14
12 00000040`6d9fdea0 00007ff8`896eec93 mscorlib_ni!System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)+0x215
13 00000040`6d9fdf50 00007ff8`896f3011 mscorlib_ni!System.Threading.Tasks.Task.ExecuteEntry(Boolean)+0x73
14 00000040`6d9fdf90 00007ff8`896f2eee mscorlib_ni!System.Threading.Tasks.ThreadPoolTaskScheduler.TryExecuteTaskInline(System.Threading.Tasks.Task, Boolean)+0x31
15 00000040`6d9fdfe0 00007ff8`8a05f8d7 mscorlib_ni!System.Threading.Tasks.TaskScheduler.TryRunInline(System.Threading.Tasks.Task, Boolean)+0x19e
16 00000040`6d9fe080 00007ff8`8a06c938 mscorlib_ni!System.Threading.Tasks.Task.InternalRunSynchronously(System.Threading.Tasks.TaskScheduler, Boolean)+0xa7
17 00000040`6d9fe110 00007ff8`8a06dbb7 mscorlib_ni!System.Threading.Tasks.Parallel.ForWorker[[System.__Canon, mscorlib]](Int32, Int32, System.Threading.Tasks.ParallelOptions, System.Action`1, System.Action`2, System.Func`4, System.Func`1, System.Action`1)+0x428
18 00000040`6d9fe1f0 00007ff8`8a06d46e mscorlib_ni!System.Threading.Tasks.Parallel.ForEachWorker[[System.__Canon, mscorlib],[System.__Canon, mscorlib]](System.Collections.Generic.IEnumerable`1, System.Threading.Tasks.ParallelOptions, System.Action`1, System.Action`2, System.Action`3, System.Func`4, System.Func`5, System.Func`1, System.Action`1)+0xd7
19 00000040`6d9fe2b0 00007ff8`37a4155a mscorlib_ni!System.Threading.Tasks.Parallel.ForEach[[System.__Canon, mscorlib]](System.Collections.Generic.IEnumerable`1, System.Action`1)+0x9e
1a 00000040`6d9fe350 00007ff8`8a071333 Felsokning_Services_ShadowService!Felsokning.Services.ShadowService.ShadowService.b__35_3(System.String)+0xea
1b 00000040`6d9fe3e0 00007ff8`8a060680 mscorlib_ni!System.Threading.Tasks.Parallel+<>c__DisplayClass17_0`1[[System.__Canon, mscorlib]].b__1()+0x293
1c 00000040`6d9fe4a0 00007ff8`8a12a540 mscorlib_ni!System.Threading.Tasks.Task.InnerInvokeWithArg(System.Threading.Tasks.Task)+0x20
1d 00000040`6d9fe4d0 00007ff8`896a5ab3 mscorlib_ni!System.Threading.Tasks.Task+<>c__DisplayClass176_0.b__0(System.Object)+0x110
1e 00000040`6d9fe540 00007ff8`896a5944 mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x163
1f 00000040`6d9fe610 00007ff8`896ef565 mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x14
20 00000040`6d9fe640 00007ff8`896eec93 mscorlib_ni!System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)+0x215
21 00000040`6d9fe6f0 00007ff8`896f3011 mscorlib_ni!System.Threading.Tasks.Task.ExecuteEntry(Boolean)+0x73
22 00000040`6d9fe730 00007ff8`896f2eee mscorlib_ni!System.Threading.Tasks.ThreadPoolTaskScheduler.TryExecuteTaskInline(System.Threading.Tasks.Task, Boolean)+0x31
23 00000040`6d9fe780 00007ff8`8a05f8d7 mscorlib_ni!System.Threading.Tasks.TaskScheduler.TryRunInline(System.Threading.Tasks.Task, Boolean)+0x19e
24 00000040`6d9fe820 00007ff8`8a06c938 mscorlib_ni!System.Threading.Tasks.Task.InternalRunSynchronously(System.Threading.Tasks.TaskScheduler, Boolean)+0xa7
25 00000040`6d9fe8b0 00007ff8`8a06dc4f mscorlib_ni!System.Threading.Tasks.Parallel.ForWorker[[System.__Canon, mscorlib]](Int32, Int32, System.Threading.Tasks.ParallelOptions, System.Action`1, System.Action`2, System.Func`4, System.Func`1, System.Action`1)+0x428
26 00000040`6d9fe990 00007ff8`8a06d46e mscorlib_ni!System.Threading.Tasks.Parallel.ForEachWorker[[System.__Canon, mscorlib],[System.__Canon, mscorlib]](System.Collections.Generic.IEnumerable`1, System.Threading.Tasks.ParallelOptions, System.Action`1, System.Action`2, System.Action`3, System.Func`4, System.Func`5, System.Func`1, System.Action`1)+0x16f
27 00000040`6d9fea50 00007ff8`37a41376 mscorlib_ni!System.Threading.Tasks.Parallel.ForEach[[System.__Canon, mscorlib]](System.Collections.Generic.IEnumerable`1, System.Action`1)+0x9e
28 00000040`6d9feaf0 00007ff8`896a5ab3 Felsokning_Services_ShadowService!Felsokning.Services.ShadowService.ShadowService.StartShadowCopyWork(System.Object)+0x246
29 00000040`6d9fec00 00007ff8`896a5944 mscorlib_ni!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x163
2a 00000040`6d9fecd0 00007ff8`896db0ba mscorlib_ni!System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)+0x14
2b 00000040`6d9fed00 00007ff8`896daeb7 mscorlib_ni!System.Threading.TimerQueueTimer.CallCallback()+0xba
2c 00000040`6d9fed60 00007ff8`89695b05 mscorlib_ni!System.Threading.TimerQueueTimer.Fire()+0x87
2d 00000040`6d9fedd0 00007ff8`97046da3 mscorlib_ni!System.Threading.TimerQueue.FireNextTimers()+0x75
2e 00000040`6d9fee50 00007ff8`97046c58 clr!CallDescrWorkerInternal+0x83
2f 00000040`6d9fee90 00007ff8`9704751d clr!CallDescrWorkerWithHandler+0x4e
30 00000040`6d9feed0 00007ff8`97049d14 clr!MethodDescCallSite::CallTargetWorker+0xf8
31 00000040`6d9fefd0 00007ff8`97047ce1 clr!AppDomainTimerCallback_Worker+0x34
32 00000040`6d9ff0c0 00007ff8`97047c50 clr!ManagedThreadBase_DispatchInner+0x39
33 00000040`6d9ff100 00007ff8`97047b8d clr!ManagedThreadBase_DispatchMiddle+0x6c
34 00000040`6d9ff200 00007ff8`97047d1f clr!ManagedThreadBase_DispatchOuter+0x75
35 00000040`6d9ff290 00007ff8`97049cb6 clr!ManagedThreadBase_FullTransitionWithAD+0x2f
36 00000040`6d9ff2f0 00007ff8`97049c16 clr!AppDomainTimerCallback+0x7c
37 00000040`6d9ff350 00007ff8`9704975f clr!ThreadpoolMgr::AsyncTimerCallbackCompletion+0x7e
38 00000040`6d9ff3a0 00007ff8`9704835c clr!UnManagedPerAppDomainTPCount::DispatchWorkItem+0x1bc
39 00000040`6d9ff440 00007ff8`97048105 clr!ThreadpoolMgr::ExecuteWorkRequest+0x64
3a 00000040`6d9ff470 00007ff8`971fefef clr!ThreadpoolMgr::WorkerThreadStart+0xf5
3b 00000040`6d9ff510 00007ff8`a43181f4 clr!Thread::intermediateThreadProc+0x86
3c 00000040`6d9ffbd0 00007ff8`a538a251 kernel32!BaseThreadInitThunk+0x14
3d 00000040`6d9ffc00 00000000`00000000 ntdll!RtlUserThreadStart+0x21

Threads: 1 of 13
Frames : 50
Command: knL

Right, so what’s the state of the SemaphoreSlim, itself?

0:006> !mdt 000001f45a4f3200
000001f45a4f3200 (System.Threading.SemaphoreSlim)
    m_currentCount:0x0 (System.Int32)
    m_maxCount:0x7fffffff (System.Int32)
    m_waitCount:0x6 (System.Int32)
    m_lockObj:000001f45a4f3240 (System.Object)
    m_waitHandle:NULL (System.Threading.ManualResetEvent)
    m_asyncHead:NULL (System.Threading.SemaphoreSlim+TaskNode)
    m_asyncTail:NULL (System.Threading.SemaphoreSlim+TaskNode)

The properties that we care about are the m_currentCount and the m_waitCount. We can look at the source code for SemaphoreSlim and see that these signify what the problem is:

m_currentCount – The semaphore count, initialized in the constructor to the initial value, every release call increments it and every wait call decrements it as long as its value is positive otherwise the wait will block. Its value must be between the maximum semaphore value and zero.

m_waitCount – The number of synchronously waiting threads, it is set to zero in the constructor and increments before blocking the threading and decrements it back after that. It is used as flag for the release call to know if there are waiting threads in the monitor or not.

So, here’s what happened: I leveraged Parallel.ForEach and for each disk action I required the caller to request wait on the semaphore, until it completes and releases the semaphore. The Semaphore’s initial value was stupid-low: 10. So, as soon as the second file was added, we created a condition in which multiple threads (because of parallelization) were requesting locks on the Semaphore and starving it out of being available for any further work.

In theory, this would be 1 lock for the source directory, 2 locks for the two target directories, 1 lock per file (so 2x2x2)… You can quickly see why this was a bad approach, even though it had the best intentions, at heart.

So, now, I’m refactoring the code to leverage threadpooling (or async tasks) and have definitely learned my lesson on resource starvation. 🙂

Thanks for coming to this TEDTalk and happy debugging!

Wait-Chain Traversal: Or, How We Can Use PowerShell to JIT C# to Load an Unmanaged Assembly (C++), InterOping With Windows APIs, Returning that Data Back to PowerShell

So, as the long-winded title infers, today I’ll be covering something that I wrote a long time ago and have recently re-written (albeit, probably badly) for you to use and/or learn from.

In this case, we’re using a PowerShell script, which JITs C# code, the C# code calls into an unmanaged DLL (C++), and that calls into an Windows API. Once the data has been obtained from the Windows API, we pass the data back from the unmanaged assembly to the managed code (via Marshal) and then return that back to the PowerShell instance to be displayed to the user.

Before we dive into the what we’re doing, we should cover some key concepts. The first is JIT’ing. JIT stands for “Just-In-Time” (Compilation) and the name is slightly a misnomer but we’ll cover that in a second. So, in JIT, what happens is that the run time precompiles the code before it’s ran. This is important because a key-concept in exception handling is the runtime’s seek operation to find a handler for an exception that is thrown. You’ll often see this as a FirstChanceException in a dump file. In PowerShell, we have the added ability to leverage the JIT compilation by passing source code as a type into the App’s Domain. It’s important to distinguish that once the App Domain has been disposed of, the type specified is lost and has to be re-instantiated again.

So, what – exactly – is this code going to be doing? Well, since Windows Vista, the Windows Operating System exposes the Wait Chain Traversal API. You can see a demonstration of this API in Task Manager: Go to the Details tab, right click on a process and click “Analyze Wait Chain”.

Since Windows Server 2016 Core doesn’t include a desktop or any GUI interfaces, a more robust way was needed to obtain the same information in production, to determine if the reason an application wasn’t responding or performing work was because the threads were blocked.

When you run the code, you can tell if this is the case or not by something like the following:

12972   [7464:12972:blocked]->[ThreadWait]->[7464:12968:blocked]->[End]

Where the first thread is blocked by a thread wait on the the second thread, which is also (itself) blocked.

So, first things first, the PowerShell code. Take a peek here to see that. Note that the Source code is contained with a specific character-delimited string @”<code>”@. After that we add a Type, pointing it to the source code we’ve defined and referencing the assemblies that we’ll need for this type to work. Worthy of noting is that when we add this type, it is exposed in PowerShell the same way any normal .NET type is, via the []:: convention.

Note that in the C# source we import the unmanaged DLL and reference the exposed method. In the body of the code, we also construct an IntPtr to reference for the return. So, now, we get to Marshalling.

An IntPtr is, quite literally, a pointer or handle to an object. A pointer is a reference to memory where an object exists and the object is typically delimited by characters to signify the termination of it (e.g.: the end of a string is null-terminated). A handle is roughly the same premise but the handle abstracts memory management from the caller. So, at 200 ticks, the handle could point to address 0x000001 and at 369 ticks, it could point to 0x34778.

Alright, so why this matters is because when we pass from unmanaged code back to managed code, there’s nothing that implicitly tells the managed code where to find the object in native memory; so, we have to pass a pointer to the object back to managed (I believe managed creates it’s own copy and creates an address for that object) and, using that, we can then try to convert the passed object from a native string into a managed string (via Marshalling).

What about this unmanaged code I keep hearing about? Oh, well… You can find that here. Don’t forget to add the headers referenced in stdafx.h, or your compiler will cry bloody murder.

So, how this works is: The script is called via PowerShell. After some magic to verify that we weren’t given junk data, then PowerShell JIT’s the C# code and performs the runtime operations. The compiler loads the unmanaged DLL into memory. The C# code then calls into the unmanaged DLL via the exposed method (declspec/dllexport/cdecl). The unmanaged code performs it’s work and returns the string back to the caller – unaware that the caller is managed code. The managed code creates an IntPtr to reference the return and then Marshal is called to convert the native string into a managed string. This is then returned back to the user. In the case of multiple instances of a process, the managed string is added to an array and that array is returned.

While it may not seem like much and seem like overglorified complication just to check for threads on wait chaings, it was code that was written to be used in production where we had no desktop environments and we had to determine if this was the case.

I hope that someone gets some kind of use out of it. 🙂

Until next time.