The next step was to find out how much time was being spent inside NtGdiCloseProcess. I moved to the CPU Usage (Sampled) table in WPA and got a butterfly graph, this time of the callees of NtGdiCloseProcess. The screenshot below shows that over a 1.125 s period there was, across the entire system, about 1085 ms of time spent inside NtGdiCloseProcess, representing 96% of the wall time.
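As a quick sanity check on that percentage, here is a minimal sketch of the arithmetic. The helper name `wall_time_fraction` is made up for illustration and is not part of WPA; the two numbers are the ones quoted above.

```python
def wall_time_fraction(busy_ms: float, window_ms: float) -> float:
    """Return the share of a wall-clock window covered by sampled CPU time."""
    return busy_ms / window_ms

# Figures quoted in the text: a 1.125 s window and about 1085 ms
# of sampled time inside NtGdiCloseProcess across the whole system.
window_ms = 1.125 * 1000
busy_ms = 1085

fraction = wall_time_fraction(busy_ms, window_ms)
print(f"{fraction:.0%}")  # prints "96%"
```

Note that this divides by the wall time of the window, not by the window multiplied by the number of CPUs, so 96% means the function was executing somewhere on the system for almost the entire interval.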