The hardware units of a CUDA device can track a set of counters while a kernel executes. These counters are updated at runtime and can be queried after execution has completed. The observed values represent the actual workload as seen by the hardware units, so they are often referred to as raw counter values. Many performance metrics are derived directly from one or more raw counter values.
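To illustrate how a metric can be derived from raw counter values, here is a minimal sketch. The counter names (`inst_executed`, `active_cycles`, `cache_hits`, `cache_misses`) and their values are hypothetical placeholders chosen for the example. They are not taken from any real profiler output, and the formulas are generic ratios rather than the definitions used by a specific tool.

```python
# Hypothetical raw counter values, as they might be read back after a kernel
# has finished. Names and numbers are illustrative assumptions only.
raw_counters = {
    "inst_executed": 1_200_000,  # instructions executed (assumed name)
    "active_cycles": 400_000,    # cycles the unit was active (assumed name)
    "cache_hits": 75_000,        # cache accesses that hit (assumed name)
    "cache_misses": 25_000,      # cache accesses that missed (assumed name)
}


def ratio(numerator: float, denominator: float) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derive_metrics(counters: dict) -> dict:
    """Derive two example metrics from one or more raw counter values."""
    accesses = counters["cache_hits"] + counters["cache_misses"]
    return {
        # Instructions per cycle: one counter divided by another.
        "ipc": ratio(counters["inst_executed"], counters["active_cycles"]),
        # Hit rate in percent: built from two counters of the same unit.
        "cache_hit_rate_pct": 100.0 * ratio(counters["cache_hits"], accesses),
    }


metrics = derive_metrics(raw_counters)
print(metrics)
```

The zero-denominator guard in `ratio` is a design choice for this sketch: a counter such as an active-cycle count may legitimately be zero if a unit was never used, and a derived metric should not fail in that case.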