I aimed to understand the characteristics of hardware counters (e.g., Intel x86's rdtsc) by testing a piece of code with a fixed execution time. The test code performs a fixed number of register decrements, as follows.
unsigned long long cycles = 3600000000ULL * duration; // CPU base freq. is 3.6 GHz
tsc_start = tsc_now();
__asm__ __volatile__ (
    "mov %[cycles], %%rcx\n\t"
    "1: dec %%rcx\n\t"          // local label: safe if the asm is inlined or duplicated
    "jnz 1b\n\t"
    :
    : [cycles] "r" (cycles)
    : "rcx", "cc"               // rcx is overwritten; dec changes the flags
);
tsc_end = tsc_now();
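The snippet above does not show tsc_now(). A minimal sketch of what such a helper might look like, assuming GCC or Clang on x86-64 and the __rdtsc and _mm_lfence intrinsics from <x86intrin.h>. The lfence placement is an assumption for illustration and not necessarily what was used for the measurements here:

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc, _mm_lfence (GCC/Clang, x86 only) */

/* Hypothetical stand-in for the tsc_now() called in the test code. */
static inline uint64_t tsc_now(void)
{
    _mm_lfence();            /* let earlier instructions finish first */
    uint64_t t = __rdtsc();  /* read the 64-bit time-stamp counter */
    _mm_lfence();            /* keep later instructions from starting early */
    return t;
}
```

Without some form of serialization, rdtsc can be reordered relative to the surrounding instructions, which would add a few cycles of noise of its own.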
To reduce interference, the program runs on a CPU core isolated from interrupts and task scheduling. The core operates at its base frequency, and the program is the only task running on its NUMA node.
The relationship between the requested duration (which sets the number of loop iterations) and the measured clock cycles is shown below. In the first two columns, the jitter is no more than a few hundred clock cycles, and IMHO it comes from the cold start of the first few iterations (branch prediction and the memory subsystem).
duration (seconds) | cycles used | sleep | dd |
---|---|---|---|
1 | 3600000070 | | |
10 | 36000000334 | 36000061476 | 36000114526 |
100 | 360000000032 | 360000083336 | 360000121658 |
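To make the table easier to read, the excess over the nominal count can be computed directly. This is a small sketch, assuming the nominal count is exactly 3.6e9 cycles per second of requested duration (the constant from the test code). The names TSC_HZ, excess_cycles, and excess_ppb are hypothetical helpers, not part of the original program:

```c
#include <stdint.h>

/* Nominal rate taken from the test code: 3.6 GHz base frequency. */
#define TSC_HZ 3600000000ULL

/* Cycles measured beyond the nominal count for `seconds` of looping. */
static int64_t excess_cycles(uint64_t measured, uint64_t seconds)
{
    return (int64_t)(measured - TSC_HZ * seconds);
}

/* The same excess, in parts per billion of the nominal count. */
static double excess_ppb(uint64_t measured, uint64_t seconds)
{
    double nominal = (double)(TSC_HZ * seconds);
    return (double)excess_cycles(measured, seconds) / nominal * 1e9;
}
```

Applied to the table, the "cycles used" column exceeds the nominal count by 70, 334, and 32 cycles for 1, 10, and 100 seconds, while the sleep and dd columns exceed it by 61476 and 114526 cycles at 10 seconds and by 83336 and 121658 cycles at 100 seconds.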