Fun with Timers and cpuid

Mar 3, 2021

Who really designed this processor?

8 Comments

Apparently the TSC behavior changed drastically between Zen/Zen+/Zen 2 and Zen 3. On my Zen 3 machine it actually ticks with a granularity matching the full 3.4 GHz clock speed, while on my old Zen+ system it ticks at the same 100 MHz you were seeing on that Zen 2 server.
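One way to observe this granularity is to read the TSC back-to-back and record the smallest non-zero difference. If the TSC counts at a nominal 3.4 GHz but is only updated at 100 MHz, each update should advance it by 3.4 GHz / 100 MHz = 34 counts. The sketch below assumes GCC or Clang on x86 (`__rdtsc` from `<x86intrin.h>`); `tsc_min_step` and `tsc_update_rate_hz` are hypothetical helper names.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <x86intrin.h>

/* Smallest non-zero difference seen between a TSC read and the first
   later read that returns a different value, over `samples` attempts. */
static uint64_t tsc_min_step(int samples)
{
    uint64_t best = 0;
    for (int i = 0; i < samples; i++) {
        uint64_t a = __rdtsc();
        uint64_t b = __rdtsc();
        /* Spin until the counter visibly moves, so one update is captured. */
        while (b == a)
            b = __rdtsc();
        uint64_t d = b - a;
        if (best == 0 || d < best)
            best = d;
    }
    return best;
}
#endif

/* Update rate implied by the nominal TSC rate and the observed step,
   e.g. 3.4e9 / 34 = 100 MHz. Returns 0 for a zero step. */
static double tsc_update_rate_hz(double nominal_hz, uint64_t step)
{
    return step ? nominal_hz / (double)step : 0.0;
}
```

On a counter that is updated every cycle, the minimum step is just the cost of two back-to-back `rdtsc` instructions, so it is worth checking whether all observed steps are multiples of one value rather than relying on the minimum alone.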

Jim Cownie

Feb 20

Interesting. Thanks.

CyrIng

Jan 21, 2024

On big.LITTLE systems, cntfrq_el0 returns the counter frequency, but that of the fastest processor.

For instance, on the RK3588 we read the frequency of the A76 on the A55 cores!

How can we improve accuracy?

Jim Cownie

Feb 21, 2024 (edited)

That behaviour seems entirely sane to me. Since a thread may migrate between the two cores, you want to have both cores deliver time in the same units, which should be the smaller ticks that make sense on the faster core. AFAICT that is the behaviour you are describing.

Of course, the actual period that can be resolved by the physical timer on each core may differ, but the units in which it is expressed and, we hope, the starting point of the clocks should be the same everywhere.
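Because every core reports the system counter in the same units, converting a `cntvct_el0` reading into time only needs the frequency from `cntfrq_el0`. A minimal sketch, assuming GCC or Clang on AArch64 Linux (where EL0 reads of both registers are normally permitted); `counter_to_ns`, `read_cntfrq` and `read_cntvct` are hypothetical helper names, and the conversion splits the multiplication to avoid 64-bit overflow.

```c
#include <stdint.h>

/* Convert counter ticks to nanoseconds (freq_hz must be non-zero):
   whole seconds first, then the remaining fraction of a second.
   rem < freq_hz, so rem * 1e9 fits in 64 bits for freq_hz below ~18 GHz. */
static uint64_t counter_to_ns(uint64_t ticks, uint64_t freq_hz)
{
    uint64_t secs = ticks / freq_hz;
    uint64_t rem  = ticks % freq_hz;
    return secs * 1000000000ull + rem * 1000000000ull / freq_hz;
}

#if defined(__aarch64__)
/* Counter frequency in Hz, as programmed by firmware. */
static inline uint64_t read_cntfrq(void)
{
    uint64_t f;
    __asm__ volatile("mrs %0, cntfrq_el0" : "=r"(f));
    return f;
}

/* Virtual counter value; the isb stops the read being performed early. */
static inline uint64_t read_cntvct(void)
{
    uint64_t t;
    __asm__ volatile("isb; mrs %0, cntvct_el0" : "=r"(t) :: "memory");
    return t;
}
#endif
```

On a big.LITTLE part the same `cntfrq_el0` value should be read on every core, which is what makes a reading taken on one core comparable with one taken after the thread migrates to another.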

Russ

Oct 5, 2022

I see a lot of examples on the internet for AArch64 where an isb instruction is placed before (or before and after) the mrs instruction that reads the cntvct_el0 register, such as:

asm volatile("isb; mrs %0, cntvct_el0" : "=r" (ticks));

asm volatile("isb; mrs %0, cntvct_el0; isb" : "=r" (ticks));

When I try this, I get unusual results: for long periods the tick value does not increment, and at other times it goes backwards (the values get smaller). Any ideas? I'm testing on an Apple M1.
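One plausible explanation for long runs without an increment is simply the counter rate: `cntvct_el0` ticks at whatever `cntfrq_el0` reports (commonly reported as 24 MHz on the M1, about 41.7 ns per tick), so reads taken a few nanoseconds apart will often return the same value. Apparent backwards steps are worth checking against the variable type, for example a result stored in a 32-bit or signed integer. The sketch below assumes GCC or Clang on AArch64; `classify_steps`, `read_cntvct` and `sample_cntvct` are hypothetical names. It reads the register into a `uint64_t` and counts repeats separately from genuine regressions.

```c
#include <stddef.h>
#include <stdint.h>

struct step_stats {
    size_t forward;   /* later sample larger than the previous one */
    size_t repeat;    /* same value: the counter had not ticked yet */
    size_t backward;  /* later sample smaller: should never happen  */
};

/* Classify each consecutive pair of samples. */
static struct step_stats classify_steps(const uint64_t *v, size_t n)
{
    struct step_stats s = {0, 0, 0};
    for (size_t i = 1; i < n; i++) {
        if (v[i] > v[i - 1])
            s.forward++;
        else if (v[i] == v[i - 1])
            s.repeat++;
        else
            s.backward++;
    }
    return s;
}

#if defined(__aarch64__)
/* Read the virtual counter; the isb keeps the read from being
   performed ahead of preceding instructions. */
static inline uint64_t read_cntvct(void)
{
    uint64_t t;   /* must be 64-bit and unsigned */
    __asm__ volatile("isb; mrs %0, cntvct_el0" : "=r"(t) :: "memory");
    return t;
}

/* Fill v with n back-to-back counter readings. */
static void sample_cntvct(uint64_t *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        v[i] = read_cntvct();
}
#endif
```

A high `repeat` count with `backward == 0` would point to the counter rate rather than a misbehaving register.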

Egon

Jun 14, 2021

Note, "Intel® 64 and IA-32 Architectures, Software Developer’s Manual, Volume 3B: System Programming Guide, Part 2" does say that:

> On certain processors, the TSC frequency may not be the same as the frequency in the brand string.
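When the brand-string frequency cannot be trusted, newer Intel processors can report the TSC rate directly: per the Intel SDM, cpuid leaf 0x15 returns the TSC/core-crystal ratio as EBX/EAX and, where enumerated, the crystal frequency in Hz in ECX, so the TSC frequency is ECX × EBX / EAX. A sketch assuming GCC or Clang (`__get_cpuid` from `<cpuid.h>`); `tsc_hz_from_leaf15` and `tsc_hz_from_cpuid` are hypothetical names. A zero result means the leaf is missing or incomplete (ECX is 0 on some parts), in which case some other calibration is needed.

```c
#include <stdint.h>

/* TSC frequency from raw cpuid leaf 0x15 registers:
   EAX = denominator, EBX = numerator, ECX = crystal frequency in Hz.
   Returns 0 when any field is not enumerated. */
static uint64_t tsc_hz_from_leaf15(uint32_t eax, uint32_t ebx, uint32_t ecx)
{
    if (eax == 0 || ebx == 0 || ecx == 0)
        return 0;
    return (uint64_t)ecx * ebx / eax;
}

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>

/* Query leaf 0x15 on this machine; 0 if unsupported or not enumerated. */
static uint64_t tsc_hz_from_cpuid(void)
{
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if the leaf exceeds the maximum supported leaf. */
    if (!__get_cpuid(0x15, &eax, &ebx, &ecx, &edx))
        return 0;
    return tsc_hz_from_leaf15(eax, ebx, ecx);
}
#endif
```

For example, a ratio of 176/2 on a 24 MHz crystal gives 24 MHz × 88 = 2.112 GHz.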

Andy Thomason

Mar 8, 2021

It seems that rdtsc does a pipeline flush on most CPUs. It would have been nice to have a variant that didn't, and that also took an input register to locate it in the scoreboard. Decode, issue and retire times would also be handy, even if only at 32-bit resolution.

Jim Cownie

Mar 8, 2021

My impression is that rdtsc (as against rdtscp) is rather vague about what instruction fencing is enforced.

There is information at https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/ia-32-ia-64-benchmark-code-execution-paper.pdf (rather old) which suggests one approach. I believe that only cpuid is guaranteed to be an instruction fence.
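The approach in that Intel paper, as I read it, brackets the measured code with cpuid; rdtsc at the start and rdtscp; cpuid at the end. The leading cpuid stops earlier instructions leaking into the interval, rdtscp waits for the measured code to execute before reading the counter, and the trailing cpuid stops later instructions starting before that read. Below is a sketch in GCC/Clang inline assembly for x86-64 only; `tsc_begin`, `tsc_end` and `net_ticks` are hypothetical names, and EAX is zeroed before each cpuid so the leaf it executes is well defined.

```c
#include <stdint.h>

#if defined(__x86_64__)
/* Start of interval: cpuid serializes, then rdtsc reads the counter. */
static inline uint64_t tsc_begin(void)
{
    uint32_t lo, hi;
    __asm__ volatile("xor %%eax, %%eax\n\t"
                     "cpuid\n\t"
                     "rdtsc"
                     : "=a"(lo), "=d"(hi)
                     :
                     : "rbx", "rcx", "memory");
    return ((uint64_t)hi << 32) | lo;
}

/* End of interval: rdtscp waits for prior instructions to execute,
   then cpuid keeps later instructions from starting early. */
static inline uint64_t tsc_end(void)
{
    uint32_t lo, hi;
    __asm__ volatile("rdtscp\n\t"
                     "mov %%eax, %0\n\t"
                     "mov %%edx, %1\n\t"
                     "xor %%eax, %%eax\n\t"
                     "cpuid"
                     : "=r"(lo), "=r"(hi)
                     :
                     : "rax", "rbx", "rcx", "rdx", "memory");
    return ((uint64_t)hi << 32) | lo;
}
#endif

/* Ticks attributable to the measured code after subtracting the cost of
   an empty begin/end pair; clamps at 0 instead of wrapping. */
static uint64_t net_ticks(uint64_t begin, uint64_t end, uint64_t overhead)
{
    uint64_t raw = end - begin;
    return raw > overhead ? raw - overhead : 0;
}
```

The overhead would typically be estimated as the minimum over many empty `tsc_begin()`/`tsc_end()` pairs; individual short measurements remain noisy.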

From my POV, I am more interested in timing data transfers, so memory fencing matters more for my tests than instruction fencing. But, given the apparent resolution, timing single operations is hard anyway.
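For timing data transfers, the Intel SDM's description of RDTSC suggests a memory-ordered variant: executing MFENCE; LFENCE immediately before RDTSC makes it wait until earlier loads and stores are globally visible, and an LFENCE immediately after it stops later instructions (including memory accesses) from starting before the read. A sketch assuming GCC or Clang on x86-64 with SSE2 intrinsics; `tsc_mem_fenced`, `time_memcpy` and `bytes_per_tick` are hypothetical names, and the timed `memcpy` is only an illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#if defined(__x86_64__)
#include <x86intrin.h>

/* TSC read ordered against memory: earlier loads/stores become globally
   visible first, and later instructions wait for the read. */
static inline uint64_t tsc_mem_fenced(void)
{
    _mm_mfence();
    _mm_lfence();
    uint64_t t = __rdtsc();
    _mm_lfence();
    return t;
}

/* Ticks taken to copy n bytes from src to dst. */
static uint64_t time_memcpy(void *dst, const void *src, size_t n)
{
    uint64_t t0 = tsc_mem_fenced();
    memcpy(dst, src, n);
    uint64_t t1 = tsc_mem_fenced();
    return t1 - t0;
}
#endif

/* Average transfer rate in bytes per TSC tick (0 if no ticks elapsed). */
static double bytes_per_tick(size_t bytes, uint64_t ticks)
{
    return ticks ? (double)bytes / (double)ticks : 0.0;
}
```

Timing a large copy and dividing by the tick count sidesteps the resolution problem for single operations, at the cost of measuring only an average.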

AnandTech clearly manage something, though; see, for instance, https://www.anandtech.com/show/16535/intel-core-i7-11700k-review-blasting-off-with-rocket-lake/3
