c++ - cpu_core vs cpu_atom in perf - Stack Overflow|江阴雨辰互联

I'm constructing an example that shows the effect of branch mispredictions. When using perf stat, I get the following results:

Here, I can see some metrics counted twice, once for cpu_atom, and once for cpu_core. What is the difference between these two?

I've read that cpu_core corresponds to ISA instructions, while cpu_atom corresponds to microarchitecture internals (in my case, x86 micro-ops). This is somewhat confusing, since I would expect the numbers for cpu_atom to be bigger than the numbers for cpu_core.

It's also a bit confusing how the cpu_core and cpu_atom counts vary relative to each other across runs: in another run, the ratio between them was very different from the previous run.

There are also times when cpu_atom metrics are not counted:

And then there is this run, where I assume the 191.02% is a bug. It equals 110,238,856 / 57,711,854, which is cpu_core/branch-misses divided by cpu_atom/branches. If this is not a bug, I wonder why perf would divide a metric from cpu_core by a metric from cpu_atom.

For reference, here is the code of the executable I ran:

#include <benchmark/benchmark.h>

#include <algorithm>
#include <cstdlib>  // srand, rand
#include <vector>

void test(benchmark::State& s) {
    const auto N = s.range(0);
    std::vector<unsigned long> v1(N), v2(N), c(N);
    
    srand(1);
    std::generate(v1.begin(), v1.end(), [] { return rand(); });
    std::generate(v2.begin(), v2.end(), [] { return rand(); });
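#ifdef HIT
    // rand() never returns a negative value, so every c[i] is 1: the branch is always taken.
    std::generate(c.begin(), c.end(), [] { return rand() >= 0; });
#else
    // Random 0/1 values: the branch direction is unpredictable.
    std::generate(c.begin(), c.end(), [] { return rand() & 1; });
#endif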

    for (auto _ : s) {
        unsigned long result = 0;
        for (int i = 0; i < N; i++) {
            if (c[i]) {
                result += v1[i];
            } else {
                result *= v2[i];
            }
        }
        benchmark::DoNotOptimize(result);
        benchmark::ClobberMemory();
    }
}

BENCHMARK(test)->Arg(1 << 22);
BENCHMARK_MAIN();

Compiled as follows:

g++ branch_prediction.cpp -o miss -g3 -O3 -mavx2 -lbenchmark

Asked Mar 12 at 6:13 by Osama Ahmad

I can't read the tiny text in the pictures on a mobile screen. Can you put the text from the pictures into a code block? (3CxEZiVlQ, commented Mar 12 at 8:00)

1 Answer (score 4)

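cpu_atom counts events on the E-cores (efficiency cores); cpu_core counts events on the P-cores (performance cores). (https://superuser/questions/1677692/what-are-performance-and-efficiency-cores-in-intels-12th-generation-alder-lake/1677779#1677779) On a hybrid CPU like this, perf has a separate PMU for each core type, so every event appears once per PMU. Each PMU only counts while your process is running on its type of core. That is why the split between the two changes from run to run, and why the cpu_atom events show up as not counted when the process never ran on an E-core.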

If you want only one or the other, use taskset -c 1 ./a.out to restrict it to running on core #1, for example. Note that cpu-migrations is 11 in your first image, so the process didn't stay on the same core the whole time, including moving between E-cores and P-cores.

I've read that cpu_core corresponds to ISA instructions, while cpu_atom corresponds to microarchitecture internals (in my case, x86 micro-ops).

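No, that's completely wrong. Both PMUs count the same kinds of events (instructions, branches, branch-misses and so on); they just count them on different types of core. Micro-ops have their own counters, such as uops_issued.any (front-end fused-domain issue/rename), uops_executed.thread (back-end execution ports, unfused domain), and uops_retired.retire_slots (back-end retirement; matches uops_issued.any if there was no mis-speculation).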
These events exist on my Skylake and presumably also on the P-cores (cpu_core). They probably exist on the E-cores (cpu_atom) too, even though Gracemont is a very different microarchitecture.
