c++ - cpu_core vs cpu_atom in perf - Stack Overflow

I'm constructing an example that shows the effect of branch mispredictions. When using perf stat, I get the following results:

Here, I can see some metrics counted twice, once for cpu_atom, and once for cpu_core. What is the difference between these two?

I've read that cpu_core corresponds to ISA instructions, while cpu_atom corresponds to microarchitecture internals (in my case, x86 micro-ops). This is somewhat confusing, since I would expect the numbers for cpu_atom to be larger than those for cpu_core.

It's also a bit confusing how the cpu_core and cpu_atom metrics differ relative to each other across multiple runs. This run shows a very different ratio from the previous one.

There are also times where cpu_atom metrics are not counted:

And there is this run... I assume the 191.02% is a bug. It equals 110,238,856 / 57,711,854, which is cpu_core/branch-misses divided by cpu_atom/branches. If this is not a bug, I wonder why perf divides a metric from cpu_core by a metric from cpu_atom.

Just for reference, here is the code of the executable I ran:

#include <benchmark/benchmark.h>

#include <algorithm>
#include <cstdlib>  // rand, srand
#include <vector>

void test(benchmark::State& s) {
    const auto N = s.range(0);
    std::vector<unsigned long> v1(N), v2(N), c(N);
    
    srand(1);
    std::generate(v1.begin(), v1.end(), [] { return rand(); });
    std::generate(v2.begin(), v2.end(), [] { return rand(); });
#ifdef HIT
    std::generate(c.begin(), c.end(), [] { return rand() >= 0; });
#else
    std::generate(c.begin(), c.end(), [] { return rand() & 1; });
#endif

    for (auto _ : s) {
        unsigned long result = 0;
        for (int i = 0; i < N; i++) {
            if (c[i]) {
                result += v1[i];
            } else {
                result *= v2[i];
            }
        }
        benchmark::DoNotOptimize(result);
        benchmark::ClobberMemory();
    }
}

BENCHMARK(test)->Arg(1 << 22);
BENCHMARK_MAIN();

Compiled as follows:

g++ branch_prediction.cpp -o miss -g3 -O3 -mavx2 -lbenchmark

asked Mar 12 at 6:13 by Osama Ahmad
  • I can't read the tiny text in the pictures on a mobile screen. Can you put the text from the pictures into a code block? – 3CxEZiVlQ Commented Mar 12 at 8:00

1 Answer

The cpu_atom counts come from the E-cores; the cpu_core counts come from the P-cores. (https://superuser.com/questions/1677692/what-are-performance-and-efficiency-cores-in-intels-12th-generation-alder-lake/1677779#1677779)

If you want only one or the other, use taskset -c 1 ./a.out to limit the program to running on core #1, for example. Note that cpu_migrations is 11 in your first image, so it didn't run on the same core the whole time, and that includes moving between E and P cores.

"I've read that cpu_core corresponds to ISA instructions, while cpu_atom corresponds to microarchitecture internals (in my case, x86 micro-ops)."

No, completely wrong. The prefix only tells you which type of core did the counting, not what kind of event was counted. The counters for micro-ops include uops_issued.any (front-end fused-domain issue/rename), uops_executed.thread (back-end execution ports, unfused domain), and uops_retired.retire_slots (back-end retirement, matches uops_issued.any if there was no mis-speculation).
These events exist on my Skylake, and presumably also on P-cores (cpu_core). They probably also exist on E-cores (cpu_atom), even though that's a very different microarchitecture (Gracemont).
