A call to PAPI_thread_init() seems to be necessary.
Also, reliable results can only be obtained for threads that are bound to a specific core (or at least to a socket?); otherwise the FLOP and CPU counters from different CPUs get mixed. Since PAPI now works on my IvyBridge system with the newest version, I can do more detailed tests there.
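For reference, a minimal sketch of what I mean, assuming Linux with glibc and pthreads. The helper names (first_allowed_core, pin_current_thread_to_core, init_counters_for_threads, worker) and the HAVE_PAPI macro are made up for illustration and are not part of PAPI. The PAPI calls are only compiled when HAVE_PAPI is defined, and casting pthread_self to the id-function type is the usual idiom, which assumes pthread_t is an unsigned long on the platform.

```c
#define _GNU_SOURCE            /* needed for CPU_SET and pthread_setaffinity_np */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#ifdef HAVE_PAPI               /* hypothetical build flag, not defined by PAPI */
#include <papi.h>
#endif

/* Return the lowest-numbered core this process is allowed to run on, or -1. */
int first_allowed_core(void) {
    cpu_set_t set;
    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int c = 0; c < CPU_SETSIZE; ++c)
        if (CPU_ISSET(c, &set))
            return c;
    return -1;
}

/* Bind the calling thread to exactly one core; returns 0 on success. */
int pin_current_thread_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

/* One-time setup in the main thread, before any worker thread is created. */
int init_counters_for_threads(void) {
#ifdef HAVE_PAPI
    if (PAPI_library_init(PAPI_VER_CURRENT) != PAPI_VER_CURRENT)
        return -1;
    /* Tell PAPI how to identify threads. */
    if (PAPI_thread_init((unsigned long (*)(void))pthread_self) != PAPI_OK)
        return -1;
#endif
    return 0;
}

struct worker_arg {
    int core;       /* core to bind this worker to */
    int pinned_ok;  /* set to 1 by the worker if binding succeeded */
};

/* Worker: pin first, then set up and start counters for this thread. */
void *worker(void *p) {
    struct worker_arg *arg = p;
    assert(arg != NULL);
    arg->pinned_ok = (pin_current_thread_to_core(arg->core) == 0);
    /* A per-thread event set would be created and started here,
       after pinning, so the counts come from a single core. */
    return NULL;
}
```

The main thread would call init_counters_for_threads() once, then start each worker with pthread_create() and a worker_arg naming its core. Pinning before the counters are started is the point: the thread cannot migrate while it is being measured.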