Wednesday, 27 August 2014

Optimal Compiler Flags for ARM


To approximate the ideal set of GCC flags for ARM, I compiled and ran CoreMark several times with different flag combinations. I used 16 combinations of what I assumed were the most important generic flags (please leave a comment if you know of any other flags I should include!). Below is an example for the Cortex-A7; a similar set of flags was used for the A9 and A15.
Example of flag combinations for the Cortex-A7
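
For anyone wanting to reproduce a sweep like this, here is a minimal sketch of how the combinations could be generated. It assumes the 16 combinations come from switching four flags on and off independently (2^4 = 16), which is my guess rather than something stated above. The four flags are simply the ones from the best A7 result, and the name flag_combinations is made up for this example.

```python
from itertools import product

# Assumption: four flags toggled on/off independently gives 2**4 = 16 sets.
# These four are taken from the best A7 result; swap in your own candidates.
OPTIONAL_FLAGS = ["-O3", "-mfpu=neon-vfpv4", "-march=armv7-a", "-mtune=cortex-a7"]


def flag_combinations(optional_flags):
    """Return every on/off combination of the given flags as a list of lists."""
    combos = []
    for mask in product([False, True], repeat=len(optional_flags)):
        # Keep only the flags whose switch is on for this combination.
        combos.append([flag for flag, on in zip(optional_flags, mask) if on])
    return combos


if __name__ == "__main__":
    for combo in flag_combinations(OPTIONAL_FLAGS):
        print(" ".join(combo) or "(no extra flags)")
```

Each printed line is one candidate flag set to pass to the compiler before building and running the benchmark.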
I chose CoreMark for four reasons. One, it is supported by ARM and suggested as the embedded benchmark of choice. Two, it is easy to compile and configure. Three, it does not use external libraries, so the flags have an effect. Four, the benchmark generates random data during the test, so no work is optimised away by the compiler.
Results for different flag combinations for A7, A9 and A15.

Listed below are the flags that were found to be the best (generally a 6-10% improvement).

A7    : -O3 -mfpu=neon-vfpv4 -march=armv7-a -mtune=cortex-a7
A9    : -O3 -mfpu=neon         -march=armv7-a
A15  : -O3 -mfpu=neon-vfpv4 -march=armv7-a
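
As a convenience, the table above can be kept as a small lookup and turned into a compiler command line. This is only a sketch: the function name gcc_command, the source file name bench.c and the output name bench are placeholders, and a real CoreMark build normally goes through its own Makefile rather than a bare gcc call.

```python
# Best flag sets per core, copied from the list above.
BEST_FLAGS = {
    "A7": ["-O3", "-mfpu=neon-vfpv4", "-march=armv7-a", "-mtune=cortex-a7"],
    "A9": ["-O3", "-mfpu=neon", "-march=armv7-a"],
    "A15": ["-O3", "-mfpu=neon-vfpv4", "-march=armv7-a"],
}


def gcc_command(core, sources, output, compiler="gcc"):
    """Build an argument list for compiling `sources` with the best flags for `core`."""
    if core not in BEST_FLAGS:
        raise ValueError("no flag set recorded for core: " + core)
    return [compiler] + BEST_FLAGS[core] + list(sources) + ["-o", output]


if __name__ == "__main__":
    # bench.c and bench are placeholder names for this example.
    print(" ".join(gcc_command("A7", ["bench.c"], "bench")))
```

The list form can be passed straight to subprocess.run if you want to script the build.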

A sanity check on my runs found that the iteration counts I was using in CoreMark were fair for each board. This can be seen in the plateau that each board reaches.

Iteration count vs performance. A 10 s run time was adequate to reach a plateau.
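
The same plateau check can be done numerically instead of by eye. The sketch below is one possible approach, not the method used for the plot: it reports the smallest iteration count from which every later score stays within a tolerance of the final score. The function name plateau_start, the 1% default tolerance and the sample numbers are all invented for illustration and are not measurements from these boards.

```python
def plateau_start(samples, tolerance=0.01):
    """Return the smallest iteration count from which all later scores stay
    within `tolerance` (relative) of the final score, or None if empty.

    `samples` is a list of (iterations, score) pairs.
    """
    ordered = sorted(samples)
    if not ordered:
        return None
    final_score = ordered[-1][1]
    start = None
    for iterations, score in ordered:
        if abs(score - final_score) <= tolerance * abs(final_score):
            # Still on the plateau; remember where it began.
            if start is None:
                start = iterations
        else:
            # Score left the tolerance band, so the plateau has not started yet.
            start = None
    return start


if __name__ == "__main__":
    # Made-up (iterations, score) pairs, for illustration only.
    example = [(1000, 900.0), (5000, 980.0), (10000, 1000.0),
               (20000, 1003.0), (40000, 1002.0)]
    print(plateau_start(example))
```

Any iteration count at or beyond the returned value should give a score representative of the board.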
