Wednesday, 13 November 2013

ARM Cortex-A7 GCC Flags (Allwinner A20)

I have been digging around to try to optimise the GCC flags for the Cubieboard (Allwinner A20 SoC). This post serves as a convenient reference to the relevant technical specifications and their associated GCC flags.

Compiler Version

Please take note that the Cortex-A7 core is quite new, so the version of the compiler you choose is very important. The relevant change notes from GCC are as follows:

GCC 4.6 - No official support for the Cortex-A7
GCC 4.7 - Support was added
GCC 4.8 - Code generation improvements for Cortex-A7

Therefore, you want to use at least GCC 4.8 if you can. Code from an older compiler will generally still work, since it is compiled for the ARMv7-A architecture in general, but it will be tuned for a different core, for example the Cortex-A9 with its VFPv3 FPU.
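As a quick way to check which side of that line a toolchain falls on, here is a minimal sketch that compares a version string against 4.8. The function name gcc_has_a7_tuning and the compiler name in the commented usage are assumptions for illustration; -dumpversion is a standard GCC option.

```shell
# Succeeds if the given GCC version string (e.g. "4.8.2") is 4.8 or newer.
gcc_has_a7_tuning() {
    ver_major=${1%%.*}          # text before the first dot
    ver_rest=${1#*.}            # text after the first dot
    ver_minor=${ver_rest%%.*}   # second component of the version
    if [ "$ver_major" -gt 4 ]; then
        return 0
    fi
    [ "$ver_major" -eq 4 ] && [ "$ver_minor" -ge 8 ]
}

# Hypothetical usage with a real cross-compiler (name is an assumption):
#   gcc_has_a7_tuning "$(arm-linux-gnueabihf-gcc -dumpversion)" && echo "new enough"
for v in 4.6.3 4.7.3 4.8.2; do
    if gcc_has_a7_tuning "$v"; then
        echo "$v: 4.8 or newer"
    else
        echo "$v: older than 4.8"
    fi
done
```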

Allwinner A20 CPU Specifications

  • Dual core Cortex-A7 @ ~1 GHz
  • 256 KB L2 Cache (Shared)
  • 32 KB / 32 KB L1 Cache per core
  • VFPv4-D16 Floating Point Unit (FPU)
  • Virtualisation Extensions

The ARM web site provides a lot of information regarding the specifications of the Cortex-A7.

GCC Flags

A complete list of available GCC flags for ARM can be found in the GCC documentation. Each flag has a short (and sometimes longer) description.

The obvious flags are the ones that set the CPU type. The -mcpu flag is quite specific. If you wanted to compile in a more generic manner, you could set -march to the ARMv7-A architecture and then use only -mtune and not -mcpu, which would allow the code to run on all ARMv7-A CPUs but be more optimised for a certain CPU. Generally you should set either -mcpu or -march, not both. I am not interested in the generic behaviour since this is an embedded system and I want the code to be as optimised as possible. -mtune is redundant when -mcpu is set, but one never really knows what's going on inside the compiler, so I set both just in case!
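The two approaches described above can be sketched as shell variables. The variable names are only illustrative; the option values (cortex-a7, armv7-a) are the ones GCC accepts from version 4.7 onwards.

```shell
# Option 1: target the Cortex-A7 specifically. -mtune is implied by -mcpu
# and is repeated here only as a belt-and-braces measure.
A7_FLAGS="-mcpu=cortex-a7 -mtune=cortex-a7"

# Option 2: generic ARMv7-A code that runs on any ARMv7-A CPU,
# but with scheduling tuned for the Cortex-A7.
GENERIC_FLAGS="-march=armv7-a -mtune=cortex-a7"

echo "specific: $A7_FLAGS"
echo "generic:  $GENERIC_FLAGS"
```

How these reach the compiler depends on the project; many Makefiles pick them up through CFLAGS, but that is an assumption about your build system.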


The next most important flags are for the FPU. Software floating point is basically a disaster so don't waste your time with it. We have both VFPv4-D16 and NEON. These are like two sides of the same coin. NEON adds SIMD (Single Instruction Multiple Data) instructions alongside the VFPv4 FPU, which can provide a large speedup if the compiler can use them. There is a snag, however: NEON is not fully IEEE 754 compliant. In particular, denormal values are treated as zero, which can lead to some floating-point accuracy issues. I plan on doing some more research into this issue, and whether it is really worth worrying about.

The flags below are applicable:
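A hedged sketch of the FPU options that match the hardware described above. The variable names are illustrative, and -mfloat-abi=hard assumes a hard-float toolchain and libraries; with a soft-float userland, -mfloat-abi=softfp would be the equivalent that still uses the FPU.

```shell
# Hardware floating point with NEON SIMD available to the compiler.
NEON_FPU_FLAGS="-mfpu=neon-vfpv4 -mfloat-abi=hard"

# Hardware floating point without NEON, matching the VFPv4-D16 unit listed above.
VFP_FPU_FLAGS="-mfpu=vfpv4-d16 -mfloat-abi=hard"

echo "with NEON:    $NEON_FPU_FLAGS"
echo "without NEON: $VFP_FPU_FLAGS"
```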


Note that if you use NEON then you should also use:
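The GCC manual states that floating-point operations are not auto-vectorised for NEON unless -funsafe-math-optimizations is given, so that is presumably the option meant here. A minimal sketch, with illustrative variable names, of combining it with the CPU and FPU options:

```shell
# Allow the compiler to relax strict IEEE 754 behaviour so that
# floating-point loops can be vectorised with NEON.
NEON_MATH_FLAGS="-funsafe-math-optimizations"

# Combined with the CPU and FPU selection used elsewhere in this post.
CFLAGS="-mcpu=cortex-a7 -mfpu=neon-vfpv4 $NEON_MATH_FLAGS"

echo "$CFLAGS"
```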


This is because of the IEEE 754 compliance issue mentioned earlier. You need to tell the compiler explicitly that you don't mind any accuracy issues, otherwise its auto-vectoriser will not generate NEON instructions for floating-point code!

Setting the Flags in a Cross-Compiler

I am using the Linaro Cross Toolchain which I showed how to set up in a previous post. The issue with this toolchain is that it is tuned for the Cortex-A9 processor. In order to change this permanently, one has to rebuild the toolchain (what a strain!). The Linaro FAQ tells us this.

Of course there is the manual way, where you specify the flags to gcc each time it is called, but this does not work well with the make command. Instead, I created a bash script and replaced the gcc symlink in the toolchain /bin directory with this script. Don't forget to make it executable!

#!/bin/bash
exec /path/to/your/cross/gcc -mtune=cortex-a7 -mfpu=neon-vfpv4 "$@"

This will now transparently call the gcc compiler with the flags, every time. If you try to force -mfloat-abi to hard, you will find that U-Boot fails to compile. This is because U-Boot is built with software floating point, for its own reasons.
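One detail that matters in a wrapper like this is how the arguments are forwarded: an unquoted $* re-splits any argument containing spaces, while "$@" keeps each argument intact. A small sketch with a stand-in function (fake_gcc is a hypothetical name, so no real compiler is needed) shows the difference:

```shell
# Stand-in for the real compiler: reports how many arguments it received.
fake_gcc() { echo "$#"; }

# Forwarding with unquoted $* re-splits arguments on whitespace.
wrapper_star() { fake_gcc -mtune=cortex-a7 $*; }

# Forwarding with "$@" passes each argument through unchanged.
wrapper_at() { fake_gcc -mtune=cortex-a7 "$@"; }

star_count=$(wrapper_star '-DBANNER=hello world' main.c)
at_count=$(wrapper_at '-DBANNER=hello world' main.c)

printf '%s\n' "unquoted \$*: $star_count arguments"
printf '%s\n' "quoted \"\$@\": $at_count arguments"
```

The unquoted form hands the compiler one argument more than it was given, because the define containing a space is split in two. This is the kind of breakage that only shows up in builds that pass such arguments.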
