Friday, 15 November 2013

Installing HPL on Cubieboard2 + Ubuntu 13.10

I am following almost exactly the same procedure as in my previous post on Ubuntu 12.04. Here we are working with Ubuntu 13.10 Server on the Cubieboard2, which can be found here:,891.0.html

System Specs

  • Cubieboard 2
    •  Processor         - Allwinner A20
    •  Cores               - Cortex-A7 Dual core
    •  Graphics PU      - ARM® Mali400MP2
    •  Memory           - 1GB DDR3
  • Using Ubuntu 13.10 Server
    • This version uses the hard-float ABI (hardfp), which is better suited to ARM and makes use of the VFP
    • The GCC compiler in 13.10 is newer than the one in 12.04; we have 4.7


HPL requires an implementation of the Message Passing Interface (MPI) and either the Basic Linear Algebra Subprograms (BLAS) or the Vector Signal Image Processing Library (VSIPL). In my case I used MPICH2 and the ATLAS package, both of which I got from the repository. Before you ask why I have not used an ATLAS-tuned BLAS, and point out that my results will be poor because of it, remember that my main objective is first and foremost to get HPL up and running. Too many things can go wrong with the ATLAS-tuned BLAS approach. I will, however, get to these topics in future posts.

Get the required packages

sudo apt-get install mpich2
sudo apt-get install libatlas3-base-dev

Then get the HPL source code from
And extract it to a folder in your home directory. We need to generate the generic makefile and then edit it to suit our system.

Now to install

tar -xvf hpl-2.1.tar.gz
cd hpl-2.1/setup
sh make_generic
cp Make.UNKNOWN ../Make.cubieboard

Now you must link your MPI libraries correctly so that the build supports multiple cores. It took me a few hours of changing things around until I got it working. This is what I had to change in the end.

ARCH       = cubieboard
TOPdir     = $(HOME)/HDD/hpl-2.1
MPdir      = /usr/lib/mpich2
MPinc      = -I$(MPdir)/include
MPlib      = /usr/lib/libfmpich.a
LAdir      = /usr/lib/atlas-base/
LAlib      = $(LAdir)/ $(LAdir)/
HPL_LIBS   = $(HPLlib) $(LAlib) $(MPlib) -lmpl -lcr
CCFLAGS    = $(HPL_DEFS) -mfpu=neon -mfloat-abi=hard -funsafe-math-optimizations -ffast-math -O3

Just make sure you use the correct TOPdir, and if your libraries are in different locations, change the lines above accordingly. I added the CCFLAGS because I wanted the best results possible (knowing that I am using the standard BLAS libraries). Here is my entire makefile if you would like to compare: Make.cubieboard-U13.10.
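Before building, it can save time to confirm that the locations named in the makefile actually exist on your system. The following is a minimal sketch: check_paths is a hypothetical helper (not part of HPL), and the example paths are simply the ones from the lines above, so replace them with your own.

```shell
#!/bin/sh
# Report which of the given files or directories are missing.
# Prints one "missing: PATH" line per absent path and returns
# non-zero if at least one path is absent.
check_paths() {
    rc=0
    for p in "$@"; do
        if [ ! -e "$p" ]; then
            echo "missing: $p"
            rc=1
        fi
    done
    return $rc
}

# Example with the locations used in Make.cubieboard above
# (these are this post's paths, not universal ones).
check_paths /usr/lib/mpich2/include /usr/lib/libfmpich.a /usr/lib/atlas-base \
    || echo "adjust MPdir, MPlib or LAdir before running make"
```

If nothing is printed, every path exists and the link step should at least find its inputs.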

Now compile HPL

make arch=cubieboard

HPL has a large number of input variables and an even larger number of combinations of them, which can be very intimidating. I still have not wrapped my head around all of them. If you open the HPL.dat file you will see what I mean. You can find it in the bin/cubieboard/ folder. You can find a full explanation of what the input variables do here. A very useful site I found gives you a standard HPL.dat file to start from. So let's start by going to the site and filling out the specs you need. Below is the HPL.dat file that I used.

HPLinpack benchmark input file
University of the Witwatersrand
HPL.out      output file name (if any)
8            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
8000         Ns
1            # of NBs
128          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
2            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0                               Number of additional problem sizes for PTRANS
1200 10000 30000                values of N
0                               number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64        values of NB
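For choosing Ns, a common rule of thumb (an assumption here, not something taken from this post) is to size the N x N matrix of 8-byte doubles so that it fills roughly 80% of RAM, then round N down to a multiple of NB. The sketch below does that arithmetic with awk; hpl_problem_size is a hypothetical helper name.

```shell
#!/bin/sh
# Estimate an HPL problem size N from total memory.
# Usage: hpl_problem_size MEM_MIB PERCENT NB
#   MEM_MIB  total RAM in MiB (1024 for the 1GB Cubieboard2)
#   PERCENT  share of RAM given to the matrix (80 is an assumed rule of thumb)
#   NB       block size; N is rounded down to a multiple of it
hpl_problem_size() {
    awk -v mem="$1" -v pct="$2" -v nb="$3" 'BEGIN {
        bytes = mem * 1024 * 1024 * pct / 100   # bytes available to the matrix
        n = int(sqrt(bytes / 8))                # N*N entries, 8 bytes per double
        print int(n / nb) * nb                  # round down to a multiple of NB
    }'
}

hpl_problem_size 1024 80 128
```

For the 1GB Cubieboard2 with NB = 128 this prints 10240. The Ns = 8000 used above is more conservative: 8000 x 8000 doubles occupy 512 MB, which leaves more room for the OS.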

Note that you must specify the number of cores that you want to run on. The Cubieboard2 is dual core, so here we specify Ps x Qs = 1 x 2 = 2. To run on a single core you would set Ps = Qs = 1. If HPL is started with fewer processes than the grid needs, you will get an error when running it. Note that if you run multiple process grids, you must start HPL with the largest number of processes that any of the grids needs.
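The relationship between the grid and the process count can be checked before launching. This is a small sketch with a hypothetical helper, check_grid, which only multiplies Ps by Qs and compares the result with the number of processes you plan to start.

```shell
#!/bin/sh
# Check that a P x Q process grid fits in the number of MPI processes.
# Usage: check_grid P Q NP
check_grid() {
    need=$(( $1 * $2 ))
    if [ "$need" -le "$3" ]; then
        echo "ok: grid $1 x $2 needs $need processes, $3 available"
    else
        echo "error: grid $1 x $2 needs $need processes, only $3 available"
        return 1
    fi
}

# The grid from the HPL.dat above, run on the two cores of the Cubieboard2.
check_grid 1 2 2
```

With several process grids in HPL.dat, run the check once per grid against the same process count.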

Now to start HPL on both cores I need to launch it through MPI. This is done with

mpirun -np 2 ./xhpl

The -np option sets the number of MPI processes, one per core here. This must match the product Ps x Qs. Because device out is set to 8 in HPL.dat, the output is written to the file HPL.out rather than to the screen.
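Once a run has finished, the result lines can be pulled out of HPL.out. The sketch below assumes the usual HPL result layout, where each result line starts with the encoded variant (for example WR11C2R4) followed by N, NB, P, Q, Time and Gflops; hpl_results is a hypothetical helper name.

```shell
#!/bin/sh
# Print N, NB, P, Q, time and Gflops for every result line in an HPL output file.
# Assumes result lines start with "WR" or "WC" and carry seven fields:
#   T/V  N  NB  P  Q  Time  Gflops
hpl_results() {
    awk '/^W[RC]/ && NF == 7 {
        printf "N=%s NB=%s P=%s Q=%s time=%ss gflops=%s\n", $2, $3, $4, $5, $6, $7
    }' "$1"
}

# Only read the file if a run has produced it.
if [ -f HPL.out ]; then
    hpl_results HPL.out
fi
```

Run it from the bin/cubieboard/ folder, where xhpl writes HPL.out.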

Next Up

This was largely successful, as it shows that HPL is working on both cores. The next steps will be to custom-tune the BLAS libraries and also to optimise the OS with better-configured kernels. This will be explained in a different post by Mitch.
