System Specs
- Cubieboard 2
- Processor - Allwinner A20
- Cores - Cortex-A7 Dual core
- Graphics PU - ARM® Mali400MP2
- Memory - 1GB DDR3
- Using Ubuntu 13.10 Server
- This version uses hardfp, which is better suited to the ARM core and makes use of the VFP
- The GCC compiler in 13.10 is newer than the one in 12.04. We have 4.7
Prerequisites
HPL requires a Message Passing Interface (MPI) implementation and either the Basic Linear Algebra Subprograms (BLAS) or the Vector Signal Image Processing Library (VSIPL). I used MPICH2 and the ATLAS package, both of which I got from the repository. Before you ask why I have not used an ATLAS-tuned BLAS, and point out that my results will be poor because of it, remember that my main objective is to get HPL up and running first. Too many things can go wrong with the ATLAS-tuned BLAS approach. I will get to these topics in future posts.
Get the required packages
sudo apt-get install mpich2
sudo apt-get install libatlas3-base-dev
Then get the HPL source code from http://www.netlib.org/benchmark/hpl/hpl-2.1.tar.gz and extract it to a folder in your home directory. We need to generate the generic makefile and then edit it to match our system.
Now to install
tar -xvf hpl-2.1.tar.gz
cd hpl-2.1/setup
sh make_generic
cp Make.UNKNOWN ../Make.cubieboard
Now you must link your MPI libraries correctly so that the build includes multi-core support. It took me a few hours of changing things around until I got it working. This is what I had to change in the end.
ARCH = cubieboard
TOPdir = $(HOME)/HDD/hpl-2.1
MPdir = /usr/lib/mpich2
MPinc = -I$(MPdir)/include
MPlib = /usr/lib/libfmpich.a
LAdir = /usr/lib/atlas-base/
LAlib = $(LAdir)/libf77blas.so.3 $(LAdir)/libatlas.so.3
HPL_LIBS = $(HPLlib) $(LAlib) $(MPlib) -lmpl -lcr
CCFLAGS = $(HPL_DEFS) -mfpu=neon -mfloat-abi=hard -funsafe-math-optimizations -ffast-math -O3
Just make sure you use the correct TOPdir, and if your libraries are in different locations, change the paths above accordingly. I added the CCFLAGS because I wanted the best results I could get with the standard BLAS libraries. Here is my entire makefile if you would like to compare: Make.cubieboard-U13.10.
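Since wrong library paths are the most likely reason for the link step to fail, a quick check before compiling can save a build cycle. The following is only a minimal sketch: the paths are copied from the makefile settings above with $(MPdir) and $(LAdir) expanded by hand, and the helper name missing_paths is made up for this illustration. Adjust the list if your libraries live elsewhere.

```python
import os

# Paths taken from the Make.cubieboard settings above, with $(MPdir) and
# $(LAdir) expanded by hand. They are specific to that setup.
EXPECTED = [
    "/usr/lib/mpich2/include",
    "/usr/lib/libfmpich.a",
    "/usr/lib/atlas-base/libf77blas.so.3",
    "/usr/lib/atlas-base/libatlas.so.3",
]


def missing_paths(paths):
    """Return the entries of `paths` that do not exist on this machine."""
    return [p for p in paths if not os.path.exists(p)]


# Print one line per path that the makefile expects but the system lacks.
for path in missing_paths(EXPECTED):
    print("not found:", path)
```

If nothing is printed, every path the makefile refers to exists. That does not guarantee a successful link, but it rules out the simplest mistake.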
Now compile HPL
make arch=cubieboard
HPL has a large number of input variables and an even larger number of combinations of them, which can be very intimidating. I still have not wrapped my head around all of them. If you open the HPL.dat file you will see what I mean. You can find it in the bin/cubieboard/ folder. You can find a full explanation of what the input variables do here. A very useful site I found gives you a standard HPL.dat file to start from. So let's start by going to the site and filling in the specs you need. Below is the HPL.dat file that I used.
HPLinpack benchmark input file
University of the Witwatersrand
HPL.out      output file name (if any)
8            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
8000         Ns
1            # of NBs
128          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
2            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0                               Number of additional problem sizes for PTRANS
1200 10000 30000                values of N
0                               number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64        values of NB
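The Ns value decides how much memory the run needs, because HPL factors a dense N x N matrix of 8-byte doubles. The sketch below shows the arithmetic for the N = 8000 used here and for a commonly quoted rule of thumb of filling roughly 80% of RAM. Both the 80% figure and the treatment of the board's 1GB as 1 GiB are assumptions for illustration, not something measured in this post; the OS and the GPU also need memory, so the suggested value is an upper bound to experiment below.

```python
import math

GIB = 1024 ** 3  # assumption: treat the board's "1GB" as 1 GiB


def matrix_bytes(n):
    """Memory taken by an n x n matrix of 8-byte doubles."""
    return 8 * n * n


def suggest_n(mem_bytes, nb, fraction=0.8):
    """Largest multiple of nb whose matrix fits in `fraction` of memory.

    The default fraction of 0.8 is a rule of thumb, not a measured value.
    """
    n = int(math.sqrt(fraction * mem_bytes / 8))
    return (n // nb) * nb


# Share of memory used by the matrix for the N in the HPL.dat above.
print("N=8000 uses %.0f%% of 1 GiB" % (100 * matrix_bytes(8000) / GIB))

# Upper-bound suggestion for NB = 128 under the 80% assumption.
print("suggested N:", suggest_n(GIB, 128))
```

Rounding N down to a multiple of NB keeps the last block full-sized, which is why the helper takes the block size as an argument.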
Note that you must specify the number of cores that you want to run on. The Cubieboard2 is dual core, so we specify Ps x Qs = 1 x 2 = 2. To run on a single core, set Ps = Qs = 1. If you start HPL with fewer processes than Ps x Qs, you will get an error when running HPL. If you run multiple process grids, you must start HPL with the maximum number of cores that any of them needs.
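To avoid a mismatch between HPL.dat and the launch command, the process count can be read straight from the file. This is a minimal sketch that assumes the standard HPL.dat layout, where line 10 holds the number of process grids and lines 11 and 12 hold the Ps and Qs values. The function name required_processes is made up for this example.

```python
def required_processes(hpl_dat_text):
    """Return the largest P*Q over all process grids listed in an HPL.dat."""
    lines = hpl_dat_text.splitlines()
    n_grids = int(lines[9].split()[0])                       # line 10: number of grids
    ps = [int(tok) for tok in lines[10].split()[:n_grids]]   # line 11: Ps
    qs = [int(tok) for tok in lines[11].split()[:n_grids]]   # line 12: Qs
    return max(p * q for p, q in zip(ps, qs))


# First twelve lines of the HPL.dat shown above.
SAMPLE = """\
HPLinpack benchmark input file
University of the Witwatersrand
HPL.out      output file name (if any)
8            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
8000         Ns
1            # of NBs
128          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
2            Qs
"""

np_needed = required_processes(SAMPLE)
print("mpirun -np", np_needed, "./xhpl")
```

With several grids in one file, for example Ps = 1 2 and Qs = 2 2, the function returns 4, which is the largest grid and therefore the number of processes to start.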
Now to start HPL on both cores I need to launch it through MPI from the bin/cubieboard/ folder. This is done with
mpirun -np 2 ./xhpl
The -np option sets the number of processes, one per core here. It should match the product Ps x Qs. Because device out is set to 8 in HPL.dat, the output is written to the file HPL.out.
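HPL.out lists a Gflops figure for each run. As far as I understand the HPL source, that figure is the operation count of the LU solve, 2/3 N^3 + 3/2 N^2, divided by the wall time. The sketch below reproduces that arithmetic so a reported value can be sanity-checked. The function name is made up, and the 100-second time is a placeholder for illustration only, not a measurement from the Cubieboard.

```python
def hpl_gflops(n, seconds):
    """Gflops for an N x N solve, using the 2/3 N^3 + 3/2 N^2 operation count."""
    flops = (2.0 / 3.0) * n ** 3 + 1.5 * n ** 2
    return flops / seconds / 1e9


# Placeholder wall time, NOT a measured result. Substitute the Time column
# from your own HPL.out.
elapsed_seconds = 100.0
print("%.3f Gflops" % hpl_gflops(8000, elapsed_seconds))
```

Because the N^3 term dominates, doubling N costs roughly eight times the work, which is worth keeping in mind when trying larger problem sizes on a small board.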
Next Up
This was largely successful, as it proves that HPL is working on both cores. The next steps will be to custom-tune the BLAS libraries and to optimise the OS with better-configured kernels. This will be explained in a different post by Mitch.