Friday, 15 November 2013

Installing HPL on Cubieboard2 + Ubuntu 13.10

I am following almost exactly the same procedure as in my previous post with Ubuntu 12.04. Here we are working with Ubuntu 13.10 Server on the Cubieboard2; the image can be found here: http://www.cubieforums.com/index.php/topic,891.0.html

System Specs

  • Cubieboard 2
    •  Processor  - Allwinner A20
    •  Cores      - Cortex-A7 dual core
    •  GPU        - ARM® Mali400MP2
    •  Memory     - 1GB DDR3
  • Using Ubuntu 13.10 Server
    • This version uses hardfp, which is better suited to ARM and makes use of the VFP
    • The GCC compiler in 13.10 is newer than the one in 12.04: we have 4.7

Prerequisites

HPL requires a Message Passing Interface (MPI) implementation and either the Basic Linear Algebra Subprograms (BLAS) or the Vector Signal Image Processing Library (VSIPL). In my case I used MPICH2 and the ATLAS package, both from the repository. Before you ask why I have not used an ATLAS-tuned BLAS, and point out that my results will be poor because of it, I remind you that my main objective is to have HPL up and running first and foremost. There are too many things that can go wrong in the ATLAS-tuned BLAS approach. I will, however, get to these topics in future posts.

Get the required packages

sudo apt-get install mpich2
sudo apt-get install libatlas3-base-dev

Then get the HPL source code from http://www.netlib.org/benchmark/hpl/hpl-2.1.tar.gz and extract it to a folder in your home directory. We need to produce the generic makefile and then edit it to suit our system.

Now to install

tar -xvf hpl-2.1.tar.gz
cd hpl-2.1/setup
sh make_generic
cp Make.UNKNOWN ../Make.cubieboard

Now you must link your MPI libraries correctly for the build to incorporate multi-core support. It took me a few hours of changing things around until I got it working. This is what I had to change in the end:

ARCH       = cubieboard
TOPdir     = $(HOME)/HDD/hpl-2.1
MPdir      = /usr/lib/mpich2
MPinc      = -I$(MPdir)/include
MPlib      = /usr/lib/libfmpich.a
LAdir      = /usr/lib/atlas-base/
LAlib      = $(LAdir)/libf77blas.so.3 $(LAdir)/libatlas.so.3
HPL_LIBS   = $(HPLlib) $(LAlib) $(MPlib) -lmpl -lcr
CCFLAGS    = $(HPL_DEFS) -mfpu=neon -mfloat-abi=hard -funsafe-math-optimizations -ffast-math -O3

Just make sure you use the correct TOPdir, and if your libraries are in different locations then change the paths above accordingly. I added the CCFLAGS because I wanted the best results possible (knowing I have standard BLAS libraries). Here is my entire makefile if you would like to compare: Make.cubieboard-U13.10.
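If your paths differ from mine, a quick way to locate the ATLAS and MPICH files before editing the makefile (the exact file names vary between Ubuntu releases) is:

```shell
# Locate the ATLAS and MPICH libraries so LAdir, MPdir and MPlib can be
# pointed at the right places; paths vary between Ubuntu releases.
find /usr/lib -name 'libatlas*' -o -name 'libfmpich*' 2>/dev/null
```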

Now compile HPL

make arch=cubieboard

HPL has a large number of input variables, and an even larger number of combinations of them, which can be very intimidating. I still have not wrapped my head around all of them. If you open the HPL.dat file you will see what I mean; you can find it in the bin/cubieboard/ folder. A full explanation of what the input variables do can be found here. A very useful site I found gives you a standard HPL.dat file to start from, so let's start by going to the site and filling in the specs you need. Below is the HPL.dat file that I used.

HPLinpack benchmark input file
University of the Witwatersrand
HPL.out      output file name (if any)
8            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
8000         Ns
1            # of NBs
128           NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
2            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0                               Number of additional problem sizes for PTRANS
1200 10000 30000                values of N
0                               number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64        values of NB

Note that you must specify the number of cores that you want to run on. In our case the Cubieboard2 is dual core, hence we specify Ps x Qs = 1 x 2 = 2. If you wanted to run on a single core you would set Ps = Qs = 1. If the grid does not match the number of processes you launch, HPL will fail with an error at run time. Note that if you run multiple process grids then you must start HPL with the maximum number of cores needed.
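A quick sanity check before picking the grid is to ask the OS how many cores it actually sees; Ps x Qs must not exceed this:

```shell
# Print the number of online cores; the HPL process grid (Ps x Qs) and the
# mpirun -np value must not exceed this.
nproc
```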

Now to start HPL on both cores I need to run the mpi command. This is done with

mpirun -np 2 ./xhpl

The -np option sets the number of processes, one per core here; it must equal the product Ps x Qs. The output is then written to the file HPL.out (as selected by the device out line in HPL.dat).
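As a side note, a common rule of thumb (an assumption on my part, not something specific to this board) is to size N so that the N x N matrix of doubles fills roughly 80% of RAM, rounded down to a multiple of NB. For the 1 GB Cubieboard2 that works out as:

```shell
# Rule-of-thumb problem size: the N x N matrix of doubles (8 bytes each)
# should use ~80% of RAM, and N is rounded down to a multiple of NB.
MEM_MB=1024   # Cubieboard2 has 1 GB
NB=128
N=$(awk -v mb=$MEM_MB -v nb=$NB \
    'BEGIN { n = int(sqrt(mb * 1024 * 1024 * 0.80 / 8)); print int(n/nb)*nb }')
echo "Suggested Ns: $N"
```

I used a more conservative Ns of 8000 above, which leaves extra headroom for the OS.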

Next Up

This was largely successful, as it proves that HPL is working on both cores. The next steps will be to custom-tune the BLAS libraries and also optimise the OS with better-configured kernels. This will be explained in a different post by Mitch.


First Wits press release of January's workshop: Big data under the spotlight


The Wits press office has released the first piece to advertise the Big Data workshop in January:

http://www.wits.ac.za/newsroom/newsitems/201312/22094/news_item_22094.html

Check it out!


Big data under spotlight


15 November 2013
One of the biggest challenges for scientists this century will be to develop supercomputers that can process huge data output from big science projects such as the Square Kilometre Array.
To address this and other data questions, renowned world physicists and engineers will be attending the 2014 High-performance Signal and Data Processing workshop to be held at Wits University from 27 to 31 January 2014.

They include Dr Peter Jenni, one of the “founding fathers” and former spokesperson of the ATLAS experiment at the CERN Large Hadron Collider in Switzerland that discovered the Higgs boson in 2012, and Dr Bernie Fanaroff, Wits alumnus and Project Director of the Square Kilometre Array.
“There are supercomputers in the world, but they are essentially doing a lot of computation and are extremely expensive. We want to process big flows of data,” says Professor Bruce Mellado from the High Energy Physics Group (HEP) in the School of Physics at Wits University. Together with his colleagues in HEP and fellow workshop organisers, Dr Oana Boeriu and Dr Trevor Vickey, Mellado and his team are developing and building a high-throughput supercomputer.

“Called the Massive Affordable Computing (MAC) project, HEP aims to use existing computer elements available on the market to build supercomputers that are cheap and energy efficient,” Mellado says.

Processing the vast quantities of data that the SKA will produce will require very high performance central supercomputers capable of 100 petaflops of processing power. This is about 50 times more powerful than the current most powerful supercomputer, and equivalent to the processing power of about one hundred million PCs. The technological challenges related to high-throughput data flows at the ATLAS detector today are common to those facing the SKA in the future.

With this workshop, themed Challenges in Astro- and Particle Physics and Radio Astronomy Instrumentation, the organisers aim to bring together key people to discuss the grand challenges facing the signal processing community in Radio Astronomy, Gamma Ray Astronomy and Particle Physics. But, the development of high-throughput computers will also have a revolutionary impact on data processing in all fields of science - including the medical sciences, palaeosciences and engineering - and the organisers hope to attract delegates from those fields of study as well.
The workshop will include plenary sessions for in-depth presentations, knowledge sharing between delegates in a lecture format, and a classroom environment for hands-on hardware training. General overviews and in-depth presentations will be given.

Students and young researchers are also welcome to deliver presentations and are encouraged to submit abstracts. Registration and abstract submission are now open until 31 December 2013.
A book of proceedings is envisioned; proceedings will be peer reviewed. The deadline for proceedings submission is 15 February 2014. The conference is co-presented by SKA Africa, the University of Cape Town, the National Research Foundation/iThemba LABS, Stellenbosch University and CERN-SA.




Building a Cubieboard Kernel: Part 1

To date it seems that all of the pre-compiled kernels and toolchains online for the Cubieboard are using stock parameters which tend to be tuned for the Cortex-A9 or built using an older version of GCC which does not fully support the Cortex-A7 CPU!

For these reasons, and also because I would like to make a more 'Lean and Mean' kernel with less pointless drivers to waste memory, I have endeavoured to build my own. This post will describe the general process of building a kernel for the Cubieboard. I will note a few initial changes I made to the kernel config but there needs to be some testing before I can conclude whether my changes (and more to come, I'm sure) are worth it or not. I plan on making a 'Part 2' to confirm performance changes and my final kernel config.

Let's Get Started!

The first step is to ensure you have a working cross-compiler toolchain installed. If you do not, see my post here on setting up the latest Linaro toolchain. This post describes how to modify this toolchain to be more optimised for the Cortex-A7.

Besides the toolchain setup, above, please make sure you have u-boot-tools installed:

sudo apt-get install u-boot-tools

This package contains the mkimage command that is required to make the final image. You then need to get the source code. Kernel sources tend to be huge so I opted to get only the latest revision of code and no history. I think this at least halved the download size!

git clone --depth 1 https://github.com/linux-sunxi/linux-sunxi.git --branch sunxi-3.4

This was about a 400 MB download. Once it completes, there is a handy command to load an initial working config for the Cubieboard:

make ARCH=arm CROSS_COMPILE=${CC201310A7} sun7i_defconfig


Some Config Changes

If you would like to view or modify this default configuration then you can get to the normal menuconfig with:

make ARCH=arm CROSS_COMPILE=${CC201310A7} menuconfig

This will bring up the classic Linux kernel menuconfig. Here you can browse through, see some info on the various items with the help command, and change things! Be sure to save the config when you are done: there is a save option near the bottom of the main menu. Save the config as .config for it to be used by the make command.

As I mentioned earlier I chose to modify a few things in this initial run. I plan on comparing the performance between the kernels supplied by the community, a kernel that is a stock configuration but compiled for the Cortex-A7 with GCC 4.8 and also a kernel with my modifications to the config.

Initially, I chose only to turn off forced preemption, which should allow higher throughput by telling the kernel not to switch between tasks as aggressively. The default was a low-latency, desktop-style setting, which is great for interactive use but not great for processing tasks. Here's how you find the setting:

Kernel Features -> Preemption Model -> No Forced Preemption


Another issue I discovered was that, by default, the ethernet drivers are not compiled into the kernel - they are built as a module. This means that to use them we have to manually tell Linux to load the module. I don't want this behaviour, so I chose to build the ethernet drivers into the kernel.


Note that to get ethernet to work after you first boot (later on in the process), you will probably have to bring up the interface manually and add a couple of lines to the config files so that this happens automatically on boot:

ifconfig eth0 up
echo auto eth0 >> /etc/network/interfaces
echo iface eth0 inet dhcp >> /etc/network/interfaces
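After those two echo commands, the relevant part of /etc/network/interfaces should read:

```
auto eth0
iface eth0 inet dhcp
```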

The Build

Once you are happy with your changes you can build the kernel. Change the -j3 to -j(number of CPUs + 1) to suit your build system for a faster build.

make ARCH=arm CROSS_COMPILE=${CC201310A7} uImage modules -j3

and then

make ARCH=arm CROSS_COMPILE=${CC201310A7} INSTALL_MOD_PATH=output modules_install
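The -j3 above suits a dual-core build machine; you can also derive the value automatically instead of hard-coding it:

```shell
# Derive the make parallelism from the build machine: number of CPUs + 1.
J=$(($(nproc) + 1))
echo "Using -j$J"
# e.g. make ARCH=arm CROSS_COMPILE=${CC201310A7} uImage modules -j$J
```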

This will take a while... Once it's done you only need to copy the kernel uImage and modules onto your SD card! The commands below will do this for you. Note that I have mounted the boot partition of my SD card to /media/boot and the rootfs to /media/rootfs. If the uImage file is missing then the compile above failed at some point.

sudo cp -v arch/arm/boot/uImage /media/boot/
sudo rm -r /media/rootfs/lib/*
sudo cp -rv output/* /media/rootfs/lib/

Unmount the SD card, put it in your Cubieboard and hope for the best! ;)

Thursday, 14 November 2013

Conclusions from first Back-end Front-end integration exercise


The following is the outcome of the first Back-end Front-end integration exercise:
  • The communication between the Daughterboard (front-end) and the Super Read-out Driver (sROD) emulator (back-end) has been successfully tested.
  • Upstream and downstream data flow was achieved.
  • Two different platforms were used as the sROD emulator: Xilinx Virtex 7 and Xilinx Kintex 7. Satisfactory results were obtained with both evaluation boards.
  • An overnight test was performed to spot potential transmission errors. No errors were identified after 14 hours.

Tests have also been performed between an external PC and the readout chain using IPbus:
  • An IPbus interface has been integrated into the design of the sROD emulator.
  • This facilitates easy access to hardware registers in the sROD emulator over ethernet.
  • Different communication tests have been performed successfully using the IPbus interface, allowing the external PC to read and write registers in the sROD emulator.


Testing communication between the Daughterboard and the Mainboard:
  • Commands and data words have been transmitted from the sROD emulator to the daughterboard using the IPbus interface.
  • The Daughterboard processes and transmits the commands and data to the Mainboard.
  • This test was not completely successful: the Daughterboard reacted to the commands received over IPbus, but not in the expected way.
  • The bug is probably in the state machine that decodes each command into specific actions for the Mainboard.

Next integration session will take place in December at CERN.

To the right, Pablo Moreno, from Wits.




Wednesday, 13 November 2013

Installing HPL on Wandboard + Ubuntu 12.04

With the ultimate goal of benchmarking an array of Wandboards, a good starting point is to install High Performance LINPACK (HPL) on one machine and then expand the number of boards. This post discusses how to configure all the required packages, and HPL itself, to get it up and running. Note: this is not a discussion of tuning or optimising for high flop counts, but just of getting a working benchmark. I will discuss tuning in a later post.

System Specs

  • Wandboard Quad
    •  Processor        - Freescale i.MX6 Quad
    •  Cores            - Cortex-A9 quad core
    •  Graphics engine  - Vivante GC 2000 + Vivante GC 355 + Vivante GC 320
    •  Memory           - 2GB DDR3
  • Using Ubuntu 12.04 LTS
    • This version uses softfp, which is not ideal, and is why I am not going to tune HPL
    • The GCC compiler in 12.04 is outdated when it comes to ARM and does not support hardfp
    • Ubuntu 13.10 is available and has the much-needed updates. I will move across to it once I have a decent set of results to compare the 12.04 and 13.10 versions. This will be a nice comparison, primarily of the hardfp vs softfp effect.

Prerequisites

HPL requires a Message Passing Interface (MPI) implementation and either the Basic Linear Algebra Subprograms (BLAS) or the Vector Signal Image Processing Library (VSIPL). In my case I used MPICH2 and the ATLAS package, both from the repository. Before you ask why I have not used an ATLAS-tuned BLAS, and point out that my results will be poor because of it, I remind you that my main objective is to have HPL up and running first and foremost. There are too many things that can go wrong in the ATLAS-tuned BLAS approach. I will, however, get to these topics in future posts. I assume you have the standard compilers that come with the Ubuntu 12.04 image. If you are wondering why I used MPICH2 rather than OpenMPI: MPICH2 worked first :)

Get the required packages

sudo apt-get install mpich2
sudo apt-get install libatlas-base-dev

Then get the HPL source code from http://www.netlib.org/benchmark/hpl/hpl-2.1.tar.gz and extract it to a folder in your home directory. We need to produce the generic makefile and then edit it to suit our system.

Now to install

tar -xvf hpl-2.1.tar.gz
cd hpl-2.1/setup
sh make_generic
cp Make.UNKNOWN ../Make.wandboard

Now you must link your MPI libraries correctly for the build to incorporate multi-core support. It took me a few hours of changing things around until I got it working. This is what I had to change in the end:

ARCH       = wandboard
TOPdir     = $(HOME)/HDD/hpl-2.1
MPdir      = /usr/lib/mpich2
MPinc      = -I$(MPdir)/include
MPlib      = /usr/lib/libfmpich.a
LAdir      = /usr/lib/atlas-base/
LAlib      = $(LAdir)/libf77blas.a $(LAdir)/libatlas.a
HPL_LIBS   = $(HPLlib) $(LAlib) $(MPlib) -lmpl -lcr
CCFLAGS    = $(HPL_DEFS) -mfpu=neon -mfloat-abi=softfp -funsafe-math-optimizations -ffast-math -O3

Just make sure you use the correct TOPdir, and if your libraries are in different locations then change the paths above accordingly. I added the CCFLAGS because I wanted the best results possible (knowing I have standard BLAS libraries). Here is my entire makefile if you would like to compare: Make.wandboard-U12.04.

Now compile HPL

make arch=wandboard

HPL has a large number of input variables, and an even larger number of combinations of them, which can be very intimidating. I still have not wrapped my head around all of them. If you open the HPL.dat file you will see what I mean; you can find it in the bin/wandboard/ folder. A full explanation of what the input variables do can be found here. A very useful site I found gives you a standard HPL.dat file to start from, so let's start by going to the site and filling in the specs you need. Below is the HPL.dat file that I used.

HPLinpack benchmark input file
University of the Witwatersrand
HPL.out      output file name (if any)
8            device out (6=stdout,7=stderr,file)
1            # of problems sizes (N)
10240         Ns
1            # of NBs
128           NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
2            Ps
2            Qs
16.0         threshold
1            # of panel fact
2            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
##### This line (no. 32) is ignored (it serves as a separator). ######
0                               Number of additional problem sizes for PTRANS
1200 10000 30000                values of N
0                               number of additional blocking sizes for PTRANS
40 9 8 13 13 20 16 32 64        values of NB

Note that you must specify the number of cores that you want to run on. In our case the Wandboard is quad core, hence we specify Ps x Qs = 2 x 2 = 4. If you wanted to run on a single core you would set Ps = Qs = 1. If the grid does not match the number of processes you launch, HPL will fail with an error at run time. Note that if you run multiple process grids then you must start HPL with the maximum number of cores needed.
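For a given core count, the valid grids are just the factor pairs. A small loop (my own helper, not part of HPL) lists them:

```shell
# List every process grid (P,Q) with P*Q equal to the core count.
CORES=4
for P in $(seq 1 $CORES); do
  [ $((CORES % P)) -eq 0 ] && echo "P=$P Q=$((CORES / P))"
done
```

The HPL tuning notes suggest grids close to square (with Q at least as large as P), which is another reason 2 x 2 is a good choice here.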

Now to start HPL on all four cores I need to run the mpi command. This is done with

mpirun -np 4 ./xhpl

The -np option sets the number of processes, one per core here; it must equal the product Ps x Qs. The output is then written to the file HPL.out (as selected by the device out line in HPL.dat).

Next Up

This was largely successful, as it proves that HPL is working on all four cores. I quite like the idea of having results from before any optimisations, as we can then quantitatively see how much improvement we get from the tuned BLAS and the new hardfp in the next set of testing on the Ubuntu 13.10 system.


ARM Cortex-A7 GCC Flags (Allwinner A20)

I have been digging around trying to optimise the GCC flags for the Cubieboard (Allwinner A20 SoC). This post serves as a convenient reference for some of the technical specifications and their associated GCC flags.

Compiler Version

Please take note that the Cortex-A7 architecture is quite new and therefore the version of compiler you choose is very important. The relevant change notes from GCC are as follows:

GCC 4.6 - No official support for the Cortex-A7
GCC 4.7 - Support was added (http://gcc.gnu.org/gcc-4.7/changes.html)
GCC 4.8 - Code generation improvements for Cortex-A7 (http://gcc.gnu.org/gcc-4.8/changes.html)

Therefore, you want to use at least GCC 4.8. Code from an older compiler will generally still work, but it will be compiled for generic ARMv7-A and optimised for, say, the Cortex-A9, which has a VFPv3 FPU.

Allwinner A20 CPU Specifications

  • Dual core Cortex-A7 @ ~1 GHz
  • 256 KB L2 Cache (Shared)
  • 32 KB / 32 KB L1 Cache per core
  • SIMDv2 NEON
  • VFPv4-D16 Floating Point Unit (FPU)
  • Virtualisation Extensions

The ARM web site provides a lot of information regarding the specifications of the Cortex-A7: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0463d/ch02s01s02.html

GCC Flags

A complete list of available GCC flags for ARM can be found here: http://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html. Each flag has a short (and sometimes longer) description.

The obvious flags to set are the CPU type. The -mcpu flag is quite specific. If you wanted to compile in a more generic manner you could set -march to the ARMv7-A architecture and then use only -mtune, not -mcpu, which would allow the code to run on all ARMv7-A CPUs while still being optimised for a certain one. Generally you should set either -mcpu or -march. I am not interested in that behaviour, since this is an embedded system and I want it to be as optimised as possible. -mtune is redundant when -mcpu is set, but one never really knows what's going on inside the compiler, so I set both just in case!

-mcpu=cortex-a7
-mtune=cortex-a7

The next most important flags are for the FPU. Software floating point is basically a disaster, so don't waste your time with it. We have both VFPv4-D16 and NEON; these are like two sides of the same coin. NEON enables SIMD (Single Instruction, Multiple Data) on the VFPv4 FPU, which provides a large speedup if the compiler can use it. There is a snag, however: NEON is not fully IEEE 754 compliant. What this means is that denormal values are treated as zero, which can lead to some floating-point accuracy issues. I plan on doing some more research into this issue and whether it is really worth worrying about.

Either of the flags below is applicable; the second also enables the NEON extensions:

-mfpu=vfpv4-d16
-mfpu=neon-vfpv4

Note that if you use NEON then you should also use:

-funsafe-math-optimizations

This is because of the IEEE 754 compliance issue mentioned earlier. You need to tell the compiler explicitly that you don't mind the potential accuracy issues - otherwise it essentially won't use the NEON extensions!

Setting the Flags in a Cross-Compiler

I am using the Linaro Cross Toolchain which I showed how to set up in a previous post. The issue with this toolchain is that it is tuned for the Cortex-A9 processor. In order to change this permanently, one has to rebuild the toolchain (what a strain!). The Linaro FAQ tells us this.

Of course there is the manual way, where you specify the flags to gcc each time it is called; however, this does not work well with the make command. Instead, I created a bash script and replaced the gcc symlink in the toolchain /bin directory with this script. Don't forget to make it executable!

#!/bin/bash
# Pass all arguments through to the real compiler; "$@" preserves quoting
# of arguments containing spaces, unlike $*.
/path/to/your/cross/gcc -mtune=cortex-a7 -mfpu=neon-vfpv4 "$@"

This will now transparently call the gcc compiler with the flags every time. If you try to force -mfloat-abi to hard, you will find that U-Boot fails to compile; this is because U-Boot builds with software floating point, for its own reasons.
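To see the wrapper mechanism in isolation, here is a sandboxed sketch: a fake gcc that just echoes its arguments stands in for the real cross-compiler (all paths here are temporary stand-ins created with mktemp, not the real toolchain layout):

```shell
# Sandbox demo of the wrapper trick: "gcc.real" stands in for the real
# compiler and just echoes its arguments; the wrapper injects the flags.
BIN=$(mktemp -d)
printf '#!/bin/bash\necho "$@"\n' > "$BIN/gcc.real"
printf '#!/bin/bash\n"%s" -mtune=cortex-a7 -mfpu=neon-vfpv4 "$@"\n' \
    "$BIN/gcc.real" > "$BIN/gcc"
chmod +x "$BIN/gcc.real" "$BIN/gcc"
"$BIN/gcc" -O3 hello.c   # prints: -mtune=cortex-a7 -mfpu=neon-vfpv4 -O3 hello.c
```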

Tuesday, 12 November 2013

First attempt to integrate Back-end and Front-end for the TileCal upgrade

Today we attempted for the first time to integrate the Back-end and Front-end of the new upgraded TileCal electronics. Work performed in collaboration with IFIC, Stockholm University, University of Chicago, Argonne National Laboratory and CERN. In the picture one can see the daughter board that will sit on the detector, connected with FPGA evaluation boards that serve as simulators of the future sROD.

Linaro Cross-Compile Setup on Ubuntu 12.04

This is a simple guide that shows how to set up an Ubuntu 12.04 based build environment for Linaro.

It is convenient to use a virtual machine for a build environment since you can share it with other team members and also make backups for the entire system by simply duplicating the virtual hard disk.

The first step is to download Ubuntu 12.04 Server 64-bit. If you are not a fan of the CLI, you should try to get used to it! :P The Server edition is lighter-weight, more optimised for throughput and uses less RAM, giving you faster builds. Whether you are installing on a PC or inside a VM, the process is pretty much the same - I'll leave it to you! I would recommend at least 2 GB RAM and 2 CPUs, though.

I decided to install OpenSSH Server as well as SAMBA file sharing so that I can easily get files onto and off of the machine. I won't explain how to set these up – there are plenty of guides online already!

Install Some Packages

Once the machine is installed there are quite a few packages we need to install. Please type the commands below:

sudo apt-get install build-essential git ccache zlib1g-dev gawk bison flex gettext uuid-dev

Since we are on a 64-bit machine and the Linaro binaries are still 32-bit at present, you also need to install the 32-bit libraries:

sudo apt-get install ia32-libs

Finally, since we will probably be compiling a kernel at some point, we need ncurses for make menuconfig:

sudo apt-get install ncurses-dev

All together this is about 200 MB of downloads. Feel free to combine all the packages into one command and go grab a coffee if you aren't on a fast connection!


Install the Cross-Compiler

The Linaro project is becoming the industry standard in the ARM world, in my opinion. They regularly produce cross-toolchains (a difficult and tedious task to do oneself), as well as following the Ubuntu release schedule and releasing an ARM-optimised version with some tweaks.

At the moment we are only interested in the toolchain. Please browse to https://releases.linaro.org/ to see what’s available and what the latest reasonable packages are. At the time of writing, 13.11 is the latest but I will be using 13.10 as this has already been compiled by some good people: https://launchpad.net/linaro-toolchain-binaries

It’s a good idea to create a new directory in your home directory (or wherever you want) for any Linaro-related stuff. I’ll be using ~/linaro/ as the directory for all Linaro stuff. Download the latest package that has been built and extract it:

cd ~/linaro

wget https://launchpad.net/linaro-toolchain-binaries/trunk/2013.10/+download/gcc-linaro-arm-linux-gnueabihf-4.8-2013.10_linux.tar.xz

tar xJf gcc-linaro-arm-linux-gnueabihf-4.8-2013.10_linux.tar.xz

Remember the handy shortcut of pressing TAB after typing a few letters to auto-complete the rest of the file name!

You will now have a new folder containing the cross-compiler toolchain. We will export a variable that points to this directory so that we can easily call the gcc, etc. inside it. I am going to give this variable a more useful name than just CC (cross-compiler), since I will probably download newer versions of the toolchain in the future and would like to be able to access any of them!

export CC201310=`pwd`/gcc-linaro-arm-linux-gnueabihf-4.8-2013.10_linux/bin/arm-linux-gnueabihf-

Naturally, you will replace the names with whatever version you are using. The `pwd` part prints the working directory; it’s a shortcut for writing out /home/you/linaro/, which you could do if you wanted to!

Now, whenever you type ${CC201310} followed by a command like gcc, you are actually pointing to that command inside the ${CC201310} directory. To test:

${CC201310}gcc --version

Should give you a result! If you get an error, make sure you installed everything earlier and the export went well.

You will need to export this every time you boot the machine, or else put it somewhere that runs automatically when you log in. We can do this by adding a line at the end of ~/.bashrc:

export CC201310=/home/you/linaro/gcc-linaro-arm-linux-gnueabihf-4.8-2013.10_linux/bin/arm-linux-gnueabihf-


That’s it!

Congratulations! Pretty simple, right? Now, if you want to make something using the cross compiler:

make ARCH=arm CROSS_COMPILE=${CC201310}

Obviously this make command isn't going to work if you just type it in... Follow the instructions of the software you are trying to build! The command above is just a template.

I should add that, as of writing, the compiler is by default set up for any ARM Cortex with a VFPv3 FPU, but is tuned for the Cortex-A9. If you want to optimise for the Cortex-A7, etc., then you should specify this to the compiler. See http://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html.

For the Cortex-A7, some example flags are below. Also note that you will have to do some extra reading on how to actually set these flags. The toolchain set up in this post should be adequate to get you going with no further tweaking...

-mcpu=cortex-a7 -mtune=cortex-a7 -mfpu=neon-vfpv4

Monday, 28 October 2013

High-performance Signal and Data Processing Workshop

We are very happy to announce the workshop High-performance Signal and Data Processing 2014. The workshop aims to bring together key people to discuss grand challenges facing the signal processing community in Radio Astronomy, Gamma Ray Astronomy and Particle Physics, and to provide knowledge sharing to delegates in both a lecture format and a classroom environment for hands-on hardware and software training. General overviews and in-depth presentations will be given.

The event will take place at the Wits Professional Development HUB at the University of the Witwatersrand, the week of January 27th to January 31st 2014. We are honored to have the following keynote speakers at the event:
Dr. Daniel Adams, Chief Director: Emerging Research Areas and Infrastructure at the DST
Prof. Jean Cleymans, Chairman of SA-CERN
Dr. Bernard Fanaroff, SKA Project Director
Dr. Peter Jenni, former Spokesperson of the ATLAS experiment at CERN
Prof. Justin Jonas, MeerKAT Associate Director of Science and Engineering
Prof. Naba Mondal, Spokesperson of the India-based Neutrino Observatory

Keynote addresses will be followed by plenary sessions for in-depth presentations by experts in areas of physics and engineering. Students and young researchers are also welcome to deliver presentations. Several hands-on sessions with the FPGA-based boards RHINO and ROACH will be provided. A session on Grid software is also envisioned.
Registration and abstract submission are now open until December 31st 2013. A book of proceedings is envisioned; proceedings will be peer reviewed. The deadline for proceedings submission is February 15th 2014. Students and young researchers are particularly encouraged to submit abstracts and proceedings.

There are funding opportunities for students outside the Johannesburg area. Preference will be given to South African students.

Visit the site at: http://www.hpspsa.com/ to get more information.

The flyer can be found here.


Wednesday, 9 October 2013

Nobel Prize

The Physics department held a celebratory cake and tea at the news of the Nobel Prize being awarded to Peter Higgs and François Englert.

Wits has a large presence at ATLAS, one of the two general purpose detectors, responsible for the discovery of the Higgs boson.

Well done to Bruce, Trevor and Oana together with their postdocs and PG students!