Tuesday, 30 September 2014

HPL Result Comparisons Between Tegra K1 and Other Boards

Hardware Used

Four platforms were used to test various Cortex A Series CPUs. They are described in the table below:

HPL Results

The HPL results shown below are all given in GFLOPS. I included the results at 1 GHz for all boards and then at each board's maximum frequency. The clock frequencies were set as close as possible to 1 GHz, and the remaining difference was almost negligible. Immediately we notice that the Jetson Tegra K1 achieves approximately 4 GFLOPS more than the Odroid XU+E. This is expected, as the performance ratio is approximately the same as the clock frequency ratio of 2300 to 1600.
HPL Results for four ARM boards
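
As a rough check of the scaling argument, here is a minimal sketch that computes the 2300:1600 clock ratio and the baseline result that a 4 GFLOPS gap would imply if HPL scaled linearly with clock frequency. Linear scaling is an assumption, the function names are made up for illustration, and the implied baseline is derived arithmetic, not a measurement from this post.

```python
# Maximum clock frequencies quoted in the post (MHz).
F_TEGRA_K1_MHZ = 2300
F_ODROID_MHZ = 1600

def clock_ratio(f_fast_mhz, f_slow_mhz):
    """Ratio of two clock frequencies."""
    return f_fast_mhz / f_slow_mhz

def implied_baseline_gflops(gap_gflops, ratio):
    """Baseline for which (ratio - 1) * baseline equals the gap.

    Assumes performance scales linearly with clock frequency.
    """
    return gap_gflops / (ratio - 1.0)

ratio = clock_ratio(F_TEGRA_K1_MHZ, F_ODROID_MHZ)
baseline = implied_baseline_gflops(4.0, ratio)
print(f"clock ratio: {ratio:.4f}")
print(f"baseline implied by a 4 GFLOPS gap: {baseline:.2f} GFLOPS")
```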

HPL Efficiency

Similarly to what was done in previous posts, I have taken power measurements of the boards at each frequency and recorded the HPL performance. This gave a nice profile of HPL performance per Watt as a function of clock frequency. The A7 performs fairly poorly, but it is only a dual core. There are not that many available CPU frequencies on the Wandboard (A9), so we are stuck with just 3 data points, but even so we can pretty much see the pattern. The A15-p2 (Odroid, green) clearly shows the transition between the power saver (quad A7) and the higher-powered A15, which occurs at approximately 600 MHz. The Tegra K1 has a much better power efficiency (over 2 GFLOPS/Watt at low frequencies). This is impressive but impractical, since one would never run these devices below 300 MHz for processing data. What is impressive is that even at 2 GHz the efficiency is still over 1 GFLOPS/Watt.
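
The efficiency profile described above can be sketched in a few lines of Python. The sample values are placeholders chosen for illustration, not measurements from any of the boards, and the function names are my own.

```python
def gflops_per_watt(gflops, watts):
    """HPL efficiency for a single measurement."""
    if watts <= 0:
        raise ValueError("power must be positive")
    return gflops / watts

def efficiency_profile(samples):
    """Turn (freq_mhz, gflops, watts) samples into (freq_mhz, GFLOPS/W) pairs,
    sorted by frequency."""
    return [(f, gflops_per_watt(g, w)) for f, g, w in sorted(samples)]

# Placeholder data only: (frequency in MHz, HPL GFLOPS, board power in Watts).
example = [(1000, 5.0, 4.0), (500, 2.5, 1.5)]
profile = efficiency_profile(example)
for freq, eff in profile:
    print(f"{freq} MHz: {eff:.2f} GFLOPS/W")
```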

HPL Squared per Watt

As mentioned in my previous posts, the efficiency alone is not that useful. A more interesting feature to look for is the best operating frequency at which to run these chips to maximise both power efficiency and performance simultaneously. This really does give us a nice profile of the boards. It also clearly shows the improvements of the Tegra K1 over the Odroid. The main reason for the improvement is still a little unclear, but what we do know is this: the Cortex-A15 in the Tegra K1 is a later revision. What was changed between the r3p2 and r3p3 revisions is not clear from the ARM website: http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0438g/ch01s08s10.html as all they mention are the register values that changed and "Various engineering errata fixes". I think a more significant reason is the process used to manufacture the chips. The Odroid's chip was made with a 28 nm HKMG process and the Tegra K1 was made with the 28HPM process. According to TSMC, 28HPM provides more performance while maintaining the same leakage power as the 28LP process.
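
To make the metric concrete, here is a small sketch of the "HPL squared per Watt" figure of merit, taken as performance per Watt multiplied by performance, and of picking the frequency that maximises it. The numbers are placeholders for illustration only, not results from the boards.

```python
def perf_squared_per_watt(perf, watts):
    """(performance / Watt) x performance."""
    return perf * perf / watts

def best_frequency(samples):
    """Return the frequency whose sample maximises performance^2 / Watt.

    samples: iterable of (freq_mhz, perf, watts) tuples.
    """
    return max(samples, key=lambda s: perf_squared_per_watt(s[1], s[2]))[0]

# Placeholder data only: efficiency (perf / Watt) falls as frequency rises,
# yet the squared metric peaks at an intermediate frequency.
example = [(500, 2.0, 1.0), (1000, 4.0, 2.5), (1500, 5.0, 5.0)]
print(best_frequency(example))
```

With these placeholder values, plain efficiency is best at the lowest frequency, while the squared metric singles out the middle one, which is the behaviour the metric is meant to capture.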

Thursday, 25 September 2014

Prometeo, the next generation test bench for the tile calorimeter upgrade in ATLAS.

During 9-12 September, the expert week of the tile calorimeter upgrade, the latest version of the Prometeo GUI was published. After several months of design work and consultation with tile calorimeter users, the design has been fixed and is working well.

Prometeo is the test bench for the tile calorimeter upgrade in 2022. It is a portable enclosure that can certify the tile calorimeter front-end electronics during maintenance. Prometeo shares the design of the super read-out driver (sROD) and works at the LHC bunch crossing frequency.

The Prometeo GUI is an interface for experts and users to diagnose the front-end electronics. The interface is web-based and calls Python scripts in the back end to control the main board in Prometeo. The whole program runs on the CERN web service or on a mini Linux machine inside Prometeo.

The motivations for using the web interface are:
1) No compilation needed
2) Compatible with different platforms: tablet, smartphone, or regular PC
3) Robust over more than 10 years
4) Compatible with the CERN ROOT software and easy for interface design

A web interface is much easier to build than a regular GUI such as Qt or Java Swing. It is often seen in router software and is used well in some HEP tools like MadGraph. The Prometeo GUI combines the web, Python and IPbus hardware control. It is a good model for lightweight hardware control software.
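
The general pattern of a web front end calling Python in the back end can be sketched with the standard library alone. This is not the Prometeo code: the route table, `read_status` and `serve` are hypothetical names, and the hardware access is a stub standing in for whatever IPbus calls the real scripts make.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def read_status():
    """Hypothetical stand-in for a back-end script that talks to the main board."""
    return {"board": "stub", "ok": True}

# Hypothetical route table: URL path -> back-end Python function.
ROUTES = {"/status": read_status}

def dispatch(path):
    """Map a request path to a back-end function; return (HTTP status, JSON body)."""
    handler = ROUTES.get(path)
    if handler is None:
        return 404, json.dumps({"error": "unknown command"})
    return 200, json.dumps(handler())

class ControlHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, body = dispatch(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

def serve(port=8080):
    """Start the server; blocks until interrupted."""
    HTTPServer(("localhost", port), ControlHandler).serve_forever()
```

Calling `serve()` and opening `http://localhost:8080/status` in any browser would return the stub status as JSON, which is why nothing needs compiling or installing on the client side.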

Tuesday, 23 September 2014

Peek at the New Electronics Lab

Sneak Peek at the New Lab

We have been in the process of building a new electronics lab at Wits. Yesterday, Oscar Kureba, Matthew Spoor and I moved in all our equipment from the old lab. There is still a lot of work to be done but it's part of the fun!

Friday, 12 September 2014

HDAYS 2014 in Santander came to a close

HDAYS 2014 in Santander came to a close today. It's been a blast, as always. Great people and a wonderful location. Excellent overview of the riches of Higgs physics at the LHC. Run II promises to be a very exciting period for Higgs physics that will set the stage for future endeavours. HDAYS 2015 is set for September 14th-18th. Below is the group photo.

Characterising the TegraK1 Cortex A15

A Benchmark Characterisation of the Cortex-A15-r3p3

I have run HPL and Coremark from the lowest frequency (204 MHz) to the highest (2.3 GHz). At first I thought the performance per Watt would be interesting but, as expected, the lower the frequency the lower the power consumption, so it only shows that the efficiency is best at 204 MHz. A much more interesting value is the Performance/Watt X Performance. This essentially shows us at which frequency the CPU maximises both the performance and the efficiency simultaneously.

I have done this in the past with the Cubieboard2, Wandboard and Odroid (Cortex-A7-r0p4, Cortex-A9-r2p2, Cortex-A15-r3p2) but I only did it with HPL. I was asked if I had tried this with Coremark. It was simple enough, but I wanted to see if I got the same profile shape as with HPL. The question arose "how do I compare HPL and Coremark together?". Obviously a direct comparison is not possible, as they are fundamentally different benchmarks, but we can compare the shapes of the graphs. This can be done by normalising the results so that the area under each graph is equal to 1. The units on the y axis are then expressed in inverse frequency (MHz⁻¹). What we get is shown below.

This is really interesting. This shows that the approach of using the Performance/Watt X Performance does indeed describe the characteristic of the CPU. It is also nice to see that it is similar for both benchmarks. We see in both cases that the optimum frequency to maximise both performance and efficiency is 1.73 GHz. We are in the process of putting together a full set of results into a paper. I will add a link as soon as it's finished.
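
The unit-area normalisation used to compare the two benchmarks can be sketched as follows. The trapezoidal rule is my choice of integration method, not necessarily what was used for the plots, and the curves are placeholder values rather than HPL or Coremark results.

```python
def trapezoid_area(xs, ys):
    """Area under the piecewise-linear curve through (xs, ys)."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))

def normalise_to_unit_area(freqs_mhz, values):
    """Scale values so the area under the curve is 1.

    If freqs_mhz is in MHz, the result has units of 1/MHz.
    """
    area = trapezoid_area(freqs_mhz, values)
    if area <= 0:
        raise ValueError("area under the curve must be positive")
    return [v / area for v in values]

# Placeholder curves: same shape, very different scales.
freqs = [204, 1000, 2300]
hpl_like = [1.0, 3.0, 2.0]
coremark_like = [100.0, 300.0, 200.0]

hpl_norm = normalise_to_unit_area(freqs, hpl_like)
coremark_norm = normalise_to_unit_area(freqs, coremark_like)
print(hpl_norm)
print(coremark_norm)
```

Two curves with the same shape but different scales collapse onto the same normalised curve, so any remaining difference between benchmarks reflects shape alone.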

Monday, 8 September 2014

Hands on the New Nvidia TegraK1

Setup to Measure Power Consumption of the TegraK1

I have been benchmarking the Nvidia TegraK1 on our new Jetson development board for the past few days. I am still busy putting together all the results (mostly of the Cortex-A15 r3p3 and not the GPU, just yet) and will post them soon. In the meantime here are some pics of the setup for fun.

Using a small PCB with a 0.01 Ohm resistor (Designed and Built by Mitch) to measure power consumption of the entire board.
If you look closely you can see the results in Excel :P (Actually you can't, it's showing the Odroid sheet in my Excel spreadsheet lol)
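
For anyone curious how a shunt resistor gives board power, here is a minimal sketch of the arithmetic. It assumes the voltage drop across the 0.01 Ohm resistor is measured and that the supply voltage is measured upstream of it. The function name and the example readings are hypothetical, not values from my setup.

```python
SHUNT_OHMS = 0.01  # resistor value on the measurement PCB

def board_power_watts(shunt_drop_volts, supply_volts, shunt_ohms=SHUNT_OHMS):
    """Power drawn by the board, from the voltage drop across a series shunt.

    Assumes supply_volts is measured upstream of the shunt, so the board
    itself sees supply_volts - shunt_drop_volts.
    """
    current_amps = shunt_drop_volts / shunt_ohms      # Ohm's law: I = V / R
    board_volts = supply_volts - shunt_drop_volts
    return board_volts * current_amps                 # P = V * I

# Hypothetical reading: 5 mV across the shunt on a 12 V supply.
print(f"{board_power_watts(0.005, 12.0):.4f} W")
```

A small resistance keeps the drop, and therefore the disturbance to the board's supply, down to a few millivolts.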

Sneak Peek

A quick sneak peek of the power measurements. I used HPL to stress test the boards at different frequencies and measured power consumption. The boards are:
  1. Cortex-A7 = Cubieboard2
  2. Cortex-A9 = Wandboard
  3. Cortex-A15-r3p2 = Odroid XU+E
  4. Cortex-A15-r3p3 = Jetson TK1

Friday, 5 September 2014

ACAT 2014 Workshop

Today was the conclusion of the 16th international workshop for Advanced Computing and Analysis Techniques (ACAT) in physics research. I am extremely glad to have attended this excellent workshop. It was very well organised and sported a wide variety of topics, from new computing hardware and techniques to algorithms and even some physics theory, making a well-rounded program. The networking and friend-making opportunities were also great.

My talk was an overview of the research the group at Wits has performed so far, as well as details on my progress and plans for the PCI-Express SoC interconnect. There was also a poster from the University of Cape Town, but unfortunately Josh (the author) was unable to attend the conference. I presented the poster in his stead to much interest from the community.

There were two other talks and a plenary by the Barcelona Supercomputing Center on ARM System on Chips. Several other talks also made passing mention of ARM's potential suitability for scientific computing. All of these presentations were met with positive comments and questions from the audience.

I felt honored several times as the organizers and ACAT founders repeatedly mentioned that they had a South African participant (the first apparently)! Our project also received several mentions in the summaries - enhanced by the South African label.

I look forward to attending the workshop again in 18 months' time and I hope next time more South Africans will be fortunate enough to attend. Perhaps with the Square Kilometer Array project surging ahead, we can attract the ACAT workshop to South Africa in a few years?