Hot Chips 22

by Sebastien Mirolo on Thu, 26 Aug 2010

I attended the Hot Chips conference from August 23rd to August 24th on the Stanford University campus in Palo Alto. The temperature outside was definitely hot, the hottest days on record as a matter of fact. Inside, the chips were also running hot, with presentations quoting gargantuan power consumption numbers. If you are interested in the press announcements, you can find good coverage of the IBM z196 and the AMD Bulldozer elsewhere. Here I will cover the bits and pieces that were not written down in the slides and that I personally found interesting.

Trends and Numbers

Throughout every talk, the theme was System-on-Chip, with the only differences being in the level of programmability of the various components. The difference between general-purpose cores and dedicated accelerators might just be tied to the historical market of each vendor if you consider the crazy-looking "XML accelerator" in the IBM z196. The AES extensions of Intel's Westmere also caught my attention in light of the recent news about Intel and McAfee.
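
Purely as an illustration of what those AES extensions look like from software, here is a minimal sketch of one AES-128 block encryption written with the AES-NI intrinsics; it is my own example, not from the talk, and it assumes the eleven round keys have already been expanded.

    #include <emmintrin.h>   /* SSE2: _mm_xor_si128 */
    #include <wmmintrin.h>   /* AES-NI intrinsics; compile with -maes */

    /* Hypothetical helper: encrypt one 16-byte block with AES-128,
       assuming round_keys[0..10] have already been expanded. */
    static __m128i aes128_encrypt_block(__m128i block, const __m128i round_keys[11])
    {
        block = _mm_xor_si128(block, round_keys[0]);             /* initial AddRoundKey */
        for (int round = 1; round < 10; ++round)
            block = _mm_aesenc_si128(block, round_keys[round]);  /* one full AES round per instruction */
        return _mm_aesenclast_si128(block, round_keys[10]);      /* last round, no MixColumns */
    }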

The AMD Bulldozer introduces nested page tables for the benefit of a hypervisor, and the AMD Bobcat focuses on minimizing data movement, using queues instead of linked lists whenever possible, in order to preserve energy.

ARM introduced virtualization features in its core and increased the memory address space to 64 bits. ARM is not providing I/O virtualization yet, but something about ARM and/or the "embedded market" is definitely changing.

nVidia presented its new GF100 architecture early Monday morning. The main focus was on tessellation, particularly the use of displacement maps for tessellation. A major achievement for nVidia engineers was to break through the one-triangle-per-clock barrier (about 2.5 in the demo). Interestingly also, running FORTRAN code on the GPU seems to be an important selling point when nVidia ventures outside the 3D graphics market.

During the questions and answers, someone mentioned that they run more than 40 miles of cable in their high-performance computing center. Otherwise, I am OK with an L1 and an L2 cache. An L3 is pushing it. An L4? The IBM zSeries is definitely a monster.

Ideas with potential

While Intel was impressive in presenting their Tick-Tock cadence and delivering their Westmere product, it was Wei Hu and Yunji Chen of the Institute of Computing Technology, Chinese Academy of Sciences, who presented the most daring and ambitious development in processor architecture. The goal of the GS464V is nothing less than to create a High-Performance Low-Power XPU. I cannot recall what the X stands for, but it means no less than the integration of a GPU, CPU and DSP into a single core. Three key ideas in the design of that XPU are the direct links from the vector unit to memory in addition to the L1 and L2 paths, the memory controller swapped out for a reconfigurable Memory Access Coprocessor, and the fusion of computation and shuffling into one instruction. Numbers quoted during the talk were 100 frames per second for 1080p HD H.264 decode on a single 1GHz core. The GS464V still seems to be in the development stages at this point but if, as presented, CPU and OS is number one of the sixteen major projects (i.e. 5 to 10 billion in funding) in China, there is little doubt it will see the light of day somehow.
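
To give a rough idea of what fusing computation and shuffling buys, here is a small sketch using x86 SSE intrinsics; this is purely my own analogy, since the talk did not show GS464V assembly. Today the permute and the arithmetic that consumes it are two separate instructions, and the GS464V proposal is to collapse such pairs into one.

    #include <xmmintrin.h>   /* SSE intrinsics */

    /* Swap the two halves of b, then multiply element-wise with a.
       On x86 this costs two instructions (shufps + mulps); as I understood
       the talk, the GS464V would express this pattern as a single fused
       shuffle-and-compute instruction. */
    static __m128 mul_with_swapped_halves(__m128 a, __m128 b)
    {
        __m128 swapped = _mm_shuffle_ps(b, b, _MM_SHUFFLE(1, 0, 3, 2));
        return _mm_mul_ps(a, swapped);
    }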

Mindspeed made an interesting point about cell towers. Which carrier wants to drive a truck out to install brand new towers every time a new standard emerges (2G, 3G, 4G, etc.)? To be competitive, carriers focus on the cost per bit, and that pushes towards building cell towers out of programmable SoCs built around fine-grained VLIWs such as the Mindspeed Application Processor.

Raminda Madurawe from Tier Logic was definitely one of the most eloquent speakers as he presented the underlying ideas behind Tier Logic's FPGAs. For someone like me who does not know much about TFT transistors or metal layers, everything seemed to make a lot of sense. Apparently it was not enough though; too bad.

Though a technical presentation on FPGA acceleration, the talk on searching for gas and oil was for me a far better lesson in business opportunities. Until alternative energy sources can be used massively, there is increased pressure to search for deeper and smaller pockets of gas and oil. More data is thus collected by sonar boats methodically sailing the ocean, and more complex computations are run in data centers inland, in the hope of finding those last exploitable fields. Some analyses run today already take a week to complete, and more refined FWT elastic computations can be two to three orders of magnitude more intensive. The magic for a company like Maxeler Technologies is that there are about three kernels (Finite Difference, FFT and Sparse Matrices) and two measures (time and price) that people searching for oil and gas care about.
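
For readers unfamiliar with the first of those kernels, here is a minimal sketch of a 1D second-order finite-difference time step for the acoustic wave equation; it is my own toy example, not Maxeler's code, and the production seismic kernels are 3D, higher-order and run over far larger grids, which is exactly why they map so well onto deep FPGA pipelines.

    #include <stddef.h>

    /* One explicit time step of the 1D acoustic wave equation
       u_tt = c^2 * u_xx, discretized with second-order central differences.
       prev, curr and next each hold n samples; r = (c*dt/dx)^2. */
    static void fd_step(const double *prev, const double *curr, double *next,
                        size_t n, double r)
    {
        for (size_t i = 1; i + 1 < n; ++i)
            next[i] = 2.0 * curr[i] - prev[i]
                    + r * (curr[i - 1] - 2.0 * curr[i] + curr[i + 1]);
        next[0] = next[n - 1] = 0.0;   /* simple fixed boundaries */
    }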

If you haven't heard about Google Goggles yet, you are missing out on some really cool developments in search technologies. The promise behind Goggles is to take a picture of something with your cell phone and use that picture as the query. The presentation was very entertaining and the demo impressive. From a technology point of view, it was one of the only applications presented that was computation bound, in part because of the seemingly real-time requirement for responses and the arithmetic complexity of the underlying OCR and SIFT-derived algorithms.
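
To see where that arithmetic goes, here is a sketch of the brute-force descriptor matching at the heart of SIFT-style image search; it is my own simplification (real systems use approximate nearest-neighbor indexes), but it shows why a query image with hundreds of 128-dimensional descriptors compared against millions of database descriptors quickly becomes compute bound.

    #include <float.h>
    #include <stddef.h>

    #define DESC_DIM 128   /* SIFT descriptors are 128-dimensional */

    /* For one query descriptor, find the closest database descriptor by
       squared Euclidean distance: DESC_DIM multiply-adds per candidate. */
    static size_t match_descriptor(const float *query,
                                   const float *db, size_t db_count)
    {
        size_t best = 0;
        float best_dist = FLT_MAX;
        for (size_t i = 0; i < db_count; ++i) {
            float dist = 0.0f;
            for (size_t d = 0; d < DESC_DIM; ++d) {
                float diff = query[d] - db[i * DESC_DIM + d];
                dist += diff * diff;
            }
            if (dist < best_dist) { best_dist = dist; best = i; }
        }
        return best;
    }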

For fun

Sometimes it is not about technology. It was interesting to see the presentation about the integrated GPU/CPU for the new Xbox 360. Engineers converted the GPU Verilog to VHDL, ran equivalence checking tools, formal verification, etc. The rationale? They were just more familiar with VHDL.
