August 4, 2020


Deep as chips: the new microprocessors powering AI


The growing use of artificial intelligence (AI) is upscaling many standard types of IT workload, as well as powering new services driven by advanced algorithmic data processing and machine learning techniques. Developers of AI systems, however, have been somewhat constrained by the limitations of standard microprocessors. Pioneering AI applications needed supercomputer-class compute resources in order to produce the desired outputs. The downside was that these were often run on massed banks of multi-purpose central processing units (CPUs) working in parallel, but not optimised for the specific processing that AI needs to perform at its best.

More recently, graphics processing units (GPUs) have become established as de facto AI co-processor accelerators. Unlike conventional CPUs, whose four-to-eight complex cores are designed to tackle calculations in sequential order, GPUs have many more simple cores – hundreds, even thousands – with dedicated VRAM, making them adept at statistical computation (i.e. floating-point arithmetic) and the massively parallel processing that demanding machine learning applications require. These attributes have proved highly providential for GPU vendors, most notably Nvidia, which has leveraged demand for AI-optimised GPUs to establish market leadership. The company continues to develop additional connective capability for the kind of dataflow that benefits AI workloads. GPUs are also manufactured at volume, which helps make them more affordable.

However, serviceable as they may be, GPU design did not start with AI’s purposes in mind. What AI systems developers wanted were new processors engineered and optimised specifically for AI jobs: many-cored processors with built-in parallelism, able to perform intelligent analysis of big datasets in real time, all on a highly localised architecture closely networked with co-located processors, so that data can be transferred between them with near-zero latency, which also helps keep energy consumption down. AI-specific chips are sometimes categorised as ‘AI accelerators’. The AI chips produced to date do not conform to an industry standard, but are usually manycore designs that focus generally on low-precision arithmetic, novel dataflow architectures, or in-memory computing capability.
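The appeal of many simple cores comes down to how machine learning arithmetic decomposes. A dot product – the core operation of a neural network layer – splits into independent partial sums that a GPU can assign to separate cores. The sketch below models that partitioning sequentially in plain Python; the chunk size and data are arbitrary illustrations, not any vendor’s scheme.

```python
# Sketch: why dot products suit many-core hardware. Each chunk of
# multiply-adds is independent, so on a GPU the chunks could execute on
# separate cores; here we only model the partitioning, sequentially.
# Chunk size and input data are arbitrary assumptions for illustration.

def chunked_dot(a, b, chunk=4):
    """Dot product computed as independent partial sums, one per chunk."""
    assert len(a) == len(b)
    partials = []
    for start in range(0, len(a), chunk):
        # Each partial sum touches only its own slice of the inputs,
        # so no chunk depends on another -- ideal for parallel cores.
        partials.append(sum(x * y for x, y in
                            zip(a[start:start + chunk], b[start:start + chunk])))
    return sum(partials)  # a single final reduction combines the partials

a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
b = [8.0, 7.0, 6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
print(chunked_dot(a, b))  # → 120.0
```

The final reduction is the only sequential step; everything before it is what GPU vendors mean by massive parallelism.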
There is no standard definition of an AI chip. A broader view, attributed to the Beijing Innovation Centre for Future Chips (ICFC), is that any build of microprocessor used for AI applications can be called an AI chip. Furthermore, the ICFC notes, some chips based on traditional computing architectures, combined with hardware and software acceleration schemes, have worked well with AI applications. However, many unique designs and methods for AI applications have emerged that cover all levels of microprocessor build, from materials and devices to circuits and architectures.

Nvidia sells a lot of its Tesla-class GPUs to cloud services providers (CSPs). The big four CSPs – Amazon Web Services (AWS), Google Cloud Platform (GCP), Microsoft Azure and Alibaba Cloud – deployed Nvidia GPUs in 97.4 per cent of infrastructure-as-a-service offerings sold with dedicated accelerators, according to Lift Insights. However, the AI-driven escalation in demand for GPUs has alerted a mixed spectrum of technology providers to the potential for compute engines inspired by GPU concepts, but engineered for AI from the outset. Some contenders are aimed at servicing AI in the cloud, others at AI in devices at the ‘edge’ of the hardware stack.

These providers foresee an opportunity to win some of the $91.18bn (£70.7bn) that Allied Market Research thinks AI chip sales will be worth by 2025, as a result of a stonking 45.2 per cent compound annual growth rate. McKinsey, meanwhile, proffers a more conservative valuation, reckoning that by 2025 AI-related processors could account for almost 20 per cent of all microprocessor demand, worth about $67bn (£52bn). Even sliced thin, that’s a pretty fruity cake, with no shortage of players hungry for a slice. McKinsey also forecasts that GPUs will lose around 50 per cent of their market share to AI-specific processors by the middle of the 2020s. “Opportunities [for AI chips] will emerge at both data centres and the edge,” McKinsey adds.
“If this growth materialises, semiconductor vendors will be positioned to capture more value from the AI technology stack than they have obtained with previous innovations – about 40-50 per cent of the total.” AI creates an unprecedented opportunity for chip vendors due to “its applicability across virtually every industry vertical, the strong forecast for the sheer number of chips needed both in the cloud and at the ‘edge’, and the growing need for specialised computing requirements to accelerate new algorithms”, explains PwC’s ‘Global Semiconductor Market’ report. The consultancy predicts that AI will be the catalyst for another growth cycle for the semiconductor sector: “The need for instant computing, connectivity and sensing, will drive massive demand for AI-tailored processors for the next decade.”

Last year (2019) saw a succession of new AI chips from a variety of high-tech providers, ranging from incumbent microprocessor vendors (like AMD, Intel, IBM, Qualcomm) to CSPs (AWS, GCP, Alibaba) and investor-funded start-ups (like BrainChip, Cerebras, Graphcore, Groq and Gyrfalcon Technology). This upsurge has been described by some commentators as a new ‘chip arms race’, as multiple players jockey for a stake in the emergent market. One of the few defining characteristics of the nascent AI chip sector at this stage is that barriers to entry seem low, provided an arriviste has sufficient investor confidence.

According to Scott Runner, VP of technology at Altran, the level of investment and R&D being committed to AI chip development presents a rare opportunity for start-ups to thrive in plain sight of entrenched market leaders. “Some of the AI application needs are too niche for a large microprocessor player to target, or else require such domain-specific knowledge that a start-up company can clearly focus and differentiate,” Runner says.
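To put the 45.2 per cent compound annual growth rate in perspective, the standard compounding formula shows how quickly such a rate multiplies a market. The seven-year horizon below is an assumption chosen for illustration, not a figure from either analyst report.

```python
# Sketch: what a 45.2 per cent CAGR implies. The growth rate comes from
# the Allied Market Research figure quoted above; the seven-year horizon
# is an assumed illustration, not a figure from the report.

def compound_growth(start_value, rate, years):
    """Value after compounding annual growth at the given rate."""
    return start_value * (1 + rate) ** years

# Growth multiple over seven years at 45.2 per cent per annum:
factor = compound_growth(1.0, 0.452, 7)
print(round(factor, 1))  # → 13.6
```

In other words, a market compounding at that rate for seven years grows roughly 13.6-fold – which is why even a thin slice of the cake attracts so many players.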
“AI is ideal – start-ups don’t have to spread themselves thin; they can solve one application vertically or implement one architecture horizontally.”

“People have been using existing CPUs and GPUs to get more arithmetic compute to move the agenda forward, but what is needed is a completely new type of processor to support a fundamentally different workload,” says Nigel Toon, co-founder and CEO of AI chip company Graphcore. “[What we really need is] many processors running one task, rather than one processor running many tasks.”

‘Machine Intelligence workloads are something completely new. For that reason a new kind of processor architecture is required.’
    Nigel Toon, co-founder, Graphcore

For now, the AI chip sector is “not one of those races that the winner wins on Moore’s Law by transistor scaling where each new process node is much more expensive than the past”, Runner adds. “Architectures implemented in fairly standard, ‘affordable’ fabrication processes can deliver remarkable results, dependent on the architecture and application.”

Runner’s reference to Moore’s Law is apposite. The limits of physics dictate that its famous axiom – that the number of transistors per integrated circuit would double about every two years – will not apply indefinitely. As chip features shrink towards the scale of 1nm (equivalent to around 10 atoms), it becomes tricky to regulate the flow of electrons that constitute the 0s and 1s a microprocessor stores or processes. Even in the presence of a potential barrier, electrons continue to flow due to quantum tunnelling, the quantum-mechanical phenomenon in which a subatomic particle passes – ‘leaks’ – through a barrier, rendering some conventional processor architectures less efficient.

This is one reason why chip architectures are being rethought by vendors like Intel and Graphcore along neuromorphic lines – architectures inspired by the interconnected structure of the biological brain. A neuromorphic chip can realise interconnection between arbitrary neurons: within a biomimetic neural network of a given scale, any neuron can transmit data to any other neuron.
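The doubling rule is easy to state as a formula. The starting transistor count and years below are arbitrary assumptions chosen to illustrate the axiom, not figures for any real chip.

```python
# Sketch: Moore's Law as stated above -- transistor counts doubling
# roughly every two years. Starting count and years are arbitrary
# assumptions for illustration only.

def moores_law(start_count, start_year, year, doubling_period=2):
    """Projected transistor count if doubling were to continue unabated."""
    return start_count * 2 ** ((year - start_year) / doubling_period)

# A hypothetical chip with 1 billion transistors in 2010 would be
# projected to reach 32 billion by 2020 -- five doublings in ten years.
print(moores_law(1e9, 2010, 2020))  # → 32000000000.0
```

It is precisely this exponential that quantum tunnelling at ~1nm feature sizes threatens to flatten.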

                             Image credit: E&T



To leverage the complex interconnection, the neuromorphic chip is designed in a hierarchical way. It includes array cores, which have crossbars; an on-chip network; and a high-interconnection I/O link. The way data is trafficked within the chip bears similarities to packetised datacoms networks: data to be transmitted needs to carry the target address information, and packets are transmitted over a shared link. Because any neuron may address any other, each transmission carries its own distinct destination address.

“For 70 years we have told computers what to do, step by step, in a software program. We are now moving from programmed algorithmic to machine intelligence systems that can learn,” says Toon at Graphcore. “These machine learning workloads are completely new: structures that have many separate parameters and many compute tasks operating on those parameters. That creates massive parallelism – which means you need highly parallel processors that can talk to each other and share problems. That’s something that’s not supported by today’s CPUs and GPUs. A new kind of processor architecture is required.”

The differentiation between a data centre and an ‘edge’ device is another compelling dynamic of the AI chip sector, as some chip vendors see potential for AI processing to occur on an endpoint system itself – a smartphone, sensor unit or remotely located facial-recognition platform, say – rather than have to wait for outputs to be shunted to and from a cloud.
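The address-routed packet scheme described above can be sketched in a few lines. Every class and name here is a hypothetical illustration of the general idea – packets tagged with a destination neuron address sharing one link – not any vendor’s actual on-chip design.

```python
# Sketch of address-routed packets on a shared on-chip link, as described
# in the text: every transmission carries its destination neuron's
# address, and any neuron may target any other. All names and structures
# here are hypothetical illustrations, not a real chip's design.
from collections import deque

class SharedLink:
    """A shared link delivering address-tagged packets in FIFO order."""
    def __init__(self):
        self.queue = deque()
        self.neurons = {}          # address -> payloads received so far

    def register(self, address):
        self.neurons[address] = []

    def send(self, dest_address, payload):
        # Each packet carries its target address, as in the article.
        self.queue.append((dest_address, payload))

    def deliver_all(self):
        while self.queue:
            address, payload = self.queue.popleft()
            self.neurons[address].append(payload)

link = SharedLink()
for addr in range(4):
    link.register(addr)
link.send(2, "spike-a")   # any neuron may address any other neuron
link.send(0, "spike-b")
link.send(2, "spike-c")
link.deliver_all()
print(link.neurons[2])    # → ['spike-a', 'spike-c']
```

The shared queue stands in for the on-chip network; the per-packet address is what gives neuron-level transmissions their arbitrary, any-to-any reach.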

                             Image credit: Graphcore

Despite – or because of – the opportunities AI chips promise, contestants in this market face formidable challenges as they endeavour to bring their solutions to market. They must consolidate technological credentials, win over AI solutions developers, and persuade AI practitioners to run their applications and workloads on platforms powered by their particular AI chipsets.

So far, contrary to general trends in the CPU market, AI chips have largely been developed along proprietary lines: vendors have created chips engineered to their own specific designs, with less concern for direct functional compatibility with rival products. Each product launch has included claims about achievable performance, some gauged in terms of the IPS (inferences per second) metric rather than FLOPS (floating-point operations per second). Like-for-like comparisons carry less weight where solutions claim to optimise a specific AI workload, states Andrew McCullough, AI technology expert at PA Consulting. “The most important benchmark for AI chip performance depends on the application,” adds McCullough. “Overall, speed tends to be the critical quality, but for some edge devices power efficiency is just as important. Mobile devices fall into this category when AI processing has to be implemented on the edge device itself, rather than away in the cloud.”

Established chipmakers might also find it too taxing to make a full-blown break with the past, McCullough adds. “They tend to have intellectual property back catalogues wedded to a particular programming paradigm. There comes a point where starting from scratch can produce a better solution due to a step-change in technology.” This market realignment has resulted in some unusual industry moves and shakes.
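The IPS metric mentioned above is simple to measure in principle: count completed forward passes per unit of wall-clock time. The sketch below uses a dummy stand-in function rather than a real trained network, so the absolute numbers it produces mean nothing; it only illustrates the shape of the measurement.

```python
# Sketch: measuring IPS (inferences per second) rather than FLOPS.
# `dummy_inference` is a hypothetical stand-in for a model forward pass;
# a real benchmark would run an actual trained network on real hardware.
import time

def dummy_inference(x):
    """Stand-in workload: a trivial sum of squares over the input."""
    return sum(v * v for v in x)

def inferences_per_second(model, sample, duration=0.2):
    """Run `model` repeatedly for `duration` seconds and report the rate."""
    count = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        model(sample)
        count += 1
    return count / (time.perf_counter() - start)

ips = inferences_per_second(dummy_inference, list(range(100)))
print(ips > 0)
```

As McCullough notes, a high IPS figure on one workload says little about another – which is exactly why like-for-like comparisons between proprietary AI chips carry less weight.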
Some of the innovative AI chip propositions have come from vendors with short track records in the microprocessor sector – start-ups like BrainChip, Cerebras, Graphcore and Gyrfalcon Technology – yet with the credibility to attract funding from proven technology players convinced by what they are buying into. Graphcore, for instance, closed an additional $200m (£155m) funding round in 2019, bringing the total capital raised by the UK-based company to more than $300m (£233m). Its investors include Microsoft and BMW, and the company has been valued at $1.7bn (£1.3bn).

At the same time, AI chip solutions have been announced by large technology providers with negligible history as microprocessor specialists. The polymathic Google announced that it would not only install its self-minted Tensor Processing Unit (TPU) AI chips in its own GCP data centres, but would also produce a version designed to perform AI tasks on ‘edge’ devices. Other chipmakers investigating the potential of ‘edge’ AI chips include Apple, ARM and Synopsys.

The division between chips designed to operate within data centre infrastructures and those at the ‘edge’ of the hardware stack is another aspect of the differentiation that has started to shape the AI chip sector. Like Google’s TPUs, Amazon’s Inferentia chip (announced December 2019) has primarily been installed in AWS data centres to provide AI-enabled services for operational use cases and virtual environments for AI developers. Inferentia is not a direct competitor to the big AI chip incumbents or the start-ups, because Amazon will not be selling the chips to other parties. It does, however, deprive Intel and Nvidia of a major customer for AI-purposed chips. AWS expects to sell services that run on the chips to its cloud customers starting this year.

  Neuromorphic models
  Moving from central processing to intelligence processing

  Graphcore’s Intelligence Processing Unit (IPU) is an example of a basic neuromorphic AI chip architecture. It was designed specifically for machine learning workloads, and so differs significantly from CPU and GPU architectures.

The design aim of the IPU is the efficient execution of ‘fine-grained’ operations across a relatively large number of parallel threads. It offers true multiple-instruction, multiple-data (MIMD) parallelism and has distributed local memory as its only form of memory on the device, apart from the register file. Each IPU has 1,216 processing elements called tiles; a tile consists of one computing core plus 256KiB [kibibytes; 1KiB = 1,024 bytes] of local memory. In addition to the tiles, the processor contains the exchange, an on-chip interconnect that allows high-bandwidth, low-latency communication among tiles. Each IPU also contains ten IPU link interfaces; the IPU link is a proprietary interconnect that enables low-latency, high-throughput communication between IPU processors. These links make transfers between remote tiles as simple as between local tiles, and so are essential for scalability.
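The figures quoted above imply the chip’s total on-chip memory budget, which the quick arithmetic check below works out from the per-tile numbers in the text.

```python
# Arithmetic check on the IPU figures quoted above: 1,216 tiles, each
# with 256KiB of local memory (1KiB = 1,024 bytes).
TILES = 1216
KIB = 1024
tile_memory_bytes = 256 * KIB

total_bytes = TILES * tile_memory_bytes
total_mib = total_bytes / (1024 * 1024)
print(total_mib)   # total on-chip memory in MiB → 304.0
```

That distributed pool – roughly 300MiB spread evenly across the tiles – is the only memory on the device, which is why the exchange and the IPU links matter so much: every tile’s data is somewhere local to some tile.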

Author: James Hayes

Date: 2020-11-11

