
Intel’s next-gen Alder Lake-S “Gracemont” core CPU architecture to support AVX/AVX2, AVX-VNNI instruction sets


Intel’s next-generation Alder Lake-S Core processors are said to be the company’s first 10nm-based mainstream desktop processors. Documents, datasheets, and release notes for these CPUs have already surfaced, including on Intel’s own developer website. The processors have also been confirmed to use a new LGA1700 socket.

Back in July, Intel confirmed that its next-gen 10nm Alder Lake-S CPU lineup will launch in the second half of 2021. Intel has also confirmed that the Alder Lake-S CPUs will use a new Hybrid Core/Atom processor architecture. These CPUs are expected to support DDR5 memory and the PCIe 5.0 interface, though earlier chatter on the web suggested they might launch with PCIe 4.0 support instead.

As the name implies, the LGA1700 socket has 1,700 pins, which is 500 more than the existing LGA1200 socket that houses Comet Lake-S and the upcoming Rocket Lake-S series of processors.

The Alder Lake-S CPU lineup will be the first Intel architecture to bring an approach similar to ARM’s big.LITTLE to desktop processors. Alder Lake-S is expected to feature an 8+8 core configuration, in which half the cores are Big Cores and the other half are Small Cores, for a total of 16 cores in a single package.

According to one report, the big and small cores are based on the Golden Cove (Willow Cove successor) and Gracemont (Tremont successor) architectures, respectively. Willow Cove is expected to appear in the upcoming Rocket Lake-S series of processors.

Intel recently updated its official ‘Architecture Instruction Set Extensions and Future Features Programming Reference’ document, revealing more details about its upcoming Alder Lake processor lineup. This reference PDF is generally used by software developers and technical analysts. As covered above, Intel’s next-gen Alder Lake-S CPU lineup is going to feature both high-performance Golden Cove cores and low-power Gracemont cores in a Hybrid architecture design. This article focuses on the functionality and instruction set support of the Gracemont cores.

Intel’s small low-power cores have mostly been constrained by die size and power consumption limits, and they have lacked the instructions required for high-performance computing tasks, media encoding/decoding workloads, and other similarly heavy tasks. This is now going to change with the Alder Lake-S CPU lineup, as the upcoming Gracemont cores will support the AVX, AVX2, and AVX-VNNI instruction sets.

According to Intel’s document, all cores used in the Alder Lake architecture will support the AVX, AVX2, AVX-VNNI, and UMONITOR/UMWAIT/TPAUSE instructions; however, they won’t support the Intel AVX-512 instruction set. Having AVX-512 on every core might have helped here, since without it AVX-512 workloads cannot be offloaded from the Golden Cove cores to the Gracemont cores. In case you didn’t know, UMONITOR, UMWAIT, and TPAUSE are a set of user-level wait instructions.
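To make the feature list above more concrete, here is a minimal sketch of how software could decode these capabilities from raw CPUID register values. The bit positions follow Intel's public CPUID enumeration; the `decode_features` helper, the `FEATURE_BITS` table, and the example register dump are hypothetical names and values made up for illustration, not something taken from Intel's document.

```python
# Feature name -> (CPUID leaf, subleaf, register, bit position).
FEATURE_BITS = {
    "AVX":      (0x01, 0, "ecx", 28),
    "AVX2":     (0x07, 0, "ebx", 5),
    "AVX512F":  (0x07, 0, "ebx", 16),
    "WAITPKG":  (0x07, 0, "ecx", 5),   # UMONITOR / UMWAIT / TPAUSE
    "AVX-VNNI": (0x07, 1, "eax", 4),
}


def decode_features(cpuid_regs):
    """Return {feature: bool} from raw CPUID register values.

    cpuid_regs maps (leaf, subleaf) -> {"eax": int, "ebx": int, ...}.
    Missing leaves or registers are treated as zero (feature absent).
    """
    result = {}
    for name, (leaf, subleaf, reg, bit) in FEATURE_BITS.items():
        value = cpuid_regs.get((leaf, subleaf), {}).get(reg, 0)
        result[name] = bool((value >> bit) & 1)
    return result


# Hypothetical register dump for a core with the feature mix described above:
# AVX, AVX2, AVX-VNNI and WAITPKG set, AVX-512 Foundation clear.
example_regs = {
    (0x01, 0): {"ecx": 1 << 28},
    (0x07, 0): {"ebx": 1 << 5, "ecx": 1 << 5},
    (0x07, 1): {"eax": 1 << 4},
}
features = decode_features(example_regs)
print(features)
```

On real hardware the register values would come from executing CPUID itself (for example through a compiler intrinsic); the sketch only shows the decoding step.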

UMONITOR arms the address-monitoring hardware using an address supplied by software. A store to an address within the monitored range triggers the hardware to wake up a processor that is waiting in UMWAIT.

UMWAIT, on the other hand, instructs the processor to enter an implementation-dependent optimized state while monitoring a range of addresses. TPAUSE instructs the processor to enter an implementation-dependent optimized state and wake up when the time-stamp counter reaches a specified timeout.

There are two such optimized states to choose from: a light-weight power/performance optimized state (C0.1) and an improved power/performance optimized state (C0.2). The selection between the two is governed by bit[0] of the explicit input register source operand.
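The state selection and the TPAUSE timeout can be modelled in a few lines. This is only a sketch of the semantics, assuming the encoding in Intel's instruction reference (bit[0] clear requests C0.2, bit[0] set requests C0.1); the function names are hypothetical and nothing here executes the real instructions.

```python
def umwait_target_state(control):
    """Map the control operand of UMWAIT/TPAUSE to the requested state.

    Bit 0 clear -> C0.2 (improved power/performance, slower wake-up).
    Bit 0 set   -> C0.1 (light-weight, faster wake-up).
    All other bits are ignored in this model.
    """
    return "C0.1" if control & 1 else "C0.2"


def tpause_should_wake(tsc_now, tsc_deadline):
    """TPAUSE-style timeout check: wake once the TSC reaches the deadline."""
    return tsc_now >= tsc_deadline


print(umwait_target_state(0))         # requests the deeper C0.2 state
print(umwait_target_state(1))         # requests the lighter C0.1 state
print(tpause_should_wake(1500, 2000)) # deadline not reached yet
```

Note that an operating system can restrict which states user code is actually allowed to enter, so the requested state is not guaranteed to be the one the processor uses.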

That said, Intel’s AVX-512 has limited use in client applications and can also draw more power. Adding AVX-512 usually requires additional hardware and more interconnect bandwidth, which raises power consumption and increases die size. The Gracemont micro-architecture does, however, support 256-bit AVX instructions.

Adding AVX support to Gracemont will allow these low-power cores to execute high-performance workloads and other demanding applications without having to rely on the Golden Cove cores for these tasks. Assuming both core types in Alder Lake can work simultaneously, the low-power cores might also assist the high-performance Golden Cove cores when needed, to squeeze every ounce of performance out of deep learning (DL) workloads.

We don’t have full details on Alder Lake’s architecture yet. In addition to the existing AVX and AVX2 instructions, Intel’s Alder Lake and the HPC Sapphire Rapids CPUs will also support AVX-VNNI (Vector Neural Network Instructions), which are used to speed up deep learning workloads.
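As a rough illustration of what a VNNI instruction computes, the sketch below emulates a single 32-bit lane of VPDPBUSD, which multiplies four unsigned bytes by four signed bytes and adds the sum of products to a 32-bit accumulator. It models the non-saturating variant only; the helper names are hypothetical, and a real 256-bit operation would process eight such lanes in parallel.

```python
def to_int32(value):
    """Wrap a Python int to signed 32-bit two's complement."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def vpdpbusd_lane(acc, unsigned_bytes, signed_bytes):
    """Emulate one 32-bit lane of VPDPBUSD (non-saturating).

    acc            : signed 32-bit accumulator
    unsigned_bytes : four values in 0..255
    signed_bytes   : four values in -128..127
    """
    if len(unsigned_bytes) != 4 or len(signed_bytes) != 4:
        raise ValueError("each lane consumes exactly four byte pairs")
    if not all(0 <= u <= 255 for u in unsigned_bytes):
        raise ValueError("unsigned bytes must be in 0..255")
    if not all(-128 <= s <= 127 for s in signed_bytes):
        raise ValueError("signed bytes must be in -128..127")
    dot = sum(u * s for u, s in zip(unsigned_bytes, signed_bytes))
    return to_int32(acc + dot)


# 10 + (1*1 + 2*-1 + 3*2 + 4*-2) = 10 + (-3) = 7
print(vpdpbusd_lane(10, [1, 2, 3, 4], [1, -1, 2, -2]))
```

Fusing the multiply, the horizontal add, and the accumulate into one instruction is what makes this useful for the 8-bit integer dot products common in neural network inference.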


Back in July, Phoronix reported on a new GNU compiler update, which included a list of compatible instructions for both Intel’s upcoming data center/HPC Sapphire Rapids chips and the Alder Lake desktop chips.

The Alder Lake CPU lineup was noticeably missing support for AVX-512, a SIMD instruction set that Intel recently introduced on its desktop chips. These instructions are also disabled in Intel’s latest hybrid Lakefield chips, which keeps the instruction set consistent between cores (the small Atom Tremont cores lack support for AVX instructions altogether).

This also eases the operating system scheduling routines that target different workloads at the corresponding cores. Therefore, the lack of AVX-512 support for Alder Lake-S serves as further evidence that Intel will bring a new hybrid core architecture design to desktop PCs.
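The idea of keeping the instruction set consistent between cores can be sketched as a simple set intersection: the operating system can only safely expose what every core type supports. The feature sets below are illustrative assumptions based on the description in this article, not data from an Intel datasheet, and `common_isa` is a hypothetical helper.

```python
# Illustrative feature sets only (assumed for this example).
BIG_CORE = {"AVX", "AVX2", "AVX-VNNI", "WAITPKG", "AVX-512"}
SMALL_CORE = {"AVX", "AVX2", "AVX-VNNI", "WAITPKG"}


def common_isa(*core_feature_sets):
    """Return the features supported by every core type.

    A thread can be migrated to any core at any time, so only this
    common subset is safe to advertise to software.
    """
    if not core_feature_sets:
        return set()
    return set.intersection(*(set(s) for s in core_feature_sets))


exposed = common_isa(BIG_CORE, SMALL_CORE)
hidden = BIG_CORE - exposed

print(sorted(exposed))
print(sorted(hidden))
```

With a single common subset, the scheduler never has to check whether a thread uses an instruction that the destination core lacks before migrating it.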

However, it should be noted that while the lack of AVX-512 support hints at a hybrid design, the Alder Lake ‘support matrix’ has a listing of standard AVX instructions that are not supported by the current generation Atom Tremont cores. So this means that the next generation of Atom Gracemont cores, which are largely considered ‘small’ cores used in Alder Lake, could bring AVX support, albeit in a limited manner.

The Sapphire Rapids chips, on the other hand, support many AVX-512 instructions and related extensions, for example AVX512F, CLWB, AVX512VL, AVX512BW, AVX512DQ, AVX512CD, AVX512VNNI, and the new AVX512BF16. The latter enables support for bfloat16, a compact numerical format that offers a dynamic range similar to FP32 with only half the bits.
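To show why bfloat16 is "FP32 with half the bits", here is a minimal sketch that converts a value to bfloat16 by keeping only the upper 16 bits of its FP32 encoding (sign, 8-bit exponent, and the top 7 mantissa bits). It uses plain truncation for simplicity, which is an assumption of this example; hardware conversion instructions round to nearest even instead. The function names are hypothetical.

```python
import struct


def float_to_bf16_bits(x):
    """Return the 16-bit bfloat16 pattern for x by truncating its FP32 encoding."""
    (bits32,) = struct.unpack("<I", struct.pack("<f", x))
    return bits32 >> 16


def bf16_bits_to_float(bits16):
    """Expand a bfloat16 bit pattern back to a Python float via FP32."""
    (value,) = struct.unpack("<f", struct.pack("<I", (bits16 & 0xFFFF) << 16))
    return value


pi_bits = float_to_bf16_bits(3.14159265)
print(hex(pi_bits))                 # 0x4049
print(bf16_bits_to_float(pi_bits))  # 3.140625
```

The exponent field is untouched, so the representable range matches FP32; only the precision drops, to roughly two or three decimal digits.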

Also, according to one report posted on Zhihu, the new Hybrid Technology in the upcoming Alder Lake architecture would allow both types of CPU core (small and big) to share the same instruction set and registers, but the actual availability of certain instructions would depend on which cores are enabled and active.

The user also shared a screenshot from an internal Intel document, which suggests that some AVX, TSX-NI, and FP16 instructions will be disabled when Hybrid Technology is enabled (that is, when both Big and Small cores are enabled at the same time).

These instructions will only work when Hybrid Technology is disabled, which means the Small cores are disabled. ‘Disabled’ here may refer to the processor’s mode/PL state, though I need more technical info to confirm how this will work on desktop PCs. Both the big and small cores will expose the same instruction sets and model-specific registers, but they will differ in computational power and in overhead.

We can assume the big cores will have higher throughput and clock speeds, whereas the small cores will be more power efficient. ARM’s big/small core approach makes more sense for mobile devices, where it conserves power, but we still don’t know how this Hybrid Technology is actually going to work on desktops.

The Alder Lake-S CPU architecture will also feature the CLDEMOTE (‘cache line demote’) instruction, which I’ve explained in detail in my previous article. To reiterate, the CLDEMOTE instruction hints to hardware that the cache line containing the given linear address should be moved (demoted) from the cache(s) closest to the processor core to a level more distant from the processor core.

CLDEMOTE is used so that software can tell the processor core which specific cache line is no longer needed close to the core, so that its contents can be moved elsewhere. The line is not written directly to main memory; it stays in the processor cache, but at a more distant level (from L1 to L2, or from L2 to L3).
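The demotion described above can be pictured with a toy model of a three-level cache. This is only a sketch under strong simplifying assumptions: the `cldemote` function is hypothetical, each level is a plain set of line addresses, and a line always moves exactly one level out. On real hardware CLDEMOTE is just a hint, and the processor decides whether and where to move the line.

```python
def cldemote(levels, line):
    """Move `line` one cache level away from the core, if possible.

    levels: list of sets; index 0 is closest to the core (L1) and the
            last index is the most distant level (L3 in this model).
    Returns the index of the level holding the line afterwards, or
    None if the line was not cached at all.
    """
    for i, level in enumerate(levels):
        if line in level:
            if i + 1 < len(levels):
                level.remove(line)
                levels[i + 1].add(line)
                return i + 1
            return i  # already in the most distant level; nothing to do
    return None  # not cached; the hint has no effect


caches = [{0x1000}, set(), set()]  # L1, L2, L3
print(cldemote(caches, 0x1000))    # line moves from L1 to L2
print(cldemote(caches, 0x1000))    # line moves from L2 to L3
print(cldemote(caches, 0x2000))    # line was never cached
```

A typical use case is a producer thread demoting data it has just written, so that a consumer on another core can find it in a shared cache level.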

Intel has not announced an exact launch date for the Alder Lake-S processors yet, but since Alder Lake-S is the successor to Tiger Lake, the processors are expected to debut as the 12th Generation Core lineup in the second half of 2021.


Stay tuned for more!