Project Denver

Nvidia Carmel
General information
Launched	2018
Designed by	Nvidia
Max. CPU clock rate	up to 2.3 GHz
Cache
L1 cache	192 KiB per core; (128 KiB I-cache with parity, 64 KiB D-cache with ECC)
L2 cache	2 MiB @ 2 cores
L3 cache	(4 MiB @ 8 cores, T194)
Architecture and classification
Technology node	12 nm
Instruction set	ARMv8.2-A
Physical specifications
Cores	2

Nvidia Denver 1/2
General information
Launched	2014 (Denver); 2016 (Denver 2)
Designed by	Nvidia
Cache
L1 cache	192 KiB per core; (128 KiB I-cache with parity, 64 KiB D-cache with ECC)
L2 cache	2 MiB @ 2 cores
Architecture and classification
Technology node	28 nm (Denver 1) to 16 nm (Denver 2)
Instruction set	ARMv8-A
Physical specifications
Cores	2

Project Denver is the codename of a central processing unit designed by Nvidia that implements the ARMv8-A 64/32-bit instruction sets using a combination of a simple hardware decoder and software-based binary translation (dynamic recompilation). In this scheme, "Denver's binary translation layer runs in software, at a lower level than the operating system, and stores commonly accessed, already optimized code sequences in a 128 MB cache stored in main memory".^[2] Denver is a very wide in-order superscalar pipeline. Its design makes it suitable for integration with other SIP cores (e.g. GPU, display controller, DSP, image processor) into one die constituting a system on a chip (SoC).
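The idea of keeping already optimized code sequences in a cache can be illustrated with a toy model. The sketch below is not Nvidia's implementation: the class name, the `hot_threshold` parameter, and the "decoded"/"translated"/"cached" labels are assumptions made for illustration. It only shows the general pattern of running rarely used code through a simple decoder and switching to a stored, optimized translation once a block has been executed often enough.

```python
from collections import Counter


class TranslationCache:
    """Toy model of a dynamic binary translator's code cache (hypothetical)."""

    def __init__(self, optimize, hot_threshold=3):
        self.optimize = optimize            # callable: guest block -> optimized block
        self.hot_threshold = hot_threshold  # assumed: runs before a block counts as hot
        self.exec_counts = Counter()        # executions seen per guest address
        self.cache = {}                     # guest address -> optimized block

    def fetch(self, address, guest_block):
        """Return (path, code) for the block that starts at `address`."""
        if address in self.cache:
            # Fast path: reuse the stored, already optimized sequence.
            return "cached", self.cache[address]
        self.exec_counts[address] += 1
        if self.exec_counts[address] >= self.hot_threshold:
            # The block is hot: translate once and remember the result.
            self.cache[address] = self.optimize(guest_block)
            return "translated", self.cache[address]
        # Cold code: hand the guest instructions to the simple decoder unchanged.
        return "decoded", guest_block


if __name__ == "__main__":
    tc = TranslationCache(optimize=lambda block: [op.upper() for op in block])
    for _ in range(5):
        print(tc.fetch(0x1000, ["add", "ldr"]))
```

With the default threshold of 3, the first two calls for an address take the decoder path, the third triggers translation, and later calls are served from the cache.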

Project Denver is targeted at mobile computers, personal computers, servers, and supercomputers.^[3] The cores have been integrated into Nvidia's Tegra SoC series. Denver cores were initially designed for the 28 nm process node (Tegra model T132, also known as "Tegra K1"). Denver 2 was an improved design built for the smaller, more efficient 16 nm node (Tegra model T186, also known as "Tegra X2").

In 2018, Nvidia released an improved design, codenamed "Carmel", based on ARMv8 (64-bit; variant: ARMv8.2^[4]) with 10-way superscalar execution, functional safety, dual execution, and parity and ECC protection. It was integrated into the Tegra Xavier SoC, which offers a total of 8 cores (4 dual-core pairs).^[5]^{[failed verification]} The Carmel CPU core supports full Advanced SIMD (ARM NEON), VFP (Vector Floating Point), and ARMv8.2-FP16.^[6] The first published third-party tests of Carmel cores, run on the Jetson AGX development kit in September 2018, indicated noticeably higher performance than predecessor systems, as would be expected, although the speed with which such tests were set up leaves their results open to some doubt.^[7] The Carmel design is used in the Tegra model T194 ("Tegra Xavier"), which is built on a 12 nm process.

Overview

Pipelined processor with 7-way superscalar execution pipeline
128 KiB instruction + 64 KiB data L1 cache per core (both 4-way), 2 MiB L2 cache (16-way shared)^[8]
Denver also sets aside 128 MiB of main memory as an interpretation cache, which is inaccessible to the main operating system.
Running at up to 2.5 GHz^[9]
ARM code is translated either by a hardware translator or through software emulation to an instruction set that is internal to Project Denver. ARM instructions can be reordered, removed if they do not contribute to the end result, or otherwise optimized if software emulation is used.^[2]
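One of the optimizations mentioned above, removing instructions that do not contribute to the end result, can be sketched as a backward liveness pass. This is a generic illustration rather than Denver's actual optimizer: the three-address tuple format, the register names, and the function name are hypothetical, and the sketch assumes straight-line code with no side effects such as stores or branches.

```python
def remove_dead_instructions(block, live_out):
    """Drop instructions whose result is never used.

    block:    list of (dest, op, sources) tuples in program order
              (a hypothetical format, not ARM or Denver's internal ISA).
    live_out: registers whose values are still needed after the block.
    """
    live = set(live_out)
    kept = []
    # Walk backwards so every use is seen before the definition that feeds it.
    for dest, op, sources in reversed(block):
        if dest in live:
            kept.append((dest, op, sources))
            live.discard(dest)    # this definition satisfies the later use
            live.update(sources)  # its inputs must now stay alive
        # Otherwise nothing reads the result, so the instruction is dropped.
    kept.reverse()
    return kept


if __name__ == "__main__":
    block = [
        ("r1", "add", ("r0", "r0")),
        ("r2", "mul", ("r1", "r1")),  # r2 is never read afterwards
        ("r3", "add", ("r1", "r0")),
    ]
    for instruction in remove_dead_instructions(block, live_out={"r3"}):
        print(instruction)
```

In the example, only `r3` is needed after the block, so the `mul` that writes `r2` is removed while the two `add` instructions that feed `r3` are kept.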

Chips

A dual-core Denver CPU was paired with a Kepler-based GPU solution to form the Tegra K1; the dual-core 2.3 GHz Denver-based K1 was first used in the HTC Nexus 9 tablet, released November 3, 2014.^[10]^[11] Note, however, that the quad-core Tegra K1, while using the same name, isn't based on Denver.

The Nvidia Tegra X2 has two Denver 2 (ARMv8 64-bit) cores and another four A57 (ARMv8 64-bit) cores using a coherent HMP (Heterogeneous Multi-Processor Architecture) approach.^[12] These CPU cores are paired with a Pascal-based GPU.

The Tegra Xavier pairs an Nvidia Volta GPU and several special-purpose accelerators with 8 CPU cores of the Carmel design. In this design, 4 Carmel ASIC macro blocks (each with 2 cores) are connected to each other through a crossbar and share 4 MiB of L3 cache.
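The totals implied by this layout follow from simple arithmetic. The sketch below uses the per-core and per-pair cache sizes listed in the Carmel summary at the top of this article; the class and function names are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CarmelCluster:
    """One dual-core Carmel macro block, sized per the figures in this article."""
    cores: int = 2
    l1_kib_per_core: int = 192  # 128 KiB I-cache + 64 KiB D-cache
    l2_mib: int = 2             # shared by the two cores of the cluster


def xavier_totals(clusters=4, l3_mib=4):
    """Aggregate core and cache totals for a Xavier-style arrangement."""
    cluster = CarmelCluster()
    cores = clusters * cluster.cores
    return {
        "cores": cores,
        "l1_kib": cores * cluster.l1_kib_per_core,
        "l2_mib": clusters * cluster.l2_mib,
        "l3_mib": l3_mib,  # shared by all clusters
    }


if __name__ == "__main__":
    print(xavier_totals())
```

For four clusters this gives 8 cores, 1536 KiB of L1 cache in total, 8 MiB of L2 cache in total, and the shared 4 MiB L3 cache.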

History

The existence of Project Denver was revealed at the 2011 Consumer Electronics Show.^[13] In a March 4, 2011 Q&A article, CEO Jen-Hsun Huang revealed that Project Denver is a five-year 64-bit ARMv8-A architecture CPU development on which hundreds of engineers had already worked for three and a half years, and which is also backward compatible with the 32-bit ARM instruction set (ARMv7).^[14] Project Denver was started at Stexar Company (Colorado) as an x86-compatible processor using binary translation, similar to projects by Transmeta. Stexar was acquired by Nvidia in 2006.^[15]^[16]^[17]

According to Tom's Hardware, there are engineers from Intel, AMD, HP, Sun and Transmeta on the Denver team, and they have extensive experience designing superscalar CPUs with out-of-order execution, very long instruction words (VLIW) and simultaneous multithreading (SMT).^[18]

According to Charlie Demerjian, the Project Denver CPU may internally translate the ARM instructions to an internal instruction set, using firmware in the CPU.^[19] Also according to Demerjian, Project Denver was originally intended to support both ARM and x86 code using code morphing technology from Transmeta, but was changed to the ARMv8-A 64-bit instruction set because Nvidia could not obtain a license to Intel's patents.^[19]

The first consumer device shipping with Denver CPU cores, Google's Nexus 9, was announced on October 15, 2014. The tablet was manufactured by HTC and features the dual-core Tegra K1 SoC. The Nexus 9 was the first 64-bit Android device available to consumers.^[20]

References

External links

Valich, Theo (September 20, 2012). "NVIDIA Project Boulder Revealed: Tegra's Competitor Hides in GPU Group".
Linley Gwennap (August 18, 2014). "Nvidia's First CPU Is a Winner. Denver Uses Dynamic Translation to Outperform Mobile Rivals". MPR, Linley Group.

[2]

[1]

[3]

[4]

[5]

[6]

[7]

[8]

[9]

[10]

[11]

[12]

[13]

[14]

[15]

[16]

[17]

[18]

[19]

[20]
