Sunday, May 19, 2013

NVIDIA Tegra 4 vs Exynos 5 Octa vs Snapdragon 800: The Next Generation

For the past few years, mobile devices have become steadily more powerful, yet more mobile. The last generation of high-end SoCs, specifically NVIDIA's Tegra 3, Qualcomm's Snapdragon S4, and Samsung's Exynos 4 Quad, among others, set a new standard for mobile performance by bringing quad-core CPUs to mobile devices and taking mobile gaming to a whole new level. They did this without sacrificing power efficiency, thanks to solutions like the 4-PLUS-1 architecture inside the Tegra 3 (four performance cores, plus one low-power core that handles light workloads while drawing much less power). After that leap, the main competitors in the mobile SoC space needed worthy successors that continued the trend of more performance with less power consumption, and the forthcoming generation doesn't disappoint. The NVIDIA Tegra 4, Samsung's Exynos 5 Octa, and Qualcomm's Snapdragon 800 are all very impressive, and each has one especially breathtaking feature.

All of these SoCs have one thing in common: their manufacturing process. Qualcomm has been shipping 28nm chips since last year, NVIDIA is taking a gigantic stride from the aging 40nm process of the Tegra 3 to 28nm, and Samsung is moving the Exynos line from its own 32nm HKMG process to 28nm HKMG with the Exynos 5 Octa.

CPU

ARM's Cortex A15 CPU has already shown very impressive performance with only two cores, in the Exynos 5 Dual SoC, which has so far shipped most notably in the Nexus 10 tablet. With just two cores, the Cortex A15 topped almost all benchmarks, outperforming even the quad-core mobile CPUs. In this generation of SoCs, quad-core Cortex A15 configurations will be common, so in well-threaded workloads we could see up to double the already chart-topping performance of the Exynos 5 Dual.
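Whether four cores really double the performance of two depends on how parallel the workload is. Here is a minimal sketch using Amdahl's law, assuming identical cores at the same clock and ignoring memory and thermal limits; the parallel fractions below are arbitrary illustration values, not measurements of any real app.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup over a single core under Amdahl's law."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


def quad_over_dual(parallel_fraction: float) -> float:
    """How much faster four identical cores are than two at the same clock."""
    return amdahl_speedup(parallel_fraction, 4) / amdahl_speedup(parallel_fraction, 2)


if __name__ == "__main__":
    # Illustrative parallel fractions only.
    for p in (1.0, 0.9, 0.5, 0.0):
        print(f"parallel fraction {p:.1f}: quad-core is {quad_over_dual(p):.2f}x the dual-core")
```

Only a perfectly parallel workload (p = 1.0) gets the full 2x; at p = 0.5 the gain drops to 1.2x, and single-threaded code gains nothing.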

The NVIDIA Tegra 4 keeps the 4-PLUS-1 architecture used in the Tegra 3. The four main performance cores are Cortex A15s clocked at an impressive 1.9GHz, which is arguably more than a mobile device will ever need. Since that much performance would drain the battery quickly, NVIDIA included a fifth Cortex A15 core designed for low power. It handles light workloads, such as video and music playback, while the four power-hungry Cortex A15 cores are power-gated, which dramatically increases power efficiency. The fifth A15 core can run at up to 825MHz, which is roughly the performance of a single Cortex A9 at 1.6GHz. The Tegra 4's little brother, the mid-range Tegra 4i, has a slightly different CPU. While still employing 4-PLUS-1, the 4i does not use A15 cores but a newer, improved revision of the old Cortex A9; its four A9 cores can run at up to 2.3GHz, which should bring the old CPU's performance up to scratch. The Tegra 4i also has a fifth companion core.
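The switching idea behind 4-PLUS-1 can be sketched as a tiny state machine. This is a toy model, not NVIDIA's implementation: the class `FourPlusOneGovernor` and the thresholds `UP_THRESHOLD` and `DOWN_THRESHOLD` are hypothetical, and load is expressed as a fraction of the companion core's capacity. The gap between the two thresholds (hysteresis) keeps the chip from bouncing between modes when the load hovers around a single cut-off.

```python
from dataclasses import dataclass

# Hypothetical thresholds for illustration only; NVIDIA's real switching
# policy lives in its own kernel/firmware code.
UP_THRESHOLD = 0.80    # leave the companion core above 80% of its capacity
DOWN_THRESHOLD = 0.30  # return to it below 30% of its capacity


@dataclass
class FourPlusOneGovernor:
    """Toy 4-PLUS-1 model: either the lone companion core runs, or the
    four main cores run while the companion core is switched off."""
    on_companion: bool = True

    def update(self, companion_load: float, runnable_threads: int) -> str:
        # companion_load: demand as a fraction of what the companion core
        # delivers at its maximum clock (values above 1.0 mean "too much").
        if self.on_companion:
            if companion_load > UP_THRESHOLD or runnable_threads > 1:
                self.on_companion = False  # wake the main cluster
        elif companion_load < DOWN_THRESHOLD and runnable_threads <= 1:
            self.on_companion = True  # power-gate the main cluster
        return "companion" if self.on_companion else "main"


if __name__ == "__main__":
    gov = FourPlusOneGovernor()
    for load, threads in [(0.2, 1), (0.9, 1), (0.5, 1), (0.1, 1), (0.1, 3)]:
        print(f"load={load}, threads={threads} -> {gov.update(load, threads)}")
```

Note the third step: a load of 0.5 keeps the main cluster running, because it is not yet below the lower threshold.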

Samsung's Exynos 5 Octa even has a name that draws attention to the CPU, since it advertises the first SoC with eight CPU cores. But that doesn't mean we'll see eight-core performance on a mobile device. The Octa is actually the first SoC to use ARM's big.LITTLE arrangement, and it runs only one of its two quad-core clusters at a time. Similar in concept to NVIDIA's 4-PLUS-1, the Octa pairs four Cortex A15 cores, clocked at up to 1.8GHz, for heavy workloads, with a much smaller quad-core Cortex A7 cluster, running at up to 1.2GHz, for lighter workloads. Clock for clock, a Cortex A7 performs roughly like a Cortex A9, but with better power efficiency. So the low-power cluster can deliver performance in the neighborhood of a quad-core Cortex A9 at 1.2GHz (the Tegra 3 T30L, for example) while consuming much less power. Since Samsung's low-power side has four cores rather than the Tegra 4's single companion core, it is fair to assume that the Cortex A7s will be in use much more often, which might give Samsung the advantage in terms of power consumption.
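To see why four A7s may stay in charge more often than one companion core, here is a rough capacity check. It is a sketch under loud assumptions: capacities use this article's own equivalences (companion A15 at 825MHz is about one A9 at 1.6GHz; an A7 is about an A9 clock for clock), performance scales linearly with clock, threads may time-slice on a core, and one thread can never use more than one core. `LowPowerMode` and `stays_low_power` are hypothetical names.

```python
from typing import NamedTuple, Sequence


class LowPowerMode(NamedTuple):
    cores: int       # cores available in the low-power mode
    per_core: float  # capacity of one core, in "A9-GHz"


# "A9-GHz": the work one Cortex-A9 does per second at 1 GHz (assumed unit).
TEGRA4_COMPANION = LowPowerMode(cores=1, per_core=1.6)  # one A15 @ 825 MHz
OCTA_A7_CLUSTER = LowPowerMode(cores=4, per_core=1.2)   # four A7 @ 1.2 GHz


def stays_low_power(mode: LowPowerMode, thread_demands: Sequence[float]) -> bool:
    """True if the workload fits without waking the big cores.

    Threads may share a core by time-slicing, so only the total must fit,
    but the heaviest single thread must fit on one core.
    """
    total_fits = sum(thread_demands) <= mode.cores * mode.per_core
    biggest_fits = max(thread_demands, default=0.0) <= mode.per_core
    return total_fits and biggest_fits


if __name__ == "__main__":
    workloads = {
        "one light thread": [0.3],
        "one medium thread": [1.4],
        "three light threads": [0.6, 0.6, 0.6],
    }
    for name, demands in workloads.items():
        tegra = stays_low_power(TEGRA4_COMPANION, demands)
        octa = stays_low_power(OCTA_A7_CLUSTER, demands)
        print(f"{name}: Tegra 4 companion={tegra}, Octa A7 cluster={octa}")
```

Under these assumptions a single medium thread actually favors the Tegra 4's faster companion core, while several light threads fit on the A7 cluster but not on one companion core. In both chips, at most four cores are active at any moment.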

Qualcomm is known for not licensing complete ARM CPU designs. Instead, it licenses the ARM architecture and builds its own custom CPU, which it has been calling Krait since the Snapdragon S4. Krait CPUs also tend to fare pretty well in terms of power consumption. For the still quite distant Snapdragon 800, Qualcomm will use a new custom CPU, the Krait 400: four cores able to run at a stunning maximum frequency of 2.3GHz. Krait has one special characteristic: each core can run at its own clock speed depending on its workload (Qualcomm calls this asynchronous SMP, or aSMP), which can yield good results in terms of energy efficiency.
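Why per-core clocks save energy can be shown with the usual dynamic-power approximation P ~ V^2 x f. This is a toy comparison, not Qualcomm's implementation: the voltage/frequency table `OPP_TABLE` is entirely made up, capacitance is folded into the units, and leakage is ignored. In the synchronous case every core runs at the operating point needed by the busiest core; with asynchronous clocks (as in aSMP) each core picks its own.

```python
from typing import Sequence, Tuple

# Hypothetical (MHz, volts) operating points; NOT Krait 400's real tables.
OPP_TABLE = [(300, 0.80), (1000, 0.95), (1700, 1.05), (2300, 1.15)]


def pick_opp(demand_mhz: float) -> Tuple[int, float]:
    """Lowest operating point whose frequency covers the demand."""
    for freq, volt in OPP_TABLE:
        if freq >= demand_mhz:
            return freq, volt
    raise ValueError("demand exceeds the highest operating point")


def dynamic_power(freq_mhz: float, volt: float) -> float:
    """Relative dynamic power, P ~ V^2 * f."""
    return volt * volt * freq_mhz


def cluster_power(demands_mhz: Sequence[float], asynchronous: bool) -> float:
    """Total relative power for the active cores' demands (in MHz)."""
    if asynchronous:
        # Each core gets its own clock and voltage.
        return sum(dynamic_power(*pick_opp(d)) for d in demands_mhz)
    # All cores share the clock and voltage the busiest core needs.
    freq, volt = pick_opp(max(demands_mhz))
    return len(demands_mhz) * dynamic_power(freq, volt)


if __name__ == "__main__":
    demands = [2000, 300, 300, 300]  # one busy core, three light ones
    sync_p = cluster_power(demands, asynchronous=False)
    asym_p = cluster_power(demands, asynchronous=True)
    print(f"synchronous: {sync_p:.0f}, asynchronous: {asym_p:.0f} (relative units)")
```

With these made-up numbers the asynchronous case uses roughly 30% of the synchronous power for this mixed load; real savings depend on the actual voltage tables and workload.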

The GPU + Memory

Gaming has become a very important aspect of mobile technology. As such, SoC manufacturers, especially Apple, invest heavily in integrating excellent GPUs into their SoCs, and the newest offerings from Samsung, NVIDIA and Qualcomm are no different. NVIDIA, ironically, has never really had industry-leading mobile GPUs in its Tegra series, but the same can't be said about the GPU in the Tegra 4. Designed by NVIDIA, of course, the new ULP GeForce is an enormous improvement over the last generation: the Tegra 4 has a total of 72 graphics "cores", up from 12 in the Tegra 3, a 6x increase in core count alone. However, the Tegra 4 remains the only modern SoC whose GPU uses separate pixel and vertex shaders instead of a unified architecture. Its 72 cores are divided into four pixel units, each containing 12 ALUs (an ALU is what NVIDIA calls a "core"), plus six vertex units, each containing 4 ALUs. The GPU's maximum clock speed is an impressive 672MHz. Counting one multiply-add (2 FLOPs) per ALU per clock, that gives the Tegra 4 96.8 GFLOPS of peak processing power; finally, an industry-leading result. In fact, recent benchmarks show the Tegra 4 slightly outperforming the mighty iPad 4 for the first time.

The Tegra 4i uses the same GPU architecture as the Tegra 4, but in a watered-down form with 60 cores in total. The 4i keeps the Tegra 4's 48 pixel shader ALUs, but instead of spreading them over four pixel units, it has two larger pixel units with 24 pixel ALUs each, while its vertex ALUs are cut in half, to three vertex units with 4 ALUs each. At a slightly lower clock rate of 660MHz, the Tegra 4i's GPU has a theoretical peak of 79 GFLOPS, which is still very impressive. While these GPUs' performance is impressive, their features aren't: the Tegra 4 is one of the few modern GPUs without full OpenGL ES 3.0 support (although a few ES 3.0 features are available).
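The GFLOPS figures follow from ALU count x FLOPs per ALU per clock x clock rate. A minimal sketch, assuming each ALU retires one multiply-add (counted as 2 FLOPs) per clock, which is the convention that reproduces NVIDIA's quoted numbers; `peak_gflops` is a hypothetical helper name.

```python
def peak_gflops(alus: int, clock_mhz: float, flops_per_alu_per_clock: int = 2) -> float:
    """Theoretical peak GFLOPS: ALUs x FLOPs per ALU per clock x clock."""
    return alus * flops_per_alu_per_clock * clock_mhz / 1000.0


# Tegra 4: 4 pixel units x 12 ALUs + 6 vertex units x 4 ALUs
tegra4_alus = 4 * 12 + 6 * 4
# Tegra 4i: 2 pixel units x 24 ALUs + 3 vertex units x 4 ALUs
tegra4i_alus = 2 * 24 + 3 * 4

print(f"Tegra 4:  {tegra4_alus} ALUs -> {peak_gflops(tegra4_alus, 672):.1f} GFLOPS")
# prints "Tegra 4:  72 ALUs -> 96.8 GFLOPS"
print(f"Tegra 4i: {tegra4i_alus} ALUs -> {peak_gflops(tegra4i_alus, 660):.1f} GFLOPS")
# prints "Tegra 4i: 60 ALUs -> 79.2 GFLOPS"
```

These are peak figures; the split into fixed pixel and vertex ALUs means a real workload rarely keeps all 72 busy at once.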

Now that display resolution has become a big priority for mobile devices, these next-gen SoCs simply had to offer brutal memory bandwidth. Bandwidth has always been a concern with the Tegra SoCs, but NVIDIA has finally decided to eliminate that issue. The Tegra 4 chip, aimed at high-end tablets, which by now are expected to have a resolution of at least 1920 x 1200, will probably handle high resolutions easily, since its memory interface is an impressive, if not innovative, dual-channel DDR3L-1866. In theory, this should deliver enough bandwidth to sustain the fill rate needed for even the most complex games on a very high resolution tablet (2560 x 1600), especially considering NVIDIA's claim that the GPU is optimized to use memory bandwidth more efficiently. As the Tegra 4i is destined for smartphones, it will generally have to handle lower resolutions, so NVIDIA opted for a less powerful, less power-hungry memory interface, perhaps going a bit too far: the Tegra 4i has only single-channel LPDDR3 memory. In a package-on-package (PoP) configuration, where the memory package is stacked directly on top of the SoC, the Tegra 4i supports up to LPDDR3-1600, roughly what the last-gen Tegra 3 offered. If the memory is instead mounted separately on the board (discrete), the 4i supports up to single-channel LPDDR3-1866, half of what the Tegra 4 can do. That isn't really enough, especially considering that 2013 will be the year of the 1080p smartphone.
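The "half of what the Tegra 4 can do" comparison is simple peak-bandwidth arithmetic: transfers per second x bytes per transfer x channels. A minimal sketch, assuming 32-bit channels (a typical width for mobile memory; this article doesn't state bus widths) and 1 GB = 10^9 bytes; `peak_bandwidth_gbs` is a hypothetical helper.

```python
def peak_bandwidth_gbs(mega_transfers: int, channels: int, bits_per_channel: int = 32) -> float:
    """Theoretical peak bandwidth in GB/s.

    bits_per_channel defaults to 32, an ASSUMED width, not a published spec.
    """
    bytes_per_transfer = bits_per_channel // 8
    return mega_transfers * 1e6 * bytes_per_transfer * channels / 1e9


configs = {
    "Tegra 4, dual-channel DDR3L-1866": peak_bandwidth_gbs(1866, channels=2),
    "Tegra 4i PoP, single-channel LPDDR3-1600": peak_bandwidth_gbs(1600, channels=1),
    "Tegra 4i discrete, single-channel LPDDR3-1866": peak_bandwidth_gbs(1866, channels=1),
}
for name, gbs in configs.items():
    print(f"{name}: {gbs:.1f} GB/s")
# Tegra 4: 14.9 GB/s, Tegra 4i PoP: 6.4 GB/s, Tegra 4i discrete: 7.5 GB/s
```

Sustained bandwidth in practice is noticeably lower than these peaks, but the ratios between configurations hold.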

The Exynos 5 Octa represents a big departure from Samsung's recent GPU tradition. Recent Exynos chips have used ARM's Mali GPUs, but the Exynos 5 Octa moves to a PowerVR GPU, in this case the PowerVR SGX544MP3, a close relative of the SGX543MP4 used in the iPad 3. ImgTech's GPU line has a very mature unified shader architecture, but the SGX series doesn't support OpenGL ES 3.0 at all. There are 12 unified shader units in this GPU (four per core, and the MP3 has three cores), each of which contains four ALUs. Compared with the iPad 3's GPU, the Octa's SGX544MP3 has one fewer core but is clocked much higher, at 533MHz. That equates to about 51.2 GFLOPS of peak performance, almost exactly halfway between the iPad 3's 32 GFLOPS and the iPad 4's 71.6 GFLOPS. Looking at these figures, it seems that the Exynos 5 Octa won't keep up with the competition at all: on paper, its performance pales next to NVIDIA's competing offering. The Octa's memory interface isn't known yet, but it will probably be similar to, if not better than, that of its predecessor, the Exynos 5 Dual, which has a dual-channel DDR3-1600 interface that should be good even for 2560 x 1600 tablets.
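The PowerVR figures use the same ALUs x 2 FLOPs x clock formula. A minimal sketch, assuming four ALUs per shader unit (as described above) and one multiply-add per ALU per clock. The Octa's configuration comes from this article; the iPad configurations (SGX543MP4 at 250MHz with four shader units per core, SGX554MP4 at 280MHz with eight per core) are widely reported figures, not from this article. ImgTech calls these shader units USSE2 pipelines; `sgx_gflops` is a hypothetical helper.

```python
def sgx_gflops(gpu_cores: int, shader_units_per_core: int, clock_mhz: float) -> float:
    """Peak GFLOPS for a PowerVR SGX configuration (assumptions above)."""
    alus = gpu_cores * shader_units_per_core * 4  # four ALUs per shader unit
    return alus * 2 * clock_mhz / 1000.0          # one multiply-add = 2 FLOPs


gpus = {
    "Exynos 5 Octa, SGX544MP3 @ 533MHz": sgx_gflops(3, 4, 533),
    "iPad 3, SGX543MP4 @ 250MHz": sgx_gflops(4, 4, 250),
    "iPad 4, SGX554MP4 @ 280MHz": sgx_gflops(4, 8, 280),
}
for name, gflops in gpus.items():
    print(f"{name}: {gflops:.1f} GFLOPS")
```

The Octa's 48 ALUs at 533MHz give 51.168 GFLOPS, so a single extra GPU core at the same clock would have put it close to the iPad 4.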

Unfortunately, Qualcomm discloses very little about its Adreno GPUs, so all we know is that the Snapdragon 800 will debut the Adreno 330 GPU, which should provide the same feature set as the current Adreno 320, but with higher performance. Considering that the Adreno 320 is already one of the most powerful GPUs available, a lot is expected from the 330. Beyond performance, the 330 will be an OpenGL ES 3.0 capable graphics processor, and that's pretty much all that is known. But the bottom line is that when the Snapdragon 800 debuts, it will probably boast the best mobile GPU, possibly even surpassing the Tegra 4's beefy GPU. Much like the Exynos 5 Dual, and probably the Octa too, the Snapdragon 800 will have a memory interface made up of two LPDDR3-1600 channels.
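To put the dual-channel, 1600 MT/s interface of the Exynos 5 Dual and Snapdragon 800 in perspective for high-resolution screens, here is a back-of-the-envelope budget. It is a sketch under simplifying assumptions: two 32-bit channels (about 12.8 GB/s peak), 32-bit color, 60 fps, and nothing else using the bus. Real rendering also reads textures and depth, blends (read plus write), overdraws, and shares bandwidth with the CPU, so practical headroom is far smaller. The helper names are hypothetical.

```python
def fullscreen_write_gbs(width: int, height: int, bytes_per_pixel: int = 4, fps: int = 60) -> float:
    """Bandwidth (GB/s) needed to write every screen pixel once per frame."""
    return width * height * bytes_per_pixel * fps / 1e9


def passes_per_frame(bandwidth_gbs: float, width: int, height: int,
                     bytes_per_pixel: int = 4, fps: int = 60) -> float:
    """Upper bound on full-screen writes per frame the bus could sustain."""
    return bandwidth_gbs / fullscreen_write_gbs(width, height, bytes_per_pixel, fps)


# Two channels x 32 bits (assumed width) x 1600 MT/s.
DUAL_CHANNEL_1600_GBS = 1600e6 * 4 * 2 / 1e9  # 12.8 GB/s

for width, height in [(1920, 1080), (1920, 1200), (2560, 1600)]:
    budget = passes_per_frame(DUAL_CHANNEL_1600_GBS, width, height)
    print(f"{width} x {height} @ 60 fps: about {budget:.0f} full-screen writes per frame")
```

Because 2560 x 1600 has about 1.8 times as many pixels as 1920 x 1200, the per-frame budget shrinks by the same factor.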

Conclusion

All of these SoCs have many similarities, except for the Tegra 4i, which is clearly a mid-range SoC and cannot compete with the other top-end chips. The rest all use a 28nm process, all have quad-core CPUs based on brand-new architectures, and all have some method of reducing power consumption. NVIDIA and Samsung improve power efficiency by adding low-power cores to handle simple workloads, while Qualcomm chooses to let its cores run at independent (asynchronous) clock rates. Their key difference lies in their GPUs. While Samsung, oddly, chose a GPU that isn't very competitive, NVIDIA and Qualcomm are taking gaming performance very seriously and improving their GPU offerings. As for memory, the high-end SoCs all have (or are expected to have) nearly identical memory interfaces, so the key difference will be how efficiently the CPU and GPU use that bandwidth. But ultimately, it all comes down to when these SoCs are released. Technically, the Exynos 5 Octa has already debuted in the GT-I9500 variant of the Galaxy S4, but most S4s are the I9505 variant with a Snapdragon 600 SoC, so few devices ship with the Octa due to its low availability. The Tegra 4 is supposed to break cover in May/June with the NVIDIA SHIELD gaming portable, while the Snapdragon 800 and the Tegra 4i are supposed to debut later this year.