
Saturday, November 16, 2013

Apple A7 vs NVIDIA Tegra 4 vs Snapdragon 800: SoC Wars


Mobile SoC performance has become one of the most competitive aspects of the mobile sector. Since 2010, when the iPad made clear how important processing power is for mobile devices, performance has grown exponentially, and SoC vendors have competed ever more fiercely. In 2013, the main SoC manufacturers can be narrowed down to Qualcomm, Apple, NVIDIA and, to a lesser extent, Samsung. TI used to be a big player in the SoC market, but this year it practically disappeared from the sector. Now that these companies have their latest silicon shipping in commercially available products, in time for the holiday season, it's time to put their flagship chips to the test and see who has the best offering.

                | Apple A7                       | NVIDIA Tegra 4                                                | Snapdragon 800
Process Node    | 28nm HKMG                      | 28nm HPL                                                      | 28nm HPM
Die Size        | 102mm²                         | ~80mm²                                                        | 118.3mm²
Instruction Set | ARMv8 (64-bit)                 | ARMv7 (32-bit)                                                | ARMv7 (32-bit)
CPU             | Dual-core Cyclone @ 1.3/1.4GHz | Quad-core Cortex-A15 @ 1.9GHz + low-power Cortex-A15 @ 825MHz | Quad-core Krait 400 @ 2.3GHz
GPU             | PowerVR G6430 @ 450MHz         | 72-core ULP GeForce @ 672MHz                                  | Adreno 330 @ up to 550MHz
RAM             | Dual-channel 32-bit LPDDR3-1600 (12.8GB/s) | Dual-channel 32-bit LPDDR3/DDR3L-1866 (14.9GB/s)  | Dual-channel 32-bit LPDDR3-1866 (14.9GB/s)



The CPU: Dual-core vs Quad-core

Apple's most impressive feat in mobile performance so far is that, in an age of quad-cores with insane clock speeds, Apple has never shipped a device with more than two CPU cores, at relatively low clock speeds, and has still managed to at least keep up with the latest competition. Let's see how Apple's latest CPU, the dual-core Cyclone with a max clock speed of 1.4GHz, stacks up against NVIDIA's latest offering, the Tegra 4's four Cortex-A15s @ 1.9GHz, and the Snapdragon 800's four Krait 400 cores @ 2.3GHz.

Architecturally speaking, Apple's CPU is far ahead of the Cortex-A15 and the Krait 400, because the A7's CPU runs the brand new 64-bit ARMv8 instruction set. Among other things, ARMv8 brings roughly twice the general-purpose registers and a cleaner instruction set, giving the Cyclone a tangible performance gain in some cases over traditional 32-bit designs. Not only that, but Apple has made the Cyclone core much wider than its predecessor, the Swift core; in fact, I think it's the widest mobile CPU so far. The wider architecture plus ARMv8 give the Cyclone cores much better single-threaded performance than any competitor, and remember that in most use cases single-threaded performance is what matters most. Kudos to Apple for competing against monstrous quad-cores with only a dual-core.

The NVIDIA Tegra 4's CPU uses NVIDIA's Variable Symmetric Multi-Processing architecture, introduced with the Tegra 3. Like ARM's big.LITTLE, the Tegra 4's CPU complex consists of a main cluster of four high-performance Cortex-A15 cores running at up to 1.9GHz, plus a shadow A15 core that can go up to 825MHz. When CPU demand is low, the quad-core A15 cluster is power-gated and all processing transfers to the shadow core, staying there as long as demand remains low enough. The advantage, of course, is reduced power consumption.

Qualcomm's Snapdragon 800 uses Qualcomm's own custom CPU core, dubbed Krait 400. Since Qualcomm likes to keep its mouth shut about its CPU architectures, not much is known about it. What we do know is that the Krait 400 is mostly the Krait 300 core moved to a 28nm HPM process; the move from 28nm LP in the Krait 300 to 28nm HPM in the Krait 400 implies some relayout, and other differences include lower memory latency. Apart from that, we only know that, like the Cortex-A15 it competes against, the Krait 400 is a 3-wide machine with out-of-order (OoO) execution. The move to HPM also means the Krait 400 can reach higher clocks than its predecessor, which accounts for the insane 2.3GHz max clock speed. Put four of those monster cores together and you potentially have the most powerful mobile CPU to date. It still lags behind the Apple A7 in single-threaded performance, however, which is very important in mobile OSes.

Now let's put in some quantitative information to see how these CPUs compare in their actual performance: 

What I said before about single-threaded performance shows here: Apple's Cyclone cores deliver at least 50% more performance on a single core than any competitor. But because the A7 has only two cores while its main competitors have four, the A7 loses that advantage in multi-threaded situations, although it can still keep up. It's very impressive how Apple keeps matching quad-core performance with only two cores.
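To make that concrete, here's a toy Python calculation. The ~1.5x per-core figure comes from the paragraph above; the scores themselves are made-up units, not real benchmark results:

```python
# Illustrative per-core scores: the A7 is taken as ~1.5x its rivals per core,
# as discussed above. These are made-up units, not real benchmark numbers.
per_core_score = {"Apple A7": 1.5, "Tegra 4": 1.0, "Snapdragon 800": 1.0}
core_count     = {"Apple A7": 2,   "Tegra 4": 4,   "Snapdragon 800": 4}

for soc, single in per_core_score.items():
    multi = single * core_count[soc]  # assumes ideal scaling; real apps scale worse
    print(f"{soc:15s} single: {single:.1f}   multi (ideal): {multi:.1f}")
```

Even under ideal scaling, the quad-cores only end up about a third ahead in the multi-threaded column (4.0 vs 3.0), and real workloads rarely scale that well, which is why the A7 "keeps up" in practice.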

The GPU and Memory

Apple has always put more emphasis on the GPU than the CPU in its SoCs, and the A7 is no different. Apple continues to license GPUs from Imagination Technologies, as it has done since the first iPhone. This time around, Apple is using a PowerVR "Rogue" series GPU, based on ImgTech's latest technology and, of course, supporting OpenGL ES 3.0. The exact model in the A7 is the G6430 variant, which contains four GPU modules with 32 unified shader units each, for a total of 128 shader units at a clock speed of 450MHz.

Ironically, the NVIDIA Tegra 4's GPU is the least fancy of the current high-end mobile GPUs. Designed in-house by NVIDIA, it is based on the ancient NV40 architecture (the same used in the GeForce 6 series); hence, it's the only modern GPU that uses discrete pixel and vertex shaders. There are 72 shader units in total, 48 of them pixel shaders and the remaining 24 vertex shaders, running at a max clock speed of 672MHz. The biggest limitation of the Tegra 4's GeForce GPU is that it only supports OpenGL ES 2.0. Right now this isn't really a problem, as game developers haven't yet migrated to OpenGL ES 3.0, but it practically destroys the Tegra 4's future-proofing.

Finally, we have the Snapdragon 800 with its Adreno 330 GPU. As I said before, Qualcomm reveals as little as possible about its SoCs, and the Adreno line of GPUs is probably the biggest mystery I'm faced with now. All I can say is that it's a unified shader architecture compatible with the latest OpenGL ES 3.0 API. The Adreno 330, in its highest bin, runs at 550MHz, but the vast majority of Snapdragon 800 devices clock their GPUs at 450MHz. The benchmark results I'll show later reflect the Adreno 330's performance at 450MHz, since no device with the 550MHz bin has been released yet.

                | Snapdragon 800 | Apple A7      | NVIDIA Tegra 4           | NVIDIA Tegra 4i
GPU Name        | Adreno 330     | PowerVR G6430 | 72-core GeForce          | 60-core GeForce
Shader Cores    | ?              | 4             | 4 Pixel; 6 Vertex        | 2 Pixel; 3 Vertex
ALUs/Core       | ?              | 32            | 12 Pixel; 4 Vertex       | 24 Pixel; 4 Vertex
Total ALUs      | ?              | 128           | 72 (48 Pixel; 24 Vertex) | 60 (48 Pixel; 12 Vertex)
Max Clock Speed | 550MHz         | 450MHz        | 672MHz                   | 660MHz
Peak GFLOPS     | ?              | 115.2         | 96.8                     | 79.2


Peak theoretical compute power puts the Tegra 4 behind the A7, but the Tegra 4 is still close enough to the A7 to call it competitive. However, be aware that, while the A7's unified shader architecture allows it to have its peak 115.2 GFLOPS performance available to it in any situation (the same applies to the Adreno 330), the story is quite different with the Tegra 4. The discrete pixel shader architecture means that the GPU's peak 96.8 GFLOPS can only be achieved when the mix of pixel and vertex shader requests matches the ratio between pixel and vertex shader hardware (2:1), so most of the time the GPU achieves less than 96.8 GFLOPS.
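The arithmetic behind those GFLOPS figures, and the penalty of the discrete shader split, can be sketched in a few lines of Python. The 2 FLOPs/ALU/cycle assumption (one multiply-add per cycle) is the usual convention; the workload-mix model is my own simplification:

```python
def peak_gflops(alus, clock_ghz):
    """Peak throughput assuming each ALU retires one multiply-add (2 FLOPs) per cycle."""
    return alus * 2 * clock_ghz

a7_peak = peak_gflops(128, 0.450)      # unified: 115.2 GFLOPS, available at any shader mix
t4_peak = peak_gflops(48 + 24, 0.672)  # discrete: ~96.8 GFLOPS, only at a 2:1 pixel:vertex mix

def tegra4_effective(pixel_fraction):
    """Achievable GFLOPS when `pixel_fraction` of the shader work is pixel shading.
    Whichever pool (pixel or vertex) saturates first limits total throughput."""
    pixel_cap, vertex_cap = peak_gflops(48, 0.672), peak_gflops(24, 0.672)
    caps = []
    if pixel_fraction > 0:
        caps.append(pixel_cap / pixel_fraction)
    if pixel_fraction < 1:
        caps.append(vertex_cap / (1 - pixel_fraction))
    return min(caps)

print(f"A7 peak: {a7_peak:.1f}, Tegra 4 peak: {t4_peak:.1f}")
print(f"Tegra 4 at a 50/50 mix: {tegra4_effective(0.5):.1f} GFLOPS")  # vertex-bound
```

At a 50/50 mix the vertex pool saturates and the GPU tops out around 64.5 GFLOPS, a third below its headline number; the unified A7 and Adreno 330 have no such constraint.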

There may not be a huge gap in theoretical compute between the A7's and Tegra 4's GPU, but the architectural difference is astounding. You can hardly put a unified shader architecture that supports OpenGL ES 3.0 in the same league as a discrete pixel and vertex shader architecture that is limited to OpenGL ES 2.0. While these differences may not affect real-world performance, the omission of OpenGL ES 3.0 is bad for future-proofing. 

Interestingly, every current high-end SoC uses pretty much the same memory interface. The Tegra 4, Apple A7 and Snapdragon 800 all have dual-channel LPDDR3/DDR3L solutions, except that the Tegra 4 and Snapdragon 800 allow a slightly higher memory clock (933MHz vs the A7's 800MHz), giving the A7 12.8GB/s of peak theoretical memory bandwidth versus 14.9GB/s on the Tegra 4 and Snapdragon 800. While the A7 technically has less theoretical bandwidth than its competitors, it counteracts this with a very interesting solution: it turns out the A7 has 4MB of SRAM on-die, acting as an L3 cache, which absorbs traffic that would otherwise hit the main memory interface and hence increases effective bandwidth. You may recall that a similar solution is used in the Xbox One's SoC to increase memory bandwidth.
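Those bandwidth figures fall straight out of the interface specs in the table at the top; here's a quick Python sanity check using the channel widths and data rates listed there:

```python
def peak_bandwidth_gb_s(channels, width_bits, data_rate_mt_s):
    """Peak theoretical bandwidth: channels x bytes per transfer x transfers per second."""
    return channels * (width_bits // 8) * data_rate_mt_s * 1e6 / 1e9

a7     = peak_bandwidth_gb_s(2, 32, 1600)  # dual-channel LPDDR3-1600 -> 12.8 GB/s
tegra4 = peak_bandwidth_gb_s(2, 32, 1866)  # dual-channel DDR3L-1866  -> ~14.9 GB/s

print(f"A7: {a7:.1f} GB/s, Tegra 4 / S800: {tegra4:.1f} GB/s")
```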

Considering the 4MB SRAM on the A7's die, it may turn out that the A7 can deliver significantly more memory bandwidth than the Tegra 4, but still, both have enough memory bandwidth to power ultra high-resolution (>1080p) tablets comfortably. 

The T-Rex HD test shows the Tegra 4 significantly behind the Apple A7 and puts it as the slowest of the high-end mobile GPUs. The Apple A7, meanwhile, is beaten only by the Snapdragon 800, and only by a very small margin.

The less intensive Egypt HD test also shows the Tegra 4 behind the A7 and the other high-end mobile SoCs, but by a smaller margin. The A7 is the second slowest of these SoCs in this test, scoring slightly lower than the Mali-T628 in the Exynos 5420 and the Adreno 330 in the Snapdragon 800. Both tests show the Snapdragon 800 as the supreme mobile GPU.

ImgTech GPUs have always had industry-leading fill rate, and it shows in the A7: the PowerVR G6430 has a much higher fill rate than any of its competitors. On the other end of the spectrum we have the Tegra 4. Tegra GPUs tend to be substandard in fill rate, and it shows here too: the Tegra 4 manages a significantly lower fill rate score than every one of its competitors, especially the Apple A7. That's a problem, because the Tegra 4 currently powers some of the few tablets with 1600p displays, for example the ASUS Transformer Pad TF701T. On devices with 1080p screens or less, however, even the Tegra 4 probably won't run into any fill-rate bottlenecks. The Snapdragon 800 also doesn't do very well here, as it's outperformed by the Mali-T628 in the Exynos 5420.



Here, the Tegra 4 and the Apple A7 are in the lead, with the Apple A7 pulling ahead slightly.



Adding per-vertex lighting for some reason causes the Apple A7 to lag behind all of its competitors, leaving the Tegra 4 in the lead.


When using per-pixel lighting, the A7 once again falls behind everyone else, and this time the Tegra 4 joins it with the second-lowest score.

Even though in some cases the Apple A7 lags behind its competition severely, I highly doubt this is going to make performance suffer in any way, since most mobile games aren't very geometry bound. 

The Snapdragon 800, while not at the top spot in most of these tests, shows strong scores across the board, outperforming the whole competition by a significant margin in the fragment lit test. 

Power Consumption

All of the current high-end SoCs should have low enough power consumption, since they all use 28nm silicon. On the CPU side, the A7 enjoys a low core count as well as a low clock speed, so I don't expect the CPU to draw too much power. The Tegra 4, on the other hand, has four power-hungry Cortex-A15 cores at a much higher clock speed; however, the shadow A15 core has the potential to counteract the extra power consumed when the main A15 cluster is active. The S800 doesn't have any extra low-power cores and relies on the efficiency of the main Krait 400 cores to yield good battery life, but given Qualcomm's record of making CPUs with low idle power, this is definitely not a problem.

One optimization Qualcomm makes to reduce power consumption is running each active core at its own clock speed. The competing architectures run every active core at the same clock, even when unnecessary. So, for example, with two cores active, one fully loaded and the other running a much lighter task, the Krait 400 can keep the first core at its max clock while the second runs much lower, whereas its competitors would run both cores at max clock even though the second doesn't need it. This is one of the many optimizations that make the Krait 400 very power efficient.
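A toy model shows why asynchronous clocking matters. Dynamic CPU power scales roughly with f·V², and since voltage has to rise with frequency, P ∝ f³ is a common rule of thumb; the numbers below are illustrative, not measured Krait figures:

```python
def core_power(freq_ghz, f_max=2.3, p_max=1.0):
    """Toy dynamic-power model: P scales as f^3 (voltage rises with frequency).
    p_max is the (made-up) power of one core at its 2.3GHz max clock."""
    return p_max * (freq_ghz / f_max) ** 3

# One core fully loaded, a second core on a light background task.
async_total = core_power(2.3) + core_power(0.8)  # per-core clocks (Krait-style)
sync_total  = core_power(2.3) + core_power(2.3)  # both cores dragged to max clock

print(f"async: {async_total:.2f}  sync: {sync_total:.2f}  saving: {1 - async_total / sync_total:.0%}")
```

Under this crude model, letting the lightly loaded core idle at 0.8GHz nearly halves two-core power versus forcing both to 2.3GHz.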

I can't really tell whether it's the 72-core GeForce, the PowerVR G6430 or the Adreno 330 that consumes the least power, but given ImgTech's record of making the most power-efficient mobile GPUs, it's not a stretch to assume it's the G6430.



Conclusion

While the Tegra 4, Apple A7 and Snapdragon 800 have completely different architectures, I'd say they're pretty close to each other, based on the performance they've shown in synthetic benchmarks. The differences between the CPUs are the most striking. While Apple focused on keeping core count and clock speed low while driving up single-core performance, NVIDIA's (or rather, ARM's) and Qualcomm's solutions offset relatively lower single-threaded performance with more cores at higher clock speeds. The former is probably better for overall system performance, as mobile OSes tend to rely much more on single-threaded performance, while the latter is probably better for multi-tasking. In any case, it's evident that all current high-end SoCs are surprisingly close when it comes to peak multi-threaded performance.

Comparing the Tegra 4, Apple A7 and Snapdragon 800, as well as the rest of the high-end competition, it's clear that the only one that truly distinguishes itself is the A7. The Tegra 4 and the Exynos 5420, for instance, both have four Cortex-A15 cores at similar clock speeds (1.9GHz and 1.8GHz, respectively), and both have a separate low-power CPU arrangement for light tasks (the Tegra 4 has a single shadow A15 core at its disposal, while the Exynos 5420 uses a quad-core Cortex-A7 cluster for the same purpose). The Snapdragon 800 uses a unique architecture, the Krait 400, in a quad-core configuration, and even takes the clock speed beyond the norm with an insane 2.3GHz; unlike two of its competitors it has no extra low-power cores, relying on other tricks to keep idle power consumption low.

In GFXBench's high-level GPU benchmarks, all four main high-end SoCs are more or less on the same level, with only the Snapdragon 800 pulling slightly ahead of the A7. In both high-level tests, however, we can see the Tegra 4 lagging behind all of its competition. How ironic.

GFXBench's low-level tests, however, show a huge difference between the current high-end mobile GPUs. In the fill rate department we see the Apple A7 blowing all of its competitors out of the water, with the Tegra 4 at the bottom of the chart and the Snapdragon 800 slightly ahead of it, but still behind the Exynos 5420 and the Apple A7.

The verdict of this comparison is that, while pretty much all of the current flagship SoCs are close in CPU power, the Tegra 4 falters slightly when the GPU is put to the test. The Apple A7 does very well on the GPU side, but it's just slightly outperformed by the Adreno 330 in the Snapdragon 800. Really, though, they're all so close that it's hard to pick a definite winner. You could call the Snapdragon 800 the overall winner, but I say it's too close to call.

Sunday, May 19, 2013

NVIDIA Tegra 4 vs Exynos 5 Octa vs Snapdragon 800: The Next Generation

For the past few years, mobile devices have continuously become more powerful, yet more mobile. The last generation of high-end SoCs, specifically NVIDIA's Tegra 3, Qualcomm's Snapdragon S4 and Samsung's Exynos 4 Quad, among others, established a new standard for mobile performance by bringing quad-core CPUs to mobile devices and taking mobile gaming to a whole new level, without sacrificing power consumption, using solutions like the 4-PLUS-1 architecture inside the Tegra 3 (four performance cores plus one low-power core that handles light workloads while draining much less power). After that evolution, the main competitors in the mobile SoC space needed worthy successors that continued the trend of more performance with less power consumption, and the forthcoming generation doesn't disappoint. NVIDIA's Tegra 4, Samsung's Exynos 5 Octa and Qualcomm's Snapdragon 800 are all very impressive, and each has one especially breathtaking feature.

All of these SoCs have one thing in common: their process node. Qualcomm has been using 28nm since last year, NVIDIA is making a gigantic stride from its aged 40nm process to 28nm, and Samsung is upgrading the Exynos line from its own 32nm HKMG process to 28nm HKMG with the Exynos 5 Octa.

CPU

ARM's Cortex A15 CPU has already shown very impressive performance, with only two cores, in the Exynos 5 Dual SoC released (so far only) in the Nexus 10 tablet. With just two cores, the Cortex A15 topped almost every benchmark, performing better than every quad-core mobile CPU. In this generation of SoCs we'll be seeing double the already champion performance of the Exynos 5 Dual, since Cortex A15s in quad-core configurations will be common.

The NVIDIA Tegra 4 maintains the 4-PLUS-1 architecture used in the Tegra 3. The main performance cluster consists of four Cortex A15 cores clocked at an impressive 1.9GHz, which is honestly more than a mobile device will ever need. Since so much performance would drain the battery fast, NVIDIA included a fifth Cortex A15 core designed for low power: it handles light workloads such as video and music playback while the four power-hungry A15 cores are power-gated, dramatically increasing power efficiency. The fifth A15 core can run at up to 825MHz, roughly the performance of a single 1.6GHz Cortex A9. The Tegra 4's little brother, the mid-range Tegra 4i, has a slightly different CPU. While still employing 4-PLUS-1, the 4i doesn't use A15 cores but a newer, improved version of the old Cortex A9; its four A9 cores can run at up to 2.3GHz, which should bring the old CPU's performance up to scratch. A fifth companion core is also used in the Tegra 4i.

Samsung's Exynos 5 Octa even has a name that draws attention to the CPU, implying the first SoC to integrate 8 CPU cores. That doesn't mean we'll see 8-core performance on a mobile device, though. The Octa is actually the first SoC to use ARM's big.LITTLE setup. Similar in concept to NVIDIA's 4-PLUS-1, the Octa pairs four Cortex A15 cores with a maximum clock speed of 1.8GHz, for heavy workloads, with a much smaller quad-core Cortex A7 cluster running at up to 1.2GHz to handle the lighter ones. A Cortex A7 performs similarly to a Cortex A9, only with better power efficiency, so the low-power cluster can actually deliver respectable performance, roughly that of a quad-core Cortex A9 @ 1.2GHz (a Tegra 3 T30L, for example), at much lower power consumption. It's fair to assume the low-power Cortex A7s will be in use much more often than the companion core in the Tegra 4, which might give Samsung the advantage in power consumption.
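The cluster-migration idea behind big.LITTLE (and, roughly, 4-PLUS-1) can be sketched as a simple policy: run on the little cluster until demand crosses a threshold, then wake the big one. The threshold and capacity numbers below are illustrative, not Samsung's actual scheduler parameters:

```python
LITTLE = {"name": "quad Cortex-A7 @ 1.2GHz",  "relative_perf": 0.35}
BIG    = {"name": "quad Cortex-A15 @ 1.8GHz", "relative_perf": 1.00}

def pick_cluster(demand, threshold=0.30):
    """Choose which cluster runs the workload. `demand` is normalized CPU load
    in [0, 1] relative to the big cluster's full throughput."""
    return LITTLE if demand <= threshold else BIG

for demand in (0.05, 0.25, 0.60, 0.95):
    print(f"demand {demand:.2f} -> {pick_cluster(demand)['name']}")
```

Because typical phone workloads (idle screen, music, scrolling) sit well below the threshold most of the time, the A15 cluster stays power-gated for long stretches, which is where the efficiency win comes from.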

Qualcomm is known for not licensing complete ARM CPUs; instead it licenses the architecture and designs custom cores, which it has been calling Krait since the Snapdragon S4. Krait CPUs tend to fare pretty well in terms of power consumption. For the still quite distant Snapdragon 800, Qualcomm will employ a new custom CPU, the Krait 400, in a quad-core configuration capable of a stunning max frequency of 2.3GHz. A special characteristic of Krait cores is that each core can run at a different clock speed based on its workload, which can yield good results in energy efficiency.

The GPU + Memory

Gaming has become a very important aspect of mobile technology. As such, SoC manufacturers, especially Apple, invest a lot in integrating excellent GPUs into their SoCs, and the newest offerings from Samsung, NVIDIA and Qualcomm are no different. NVIDIA, ironically, has never really had industry-leading mobile GPUs in its Tegra series, but the same can't be said about the GPU in the Tegra 4. Designed by NVIDIA, of course, the new ULP GeForce is an enormous improvement over the last generation: the Tegra 4 has a total of 72 graphics "cores", up from 12 in the Tegra 3, which by itself is a 6x improvement. However, the Tegra 4 remains the only modern SoC whose GPU uses discrete pixel and vertex shaders, as opposed to a unified architecture. The 72 cores are divided into four pixel units, each containing 12 ALUs (that's what NVIDIA calls a "core"), and six vertex units, each containing 4 ALUs. The max clock speed is an impressive 672MHz, at which the Tegra 4 delivers 96.8 GFLOPS of processing power: an industry-leading result, finally. In fact, recent benchmarks show the Tegra 4 slightly outperforming the mighty iPad 4 for the first time.

The Tegra 4i uses the same GPU architecture as the Tegra 4, but watered down, with 60 total cores. The 4i keeps the same 48 pixel shader ALUs, although instead of spreading them over four pixel units, it has two larger pixel units of 24 ALUs each, while the vertex ALU count is cut in half, with three vertex units of 4 ALUs each. At a slightly lower clock of 660MHz, the Tegra 4i's GPU has a max theoretical performance of 79 GFLOPS, which is still very impressive.

While the GPUs' performance is impressive, their features aren't: the Tegra 4 is one of the few modern GPUs that doesn't offer full OpenGL ES 3.0 compatibility (a few features are available, however).

Given that we're now at a time when resolution is a big priority for mobile devices, these next-gen SoCs just had to offer brutal memory bandwidth. This has always been a concern with Tegra SoCs, but NVIDIA has finally decided to eliminate the issue. The Tegra 4, destined for high-end tablets, which by now are expected to have a resolution of at least 1920 x 1200, should handle high resolutions easily, since its memory interface is an impressive, if not innovative, dual-channel DDR3L-1866. That should in theory deliver enough bandwidth for even the most complex games on a very-high-resolution tablet (2560 x 1600), especially considering NVIDIA's claims that the GPU uses memory bandwidth more efficiently. As the Tegra 4i is destined for smartphones, it will generally handle lower resolutions, so NVIDIA opted for a less powerful, less power-hungry memory interface; perhaps a bit too much so, as the Tegra 4i has only single-channel LPDDR3 memory. In a package-on-package (PoP) configuration, where the memory is stacked on the SoC, the Tegra 4i supports up to LPDDR3-1600, just as much as the last-gen Tegra 3. With discrete memory, however, the 4i supports up to single-channel LPDDR3-1866, half of what the Tegra 4 can do, which isn't really enough, especially considering 2013 will be the year of the 1080p smartphone.
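To put those interfaces in perspective, here's a rough estimate, my own back-of-the-envelope assuming a 32bpp framebuffer at 60Hz with no overdraw, of how much bandwidth mere display scanout consumes at the resolutions discussed:

```python
def scanout_gb_s(width, height, bytes_per_pixel=4, refresh_hz=60):
    """Bandwidth consumed just reading the framebuffer out to the display."""
    return width * height * bytes_per_pixel * refresh_hz / 1e9

tablet = scanout_gb_s(2560, 1600)  # Tegra 4 class tablet
phone  = scanout_gb_s(1920, 1080)  # Tegra 4i class 1080p phone

print(f"2560x1600 scanout: {tablet:.2f} GB/s, 1920x1080 scanout: {phone:.2f} GB/s")
```

Scanout alone is under 1GB/s even at 2560 x 1600, but GPU rendering traffic (texture reads, depth and color writes, overdraw) is typically several times that, which is why the 4i's roughly 7.5GB/s single-channel interface looks tight for 1080p gaming while the Tegra 4's 14.9GB/s is comfortable.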

The Exynos 5 Octa represents a big departure from Samsung's usual GPU tradition. The Exynos line has always used ARM's Mali GPUs, but the Exynos 5 Octa moves to a PowerVR GPU, in this case the PowerVR SGX544MP3, practically identical to the GPU in the iPad 3. ImgTech's SGX line has a very mature unified shader architecture, but it doesn't support OpenGL ES 3.0 at all. There are 12 unified shader units in this GPU (four per module, and as the MP3 name implies, three modules), each containing four ALUs. The difference between the iPad 3's SoC and the Exynos Octa is that the Octa's SGX544MP3 is clocked much higher, at 533MHz. That equates to 51.1 GFLOPS of peak performance, almost exactly in between the iPad 3's 32 GFLOPS and the iPad 4's 71.6 GFLOPS. Looking at these figures, it seems the Exynos 5 Octa won't be keeping up with the competition at all; its performance, at least on paper, pales next to NVIDIA's offering. The Octa's memory interface isn't yet known, but it will probably be similar to, if not better than, its little brother the Exynos 5 Dual's dual-channel DDR3-1600, which should actually be good even for 2560 x 1600 tablets.

Unfortunately, Qualcomm never discloses any information about its Adreno GPUs, so all we know is that the Snapdragon 800 will debut the Adreno 330, which should provide the same feature set as the current Adreno 320 but with higher performance. Considering that the Adreno 320 is already one of the most powerful mobile GPUs available, a lot is expected from the 330. Aside from performance, the 330 will be an OpenGL ES 3.0 capable graphics processor, and that's pretty much all that is known. The bottom line is that when the Snapdragon 800 debuts, it will probably boast the best mobile GPU, likely surpassing even the Tegra 4's beefy GPU. Much like the Exynos 5 Dual, and probably the Octa too, the Snapdragon 800's memory interface will consist of two DDR3-1600 channels.

Conclusion

All of these SoCs have many similarities, except the Tegra 4i, which is clearly destined to be a mid-range SoC and cannot compete with the other top-end parts. They're similar in that they all use a 28nm process, they're all quad-cores with brand new CPU architectures, and all of them have some method of reducing power consumption: NVIDIA and Samsung increase power efficiency by adding low-power cores to handle simple workloads, while Qualcomm chooses to run its cores at asynchronous clock rates. Their key difference is their GPUs. While Samsung, oddly, chooses a GPU that isn't very competitive, NVIDIA and Qualcomm are taking gaming performance very seriously and improving their GPU offerings. In terms of memory, all of these SoCs have nearly identical interfaces, so the key difference will be how efficiently the CPU and GPU use the available bandwidth. Ultimately, though, it all comes down to when these SoCs are released. Technically the Exynos 5 Octa has already debuted in the GT-I9500 variant of the Galaxy S4, but most S4s are I9505 variants with a Snapdragon 600, so few devices ship with the Octa due to low availability of the SoC. The Tegra 4 is supposed to break cover in May/June with the NVIDIA SHIELD gaming portable, and the Snapdragon 800, as well as the Tegra 4i, are supposed to debut later this year.

Friday, November 16, 2012

Nexus 10's SoC, Samsung Exynos 5250 review


Google's new Nexus 10 tablet, developed alongside Samsung, features some killer specs, most notably an unsurpassed screen resolution of 2560 x 1600 pixels. This immense resolution results in a fine 299ppi pixel density, surpassing even the iPad's 264ppi display. However, performance demand is proportional to resolution, so to drive such a display Google needed an extremely powerful SoC, something along the lines of Apple's A5X. Google has done right in choosing Samsung's brand new Exynos 5250, built on Samsung's successful 32nm HKMG (High-K Metal Gate) process.

The CPU

Less than a year ago, all SoC manufacturers were bound to Cortex-A9 CPUs, so there was no real competition in single-threaded performance; it was all the same. Today, of course, the story is different. ARM's Cortex-A15 architecture pushed other ARM-based CPU design houses into making similar architectures, and we currently have three new designs competing with each other: Qualcomm's Krait, Apple's Swift, and ARM's Cortex-A15. The real difference between them is single-threaded performance. While the Swift didn't really surprise anyone and Krait did a pretty good job, the Cortex-A15 is just astounding on a per-core basis; indeed, one A15 core is about twice as fast as one Krait core. The Exynos 5250 has two A15 cores clocked at 1.7GHz, which theoretically should translate into the best mobile CPU performance ever seen.

Now let's see how two A15s @ 1.7GHz fare against the Quad-core Krait @ 1.5GHz and the dual-core Swift @ 1.4GHz.


The Exynos 5250 in the Nexus 10 distinguishes itself clearly from the rest of the competition, while also being one of the few chips with only two cores (granted, it has the highest clock). Comparing it to the only other dual-core there, the Swift in Apple's flagship A6X SoC, we can see that the Cortex-A15's per-core performance is unprecedented.

Memory interface

The main constraint a high-resolution screen poses is memory bandwidth. Considering how badly Android devices' SoCs usually do in memory bandwidth, the Exynos 5250 was a true surprise: it contains a dual-channel DDR3-1600 (800MHz) memory controller, bringing theoretical memory bandwidth to a PC-class 12.8GB/s. Apple's A5X and A6X chips (each powering an iPad's Retina display) achieve the same theoretical bandwidth as the Exynos 5250, only with a much wider memory interface (quad-channel LPDDR2-1066). That said, the Exynos 5250 is the first non-Apple SoC capable of powering a display as fine as the Nexus 10's.

The GPU

Another big surprise in the Exynos 5250 is the brand new Mali-T604 GPU, built by ARM. It's the first GPU to use ARM's new Midgard architecture, a unified shader design. As we'll see in the benchmarks below, the Mali-T604 kills the GPU inside the A5X but lags behind the A6X. At any rate, it's by far the best GPU in the Android space.


Despite the Exynos 5250's theoretical memory bandwidth being equal to the A6X's, there is clearly a huge gap between the A6X's maximum fill rate and the Exynos'. Aside from the A6X, though, the Exynos 5250 shows excellent fill rate, leaving the rest of the competition in the dust, and it should surely handle the Nexus 10's immense display resolution properly.



Unfortunately, the Offscreen Egypt HD test shows that the iPad 4's PowerVR SGX554MP4 GPU is actually significantly stronger than the Mali-T604, which only edges ahead of the Adreno 320. Still, it performs very well and is the most powerful GPU in the Android space. The Onscreen test shows the sad reality that the 2560 x 1600 display is a bit too demanding for the Mali-T604: the overall weaker GPU, combined with the significantly higher resolution, leaves the Nexus 10 performing much worse than the iPad 4. Despite that, and considering the unprecedented resolution, the Mali-T604 does offer reasonably good performance and should suffice for most gaming uses.



Here is the part where Mali GPUs always disappoint. ARM's GPUs have never had strong triangle throughput. The T604's Midgard architecture has clearly improved matters, judging by the gains over the last-gen Mali-400MP, but the result is still embarrassing for ARM. Its triangle throughput isn't even comparable to what the iPad 4 offers, and even the year-old NVIDIA Tegra 3 edges ahead (granted, triangle throughput is NVIDIA's forte). It should be good enough for most games, but in scenes with very high polygon counts the weak triangle throughput could become a performance bottleneck.

Conclusion

We've seen that the Samsung Exynos 5250 has its strong points, and we've also seen a weak side to Samsung's new flagship SoC, although the vast majority of its features are excellent. The Exynos 5250 is a very interesting SoC because it debuts new architectures almost everywhere: it is the first to use ARM's Cortex-A15 CPU design and the first to carry ARM's new Mali-T604 GPU, which means the Midgard architecture debuts here as well. Unlike most current SoCs, the Exynos 5250 is solid across the board, from the die process to the CPU, the GPU and the memory controller. We also get excellent power efficiency, thanks to Samsung's 32nm HKMG process and the Cortex-A15, which proved to be quite an efficient CPU. The Nexus 10 is a very demanding device, and the Exynos 5250 is the only SoC currently capable of satisfying its demands.

Friday, November 2, 2012

Apple A6X SoC Analyzed: Dual-core Swift and quad-core PowerVR SGX 554MP4


So the iPad 4 comes out today, and Apple previously claimed that the SoC (System-on-Chip) powering it would be twice as fast as the previous-generation iPad's A5X. Well, we just got some benchmark results for the iPad 4, and they seem to confirm that the A6X is indeed at least twice as fast as the A5X, and in some benchmarks more than twice.

CPU first, of course. The A6X features a dual-core CPU based on Apple's custom Swift architecture, much like the iPhone 5's A6, except the A6X's CPU is clocked at 1.4GHz rather than 1.3GHz. The A6X's CPU performance is okay, able to keep up with what the Android competition currently offers, no more, no less.


CPU performance has never been Apple's focus, and in the iPad 4 it is adequate at most. At the very least, it matches (and slightly outperforms) the Galaxy S III at the same clock speed with two fewer cores, but it is still beaten soundly by Samsung's Exynos 5 Dual, Qualcomm's quad-core Snapdragon S4, and to some extent NVIDIA's Tegra 3. From a per-core performance perspective, Apple's Swift is very good, although the per-core king of the hill is now the new Exynos 5250.

Before getting to the GPU, a quick word on the A6X's memory interface. Like the A5X, the A6X has the widest memory interface ever seen in a mobile device: a quad-channel LPDDR2-1066 memory controller, bringing theoretical memory bandwidth up to an impressive 12.8 GB/s. This is, of course, necessary to drive the iPad 4's 2048 x 1536 display.

And then there's the GPU, the biggest change in the A6X. Apple has always been known for pushing mobile graphics performance forward, and it has done so again. The A6X features a stunning quad-core PowerVR SGX554MP4 GPU. The main difference between the SGX543 used in previous-generation iPads and the SGX554 is a doubling of the ALU count per core. With four cores, the 554MP4 therefore has a total of 32 ALUs, versus 16 in the 543MP4, so at the same clock speed we get double the A5X's performance. Assuming the clock speed remains unchanged, this works out to 64 GFLOPS of peak theoretical performance. Beyond that, we get the same PowerVR goodness Apple has always benefited from: a TBDR (Tile Based Deferred Renderer) with a unified shader architecture.
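That 64 GFLOPS figure can be reproduced under a few assumptions: each "ALU" is treated as a 4-wide vector unit issuing one multiply-add (2 FLOPs) per lane per cycle, and the clock is assumed to stay at 250MHz, the same as the A5X's SGX543MP4:

```python
alus = 32           # SGX554MP4: 8 ALUs per core x 4 cores
lanes = 4           # assumption: each ALU is a Vec4 unit
flops_per_mad = 2   # a multiply-add counts as 2 FLOPs
clock_ghz = 0.25    # assumption: 250MHz, unchanged from the A5X
print(alus * lanes * flops_per_mad * clock_ghz)  # 64.0 GFLOPS peak
```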




The Offscreen Egypt HD test shows that the SGX554MP4 is much more powerful than the Adreno 320; in the Onscreen test, what puts the iPad 4 behind the Adreno 320-equipped PadFone 2 is the iPad 4's immense resolution.



As we can see, the iPad 4 sits at the top of every benchmark except the Egypt HD Onscreen test, where the iPad's large resolution limits its performance; at any rate, the PadFone 2's margin over the iPad 4 in the onscreen test is almost insignificant. Once again, Apple has managed to put itself at the very top of mobile graphics performance, by a very large margin.


All of that power results in a giant die area of 123mm2, which is really very large considering it is built on Samsung's 32nm HKMG (High-K Metal Gate) process. The A6X sets a new benchmark for competitors to reach. Samsung's Exynos 5250, despite being from the same generation, has already lost to it. The warning goes mainly to NVIDIA. The graphics company's upcoming Tegra 4 'Wayne' SoC is rumored to have a quad-core Cortex-A15 CPU (potentially double the CPU performance of the dual-core Exynos 5250) and a Kepler-based GPU that allegedly has 24 cores, built on a 28nm process. A quad-core Cortex-A15 should be more than enough, and 28nm will also bring the Tegra series up to date. A Kepler GPU sounds promising too, but NVIDIA will have to work hard if it wants to beat the new king of graphics performance (and memory bandwidth), the Apple A6X.