Showing posts with label qualcomm. Show all posts

Friday, December 12, 2014

Apple A8 vs Snapdragon 805 vs Exynos 5433 - Smartphone SoC Comparison (2014 Edition)

The smartphone has become just about the most important gadget in a person's life, since it has positioned itself as a sort of does-everything device (recently, even measuring your heart rate). As such, it needs a powerful processor to keep things running smoothly, even with so many utilities and features baked in. Accordingly, smartphone processor performance has seen exponential growth over the last few years, blazing past even some older laptops at this point. In 2014, we have the latest and greatest ultra-mobile processors shipping in devices in time for the holiday season. Among the top competitors we have Apple, with its A8 processor, found in the iPhone 6 and 6 Plus; Qualcomm, with its Snapdragon 805; and Samsung, with its octa-core Exynos 5433 SoC. There's no doubt that all three processors are performance monsters, but which of them offers the best performance, and more importantly, which one is the most power efficient?

Firstly, let's see how these processors compare on paper:

Spec | Apple A8 | Snapdragon 805 | Exynos 5433
Process Node | 20nm | 28nm HPM | 20nm HKMG
CPU | Dual-core 64-bit "Enhanced Cyclone" @ 1.4GHz | Quad-core 32-bit Krait 450 @ 2.7GHz | Octa-core 64-bit big.LITTLE (Quad-core ARM Cortex-A57 @ 1.9GHz + Quad-core ARM Cortex-A53 @ 1.3GHz)
GPU | PowerVR GX6450 @ 450MHz (115.2 GFLOPS) | Adreno 420 @ 600MHz (337.5 GFLOPS) | Mali T760-MP6 @ 700MHz (204 GFLOPS)
Memory Interface | Single-channel 64-bit LPDDR3-1600 (12.8GB/s) | 64-bit Dual-channel LPDDR3-1600 (25.6GB/s) | 32-bit Dual-channel LPDDR3-1650 (13.2GB/s)
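For the curious, the memory bandwidth figures in the table can be reproduced from the interface width and the transfer rate. Below is a minimal Python sketch, assuming peak bandwidth is simply bytes per transfer times transfers per second (the function name is my own, just for illustration):

```python
def peak_bandwidth_gbs(channels: int, bits_per_channel: int, mega_transfers: float) -> float:
    """Peak theoretical bandwidth in GB/s (1 GB = 10^9 bytes)."""
    bytes_per_transfer = channels * bits_per_channel / 8
    return bytes_per_transfer * mega_transfers / 1000


# Memory configurations as listed in the spec table
a8 = peak_bandwidth_gbs(1, 64, 1600)      # single-channel 64-bit LPDDR3-1600
s805 = peak_bandwidth_gbs(2, 64, 1600)    # dual-channel 64-bit LPDDR3-1600
exynos = peak_bandwidth_gbs(2, 32, 1650)  # dual-channel 32-bit LPDDR3-1650

for name, value in (("Apple A8", a8), ("Snapdragon 805", s805), ("Exynos 5433", exynos)):
    print(f"{name}: {value:.1f} GB/s")
```

These are theoretical peaks only; sustained bandwidth in real applications will be lower.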


At least on paper, all three processors are extremely powerful and very competitive when it comes to power and efficiency. However, the three SoCs take very different approaches to achieve their performance. Apple prefers a small CPU core count, making each core very large to achieve high performance with just two cores, while Samsung's quantity-over-quality philosophy means they chose to throw in a very large number of CPU cores (eight of them, in fact). Qualcomm sits between Apple and Samsung, offering four CPU cores with decent per-core performance. In practice, given that most applications do not scale very well beyond two cores, I personally prefer Apple's approach. However, for the more limited selection of apps that can actually utilize a large core count (for instance, games that involve more complex physics calculations), Samsung's approach might be the fastest option. Either way, the most accurate way of comparing the performance of these processors is with synthetic benchmarks.
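One way to get a feel for why extra cores bring diminishing returns is Amdahl's law, which bounds the speedup by the portion of a program that can actually run in parallel. The sketch below is only an illustration: the parallel fractions are hypothetical values I picked, not measurements of any real app or of these SoCs.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup over one core, per Amdahl's law."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


# Hypothetical workloads: 50% and 90% of the work can be parallelized
for parallel_fraction in (0.5, 0.9):
    for cores in (2, 4, 8):
        speedup = amdahl_speedup(parallel_fraction, cores)
        print(f"parallel={parallel_fraction:.0%} cores={cores}: {speedup:.2f}x")
```

Under this model, a half-parallel workload gains very little when going from two cores to eight, while a highly parallel one keeps scaling, which is the trade-off between the two design philosophies.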

Let's start with the GeekBench 3 benchmark, which tests CPU performance:
As you can see, Apple's second-generation Cyclone core is just about the fastest core used in any current smartphone. Nvidia's Denver CPU core, used in the Tegra K1 SoC, outperforms the Cyclone core, but since the Tegra K1 is pretty much a tablet-only platform, I'm not considering it in this comparison. Meanwhile, the Exynos 5433, also 64-bit, is behind the A8 by a large margin but slightly ahead of the Snapdragon 805. I also included data from the Snapdragon 801 chipset to quantify the evolution of the Krait 450 core in the Snapdragon 805 compared to its predecessor, Krait 400. The difference actually isn't big, which leaves the Snapdragon 805 with the weakest single-threaded performance of all current high-end SoCs.
With four high-performance CPU cores aided by another four low-power cores (yes, Samsung managed to make both core clusters work at the same time, unlike with their previous big.LITTLE CPUs), it was obvious from the start that Samsung's processor would come out on top in applications that scale to multiple cores. In fact, the Exynos 5433's multi-threaded performance has a significant advantage over the competition. In second place comes the Snapdragon 805, with a much lower yet still very high score. Again, the multi-threaded test shows only a marginal improvement over the Snapdragon 801. And in last place comes Apple's dual-core A8, which, despite employing a very powerful core solution, simply had too few cores to outperform the competition. Still, it's not far behind the Snapdragon 805, and its score is very respectable indeed. 

Now, moving on to what probably is considered the most important area in SoC performance: graphics. To measure these processors' capability for graphics rendering, we turn to the GFXBench 3.0 test.
It's reasonable to say that the three main competitors in the high-end SoC segment are pretty much on par in terms of their GPUs' OpenGL ES 3.0 performance. However, the PowerVR GX6450 in the iPhone 6 Plus takes the lead, followed closely by the Snapdragon 805's Adreno 420, and in last place is the Mali-T760 in the Exynos 5433, but again, losing by a small margin.
For OpenGL ES 2.0 performance we see the gap widen, but the same basic trend remains: the Apple A8 takes first place, followed closely by the Snapdragon 805, with the Exynos 5433 a bit further behind. Also note how, unlike what we've seen in the CPU benchmarks, this time the Snapdragon 805 gets a huge boost compared to its predecessor, the Snapdragon 801.
The ALU test focuses on measuring the GPU's raw compute power, and on this front Qualcomm seems to be sitting very comfortably, since both the Snapdragon 805 and its predecessor, the 801, are far ahead of the Apple A8 and Exynos 5433 GPUs.

The Fill test depends mostly on the GPU's Render Output Units (ROPs) and on the SoC's memory interface. Given that the Snapdragon 805 has a massive memory interface, comparable to the one on Apple's tablet-primed A8X chip, it naturally had a huge advantage in this test. Meanwhile, the Apple A8 is slightly below the last-gen Snapdragon 801, and the Exynos 5433 comes in last place, but by a small margin.

Power Consumption and Thermal Efficiency

Since these chips are supposed to run inside smartphones, the SoC has to fulfill two requirements: consume as little power as possible, especially when idle, and not heat up too much under strain. I believe that Apple's A8 chip fares best in this department because, apart from being built on a 20nm process, its Cyclone CPU has proved to be quite efficient in previous appearances. As for Samsung's Exynos 5433, despite it being built on 20nm too, I'm not sure that a processor that can have 8 CPU cores running simultaneously can keep itself cool under strain without thermal throttling. At least in terms of power consumption, though, idle power should be very low thanks to the low-power Cortex-A53 cores. Finally, it's a bit hard to determine how power efficient Qualcomm's processors are, because the company discloses close to nothing about its CPU and GPU architectures. However, it is a proven solution. Krait + Adreno SoCs from Qualcomm can be found in almost every flagship smartphone from 2014, so while the 805 has the disadvantage of still not having moved to 20nm, past experience suggests that Qualcomm's SoCs and architectures are sufficiently efficient.

Conclusion

It's a bit hard to determine exactly which processor is the best. Each one of these fares better than the others in at least one area, but each also has its clear weakness.

The Apple A8 uses just two CPU cores, however powerful, and at relatively low clock speeds at that. It delivers top-notch single-threaded performance, but its low core count hurts it against the quad- and octa-core competition in multi-threaded applications. The PowerVR GX6450 GPU was also a good choice, as at least for general gaming it appears to be the fastest solution available in any smartphone. Power consumption should also be pretty low thanks to the 20nm process and to Apple's and ImgTech's efficient architectures.

The Snapdragon 805 is really more of an evolution of the 801, without any huge changes. For instance, it's the only 32-bit processor being compared here. However, it still manages to deliver excellent performance, building on the success of the outgoing 801. While its single-threaded performance is a bit disappointing for a 2.7GHz CPU, it does very well in multi-threaded applications, nearing the Exynos 5433's performance. The Adreno 420 GPU also performs extremely well, losing only to the Apple A8 in GFXBench's general gaming tests and absolutely destroying the competition in terms of memory bandwidth and raw compute power. While a move to 20nm would be appreciated, Qualcomm's processors are known for being power efficient, so no problem here.

Finally, Samsung's Exynos 5433 is really a mixed bag. Its 20nm HKMG process, together with the low-power Cortex-A53 cores, makes for excellent power efficiency, at least in terms of idle power, and thanks to its huge core count, its multi-threaded performance is ahead of everyone else. It should be noted that, despite the 20nm process, having 8 cores running at full load might introduce the need for thermal throttling, especially in a smartphone chassis.
However, the Mali-T760 GPU is slightly behind the competition in terms of general gaming performance, and its raw compute power is quite disappointing. Thankfully, raw compute power matters little to the vast majority of users. Still, it's an excellent GPU, just not THE best.

Overall, these are all excellent processors, each with its respective advantages and disadvantages. It all comes down to which aspects matter most to you. If you value performance in multi-threaded applications, an Exynos 5433-powered device is ideal. For an excellent all-around package, which is also a proven solution for smartphones (plus admirable GPU compute power), pick a Snapdragon 805 device. And if you don't care as much about multi-threaded performance but want the best gaming performance in any smartphone, you can pick one of Apple's A8-powered iDevices.

Sunday, November 23, 2014

Apple A8X vs Tegra K1 vs Snapdragon 805 - Tablet SoC Comparison (2014 Edition)

In the last few years, ultra-mobile System-on-Chip processors have made unprecedented strides in terms of performance and efficiency, very quickly advancing the standards for mobile performance. One form factor that particularly benefits from the exponential growth of SoC performance is the tablet, since its large screen allows the processor's abilities to be fully utilized. For the holiday season of 2014, we have the latest and greatest of mobile performance shipping inside high-end tablets. Apple has made a whole new SoC just for its iPad Air 2 tablet, which it calls the A8X. Nvidia's Tegra K1 processor, which borrows Nvidia's venerable Kepler GPU architecture, has also appeared in a number of new high-end tablets. Finally, we also have the Qualcomm Snapdragon 805 processor found in the Amazon Kindle Fire HDX 8.9" (2014). Unfortunately, most other tablets either use the aging Snapdragon 801 processor or, in the case of Samsung's latest high-end tablets, use an even older Snapdragon 800 processor or the also old Exynos 5420 processor, which debuted with the Note 3 phablet in late 2013. In any case, at the pinnacle of tablet performance, we have the Apple A8X, the Tegra K1 and the Snapdragon 805 battling for the top spot.

Spec | Apple A8X | Nvidia Tegra K1 | Snapdragon 805
Process Node | 20nm | 28nm HPM | 28nm HPM
CPU | Tri-core "Enhanced Cyclone" (64-bit) @ 1.5GHz | 32-bit: Quad-core ARM Cortex-A15 @ 2.3GHz; 64-bit: Dual-core Denver @ 2.5GHz | Quad-core Krait 450 @ 2.5GHz
GPU | PowerVR GXA6850 @ 450MHz (230 GFLOPS) | 192-core Kepler GPU @ 852MHz (327 GFLOPS) | Adreno 420 @ 600MHz (172.8 GFLOPS)
Memory Interface | 64-bit Dual-channel LPDDR3-1600 (25.6GB/s) | 64-bit Dual-channel LPDDR3-1066 (17GB/s) | 64-bit Dual-channel LPDDR3-1600 (25.6GB/s)


The CPU

It can certainly be said that all of this year's high-end mobile processors have excellent CPU performance. However, each manufacturer took a different path to reach those high performance demands, and that is what we'll be looking at in this section.

Starting with the A8X's CPU, what we have in hand is Apple's first CPU with more than two CPU cores. This time we have a Tri-core CPU, based on an updated revision of the Apple-designed Cyclone core, which utilizes the ARMv8 ISA and is therefore a 64-bit architecture. Clock speeds remain conservative with Apple's latest CPU, going no further than 1.5GHz. So with three cores at 1.5GHz, how does Apple get performance competitive with quad-core, 2GHz+ offerings from competitors? The answer lies within the Cyclone core.
The Cyclone CPU, now in its second generation, is a very wide core. As it is, it can issue up to 6 instructions per clock. Each Cyclone core also contains 4 ALUs, as opposed to 2 ALUs per core in Apple's previous CPU architecture, Swift. In addition, the reorder buffer has been increased to 192 instructions, in order to avoid memory stalls and to make fuller use of the 6 execution pipelines. In comparison, a Cortex-A15 core can co-issue up to 3 instructions per clock, half as many as Cyclone, and can hold up to 128 instructions in its reorder buffer, only two thirds of what Cyclone's reorder buffer can hold.
By building a very wide CPU architecture and keeping its CPUs to low core counts and clock speeds, Apple has, in one move, achieved excellent single-threaded performance, far beyond what a Cortex-A15 or a Krait core can produce, while at least matching the quad-core competition in multi-threaded processing. I've always said that, because most code tends to be largely sequential in nature, CPUs should be way more efficient if they are built emphasizing single-threaded performance, and Apple continues to do the right thing with Cyclone.
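To put the "wide core at a low clock" idea into numbers, here is a rough Python sketch comparing peak per-core issue rates using only the widths and clock speeds quoted above. It assumes peak throughput is just issue width times clock speed, which is a theoretical ceiling and not what real code achieves; the class and field names are my own.

```python
from dataclasses import dataclass


@dataclass
class Core:
    name: str
    issue_width: int   # instructions issued per clock, at best
    clock_ghz: float

    def peak_gips(self) -> float:
        """Theoretical ceiling, in billions of instructions per second."""
        return self.issue_width * self.clock_ghz


cyclone = Core("Enhanced Cyclone (A8X)", issue_width=6, clock_ghz=1.5)
cortex_a15 = Core("Cortex-A15 (Tegra K1)", issue_width=3, clock_ghz=2.3)

for core in (cyclone, cortex_a15):
    print(f"{core.name}: {core.peak_gips():.1f} G instructions/s peak")
```

Even with a much lower clock, the wider core comes out ahead on this ceiling, and the larger reorder buffer is what helps it get closer to that ceiling in practice.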

The Snapdragon 805 is the last high-end SoC to utilize Qualcomm's own Krait CPU architecture, which was introduced WAY back with the Snapdragon S4. Needless to say, it's still a 32-bit core. The last revision of the Krait architecture is dubbed Krait 450. While Krait 450 carries many improvements compared to the original Krait core, the basic architecture is still the same. Like the Cortex-A15, Krait is a 3-wide machine, decoding up to 3 instructions per clock. Compared to Cyclone it's a relatively small core, so it won't be as fast in terms of single-threaded performance. Krait 450's tweaked architecture allows it to run at a whopping 2.7GHz, or to be more exact, 2.65GHz. In the case of the Snapdragon 805, we have four of these Krait 450 cores. Qualcomm's signature architecture tweak, which involves putting each core on an individual voltage/frequency controller, allows each core to run at a different frequency. That reduces the power consumption of the SoC, and should translate into better battery life. With four cores at such a high frequency, the Snapdragon 805's CPU gets very good multi-threaded performance, although the relatively narrow Krait core hurts single-threaded performance very much.

Finally, we have the Tegra K1 and its two different versions. The 32-bit version of the Tegra K1 employs a quad-core Cortex-A15 CPU clocked at up to 2.3GHz, and we've seen a CPU configuration like this in so many SoCs that by this point it's a very well known quantity. The interesting story here is the 64-bit Tegra K1, which uses a dual-core configuration of Nvidia's brand new custom CPU architecture, named Denver. If you don't care much to know about Denver's architecture, you'd better skip this section, because there is A LOT to say about Nvidia's custom CPU.

Denver: The Oddest CPU in SoC history

Denver is Nvidia's first attempt at making a proprietary CPU architecture, and for a first attempt it's actually very good. Some of Nvidia's expertise as a GPU maker has translated into its CPU architecture. For instance, exactly like with Nvidia's GPU architectures, Denver works with VLIW (Very Long Instruction Word) instructions. Basically, this means that the instructions are packed into a 32-bit long "word", and only then are sent into the execution pipelines.

Denver's most peculiar characteristic might be this one: it's an in-order machine, while basically every other high-end mobile CPU has Out-of-Order Execution (OoOE) capabilities. Denver's lack of a dedicated engine that reorders instructions in order to reduce memory stalls and therefore increase the IPC (Instructions Per Clock) should be a huge performance bottleneck. However, Nvidia employs a very interesting (and in my opinion unnecessarily complicated) way of dealing with its in-order architecture.

By not having a hardware OoOE engine built into the CPU, Nvidia has to rely on software tricks to reorder instructions and enhance ILP (Instruction Level Parallelism). Denver is actually not meant to decode ARM instructions most of the time. Rather, Nvidia chose to build a decoder that would run native instructions, optimized for maximum ILP. For this optimization to occur, Nvidia has implemented a Dynamic Code Optimizer (DCO). Basically, the DCO's job is to recognize ARM instructions that are being sent to the CPU frequently, translate them into native instructions and optimize the result by reordering it to reduce memory stalls and maximize ILP. For this to work, a small part of the device's internal storage must be reserved to store the optimized instructions.

One implication of this system is that the CPU must be able to decode both native instructions and normal ARM instructions. For this purpose there are two decoders in the CPU block: one huge 7-wide decoder for native instructions generated by the DCO, and a secondary 2-wide decoder for ARM instructions. The difference in size between the two decoders shows that Nvidia expects the native instructions to be used most of the time. Of course, the first time a program is run, there are no optimized native instructions ready for the native decoder to use, so only the ARM decoder is used until the DCO starts recognizing recurring ARM instructions from the program and optimizes them. From that point onwards, those specific instructions always go through the native decoder. If a program ran the same instructions multiple times (for example, a benchmark program), eventually all of the program's instructions would have a corresponding optimized native version stored, and then only the native decoder would be utilized. That would correspond to Denver's peak performance scenario.
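The warm-up behaviour described above can be illustrated with a toy model in Python. This is only a sketch of the general idea, not Nvidia's actual mechanism: the `ToyDCO` class, the block names and the hot threshold of 3 are all hypothetical, and the real DCO's granularity and heuristics are not described here.

```python
from collections import Counter

HOT_THRESHOLD = 3  # hypothetical: runs needed before a block counts as "frequent"


class ToyDCO:
    """Toy model: code runs through the ARM decoder until it has been seen
    often enough, after which an optimized native version is used."""

    def __init__(self, hot_threshold: int = HOT_THRESHOLD):
        self.hot_threshold = hot_threshold
        self.seen = Counter()
        self.optimized = set()  # stands in for the reserved optimization cache

    def execute(self, block: str) -> str:
        """Return which decoder handles this execution of the block."""
        if block in self.optimized:
            return "native"
        self.seen[block] += 1
        if self.seen[block] >= self.hot_threshold:
            # Translate and store; later executions take the native path
            self.optimized.add(block)
        return "arm"


dco = ToyDCO()
trace = ["loop_body"] * 5 + ["cold_path"]
path = [dco.execute(block) for block in trace]
print(path)  # ['arm', 'arm', 'arm', 'native', 'native', 'arm']
```

In this model a repetitive workload such as a benchmark quickly ends up entirely on the native path, while code that runs only once never leaves the slower ARM decoder.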

While Nvidia's architecture might be a very interesting move, I ask myself if it wouldn't just be easier to build a regular Out-of-Order machine. But still, if it performs well in real life, it doesn't really matter how odd Nvidia's approach was. 

Now, going on to the execution portion of the Denver machine, we see why Denver is the widest mobile CPU in existence. That title was previously held by Cyclone, with its 6 execution pipelines. However, Nvidia went a step further and produced a 7-wide machine, capable of co-issuing up to seven instructions at once. That alone should give the Denver core excellent single-threaded performance.

The 64-bit version of the Tegra K1 employs two Denver cores clocked at up to 2.5GHz. That makes it the SoC with the lowest core count among the ones being compared here. While single-threaded performance will most certainly be great, I'm not sure that the dual-core Denver CPU can outrun its triple-core and quad-core opponents.

In order to test that, let's start our synthetic benchmark evaluation of the CPUs with Geekbench 3.0, which evaluates the CPU in terms of both single-threaded and multi-threaded performance.

CPU Benchmarks

In single-threaded applications, Nvidia's custom Denver CPU core takes first place, followed closely by Apple's enhanced Cyclone core in the A8X. Meanwhile, the older Cortex-A15 and Krait 450 CPU cores are far behind, with the 2.2GHz A15 core in the 32-bit Tegra K1 pulling slightly ahead of the 2.7GHz Krait 450 core in the Snapdragon 805.


In multi-threaded applications, where all of the CPU's cores can be used, the A8X, with its triple-core configuration, blows past the competition. The dual-core Denver version of the Tegra K1 gets about the same performance as the quad-core Cortex-A15 Tegra K1 variant, with the quad-core Krait 450 coming in last place, but by a very, very small margin.

Apple's addition of one extra core to the A8X's CPU, together with the fact that Cyclone is a very powerful core, makes it easily the fastest CPU on the market for multi-threaded applications. While Nvidia's 64-bit Denver CPU core has some impressive performance thanks to its wide core architecture, its core count works against it in the multi-threaded benchmark. It is, in fact, the only dual-core CPU being compared here. Even if it's not as fast as the A8X's CPU, Nvidia's Denver CPU is a beast. Were it in a quad-core configuration, it would absolutely blow the competition out of the water.

The GPU

Moving away from CPU benchmarks, we shall now analyze graphics performance, which is probably even more important than CPU performance, given that it is practically a requirement for high-end tablets to act as a decent gaming machine. First we'll look at OpenGL ES 3.0 performance with GFXBench 3.0's Manhattan test, followed by the T-Rex test, which tests OpenGL ES 2.0 performance, followed by some of GFXBench 3.0's low level tests.

The Manhattan test puts the Apple A8X ahead of the competition, followed closely by both Tegra K1 variants, which have about the same performance, since they have the exact same GPU and clock speed. Unfortunately, the Adreno 420 in the Snapdragon 805 is no match for the A8X and the Tegra K1, something that points out the need for Qualcomm to up its GPU game.

The T-Rex test paints a similar picture, with the A8X slightly ahead of the Tegra K1, while both of the Tegra K1 variants get about the same score, and the Snapdragon 805 falls behind the other two processors by a pretty big margin.

The Fill rate test mostly stresses the processor's memory interface and the GPU's TMUs (Texture Mapping Units). Since the Apple A8X and the Snapdragon 805 have the same dual-channel 64-bit LPDDR3 memory interface clocked at 800MHz, the performance advantage the Snapdragon 805 shows over the A8X can only be attributed to the possibility that the Adreno 420 GPU has better texturing performance than the PowerVR GXA6850 in the Apple A8X. Meanwhile, the two variants of the Tegra K1 feature the same memory interface, which is also a dual-channel 64-bit LPDDR3 interface, only with a lower 533MHz clock speed. Therefore, the Tegra K1 offers significantly less texturing performance compared to the A8X and the Snapdragon 805, but is a very worthy performer nevertheless.
The ALU test is more about testing the GPU's sheer compute power. Since Nvidia's Tegra K1 has 192 CUDA cores on its GPU, it naturally takes the top spot here, and by a pretty significant margin.
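As a sanity check on the compute figures, the Tegra K1's theoretical GFLOPS number can be derived from its core count and clock speed. The short Python sketch below assumes each CUDA core can retire one fused multiply-add (counted as 2 floating-point operations) per clock; the function name is my own.

```python
def peak_gflops(alus: int, clock_mhz: float, flops_per_alu_per_clock: int = 2) -> float:
    """Theoretical peak single-precision GFLOPS (FMA counted as 2 FLOPs)."""
    return alus * flops_per_alu_per_clock * clock_mhz / 1000


# Tegra K1: 192 CUDA cores at 852MHz, as listed in the spec table
tegra_k1 = peak_gflops(192, 852)
print(f"Tegra K1: {tegra_k1:.1f} GFLOPS")  # about 327 GFLOPS
```

Like the bandwidth figures, this is a ceiling: real workloads rarely keep every ALU busy on every clock.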

For some reason, all tests show the 32-bit Tegra K1 in the Nvidia Shield Tablet scoring a few more points than the 64-bit Tegra K1 in the Google Nexus 9. But given that the two processors have the exact same GPU, this difference in performance is probably due to software tweaks in the Shield Tablet's operating system, which would make sense, given that it is more than anything a tablet for gaming.

Thermal Efficiency and Power Consumption

In the ultra-mobile space, power consumption and thermals are the biggest limiting factors for performance. As the three processors being compared here are all performance beasts, several measures had to be taken so that they wouldn't drain a battery too fast or heat up too much.

In order to keep power consumption and die size in check, Apple has decided to shrink the manufacturing process from 28nm to 20nm, a first in the ultra-mobile processor market. That alone gives it a huge advantage over the competition, since they can put more transistors in the same die area, and with the same power consumption. Since the A8X is, in general, the fastest SoC available, the smaller process node is important to keep the iPad Air 2's battery life good. 
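To give a rough sense of what the shrink buys in terms of transistor budget, here is a tiny Python sketch. It assumes ideal scaling, where area shrinks with the square of the feature size; real processes fall short of this, so treat the result as an optimistic upper bound rather than a property of Apple's chip.

```python
def ideal_density_gain(old_nm: float, new_nm: float) -> float:
    """Idealized transistor density gain: area scales with feature size squared."""
    return (old_nm / new_nm) ** 2


gain = ideal_density_gain(28, 20)
print(f"28nm -> 20nm: up to {gain:.2f}x transistors per unit area (idealized)")
```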

Nvidia's Tegra K1 should also do well in terms of power consumption and thermal efficiency in situations where the GPU isn't pushed too hard. The 28nm HPM process it's built upon is nothing particularly good, but it's still not old for a 2014 processor. While the Kepler architecture is very power efficient, straining a 192-core GPU to its maximum is still going to produce a lot of heat in any case. The Nexus 9 tablet reportedly can get very warm on the back while the tablet is running an intensive game.

Finally, the Snapdragon 805 should be the least power hungry processor, because it is also a smartphone processor. Given that a 5" phone can carry this processor without heating up too much or draining the battery too fast, a tablet should certainly be able to do the same. To put things in perspective, if we put the Tegra K1 or the Apple A8X inside a smartphone, both would be too power hungry and would produce too much heat to make for a decent phone. In any case, the Snapdragon 805 is, like the Tegra K1, built on a 28nm HPM process. Given that it's not as much of a performance monster as the other two processors mentioned here, it must be the least power hungry of the three.

Conclusion

Objectively speaking, the comparisons made here make it pretty much clear that once again Apple takes the crown for the best SoC for this generation of high-end tablet processors. Not that the competition is bad. On the contrary, Nvidia went, in just one generation, from being almost irrelevant in the SoC market (let's face it, the Tegra 4 was not an impressive processor) to being at the heels of the current king of this market (aka Apple). The Tegra K1 is an excellent SoC, and even if it can't quite match the Apple A8X, it's still quite close to it in most aspects.

Meanwhile, Qualcomm is seeing its dominance in the tablet market start to fade. Its latest SoC, the Snapdragon 805, found even in some smartphones and phablets, is available in only one tablet, while most others carry the Snapdragon 801 or even the 800. This is disappointing, given that a tablet can put the processing power to better use than a smartphone or a phablet. Either way, the Snapdragon 805 is still a very good processor. It's just far from being the fastest. Perhaps Qualcomm should consider, like Nvidia and Apple, making a processor with extra oomph meant only for tablets, because while the Snapdragon 805 is an excellent smartphone processor, it's not as competitive in the tablet market.

Friday, February 28, 2014

Qualcomm's 2014 SoC Lineup: Snapdragon 805, 801, 615, and More


Qualcomm enjoyed a very profitable 2013, with its Snapdragon 400, 600 and 800 System-on-Chips practically dominating the mobile market. And to keep manufacturers interested in Qualcomm's offerings, a refresh of various tiers of the Snapdragon line was recently announced. Specifically, we have six new Snapdragon SoCs coming out this year.

On the top tier, we have the Snapdragon 801, which will be found inside the Samsung Galaxy S5 and the Sony Xperia Z2 and Xperia Z2 Tablet, as well as the Snapdragon 805, which is even more powerful than the 801, but will be available later this year. On the 600 tier, Qualcomm will offer the Snapdragon 602A, which isn't intended exactly for mobile devices, but rather, it's Qualcomm's offering for in-vehicle infotainment systems. Two 64-bit Snapdragon 600 variants will be available later this year too, the 610, which packs a quad-core CPU, and an octa-core variant named 615. The Snapdragon 410 is also due this year, and it'll pack a 64-bit CPU too.

So you might have noticed that, unlike with last year's Snapdragon lineup, where the designation 200, 400, 600 or 800 made it very clear where each tier stood, this year's numerical designations are a bit of a mess. It's still clear that the 805 and 801 are faster than the 610, 615 and 602A, which in turn are faster than the 410, but there might be some confusion between different variants of the same tier. For instance, 805 could be easily confused for 801, and the same goes for 610 and 615. It's just not as simple to understand as last year's lineup. Not only are the 2014 Snapdragons' nomenclatures messed up, but so are the architectural differences between the new SoCs. You might have noticed that while the 600 and 400 tiers are being upgraded to 64-bit CPU cores, the 801 and 805 are stuck with 32-bit capability, which simply does not make sense. Logically, 64-bit would come to top-tier processors first, and then make its way down to the subsequent tiers, but Qualcomm inexplicably decided to do the contrary. In any case, as it stands, the ability for 64-bit processing still isn't a very important feature in mobile devices, especially since Android OS doesn't even support 64-bit processing.

So let's analyze each new Qualcomm SoC one at a time, starting with the high end.

The Snapdragon 801 processor, found in recently announced Samsung and Sony flagship smartphones and tablets, is merely a mild upgrade over the Snapdragon 800. The biggest change in the 801 is the addition of eMMC 5 support, which will allow for faster flash storage solutions. Other than that, we still have a quad-core Krait 400 CPU, although the clock speed has been ramped up to as much as 2.5GHz. This represents an ~8.7% speed increase over the Snapdragon 800. The Snapdragon 801 also keeps the Adreno 330 GPU, but increases its clock speed from 450MHz to up to 578MHz (~28% increase in theoretical performance). The memory interface, while still dual-channel 32-bit DDR3, gets a clock speed increase to 933MHz, which results in 14.9GB/s of theoretical memory bandwidth, up from 12.8GB/s in most Snapdragon 800 variants. And that's all. Even though the architecture remains unchanged, the clock speed boosts should give the 801 a considerable performance advantage over the 800.
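The percentage gains quoted above are simple ratios, and can be checked with a few lines of Python. Note one assumption: the Snapdragon 800's 2.3GHz CPU clock is not stated in the paragraph; I'm inferring it from the ~8.7% figure.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative increase from old to new, in percent."""
    return (new - old) / old * 100


cpu_gain = percent_increase(2.3, 2.5)    # CPU clock in GHz (2.3GHz baseline is assumed)
gpu_gain = percent_increase(450, 578)    # GPU clock in MHz
mem_gain = percent_increase(12.8, 14.9)  # memory bandwidth in GB/s

print(f"CPU clock: +{cpu_gain:.1f}%")
print(f"GPU clock: +{gpu_gain:.1f}%")
print(f"Memory bandwidth: +{mem_gain:.1f}%")
```

So the 801's memory bandwidth improves by roughly 16%, sitting between the CPU and GPU clock gains.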

Then there's the Snapdragon 805, which will be available later this year and will employ a quad-core configuration of the Krait 450, a refresh of the Krait 400 CPU core. The new CPU's improved efficiency allows clock speeds to go up to 2.7GHz. Unfortunately, the Krait 450 core is still based on ARMv7; in other words, it doesn't support 64-bit processing. The GPU in the Snapdragon 805 gets a huge uplift with the new Adreno 420, which brings a DirectX11-class feature set, improves texture performance, and adds dedicated tessellation hardware, something previously seen only on PC graphics cards. The memory interface also gets a huge boost with the Snapdragon 805, moving to a 128-bit wide (quad-channel) LPDDR3-1600 interface. The added interface width results in a peak theoretical memory bandwidth of 25.6GB/s. For comparison, most mobile SoCs today top out at 14.9GB/s.
The Snapdragon 805 doesn't bring any big changes in comparison to the 800, with only mild improvements on the CPU side, but the new, more capable GPU and the impressively wide memory interface are enough to make the Snapdragon 805 an excellent processor. When it's available, I imagine it'll be quite capable to compete with the Tegra K1, the Apple A8 and whatever Samsung has to offer by then.

Moving down to the Snapdragon 600 tier, we have the new Snapdragon 602A which, as stated before, isn't meant for mobile devices, but rather for in-car infotainment systems, an area in which many companies, NVIDIA included, have suddenly become interested. Not much is known about the 602A, but we know it'll have a quad-core Krait (400?) CPU and an Adreno 320 GPU.

The Snapdragon 610 is one of the few Snapdragon processors that support 64-bit processing. It has four ARM Cortex-A53 cores (clock speed unknown) and an Adreno 405 GPU. So far we don't know anything about the GPU, although if it turns out to be a cut-down Adreno 420, we can expect the DirectX 11-compatible architecture and the dedicated tessellation hardware. For the record, the Cortex-A53 is the lower end of the two Cortex-A5x CPUs released so far, and its performance relative to current CPUs is still to be seen.
The Snapdragon 615 is essentially a 610 with four extra CPU cores inside. So it keeps the Adreno 405 GPU, but moves to an octa-core Cortex-A53 CPU. I'm a bit disappointed that Qualcomm decided to increase core count rather than use a more powerful CPU core. Most applications only scale well to a small number of threads, so not only will the last four cores or so probably end up not being efficiently utilized, but the weaker single-threaded performance of the Cortex-A53 will also hurt overall performance in applications that don't scale well to multiple threads. I would've preferred if Qualcomm had used a smaller number of the high-end Cortex-A57 cores, or even used its own Krait cores (even if that meant sacrificing 64-bit processing).

Finally, there's the Snapdragon 410 mid-to-low-end processor, which combines four 64-bit Cortex-A53 cores at a clock speed of 1.2GHz with an Adreno 306 GPU (since this GPU belongs to the 3xx series, I suspect it won't have the architectural upgrades that the Adreno 405 and 420 got). Not much to go on about here, except to point out that even the weakest Snapdragon processor got an upgrade to 64-bit processing, while the high-end processors didn't. 

While I believe that until mobile processors built on a 22nm process show up there won't be another big leap in SoC performance, I'm not very impressed with Qualcomm's 2014 lineup. The Krait 450 CPU is a very small upgrade compared to the Krait 400, and the excessive use of the Cortex-A53 CPU core in the 600 and 400 tiers is rather disappointing because of the lower-end nature of the Cortex-A53. I'd talk about Qualcomm's decision to give the lower-end processors 64-bit processing, while leaving the high-end stuck at 32-bit, but since it's likely that a) Qualcomm doesn't have a 64-bit successor to Krait yet and b) The Cortex-A57 core is still not available, and since I wouldn't like to have seen more Cortex-A53s on the 800 tier, I won't comment on it. I also might as well reiterate how Qualcomm's nomenclature for its new SoCs can be rather confusing. On the bright side though, at least the Snapdragon 805 is bound to be very competitive, if not industry leading, in terms of GPU performance and memory bandwidth.

Friday, December 27, 2013

Samsung Galaxy Note 10.1 (2014 Edition) vs Microsoft Surface 2: Tablet Comparison


The holiday season is almost upon us, and so the biggest players in the tablet market finally have their latest flagships already available. The Surface 2, from Microsoft, and the Galaxy Note 10.1 2014 Edition, by Samsung, are some of the most interesting tablet flagships this holiday season. Both of them have very high-end specs, including high-resolution displays and powerful processors, along with a (perhaps too) high price tag. But which one is worth your money the most?

Galaxy Note 10.1 (2014 Edition) Microsoft Surface 2
 Body   243 x 171 x 7.9mm, 540g (Wi-Fi)/547g (LTE)   275 x 172.5 x 8.9mm, 676g 
 Display   10.1" TFT LCD 2560 x 1600 (299ppi)  10.6" ClearType 1920 x 1080 (208ppi)
 Connectivity   Wi-Fi, GSM (2G), HSDPA (3G), LTE (4G)  Wi-Fi
 Storage  16/32 GB, 3 GB RAM  32/64 GB, 2 GB RAM
 Camera (Rear)  8 MP with LED flash, Dual-camera, dual-recording and HDR and 1080p video  5 MP with LED flash and 1080p video
 Camera (Front)  2 MP with 1080p video  3.5 MP with 1080p video
 OS  Android 4.3 Jelly Bean  Windows 8.1 RT
 Processor  Wi-Fi: Exynos 5420 (Quad-core Cortex-A15 @ 1.9GHz + Quad-core Cortex-A7 @ 1.3GHz + Mali-T628 GPU); LTE: Qualcomm Snapdragon 800 MSM8974 (Quad-core Krait @ 2.3GHz + Adreno 330 GPU)  NVIDIA Tegra 4 (Quad-core Cortex-A15 @ 1.7GHz + 72-core ULP GeForce)
 Battery  Non-removable 8,220 mAh; up to 10 hours of use  Non-removable ~8,500 mAh (31.5 Wh); up to 10 hours of use
 Accessories  S Pen stylus  Touch Cover 2 ($119); Type Cover 2 ($129); Power Cover ($199)
 Price  $549 (16 GB, Wi-Fi)  $449 (32 GB)


The two tablets, despite both being flagships, compete at different price points, which explains partially why the Surface 2's specs aren't as impressive as the Galaxy Note 10.1's.

Design

These two flagship tablets feature some very nice designs to go with the powerful hardware inside them and their high prices. In terms of materials the Surface 2 has the upper hand, because while the Galaxy Note 10.1 has a plastic back with a special texture that makes it look like leather (aka faux leather), the Surface 2's internals are protected by a durable magnesium alloy that Microsoft calls VaporMg. That gives the Surface 2 a premium look over the Note 10.1, and it's more durable too. The faux leather on the Note 10.1 might be appealing to some, but to others, including me, the leather imitation is not attractive.

Despite its high-quality materials, the Surface 2 just falls short of its competitors when it comes to its size and weight. It's inevitably larger, since the screen is 1/2" larger than its Android rivals and is 0.9" larger than the iPad's screen, so we can't blame Microsoft for that. Not only that, but it's also thicker than most recent flagship tablets (the Galaxy Note 10.1 measures 7.9mm thick, and the iPad Air is 7.5mm thick) and much heavier, weighing 676g versus the Galaxy Note 10.1's 540g and the iPad Air's 469g. Of course some of that extra weight comes from the larger dimensions due to the larger screen, but that doesn't excuse the Surface 2 for being that heavy. It's still quite comfortable to hold, but the Galaxy Note 10.1 will definitely tire your arms less when holding the tablet for an extended period of time. The Surface 2 also has considerably larger bezels than its rivals. Considering the size of the screen, however, the bezel size is quite appreciable.

Of course, the Surface 2's built-in kickstand distinguishes it from all of its competitors. The new two-stage kickstand is very useful, and is something you'd only be able to achieve on other tablets with covers like the iPad's Smart Cover. Considering the Surface 2's weight, you might find yourself using the kickstand more than you imagine. Another unique feature of the Surface 2 is its keyboard covers, which attach to the amazingly strong magnetic connector on the bottom side of the tablet and can also double as a cover for the screen. There are three keyboard cover options. The Touch Cover 2, which sells for $119, features capacitive keys, which are now backlit. The Type Cover 2, which sells for $129, is thicker than the Touch Cover 2 but has physical keys, which are also backlit. Finally, there's the Power Cover, which will be available in early 2014 for $199 and is basically a Type Cover 2 with a built-in battery. Along with Microsoft Office RT 2013, the keyboard covers make the Surface 2 just about the most productive ARM tablet on the planet.

The Galaxy Note 10.1 also has some productivity-oriented tricks up its sleeve with its S Pen digitizer, which comes included with the tablet and offers precise pen input for taking notes and other related tasks. 

Display

Of these two tablets, it's the Galaxy Note 10.1 2014 Edition that has the better display. While the display is smaller, measuring 10.1" diagonally, it packs many more pixels than the Surface 2, with a stunning 2560 x 1600 resolution and a top-notch 299ppi pixel density. Samsung's display also has excellent viewing angles and reproduces colors vibrantly and accurately.

The Surface 2 packs a slightly larger 10.6" screen with a resolution of 1920 x 1080, resulting in a pixel density of 208ppi. Microsoft uses its so-called ClearType technology in the Surface 2, which means that the touch panel and the glass are laminated to the display, reducing reflections and thus making the tablet's screen more comfortable to use in direct light or outdoors. The display also has wide viewing angles, and like the Note 10.1 also reproduces accurate and saturated colors, although the Note 10.1 is still slightly more vivid. The difference in the two displays' pixel densities is quite easily noticeable when viewing text. The Note 10.1 is just completely devoid of any pixellation, while the Surface 2, while still very crisp, does show some pixellation in text if you look closely. 
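The pixel densities quoted above are easy to verify: divide the diagonal resolution in pixels by the diagonal size in inches. A minimal sketch in Python (the function name is my own):

```python
import math

def pixels_per_inch(width_px: int, height_px: int, diagonal_in: float) -> float:
    """Pixel density from the resolution and the diagonal screen size."""
    diagonal_px = math.hypot(width_px, height_px)
    return diagonal_px / diagonal_in

note_ppi = pixels_per_inch(2560, 1600, 10.1)     # Galaxy Note 10.1 (2014 Edition)
surface_ppi = pixels_per_inch(1920, 1080, 10.6)  # Surface 2

print(f"Galaxy Note 10.1: {note_ppi:.0f}ppi")
print(f"Surface 2: {surface_ppi:.0f}ppi")
```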

The two displays have slightly different aspect ratios. While the Note 10.1 is 16:10, the Surface 2 is even wider with a 16:9 aspect ratio. So while the Note 10.1 is noticeably less wide, both screens share the same benefits and problems. For instance, they're excellent for watching videos, but while the Note 10.1 would show a very small amount of letterboxing, the Surface 2 should be devoid of any letterboxing. Both are also quite awkward to use in portrait mode, but the Note 10.1 is arguably a bit less awkward to use in portrait. In any case, both displays are excellent, but the Note 10.1 certainly outclasses the Surface 2 in every way, even if by a little.

Processor

As flagship tablets, both of them are equipped with the latest and greatest silicon. The Surface 2 has a Tegra 4 processor, while the Galaxy Note 10.1 goes with a Snapdragon 800 beast for the LTE variant or Samsung's own Exynos 5420 processor for the Wi-Fi only version.

The Tegra 4 is NVIDIA's latest system-on-chip, and utilizes the 4-PLUS-1 architecture originally introduced in the Tegra 3. What that means is that there is one main CPU cluster, which is composed of four Cortex-A15 cores clocked at up to 1.9GHz with one core active (and 1.7GHz with more than one core active), and one additional shadow A15 core targeted for low frequency (up to ~825MHz) and low power consumption. When the CPU workload is very light, for example, when the device is idling, all processing transfers to the shadow core and the quad-core A15 cluster is power-gated, so that the shadow core can process these light tasks while consuming very little power, enhancing battery life. Performance-wise, the Cortex-A15 is one of the best performing mobile CPUs in existence, so CPU performance on the Surface 2 should be on par with the industry's greatest.

The GPU in the Tegra 4 is a bit more disappointing. The shader architecture is the only one in the current mobile industry that is discrete rather than unified, and mind you, it's the architecture that most closely resembles the GeForce 6000 series (which is very old indeed). Basically this means that, instead of each shader in the GPU being able to process pixel or vertex instructions based on the workload, there are separate pixel and vertex shader units. With a total of 72 shader cores (48 pixel, 24 vertex) at a pretty high clock speed of 672MHz, the Tegra 4 actually packs a lot of processing power, despite its old architecture. As benchmarks will show, Tegra 4's GPU performance is somewhat behind the Snapdragon 800 and the Exynos 5420 processors used in the Galaxy Note 10.1, but considering that the Surface 2's GPU needs to push roughly half the number of pixels compared to the Note 10.1, they're actually well-balanced performance-wise.
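To put a number on how many pixels each GPU has to push, here is a quick pixel-count comparison in Python, using the resolutions from the spec table. The variable names are just for illustration.

```python
note_pixels = 2560 * 1600      # Galaxy Note 10.1 (2014 Edition)
surface_pixels = 1920 * 1080   # Surface 2

difference = note_pixels - surface_pixels
ratio = surface_pixels / note_pixels

print(f"Galaxy Note 10.1: {note_pixels:,} pixels")
print(f"Surface 2: {surface_pixels:,} pixels")
print(f"Difference: {difference:,} pixels")
print(f"Surface 2 pushes {ratio:.0%} of the Note 10.1's pixels")
```

So the Surface 2 renders a little over half as many pixels per frame, which is what makes the onscreen benchmark results further down so different from the offscreen ones.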

The Galaxy Note 10.1's Wi-Fi version is packed with an Exynos 5420 processor, more commonly known as Exynos 5 Octa. This processor uses ARM's big.LITTLE CPU architecture which, similarly to NVIDIA's 4-PLUS-1, has one high-performance CPU cluster and a second power-saving CPU cluster. The main cluster is very similar to the Tegra 4's, containing four Cortex-A15 cores with a clock speed of 1.9GHz. Unlike the Tegra 4 though, which uses only one core in the power-saving cluster, Samsung went for overkill and crammed in four low-power Cortex-A7 cores running at up to 1.3GHz. I'm not sure whether the quad-core A7 @ 1.3GHz is more or less efficient in saving power than a single A15 @ 825MHz, but both solutions should have a similar effect on power consumption. In benchmarks, however, the Exynos 5420 is certainly very close to the Tegra 4, since their high-performing CPU clusters are practically identical.

On the GPU side, the Exynos 5420 packs ARM's Mali-T628 GPU, which benchmarks prove to be a very powerful GPU and adequate for the Galaxy Note 10.1's high-resolution duties. Unlike the Tegra 4, the Mali-T628 is as modern as mobile GPUs go, as the shader architecture is unified and the GPU boasts full support of OpenGL ES 3.0.

The LTE Galaxy Note 10.1 is equipped with the industry-leading Snapdragon 800 processor. The CPU in the Snapdragon 800 is a quad-core configuration of Qualcomm's own Krait 400 CPU core, running at a max clock speed of 2.3GHz. The CPU is power-efficient enough that an extra low-power CPU cluster isn't necessary here. In fact, one interesting ability of the Krait 400 core is that each core can run at a different clock speed depending on the workload put on each core, unlike the Cortex-A15, which has the same clock speed on all active cores. What's the advantage of that? For example, if the current workload requires two active cores, using one at full power but only processing light tasks on the second core, a Cortex-A15 CPU would put both cores on their highest clock speed, say, 1.9GHz, even though the second core is processing a light task and doesn't need the full 1.9GHz. A Krait CPU, with the same workload, would put the first core on full power, in this case, 2.3GHz, and the second core at a lower clock speed adequate for its current task, say, 1.0GHz. This unique feature really helps increase power efficiency, and renders extra low-power CPU cores unnecessary. In terms of performance, its high clock speed and its strong core architecture make the Snapdragon 800 one of the fastest CPUs around, if not the fastest.
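The two-core example above can be written down as a toy model. To be clear, this is only an illustration of the idea, not how either chip's governor actually works: the function names and the "demand in GHz" inputs are hypothetical, and real clock selection also involves voltage levels, thermal limits and discrete frequency steps.

```python
def synchronous_clocks(demands_ghz, max_ghz):
    """Cortex-A15 style (as described above): all active cores share one clock,
    so every core runs at the speed the busiest core needs."""
    shared = min(max(demands_ghz), max_ghz)
    return [shared for _ in demands_ghz]

def asynchronous_clocks(demands_ghz, max_ghz):
    """Krait style (as described above): each core gets its own clock,
    capped at the CPU's maximum."""
    return [min(demand, max_ghz) for demand in demands_ghz]

# One heavy task plus one light task, as in the example in the text.
a15 = synchronous_clocks([1.9, 1.0], max_ghz=1.9)
krait = asynchronous_clocks([2.3, 1.0], max_ghz=2.3)

print("Cortex-A15 clocks (GHz):", a15)
print("Krait 400 clocks (GHz):", krait)
```

In this simplified picture, the second Krait core sits at 1.0GHz while the second A15 core is dragged up to 1.9GHz, which is where the power savings would come from.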

The GPU in the Snapdragon 800 is the company's own Adreno 330. Since Qualcomm never discloses information about its GPU architectures, I'm left with very little to say about it, however, we do know that, like the Mali-T628, it has a modern architecture, with a unified shader architecture and full OpenGL ES 3.0 support. In general, the Adreno 330 does perform a bit better than the Mali-T628, but its performance is still pretty close to the Mali-T628. 

Now, with all that technical babble about architectures out of the way, let's get to actually testing these processors' performance in benchmarks, starting with Geekbench 3, which measures CPU and memory performance.

Note: The Galaxy Note 10.1's LTE edition isn't commercially available, so I had to take Snapdragon 800 benchmark results from the Note III, which runs the same software as the Note 10.1 and should therefore have almost identical results to the actual S800-powered Note 10.1. However, the difference in resolution (1080p vs 1600p) between the Note III and the Note 10.1 means I can't include onscreen GPU benchmark results for the Snapdragon 800.

Note (2): Unfortunately Geekbench 3 isn't available for Windows RT, so I can't include results for the Surface 2 itself. Instead, I took the performance results from the Android tablet whose processor most closely resembles the Surface 2's, the ASUS Transformer Pad TF701T, which is powered by a slightly higher-clocked Tegra 4.


The main current SoC flagships all have surprisingly similar CPU performance. The Tegra 4, Snapdragon 800 and Exynos 5420 offer very similar single-threaded performance, despite the architectural differences between the Krait 400 core and the Cortex-A15. The only clearly distinguished competitor here is the 64-bit Apple A7, but this is out of the scope of this comparison. When it comes to heavily-threaded tasks the Tegra 4 and the Exynos 5420 are practically identical (seeing as their CPUs ARE identical), along with the Apple A7, while the Snapdragon 800 takes the lead, though not by a big margin.

Evidently, the Galaxy Note 10.1's and the Surface 2's respective processors perform very similarly when it comes to CPU performance, so now let's check out some GPU performance scores from the popular cross-platform benchmark, GFXBench. The following two tests are offscreen tests. That means that the GPU renders at a fixed 1080p resolution rather than the display's native one, so that differences between the devices' resolutions don't impact their performance.
In this test the Apple A7 takes the lead, followed closely by the Snapdragon 800. The Exynos 5420-toting Note 10.1 falls a bit behind, and the Tegra 4 in the Surface 2 receives a mediocre score compared to its competitors. Thankfully, the Surface 2 has fewer pixels to push than the Note 10.1, which will give it a performance advantage in the onscreen tests.

In this test the Snapdragon 800 and the Exynos 5420 take the lead, with the A7 hot on their heels and the Tegra 4, again, yielding a rather mediocre score.

Since the Note 10.1 has to push about a couple million pixels more than the Surface 2, it was the Tegra 4 that was faster in the Onscreen T-Rex HD test, but not by as huge a margin as two million fewer pixels would otherwise imply. In fact, I expect the Snapdragon 800-powered Note 10.1 will be able to match or even outperform the Surface 2 in this test, despite having twice the pixel count.

For the Egypt HD Onscreen test, not even the Surface 2's lower resolution helped it outperform the Note 10.1. Even with double the pixel count, the Note 10.1 managed a much higher score than the Surface 2. In comparing the Surface 2's GPU performance to other Tegra 4 devices, I realized that the Surface 2 is in fact one of the worst performing Tegra 4 devices. That is probably because Microsoft chose one of the slower Tegra 4 SKUs for the Surface 2, and the CPU clock speed reduction from 1.9GHz to 1.7GHz might have come along with a GPU clock speed reduction from the full 672MHz, perhaps to 600MHz like on the 1.8GHz Tegra 4 in the Tegra Note tablet. At any rate, it's clear that the GPU performance in the Surface 2 isn't as good as the Galaxy Note 10.1's.

Power Consumption

Of the two devices being compared, it's probably the Galaxy Note 10.1 that draws the most power. The first and most obvious cause is that it has a much higher-resolution screen to power. Also, while the SoC should be very power efficient due to its 28nm process (the Exynos 5420 is built on a 28nm High-K Metal Gate process, while the Snapdragon 800 uses a 28nm HPM process), the heavy duties that the GPU will be in charge of due to the high pixel count mean it could become quite a power hog, especially when playing games. But the battery is very decently sized, which leads to Samsung's claim of 10 hours of usage.

The Surface 2 has, obviously, far fewer pixels to power, so the power consumption of the display is significantly lower. The Tegra 4 processor is also built on a 28nm process, so it won't get too hot or consume too much power when processing heavy tasks, like gaming. I can't compare the Surface 2's and the Galaxy Note 10.1's battery sizes directly, since Microsoft gives the battery size in watt-hours and Samsung uses mAh, so I can only compare them by estimating the Surface 2's capacity in mAh. Assuming that it's a 3.7V battery, the Surface 2's 31.5 Wh works out to around 8,500 mAh, which is actually slightly larger than Samsung's 8,220 mAh battery. Then again, the battery voltage in the Surface 2 could easily not be 3.7V, which would lead to a different value altogether, but the safest bet (and that in itself isn't very safe) is 3.7V. Well, the extra thickness of the Surface 2 had to offer some advantage aside from the kickstand. Anyway, despite the lower display resolution and the (maybe) larger battery, Microsoft claims the same 10 hours of usage for the Surface 2, but in practice I'd expect the Surface 2 to outlast the Galaxy Note 10.1, even if only by a little.
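The watt-hour to mAh estimate is a one-line conversion (capacity in mAh = energy in Wh / voltage × 1000), sketched below in Python. The 3.7V figure is my assumption, as stated above, and the 3.8V line is only there to show how sensitive the result is to that guess.

```python
def wh_to_mah(energy_wh: float, nominal_voltage: float) -> float:
    """Convert battery energy (Wh) to charge capacity (mAh) at a given voltage."""
    return energy_wh / nominal_voltage * 1000.0

surface_wh = 31.5  # Microsoft's figure for the Surface 2

mah_at_3v7 = wh_to_mah(surface_wh, 3.7)  # assumed nominal voltage
mah_at_3v8 = wh_to_mah(surface_wh, 3.8)  # alternative guess, for comparison

print(f"At 3.7V: {mah_at_3v7:.0f} mAh")
print(f"At 3.8V: {mah_at_3v8:.0f} mAh")
```

A 0.1V difference in the assumed voltage shifts the estimate by a couple hundred mAh, which is about the size of the gap to the Note 10.1's 8,220 mAh, so the comparison really is only approximate.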

Pricing and Conclusion

Of course, the Surface 2's slightly weaker specs compared to the Galaxy Note 10.1 are justified by their different prices. The entry-level 32 GB Surface 2 sells for $449, while the 16 GB Wi-Fi only Galaxy Note 10.1 has a hefty $549 price tag. That's a $100 difference for roughly the same user-available storage capacity (as the Surface 2's OS leaves it with only about 17.5 GB of free disk space). The 64 GB Surface 2 (which has about 47 GB of free space initially) matches the 16 GB Note 10.1 at $549, while the 32 GB Note 10.1 will cost you $599. So clearly these tablets are competing at different price points.

The Surface 2 is hands down the best tablet for productivity. Its handy two-stage kickstand and the new backlit Touch and Type covers, which act as much as keyboards as screen covers, as well as the Windows 8.1 RT operating system and the inclusion of Microsoft Office 2013 Home and Student, make it by far the most productive tablet available. However, while the relatively new Windows Store has many important apps available, it is still missing some key apps, like Instagram, and has nowhere near the number of apps that the Google Play Store and the Apple App Store offer.

The Galaxy Note 10.1 2014 Edition is the polar opposite of the Surface 2. With a higher-resolution display and a very useful S-Pen stylus, as well as the vast app and media ecosystem offered by the Android OS and its associated app store, the Note 10.1 is up there with the iPad Air as one of the best tablets available for entertainment.

So it really comes down to whether you want a tablet that is geared towards productivity, i.e. a laptop replacement, or a tablet that offers the best for entertainment purposes. The difference in pricing is also a factor, as many will probably find the Galaxy Note 10.1 too expensive.

Saturday, November 16, 2013

Apple A7 vs NVIDIA Tegra 4 vs Snapdragon 800: SoC Wars


Mobile SoC performance has become one of the most competitive aspects in the mobile sector. Since 2010, when the iPad made it clear how important processing power is for mobile devices, performance in mobile devices has had exponential growth, and SoC vendors began to compete more and more. In 2013, the main SoC manufacturers can be narrowed down to Qualcomm, Apple, NVIDIA, and to a lesser extent, Samsung. TI used to be a big player in the SoC market, but this year it practically disappeared from the SoC sector. Now that these companies have their latest silicon shipping in commercially available products, in time for the holiday season, it's time to put their best offerings to the test and see who has the best offering.

Apple A7 NVIDIA Tegra 4 Snapdragon 800
 Process Node   28nm HKMG   28nm HPL  28nm HPM
 Die Size  102mm²  ~80mm²  118.3mm²
 Instruction Set   ARMv8 (64-bit)   ARMv7 (32-bit)   ARMv7 (32-bit)
 CPU  Dual-core Cyclone @ 1.3/1.4GHz   Quad-core Cortex-A15 @ 1.9GHz + Low Power Cortex-A15 @ 825MHz  Quad-core Krait 400 @ 2.3GHz
 GPU  PowerVR G6430 @ 450MHz  72-core ULP GeForce @ 672MHz  Adreno 330 @ max 550MHz
 RAM  32-bit Dual-channel LPDDR3-1600 (12.8GB/s)  32-bit Dual-channel LPDDR3/DDR3L-1866 (14.9GB/s)  32-bit Dual-channel LPDDR3-1866 (14.9GB/s)



The CPU: Dual-core vs Quad-core

Apple's most impressive feat in the mobile performance sector so far is that, in an age of quad-cores with insane clock speeds, Apple has never shipped a device with more than two CPU cores, always at a relatively low clock speed, and has still managed to at least keep up with the latest competition. Let's see how Apple's latest CPU, the dual-core Cyclone with a max clock speed of 1.4GHz, stacks up against NVIDIA's latest offering, the Tegra 4's four Cortex-A15s @ 1.9GHz, and the Snapdragon 800's four Krait 400 cores @ 2.3GHz.

Architecturally speaking, Apple's CPU is far superior to the Cortex-A15 and the Krait 400. That's because the A7 CPU runs on a brand new 64-bit ARMv8 architecture. The move to 64-bit ARMv8 gives the Cyclone CPU a larger address space and more general-purpose registers to work with, giving it a tangible performance gain in some cases over traditional 32-bit solutions. Not only that, but Apple has made the Cyclone core much wider than its predecessor, the Swift core. In fact, I think it's the widest mobile CPU so far. The wider architecture plus 64-bit give the Cyclone cores much better single-threaded performance than any of their competitors, and remember that in most use cases single-threaded performance is the most important. Kudos to Apple for competing against monstrous quad-cores with only a dual-core.

The NVIDIA Tegra 4's CPU uses NVIDIA's Variable Symmetric Multi-Processing architecture, which was introduced with the Tegra 3. Like ARM's big.LITTLE architecture, the Tegra 4 consists of a main CPU cluster, composed of four high-performance Cortex-A15 cores running at a max 1.9GHz, and a shadow A15 core that can go up to 825MHz. When CPU demand is low, the quad-core A15 cluster is power-gated, and all processing transfers to the shadow A15 core, and it remains like this as long as demand from the CPU is low enough. The advantage of this is, of course, reduced power consumption.

Qualcomm's Snapdragon 800 uses Qualcomm's own custom-designed ARMv7 core, dubbed Krait 400, which competes in the same class as the Cortex-A15. Since Qualcomm likes to keep its mouth shut about its CPU architectures, not much is known about the Krait 400. What we know is that the Krait 400 is mostly the Krait 300 core on a 28nm HPm process. However, the move from 28nm LP in the Krait 300 to 28nm HPm in the Krait 400 means that there's been some relayout in the Krait 400. Other differences from the Krait 300 include lower memory latency. Apart from that, we only know that, like the Cortex-A15, the Krait 400 is a 3-wide machine with OoO (Out-of-Order) processing capabilities. The move to HPm means the Krait 400 can achieve higher clocks than its predecessor, which accounts for the insane 2.3GHz max clock speed. Put four of those monster cores together and you potentially have the most powerful mobile CPU to date. Unfortunately, it still lags behind the Apple A7 in single-threaded performance, which is also very important in mobile OSes.

Now let's put in some quantitative information to see how these CPUs compare in their actual performance: 

What I said before about single-threaded performance shows here. Apple's Cyclone cores can deliver at least 50% more performance on a single core than any of its competitors. But due to the fact that the A7 has only two cores while all of its main competitors have four of them, in multi-threaded situations the A7 loses its advantage, but can still keep up with all of its competitors. It's very impressive how Apple always manages to match quad-core performance with only two cores. 

The GPU and Memory

Apple has always put more emphasis on the GPU than on the CPU in its SoCs, and the A7 is no different. Apple continues to license GPUs from Imagination Technologies, as it has been doing since its first iPhone. This time around, Apple is using a PowerVR "Rogue" series GPU, which is based on ImgTech's latest technology and, of course, supports OpenGL ES 3.0. The exact model of the new PowerVR GPU in the A7 is the G6430 variant, which contains four GPU modules with 32 unified shader units on each module. That equates to a total of 128 shader units at a clock speed of 450MHz.

Ironically, the NVIDIA Tegra 4's GPU is the least fancy of the current high-end mobile GPUs. Designed by NVIDIA, the GPU in the Tegra 4 is based on the ancient NV40 architecture (the same used in the GeForce 6000 series); hence, it's the only modern GPU that uses discrete pixel and vertex shaders. In this case, there are a total of 72 shader units, 48 of which are pixel shaders and the remaining 24 are vertex shaders. The GPU runs at a max clock speed of 672MHz. The biggest limitation of the Tegra 4's GeForce GPU is that it only supports OpenGL ES 2.0. Right now, this isn't really a problem, as game developers haven't yet migrated to OpenGL ES 3.0 for their games, but it practically destroys the future-proofing of the Tegra 4.

Finally, we have the Snapdragon 800 with its Adreno 330 GPU. Like I said before, Qualcomm likes to reveal as little information as possible about its SoCs, and the Adreno line of GPUs is probably the biggest mystery I'm faced with now. All I can say is that it's a unified shader architecture compatible with the latest OpenGL ES 3.0 API. The Adreno 330, in its highest configuration, runs at 550MHz, but the vast majority of Snapdragon 800 devices have their GPUs clocked at 450MHz. By the way, the benchmark results I'll show later on reflect the Adreno 330's performance at 450MHz, since no devices have been released yet with the 550MHz bin of the Adreno 330.

 Snapdragon 800  Apple A7  NVIDIA Tegra 4  NVIDIA Tegra 4i
 GPU Name  Adreno 330  PowerVR G6430  72-core GeForce  60-core GeForce
 Shader Cores  ?  4  4 Pixel; 6 Vertex  2 Pixel; 3 Vertex
 ALUs/Core  ?  32  12 Pixel; 4 Vertex  24 Pixel; 4 Vertex
 Total ALUs  ?  128  72 (48 Pixel; 24 Vertex)  60 (48 Pixel; 12 Vertex)
 Max Clock Speed  550MHz  450MHz  672MHz  660MHz
 Peak GFLOPS  ?  115.2  96.8  79.2


Peak theoretical compute power puts the Tegra 4 behind the A7, but the Tegra 4 is still close enough to the A7 to call it competitive. However, be aware that, while the A7's unified shader architecture allows it to have its peak 115.2 GFLOPS performance available to it in any situation (the same applies to the Adreno 330), the story is quite different with the Tegra 4. The discrete pixel shader architecture means that the GPU's peak 96.8 GFLOPS can only be achieved when the mix of pixel and vertex shader requests matches the ratio between pixel and vertex shader hardware (2:1), so most of the time the GPU achieves less than 96.8 GFLOPS.
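The peak GFLOPS figures in the table can be reproduced from the ALU counts and clock speeds, under the assumption that each ALU does one multiply-add (2 floating-point operations) per clock. That assumption matches the table's numbers, but it is mine, not something the vendors spell out here. The last part is a deliberately simplified look at the Tegra 4's discrete shaders: it assumes a workload that only keeps the pixel shader ALUs busy.

```python
FLOPS_PER_ALU_PER_CLOCK = 2  # assumption: one multiply-add per ALU per clock

def peak_gflops(total_alus: int, clock_mhz: float) -> float:
    """Peak theoretical GFLOPS from ALU count and clock speed."""
    return total_alus * FLOPS_PER_ALU_PER_CLOCK * clock_mhz / 1000.0

a7 = peak_gflops(128, 450)       # PowerVR G6430
tegra4 = peak_gflops(72, 672)    # 48 pixel + 24 vertex ALUs
tegra4i = peak_gflops(60, 660)   # 48 pixel + 12 vertex ALUs

# Simplified worst case for discrete shaders: a pixel-only workload
# leaves the Tegra 4's 24 vertex ALUs idle.
tegra4_pixel_only = peak_gflops(48, 672)

print(f"Apple A7: {a7:.1f} GFLOPS")
print(f"Tegra 4: {tegra4:.1f} GFLOPS")
print(f"Tegra 4i: {tegra4i:.1f} GFLOPS")
print(f"Tegra 4, pixel shaders only: {tegra4_pixel_only:.1f} GFLOPS")
```

Real workloads fall somewhere between the pixel-only figure and the full peak, depending on how closely they match the 2:1 pixel-to-vertex hardware ratio.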

There may not be a huge gap in theoretical compute between the A7's and Tegra 4's GPU, but the architectural difference is astounding. You can hardly put a unified shader architecture that supports OpenGL ES 3.0 in the same league as a discrete pixel and vertex shader architecture that is limited to OpenGL ES 2.0. While these differences may not affect real-world performance, the omission of OpenGL ES 3.0 is bad for future-proofing. 

Interestingly, every current high-end SoC uses pretty much the same memory interface. The Tegra 4, Apple A7 and Snapdragon 800 all have a dual-channel 32-bit LPDDR3 or DDR3L solution, except that the Tegra 4 and the Snapdragon 800 allow for a slightly higher clock speed (933MHz) versus the A7 (800MHz), giving the A7 12.8 GB/s peak theoretical memory bandwidth, versus 14.9 GB/s on the Tegra 4 and Snapdragon 800. While the A7 has technically less theoretical memory bandwidth than its competitors, it counteracts this with a very interesting solution. It turns out the A7 has 4 MB of SRAM on-die, acting as an L3 cache, which can serve some requests without going out to main memory and hence increase the effective bandwidth. You may recall that a similar solution is used in the Xbox One's SoC to increase memory bandwidth.

Considering the 4MB SRAM on the A7's die, it may turn out that the A7 can deliver significantly more memory bandwidth than the Tegra 4, but still, both have enough memory bandwidth to power ultra high-resolution (>1080p) tablets comfortably. 

The T-Rex HD test shows the Tegra 4 significantly behind the Apple A7 and also puts it as the slowest of the high-end mobile GPUs. The Apple A7, however, is only beaten by the Snapdragon 800, and only by a very small margin.

The less intensive Egypt HD test also shows the Tegra 4 behind the A7 and other high-end mobile SoCs, but by a smaller margin. The A7 is the second slowest of these SoCs in this test, achieving slightly lower scores than the Mali-T628 in the Exynos 5420 and the Adreno 330 in the Snapdragon 800. Both tests show the Snapdragon 800 as the supreme mobile GPU.
ImgTech GPUs have always had industry-leading fill rate capabilities, and it shows in the A7. The PowerVR G6430 GPU has a much higher fill rate than any of its competitors. On the other end of the spectrum, we have the Tegra 4. Tegra GPUs have a tendency to be substandard in terms of fill rate, and it shows. The Tegra 4 manages a significantly lower fill rate score than every one of its competitors, especially the Apple A7. That's a problem, because the Tegra 4 is currently used to power some of the few tablets which boast 1600p displays, for example, the ASUS Transformer Pad TF701T. On devices with 1080p screens or lower, however, even the Tegra 4 probably won't run into any bottlenecking due to the limited fill rate. The Snapdragon 800 doesn't do very well either, as it's outperformed by the Mali-T628 in the Exynos 5420.
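To get a feel for why a 1600p panel is so much more demanding than a 1080p one, here is a back-of-the-envelope sketch of the fill rate a display needs. The 60 fps target and the overdraw factor of 2.5 are purely illustrative assumptions of mine, not measured values, and the function name is hypothetical.

```python
def required_fill_rate_mpix(width, height, fps=60, overdraw=2.5):
    """Rough fill rate demand in megapixels per second.

    fps and overdraw are illustrative assumptions: real games vary a lot
    in how many times each screen pixel gets drawn per frame.
    """
    return width * height * fps * overdraw / 1e6


fhd = required_fill_rate_mpix(1920, 1080)     # 1080p panel
wqxga = required_fill_rate_mpix(2560, 1600)   # 1600p panel, as on the TF701T

print(f"1080p: {fhd:.1f} Mpix/s")
print(f"1600p: {wqxga:.1f} Mpix/s")
print(f"1600p needs about {wqxga / fhd:.2f}x the fill rate of 1080p")
```

Whatever overdraw you assume, the ratio between the two resolutions stays the same: a 1600p display has nearly twice as many pixels to fill every frame.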



Here, the Tegra 4 and the Apple A7 are in the lead, with the Apple A7 pulling ahead slightly.



Adding per-vertex lighting for some reason causes the Apple A7 to lag behind all of its competitors, leaving the Tegra 4 in the lead.


When using per-pixel lighting, the A7 once again falls behind everyone else, and this time the Tegra 4 also joins it with the second lowest score.

Even though in some cases the Apple A7 lags behind its competition severely, I highly doubt this is going to make performance suffer in any way, since most mobile games aren't very geometry bound. 

The Snapdragon 800, while not at the top spot in most of these tests, shows strong scores across the board, outperforming the whole competition by a significant margin in the fragment lit test. 

Power Consumption

All of the current high-end SoCs should have low enough power consumption, since they all use 28nm silicon. On the CPU side, the A7 enjoys a low core count as well as a low clock speed, so I don't expect the CPU to draw too much power. The Tegra 4, on the other hand, has four power-hungry Cortex-A15 cores with a much higher clock speed. However, the shadow A15 core has the potential to counteract the extra power consumed when the main A15 cluster is active. The S800 doesn't have any extra low power cores, and relies on the efficiency of the main Krait 400 cores to yield good battery life. But given Qualcomm's record of making CPUs with low idle power, this is definitely not a problem.

One optimization that Qualcomm makes to reduce power consumption is letting each active core run at its own clock speed. The competitors' architectures only allow every active core to run at the same clock speed, even if unnecessary. So, for example, if two cores are active, one of them fully loaded and the other running a much lighter task, the Krait 400 can run the first core at its max clock speed and the second at a much lower clock, while competing CPUs will run both cores at the max clock speed, even if the second core doesn't really need it. This is one of the many optimizations that make the Krait 400 core very power efficient.
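The two-core example above can be sketched with a toy power model. This is only an illustration under heavy assumptions: I'm treating dynamic power as proportional to frequency cubed (on the idea that voltage scales roughly with frequency, and power goes with frequency times voltage squared), ignoring leakage entirely, and the 30% figure for the light task is made up.

```python
def relative_dynamic_power(freq_fractions, exponent=3.0):
    """Sum of per-core dynamic power, relative to one core at max clock.

    Toy model (an assumption, not vendor data): power ~ f ** exponent,
    with each frequency given as a fraction of the max clock.
    Static/leakage power is ignored.
    """
    return sum(f ** exponent for f in freq_fractions)


# Hypothetical workload: one core fully loaded, one needing ~30% of max clock.
demand = [1.0, 0.3]

# Per-core clocks: each core runs only as fast as its task needs.
per_core = relative_dynamic_power(demand)

# Shared clock: every active core runs at the highest requested speed.
shared = relative_dynamic_power([max(demand)] * len(demand))

print(f"per-core clocks: {per_core:.3f}")
print(f"shared clock:    {shared:.3f}")
```

Under these assumptions the lightly loaded core costs next to nothing with per-core clocks, while the shared clock setup pays for two cores at full speed. Real chips won't match these numbers, but the direction of the effect is the point.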

I can't really tell whether it's the 72-core GeForce GPU, the PowerVR G6430 or the Adreno 330 that consumes the least power, but given ImgTech's record of making the most power efficient mobile GPUs, it's not a stretch to assume that the G6430 is the GPU that draws the least.



Conclusion

While the Tegra 4, the Apple A7 and the Snapdragon 800 have completely different architectures, I'd say that they're pretty close to each other, based on the performance they've shown in synthetic benchmarks. The differences between the CPUs are the most astounding. While Apple focused on keeping core count and clock speed low while driving up single-core performance, NVIDIA's (or rather, ARM's) and Qualcomm's solutions offset their relatively lower single-threaded performance by using more cores at a higher clock speed. While the former is probably better for overall system performance, as mobile OSes tend to rely much more on single-threaded performance, the latter is probably better for multi-tasking. In any case, it's evident that all current high-end SoCs are surprisingly close together when it comes to peak multi-threaded performance.

Comparing the Tegra 4, Apple A7 and the Snapdragon 800 as well as the rest of the high-end competition, it's clear that the only one that truly stands apart is the A7. The Tegra 4 and the Exynos 5420, for instance, both have four Cortex-A15 cores with similar clock speeds (1.9GHz vs 1.8GHz, respectively), and they also have a separate CPU cluster for handling light tasks with low power (the Tegra 4 has a single A15 core at its disposal, while the Exynos 5420 uses a quad-core Cortex-A7 cluster for the same purpose). The Snapdragon 800 uses a unique architecture, the Krait 400, in a quad-core configuration and even takes the clock speed beyond the norm with an insane 2.3GHz. Unlike those two competitors, it doesn't need extra low power cores, relying instead on other solutions to keep idle power consumption low.

In GFXBench's high-level GPU benchmarks, it seems that most of the main high-end SoCs are more or less on the same level, with only the Snapdragon 800 pulling slightly ahead of the A7. In both high-level tests, however, we can see the Tegra 4 lagging behind all of its competition. How ironic.

GFXBench's low-level tests show a huge difference between the current high-end mobile GPUs, however. In the fill rate department we see the Apple A7 blowing all of its competitors out of the water, with the Tegra 4 at the bottom of the chart and the Snapdragon 800 slightly ahead of it, but still behind the Exynos 5420 and the Apple A7.

The verdict of this comparison is that, while pretty much all of the current flagship SoCs are close in terms of CPU power, the Tegra 4 falters slightly when the GPU is put to the test. The Apple A7 does very well on the GPU side, but it's just slightly outperformed by the Adreno 330 GPU on the Snapdragon 800. But really, they're all so close it's hard to pick one as a definite winner. You could call the Snapdragon 800 the overall winner, but I say it's too close to call.