Computational Domain
The computational domain of previous Nvidia graphics processors with unified architecture consisted of 8 clusters, each including 16 unified execution units that supported all shader types. G200 not only has more ALUs (their number has grown from 128 to 240), but also has 10 clusters instead of 8 and groups the ALUs into clusters in a new way.
Previously, each computational cluster included two shader processors (streaming multiprocessors in Nvidia’s terminology), each featuring 8 ALUs. Now each cluster consists of three such processors. Besides an L1 cache and an instruction dispatcher, each processor still contains 8 ALUs and a small amount of local memory for exchanging data.
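The ALU totals follow directly from the cluster layout described above. The short Python sketch below only restates that arithmetic; the `ShaderLayout` class and its field names are illustrative and are not Nvidia terminology.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderLayout:
    """Hypothetical helper describing how ALUs are grouped on a chip."""
    clusters: int
    processors_per_cluster: int
    alus_per_processor: int

    @property
    def processors(self) -> int:
        # Total number of shader processors on the chip
        return self.clusters * self.processors_per_cluster

    @property
    def alus(self) -> int:
        # Total number of ALUs on the chip
        return self.processors * self.alus_per_processor


# Figures taken from the text: 8 clusters x 2 processors x 8 ALUs,
# and 10 clusters x 3 processors x 8 ALUs.
G80 = ShaderLayout(clusters=8, processors_per_cluster=2, alus_per_processor=8)
G200 = ShaderLayout(clusters=10, processors_per_cluster=3, alus_per_processor=8)

if __name__ == "__main__":
    for name, chip in (("G80", G80), ("G200", G200)):
        print(f"{name}: {chip.processors} shader processors, {chip.alus} ALUs")
```

So the move from 128 to 240 ALUs comes from both changes together: two extra clusters and one extra shader processor per cluster.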

So, the computational capacity of the new G200 has increased significantly compared with G80 and G92. In raw ALU count it still trails ATI RV770, which has 800 ALUs grouped into 160 shader processors; however, the higher operating frequency of Nvidia's shader domain partially makes up for this difference. Since each G200 cluster contains 24 ALUs and 8 texturing units, the new core features a 3:1 ALU:TMU ratio.
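As a quick check of these figures, the sketch below treats the 24 ALUs and 8 texturing units as per-cluster values (three 8-ALU shader processors per cluster, 10 clusters). The variable names are illustrative, and clock speeds are deliberately left out because the text gives none.

```python
from fractions import Fraction

# Per-cluster figures for G200, as read from the text (assumption: the
# 24 ALUs and 8 texturing units are counted per cluster).
CLUSTERS = 10
ALUS_PER_CLUSTER = 24
TEXTURE_UNITS_PER_CLUSTER = 8

total_alus = CLUSTERS * ALUS_PER_CLUSTER
total_texture_units = CLUSTERS * TEXTURE_UNITS_PER_CLUSTER

# Fraction reduces 240/80 to lowest terms, giving the ALU-to-texture-unit ratio.
alu_to_texture = Fraction(total_alus, total_texture_units)

# RV770 figures from the text: 800 ALUs in 160 shader processors.
RV770_ALUS = 800
RV770_SHADER_PROCESSORS = 160
rv770_alus_per_processor = RV770_ALUS // RV770_SHADER_PROCESSORS

if __name__ == "__main__":
    print(f"G200: {total_alus} ALUs, {total_texture_units} texturing units, "
          f"ratio {alu_to_texture.numerator}:{alu_to_texture.denominator}")
    print(f"RV770: {rv770_alus_per_processor} ALUs per shader processor")
```

The raw ALU counts are not directly comparable between the two chips, since they are organized differently and run at different frequencies.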
Besides adding execution units and regrouping them into clusters, Nvidia also introduced a number of optimizations. Namely, they increased the number of threads a shader processor can handle simultaneously from 768 to 1024, enlarged the general-purpose register file and the internal buffers, and introduced support for double-precision floating-point calculations (FP64), which required a corresponding unit to be integrated into each of the 30 shader processors. The FP64 performance is pretty modest, though: Nvidia G200 delivers only about 90 Gflops in this mode, while ATI RV770 can theoretically hit 240 Gflops in double-precision calculations.
At the same time, the new Nvidia chip has no DirectX 10.1 support, which seems to be more of a political decision than an implementation issue.
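To put these numbers in perspective, the sketch below derives the chip-wide thread capacity and the FP64 throughput per shader processor. It uses only figures quoted in the text, and it assumes the earlier chip is the 16-processor G80 layout (8 clusters of 2 processors); the names are illustrative.

```python
# Shader processor counts: 8 clusters x 2 (G80 layout) and 10 clusters x 3 (G200).
G80_SHADER_PROCESSORS = 8 * 2
G200_SHADER_PROCESSORS = 10 * 3

# Threads a single shader processor can keep in flight, per the text.
OLD_THREADS_PER_PROCESSOR = 768
NEW_THREADS_PER_PROCESSOR = 1024

g80_threads = G80_SHADER_PROCESSORS * OLD_THREADS_PER_PROCESSOR
g200_threads = G200_SHADER_PROCESSORS * NEW_THREADS_PER_PROCESSOR
thread_growth = g200_threads / g80_threads

# Approximate peak FP64 rates quoted in the text, in Gflops.
G200_FP64_GFLOPS = 90.0
RV770_FP64_GFLOPS = 240.0

# One FP64 unit per shader processor, so this is the per-unit share.
g200_fp64_per_processor = G200_FP64_GFLOPS / G200_SHADER_PROCESSORS
rv770_fp64_advantage = RV770_FP64_GFLOPS / G200_FP64_GFLOPS

if __name__ == "__main__":
    print(f"Threads in flight: G80 {g80_threads}, G200 {g200_threads} "
          f"({thread_growth:.1f}x)")
    print(f"G200 FP64 per shader processor: {g200_fp64_per_processor:.1f} Gflops")
    print(f"RV770 theoretical FP64 advantage: {rv770_fp64_advantage:.2f}x")
```

Under these assumptions, the chip-wide thread capacity grows by 2.5 times, while each FP64 unit contributes roughly 3 Gflops to the quoted total.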






