Computational Domain

The computational domain of previous Nvidia graphics processors with a unified architecture consisted of 8 clusters, each including 16 unified execution units that supported all shader types. G200 not only has more ALUs (their number has grown from 128 to 240), but also has 10 clusters instead of 8 and groups ALUs into clusters in a new way.

Previously, each computational cluster included two shader processors (streaming multiprocessors in Nvidia’s terminology), each featuring 8 ALUs. Now each cluster consists of three such processors. Besides the L1 cache and the instruction dispatcher, each processor still contains 8 ALUs and a small amount of local memory for exchanging data.
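The ALU totals quoted above follow directly from the cluster layout. A minimal sketch of that arithmetic, using only the figures given in the text (the function and variable names are ours and purely illustrative):

```python
# Cluster layouts as described in the text; names here are illustrative only.
def total_alus(clusters, processors_per_cluster, alus_per_processor=8):
    """Return the total ALU count for a given cluster layout."""
    return clusters * processors_per_cluster * alus_per_processor

# Previous generation: 8 clusters, 2 shader processors each, 8 ALUs per processor.
previous_alus = total_alus(clusters=8, processors_per_cluster=2)

# G200: 10 clusters, 3 shader processors each, 8 ALUs per processor.
g200_alus = total_alus(clusters=10, processors_per_cluster=3)

print(previous_alus)              # 128
print(g200_alus)                  # 240
print(g200_alus / previous_alus)  # 1.875
```

In other words, G200 has almost twice as many ALUs, and most of the gain comes from the third shader processor in each cluster rather than from the two extra clusters.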

So, the computational capacity of the new G200 has increased significantly compared with G80 and G92. It is still far behind ATI RV770 with its 800 ALUs grouped into 160 shader processors; however, the higher working frequency of the Nvidia shader domain partially makes up for this difference. Since each G200 cluster contains 24 ALUs and 8 texturing units, the new core features a 3:1 ALU:TEX ratio, which is lower than the 4:1 ratio of G80/G92 and RV770. Nvidia evidently still favors texturing performance, which is gradually losing ground to mathematical performance in contemporary games. However, as we have already said, G200 is pretty good at math, too: Nvidia claims that the peak computational capacity of the new GPU is around 700-800 Gflops, which is close to ATI RV770. Of course, we are talking about single-precision calculations here.
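The 3:1 figure can be checked per cluster and for the whole chip. A small sketch, assuming 3 shader processors of 8 ALUs and 8 texturing units per cluster, with 10 clusters in total as stated earlier (the helper name is hypothetical):

```python
from fractions import Fraction

def alu_tex_ratio(alus, texture_units):
    """Return the ALU:TEX ratio reduced to lowest terms, as an 'a:b' string."""
    ratio = Fraction(alus, texture_units)
    return f"{ratio.numerator}:{ratio.denominator}"

clusters = 10
alus_per_cluster = 3 * 8   # three shader processors, 8 ALUs each
tex_per_cluster = 8

print(alu_tex_ratio(alus_per_cluster, tex_per_cluster))                        # 3:1
print(alu_tex_ratio(clusters * alus_per_cluster, clusters * tex_per_cluster))  # 3:1
```

The ratio is the same at both levels because every cluster is identical: 240 ALUs against 80 texturing units for the full chip.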

Besides adding execution units and grouping them into clusters in a new way, Nvidia also introduced a number of optimizations. Namely, they increased the number of threads a shader processor can handle simultaneously from 768 to 1024; increased the number of general-purpose registers and the capacity of the internal buffers; and introduced support for double-precision floating-point calculations (FP64), which required a corresponding unit to be integrated into each of the 30 shader processors. The FP64 performance is pretty modest, though: Nvidia G200 provides only about 90 Gflops in FP64 mode, while ATI RV770 can theoretically hit 240 Gflops in double-precision calculations.
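To put these numbers in perspective, here is a rough sketch of the chip-wide thread capacity and the FP64 gap. It uses only the figures quoted in this article (16 shader processors at 768 threads for the previous generation, 30 at 1024 for G200, and the approximate 90 and 240 Gflops FP64 estimates); the names are illustrative:

```python
def threads_in_flight(shader_processors, threads_per_processor):
    """Return the maximum number of threads the whole chip can keep in flight."""
    return shader_processors * threads_per_processor

previous = threads_in_flight(8 * 2, 768)    # 8 clusters x 2 shader processors
g200 = threads_in_flight(10 * 3, 1024)      # 10 clusters x 3 shader processors

print(previous)         # 12288
print(g200)             # 30720
print(g200 / previous)  # 2.5

# Approximate FP64 peak rates as quoted in the text, in Gflops.
g200_fp64 = 90
rv770_fp64 = 240
print(g200_fp64 / rv770_fp64)  # 0.375
```

So the new chip can keep two and a half times as many threads in flight, while its quoted FP64 peak is a little over a third of the RV770 figure.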

At the same time, the new Nvidia chip has no DirectX 10.1 support, which seems to be more of a political move than an implementation issue.
