Texture Processors and Caches
The texture processor subsystem was the main bottleneck of the R600 and RV670 graphics cores.
There were only 16 texture processors, grouped into four large blocks. This was not enough, even though mathematics-heavy special effects prevail over high-resolution textures in modern games due to the multiplatform nature of many projects. Moreover, there was only one filter unit per texture address unit, which reduced the efficiency of the texture processors when performing texture filtering, especially anisotropic filtering. Anisotropic filtering is widely used today and is unlikely to be abandoned in the near future.
The developers took these drawbacks into consideration and endowed the RV770 with new texture processors.

The texture processors have a completely new design. Each TMU now contains 16 FP32 texture samplers, four address units and four filter units. The sampling efficiency seems low, but this is compensated for by the doubled bandwidth of the bus between the TMUs and the texture caches. ATI managed to increase the filtering speed for 32-bit and 64-bit textures by 2.5 and 1.5 times, respectively. This sounds good in theory and should be just as good in practical applications.
The texture processors are still united into large modules with four TMUs in each. Each such module services one of the ten SIMD cores. The TMUs have been optimized to contain fewer transistors and do not increase the overall size of the GPU much.
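The unit counts quoted above can be turned into a quick back-of-the-envelope comparison. The sketch below is only an illustration: the constant and function names are made up for this example, and it assumes that one RV770 TMU is roughly comparable to one of the 16 texture processors of the R600, which the text does not state explicitly.

```python
# Unit counts quoted in the text; all names here are illustrative only.
R600_TEXTURE_PROCESSORS = 16   # R600/RV670: 16 texture processors in four blocks
RV770_SIMD_CORES = 10          # each SIMD core is serviced by one texture module
RV770_TMUS_PER_MODULE = 4      # four TMUs per module


def rv770_tmu_count(simd_cores=RV770_SIMD_CORES,
                    tmus_per_module=RV770_TMUS_PER_MODULE):
    """Total TMUs, assuming exactly one texture module per SIMD core."""
    return simd_cores * tmus_per_module


def unit_ratio(new_units, old_units):
    """How many times more units the newer design has."""
    return new_units / old_units


if __name__ == "__main__":
    total = rv770_tmu_count()
    print(f"RV770 TMUs: {total}")
    print(f"Ratio to R600: {unit_ratio(total, R600_TEXTURE_PROCESSORS):.2f}x")
```

Under these assumptions the RV770 ends up with 40 TMUs, two and a half times the number of texture processors in the R600.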
The cache subsystem is an important part of the GPU’s texture-processing system. It has been modernized in the RV770:

First of all, the caches have become faster. The speed of fetching textures from the L1 caches is now an impressive 480GBps. The L1 and L2 caches can communicate at a speed of 384GBps. Second, each SIMD core now has a dedicated L1 cache for efficient data storage. Third, the L2 caches are coordinated with the memory controllers. Fourth, the RV770 features a separate cache for storing vertex data.

The improvements are not as obvious as with the texture processors, yet they are expected to contribute to the performance of the Radeon HD 4800 in games. ATI’s new GPU has surely got rid of the main bottleneck of the Radeon HD architecture and can challenge Nvidia’s solutions in texture operations. ATI’s approach to designing GPUs is at its best here: optimization rather than simply increasing the amount of resources.
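To put the cache bandwidth figures into perspective, they can be broken down per SIMD core. This is a rough sketch rather than an official calculation: it assumes that the 480GBps figure is an aggregate shared evenly by the ten dedicated L1 caches, and all names in it are illustrative.

```python
# Bandwidth figures quoted in the text (GB/s); names are illustrative only.
L1_FETCH_GBPS = 480   # texture fetch speed from the L1 caches
L1_L2_GBPS = 384      # speed of communication between L1 and L2 caches
SIMD_CORES = 10       # one dedicated L1 cache per SIMD core


def per_core_l1_bandwidth(total_gbps=L1_FETCH_GBPS, cores=SIMD_CORES):
    """L1 fetch bandwidth per SIMD core, ASSUMING an even split of the total."""
    return total_gbps / cores


def l1_to_l2_ratio(l1_gbps=L1_FETCH_GBPS, l2_gbps=L1_L2_GBPS):
    """How much faster L1 fetches are than L1-to-L2 transfers."""
    return l1_gbps / l2_gbps


if __name__ == "__main__":
    print(f"L1 fetch per SIMD core: {per_core_l1_bandwidth():.0f} GB/s")
    print(f"L1 fetch vs. L1-L2 link: {l1_to_l2_ratio():.2f}x")
```

If the even-split assumption holds, each SIMD core gets about 48GBps of L1 fetch bandwidth, and L1 fetches are 1.25 times faster than the link between the L1 and L2 caches.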



