ATI Radeon HD 4800 Graphics Architecture: Long Anticipated Revenge?

We have been waiting for a new graphics architecture from ATI for a long time, especially since the previous ATI Radeon HD generation couldn’t compete successfully against high-performance Nvidia solutions. Will the ATI Radeon HD 4800 live up to everyone’s expectations? Let’s try answering this question from a theoretical perspective and also check out the first graphics accelerator based on the new ATI RV770 chip – Power Color Radeon HD 4850 512MB.

by Alexey Stepin, Yaroslav Lyssenko, Anton Shilov
06/25/2008 | 07:09 PM

AMD’s graphics department, ATI, has not been doing well lately. The company has not been able to introduce a graphics architecture that would be truly competitive against Nvidia’s GeForce 8/9, especially in the top-performance sector. Even though not a complete failure, the ATI Radeon HD 2900 was far from a success. This hot and uneconomical GPU was noticeably inferior to Nvidia’s G80-based solutions, mostly due to questionable architectural decisions such as the superscalar architecture of the execution units and the weak design of the texture modules coupled with their insufficient number.

ATI’s profits had declined dramatically, but the release of the next GPU, codenamed RV670 and installed on the ATI Radeon HD 3800 series, helped improve the situation somewhat. The new GPU was free from the drawbacks of the R600 chip but lacked any really new features of its own. The only breakthrough was the 55nm manufacturing process. Otherwise, the RV670 inherited the R600 architecture, acquired support for UVD and DirectX 10.1, but lost the 512-bit external memory bus. The ATI Radeon HD 3870 and 3850 became bestsellers among gamers who could not afford to spend over $250 on a graphics card, as these solutions delivered acceptable performance at a reasonable price. And it is no secret that mainstream products account for the largest share of total sales.

Later on, the ATI Radeon HD 3000 series was complemented with the unique dual-processor Radeon HD 3870 X2. ATI proved its ability to be a technological leader, if only for a little while. The release of the Nvidia GeForce 9800 GX2 made it obvious that the potential of ATI’s Radeon HD 2000/3000 architectures had been exhausted. It also became more and more obvious that the era of monolithic monster chips was approaching its end. Trying to reach higher performance, the developers were increasing the size, complexity and power consumption of the GPUs, but the chip yield was falling while the manufacturing cost was growing. As a result, a graphics card with such a monster chip just could not be inexpensive.

Drawing on its experience of developing the ATI Radeon HD 3870 X2, AMD decided to opt out of the next round of the race to pack more and more execution subunits into a single graphics processor. It was not AMD’s goal to develop the largest, most powerful, and most expensive GPU in the world, although AMD might have tried to do that: the company has enough development resources, whatever Nvidia may say. Instead, the RV770 is the result of a new strategic concept. The key to the concept is a relatively inexpensive but fast GPU that should meet the requirements of the bulk of gamers. Graphics cards with the new chip are expected to cost about $300, whereas enthusiasts who want maximum performance irrespective of the price can be served by multi-chip solutions in the vein of the ATI Radeon HD 3870 X2.

As we noted in our ATI Radeon HD 3870 X2 review, this approach has its downsides, yet it has a lot of advantages, too. First of all, it targets the main category of users. Second, solutions with higher performance can be created easily: designing a dual-chip graphics card takes less effort and time than developing a higher-performance monolithic graphics core from scratch. Clearly, ATI’s new chip must be as fast as the best single-chip solutions of the previous generation for this strategy to work.

This review will try to answer the question about the potential of the new graphics solutions from ATI Technologies. We’ll discuss the details of the RV770 architecture, but we’ll start with the characteristics of RV770-based graphics cards.


ATI Radeon HD 4800: Newest Graphics Processor from ATI

The RV670 (Radeon HD 38x0 series) received a new number in the series name although it differed only slightly from the previous R600 core (Radeon HD 2900). The RV770 has more reason to get a new number, as it is indeed a new product even though many of its traits are inherited from its predecessors. The new series is called ATI Radeon HD 4800, and the naming system first used for the Radeon HD 3800 series is maintained. The first numeral denotes the graphics architecture generation, the second stands for the family, and the last two denote the specific graphics card model.
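The naming rule described above is easy to express in code. The following is only an illustrative sketch: the `RadeonModel` class and the `parse_radeon_number` helper are hypothetical names of our own, not anything ATI publishes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RadeonModel:
    generation: int  # first numeral: graphics architecture generation
    family: int      # second numeral: family within the generation
    model: int       # last two numerals: specific card model


def parse_radeon_number(number: str) -> RadeonModel:
    """Split a four-digit Radeon HD model number into its three fields."""
    if len(number) != 4 or not number.isdigit():
        raise ValueError(f"expected a four-digit model number, got {number!r}")
    return RadeonModel(int(number[0]), int(number[1]), int(number[2:]))


for name in ("3870", "4850", "4870"):
    print(name, parse_radeon_number(name))
```

Read this way, the 4850 and 4870 share generation 4 and family 8 and differ only in the model field.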

Two models were revealed at the moment of the announcement of the new graphics architecture: ATI Radeon HD 4850 with a recommended price of $199 and ATI Radeon HD 4870 with high-speed GDDR5 memory and a recommended price of $299.


The RV770 core incorporates an impressive 956 million transistors. Nvidia’s GT200 incorporates 1.4 billion, though. However, this is hardly an achievement for Nvidia as its GPU is manufactured on the less advanced 65nm tech process. Given the huge size and complexity of that GPU, fewer chips can be made out of one wafer and the chip yield is lower. Consequently, the manufacturing cost of one chip is high. Nvidia has been following this approach for the last few years, though. GT200-based cards will hardly get cheaper, as opposed to ATI’s new RV770-based solutions. So, ATI’s approach seems to work better here.
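To get a feel for the wafer argument, here is a rough sketch using a common textbook approximation for the gross number of dies per wafer. Several inputs are our assumptions rather than vendor data: a 300mm wafer, the 270 sq. mm RV770 die area quoted later in this article, and a GT200 die area of roughly 576 sq. mm, which is a widely reported figure and not taken from this review. Defects are ignored, so real yields would widen the gap further.

```python
import math


def gross_dies_per_wafer(wafer_diameter_mm: float, die_area_mm2: float) -> int:
    """Approximate die count: wafer area over die area, minus an edge-loss term."""
    area_term = math.pi * (wafer_diameter_mm / 2) ** 2 / die_area_mm2
    edge_loss = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return int(area_term - edge_loss)


WAFER_MM = 300          # assumption: 300mm wafers
RV770_AREA_MM2 = 270    # die area as quoted in this article
GT200_AREA_MM2 = 576    # assumption: widely reported figure, not from this article

for name, area in (("RV770", RV770_AREA_MM2), ("GT200", GT200_AREA_MM2)):
    print(f"{name}: about {gross_dies_per_wafer(WAFER_MM, area)} candidate dies per wafer")
```

Under these assumptions the smaller die yields more than twice as many candidates per wafer, before any defect-related losses are counted.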

The GPU clock rates have been lowered considerably in comparison with the RV670-based cards because of the higher complexity of the new core. This shouldn’t be a problem as the new chip features high computing and texture-mapping capacity. Another noteworthy point is that the fast GDDR5 memory installed on the senior Radeon HD 4800 model helps achieve high bandwidth without widening the memory interface, as ATI did last year and Nvidia does today. When the memory bus is wider than the traditional 256 bits, the PCB becomes complex and more expensive to make. Of course, GDDR5 is more expensive than GDDR3, but this difference seems to be made up for by the simpler PCB design. This is indicated by the relatively low recommended price of the Radeon HD 4870. It is only $299, i.e. $100 lower than the recommended price of the Nvidia GeForce GTX 260 and $130 lower than the price of the GeForce GTX 280. There is only one potential problem with the memory: GDDR5 is not yet widespread, and there may be a shortage of graphics cards due to the lack of the required memory.

The specifications table shows that the RV770 is stronger in both computing and texture-mapping capacities. The higher performance of the texture processors is the most important innovation because they used to be the weak spot of the RV670. With 40 of them instead of 16, the TMUs won’t be a bottleneck anymore even if their architecture hasn’t changed. For comparison, the Nvidia GT200 core incorporates 80 TMUs in its maximum configuration, but their performance in real-life applications is only half of the potential. So, the RV770 should be equal in this respect even though the Radeon HD 4870 is not actually positioned as an opponent to the Nvidia GeForce GTX 280/260. We’ll check this out in our practical tests soon.

The superscalar Radeon HD architecture is known to be sensitive to shader code optimizations notwithstanding the special-purpose task dispatcher available in every GPU of this generation. With the number of ALUs increased from 320 to 800, the GPU will not slow down too much if the driver is not optimized for a specific 3D application. In the worst case, one out of every five ALUs will operate, which means 160 operating ALUs in total (as opposed to the RV670’s 64). These 160 ALUs should deliver about the same performance as the GeForce 9800 GTX provides. In the best case, the computing capacity of the Radeon HD 4850 may reach 1 teraflop. That is, this modest $199 card can challenge the expensive GeForce GTX 280 that is declared to have a computing capacity of about 1 teraflop, too.
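The best-case and worst-case figures follow from simple arithmetic. The sketch below assumes that one multiply-add (MAD) per ALU per clock counts as two floating-point operations, which is the usual convention for such peak numbers, and uses the 625MHz GPU clock of the Radeon HD 4850 given in the specifications discussed later in this article. The helper name `peak_gflops` is ours.

```python
def peak_gflops(alus: int, clock_mhz: float, flops_per_alu_per_clock: int = 2) -> float:
    """Theoretical peak in GFLOPS; a MAD is counted as two operations."""
    return alus * flops_per_alu_per_clock * clock_mhz / 1000.0


RV770_ALUS = 800
RV770_UNITS = RV770_ALUS // 5   # 160 five-ALU execution units
HD4850_CLOCK_MHZ = 625

best = peak_gflops(RV770_ALUS, HD4850_CLOCK_MHZ)    # all five ALUs of every unit busy
worst = peak_gflops(RV770_UNITS, HD4850_CLOCK_MHZ)  # only one ALU per unit busy

print(f"best case:  {best:.0f} GFLOPS")
print(f"worst case: {worst:.0f} GFLOPS")
```

The best case works out to 1000 GFLOPS, i.e. the 1 teraflop mentioned above, while the worst case is one fifth of that.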

The raster processors have been improved, too. There are as many of them as in the previous core, but they are two times more efficient at processing the Z-buffer. Thus, the RV770 can process 64 Z-values per clock cycle as opposed to the RV670’s ability to process only 32 Z-values per clock cycle.

The new card seems to have a huge potential. We’ll discuss its innovations in more detail before testing its performance.


ATI Radeon HD 4800: Say "No" to the Ring

The RV770 core viewed through a microscope:

You can see that the memory access subsystem has retained its overall topology although it is not a ring anymore. Nearly every memory controller is connected to another one with a bidirectional interface but the ring is not complete. The memory interface is still located at the die’s perimeter. Memory-sensitive functional subunits are placed nearby:

The core has got a switch connecting the subunits that are not sensitive to memory bandwidth such as the PCI Express interface, CrossFireX interface, UVD2 video-processor, display controllers, etc. According to ATI, the memory subsystem resources of previous Radeon HD GPUs were utilized with 85% efficiency while the optimizations implemented in the RV770 make it almost 100%. Coupled with fast GDDR5 memory, this helps the new card do without a wider memory bus while keeping the PCB simple.


ATI Radeon HD 4800

Ultra Threaded Dispatch Processor 3.0?

The task dispatcher is a key component of every modern GPU. Its job is to distribute the overall load uniformly among the available GPU resources in order to achieve maximum performance.

The task dispatcher first appeared in the ATI Radeon X1000 series where it could control 512 code branches with 16 pixels in each. The second version of the dispatch processor was introduced in the Radeon HD 2000: it could process more code branches in a more efficient manner as the minimum size of a branch was reduced from 16 to 5 pixels.

There is no accurate information about this aspect of the RV770, but it is clear that the number of arbiters and sequencers has been increased along with the number of SIMD arrays. Besides, each SIMD array can now use data from another one, which required certain modifications of the dispatcher’s algorithm. Additionally, the Radeon HD 4800 architecture features a number of GPGPU optimizations, which may also mean changes in the dispatcher’s operation.


When 160 Equals 800

The computing part of the R600 and RV670 chips consisted of 64 universal units each of which had five ALUs, a flow control unit, and an array of general-purpose registers. Four out of the five ALUs were rather simple, capable of executing one FP MAD instruction. And the fifth ALU was complex, capable of processing such instructions as SIN, COS, LOG, EXP, etc. In fact, each execution unit was a processor with a five-stage pipeline.

Theoretically, the GPU contained 320 execution units, but this was only true when all five ALUs in each of the 64 pipelines were loaded, which was not always the case. In 3D applications many operations depend on the results of previous operations, so it is hard to keep the pipeline fully loaded at all times. Application-specific optimizations in the Catalyst driver were required for that, but it is often impossible to get access to the game code until it is officially released.

As a consequence, the ATI Radeon HD architecture often found itself using only one ALU in each execution unit and lagging behind the competing G80/G92-based solutions from Nvidia. The latter not only had more independent execution modules but also worked at higher clock rates. Creating the RV770, the ATI developers solved the problem of the potential inefficiency of the superscalar architecture in the most direct way – by increasing the number of execution modules from 64 to 160. Of course, it means there are more transistors in the core, but the 55nm tech process helped keep the size of the core within reasonable limits.
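The effect of dependent operations on a five-wide execution unit can be shown with a toy model. This is a deliberately simplified sketch of our own, not ATI’s shader compiler: it treats all five ALUs as identical (ignoring the special fifth ALU) and greedily fills each issue slot with operations whose inputs are already available.

```python
from typing import Dict, List

SLOTS = 5  # ALUs per execution unit


def pack_bundles(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Greedy packing: each bundle takes up to SLOTS ops whose inputs
    were all produced by earlier bundles."""
    done = set()
    pending = list(deps)
    bundles = []
    while pending:
        ready = [op for op in pending if all(d in done for d in deps[op])]
        if not ready:
            raise ValueError("dependency cycle or unknown input")
        bundle = ready[:SLOTS]
        bundles.append(bundle)
        done.update(bundle)
        pending = [op for op in pending if op not in bundle]
    return bundles


def utilization(bundles: List[List[str]]) -> float:
    """Fraction of ALU slots that did useful work."""
    return sum(len(b) for b in bundles) / (SLOTS * len(bundles))


# Ten independent operations versus ten operations that each need the previous result.
independent = {f"op{i}": [] for i in range(10)}
chain = {f"op{i}": ([f"op{i - 1}"] if i else []) for i in range(10)}

print("independent:", utilization(pack_bundles(independent)))
print("dependent chain:", utilization(pack_bundles(chain)))
```

Independent work fills every slot, whereas a pure dependency chain keeps only one ALU in five busy, which is exactly the worst case described above.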

The architecture of the units has not changed much. Each of them still consists of five ALUs, one flow control unit, and a few general-purpose registers.

ATI claims the execution units are now 40% more efficient, but the sheer increase in number (from 64 to 160) is already enough to make the Radeon HD 4800 competitive even under unfavorable conditions. That’s not all, though. As we mentioned above, there are more global changes on the core topology level. With the ring topology retained partially, the placement of the functional subunits has been optimized. The RV770’s execution subunits have been joined into 10 SIMD cores (the previous GPU had 4 such cores) with 16 modules (80 ALUs) in each:

Each execution core has a dedicated control logic, 4 TMUs and L1 cache. The cores can communicate locally as well as globally.

Note that the ratio of computing to texture-mapping units has remained the same at 4 to 1. ATI considers it optimal. You may argue the point, but the argument makes no sense for the RV770 because, unlike its predecessor, this GPU shouldn’t feel a lack of computing or texture-mapping power. Of course, the new chip offers full support for DirectX 10.1.
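The unit counts quoted in this section can be cross-checked with a small model of the shader core layout. It is a sketch under assumptions: the `ShaderLayout` class is our own illustration, and the RV670 figure of 16 modules and four TMUs per SIMD core is derived from the totals given in this article (64 modules and 16 TMUs in four blocks) rather than stated by ATI in this form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShaderLayout:
    simd_cores: int
    units_per_core: int = 16   # five-ALU execution modules per SIMD core
    alus_per_unit: int = 5
    tmus_per_core: int = 4

    @property
    def execution_units(self) -> int:
        return self.simd_cores * self.units_per_core

    @property
    def alus(self) -> int:
        return self.execution_units * self.alus_per_unit

    @property
    def tmus(self) -> int:
        return self.simd_cores * self.tmus_per_core


RV670 = ShaderLayout(simd_cores=4)
RV770 = ShaderLayout(simd_cores=10)

for name, gpu in (("RV670", RV670), ("RV770", RV770)):
    ratio = gpu.execution_units / gpu.tmus
    print(f"{name}: {gpu.execution_units} units, {gpu.alus} ALUs, "
          f"{gpu.tmus} TMUs, {ratio:.0f}:1 units per TMU")
```

Both chips come out at four execution modules per TMU; the RV770 simply has two and a half times as many SIMD cores.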


Texture Processors and Caches

The texture processor subsystem was the main bottleneck of the R600 and RV670 graphics cores.

There were only 16 texture processors grouped into four big blocks. This was not enough even though mathematics-heavy special effects prevail over high-resolution textures in modern games due to the multiplatform nature of many projects. Moreover, there was only one filter unit per texture address unit, which reduced the efficiency of the texture processors when performing texture filtering, especially anisotropic filtering. Anisotropic filtering is used widely today and is unlikely to be abandoned in the near future.

The developers took these drawbacks into consideration and endowed the RV770 with new texture processors.

The texture processors have a completely new design. Each TMU now contains 16 FP32 texture samplers, four address units and four filter units. The efficiency of sampling seems to be low, but this is compensated for by the doubled bandwidth of the bus between the TMUs and the texture caches. ATI managed to increase the speed of filtering of 32-bit and 64-bit textures by 2.5 and 1.5 times respectively. It sounds good theoretically and should be as good in practical applications.

The texture processors are still united into large modules with four TMUs in each. Each such module services one of the ten SIMD cores. The TMUs have been optimized to contain fewer transistors and do not increase the overall size of the GPU much.

The cache subsystem is an important part of the GPU’s texture-processing system. It has been modernized in the RV770:

First of all, the caches have become faster. The speed of fetching textures from the L1 caches is now an impressive 480GBps. The L1 and L2 caches can communicate at a speed of 384GBps. Second, each SIMD core now has a dedicated L1 cache for efficient data storage. Third, the L2 caches are coordinated with the memory controllers. Fourth, the RV770 features a separate cache for storing vertex data. The improvements are not as obvious as with the texture processors, yet are expected to contribute to the performance of the Radeon HD 4800 in games. ATI’s new GPU has surely got rid of the main bottleneck of the Radeon HD architecture and can challenge Nvidia’s solutions at texture operations. ATI’s approach to designing GPUs is at its best here: optimizations instead of just increasing the amount of resources.


Render Back-Ends

Raster processors or render back-ends (RBEs) in ATI’s terminology have never been a weak spot of the Radeon HD architecture, but the RV770 features certain improvements in this area, too. The number of these units hasn’t changed. The chip contains four raster back-ends equivalent to 16 classic ROPs.

The developer’s goal was to increase performance when performing full-screen antialiasing and increase efficiency when processing the Z-buffer/stencil buffer. The number of appropriate subunits has been doubled in the latter case.

As a result of the modernization, the scene fill rate with enabled FSAA has doubled for both 32-bit and 64-bit color while the number of Z/stencil values processed per clock cycle has increased from 32 to 64, which is more than the G92 can do when using FSAA. In other words, ATI has outperformed the opponent in an area where its solutions have always been on the losing side!
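To put the per-clock numbers into absolute terms, the sketch below multiplies them by the 625MHz GPU clock of the Radeon HD 4850 quoted elsewhere in this article. It assumes that each of the 16 ROP equivalents writes one pixel per clock, and it applies the old 32-per-clock Z rate at the same clock purely to isolate the per-clock change, not to describe an actual RV670 card.

```python
def giga_per_second(per_clock: int, clock_mhz: float) -> float:
    """Convert a per-clock throughput into billions of operations per second."""
    return per_clock * clock_mhz / 1000.0


HD4850_CLOCK_MHZ = 625
ROPS = 16                # four render back-ends, equivalent to 16 classic ROPs
Z_PER_CLOCK_OLD = 32     # RV670-style Z/stencil rate
Z_PER_CLOCK_NEW = 64     # RV770 Z/stencil rate

pixel_fill = giga_per_second(ROPS, HD4850_CLOCK_MHZ)
z_old = giga_per_second(Z_PER_CLOCK_OLD, HD4850_CLOCK_MHZ)
z_new = giga_per_second(Z_PER_CLOCK_NEW, HD4850_CLOCK_MHZ)

print(f"pixel fill rate: {pixel_fill:.0f} Gpixels/s")
print(f"Z/stencil rate:  {z_new:.0f} Gsamples/s (versus {z_old:.0f} at 32 per clock)")
```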

The render back-ends of the RV770 support classic fixed multisampling modes as well as the interesting mode that combines classic MSAA with edge antialiasing to achieve a level of antialiasing equivalent to 12-24x MSAA. This mode was announced back with the Radeon HD 2000.

The CFAA modes introduced earlier employed the wide and narrow tent filters to sample subpixels outside the pixel without counting in the edges of polygons. This improved the overall image quality but also made the image fuzzy. The CFAA edge detect mode helps avoid that fuzziness.

The programmable sampling filter of this mode is set up in such a way as to sample subpixels in the vicinity of a polygon edge only. This improves the quality of antialiasing, especially on small details such as hanging cables, but without the undesirable fuzziness typical of the less intelligent CFAA algorithms. The graphics memory usage is the same as with ordinary 4x/8x MSAA modes. Note that the new CFAA mode is also available to the owners of earlier Radeon HD cards.
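The difference between an always-on tent filter and an edge-only one can be illustrated with a one-dimensional toy resolve. This is a loose sketch of the idea and not ATI’s algorithm: the edge test used here (a pixel’s own subsamples disagree) and the function names are our own simplifications.

```python
from typing import List


def tent_weight(distance: float, radius: float) -> float:
    """Linear falloff: 1 at the pixel centre, 0 at `radius` and beyond."""
    return max(0.0, 1.0 - abs(distance) / radius)


def resolve_row(samples: List[List[float]], radius: float, edge_only: bool) -> List[float]:
    """Resolve a 1D row of pixels. samples[i] holds pixel i's subsample values,
    assumed evenly spaced across a pixel of width 1.0."""
    if radius <= 0.5:
        raise ValueError("radius must exceed half a pixel")
    out = []
    for i, own in enumerate(samples):
        centre = i + 0.5
        is_edge = max(own) != min(own)   # hypothetical edge test
        if edge_only and not is_edge:
            out.append(sum(own) / len(own))   # plain box resolve, no neighbours
            continue
        total = weight_sum = 0.0
        for j, px in enumerate(samples):
            for k, value in enumerate(px):
                pos = j + (k + 0.5) / len(px)
                w = tent_weight(pos - centre, radius)
                total += w * value
                weight_sum += w
        out.append(total / weight_sum)
    return out


# Two flat neighbouring pixels of different brightness (e.g. texture detail).
flat = [[0.0, 0.0], [1.0, 1.0]]
print("tent everywhere:", resolve_row(flat, radius=1.0, edge_only=False))
print("edge-only tent: ", resolve_row(flat, radius=1.0, edge_only=True))
```

In this toy case the always-on tent filter bleeds each pixel into its neighbor, which is the fuzziness mentioned above, while the edge-only variant leaves pixels without an internal edge untouched.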


Tessellation Unit Evolved

The hardware tessellation unit was one of the most questionable parts of the ATI Radeon HD 2000 and HD 3000 graphics processing units. The tessellator consumed part of the transistor budget but did not provide any benefits at the time. Moreover, ATI/AMD simultaneously made a rather controversial decision to rely on software multi-sample antialiasing (MSAA) resolve, which many considered one of the main performance handicaps of the two families. A year has passed and the tessellation unit has evolved.

Back in May 2007, it was a big surprise for us that the developer decided to program the tessellator using vertex shaders rather than DirectX 10 geometry shaders. At that time ATI explained that the R600’s tessellation processor was taken from the Microsoft Xbox 360 game console, which features a DX9-class graphics chip developed by ATI Technologies. The tessellation unit of the ATI RV770 graphics processing unit can be programmed with both geometry and vertex shaders; hence, it is completely backwards compatible.

No tessellation

With tessellation

According to officials from AMD’s graphics product group, a number of upcoming games take advantage of ATI’s programmable tessellation unit, and all of them were developed on ATI Radeon HD 2000 and HD 3000 hardware. This means that quite a number of end-users will be able to enjoy improved image quality and, perhaps, even feel happy about choosing ATI Radeon over Nvidia GeForce.

But while the actual video games are en route, owners of ATI Radeon HD 3000 or 4000 cards may enjoy The Froblins demo, which uses DX10.1’s global illumination technique, HDR, lighting and post-processing along with tessellation for the frog-goblins and terrain. Moreover, in the demo ATI Radeon HD hardware even computes artificial intelligence, something new for graphics processors, isn’t it?


DirectX 10.1: Supported by Electronic Arts, Sega and Counting…

Perhaps DirectX 10.1 is not an immediate success, but it is on track to find a home in several titles, which is good news for ATI, the graphics product group of AMD.

Usually, a superset of a DirectX release has little chance of becoming popular among video game creators and publishers unless it is supported by all developers of graphics hardware. This happened to DirectX 8.1 and pixel shaders 1.4 in 2001, and the same happened to shader models 2.0a and 2.0b of DirectX 9.0 in 2003/04, whereas shader model 3.0 only became more or less widespread two years after its release, following the launch of the ATI Radeon X1000 lineup and the Microsoft Xbox 360 video game console.

The destiny of DirectX 10.1 seems to repeat the destinies of DirectX 8.1 and 9.0c: video game developers would hardly embrace a new application programming interface that is supported by only one independent hardware vendor (IHV) unless AMD supports them in some way. In fact, the first title that took advantage of DirectX 10.1 – Assassin’s Creed made by Ubisoft Montreal – quickly lost it after, as it is widely believed, Nvidia pressured the developer of this title, which belongs to the company’s The Way It’s Meant to Be Played initiative... What is interesting to note in this particular case is that Nvidia will support DirectX 10.1 automatically once it launches a DirectX 11-compatible graphics chip and will be able to take advantage of all the benefits of 10.1.

Fortunately for ATI, not all PC games belong to the aforementioned program, and there are at least two titles coming within the next six to nine months that take advantage of DirectX 10.1: BattleForge, developed by Phenomic and published by Electronic Arts, as well as an unnamed title from Sega.

 

BattleForge by Phenomic EA is a fantasy online real-time strategy game that involves loads of battle units. Image quality in the game is truly impressive: all the units have high-quality geometry, the terrain and vegetation look pretty realistic, and the special effects are remarkable. According to officials from Phenomic Studio, BattleForge runs about 30% faster on DX10.1 compared to DX10 thanks to the lower number of rendering passes needed (perhaps not in all types of scenes). The game is set to arrive this year, but the exact release date is unknown.

 

Little is known about the DirectX 10.1 game title set to be published by Sega in early 2009 and formally unveiled sometime in July, possibly at the E3 convention. According to Chris Southall, technical director of Sega Europe, the title will only have DirectX 10 and 10.1 rendering paths; hence, it will not work on DirectX 9-compatible hardware and will not function under the Windows XP operating system. Most importantly, the unnamed title is PC-exclusive. According to Sega, DirectX 10.1 allowed the developer to create its title “easier”, make it look “prettier” and make it work “faster”, though no actual details were unveiled.

Obviously, two or three games cannot make an API truly successful or popular, and it will be many months before DirectX 10.1 is truly required to play games with high-quality effects and decent frame rates. Nevertheless, the support of DirectX 10.1 is an indisputable advantage of the ATI Radeon HD 3000 and HD 4000 series at the moment.


ATI Radeon HD 4800: Introducing the New Video Engine

Besides a number of innovations inside the ATI RV770 graphics processing unit, the chip also sports numerous enhancements to its video engine. In particular, the graphics product group of Advanced Micro Devices implemented a new audio controller in the ATI RV770 GPU and added certain software improvements.

The main enhancement of the ATI Radeon HD 4800 high-definition video feature set is a new audio controller from Realtek, which now supports 7.1-channel audio with up to 6.144Mb/s bitrate and 192kHz sample rate along with AC3, DTS, Dolby TrueHD and DTS-HD support. The new audio controller will allow ATI’s new hardware to output better quality audio via the HDMI port, which is likely to become an important feature for home-theater personal computers. Furthermore, ATI Radeon HD 4800 is now the only graphics card that natively supports 7.1 audio output over HDMI, a tangible advantage over the competing products.

The video playback engine of ATI Radeon HD 4800 features the second-generation universal video decoder (UVD2), which now supports dual-stream video playback (useful when watching Blu-ray or HD DVD movies with the picture-in-picture feature enabled) with bitstream processing for all codecs used nowadays: VC-1, H.264 and MPEG-2 HD. We are unsure whether this is a hardware-based innovation or an improvement of software, but we will check this out at the earliest opportunity.

An interesting feature that ATI/AMD advertises is high-quality upscaling of DVD content to HD and up-conversion of HD content to beyond-HD resolution. The company is tight-lipped regarding the patterns its products use, so both features should be tested before drawing any conclusions.

It is worth noting that ATI, just like Nvidia several months ago, has implemented dynamic contrast adjustment for video in the new drivers, something that will hardly please videophiles. Content is filmed and adjusted in accordance with directors’ decisions and some other requirements, not with the decisions of driver developers, which means that dynamic contrast tuning is a very debatable thing in general.

Finally, the new ATI Radeon HD 4800 can encode high-definition video into H.264 or MPEG2-HD formats with its stream processors using Cyberlink’s PowerDirector software. According to ATI, a Radeon HD 4800-series graphics chip can transcode one hour of 1080p video in 32 minutes, whereas the same task takes 9 hours 54 minutes on an Intel Core 2 Duo E8500 (3.16GHz) central processing unit, which makes the GPU about 19 times faster.
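The speed-up claimed by ATI can be checked directly from the two durations quoted above. This is plain arithmetic on ATI’s own figures, which we have not verified independently.

```python
VIDEO_MINUTES = 60            # one hour of 1080p source material
GPU_MINUTES = 32              # transcoding time claimed for a Radeon HD 4800
CPU_MINUTES = 9 * 60 + 54     # 9 hours 54 minutes on the Core 2 Duo E8500

speedup = CPU_MINUTES / GPU_MINUTES
realtime_factor = VIDEO_MINUTES / GPU_MINUTES

print(f"GPU versus CPU: {speedup:.1f}x")
print(f"GPU versus real time: {realtime_factor:.2f}x")
```

The ratio of 594 to 32 minutes rounds to the 19 times quoted by ATI, and it also means the GPU would transcode almost twice as fast as the video plays back.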

We have just discussed theoretical aspects of the new ATI Radeon HD 4800 series. It is high time we got to real products and introduced to you an ATI Radeon HD 4850 graphics card from Power Color.


Power Color ATI Radeon HD 4850 512MB GDDR3: Package and Accessories

Tul Corp., which owns the Power Color brand, has been an ATI-exclusive partner for many years now, so it is to be expected that the company is among the first to unveil its ATI Radeon HD 4800-series lineup. Regrettably, the realities of today’s graphics board business do not allow even first-tier partners to truly differentiate themselves from each other during the initial product launch. The good news is that ATI Radeon HD 4800-series graphics boards from Power Color are already available, while modified versions are being prepared.

Power Color HD 4850 512MB (model AX4850 512MD3-H) comes in a moderate-size box that resembles the company’s packages from the last year. The illustrations on the front side of the box give an idea of the chip that is onboard and the amount of memory, as well as the support for dual-link DVI and HDMI.

The new motto of Power Color’s graphics cards is “Mesmerizing 3D Graphics for True Gamers”, but while we can say that we are impressed with the architecture of ATI’s Radeon HD 4800, we are hardly mesmerized by the sword-wielding lady drawn on the box. There are so many packages with robots, monsters and warriors that they are hardly eye-catching nowadays.

The box contains a cardboard tray. The graphics card and accessories lie in the compartments of this tray. Here is what was found inside:

The set of accessories is quite common for this class of graphics adapters. It includes virtually everything that is needed to use the graphics board but does not contain any bonuses that would be appreciated, such as games or free software.

It is especially regrettable that the graphics board does not come with a software player for high-definition video on Blu-ray or HD DVD media that would take advantage of the ATI Radeon HD 4800’s decoding engine and post-processing capabilities. Software high-def players from companies like Cyberlink usually cost approximately $50 at retail, but bundled versions cost manufacturers a lot less than that, so it is a surprise to see that even with the new generation of graphics cards Power Color decided not to include such a player in the bundle.

To sum it up, we definitely liked the Power Color HD 4850 512MB graphics card: it comes in a moderate-sized box, which means that the manufacturer does not want end-users to pay for stock storage space or to think that they are acquiring something extraordinary that must come in a large package. The product bundle includes virtually everything that is needed to use the product and leaves out things that would be redundant (nobody needs a YPbPr cable if an HDMI connection is used). However, we regret that the product lacks a high-definition video player.


Closer Look at ATI Radeon HD 4850

PCB Design and Specs

Unfortunately, we received the senior model, Radeon HD 4870, a little too late for it to make it into today’s article, so today we will only discuss the junior card, Radeon HD 4850. Since all performance-mainstream and high-end graphics boards released on a product’s launch day are made by contract manufacturers under the supervision of the GPU developer, from now on we will refer to the Power Color HD 4850 graphics card as ATI Radeon HD 4850 for simplicity.

ATI Radeon HD 4850 is less interesting from a technical standpoint as it uses common GDDR3 memory. On the other hand, it may be the more appealing product in the buyer’s eyes because it comes at a lower recommended price ($199). We will discuss the technical features of the ATI Radeon HD 4870 and Radeon HD 4870 X2 later, when these graphics cards hit the shops.

Thanks to the meticulous optimizations of the graphics architecture, the developer has managed to come up with a new-generation graphics card that is theoretically comparable to the Nvidia GeForce 9800 GTX but is no larger than the ATI Radeon HD 3850. The ATI Radeon HD 4850 does not look serious. With its modest appearance of a mainstream product, it doesn’t seem to contain an impressive potential.

The new Radeon HD 4850 is compact, its PCB being no longer than the PCB of the Radeon HD 3850. As a matter of fact, there are few external differences between these two cards. The only difference that catches the eye is the fan with numerous blades. Of course, the most exciting things are hidden beneath the cooler and we removed it to have a look at the new card.

The power circuit of the Radeon HD 4850 is astonishingly simple despite the declared power consumption of 110W. Like on the previous card, the GPU voltage regulator is based on the dual-phase PWM controller uP6201 from uPI Semiconductor – we have seen this chip on other products from ATI. The load-bearing section consists of eight Infineon OptiMOS 3 transistors, four in each phase. So, there are no differences from the Radeon HD 3850 here. Notwithstanding the two phases only, you shouldn’t worry about the power circuit. Experiments with extremely overclocked Radeon HD 3870 cards proved that this circuit can easily cope with loads much higher than 150W. A separate regulator with a uPI uP6101 controller and two power transistors is responsible for the memory. Having familiar components to deal with, enthusiasts are sure to attempt to modify the power circuit for the purpose of overclocking. Perhaps one of our upcoming reviews will be dedicated to extreme overclocking of the ATI Radeon HD 4850.

The power circuit is equipped with only one external 6-pin PCI Express 1.0 power connector. Coupled with the PCI Express slot, this is enough to feed the Radeon HD 4850 whereas the ATI Radeon HD 4870 has two external power plugs and features a more advanced multiphase power circuit.

GDDR5 memory remains behind the scenes for now because the Radeon HD 4850 comes with ordinary GDDR3. Eight 512Mbit chips (16Mbit x 32, Qimonda HYB18H512321BF-10) make up a 512MB local memory bank. This amount is in fact the required minimum for today. Fortunately, ATI’s solutions use their memory efficiently as opposed to Nvidia’s, so we shouldn’t expect performance hits in the most demanding games. Curiously, ATI returned to the traditional L-shaped placement of the memory chips in the new card instead of placing them in a semicircle around the GPU.
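The memory configuration can be verified with a few lines of arithmetic. The sketch assumes that the “16M x 32” organization means 16 x 2^20 addressable words of 32 bits per chip and that each of the eight chips contributes its full 32-bit interface to the memory bus.

```python
CHIPS = 8
WORDS_PER_CHIP = 16 * 2**20   # assumption: "16M" means 16 x 2^20 words
BITS_PER_WORD = 32            # width of each chip's data interface

chip_mbit = WORDS_PER_CHIP * BITS_PER_WORD // 2**20   # capacity of one chip, Mbit
total_mbyte = CHIPS * chip_mbit // 8                  # capacity of the card, MB
bus_width_bits = CHIPS * BITS_PER_WORD                # combined memory bus width

print(f"per chip: {chip_mbit} Mbit")
print(f"total:    {total_mbyte} MB")
print(f"bus:      {bus_width_bits} bits")
```

Eight such chips give both the 512MB capacity and a 256-bit memory bus.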

The memory voltage is 2.0V. The -10 suffix denotes an access time of 1.0 nanoseconds and a rated frequency of 1000 (2000) MHz. The card’s memory frequency is 993 (1986) MHz, providing a memory bandwidth of 64GBps. Not much if compared with Nvidia’s new solutions, but having a wide memory interface is not enough. The key is in using it effectively. This point has already been proved by the ATI Radeon HD 3800 that was no worse than the Radeon HD 2900 despite having a memory bus half as wide. And the RV770 uses its memory bandwidth more efficiently than the RV670. So, the Radeon HD 4850 is unlikely to suffer from a lack of memory bandwidth.
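As a rough cross-check of the bandwidth figure quoted above, peak bandwidth follows from the bus width and the effective data rate. The sketch below assumes that each of the eight x32 chips contributes its full 32 bits to the bus (256 bits in total) and counts in decimal gigabytes; the function name is made up for illustration.

```python
def memory_bandwidth_gb_s(chips: int, bits_per_chip: int, effective_mhz: float) -> float:
    """Peak memory bandwidth in decimal GB/s."""
    # Assumption: every chip sits on its own slice of the bus,
    # so the total width is simply chips * bits_per_chip.
    bus_width_bits = chips * bits_per_chip
    bytes_per_transfer = bus_width_bits / 8
    return bytes_per_transfer * effective_mhz * 1e6 / 1e9


# Eight x32 GDDR3 chips at an effective 1986 MHz, as on the Radeon HD 4850.
bandwidth = memory_bandwidth_gb_s(chips=8, bits_per_chip=32, effective_mhz=1986)
print(f"{bandwidth:.1f} GB/s")
```

Under these assumptions the result lands just under the rounded 64GBps figure.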

The RV770 die looks large. Indeed, the 956-million-transistor core can’t be small. On the other hand, it measures 270 sq. mm which is only 37% larger than the previous-generation RV670 core (190 sq. mm). Moreover, the RV770 is more complex but smaller than Nvidia’s 55nm G92b core. This is the result of the architectural optimizations coupled with the 55nm tech process. You can only wonder how much smaller the RV770 is if compared with the Nvidia GT200 which not only incorporates 1.4 billion transistors but is also manufactured on a 65nm tech process.

As opposed to the RV670, the RV770 has a protective metallic frame that prevents the cooler from misaligning and damaging the GPU die. The core is marked in an incomprehensible way. Most of it is occupied by a Radeon logo. The rest of the marking is a mysterious set of symbols plus the manufacturing date.

The ATI Radeon HD 4800 is not expected to come in cut-down configurations. Every subunit is enabled in the core: 160 superscalar execution modules with five ALUs in each, 10 large texture processors equivalent to 40 TMUs, and four raster back-ends equivalent to 16 classic ROPs. The GPU frequency is 625MHz. Lower than that of the ATI Radeon HD 3850, but the difference is made up by the new card’s having more TMUs and shader processors as well as featuring various architectural improvements.
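The unit counts and clock rate above translate into theoretical peak rates with simple arithmetic. The sketch below is only an estimate: it assumes one texel per TMU and one pixel per ROP per clock, and two floating-point operations (one multiply-add) per ALU per clock. None of these per-clock rates are stated in this article.

```python
CORE_MHZ = 625          # GPU frequency of the Radeon HD 4850
SHADER_UNITS = 160      # superscalar execution modules
ALUS_PER_UNIT = 5
TMUS = 40
ROPS = 16
FLOPS_PER_ALU_PER_CLOCK = 2  # assumption: one multiply-add per clock

alus = SHADER_UNITS * ALUS_PER_UNIT

# Assumption: each TMU/ROP handles one texel/pixel per clock.
texel_rate = TMUS * CORE_MHZ / 1000   # gigatexels per second
pixel_rate = ROPS * CORE_MHZ / 1000   # gigapixels per second
gflops = alus * FLOPS_PER_ALU_PER_CLOCK * CORE_MHZ / 1000

print(f"ALUs: {alus}")
print(f"Texel fillrate: {texel_rate} GTexels/s")
print(f"Pixel fillrate: {pixel_rate} GPixels/s")
print(f"Shader throughput: {gflops} GFLOPS")
```

These are upper bounds only; real workloads rarely keep all five ALUs of every execution module busy.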

Besides that, the core incorporates the CrossFireX interface logic (to support multi-processor subsystems including up to four Radeon HD 4800 cards), display controllers (CRT, DVI, HDMI, DisplayPort) and the UVD 2 video-processor. The latter supports the capabilities described by BD profiles 1.1 and 2.0, particularly the decoding of two video streams necessary for such features as Picture-in-Picture. The execution section of the RV770 chip is utilized for laying one picture on top of the other. The functionality of the integrated audio core is enhanced. It can now output eight-channel sound in 24bit/192kHz format and supports Dolby TrueHD and DTS-HD, being compliant with the HDMI 1.3 specification. Nvidia’s solutions can’t match that as yet.

The left part of the PCB is almost empty. The card is equipped with two dual-link DVI-I ports (with support of display resolutions up to 2560x1600), a standard mini-DIN port and a couple of CrossFireX connectors.


Cooler

The cooling system of the reference Radeon HD 4850 raises our apprehensions as it resembles the cooler of the Radeon HD 3850 although the new card should have much higher heat dissipation. Nvidia’s blunder when the company installed too weak a cooler on early GeForce 8800 GT 512MB is well known to everyone. Is the ATI Radeon HD 4850 safe? We’ll discuss the temperature factor shortly. Right now let’s have a closer look at the cooler.

In fact, the only differences from the Radeon HD 3850 cooler are the multi-blade fan and the different shape of the heatsink. Otherwise, it is the same single-slot design that seems to be copper but is actually mostly anodized aluminum. You can scratch the base to make sure of that. The only copper things in it are the core that contacts with the GPU die and the heat pipe in the base. The heatsink consists of thin aluminum plates and is quite large but it has to dissipate up to 110W of heat. The modified shape of the ribs is meant to help the heatsink do its job well: the ribs are directed away from the mounting bracket, exhausting the hot air towards the side panel of the system case where a vent grid is usually located.

The cooler is equipped with a new fan that has as many as 19 blades. This should increase its static pressure and cooling performance. The fan uses a 4-pin connection with a tachometer and PWM-based speed regulation that has become traditional for all modern graphics cards. The noise parameters of the cooler will be discussed in the next section.

The underside of the cooler is quite ordinary. The positions of the pads for the memory chips are different from the Radeon HD 3850. The elastic pads are the same, though. You can also see such pads at the places of contact between the cooler and the power transistors and inductors of the power circuit. The base forms a small needle-shaped heatsink on the face side against this spot. Dark-gray thermal paste serves as the thermal interface between the GPU’s copper core and the cooler.

The cooler is fastened firmly. Besides four poles with spring-loaded screws and a metallic X-shaped back-plate, it is secured with eight additional screws on the PCB. The metallic frame on the GPU provides additional protection.

The cooler of the Radeon HD 4850 card is quite good overall but it may not be enough for a card with a power consumption of 110W. We’ll check this out right now.


Power Consumption, Temperature, Noise and Overclocking

The ATI Radeon HD 4850 is declared to have a peak power consumption of 110W. We checked this out using our special testbed with the following configuration:

We tried to use 3DMark Vantage to create a 3D load but this benchmark proved to load the card less than the first SM3.0/HDR test from 3DMark06. That’s why we adhered to our earlier methodology and tests. As usual, we ran the test at 1600x1200 with 4x FSAA and 16x AF. The Peak 2D mode is emulated by means of the 2D Transparent Windows test from PCMark05.

Here are the results:



The 3D mode result coincides with the number published by ATI Technologies. 110W is quite a lot, but the RV770 consists of as many as 956 million transistors. The Radeon HD 4850 is obviously an economical product. It consumes less power than the GeForce 9800 GTX that uses a less complex GPU with more modest specs. And it is far superior to the ATI Radeon HD 3870 X2 and Nvidia’s GT200-based solutions in this respect. The power consumption in the 2D and Peak 2D modes is quite high in comparison with the ATI Radeon HD 3870 but still reasonable. It is roughly comparable to that of the Nvidia GeForce 9800 GTX.

The load distribution is just what you could expect. The external PCI Express 1.0 connector is under the highest load in 3D mode – almost reaching 75W, the maximum allowable load. We wouldn’t be surprised to see pre-overclocked versions of Radeon HD 4850 come with an 8-pin connector or with two 6-pin ones (like the Radeon HD 4870).
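The power budget behind this observation can be sketched with a few lines of arithmetic. The 75W limit for the 6-pin connector comes from the text above. The 75W limit for the PCI Express x16 slot is an assumption taken from the PCI Express specification, not from our measurements.

```python
SIX_PIN_LIMIT_W = 75   # maximum load of the external 6-pin connector
SLOT_LIMIT_W = 75      # assumption: PCI Express x16 slot limit per specification
BOARD_POWER_W = 110    # peak 3D power consumption of the Radeon HD 4850

# Total power the card may draw without exceeding either limit.
budget = SLOT_LIMIT_W + SIX_PIN_LIMIT_W
headroom = budget - BOARD_POWER_W

# If the 6-pin connector is loaded to its limit, the slot supplies the rest.
slot_draw_at_full_connector = BOARD_POWER_W - SIX_PIN_LIMIT_W

print(f"Budget: {budget} W, headroom: {headroom} W")
print(f"Slot draw with the connector at its limit: {slot_draw_at_full_connector} W")
```

The total budget still leaves some headroom, but with the external connector already close to its own limit, any extra load from overclocking would have to come from the slot or from an additional connector.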

Notwithstanding the modest cooler, the card is not too hot. The GPU temperature is 62°C when idle and 86°C under 3D load. Well, the temperature is high under load, especially as we tested the card with the side panel of the system case removed. The card was very hot to the touch. In a cramped system case or under a higher ambient temperature the GPU may easily get even hotter. It is clear that the reference cooler of the Radeon HD 4850 is not meant for overclocking, but replacing it with something better shouldn’t be a problem. Thanks to the matching mounting holes you can use any cooler that supports Radeon HD 3870/3850 cards.

Next we measured the level of noise produced by the card with a digital sound-level meter Velleman DVM1326 using A-weighting. The level of ambient noise in our lab was 36dBA and the level of noise at a distance of 1 meter from the working testbed with a passively cooled graphics card inside was 43dBA (it is due to the Enermax Galaxy DXX EGX1000EWL power supply, which is not very quiet). Here are the results:

The new 19-blade fan affects the noise parameters of the new card, especially at a close distance, yet the card is barely audible among the other system components. If you’ve got a quiet system configuration, you may want to replace the reference cooler with something quieter, such as Zalman VF1000.

We tried to overclock the card using the appropriate option of the Catalyst Control Center. We managed to increase the GPU frequency to the highest permitted value – 700MHz. The memory chips sped up to 1100 (2200) MHz without losing stability. We didn’t test the card at the overclocked frequencies because the frequency gain was small and we didn’t have quite enough time for that.
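To put the frequency gain in numbers, the relative increase can be computed from the stock clocks given earlier (625MHz for the GPU, 993MHz for the memory). The helper name below is made up for illustration.

```python
def gain_percent(stock_mhz: float, overclocked_mhz: float) -> float:
    """Relative frequency increase in percent."""
    return (overclocked_mhz - stock_mhz) / stock_mhz * 100


gpu_gain = gain_percent(625, 700)    # GPU: 625 MHz -> 700 MHz
mem_gain = gain_percent(993, 1100)   # memory: 993 MHz -> 1100 MHz

print(f"GPU: +{gpu_gain:.1f}%, memory: +{mem_gain:.1f}%")
```

Both gains come out at roughly a tenth of the stock frequency.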

We found no compatibility issues between the new card and mainboards. The ATI Radeon HD 4850 successfully started up on our PCI Express 1.0a mainboards as well as on a modern Intel X38 based mainboard with support of PCI Express 2.0.


Testbed and Methods

To study the theoretical potential of the new ATI Radeon HD 4850 and to compare it against previous generation ATI and Nvidia graphics accelerators we assembled the following testbeds:

According to our testing methodology, the drivers were set up to provide the highest possible quality of texture filtering and to minimize the effect of software optimizations used by default by both AMD/ATI and Nvidia. Also we enabled transparent texture filtering. As a result, our ATI and Nvidia driver settings looked as follows:

ATI Catalyst:

Nvidia GeForce:

To study the theoretical performance of our today’s testing participants we used the following applications:


Performance in Synthetic Benchmarks

Fillrate

“Pure” fillrate of ATI Radeon HD 4850 is lower than that of ATI Radeon HD 3870. This can be explained by the lower graphics core frequency and, as a result, lower RBE/ROP frequency as well as lower memory bandwidth.

When it comes to Z-buffer operations, we see a definite performance increase. Even though RV770 has twice the Z processing units, the performance numbers do not double, as you may have expected them to. The performance gain is about 13% - not quite what we expected from the newcomer. However, taking into consideration lower core frequency of the 4850 model compared with 3870, the modest performance gain we see is quite natural.

From 2 textures on, ATI Radeon HD 4850 starts showing its advantages over the rivals, which is definitely the result of ATI engineers’ optimizations. Only with four textures does the 4850 fall slightly behind Nvidia GeForce 9800 GTX, but even in this case the lag is less than 10%.

As you can see, raster processors are no bottleneck for the new ATI Radeon HD 4800: even the youngest model in this lineup can compete on equal terms with Nvidia GeForce 9800 GTX, despite lower core frequency.


Pixel Shaders and Physics

Marko Dolenc’s Fillrate Tester doesn’t support even Shader Model 3.0, but is well suited for analyzing graphics architecture performance with older shader code versions. The results obtained in this benchmark may be pretty valuable for those who still play old games. Besides, they give a more or less good idea of the GPU potential.

With simple shaders ATI Radeon HD 4850 yielded not only to Nvidia GeForce 9800 GTX but also to ATI Radeon HD 3870. However, since the speed of shader model 1.1 and 2.0 processing has been stumbling upon raster units performance for quite some time now (in this benchmark), the results are hardly surprising considering what we have just seen in fillrate tests. Pretty low performance of ATI Radeon HD 4850 in this shader model is very unlikely to affect the real gaming performance in any way.

As for the per-pixel lighting shader, the results here are quite logical considering the increased mathematical potential of the new ATI RV770.

Shader Particles test from 3DMark06 suite is not a fully-fledged graphics test as it emulates a physical model of massive particle systems’ behavior. Collision calculations are performed using pixel shaders and the result is displayed on the screen using vertex shader texture samples. Nevertheless, this test can measure the mathematical GPU performance just fine.

As you see, ATI Radeon HD 4850 demonstrates a 50% advantage over Nvidia GeForce 9800 GTX and almost 70% higher speed than ATI Radeon HD 3870 with half the mathematical capacity of the ATI Radeon HD 4850.

Similar test from 3DMark Vantage suite for some reason doesn’t reveal any significant advantage of the new ATI architecture. However, ATI Radeon HD 4850 is at least as fast as Nvidia GeForce 9800 GTX. Looks like only every fifth ALU of our RV770 is really working in this test, but it turns out more than enough for results parity.

You all know Perlin Noise test from 3DMark06 suite as a “maximum” test for Shader Model 3.0 hardware. It generates a texture using 48 texture samples and 447 mathematical instructions, which is the maximum SM 3.0 hardware can handle.

Thanks to 800 streaming processors, ATI RV770 doesn’t disappoint us here: ATI Radeon HD 4850 is almost twice as fast as ATI Radeon HD 3870 and 70-75% faster than Nvidia GeForce 9800 GTX.

3DMark Vantage POM test shows stable advantage of our ATI Radeon HD 4850 over previous generation single-processor graphics cards. New ATI solution is almost twice as fast as its predecessor and 50% faster than Nvidia GeForce 9800 GTX at displaying the complex landscape with parallax occlusion mapping method.

Shader Math test is a slightly more complex version of Perlin Noise from 3DMark06. So, no wonder that ATI Radeon HD 4850 is also far ahead of the competitors here.

X-bit Mark test still illustrates remarkably well both the mathematical performance of GPUs and their architectural success.

As you see, ATI Radeon HD 4850 is an indisputable winner in every subtest, including those where ATI solutions had never come even close to Nvidia GeForce 9800 GTX.

We can see a definite performance advantage over Nvidia GeForce 9800 GTX in all benchmarks. It is twice as fast in NPR shader with 10 texture samples, and more than twice as fast in shaders using complex calculations with loops and conditional branching. We didn’t have any doubts about the computational potential of the new ATI RV770 from the very beginning, but now we see for sure that the former curse of all ATI Radeon HD solutions – slow texture processors – has vanished without a trace. As a result, the new ATI Radeon HD generation deals brilliantly not only with pure mathematics, but also defeats Nvidia solutions in their own element.


Geometry Performance

Simple Vertex Shader test from 3DMark06 is too simple, but at the same time it can beautifully show some peak geometry performance. The test is hardly suitable for estimating the graphics architecture scalability, but ATI Radeon HD 4850 is the winner here.

As the resolution increases, ATI Radeon HD 3870 slows down. ATI Radeon HD 4850, in its turn, starts off slow and at 1280x1024 yields to its predecessor. However, by 1920x1200 it almost catches up with the 3870. We still don’t know why RV670 behaves like that here.

This benchmark estimates the graphics architecture potential during vertex and geometry shader processing and becomes the first benchmark where ATI Radeon HD 4850 falls far behind Nvidia GeForce 9800 GTX, although at the same time it outperforms the previous generation ATI Radeon HD. It is pretty strange, even assuming that 4/5 of ATI Radeon HD 4850 computational capacity is idling because of poor drivers or even 3DMark Vantage optimization. 160 active shader processors should be more than enough for the solution to successfully compete against Nvidia GeForce 9800 GTX.

All in all, the performance of ATI Radeon HD 4850 is pretty logical and speaks for itself. And first of all it indicates that the new ATI architecture is almost free from any bottlenecks except a few doubtful results in several tests. Do these bottlenecks still exist? Only a test session in real gaming applications will answer this question, but it is going to be a topic for another article.

Although the computational capacity of the new RV770 has increased significantly, ATI Radeon HD 4850 behaves very strangely in Xbitmark geometry test: it suddenly yielded to ATI Radeon HD 3870. The defeat was most dramatic in scenes with 8 light sources. But even in this case, it ran faster than Nvidia GeForce 9800 GTX.


FSAA Quality and Performance

As we have already mentioned in our article called Highly Defined: ATI Radeon HD 2000 Architecture Review, although new anti-aliasing techniques using narrow tent and wide tent post-filters improve anti-aliasing of small details, they create a washed-out effect for the entire picture. In some cases it may result in unsatisfactory image quality. According to ATI, the new edge detect CFAA algorithm makes it possible to avoid the washed-out effect and retain excellent anti-aliasing quality of small objects.

To find out how true this statement is, we checked the anti-aliasing quality of contemporary ATI and Nvidia solutions in a blitz test session. For ATI we used traditional MSAA modes as well as multi-sampling mode with edge-detect filter. For Nvidia GeForce 9800 GTX we took screenshots for all MSAA and CSAA modes including 16xQ.

ATI Radeon HD 4850 
MSAA 4x

Nvidia GeForce 9800 
MSAA 4x

Nvidia GeForce 9800 
MSAA 4x + CSAA (CSAA 8x)

Both cards provide sufficient 4x multisampling quality to satisfy most gamers out there, although anti-aliasing of small objects and details could have been a little better.

We should note that CSAA 8x, which is supported by Nvidia GeForce 8/9 hardware, has nothing to do with true multi-sample AA 8x. In fact, this is one of the coverage-sampled antialiasing modes that stores only 4 color/Z samples, but uses 8 color lookups within its coverage sample. As you can see from the screenshot, this mode is practically the same as the regular MSAA 4x in quality. At least, it is extremely difficult to notice any differences in ideal conditions, not to mention a real game.

ATI Radeon HD 4850 
MSAA 4x + edge detect filter 
(FSAA 12x)

Nvidia GeForce 9800 
MSAA 4x + CSAA 
(CSAA 16x)

ATI’s edge detect CFAA 12x mode shows excellent results, outperforming MSAA 4x in anti-aliasing quality. You can clearly see it on the columns in the screenshot from TES IV: Oblivion. Adaptive filter does its job and the edges do indeed look much neater, without any washed-out effect. Now we have to find out how resource-hungry this mode is and if it can be used in games without any performance losses that could threaten comfortable gaming.

In the meanwhile, CSAA 16x mode turns out closer to the classical MSAA 4x than to MSAA 8x, because “16” in its name refers only to the coverage resolution, just like the “8” in CSAA 8x. Polygon edges are detected better, however the final pixel color can be calculated more precisely only with fully-fledged MSAA 8x. So, the anti-aliasing quality is much higher in the latter case.
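The relationship between the mode names and what is actually stored per pixel can be summarized in a small table. The sketch below only records the sample counts described in the text; the class and helper names are hypothetical, and for plain MSAA the coverage count is assumed to equal the number of color/Z samples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AAMode:
    name: str
    color_z_samples: int    # full color/Z samples stored per pixel
    coverage_samples: int   # coverage resolution (the number in a CSAA name)


MODES = [
    AAMode("MSAA 4x", 4, 4),          # assumption: coverage equals color samples
    AAMode("CSAA 8x", 4, 8),
    AAMode("CSAA 16x", 4, 16),
    AAMode("MSAA 8x (8xQ)", 8, 8),    # assumption: coverage equals color samples
]


def same_color_precision(a: AAMode, b: AAMode) -> bool:
    """Two modes resolve pixel color from the same number of stored samples."""
    return a.color_z_samples == b.color_z_samples


for mode in MODES:
    print(f"{mode.name}: {mode.color_z_samples} color/Z, "
          f"{mode.coverage_samples} coverage")
```

Grouping the modes by stored color/Z samples rather than by the number in their names puts CSAA 8x and CSAA 16x next to MSAA 4x, which matches what the screenshots show.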

ATI Radeon HD 4850 
MSAA 8x

Nvidia GeForce 9800 
MSAA 8x (CSAA 8xQ)

In MSAA 8x mode anti-aliasing is way better although it is hard to notice at first, especially in real games rather than on static screenshots.

Nvidia’s mode called “8xQ”, which is indeed a classical MSAA 8x, provides the same great anti-aliasing quality as the similar mode supported by ATI Radeon HD.

ATI Radeon HD 4850 
MSAA 8x + edge detect filter 
(FSAA 24x)

Nvidia GeForce 9800 
MSAA 8x + CSAA 
(CSAA 16xQ)


Although ATI’s edge detect 24x mode uses the same number of color and Z samples, it demonstrates much better anti-aliasing quality than Nvidia GeForce’s CSAA 16xQ. The “smart” post-filter allows it to ensure practically ideal anti-aliasing quality, but at what cost?

Unfortunately, the price is way too high: enabling edge detect FSAA 12x cuts the performance of ATI Radeon HD 4850 in half, and enabling 24x mode makes the performance 4 times slower. At the same time, the owners of multi-GPU systems with ATI Radeon HD 4850/4870 will certainly appreciate super-quality FSAA mode. With three or four GPUs in the system the performance will rise to acceptable level and the image quality will be truly unmatched.

Nevertheless, it makes perfect sense to enable MSAA 8x on ATI Radeon HD 4850 even in 1920x1200 (at least it is true for Half-Life 2 Episode 2). However, the same mode on Nvidia GeForce 9800 GTX pushes the average performance level close to the minimum.
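The multi-GPU remark above can be illustrated with a back-of-the-envelope model. The cost factors are the ones quoted in this section (edge detect 12x halves performance, 24x cuts it to a quarter), taken here as relative to the same card without the filter. The scaling model is a simplifying assumption: each additional GPU adds a fixed fraction of one GPU’s performance, and real CrossFireX scaling varies by game. The function and dictionary names are made up for illustration.

```python
# Relative performance of one card with the given filter enabled.
COST = {"edge detect 12x": 0.5, "edge detect 24x": 0.25}


def gpus_to_match_baseline(mode: str, scaling_efficiency: float = 1.0) -> int:
    """Smallest GPU count whose combined speed reaches the unfiltered baseline.

    Assumption: GPU number n contributes `scaling_efficiency` of a full GPU.
    """
    if scaling_efficiency <= 0:
        raise ValueError("scaling_efficiency must be positive")
    gpus = 1
    while COST[mode] * (1 + (gpus - 1) * scaling_efficiency) < 1.0:
        gpus += 1
    return gpus


for mode in COST:
    print(mode, "->", gpus_to_match_baseline(mode), "GPUs with ideal scaling")
```

Even with ideal scaling the 24x mode needs a full four-card setup to get back to single-card performance, so it is realistic only for the most extreme multi-GPU systems.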


Conclusion

The third generation of DirectX 10 compatible graphics accelerators from ATI has finally seen the light of day. Someone will call it another evolutionary step, someone will regard the innovations and higher performance as a clear indication of a revolution in image quality and GPU market. So, it is high time we summed up everything we have discussed today in our theoretical study of ATI Radeon HD 4800 and made some conclusions.

ATI RV770 evolved from ideas that were first brought up back in R520/R580, implemented in R600 and RV670, and perfected almost to ideal in the third generation of ATI’s DirectX 10 compatible processors. It is important to understand that R600→RV770 evolution is a logical process combining higher capacities and thorough architectural optimizations. In other words, ATI didn’t just add some muscle to its graphics processor, but made sure it looked good and worked efficiently, so that it didn’t make an impression of a steroid-pumped athlete.

According to the results of our theoretical benchmarks, this smart approach provided ATI Radeon HD 4850 with a much greater potential than that of Nvidia GeForce 9800 GTX at the same level of heat dissipation, considerably simpler and more compact design and, most importantly, low recommended price of $199 (thanks to a relatively small die size). Just for your information, when Nvidia GeForce 9800 GTX launched, its MSRP equaled $349 with much less opportunity for price maneuvering. So, when the price of this bulky and complex graphics accelerator dropped down dramatically, Nvidia and partners were hardly benefiting from it.

However, the fact that ATI RV770 has 956 million transistors while Nvidia G92 has 754 million indicates much higher complexity of the newcomer. Looks like AMD indeed put to good use DirectX 10.1 support, hardware tessellator, built-in audio-controller and other innovations.

ATI/AMD bet on the mass consumer, which seems to be a very smart and strategically far-sighted move. Especially since current statistical data demonstrate lowering of average selling price for graphics adapters despite the slowly growing supply volumes: by the end of 2007 the price was $169, and by the end of Q1 - $145! In this respect $199 seems to be an optimal price for ATI Radeon HD 4850, since its simple design and relatively low production cost may give the developer quite a bit of a margin for further price reduction.

Besides serious gaming potential, simple design and low price, ATI Radeon HD 4850 graphics accelerators boast a number of unique qualities that their main competitor lacks. Namely, they support DirectX 10.1, VC-1 decoding, advanced BD profiles and have a fully-functional audio core capable of outputting multi-channel HD sound. This makes ATI Radeon HD 4850 an ideal solution for a digital media center with advanced multi-media features and excellent gaming performance.

So, it looks like ATI has every chance to come back and regain its influence in the desktop discrete graphics market. However, we will be able to make a final conclusion only once we test these new solutions in the popular gaming applications. Only then will we have a complete objective picture of the ATI Radeon HD 4850 performance.

Power Color Radeon HD 4850 512MB: Summary

Power Color HD 4850 (model AX4850 512MD3-H) graphics card appears to be a very good adapter overall and an excellent choice for its money. Let’s summarize its highs and lows.

Highs:

Lows: