Overview of AMD RDNA 4 architecture: the red punch

In previous articles, we’ve discussed the NVIDIA RTX 3000 and 4000 video card architectures, as well as RTX 5000 Blackwell. We even managed to inspect Intel Xe 2 Battlemage, but there was still one more «red» competitor left. So it’s time to complete this GPU review gestalt with AMD’s latest, fourth-generation RDNA architecture (RDNA 4).

RDNA 4 architecture

AMD has been using the RDNA architecture since 2019, when the RX 5700 series graphics cards were released. The first and second generations of RDNA were made on a monolithic die. The third generation, like modern AMD Ryzen processors, used a chiplet layout in its higher-end chips: the graphics die and the memory cache dies could be made on different processes. With the release of RDNA 4, AMD returned to a monolithic layout. The exact reasons for the return are not known. Time will tell whether the new AMD processors will also be monolithic.

According to AMD, the new architecture provides better support for Ray and Path Tracing, in-memory data compression, Machine Learning assistance for image rendering, and improved programmable shaders.

The Compute Unit (CU) has been the main building block for graphics and AI computing in AMD Radeon graphics cards for many years. Visually, its block diagram is still more confusing at first glance than those of competitors, Intel’s included.

Dual SIMD32 vector units (Dual SIMD Vector Units) consist not of two parts, as you might think, but of three: two ALUs (Arithmetic Logic Units), each of which processes FMA or FMA/INT operations, and one TLU (Transcendental Logic Unit), which computes transcendental functions such as sine, cosine, logarithm, and exponent.
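
To make the split between the two kinds of work more concrete, here is a minimal Python sketch. It is only an illustration, not AMD's implementation: the names `WAVE_SIZE`, `fma_lanes`, and `transcendental_lanes` are hypothetical, and the multiply-add is emulated as `a * b + c`, which rounds twice, while a hardware FMA rounds once.

```python
import math

WAVE_SIZE = 32  # assumption: one value per lane of a SIMD32 unit


def fma_lanes(a, b, c):
    """Per-lane multiply-add a*b + c, the kind of work an ALU does.

    Emulated with two roundings; a real FMA instruction rounds once.
    """
    return [x * y + z for x, y, z in zip(a, b, c)]


def transcendental_lanes(values, func=math.sin):
    """Per-lane transcendental function (sin, exp, log, ...), the TLU's kind of work."""
    return [func(v) for v in values]


lanes = [i / WAVE_SIZE for i in range(WAVE_SIZE)]
fma_out = fma_lanes(lanes, [2.0] * WAVE_SIZE, [1.0] * WAVE_SIZE)
sin_out = transcendental_lanes(lanes)

print(len(fma_out), fma_out[16], sin_out[0])
```

The point of the sketch is that the two functions are different kinds of operations: the first is plain arithmetic, the second needs a dedicated approximation unit in hardware.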

In the RTX 5000, NVIDIA has combined functionally similar units into one.

The scalar unit consists of the following elements: a register file and special registers, an internal ALU for mathematical operations, a unit for calculating nonlinear floating-point and fixed-point functions, and a data type conversion mechanism. With the new architecture, the Scalar Unit now supports operations with the Float32 type.

This file may be of interest to some expert commentators, or simply help with general understanding.

The cache (highlighted in red) is split between two structures: the Scalar Cache (16 KB) and the Shader Cache for instructions (32 KB). Looking ahead, the other cache levels have also been improved: the third-generation Infinity Cache is 64 MB, the L2 cache is 8 MB, and the total CU cache is now 2 MB.

The scheduler (highlighted in blue) distributes the load between the blocks described above and below. It has received split and named barriers, accelerated spill/fill operations, and improved instruction prefetching.

Additionally, memory handling has been improved (queue delays are reduced, which should improve ray tracing performance), a dynamic register allocator has been added (it hides memory latency better, with potential performance gains for the whole shader core), and the efficiency of the CU itself has been increased.

NVIDIA and Intel each have a dedicated unit that sorts and dispatches shaders for better execution and data locality: NVIDIA has Shader Execution Reordering (updated to version 2.0 with the Blackwell architecture), and Intel has the Thread Sorting Unit. This raises a question to which I cannot give a precise answer: «Do the described AMD technologies replace analogs from competitors?»

The question «Which option is more productive?», on the other hand, can be answered: NVIDIA wins in terms of overall performance.

The Raytracing Accelerator has twice as many Ray Intersection blocks (the familiar Box and Triangle Intersections), improved BVH compression (the size is reduced by 60% compared to RDNA 3), and the new Oriented Bounding Boxes technology, which reduces the number of ray intersection tests inside 3D objects. In general, ray tracing has become twice as fast as in RDNA 3.
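
For readers who have not met a box intersection before, here is a minimal Python sketch of the textbook «slab» test for a ray against an axis-aligned box. This is an assumption-level illustration of what a Box Intersection block answers, not AMD's hardware algorithm; the function name `ray_hits_box` is hypothetical.

```python
def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: does a ray starting at `origin` hit the axis-aligned box?

    All arguments are 3-component tuples. The ray is origin + t * direction, t >= 0.
    """
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the interval of t values that lie inside every slab so far.
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return False
    return True


box = ((1.0, -1.0, -1.0), (2.0, 1.0, 1.0))
print(ray_hits_box((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), *box))   # ray pointing at the box
print(ray_hits_box((0.0, 0.0, 0.0), (-1.0, 0.0, 0.0), *box))  # ray pointing away
```

A BVH traversal runs many such tests per ray. A box that is rotated to fit the object more tightly produces fewer false hits, which is the idea behind oriented bounding boxes.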

The AI Accelerator is responsible for working with artificial intelligence. It now supports more data types and is faster with the already known ones: twice as fast in FP16 and four times as fast with Structured Sparsity; four times as fast in INT8 and eight times as fast with Sparsity. Support for FP8 and ML Super Resolution has also been added.
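
The idea behind structured sparsity can be sketched in a few lines of Python. The article does not say which pattern the hardware uses, so the example assumes the common 2:4 scheme (at most two non-zero weights in every group of four); the function names are hypothetical.

```python
def prune_2_of_4(weights):
    """Keep the two largest-magnitude values in each group of four, zero the rest.

    Assumption: a 2:4 pattern; the length of `weights` must be a multiple of 4.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    pruned = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two values with the largest absolute value.
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        pruned.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return pruned


def sparse_dot(pruned_weights, activations):
    """Dot product that skips zeroed weights, i.e. does half the multiplications."""
    return sum(w * a for w, a in zip(pruned_weights, activations) if w != 0.0)


weights = [0.1, -0.9, 0.5, 0.2, 1.0, 2.0, 3.0, 4.0]
pruned = prune_2_of_4(weights)
print(pruned)
print(sparse_dot(pruned, [1.0] * len(pruned)))
```

Because exactly half of the weights are known to be zero, the hardware can skip half of the multiplications, which is where the doubled throughput figures come from.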

This block is what runs the new FSR 4. Well, that’s a logical explanation for the lack of support on older video cards. Or is it? Read more about FSR in the section below.

A single CU won’t do much, so AMD has gathered as many as 64 of them «under one wing» in NAVI 48. AMD groups CUs into Shader Engines. The chip is manufactured at TSMC fabs using 4 nm technology, has 53.9 billion transistors, and has a die area of 356.5 mm². For comparison, the die area of the RTX 5090 (GB202) is 750 mm², and that of the RTX 5080 / 5070 Ti (GB203) is 378 mm². The RTX 5070 is even smaller at 263 mm².

The chip won’t sell itself, so AMD is putting it into two new-generation graphics cards. The first is the flagship (no plans to release a better solution have been heard of) AMD Radeon RX 9070 XT with the full NAVI 48, and the second is the RX 9070 with 8 CUs disabled. If you want, you can check out the editorial reviews of the Asus AMD Radeon RX 9070 XT 16GB Prime OC and the Gigabyte Radeon RX 9070 GAMING OC. Watching the video below, it becomes clear that AMD is one generation behind NVIDIA in terms of ray tracing.

Here is a table comparing the characteristics of two generations of AMD graphics cards: RDNA 3 and RDNA 4. Comparing them, we see a situation similar to Intel’s: the chip has become smaller, the transistor density has increased, and the performance has remained at the same level or even slightly increased. It seems that AMD and Intel have teamed up and used the «forbidden ~~street~~ magic of Optimization».

Specification	RX 7900 XTX	RX 7900 XT	RX 9070 XT	RX 9070	RX 7900 GRE
Chip	NAVI 31	NAVI 31	NAVI 48	NAVI 48	NAVI 31
Chip size	529 mm²	529 mm²	357 mm²	357 mm²	529 mm²
Number of transistors	57.7 billion	57.7 billion	53.9 billion	53.9 billion	57.7 billion
CU	96	84	64	56	80
Ray Accelerators	96	84	64	56	80
AI Accelerators	192	168	128	112	160
Frequency	2500 MHz	2400 MHz	2970 MHz	2520 MHz	2245 MHz
TDP	355 W	315 W	304 W	220 W	260 W
Video memory	24 GB GDDR6	20 GB GDDR6	16 GB GDDR6	16 GB GDDR6	16 GB GDDR6
Video memory bandwidth	960 GB/s	800 GB/s	640 GB/s	640 GB/s	576 GB/s
Video memory speed	20 Gbps	20 Gbps	20 Gbps	20 Gbps	18 Gbps
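
Two figures that the table does not list can be derived from it with simple arithmetic: the memory bus width (bandwidth × 8 / per-pin speed) and the transistor density. The Python sketch below does exactly that. The transistor counts are taken in billions, as in the NAVI 48 description above, and the helper names are hypothetical.

```python
# (bandwidth in GB/s, per-pin memory speed in Gbps), copied from the table
MEMORY = {
    "RX 7900 XTX": (960, 20),
    "RX 7900 XT": (800, 20),
    "RX 9070 XT": (640, 20),
    "RX 9070": (640, 20),
    "RX 7900 GRE": (576, 18),
}

# (transistors in billions, die area in mm^2), copied from the table
DIES = {"NAVI 31": (57.7, 529), "NAVI 48": (53.9, 357)}


def bus_width_bits(bandwidth_gb_s, speed_gbps):
    """Bus width implied by bandwidth: GB/s * 8 bits per byte / Gbps per pin."""
    return round(bandwidth_gb_s * 8 / speed_gbps)


def density_mtr_per_mm2(transistors_billion, area_mm2):
    """Transistor density in millions of transistors per mm^2."""
    return transistors_billion * 1000 / area_mm2


for card, (bandwidth, speed) in MEMORY.items():
    print(f"{card}: {bus_width_bits(bandwidth, speed)}-bit bus")

for chip, (transistors, area) in DIES.items():
    print(f"{chip}: {density_mtr_per_mm2(transistors, area):.0f} MTr/mm^2")
```

The calculation gives 384, 320, and 256 bits for the RDNA 3 cards and 256 bits for both RDNA 4 cards, and a density of roughly 109 versus 151 million transistors per mm².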

Games are games, but what about work applications? The editorial review tried rendering a test interior in Realistic Interior Lighting. The result is disappointing: the RX 9070 took 64% longer than the RTX 5070 Ti. It also failed to work with the usual offline neural image generators. The usual reasons for applications not working are a lack of developer support and the fact that AMD’s new machine learning technologies have only just been introduced. It seems that productive ML has only fully arrived with this RX 9000 generation of graphics cards.

AMD FidelityFX Super Resolution (FSR) 4

Following NVIDIA, AMD released its own upscaler back in 2021. Every year the technology improved, gradually raising the quality of upscaled images on users’ monitors. Over these four years, FSR has been added to more than 400 games! The new FSR 4 is already supported in 30+ well-known games and will come to 70+ more this year. This is a truly incredible result that must be respected.

Since we have mentioned FSR 4 Super Resolution, we should explain how it works, as developed by AMD. Each game has its own ML model, which is first optimized on AMD Instinct server accelerators. That’s right, AMD does this itself.

After that, the assembled model is sent to your video card. All developers have to do is add FSR 4 support to the code so that the video card driver can apply the resulting model to each individual game.

Captions from AMD’s presentation slides. Upscaling starts from a resolution four times lower than the output. Apparently, Full HD is used as the input for 4K.
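
The «four times lower» figure is easy to check by counting pixels. A quick Python sketch, assuming the standard 3840×2160 (4K UHD) and 1920×1080 (Full HD) resolutions:

```python
def pixel_count(width, height):
    """Total number of pixels in a frame."""
    return width * height


uhd_4k = pixel_count(3840, 2160)    # output resolution
full_hd = pixel_count(1920, 1080)   # assumed input resolution

ratio = uhd_4k / full_hd
print(uhd_4k, full_hd, ratio)
```

Each side is only halved, but the number of pixels the GPU has to render drops by a factor of four, and the upscaler has to reconstruct the other three quarters.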

Along with the presentation of the new generation of RDNA 4 architecture, a new fourth version of FSR was also shown. However, gamers did not understand one point — the lack of support for older graphics cards, even RDNA 3. If you missed the previous section, I will briefly explain.

Older generations, RDNA 3 included, have too little hardware support for the ML models used by all kinds of neural networks. While NVIDIA was developing this area and capturing the market, AMD was improving old technologies. However, as time has shown, most users want high fps together with a high level of graphics. They also want the game itself to be interesting, but that is not a question for video card developers.

AMD has only just discovered the path of neural rendering, which we wrote about earlier.

However, support for the non-AI scaling of FSR 3.1 in games is not going away. Both versions should complement each other: a user with an older video card will get FSR 3.1, while a new RX 9000 will get FSR 4.

AMD HYPR-RX

NVIDIA’s scaling technology is called DLSS, and so many things were added to it that it was confusing at first. Intel has created its own analog in the form of XeSS 2, which also confused users a bit. It’s a good thing that AMD made the right choice and grouped its technologies under a separate name, HYPR-RX. It includes:

AMD FSR: image scaling;
AMD Fluid Motion Frames 2: frame generator;
AMD Anti-Lag 2: reduces lag when using a keyboard, mouse, or gamepad;
AMD Boost: scaling for RX 6000 graphics cards;
AMD Super Resolution: another scaling technology, only for the RX 5000.

From the slide, it becomes clear that HYPR-RX is an option in the Adrenalin driver menu for quickly enabling the improvements the user needs. It’s supported on RX 5000, RX 6000, RX 7000, RX 9070, and AMD Ryzen AI 300 hybrid APUs. Of course, support for the new RX 9060 will be added later (if it ever comes). Let’s go through the built-in technologies (we have already talked about FSR above).

The main task of AMD Fluid Motion Frames (AFMF) 2 is to generate frames. When fully enabled, it can increase the fps by more than three times. That is, one or two frames will be generated for each rendered one. The improved AFMF 2.1 should definitely work on the new RX 9000 graphics cards, but there are some doubts about previous generations.
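
How generated frames translate into fps can be shown with a small Python sketch. It is an idealized model that ignores the cost of generation itself; the function name `displayed_fps` is hypothetical.

```python
def displayed_fps(rendered_fps, generated_per_rendered):
    """Idealized output frame rate with frame generation.

    Every rendered frame is followed by `generated_per_rendered` generated ones.
    The overhead of generating frames is ignored, so real numbers will be lower.
    """
    if generated_per_rendered < 0:
        raise ValueError("generated_per_rendered must be non-negative")
    return rendered_fps * (1 + generated_per_rendered)


print(displayed_fps(60, 1))  # one generated frame per rendered frame
print(displayed_fps(60, 2))  # two generated frames per rendered frame
```

In this model, one generated frame doubles the frame rate and two triple it. Anything above that has to come from other HYPR-RX features, such as rendering at a lower resolution and upscaling.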

AMD Anti-Lag 2 reduces mouse, keyboard, or gamepad input lag by up to 20% in games like Counter-Strike 2, Apex Legends, and, unexpectedly, Ghost of Tsushima.

There will be more soon.

In general, AMD is doing well at the moment. Its processors are very popular with ordinary users and especially with gamers. Sony PlayStation consoles will obviously remain on AMD. With Xbox, the question remains open (AMD or ARM), but Microsoft has always gone its own way.

The video card market is not doing well right now, and all players in it are having problems. NVIDIA is clearly overpricing its new graphics cards, as the company generates the bulk of its revenue from its own AI server solutions. As long as there are business customers, NVIDIA will serve that market first. Gamers will get what is left, because the same chips are used for both markets.

AMD seems to be trying to court gamers, but there is exactly one drawback: technology. First, it is one generation behind in ray tracing. Second, FSR 4 should already be at the level of the new DLSS 4 with its Transformer model, but it still looks like DLSS 3. Third, if AMD had shown that the RX 9070 XT easily outperformed the RTX 5080 without ray tracing at a price one and a half to two times lower, there would be no questions. If it were at least always stronger than the RTX 5070 Ti in pure 4K tests, there would be no questions either. As it is, AMD is fighting a battle «on the enemy’s field», ray tracing.

Bottom line. AMD now needs to invest more in working with software developers (rendering, neural networks) to increase the interest of potential buyers. Not only games need to run well, but also applications. Users are unlikely to want to buy two graphics cards: one for gaming and the other for software.

In a year or two, we expect a new generation of consoles, with a new PSSR 2 analog for Sony featuring improved Super Resolution scaling and a new frame generator. And let’s not forget about new «portables» with the Strix Halo APU.