
Intel Arc Xe2 Battlemage architecture overview: Battlemage replaces the Fullmetal Alchemist

Published by Vladyslav Vasylenko

Intel hasn't been doing very well with processors lately. Perhaps the company is shifting its focus to graphics cards to compete with NVIDIA and AMD? Today we'll take a look at its latest architecture, Intel Arc Xe2, codenamed Battlemage. For those eager to see practical results, we have already published an editorial review of the SPARKLE Intel Arc B580 TITAN OC, which is based on this architecture.


If you follow our series of architecture reviews, I need to clarify one point. NVIDIA and Intel each have their own marketing names for their technologies. To make the rest of the article easier to follow, here is how the terms map to each other.

NVIDIA          Intel
SM              Xe-core
Tensor Cores    XMX (Xe Matrix eXtension Engines)
RT Cores        RTU (Ray Tracing Unit)
GPC             Render Slice
DLSS            XeSS

Xe2 Battlemage architecture

Note: in the block diagrams, the $ symbol denotes cache.

Each second-generation Xe core consists of 8 Vector Engines, 8 XMX Engines, 256 KB of shared L1 cache with Shared Local Memory, and various additional control and interaction units. The second-generation Xe core natively supports SIMD16 operations.

For Lunar Lake mobile processors, the combined L1 cache and SLM is 192 KB.

SIMD (Single Instruction, Multiple Data) is a computing method in which a single instruction processes multiple data elements at once. The usual sequential approach, where one instruction processes one data element, is called scalar operation. AVX instructions on CPUs are one well-known example of SIMD.

SIMD parallelism is implemented at the hardware level. It improves the performance of vectorized computing, which is widely used in mathematical, scientific, and graphical applications.
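To make the difference between scalar and SIMD execution more tangible, here is a minimal sketch in Python. It is only a software model: the function names and the instruction counters are illustrative assumptions, and real SIMD lanes run in parallel in hardware rather than in a loop.

```python
# Illustrative model only: real SIMD executes all lanes in hardware at once.
SIMD_WIDTH = 16  # lanes per instruction, matching the SIMD16 mode named in the text


def scalar_mad(a, b, c):
    """Scalar approach: one multiply-add instruction per data element."""
    out = []
    instructions = 0
    for x, y, z in zip(a, b, c):
        out.append(x * y + z)
        instructions += 1
    return out, instructions


def simd_mad(a, b, c, width=SIMD_WIDTH):
    """SIMD approach: one multiply-add instruction per group of `width` elements."""
    out = []
    instructions = 0
    for start in range(0, len(a), width):
        end = start + width
        lanes = zip(a[start:end], b[start:end], c[start:end])
        out.extend(x * y + z for x, y, z in lanes)  # all lanes share one instruction
        instructions += 1
    return out, instructions


a = list(range(64))
b = [2] * 64
c = [1] * 64
scalar_result, scalar_count = scalar_mad(a, b, c)
simd_result, simd_count = simd_mad(a, b, c)
print(scalar_count, simd_count)  # 64 4
```

Both versions produce the same values; the SIMD model simply issues 16 times fewer instructions for the same 64 elements.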

The Xe Vector Engine (XVE) is the block that executes instructions. The basic computing units in each XVE are SIMD floating-point units. XVE is comparable to what is more widely known as an Arithmetic Logic Unit (ALU), although it is an improved version of one. XVE supports floating-point and integer instructions such as MAD or MUL, as well as extended math instructions such as sin, cos, exp, log, and rcp.

As mentioned above, the updated Vector Engine natively supports SIMD16; it also supports SIMD32 and a larger set of math operations. There is also support for the INT2, INT4, INT8, FP16, BF16, and TF32 data types.

The Xe Matrix eXtension Engines (XMX), as the name implies, perform the matrix operations required for AI. Compared to the previous generation, INT8 and FP16 performance has been improved.
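The kind of operation a matrix engine accelerates can be sketched as a multiply-accumulate over small matrices. The example below is an illustration in plain Python, not Intel's implementation: the helper names are hypothetical, and the choice to clamp inputs to INT8 while keeping the sums in wider integers is an assumption based on how low-precision matrix math is commonly done.

```python
def to_int8(value):
    """Clamp an integer to the signed 8-bit range [-128, 127]."""
    return max(-128, min(127, int(value)))


def matmul_accumulate(a, b, acc):
    """Return acc + a @ b for small matrices given as lists of rows.

    Inputs are clamped to INT8; the running sums are kept as wider integers
    (an assumption for this sketch, not a documented XMX detail).
    """
    rows, inner, cols = len(a), len(b), len(b[0])
    result = [row[:] for row in acc]
    for i in range(rows):
        for j in range(cols):
            total = 0
            for k in range(inner):
                total += to_int8(a[i][k]) * to_int8(b[k][j])
            result[i][j] += total
    return result


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
acc = [[0, 0], [0, 0]]
print(matmul_accumulate(a, b, acc))  # [[19, 22], [43, 50]]
```

A hardware matrix engine performs many of these inner products per clock, which is why lower-precision types such as INT8 and FP16 matter so much for AI workloads.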

The Xe core is just a small but important "cog" in the graphics card architecture. The Render Slice is the main high-level hardware unit in all Battlemage GPUs. Each Render Slice includes four Xe cores, four Xe Ray Tracing Units (RTUs), and function blocks that support rendering, geometry, tessellation, mesh and pixel dispatching, and processing logic.
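The per-core and per-slice figures given so far are enough to work out the totals for a whole GPU. The short sketch below does that arithmetic; the function name is illustrative, and the constants are taken directly from the text above.

```python
XE_CORES_PER_SLICE = 4  # from the text: four Xe cores per Render Slice
RTUS_PER_SLICE = 4      # from the text: four RTUs per Render Slice
XVE_PER_CORE = 8        # from the text: 8 Vector Engines per Xe core
XMX_PER_CORE = 8        # from the text: 8 XMX Engines per Xe core


def slice_totals(render_slices):
    """Total unit counts for a GPU built from fully enabled Render Slices."""
    xe_cores = render_slices * XE_CORES_PER_SLICE
    return {
        "xe_cores": xe_cores,
        "rtu": render_slices * RTUS_PER_SLICE,
        "xve": xe_cores * XVE_PER_CORE,
        "xmx": xe_cores * XMX_PER_CORE,
    }


print(slice_totals(5))
# {'xe_cores': 20, 'rtu': 20, 'xve': 160, 'xmx': 160}
```

Five full Render Slices give the 20 Xe cores, 20 RTUs, 160 XVE, and 160 XMX quoted for the B580 later in this article. A cut-down card with some Xe cores disabled would need the core count passed in directly instead.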

Each of these small blocks has received additional optimizations to reduce internal latency, increase throughput, and reduce texturing errors.

L2 cache is the top-level cache in the memory hierarchy. Memory requests from all Render Slices and Xe cores are directed to the L2 cache. It has also been improved, with support for a new "8 to N" level of data compression and fast clearing of resources that are no longer needed.

The Xe Ray Tracing Unit (RTU) processes ray tracing requests received from the XVE. Each RTU has a Bounding Volume Hierarchy (BVH) cache to reduce the average latency of access to BVH data, and it can handle multiple ray tracing requests at once for better overall efficiency.

Additionally, each RTU is supported by a Thread-Sorting Unit (TSU), a dedicated hardware unit that can sort ray tracing requests and re-dispatch them to shader threads to maximize SIMD coherence among divergent rays.
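Why sorting helps can be shown with a toy model. In the sketch below, each ray is represented only by the ID of the shader it needs to run, and the cost of a SIMD batch is taken to be the number of distinct shaders inside it. Both the cost model and the function names are assumptions made for illustration; this is not a description of how the TSU works internally.

```python
SIMD_WIDTH = 16  # rays processed together in one batch


def batches(shader_ids, width=SIMD_WIDTH):
    """Split the ray list into consecutive SIMD-sized batches."""
    return [shader_ids[i:i + width] for i in range(0, len(shader_ids), width)]


def shader_passes(shader_ids, width=SIMD_WIDTH):
    """Toy cost model: each distinct shader in a batch needs its own pass."""
    return sum(len(set(batch)) for batch in batches(shader_ids, width))


# 64 rays hitting 4 hypothetical materials in an interleaved (divergent) order
ray_hits = [i % 4 for i in range(64)]
print(shader_passes(ray_hits))          # 16
print(shader_passes(sorted(ray_hits)))  # 4
```

With the rays sorted by shader, every batch runs a single shader, so the same work takes a quarter of the passes in this model.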

Compared to the previous generation, the number of traversal pipelines and box-intersection units has increased by one and a half times, and the BVH cache capacity and the number of triangle-intersection units have doubled.
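For readers unfamiliar with what a box intersection is, the sketch below shows the classic "slab" test for a ray against an axis-aligned bounding box, which is the basic check performed while walking a BVH. It is a textbook software version with an illustrative function name, not a description of the RTU's circuitry.

```python
def ray_hits_box(origin, direction, box_min, box_max):
    """Slab test: True if the ray origin + t * direction (t >= 0) hits the box."""
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return False
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        # Shrink the interval of t values that lie inside every slab so far.
        t_near, t_far = max(t_near, t0), min(t_far, t1)
        if t_near > t_far:
            return False
    return True


unit_box = ((-1, -1, -1), (1, 1, 1))
print(ray_hits_box((0, 0, -5), (0, 0, 1), *unit_box))  # True
print(ray_hits_box((0, 3, -5), (0, 0, 1), *unit_box))  # False
```

A ray that misses a node's box can skip everything inside that node, which is why more box-intersection units translate directly into faster BVH traversal.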

Putting all this together, Intel offers potential users a new GPU. The BMG-G21 chip is a new (and so far the only) low-cost graphics processor in Intel's current generation of consumer graphics cards. It ships in two graphics cards, the Intel Arc B580 and the Intel Arc B570. While the B580 has the full complement of 20 Xe cores and 20 RTUs, the B570 has only 18 of each.

Compared to the previous generation of Xe1 Alchemist, the new Xe2 Battlemage has received a very strong increase in performance in some areas:

Xe1 vs Xe2                     Improvement
Compute Dispatch XI            +700%
Draw XI                        +1250%
Pixel blend rate               +210%
Mesh shader dispatch           +410%
Vertex index cut               +200%
Vertex processing              +150%
Tessellation                   +120%
Ray-triangle intersection      +210%
Trace rays                     +160%
Sampler feedback               +270%

Here is a table comparing the specifications of both generations of Intel graphics cards. At first glance, Battlemage looks significantly weaker in raw hardware than Alchemist. For some parameters, the difference is more than threefold!

                         A770          A750          A580          B580          B570
Chip                     ACM-G10       ACM-G10       ACM-G11       BMG-G21       BMG-G21
Xe cores                 32            28            24            20            18
Render Slice             8             7             6             5             5
RTU                      32            28            24            20            18
XMX                      512           448           384           160           144
XVE                      512           448           384           160           144
Frequency                2100 MHz      2050 MHz      1700 MHz      2670 MHz      2500 MHz
TDP                      225 W         225 W         185 W         190 W         150 W
Video memory             16 GB GDDR6   8 GB GDDR6    8 GB GDDR6    12 GB GDDR6   10 GB GDDR6
Video memory bandwidth   560 GB/s      512 GB/s      512 GB/s      456 GB/s      380 GB/s
Video memory speed       17.5 Gbps     16 Gbps       16 Gbps       19 Gbps       19 Gbps
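The last two rows of the table are linked: bandwidth equals the memory bus width multiplied by the per-pin speed, divided by 8 to convert bits to bytes. The table does not list bus widths, so the sketch below derives the implied width from the two values that are given. The function name is illustrative.

```python
def implied_bus_width_bits(bandwidth_gb_s, speed_gbps):
    """bandwidth (GB/s) = bus width (bits) * per-pin speed (Gbit/s) / 8."""
    return bandwidth_gb_s * 8 / speed_gbps


# (bandwidth in GB/s, memory speed in Gbps), copied from the table above
cards = {
    "A770": (560, 17.5),
    "A750": (512, 16),
    "A580": (512, 16),
    "B580": (456, 19),
    "B570": (380, 19),
}

for name, (bandwidth, speed) in cards.items():
    print(name, implied_bus_width_bits(bandwidth, speed), "bit")
```

The Battlemage cards come out with a narrower implied bus than the Alchemist cards, which explains why their bandwidth is lower even though the memory itself runs faster.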

Is the new generation really weaker than the previous one? No. Intel has done an impressive job optimizing drivers and working closely with game developers to support new technologies. In our editorial review, the Intel Arc B580 performed well in games.

If you look at the chip names, something doesn't add up. By name and specifications, the BMG-G21 resembles a weaker version of the ACM-G11. There are two possible explanations:

  • Intel will present a BMG-G20 chip, which will be even more powerful, to compete with the NVIDIA RTX 5060;
  • Intel will not release any more chips based on this architecture. There are many possible reasons for this, but the obvious one is preparation for the third generation, which is still a year and a half away.

Intel claims that the new generation of graphics cards is highly scalable, but without mid- or high-end solutions, it is not very interesting. We will probably see this scalability for business and server solutions.

Perhaps the three main players have already divided the market among themselves: NVIDIA takes the top-end products and the Nintendo Switch 2, AMD gets the mid-range products plus the PlayStation and Xbox consoles, and Intel gets the budget products. This is exactly how things look in the consumer graphics card market.

Intel Xe Super Sampling 2 (XeSS 2)

After competitors successfully launched and developed DLSS and FSR, Intel needed its own implementation of AI-based image upscaling in games. In 2022, along with the first-generation Alchemist architecture, the company introduced Xe Super Sampling, which uses deep learning to synthesize images that are close in quality to native high resolution.

For two years, Intel has been working hard to help game developers implement XeSS. More than 150 games now support it, including Mortal Kombat 1, Tekken 8, Dying Light 2, Forza Horizon 5, Remnant 2, and others.

With the release of Battlemage graphics cards, Intel has updated its upscaler to XeSS 2. It bundles several technologies (doesn't that remind you of something?):

  • XeSS-SR (Super Resolution): improved image upscaling;
  • XeSS-FG (Frame Generation): a new frame generator from Intel;
  • XeSS Low Latency: reduced latency for keyboard and mouse input.

One question needs to be answered, however: will XeSS 2 be supported on previous-generation Intel graphics cards? Yes, it will; this has been confirmed on Reddit and in a podcast on YouTube.

So far, not many games support XeSS 2. One of the upcoming releases is Assassin's Creed Shadows. Intel now needs to build up the library all over again. The full list is available at the link.

XeSS Super Resolution

XeSS-SR is Intel's image upscaling technology and its answer to NVIDIA's DLSS. Interestingly, the slide shows two models: a full one and a simplified (Lite) one. It seems that the full version will be used only on Intel graphics cards, while the simplified one will be used on all others. Also, XeSS-SR does its "magic" before the frame generator runs.

XeSS Frame Generation

The new XeSS-FG frame generator allows you to add interpolated frames using optical flow and motion vector reprojection to provide a smoother gaming experience. These newly generated frames are inserted between the regular frames that have been classically rendered. A similar principle was applied in DLSS 3 FG.
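The idea of motion vector reprojection can be illustrated with a deliberately tiny model: a single row of pixels in which each pixel is moved half of its motion vector to build the in-between frame. This is only a sketch under strong simplifying assumptions (one dimension, integer motion, no optical flow, no neural network); the function name is hypothetical and the code does not reflect the actual XeSS-FG algorithm.

```python
def interpolate_midframe(prev_frame, next_frame, motion):
    """Build a frame halfway between two rendered frames (one scanline).

    prev_frame, next_frame: lists of pixel values of equal length.
    motion: per-pixel displacement, in pixels, from prev_frame to next_frame.
    """
    width = len(prev_frame)
    mid = [None] * width
    # Pass 1: static pixels stay where they are.
    for x in range(width):
        if motion[x] == 0:
            mid[x] = prev_frame[x]
    # Pass 2: moving pixels travel half of their motion vector
    # and overwrite whatever background was there.
    for x in range(width):
        if motion[x] != 0:
            target = x + motion[x] // 2
            if 0 <= target < width:
                mid[target] = prev_frame[x]
    # Pass 3: holes left behind by moving pixels are filled from the next frame.
    return [next_frame[x] if value is None else value for x, value in enumerate(mid)]


prev_frame = [0, 9, 0, 0, 0, 0]  # a bright pixel at position 1
next_frame = [0, 0, 0, 0, 0, 9]  # the same pixel has moved to position 5
motion = [0, 4, 0, 0, 0, 0]      # it moved 4 pixels to the right
print(interpolate_midframe(prev_frame, next_frame, motion))  # [0, 0, 0, 9, 0, 0]
```

The generated frame places the bright pixel at position 3, halfway along its path, which is what makes motion look smoother when such frames are inserted between rendered ones.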

It is not clear how many frames will be generated. According to the slide, roughly 1.6 frames are generated in F1 24. It is also noticeable that the user interface (UI) is composited after the generated frames are created and inserted. It seems to me that, as with XeSS-SR, the SDK contains two FG models, a full one and a simplified one. If there is only one, it will work only on Intel graphics cards.

XeSS Low Latency

XeSS Low Latency should reduce the delay between player input and the response on screen. The technology is needed to counter the "jelly-like" feel when playing with a frame generator. It is analogous to NVIDIA Reflex.

The slide shows that the delay in the CPU render queue is removed entirely, so a mouse click reaches the GPU directly. Of course, the resulting latency shown by Intel looks too short to be realistic. But the claim that a click will be reflected in two frames at once looks plausible.
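A rough back-of-the-envelope model shows why removing the render queue matters. The sketch below assumes that every frame waiting in the queue adds one full frame time on top of the frame currently being rendered. That assumption, the function name, and the example numbers are all hypothetical; they are not taken from Intel's slide.

```python
def input_latency_ms(fps, queued_frames):
    """Toy model: each queued frame adds one frame time of input delay."""
    frame_time_ms = 1000.0 / fps
    return (queued_frames + 1) * frame_time_ms


# Hypothetical example: a game rendering at 50 fps
print(input_latency_ms(fps=50, queued_frames=2))  # 60.0
print(input_latency_ms(fps=50, queued_frames=0))  # 20.0
```

In this simplified model, emptying a two-frame queue cuts the delay to a third without changing the frame rate at all.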

PresentMon Display Latency is an application for monitoring fps, temperatures, latency, and more. It is a counterpart to the many existing monitoring programs, only from Intel.


What should Intel do now? According to the review, there are still "teething problems" related to neural networks and stability. Nowadays, video cards are used not only for their original purpose of playing games, but also for rendering videos and generating images of people or landscapes in Stable Diffusion and its alternatives. Intel therefore needs to put more support into this area as well.

With the upcoming release of the new entry-level NVIDIA RTX 5060 and AMD Radeon RX 9060 (exact names unknown), Intel will have to release new B770/B750 graphics cards to compete with them. Let's not forget that Intel also needs to increase the number of games that support XeSS 2.

Intel also needs to overcome the "newcomer" perception in the minds of end users. It will have to answer the question: why buy Intel when you can buy NVIDIA or AMD? The same applies to handheld consoles: apart from the MSI Claw, there are no other consoles with Intel hardware. There is still a lot of work to be done in this relatively new area.


This article was prepared based on official Intel materials.