This thesis analyzes the Scalable Vector Extension (SVE), a novel SIMD extension for Arm architectures. SVE removes the concept of vector size from the ISA, which allows CPU implementations with different vector sizes to execute the same SIMD instructions. We ask what consequences this vector-length agnostic (VLA) design has for application developers and computer architects. For the analysis, we select a set of standard HPC benchmarks and applications and rely on Gem5, a state-of-the-art simulator for computer architecture research. We first evaluate the SVE ISA by examining the vectorization of specific loops and searching for new vectorization opportunities. We then analyze how the VLA concept translates to algorithms and kernels in HPC. Finally, we study how different SVE vector lengths affect the execution and behavior of microarchitectural components. Our results show that, for many algorithms, the VLA paradigm is a natural extension of the fixed-width SIMD implementation. A larger SVE vector length yields better performance, especially in compute-bound kernels. At the same time, we show that the SIMD width of a CPU can significantly affect out-of-order execution and shift bottlenecks among microarchitectural components.
The document is publicly available on the WWW.