Vector length agnostic SIMD parallelism on modern processor architectures with the focus on Arm's SVE / Author: Bine Brank. Wuppertal, April 2023
Contents
- Abstract
- Acknowledgements
- Contents
- List of Figures
- List of Tables
- Abbreviations
- Symbols
- 1 Introduction
- 2 Background
  - 2.1 SVE
  - 2.2 Applications
  - 2.3 Hardware
  - 2.4 Tools
  - 2.5 Gem5 simulator
  - 2.6 Related Work
- 3 Methodology
  - 3.1 Auto-vectorization
  - 3.2 Application setup
    - 3.2.1 Computational patterns
    - 3.2.2 Application hot spots
    - 3.2.3 Benchmark preparation
    - 3.2.4 Region Of Interest
  - 3.3 Gem5 model
  - 3.4 SVE static analysis
  - 3.5 Architectural exploration
- 4 Porting of applications
  - 4.1 OpenBLAS
    - 4.1.1 BLAS3 general algorithm
    - 4.1.2 Preserving the VLA feature
    - 4.1.3 SVE assembly kernel
    - 4.1.4 SVE intrinsic packing functions
    - 4.1.5 Triangular matrices
  - 4.2 GROMACS
  - 4.3 GPAW
  - 4.4 MiniFE
- 5 Results & analysis
- 6 Summary & conclusions
- A SVE examples
- B Gem5 configuration
- Bibliography
