

Bergische Universität Wuppertal Fakultät für Mathematik und Naturwissenschaften Fachgruppe Physik

# Development of a Detector Control System Chip

PhD Thesis of: Niklaus Lehmann

Submitted on 25.06.2019

Defended on 02.09.2019

The PhD thesis can be quoted as follows:

urn:nbn:de:hbz:468-20191113-112127-6 [http://nbn-resolving.de/urn/resolver.pl?urn=urn%3Anbn%3Ade%3Ahbz%3A468-20191113-112127-6]

DOI: 10.25926/m1ns-xc90 [https://doi.org/10.25926/m1ns-xc90]

# Abstract

The Large Hadron Collider (LHC) at CERN will be upgraded to the High-Luminosity LHC by 2026. The goal of this upgrade is to achieve higher intensities in the collisions and to collect ten times more integrated luminosity than with the LHC. The larger dataset allows the parameters of the standard model of particle physics to be measured with greater precision. The ATLAS experiment will receive a completely new inner tracker for operation at the High-Luminosity LHC. This ATLAS ITk detector is a full silicon tracking detector with pixel and strip sensors. A serial power approach is foreseen for the ITk Pixel detector. This reduces the number of services and the amount of material, but it also brings risks and new challenges.

The task of the detector control system (DCS) is to monitor the health of the experiment and to control its operation. An integrated circuit was developed for this task. The so-called pixel serial power & protection (PSPP) chip measures the voltage and temperature of a module in the serial power chain. Additionally, it includes a bypass transistor to deactivate a single module if necessary. The bypass is activated automatically in case of over-temperature or over-voltage. This gives full control over each module and allows a serial power chain to be recovered if a module is faulty.

Based on an existing prototype, new versions of the PSPP were developed for this thesis. They include all required functionalities and can switch a current of 8 A. The developed prototype remains functional up to a total ionizing dose of 800 Mrad, which was tested in X-ray irradiations. Further tests were performed to verify the protection against single event upsets causing bit flips in the internal registers. The cross-section of the triplicated registers in the PSPP was measured with a proton test beam and is smaller than  $1.7 \times 10^{-17}$  cm<sup>2</sup>. The PSPP prototype operated successfully at temperatures between 0 °C and 60 °C in a 42-day climate chamber test. No failure was observed.

A system test with prototype modules was built at CERN to verify the concept of the serial power chain. It used realistic services and mechanical structures. The PSPP chip was included in the system test and proved to be very useful during commissioning and debugging. The bypass and its protection function prevented damage to detector modules. The PSPP delivered useful monitoring data to refine the requirements of the serial power chain.

# Abstract (translated from German)

The Large Hadron Collider (LHC) at CERN will be upgraded to the High-Luminosity LHC by 2026. The aim of this upgrade is to reach higher collision intensities and thereby increase the collected luminosity by a factor of 10. With the larger dataset, the properties of the Standard Model of particle physics can be measured more precisely. The experiments have to be updated and upgraded for this purpose.

In the ATLAS experiment, the complete inner detector will be replaced by a new silicon tracking detector for operation at the High-Luminosity LHC. This detector, called the ATLAS ITk, consists of several layers of pixel and strip sensors. For the ITk Pixel detector, serial powering will be used at an LHC experiment for the first time. Serial powering has the advantage that cables, and therefore material, can be saved. However, there are also risks, and new developments are needed.

The task of the detector control system (DCS) is to monitor the detector and its state. The DCS also controls the operation of the detector. An integrated circuit was developed specifically for this purpose. This chip, called Pixel Serial Power & Protection (PSPP), measures the temperature and voltage of a module in a serial power chain. The chip also has a bypass transistor, which can short-circuit the module and thereby deactivate it. This makes it possible to control individual modules in the serial power chain while the other modules continue to operate. The bypass can be activated automatically if the temperature or voltage of the module becomes too high.

Based on an existing prototype, further versions of the PSPP were developed during this work. They contain all required functions and can switch a current of 8 A. The developed PSPP was tested successfully up to a total ionizing dose of 800 Mrad. In addition, tests of the resistance to radiation-induced bit flips were carried out. A cross-section smaller than  $1.7 \times 10^{-17}$  cm<sup>2</sup> was measured. A chip was also operated successfully in a climate chamber at temperatures between 0 °C and 60 °C for 42 days. No malfunctions were observed during this long-term test.

The PSPP was also used in a system test with sensor modules and a realistic mechanical structure. The functions of the PSPP were helpful during commissioning and debugging. The automatic bypass activation protected the modules from damage. The specification of the serial power chain was improved with the help of the data measured by the PSPP.

# Contents

- Introduction
- 1 Physics at the Large Hadron Collider
  - 1.1 The Large Hadron Collider
  - 1.2 The ATLAS experiment
    - 1.2.1 Overview of the detector
    - 1.2.2 Upgrade of the inner tracking detector
  - 1.3 Standard Model of elementary particle physics
    - 1.3.1 Matter Particles
    - 1.3.2 Force carriers
    - 1.3.3 Missing elements
- 2 Silicon tracking detectors
  - 2.1 Silicon detectors
    - 2.1.1 PN-junction
    - 2.1.2 Silicon sensor modules: Hybrid detectors
    - 2.1.3 CMOS detectors: Monolithic Active Pixel Sensors
  - 2.2 Read out electronics
  - 2.3 Detector power supply
    - 2.3.1 Individual power
    - 2.3.2 Parallel power
    - 2.3.3 Serial power
- 3 Design of radiation hard ASICs
  - 3.1 Radiation damage in integrated circuits
    - 3.1.1 Cumulative radiation effects
    - 3.1.2 Single event effects overview
    - 3.1.3 Single event upsets in logic
    - 3.1.4 Single event effects in analog elements
    - 3.1.5 Simulations of single event effects
  - 3.2 Radiation hard circuits
    - 3.2.1 Protection against TID effects
    - 3.2.2 Protection against SEU and SET
    - 3.2.3 Protection against multi-bit upsets
    - 3.2.4 Methods to prevent latch-up
- 4 Detector control system
  - 4.1 Control and monitoring of ATLAS
    - 4.1.1 LHC operation
    - 4.1.2 DCS state machine
  - 4.2 DCS for the ITk Pixel Detector
    - 4.2.1 Safety path
    - 4.2.2 Control & feedback path
    - 4.2.3 Diagnostic path
  - 4.3 Control of a serial power chain
    - 4.3.1 DCS controller
    - 4.3.2 PSPP chip
- 5 Pixel Serial Power & Protection chip
  - 5.1 Requirements
  - 5.2 Previous prototypes
  - 5.3 Next generation Pixel Serial Power & Protection chips
    - 5.3.1 Pixel Serial Power & Protection chip version 3 (PSPPv3)
    - 5.3.2 PSPP Add-on Regulator & Comparator chip (PARC)
    - 5.3.3 Pixel Serial Power & Protection chip version 4 (PSPPv4)
    - 5.3.4 PSPP Asynchronous TMR Test chip (PATT)
  - 5.4 Serial control bus
    - 5.4.1 From I2C-HC to SCB
    - 5.4.2 Physical layer
    - 5.4.3 Protocol
  - 5.5 Logic core
    - 5.5.1 Protocol Unit
    - 5.5.2 User Unit
    - 5.5.3 Protection against single event upsets
    - 5.5.4 Communication and logic test with the PSPPv3
    - 5.5.5 Logic function updates for the PSPPv4
    - 5.5.6 Test of PSPPv4 logic
    - 5.5.7 Asynchronous triple modular redundancy
    - 5.5.8 Clock detection circuit
  - 5.6 ADC
    - 5.6.1 Voltage measurement
    - 5.6.2 Temperature measurement
    - 5.6.3 Vglobal reference
    - 5.6.4 Internal monitoring channels
    - 5.6.5 ADC update in PSPPv4 and PATT
    - 5.6.6 ADC test
  - 5.7 Comparator for over-voltage and over-temperature protection
    - 5.7.1 Comparator implementation in the PSPPv3
    - 5.7.2 Radiation hard comparator
    - 5.7.3 Comparator enhancements
    - 5.7.4 Test results with the comparators
  - 5.8 Bypass transistor
    - 5.8.1 PSPPv3 bypass
    - 5.8.2 Bypass design improvements
    - 5.8.3 Bypass performance tests
  - 5.9 Bandgap reference
    - 5.9.1 Diode based bandgap reference
    - 5.9.2 Transistor based bandgap structure
    - 5.9.3 Usage of the transistor-BG in PSPPv3
  - 5.10 Radiation hard regulators
    - 5.10.1 Shunt regulator
    - 5.10.2 Linear regulator
    - 5.10.3 Regulator functionality test
    - 5.10.4 PARC regulator irradiation
    - 5.10.5 Updated regulator concept
    - 5.10.6 PSPPv4 regulator tests
  - 5.11 Power-on reset
  - 5.12 Shift register for SEU tests
- 6 Operation and performance measurements
  - 6.1 Initial test setup
  - 6.2 Outer barrel demonstrator program
    - 6.2.1 Chip probing
    - 6.2.2 Electrical prototype
  - 6.3 Irradiation tests
    - 6.3.1 2017 TID irradiation
    - 6.3.2 2019 TID irradiation
    - 6.3.3 SEU cross-section
    - 6.3.4 Upsets in the PSPPv4 logic
  - 6.4 Stability and long term operation
    - 6.4.1 PSPPv3 long term test
    - 6.4.2 PSPPv4 climate chamber test
- 7 Risk analysis for serial power
  - 7.1 PSPP failure modes and effects analysis
  - 7.2 Failure probability of chain
    - 7.2.1 Without bypass
    - 7.2.2 With bypass
    - 7.2.3 Probability discussion
  - 7.3 Decision by the collaboration
- 8 Conclusion
  - 8.1 Status and summary
  - 8.2 Toward … the PSPP
- Acknowledgments
- Bibliography
- Acronyms
- List of Figures
- List of Tables
- A Introduction to ASIC design
- B List of ASICs designed at the University of Wuppertal
- C Failure Mode and Effects Analysis

# Introduction

To understand what the universe is made of is one of the main goals of physics. Many discoveries and much theoretical progress made in past decades have led to a more profound understanding of how matter behaves over a large range of energy levels. Models have been made to explain the movement of galaxies and to describe the interaction between subatomic particles. Many questions still remain unanswered, and physicists all over the world are trying to answer them.

One very successful theory is the standard model of elementary particle physics. This theory explains three of the fundamental forces: the strong force, the electromagnetic force and the weak force. Only gravitation as the fourth fundamental force is not yet included. Precise measurements are needed to verify if the theory is correct or where something unknown could be hidden. To obtain such measurements, particle accelerators and detectors are built.

The Large Hadron Collider (LHC) is so far the largest machine built by mankind and is used to explore the standard model of particle physics. It is operated by CERN and located close to Geneva, on the border between Switzerland and France. Many people have to cooperate to construct the LHC accelerator and experiments like the ATLAS detector, to equip them with the latest technology and to operate them. The ATLAS collaboration alone has more than 8000 contributors [1]. Existing technologies are pushed to their limits and novel approaches are developed to fulfill the requirements of the experiment. This ranges from the processing and storage of huge quantities of data, through data transmission and synchronization, power distribution and cooling, to lightweight mechanical structures. All of this has to operate reliably in a highly radioactive environment, where access is limited.

An upgrade of the LHC is planned that will increase the collision rate in order to collect even more data from 2026 onwards. This will allow the properties of the standard model to be investigated further and theories beyond the standard model to be probed. The experiments will also undergo upgrades to continue their excellent operation. A completely new inner tracking detector, consisting of a silicon strip detector and a silicon pixel detector, is in development for the ATLAS experiment. The pixel detector will use a serial power approach to reduce the number of cables and thus material. To control and monitor single modules in a serial power chain, a new detector control system (DCS) is used. The DCS is based on three independent paths, which differ in availability and granularity. The control & feedback path is used to monitor and control the detector at module level. For this path, a new application-specific integrated circuit (ASIC) is required. This ASIC, called the pixel serial power & protection (PSPP) chip, will be located close to the pixel modules. It therefore requires the same radiation hardness as the modules.

Based on a proof-of-principle prototype, a fully operational PSPP chip prototype was realized in this work. It can operate independently in a serial power chain and monitor the module voltage and temperature. The PSPP further provides a bypass transistor capable of passing a current of up to 8 A. This bypass allows a module in the serial power chain to be deactivated. Irradiation studies verified the function of the PSPP at the expected dose levels. All required elements were tested and, with some updates and fixes, could be used for a production chip.

In Chapter 1, the background and motivation are given by introducing the LHC accelerator and the ATLAS detector. Furthermore, the basics of the standard model are explained. Chapter 2 gives an overview of tracking detectors and how they operate. Chapter 3 introduces the effects of radiation on integrated circuits and the methods to guarantee stable operation of the circuits. The detector control system used to monitor and control the ATLAS experiment is explained in Chapter 4. The pixel serial power & protection (PSPP) chip and its development are described in Chapter 5. Measurements and irradiation tests performed with the ASIC are analyzed in Chapter 6. Chapter 7 looks at risks in a serial power chain, with and without a PSPP chip. An outlook and concluding summary finalize this thesis in Chapter 8.

# Chapter 1 Physics at the Large Hadron Collider

To study subatomic particles, accelerators and colliders are used. Leptons or hadrons are accelerated to nearly the speed of light and collided with each other. Particle detectors can record the path, type and momentum of the particles created in the collisions. This allows measuring the cross-section for the different decay modes. These measurements are used to verify the predictions from theories or determine the properties of particles.

## 1.1 The Large Hadron Collider

The Large Hadron Collider (LHC) is a circular particle collider located on the Swiss-French border. It has a circumference of 26.7 km and lies about 100 m below the surface. The machine was designed to collide two proton beams with a 14 TeV center-of-mass energy and a peak luminosity of  $10^{34}$  cm<sup>-2</sup> s<sup>-1</sup>. Besides protons, the machine can also collide lead (Pb) ions [2]. It is the last in a long chain of accelerators operated by the European Organization for Nuclear Research (CERN).

The LHC started delivering physics data in 2011 with a beam energy of 3.5 TeV. The two beams circulating in opposite directions are filled in bunches, each consisting of about  $10^{11}$  protons. The collisions occur every 25 ns, i.e. at a rate of 40 MHz. The energy and beam intensity were then increased in several upgrades. Between 2015 and 2018 the collider operated at a center-of-mass energy of 13 TeV and delivered an integrated luminosity of 160 fb<sup>-1</sup>. Since the start of operation, a total of 189 fb<sup>-1</sup> has been delivered by the LHC [3].
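The figures in this paragraph are linked by simple arithmetic, which the following minimal Python sketch reproduces. The constant and function names are illustrative assumptions; only the numerical values are taken from the text.

```python
# Values quoted in the text above.
BUNCH_SPACING_S = 25e-9   # time between bunch crossings
RUN2_LUMI_FB = 160.0      # delivered 2015-2018, in fb^-1
TOTAL_LUMI_FB = 189.0     # delivered since the start of operation, in fb^-1


def crossing_rate_hz(spacing_s: float) -> float:
    """Bunch-crossing rate for a given bunch spacing."""
    return 1.0 / spacing_s


def fraction(part: float, total: float) -> float:
    """Share of `part` in `total`."""
    return part / total


rate_hz = crossing_rate_hz(BUNCH_SPACING_S)
run2_share = fraction(RUN2_LUMI_FB, TOTAL_LUMI_FB)

print(f"crossing rate: {rate_hz / 1e6:.0f} MHz")
print(f"share of 2015-2018 data in total: {run2_share:.1%}")
```

The inverse of the 25 ns bunch spacing gives the 40 MHz crossing rate, and the 2015 to 2018 period accounts for roughly 85 % of the integrated luminosity delivered so far.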

#### High Luminosity upgrade

Currently, the accelerators at CERN are in shutdown for maintenance and a further upgrade. The LHC is foreseen to run at design energy after this shutdown. Regarding luminosity, the LHC already exceeded the goals and reached twice its design luminosity with a record instantaneous luminosity of  $2.06 \times 10^{34} \text{ cm}^{-2} \text{ s}^{-1}$  [4].

An upgrade of the accelerator is under development to be installed in 2024. This high-luminosity LHC (HL-LHC) will start operation in 2026 with the number of collisions increased by a factor of five or more. With an instantaneous luminosity of  $7.5 \times 10^{34}$  cm<sup>-2</sup> s<sup>-1</sup> the HL-LHC will collect about 4000 fb<sup>-1</sup> during a planned operation of ten years. This is more than ten times the integrated luminosity of LHC operation [5]. The HL-LHC dataset will further improve the precision measurements of standard model parameters, such as the Higgs couplings. It will also increase the sensitivity to events with a low production cross-section, allowing new physics to be investigated with direct and indirect searches [6].
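To put these luminosity figures in relation, the sketch below compares the HL-LHC target with the 189 fb<sup>-1</sup> delivered so far and estimates how long the machine would have to run continuously at the quoted instantaneous luminosity to reach the target. It is a rough illustration only: the variable names are assumptions, and real operation includes shutdowns, fills and luminosity decay, which is why ten years are planned.

```python
# Values quoted in the text; the names are illustrative.
HL_LHC_TARGET_FB = 4000.0     # planned integrated luminosity, fb^-1
LHC_DELIVERED_FB = 189.0      # delivered by the LHC so far, fb^-1
HL_LHC_INST_LUMI = 7.5e34     # instantaneous luminosity, cm^-2 s^-1

# Unit conversion: 1 fb = 1e-39 cm^2, hence 1 fb^-1 = 1e39 cm^-2.
CM2_PER_FB = 1e-39
SECONDS_PER_DAY = 86400.0

# Ratio of the HL-LHC target to what the LHC has delivered.
ratio = HL_LHC_TARGET_FB / LHC_DELIVERED_FB

# Luminosity collected per second at the quoted instantaneous value.
fb_inv_per_second = HL_LHC_INST_LUMI * CM2_PER_FB

# Days of uninterrupted running at that luminosity to reach the target.
days_at_peak = HL_LHC_TARGET_FB / fb_inv_per_second / SECONDS_PER_DAY

print(f"HL-LHC target / LHC delivered: {ratio:.1f}")
print(f"continuous days at quoted luminosity: {days_at_peak:.0f}")
```

The ratio comes out at about 21, consistent with the statement that the HL-LHC dataset is more than ten times larger. The same conversion gives the expected number of events for a process with cross-section $\sigma$ as $N = \sigma \cdot L_\text{int}$, so a process with a cross-section of 1 fb would yield about 4000 events in the full dataset.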

#### LHC experiments

There are four interaction points in the LHC where the beams collide. An experiment is located at each point, each performing different physics analyses. A Large Ion Collider Experiment (ALICE) measures the properties of the quark-gluon plasma created in heavy-ion collisions. The Large Hadron Collider beauty (LHCb) experiment focuses on b-quark physics and CP violation. A Toroidal LHC ApparatuS (ATLAS) is a general-purpose detector investigating the standard model of particle physics. The Compact Muon Solenoid (CMS) has similar goals to ATLAS but a different design. CMS and ATLAS therefore compete with each other, but they also verify each other's results.

## 1.2 The ATLAS experiment

ATLAS is a general-purpose detector and was built to measure proton-proton collisions [7]. The ATLAS experiment started with the letter of intent in 1992 [8] and the detector has been taking data since 2009. So far, its biggest achievement was the discovery of the Higgs particle in 2012 together with the CMS experiment [9, 10]. It is also used to investigate new physics beyond the standard model.

#### 1.2.1 Overview of the detector

The ATLAS detector has a diameter of 25 m, a length of 44 m and a weight of 7000 t [7]. Figure 1.1 shows an overview of the full detector. It is constructed rotationally symmetric around the beam pipe and symmetric in the forward and backward directions<sup>1</sup>.

ATLAS consists of several detector systems, which can be split into further subdetectors, and has two magnetic systems [7]. Each system is labeled in Figure 1.1.

The muon chambers are the outermost system. They detect generated muons that travel through the inner systems almost unaffected. The tracks reconstructed from the chambers allow determining the momentum and energy of muons as they are bent by the magnetic field from the toroid magnets. The information from the muon system is also used to generate trigger signals.

The toroid magnet system is made of three elements: two end-cap magnets and the barrel toroid, which is made of 8 coils. They generate a toroidal magnetic field of 0.5 T to 1 T.

The hadronic calorimeter is located inside the toroid magnets. It is used to measure the energy of protons and neutrons. It consists of a barrel section and end-caps in the forward region. The barrel calorimeter uses scintillating tiles as active elements and steel as absorbers. The end-caps use liquid argon as active material and flat copper plates as absorbers.

<sup>1</sup>ATLAS uses a right-handed coordinate system with its origin at the nominal interaction point (IP) in the center of the detector, and the z-axis along the beam line. The x-axis points from the IP to the center of the LHC ring and the y-axis points upwards. Cylindrical coordinates  $(r, \phi)$  are used in the transverse plane,  $\phi$  being the azimuthal angle around the beam line. Observables labeled "transverse" are projected into the x-y plane.

Figure 1.1: A computer-generated overview of the ATLAS detector [7].

The energy of electrons and photons is measured by the electromagnetic calorimeter. It is designed with an accordion-shaped structure with lead absorbers in both the barrel and the end-cap sections. With this, full coverage in  $\phi$  without any holes is obtained [11]. Like the hadronic end-caps, the electromagnetic calorimeter uses liquid argon as active material. Trigger signals are also generated with data from both calorimeters.

A second magnet, the solenoid, is located inside the calorimeters. The solenoid creates an axial magnetic field of 2 T, which surrounds the inner detector (ID).
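The magnetic fields matter because the curvature of a charged track encodes its momentum. As a minimal sketch, assuming a uniform axial field and the standard relation $p_T\,[\text{GeV}/c] \approx 0.3 \cdot B\,[\text{T}] \cdot R\,[\text{m}]$ for a particle of unit charge (a textbook relation, not stated in this section), the bending radius in the 2 T solenoid field can be estimated as follows. The function and constant names are illustrative.

```python
# Conversion factor in p_T [GeV/c] = k * |q| * B [T] * R [m];
# numerically the speed of light in units of 1e9 m/s.
GEV_PER_TESLA_METRE = 0.299792458

SOLENOID_FIELD_T = 2.0  # axial field quoted in the text


def bending_radius_m(pt_gev: float, b_tesla: float, charge: float = 1.0) -> float:
    """Radius of curvature of a charged track in a uniform magnetic field.

    pt_gev  : momentum component transverse to the field, in GeV/c
    b_tesla : magnetic flux density in tesla
    charge  : particle charge in units of the elementary charge
    """
    if b_tesla == 0 or charge == 0:
        raise ValueError("a neutral particle or zero field gives a straight track")
    return pt_gev / (GEV_PER_TESLA_METRE * abs(charge) * abs(b_tesla))


radius_1gev = bending_radius_m(1.0, SOLENOID_FIELD_T)
radius_10gev = bending_radius_m(10.0, SOLENOID_FIELD_T)

print(f"1 GeV/c track:  R = {radius_1gev:.2f} m")
print(f"10 GeV/c track: R = {radius_10gev:.1f} m")
```

The radius grows linearly with transverse momentum, so high-momentum tracks are nearly straight and a precise position measurement is needed to resolve their curvature.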

The ID is a vertex tracking detector built in three parts. It is used to identify photons, electrons, muons, tau-leptons and to reconstruct tracks of hadronic decays. The ID follows the same structure as the calorimeter and muon chambers with a barrel section and end-caps. The outermost part of the ID is the transition radiation tracker (TRT). It uses gas-filled straw tubes as tracking devices. Going further inwards, there are four layers of silicon strip sensors, known as the semiconductor tracker (SCT). The central part is the pixel detector, made of four layers of silicon pixel sensors and three discs in each end-cap. Originally the pixel detector had only three layers. The fourth, innermost layer, called the insertable B-layer (IBL), was inserted during the long shutdown in 2013 and 2014 [12]. Elements from the front-end chip for IBL (FE-I4 [13]) were used in this work.

#### 1.2.2 Upgrade of the inner tracking detector

Several parts of ATLAS will be replaced or improved for the operation at the HL-LHC. There will be new small muon wheels in the end-caps, updated calorimeters and a new inner tracker replacing the current ID [5, 14–17]. Additionally, the readout systems of the sub-detectors are upgraded to achieve the higher bandwidth required [18]. I will focus on the pixel detector of the new inner tracker because my work is intended to be used in this sub-detector.

Not only will the current inner detector be at the end of its lifetime, but it also could not handle the increased collision rate at the HL-LHC. An estimated 200 inelastic proton-proton interactions per bunch crossing will be seen during the HL-LHC operation. This is 4 to 5 times as many as in the current operation. To keep the same performance as the current ID under these conditions, a better resolution is required in the tracker. The ATLAS inner tracker (ITk) is designed to reach or even exceed this while having a lower radiation length.

#### Design of the ITk detector

The ITk is a full silicon vertex detector. It consists of four layers of double-sided strip detectors and six double-sided end-cap disks. Inside is a pixel detector with five layers in the barrel section and several rings in the forward direction. Figure 1.2 shows the layout as presented in the technical design report for the pixel detector [5]. The layout covers up to a pseudorapidity of  $\eta < 4.0$  and each track has at least 11 hits in pixel and strips combined.
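The pseudorapidity quoted above is related to the polar angle $\theta$, measured from the beam axis, by the standard definition $\eta = -\ln\tan(\theta/2)$. As a minimal sketch (the function name is illustrative and not taken from the ITk documentation), the coverage limit can be converted into an angle:

```python
import math

def polar_angle_deg(eta: float) -> float:
    """Polar angle (degrees from the beam axis) for a given pseudorapidity."""
    # Inverting eta = -ln(tan(theta / 2)) gives theta = 2 * atan(exp(-eta)).
    return math.degrees(2.0 * math.atan(math.exp(-eta)))

# Coverage limit quoted for the ITk layout.
theta_limit = polar_angle_deg(4.0)
print(f"eta = 4.0 corresponds to theta = {theta_limit:.2f} degrees")
```

A coverage up to $\eta = 4.0$ therefore corresponds to tracks that are only about 2° away from the beam line.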

The ITk implements new concepts for construction and power. It is built with a lightweight mechanical structure using carbon-based support elements. For further material reduction, the layout was optimized to have good coverage with fewer sensors. To achieve this, inclined modules were introduced in the ITk Pixel detector between the flat barrel section and the end-caps. The two innermost layers of the ITk Pixel detector are foreseen to be replaced after half the lifetime. Otherwise, the sensors would be degraded by radiation damage and could not be operated efficiently.

Furthermore, the ITk Pixel detector uses a serial power approach. This reduces the



Figure 1.2: a) A computer-generated representation of the full ITk detector [5]. b) Schematic layout of the ITk [5]. One quadrant of active material is shown. The strip detector is shown in blue while the pixel detector is in red.



Figure 1.3: Services of the ITk Pixel detector [5].

number of cables for further material reduction. On the other hand, this introduces challenges and requires a new detector control system (DCS). The serial power concept is explained in section 2.3.3 while the DCS is described in Chapter 4.

Figure 1.3 shows an overview of the services for the readout, power and DCS. To configure the modules or select relevant events, commands and triggers are sent from the readout electronics on the clock and command line. The data line is used to transmit the data from the modules to the readout electronics. The electrical to optical conversion is performed in the so-called optoboxes, located outside the ITk detector volume. The off-detector readout electronics and power supplies are located in the electronics cavern. This cavern is adjacent to the experiment hall, and approximately 100 m of cables are required to reach the modules. Commercial components that are not required to be radiation hard can be used there. The cables are split and regrouped at several locations, called patch panels (PP). They are enumerated from the lowest number, PP0, at the end of the mechanical support structure to PP3 on the wall of the experiment hall.

The silicon sensors have a better performance at lower temperatures as explained in section 2.1. A two-phase cooling system with  $CO_2$  as a coolant is used to keep the sensors at low temperatures.

### 1.3 Standard Model of elementary particle physics

The standard model is a theory unifying three of the four fundamental forces. It is a quantum field theory, where the fundamental fields change only in quantized packages. These packages are commonly referred to as particles. The mathematical framework is based on quantum electrodynamics which has been unified with the weak interaction into the electroweak theory. Together with quantum chromodynamics, this forms the standard model of elementary particle physics. The unification with gravity is still missing and long sought-after, without success so far.

The standard model of particle physics states that there is a limited number of elementary particles. These are summarized in Figure 1.4. The particles are grouped into fermions, which have a spin 1/2 and form the matter particles discussed in section 1.3.1, and bosons, which are force carriers and have an integer spin (see section 1.3.2). This introduction is loosely based on [19–21].

Figure 1.4: Summary of the elementary particles of the standard model [22].

#### 1.3.1 Matter Particles

The matter particles are grouped into three generations, where each new generation has a higher mass than the previous. Only the first generation is stable and forms the matter in everyday life. The other particles decay into elements from the lower generations within  $(10^{-24} \text{ to } 10^{-6})$  s [20]. In addition to the particles shown in Figure 1.4, an antiparticle exists for each of the fermions.

The leptons consisting of the electron, muon and tau plus the respective neutrino are point-like particles. They interact with the weak and electromagnetic force, but not with the strong force. Electrons are found in the atomic shell. They play an important role in the chemical behavior of elements.

Quarks are particles with fractions of the electron charge, either -1/3 e or 2/3 e. In addition to the electric charge, quarks have a color charge, explained by the strong force in the next section. Due to confinement quarks only exist in bound states as hadrons. The energy increases with the distance between two quarks because of the gluon self-interaction, leading to the creation of new hadrons when sufficient energy is added.

Each hadron has an integer multiple of the electron charge and is color neutral. Hadrons can be further classified into two categories: mesons and baryons. Mesons are built from a quark and an antiquark, while baryons are formed from three quarks. More than 100 hadrons are known today [19]. Examples are the pion, a meson consisting of an up quark and an anti-down quark, or the proton, formed by two up quarks and one down quark. Only the proton is stable<sup>2</sup>. The neutron is also stable when it is bound in an atomic nucleus. It consists of one up and two down quarks. Except for the top quark, which decays too fast, all quarks are found in hadrons. The latest hadrons found are pentaquarks discovered by LHCb [24].

#### 1.3.2 Force carriers

The fundamental forces act by an exchange of carriers between two particles. These carriers are the bosons in the standard model. The three forces, strong, electromagnetic and weak, are indicated with shapes around the fermions and corresponding gauge bosons in Figure 1.4.

#### Strong force

The carrier of the strong force is the gluon. This is a massless particle that transports the color charge. Quantum chromodynamics states that the quarks have a color charge, which can be either red, blue or green. The antiquarks have the corresponding anticolor, i.e. antired, antiblue or antigreen. Quarks of the same color repel each other, while opposite colors are attracted. The quark and antiquark in a meson have opposite colors and are thus color neutral. A baryon is also color neutral, with each constituent quark having one distinct color (or anticolor in case of an antibaryon).

The gluon has itself a color and a different anticolor charge. Due to the color charge, gluons couple together and not only to quarks. Therefore, the strong force has only a limited range of approximately  $10^{-15}$  m, which is about the size of an atomic nucleus. It increases with distance. When two quarks are pulled apart, a new quark/antiquark pair is formed in a process called hadronization.

The strong force is by far the strongest force. The electromagnetic force is about 100 times weaker, while the weak force is even  $10^5$  times smaller than the strong force [20].

The gravitational force has a strength  $10^{39}$  times smaller than the strong force. Because of this very weak interaction, it is mostly neglected in particle physics.

#### Electromagnetic force

Of the three forces described by the standard model, the electromagnetic force is the best known from everyday applications. It explains, for example, visible light or radio transmission. All charged particles interact with each other through the electromagnetic force.

The photon is the force carrier of the electromagnetic force. As the photon has no mass, it travels with the speed of light. Furthermore, its range is not limited so that the electromagnetic

<sup>2</sup>At least it has a lifetime  $>10^{31}$  years [23].

force interacts on an infinite range. The strength of the electromagnetic force decreases with the square of the distance from a source. As the photon is also chargeless, it doesn't interact with the other forces and is therefore stable.

#### Weak force

The weak force has three force carriers, the two charged W<sup>+</sup> and W<sup>-</sup> bosons and the neutral Z<sup>0</sup> boson. The mass of the three bosons limits their lifetime and therefore also the interaction range to  $\sim 10^{-3}$  fm. It has, therefore, no macroscopic effects. All fermions can interact weakly. The neutrinos interact through the weak force, but not the other two forces included in the standard model.
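The order of magnitude of this range can be estimated from the reduced Compton wavelength of the exchanged boson, $\hbar c / (m c^2)$. The sketch below uses approximate boson rest energies of 80.4 GeV and 91.2 GeV; these values are not quoted in the text and are inserted here as assumptions.

```python
HBAR_C_MEV_FM = 197.327  # hbar * c in MeV fm

def interaction_range_fm(rest_energy_mev: float) -> float:
    """Rough range (fm) of a force mediated by a boson of the given rest energy."""
    return HBAR_C_MEV_FM / rest_energy_mev

# Approximate rest energies in MeV (assumed values, not from the text).
boson_rest_energy_mev = {"W": 80.4e3, "Z": 91.2e3}

for name, energy in boson_rest_energy_mev.items():
    print(f"{name}: {interaction_range_fm(energy):.2e} fm")
```

Both estimates come out at a few times $10^{-3}$ fm, in line with the range stated above.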

The weak force is responsible for nuclear decays and allows flavor changes. This means that a fermion can decay into another fermion through the weak force. The flavor changes have so far only been observed with the  $W^{\pm}$  bosons.

A spontaneous symmetry breaking is introduced by the Higgs mechanism to give the  $W^{\pm}$  and  $Z^{0}$  bosons mass. This introduces an additional field, called the Higgs field. The Higgs mechanism also explains the mass of the fermions. One of the properties of this new field is that one or more additional bosons can manifest. In 2012, a candidate for a scalar Higgs boson with a mass of 125 GeV was found by ATLAS and CMS [9, 10]. This boson is also listed in Figure 1.4.

#### 1.3.3 Missing elements

The standard model of particle physics as known today is a very successful theory. It predicted several particles that have all been found. Nevertheless, there are still elements that are not explained by the standard model.

- The standard model doesn't explain why we observe a large matter-antimatter asymmetry. Almost all the observed universe consists of matter, even though matter and antimatter should have been created in similar amounts.
- Another question is why there are three generations of fermions, and where the huge mass difference comes from. The top-quark is 10<sup>5</sup> times more massive than the up-quark.
- Dark matter was so far not directly observed and the standard model doesn't include a candidate for it. From cosmology, it is known that there is much more dark matter than matter, based on observations that showed the revolution of stars in the outer regions of galaxies is much faster than the observed matter would allow.
- The gravitational force is not included. It is not yet answered if a unifying theory of all forces exists.

Some other theories go beyond the standard model. One much-investigated theory is supersymmetry, which assigns a supersymmetric partner to each of the standard model particles. Some of these additional particles are candidates for dark matter. So far, however, none of them have been observed.

The increased data set from the HL-LHC and the higher precision of the ATLAS ITk detector will allow probing further into these questions.

# Chapter 2 Silicon tracking detectors

Physics experiments require knowledge about the particle types, their energies, momentum and direction. This information can be acquired by reconstructing particle paths in the magnetic field of a detector. Different kinds of detector technologies are used depending on the experiment. See for example the ATLAS detector in section 1.2.1. The tracking detectors are crucial to identify which particles originate from the same collision or decay. They are close to the interaction point and have a high resolution to precisely measure particles and their decay products.

An overview of the history of tracking detectors can be found in a lecture from Carl Haber [25]. Here, an introduction to silicon tracking detectors is given. More details can be found in [26].

### 2.1 Silicon detectors

Many modern experiments have a silicon tracker in close proximity to the interaction point. The usage of semiconductors as radiation detectors goes back to the 1950s for applications in nuclear spectroscopy [25]. Silicon detectors have a high position resolution (order of  $100 \,\mu\text{m}$ ) and a very fast reaction time (below  $25 \,\text{ns}$ ). Furthermore, modern silicon detectors can be operated at high radiation doses.

#### 2.1.1 PN-junction

Most silicon detectors use a PN-junction in reverse bias as detection volume. The PN-junction is formed out of two doped semiconductors. One side is p-doped by adding impurities having one valence electron less than the semiconductor, forming holes where free electrons can drift to. The other side has impurities with five valence electrons for n-doping, introducing free electrons in the crystal. There might be other impurities or lattice defects which lead to intermediate states between the conduction and valence band.

When connected together, the electrons from the n-doped region can drift into the p-doped region and fill the holes. This creates a potential across the PN-junction, which prevents further drift and forms a depletion zone. A schematic representation of the PN-junction and the energy levels is given in Figure 2.1. By applying an external voltage with higher potential on the anode (p-doped side), the depletion zone is removed and the junction becomes conducting. Applying a voltage in reverse, i.e.



Figure 2.1: a) Graphical representation of a PN-junction used as a sensor. b) Illustration of the energy levels across the junction without external potential.

higher potential on the cathode (n-doped side), increases the potential barrier and the depletion zone. The width of the depletion zone can be calculated with equation 2.1.

$$W = \sqrt{2\mu\rho\epsilon \cdot (V_{BI} + V_{HV})} \tag{2.1}$$

Here,  $\mu$  is the electron mobility and  $\rho$  the resistivity of the n-type material, which is in the order of (1 to 10) k$\Omega$  cm. The dielectric constant  $\epsilon$  of silicon is  $11.9 \epsilon_0$ .  $V_{HV}$  is the externally applied reverse bias voltage and  $V_{BI}$  the built-in potential, which forms without external supply. The latter is about 0.8 V for a silicon sensor. The sensor is biased to deplete the active area of charge carriers, providing maximum efficiency. The bias voltage that needs to be applied depends on the sensor thickness, sensor type and radiation damage. The sensor capacitance depends on the depletion width and is expressed per unit area as:

$$C = \frac{\epsilon}{W} \tag{2.2}$$
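As a worked example of equations 2.1 and 2.2, the following sketch evaluates the depletion width and the capacitance per unit area. The electron mobility of 1350 cm²/(V s), the resistivity of 5 kΩ cm and the bias voltage of 100 V are assumed illustrative values; only the built-in potential and the dielectric constant are taken from the text.

```python
import math

EPS_0 = 8.854e-14      # vacuum permittivity in F/cm
EPS_SI = 11.9 * EPS_0  # permittivity of silicon in F/cm

def depletion_width_cm(mobility, resistivity, v_bias, v_builtin=0.8):
    """Equation 2.1: depletion width in cm.

    mobility in cm^2/(V s), resistivity in ohm cm, voltages in V.
    """
    return math.sqrt(2.0 * mobility * resistivity * EPS_SI * (v_builtin + v_bias))

def capacitance_per_area(width_cm):
    """Equation 2.2: sensor capacitance per unit area in F/cm^2."""
    return EPS_SI / width_cm

# Assumed example values: mobility 1350 cm^2/(V s), resistivity 5 kOhm cm,
# reverse bias 100 V.
w = depletion_width_cm(mobility=1350.0, resistivity=5e3, v_bias=100.0)
c = capacitance_per_area(w)
print(f"W = {w * 1e4:.0f} um, C = {c * 1e12:.1f} pF/cm^2")
```

With these inputs the depletion width is a few hundred micrometres. Since $W$ grows with the square root of the bias voltage, the capacitance per unit area falls accordingly.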

A low pixel capacitance reduces the power consumption and gives a large signal voltage  $(V_{signal})$  for the same charge collected. This can be expressed with equation 2.3, where  $C_{tot}$  is the equivalent total capacitance of the sensing element.

$$V_{signal} = \frac{Q}{C_{tot}} \tag{2.3}$$

The total capacitance depends not only on the sensor capacitance but also on parasitic capacitors to neighboring sensor nodes and the input of the readout circuit.

#### Detection in the PN-junction

The depleted volume in the PN-junction is free from mobile carriers and forms an ionization chamber. Any charge in the volume drifts towards the electrodes under the applied electric field. Ionizing particles create electron-hole pairs in the semiconductor, which drift and diffuse to an electrode. This is also indicated in Figure 2.1. A minimum ionizing particle creates about 25 000 electron-hole pairs in a 300 µm thick sensor [25]. The charge is measured and the height of the amplitude gives information about the particle energy. A low noise readout circuit is required (see section 2.2).
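To give a feeling for the signal size, the sketch below applies equation 2.3 to the charge of a minimum ionizing particle. The total capacitances of 100 fF and 10 fF are assumed purely for illustration and are not values from this thesis.

```python
ELEMENTARY_CHARGE = 1.602e-19  # C

def signal_voltage(n_pairs: float, c_total: float) -> float:
    """Equation 2.3: signal voltage in V for n_pairs collected charge carriers."""
    return n_pairs * ELEMENTARY_CHARGE / c_total

# 25 000 pairs is the value quoted for a 300 um thick sensor; the total
# capacitances are hypothetical example values.
for c_total in (100e-15, 10e-15):
    v = signal_voltage(25_000, c_total)
    print(f"C_tot = {c_total * 1e15:.0f} fF -> V_signal = {v * 1e3:.1f} mV")
```

The collected charge is only about 4 fC, so a ten times smaller total capacitance directly yields a ten times larger signal voltage.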

#### 2.1.2 Silicon sensor modules: Hybrid detectors

Silicon sensors can be fabricated with photolithographic processes and therefore use structures in the order of  $100 \,\mu\text{m}$ . Instead of having a single diode, the anode is fabricated with several implants in the bulk. The implants can have different shapes which give their name, i.e. strip or pixel sensors.

Strip sensors are shaped like a line segment, i.e. the length is much larger than the width. For example, the strip sensors of the ATLAS ITk Strip detector use a pitch of 75.5 µm and a length of 24.1 mm or 48.2 mm [14]. A strip sensor therefore has a higher resolution in one direction. By placing strips at a slight angle and forming double-sided modules, a higher x and y resolution is achieved. Wire bonds are used to connect the sensing elements of the strip sensors with the readout electronics. Figure 2.2 shows a picture of a prototype for the ATLAS strip sensor. The front-end (FE) chips are shown on the right, while the sensor is seen on the left.

Pixel sensors are usually realized in a more rectangular or square shape. In the IBL they have a size of  $250 \,\mu\text{m}$  by  $50 \,\mu\text{m}$  [12]. For the Phase-II upgrade of the ATLAS experiment, pixel sizes of  $(50 \times 50) \,\mu\text{m}^2$  or  $(100 \times 25) \,\mu\text{m}^2$  are considered [5]. The large number of pixels makes the use of wire bonds impossible for the connection to the FE chip.



Figure 2.2: Prototype strip module with wire bonds to connect the FE chip. Picture taken during work at LBNL in 2014.



Figure 2.3: Drawing of a silicon pixel sensor with bump-bonded FE chip [27].

Instead, the FE is directly bump bonded to the sensor, forming hybrid modules. The connection is made by solder bumps between an analog front end input and the sensor pixels as shown in Figure 2.3.

A hybrid module consists of the sensor soldered to the FE chips and a flexible printed circuit board (PCB) glued to the backside of the sensor. The passive elements required to operate the FE chips, as well as the connectors, are mounted on the flex.

#### 2.1.3 CMOS detectors: Monolithic Active Pixel Sensors

Complementary metal oxide semiconductor (CMOS) imaging technologies are already widely used in industry for optical and X-ray imaging. Such detectors collect charge in a thin epitaxial layer, located below the layer integrating electronic circuits. These sensors, also known as monolithic active pixel sensors (MAPS), depend mainly on diffusion for charge collection. CMOS detectors are investigated by the high energy physics community and are becoming more widely used. For example, the ALICE experiment plans to upgrade its inner tracker with MAPS devices [28].

New developments in CMOS technologies have made it possible to build faster sensors by using a higher number of wells and increasing the voltage tolerance. Applying higher voltages makes it possible to create depleted CMOS sensors and to integrate the readout circuitry in the same chip. Such devices have several benefits over the traditional hybrid approach: They have a faster turnaround time and cheaper production, as the bump bonding step is no longer needed. The material is reduced as only one device is required. Depleted monolithic active pixel sensors (DMAPS) can be thinned to 100 µm or below to further reduce the material. Smaller pixel sizes can be achieved as no solder contacts are required.

CMOS sensors are considered for usage in the outer layers of the ITk Pixel detector of the ATLAS experiment [5]. Given the benefits above, they would be better suited for the innermost layers. However, the required radiation hardness and data rates have not yet been achieved. As the outer layers have the largest surface area, the benefit of reduced production cost is greatest there, and the CMOS sensor prototypes can meet the requirements.

There are different technologies used to build DMAPS and two main principles for designing the charge collecting electrode are used. Figure 2.4 shows these two methods.



Figure 2.4: Principle of DMAPS with collection node design using small (a) and large (b) fill factor [29].

In the large fill factor design, the charge is collected by a deep N-well, which also embeds all the active circuits. The advantages of the large fill factor are a large depletion zone and a fast charge collection due to short drift distances.

The small fill factor design uses an electrode outside the active area. This reduces the pixel capacitance by a factor of 10 compared to the large fill factor [29]. Due to the smaller electrode, the collection distance is larger and the charge collection is less efficient. Additional wells can be added to improve the charge collection [30].

### 2.2 Readout electronics

The task of the readout electronics is to capture and measure the signal generated by ionizing particles. Because of the high number of channels in tracking detectors, the readout electronics is integrated in an application-specific integrated circuit (ASIC). The usage of deep submicron technologies has the advantage that digitization and fast readout can also be integrated in the same chip. However, as the ASIC has to be close to the sensor, it also needs to withstand the radiation effects. Details about radiation hard ASIC design are given in Chapter 3.

Figure 2.5 shows a simple block diagram of an analog readout circuit. In the case of the pixel detector, the sensor diode is directly connected through the bump bond to the analog front end and the return is through the FE chip. Alternatively, the readout can also be done through a capacitor by adding an oxide layer between the sensor contacts and implants. This is the case for the ATLAS strip sensors [14].

The charge created by a passing particle is amplified and integrated by a preamplifier. The signal pulse is then optimized by a pulse shaping circuit. It can be read out directly as an analog value and digitized off-detector. In most of the recent FE chips, however, it is digitized directly inside the chip. This allows data compression to be applied, and the digital values can be buffered until a trigger signal arrives.



Figure 2.5: Basic blocks of an analog front end electronics for the readout of a silicon sensor.  $C_F$  defines the final gain of the amplifier stage. Such a circuit is implemented for each channel of the sensor. Based on [26].

Different digitization methods can be used:

- **Time over threshold**: The time during which the pulse from the sensor is above a set threshold is measured. This gives an indication of the amplitude of the pulse.
- **Multi-bit analog-to-digital converter (ADC)**: The amplitude of the signal is converted by an ADC with a given number of bits. The exact pulse height can be recorded, but this requires a very linear amplifier.
- **Binary readout**: Only detects whether there is a hit or not by comparing the signal to a threshold. The amplifier does not need to be linear above the threshold, which makes the design simpler. However, the pulse height information is lost.

For the ITk Strip detector, an ASIC with a binary readout is used, while the ITk Pixel FE chip implements the time over threshold method.
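The difference between the time over threshold and binary methods can be illustrated with a toy model. The sketch below assumes a sampled triangular pulse with fixed rise and fall times; this pulse shape, the sampling step and all function names are hypothetical and do not describe an actual FE chip.

```python
def time_over_threshold(samples, threshold, dt):
    """Total time (same unit as dt) during which the sampled pulse exceeds threshold."""
    return sum(dt for s in samples if s > threshold)

def binary_hit(samples, threshold):
    """Binary readout: report only whether the threshold was crossed."""
    return any(s > threshold for s in samples)

def triangular_pulse(amplitude, rise, fall, dt):
    """Hypothetical pulse shape: linear rise to amplitude, then linear fall to zero."""
    n_rise, n_fall = round(rise / dt), round(fall / dt)
    up = [amplitude * i / n_rise for i in range(n_rise)]
    down = [amplitude * (1 - i / n_fall) for i in range(n_fall + 1)]
    return up + down

dt = 1.0  # ns, assumed sampling step
for amplitude in (0.5, 2.0, 4.0):  # arbitrary units, threshold set to 1.0
    pulse = triangular_pulse(amplitude, rise=10.0, fall=100.0, dt=dt)
    tot = time_over_threshold(pulse, threshold=1.0, dt=dt)
    print(f"amplitude={amplitude}: hit={binary_hit(pulse, 1.0)}, ToT={tot} ns")
```

In this model the binary readout only separates the smallest pulse from the other two, whereas the time over threshold value grows with the amplitude and therefore retains part of the pulse height information.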

#### Noise considerations

One noise source in electronics is thermal noise. Velocity fluctuations of charge carriers in resistors are a common example [26]. It has a constant power spectrum ("white" noise) and depends on the temperature.

Another noise source is leakage current in the silicon detector. Defect states in the silicon crystal from impurities or due to radiation damage cause a leakage current to flow. This current is in the order of nA cm<sup>-2</sup> before irradiation but can become several mA cm<sup>-2</sup> after irradiation. Furthermore, the leakage current is temperature dependent and doubles every ~7 °C [25]. A larger leakage current results in a higher power loss, heating up the sensor and creating additional noise. The so-called shot noise is proportional to  $\sqrt{I_{leak}T}$ , where  $I_{leak}$  is the leakage current and T the temperature.
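The temperature dependence quoted above can be turned into a simple scaling rule. The following sketch assumes that the leakage current doubles exactly every 7 °C over the whole range; the reference current and temperatures are made-up example values.

```python
def leakage_current(i_ref, t_ref_c, t_c, doubling_step_c=7.0):
    """Scale a leakage current from t_ref_c to t_c (both in deg C),
    assuming it doubles every doubling_step_c degrees."""
    return i_ref * 2.0 ** ((t_c - t_ref_c) / doubling_step_c)

# Assumed example: 1 uA measured at 20 deg C, sensor cooled to -15 deg C.
i_cold = leakage_current(i_ref=1e-6, t_ref_c=20.0, t_c=-15.0)
print(f"Leakage at -15 deg C: {i_cold * 1e9:.2f} nA")
```

Cooling by 35 °C corresponds to five halving steps, so the leakage current drops by a factor of 32 under this rule. This is one reason why the sensors are operated at low temperatures.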

The signal-to-noise ratio compares the desired signal to the unwanted noise. It is a ratio of powers and is expressed as  $SNR = \frac{P_{signal}}{P_{noise}}$ . A large ratio is desired for a high signal resolution, leading to better tracking performance. The pulse shaping circuit (see Figure 2.5) adjusts the power spectrum of the signal and noise. This makes it possible to improve the signal-to-noise ratio.

### 2.3 Detector power supply

The sensors need a biasing voltage to deplete the active area for a high efficiency (see section 2.1). The FE electronics needs power to measure and amplify signals from sensors and the readout electronics requires power to transmit data and receive commands. Each element has a different supply requirement. Furthermore, the detector should be split in powering units. This allows better control by switching on or off each unit individually.

A powering unit can range from a single module to groups of modules. Figure 2.6 shows different approaches for powering N modules. The different schemes are described below and compared in Table 2.1. The sensor bias high voltage (HV) normally follows the powering scheme of the low voltage (LV) for the FE power.



Figure 2.6: Schematic representation of different powering principles: (a) individual power, (b) parallel power and (c) serial power.

Table 2.1: Comparison of the different powering schemes for N modules. Each module requires a supply voltage of  $V_M$  and a current of  $I_M$ . The cable is assumed to be the same for each option and has the resistance R for one way. f is the conversion factor for an ideal DC/DC converter.

| Powering scheme  | Number of cables | Cable power loss $P_{cable}$                 | Supply voltage $V_{supply}$ |
|------------------|------------------|-------------------------------------------------|---------------|
| individual       | $2 \cdot N$ | $2N\cdot I_M^2\cdot R$                          | $V_M$         |
| parallel         | 2           | $2N\cdot I_M^2\cdot R$                          | $V_M$         |
| parallel + DC/DC | 2           | $2N \cdot \left(\frac{I_M}{f}\right)^2 \cdot R$ | $V_M \cdot f$ |
| serial           | 2           | $2\cdot I_M^2\cdot R$                           | $N \cdot V_M$ |
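The expressions of Table 2.1 can be evaluated directly to compare the schemes numerically. The sketch below implements the table entries as they are written; the example of 8 modules at 1.5 V and 4 A, a one-way cable resistance of 0.1 Ω and a conversion factor of 4 is assumed for illustration and does not describe the ITk.

```python
def powering_schemes(n, v_m, i_m, r, f):
    """Evaluate the expressions of Table 2.1.

    Returns {scheme: (number of cables, cable power loss in W, supply voltage in V)}.
    """
    return {
        "individual": (2 * n, 2 * n * i_m**2 * r, v_m),
        "parallel": (2, 2 * n * i_m**2 * r, v_m),
        "parallel + DC/DC": (2, 2 * n * (i_m / f) ** 2 * r, v_m * f),
        "serial": (2, 2 * i_m**2 * r, n * v_m),
    }

# Assumed example values, not taken from the thesis.
table = powering_schemes(n=8, v_m=1.5, i_m=4.0, r=0.1, f=4.0)
for scheme, (cables, p_cable, v_supply) in table.items():
    print(f"{scheme:17s} cables={cables:2d}  P_cable={p_cable:6.2f} W  V_supply={v_supply:5.2f} V")
```

For these inputs the serial scheme needs the highest supply voltage, $N \cdot V_M$, while its cable loss does not scale with the number of modules.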

#### 2.3.1 Individual power

Each module has its own supply and cables for powering as shown in Figure 2.6a. This allows full control over each module, as they can be monitored and powered independently of the others. Such a powering is used in the current pixel detector of the ATLAS experiment. To reduce the required number of power supplies, several modules are grouped together. There is an additional regulator in the path between the power supply and the module, which allows individual control [31].

The drawback of individual powering is that a high number of cables is required. This directly increases both the power loss in these cables and the material budget.

#### 2.3.2 Parallel power

Instead of using separate power supplies for each module, a common power supply is used to power multiple modules (see Figure 2.6b). This reduces the number of cables by a factor N, the number of modules connected together. As each module requires the same current, the power loss in the cable is the same as for individual power. An additional risk is that a module failing short would disable all parallel modules. The entire current would flow through the shorted module and could cause a hot spot.

To reduce the current in the cable from the power supply to the modules, a DC/DC converter can be placed on each module. For an ideal converter, the input power is the same as the output power. This allows the modules to be supplied at a higher voltage and lower current, reducing the cable current by the conversion factor f and the losses in the cable by  $f^2$  (see Table 2.1).

$$P_{in} = V_{in} \cdot I_{in} = \frac{V_{in}}{f} \cdot fI_{in} = V_{out} \cdot I_{out} = P_{out} \tag{2.4}$$

Non-ideal converters have a higher input power than what is delivered to the output. Existing DC/DC converters can achieve an efficiency of 75 % or higher [32].

By using a converter, single modules can be controlled. The output of the converter has to be switchable to deactivate individual modules. This gives more flexibility in the operation. Without a converter, it would only be possible to control all parallel modules together.

The use of parallel power with DC/DC converters is foreseen for the ITk Strip detector [14]. A radiation hard converter will be used, which was developed at CERN [33].

#### 2.3.3 Serial power

In serial power (SP), a constant current source is used to power modules in series. The same current flows through each module. This also reduces the number of lines by a factor N. Furthermore, the power loss in the cable is the same as if a single module were powered. The current source needs to deliver a voltage level equal to the sum of all module voltages in the chain, i.e.  $N \cdot V_M$ . Depending on the chain length, this can lead to quite high voltages.

A drawback is that the chain current is defined by the module which requires the highest power. This means that each module has to pass the additional current not required for its operation, which results in a larger total power consumption. A shunt regulator, described further below, is used to pass this excess current. In contrast to the rather large DC/DC converters for parallel power, a shunt regulator can be realized within the FE chip. This makes the integration of an SP chain much more compact.
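The consequence for the power budget can be sketched with a few lines of code. The model below assumes that every module sits at the same voltage and that any current a module does not need is shunted; the module currents and the voltage are invented example values.

```python
def serial_chain(module_currents, v_module):
    """Operating point of a serial power chain.

    module_currents: current actually needed by each module (A).
    v_module: voltage across each module (V), assumed equal for all modules.
    """
    i_chain = max(module_currents)                    # set by the most demanding module
    shunted = [i_chain - i for i in module_currents]  # excess current per module
    v_supply = len(module_currents) * v_module        # N * V_M
    p_total = i_chain * v_supply
    p_useful = sum(i * v_module for i in module_currents)
    return i_chain, shunted, v_supply, p_total, p_useful

# Assumed example: four modules at 1.5 V with slightly different demands.
i_chain, shunted, v_supply, p_total, p_useful = serial_chain([3.6, 4.0, 3.8, 3.9], 1.5)
print(f"I_chain = {i_chain} A, V_supply = {v_supply} V")
print(f"P_total = {p_total:.2f} W, of which {p_total - p_useful:.2f} W is shunted")
```

The difference between the total and the useful power is exactly the shunted current multiplied by the module voltage, which is the overhead of the serial power approach in this simplified picture.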

A complication is that each module has a different reference potential. The data lines have to be AC coupled for readout with a common data acquisition. This also complicates the HV power lines for the sensors, especially if multiple sensors should be powered together.

A major risk in the SP chain is a module failing open. In that case, the entire chain would be open and could not be powered. To control individual modules, an external bypass is required to provide an alternative current path. Then a module can be switched off while the remaining chain remains operational.



Figure 2.7: Schematic of serial power for ITk Pixel, as described in the technical design report [5].

#### Serial power for the ITk Pixel detector

Due to the material constraints, it was decided to use serial power in the ITk Pixel detector [5]. A module will consist of two or four FE chips, which are operated in parallel. This approach guarantees that all FEs on the same sensor have the same reference potential. The FEs will be bump bonded to a single sensor, forming so-called dual or quad modules.

Each FE has two shunt regulators integrated. Using multiple shunt regulators in parallel gives additional redundancy. If one regulator fails, the others must shunt the additional current. As an additional safety feature, a bypass is foreseen in parallel to each module. A schematic of the SP chain is shown in Figure 2.7, including the bypass switch and the FEs in the module. A chain with quad modules is shown, while a dual module chain would have just two FEs instead of four.

More details on the control and monitoring for a serial power chain are given in section 4.3. The development of a bypass and monitoring ASIC is described in Chapter 5. The risks of a serial power chain are analyzed in Chapter 7.

#### Shunt Regulator

In a serial power chain, a constant current is delivered. The readout electronics, however, requires a constant voltage and has a changing current demand, depending on the occupancy. The task of a shunt regulator is to create this constant supply voltage. Furthermore, it passes the current not required by the readout electronics. The schematic of a shunt regulator is shown in Figure 2.8. A constant output voltage is defined by comparing a reference ( $V_{ref}$ ) to the output of the voltage divider R1 and R2. The gate voltage of M1 is increased when the load connected to the Vreg output doesn't draw enough current. This way, the shunt device M1 passes the current which is not required by the load, and a constant current consumption is provided.

If two parallel shunt regulators do not have the same output voltage, the one with the lower voltage draws a higher input current. Therefore, R3 is added so that multiple regulators can be operated in parallel. R3 makes the input voltage depend on the input current, which equalizes the current and power load across the devices. This method was demonstrated with the current pixel modules [34]. After the shunt regulator, a linear regulator is added to provide an additional regulated voltage.
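The equalizing effect of R3 can be illustrated with a minimal sketch. It assumes a strongly simplified linear model that is not taken from the cited work: each regulator k has the input characteristic V_in = v_k + r3 * i_k, both regulators share the same input voltage, and their currents add up to the chain current. The function name and all numerical values are hypothetical.

```python
def share_current(i_total, v1, v2, r3):
    """Split i_total between two parallel regulators.

    Assumed model (illustrative only): V_in = v_k + r3 * i_k for k = 1, 2,
    with a common V_in and i1 + i2 = i_total.
    """
    if r3 <= 0:
        raise ValueError("r3 must be positive in this linear model")
    # Equal input voltages give r3 * (i1 - i2) = v2 - v1.
    delta = (v2 - v1) / r3
    i1 = 0.5 * (i_total + delta)
    i2 = 0.5 * (i_total - delta)
    return i1, i2


if __name__ == "__main__":
    # Hypothetical values: 2 A chain current, 20 mV mismatch between regulators.
    for r3 in (0.05, 0.1, 0.5):
        i1, i2 = share_current(2.0, 1.50, 1.52, r3)
        print(f"R3 = {r3:4.2f} ohm: I1 = {i1:.3f} A, I2 = {i2:.3f} A")
```

In this model the regulator with the lower voltage always takes the larger share, and the imbalance shrinks as R3 grows. The sketch does not clamp currents to non-negative values, so it is only meaningful while the computed imbalance stays below the total current.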

An improved version was developed to optimize the power consumption [35] with the



Figure 2.8: Shunt regulator schematic.



Figure 2.9: ShuLDO schematic [35].

schematic shown in Figure 2.9. In the shunt low drop-out (ShuLDO) regulator the order of the low drop-out (LDO) linear regulator and the shunt regulator is inverted. This allows the resistor R3 to be replaced by the LDO pass device.

The LDO is formed of the error amplifier A1, the pass device M1 and the voltage divider R1 & R2. The voltage drop across M1 is regulated such that the output voltage stays constant. In this circuit, the shunt device M4 is part of the load of the linear regulator. To steer the amount of current flowing through M4, the current through M1 has to be sensed. This is done by the current mirror M1 & M2, where a fraction of the LDO load current is sensed. The differential amplifier A3 compares this sense current with a reference current defined by R3 and regulates the current passing through M4.

A further improved version of the ShuLDO will be used in the ITk Pixel front-end chip. This regulator will include an adjustable offset to better regulate the IV characteristic of the regulator. Furthermore, some additional protection features are under development, such as an over-voltage clamp and under-shunt current protection.

# Chapter 3 Design of radiation hard ASICs

An ASIC is designed to perform a dedicated task, in contrast to a microcontroller or FPGA, which can be programmed according to the needs. Most ASICs are created in CMOS technology. This is a widely used technology in industry and the process is highly miniaturized to create complex circuits with billions of transistors. The most recent processes for integrated circuit design have minimal feature sizes of only 7 nm [36]. ASICs are also frequently used in high energy physics applications but mostly with larger feature sizes. The ATLAS ITk detector will use feature sizes of 130 nm and 65 nm [5, 14]. These older and larger technologies are still widely used and have some advantages over newer technologies. They are less expensive and have been subject to thorough analysis; in particular, effects from radiation are well understood. ASICs can hence be designed for usage in extreme environments. Integrating sensing elements in commercial CMOS technology is also becoming more common, as described in section 2.1.3. Commonly, the expression "chip" is used synonymously for a fabricated ASIC.

This chapter describes the different effects due to radiation, followed by possible methods to make a circuit tolerant against damages. A brief introduction of some basic circuit elements together with a description of the ASIC design process is given in Appendix A. For more detailed explanations see for example [37].

# 3.1 Radiation damage in integrated circuits

An integrated circuit exposed to ionizing radiation, as in the ATLAS experiment, will suffer damage. Researchers at CERN and elsewhere have already investigated the effects of radiation extensively [38–41].

Damage can occur in two different ways. Firstly, by cumulative effects caused by ionization and displacement. Ionizing particles create charges which are accumulated in a device and can affect the behavior, in the worst case leading to a failure of the circuit. In case of displacement damages, the structure of silicon crystals is changed by a high energy particle hitting the device. Details about cumulative effects are given in the next section.

Secondly, damage can occur through single event effects (SEEs), which can happen at any time and are caused by single particles. An SEE can, for example, cause a stored bit to change its value, which is also known as a bit flip. Such effects are described in section 3.1.2.

The more recent CMOS fabrication technologies (e.g. 250 nm, 130 nm or smaller) have some intrinsic radiation hardness, i.e. chips fabricated in these technologies are less affected by damage than chips from older technologies (see section 3.2). On the other hand, there are new effects not present in previous technologies [42, 43]. Therefore, each technology has to be studied carefully.

Design techniques can be used to make transistors more resistant to damage from radiation. Some techniques apply at circuit level, while others alter the fabrication process [38]. Section 3.2 describes some techniques to protect against damage in more detail.

#### 3.1.1 Cumulative radiation effects

The small damages from radiation add up over time when a chip is exposed constantly. Several effects influence the behavior of the circuit. Unlike an SEE, where a single particle causes a failure, it is the cumulative effect of all hits that changes the characteristics. These effects can come either from collected charge deposited by ionizing particles or from displacement damage in the semiconductor or insulator materials of the circuit caused by non-ionizing particles [44].

#### Total ionizing dose effects

The electron-hole pairs created by ionizing particles can be the cause for upsets as described above. The charges may get trapped if they are generated in the oxide or the interface region. The electrons are relatively mobile in the oxide, while the holes or positive carriers are much slower. The electrons therefore leave the oxide, which causes a positive net charge in the oxide from the remaining holes. In a metal oxide semiconductor field effect transistor (MOSFET) this can cause a shift in the threshold voltage of the transistor.

The gate oxide thickness is reduced with advanced scaling methods. The thin oxide does not trap as many charges and they can easily escape the oxide by tunneling [44]. This makes new technologies intrinsically radiation hard and allows operation at a total ionizing dose (TID) above 100 Mrad. Not all design parameters scale the same way, and with node sizes of 130 nm and below other effects appear that were not seen before. [42] performed a study on a 130 nm process where different transistor structures were investigated. They showed that the shallow trench isolation (STI) is the source of most radiation-induced damages. The STI is used to isolate different transistors from each other and is large compared to the gate oxide. Trapped charges in the STI along the transistor are almost entirely positive. They attract negative charge carriers and create a conductive channel. This is seen in a rise of the leakage current  $I_{leak}$  and a reduction of the threshold voltage  $V_{\rm th}$ . Figure 3.1 shows the two values as a function of TID for different transistor structures. It can be seen that small transistors close to or at minimum size are more strongly affected than large transistors. A minimum-sized transistor with W/L = 0.16/0.12 µm shows an increase of the leakage current by a factor of 1000, while a 10/1 transistor shows an increase in the order of 10, i.e. two orders of magnitude less. This is because the channel created by the trapped charges in the STI makes up a larger fraction of the width in narrow transistors than in wide ones. It



Figure 3.1: Radiation-induced changes in the leakage current  $I_{leak}$  (a) and the threshold voltage  $V_{th}$  (b) for different transistor sizes [42].

should be mentioned, though, that the larger transistor has an almost ten times higher initial leakage current. Enclosed layout transistors (ELTs), which are transistors with a ring-shaped gate, are hardly affected. More details about ELTs are given in section 3.2.1.

In Figure 3.1 it can also be seen that the  $I_{leak}$  and  $V_{th}$  peak at a few Mrad and decrease again. The increase at low doses is coming from positive charges trapped in the STI as described above. Negative charges trapped in the interface region start to reduce the effect of the oxide charges only with some delay. Because of the different processes, a peak can be observed. This effect is also dependent on the transistor geometry and is stronger for narrow channels (small W). The effect was observed and referred to as radiation-induced narrow channel effect (RINCE) by [42].

The height of the RINCE induced peak depends on several parameters like temperature, dose rate and other operating conditions. Studies were done for the FE-I4 chip used in IBL<sup>1</sup> showing that higher operating temperature leads to lower peak levels [45]. The same CMOS technology was used in this work. TID studies were made with the developed chips and are described in section 5.10.4 and 6.3.2.

#### Displacement damages

Incident particles can cause damages to the silicon lattice. Atoms are displaced and create additional defects in the silicon.

The displacement damage depends on the non-ionizing energy loss and on the particle type and energy [26]. Through displacement damages, additional mid-gap states are introduced. These increase the leakage current, leading to higher noise in a silicon sensor. In the worst case, the additional leakage current heats a sensor, causing an even higher leakage current. This thermal runaway can end in the destruction of a sensor cell [46]. A change in doping concentration is also observed, which affects the characteristics of devices.

These damages were not investigated in this work and are not further discussed. More information can be found in [26, 46, 47].

<sup>1</sup>For more details see [12, 13].

#### 3.1.2 Single event effects overview

Any integrated circuit is susceptible to single event effects (SEEs). They are caused by highly energetic particles depositing charge in a circuit. Such particles are present in the natural environment, albeit at a very low rate. Circuits used in experiments at the LHC are exposed to a higher flux of highly energetic particles, and it is therefore important to understand the effects.

SEEs were first investigated for space application, but with reduced feature size these effects become also important in terrestrial applications. The history of discovery and investigation into SEE is described in [38]. It is expected that SEE will occur at the radiation levels for the ATLAS ITk Pixel detector [5].

#### Production of single events

SEE can be caused by a single interacting particle. They release charges in the silicon by direct ionization or indirect ionization described further below. The charges have to be collected by a sensitive volume to cause an upset. The sensitive volume is represented in 2D for a MOSFET in Figure 3.2. The transistor is most susceptible to charges below the gate, which can be collected by drain and source and create a short current path. [48] found that a sensitive volume of  $1 \ge 1 \ge 1 \ge 1 \ge 1 \ge 1 \ge 1$  matches best simulation and experimental data of multiple devices.

If the charges are created in the bulk of a device, electrons and holes recombine quickly and have no or only a small effect. However, they might be trapped in oxides and have effects on a longer time scale (see section 3.1.1). MOSFETs are the most susceptible elements in integrated circuits, especially in logic circuits. The created charges drift in the electric field of active PN-junctions. The field itself can also be distorted by the particle path and become funnel-shaped. This leads to a faster collection of charges from further away [49]. Collected charges create a current at the junction contacts or can cause a short but intense drain-source current. If this happens in a blocking transistor, i.e. one that is "off", the current mimics the "on" state for a short time.







Figure 3.3: Simplified representation of a direct ionization. The red arrow indicates the incident ion creating the charges leading to a current path in the transistor.




Figure 3.4: Simplified representation of how an indirect ionization can cause one or multiple upsets. The yellow arrow represents the incident hadron, which interacts with a nucleus and produces a secondary particle (red arrow). The secondary particle could cause a multi-bit upset (MBU) if it travels far enough.

**Direct ionization** means that an electrically charged particle frees electron-hole pairs along its path in the semiconductor and loses energy through this process. The energy loss per unit path length is described with the linear energy transfer (LET). Heavy ions can create rather large charges on their way through a circuit. This is represented in Figure 3.3. A particle with a LET of  $97 \,\mathrm{MeV cm^2/mg}$  results in a charge deposition of  $1 \,\mathrm{pC/\mu m}$  [38]. The deposited charge leads to a transient current which causes the upset. See section 3.1.3 for details on how it can affect a circuit. Heavy ions primarily deposit energy through direct ionization. All ions consisting of more than one proton are counted here as heavy ions. Such ions are frequently observed in space applications, but also radioactive material emitting alpha radiation can cause upsets. This was observed in the late 1970s when packages with radioactive contamination were used [39, 50].
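The quoted conversion from LET to deposited charge per unit path length can be checked with a short calculation. The sketch below assumes textbook values for silicon that are not stated in this section: a density of 2.33 g/cm³ and a mean energy of 3.6 eV per electron-hole pair. The function name is illustrative.

```python
ELEMENTARY_CHARGE_C = 1.602176634e-19   # charge of one electron in coulomb
SILICON_DENSITY_MG_PER_CM3 = 2330.0     # assumed: 2.33 g/cm^3
PAIR_CREATION_ENERGY_EV = 3.6           # assumed mean energy per e-h pair in Si


def charge_per_um_pc(let_mev_cm2_per_mg):
    """Convert an LET in MeV cm^2/mg to deposited charge in pC per micrometre."""
    # Energy loss per unit length in MeV/cm.
    energy_mev_per_cm = let_mev_cm2_per_mg * SILICON_DENSITY_MG_PER_CM3
    # MeV -> eV (1e6) and cm -> um (1e4 um per cm).
    energy_ev_per_um = energy_mev_per_cm * 1e6 / 1e4
    pairs_per_um = energy_ev_per_um / PAIR_CREATION_ENERGY_EV
    return pairs_per_um * ELEMENTARY_CHARGE_C * 1e12  # C -> pC


if __name__ == "__main__":
    print(f"{charge_per_um_pc(97.0):.3f} pC/um")
```

With these assumptions an LET of 97 MeV cm²/mg corresponds to about 1 pC/µm, in agreement with the value given in [38].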

**Indirect ionization** is the process by which light particles cause SEEs. These particles, e.g. hadrons created in accelerator experiments, do not usually deposit enough charge to cause an upset directly. However, when they hit a nucleus in the semiconductor, the products of the inelastic collision can be, for example, alpha particles. These products themselves deposit a higher amount of charge by ionization and can create an upset. Figure 3.4 shows this graphically, considering only the first transistor that is hit. This was originally investigated for space applications, as shown in [51], where the effects of upsets from protons were investigated for satellites. Indirect ionization from neutrons was studied in [52]. They showed that nuclear interactions from neutrons, which are not ionizing, can cause upsets in logic devices. [52] also showed that a neutron shower does not have more effect than single particles. Each particle can be accounted for individually.

#### High dose rate effects and multi-bit upsets

Most SEEs occur due to single particles depositing some charge in the circuit as described above. MBUs are events where more than one SEE happens simultaneously. The most obvious source is multiple simultaneous particles, each causing an individual upset. This depends on the particle flux and can be treated as individual hits.

Another source for MBU is the rail span collapse effect. This is observed in high dose rate environments where the supply rail breaks down [53]. Photocurrents induced by the radiation cause voltage drops on the power rail, thus either reducing the  $V_{DD}$ rail or increasing the  $V_{SS}$  rail in a local area of a chip. The affected area is more susceptible to single event upset (SEU) because of the lower supply voltage. The supply voltage might even drop below the required operation voltage, leaving the memory cells in an undefined state. The cells go to their preferred state after power-up, which can be different from what was stored [54]. This mechanism can cause multiple memory cells to change their value.

Also, single particles could cause MBU as described in [55] or [56]. Charge injected in a transistor can affect a neighboring device. The second device sees less than 40% of the charge of the hit. However, parasitic bipolar transistors can amplify the charge to a critical level [56].

Another example is a proton generating heavy ions inside a chip in a nuclear event. The heavy ion normally travels roughly in the direction of the incident proton and can pass through several sensitive volumes. The charge deposited in the different volumes can be large enough to upset each of them, as indicated with the second MOSFET on the right of Figure 3.4. This effect depends on the incident angle. If a proton hits at a small angle to the surface, a secondary ion created in the chip will travel horizontally through the chip, whereas ions created by a perpendicularly incident proton will travel vertically through the chip and have less chance to hit multiple transistors [55].

#### Single event latch-up

Latch-ups are a failure mode of a CMOS device where a low impedance path between the supply and ground opens in parasitic devices. Such a failure disrupts the function and worse can damage the circuit. The source of a latch-up is a parasitic PNPN path as shown in Figure 3.5 on the next page. There is a lateral PNP bipolar transistor formed from the source of the PMOS device through the N-Well to the substrate. A second vertical transistor is from the NMOS source contact with the substrate and the N-Well. A current injected in the base of the PNP causes a forward biasing of its base PN-junction. This causes a larger current to flow through the substrate. Because of the substrate resistance, a voltage drop on the NPN base junction can occur and open the NPN as well. This causes positive feedback and the PNPN path becomes low ohmic. Such a path can only be closed by breaking the current flow, which can be achieved by switching off the supply.

The source of a latch-up is an over-voltage or a current spike on an input pin. An ionizing particle can also trigger such a failure, referred to as single event latch-up (SEL). Cosmic particles were related to SEL as reported in [58].



Figure 3.5: Cross section of a CMOS inverter with parasitic latch-up circuit [57].

#### 3.1.3 Single event upsets in logic

A single bit-flip in a register or latch due to radiation is called single event upset (SEU). This is a static error which is often caused by a transient effect also known as single event transient (SET). Both failures are reversible, as long as they don't bring the logic into an unknown state. A transient disappears directly when the injected charge from a hit is removed and the circuit recovers to normal operation. However, such a transient leads to an upset if it is propagated to a memory cell. An affected memory cell stays upset until it is rewritten with the correct value. Because of the similarities and common sources, I discuss both effects (SEU and SET) together.

If an upset, as described in section 3.1.2 above, is produced in a transistor used for combinational logic, it generates a transient effect. Such transients provoke glitches in the voltages of affected nodes. Connected logic elements could interpret this transient voltage change as a different logic state and therefore generate a wrong result. Figure 3.6 on the following page shows how a single particle can flip a memory cell used in SRAM. A current spike induced in M3 causes a temporary change of the state at node X2 from '1' to '0'. Before the collected charge is drained away through M2, the second inverter (M1 & M4) changes its state and thus node X1 switches too. This enforces the false state in node X2, created by the particle and thus the upset is becoming static.

The same effect described for a static random access memory (SRAM) is also valid for a data flip flop. The flip flop uses the same bistable circuit for storing the information (see Figure A.6 in the appendix).

The likelihood for an SEU to occur is expressed in terms of cross-section  $\sigma$  with the unit cm<sup>2</sup> and is defined with

$$\sigma = \frac{N_{SEU}}{F \cdot N_{bits}} = \frac{N_{SEU}}{\Phi \cdot \Delta t \cdot N_{bits}} \tag{3.1}$$

where  $N_{SEU}$  is the number of observed SEU, F is the total particle fluence in cm<sup>-2</sup>,  $N_{bits}$  is the number of bits in the used memory,  $\Phi$  is the particle flux in cm<sup>-2</sup> s<sup>-1</sup> and



Figure 3.6: Schematic of a SRAM cell with a particle hitting one transistor (red arrow) causing an SEU.

 $\Delta t$  is the time interval in s. The probability of a bit flip can be defined as:

$$p = \sigma \cdot \Phi \cdot \Delta t \tag{3.2}$$

This assumes that  $\Delta t$  is small enough so that not more than one SEU occurs in the time interval.
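Equations 3.1 and 3.2 can be written as two small helper functions. The following is a minimal sketch; the function names and the example numbers are hypothetical and only illustrate the units involved.

```python
def seu_cross_section(n_seu, fluence_per_cm2, n_bits):
    """Equation 3.1: cross-section per bit in cm^2."""
    return n_seu / (fluence_per_cm2 * n_bits)


def bit_flip_probability(sigma_cm2, flux_per_cm2_s, dt_s):
    """Equation 3.2: probability of a bit flip within dt_s.

    Only meaningful while the result is much smaller than 1, i.e. while
    at most one SEU is expected in the interval.
    """
    return sigma_cm2 * flux_per_cm2_s * dt_s


if __name__ == "__main__":
    # Hypothetical test: 28 upsets in 100 bits after a fluence of 1e13 cm^-2.
    sigma = seu_cross_section(28, 1e13, 100)
    print(f"sigma = {sigma:.2e} cm^2")
    # Hypothetical operating point: flux of 1e9 cm^-2 s^-1, 1 us interval.
    print(f"p     = {bit_flip_probability(sigma, 1e9, 1e-6):.2e}")
```

The fluence in equation 3.1 is the flux integrated over the exposure time, so both forms of the equation give the same cross-section.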

#### 3.1.4 Single event effects in analog elements

The same processes as described in section 3.1.2 occur in analog circuits. Due to the normally larger transistor sizes in analog elements, the critical energy for an SEE is higher. Thus analog circuits are by design less affected than digital ones. Transients result mainly in additional noise. An evaluation of SETs in analog elements is nevertheless important for circuits with interfaces to both analog and digital domains. Examples are ADCs, comparators or voltage controlled oscillators.

#### 3.1.5 Simulations of single event effects

Simulations are used to investigate the physical processes in a transistor under irradiation. Methods of physical device models are described in [38].

To study larger circuits an approach based on Simulation Program with Integrated Circuit Emphasis (SPICE) is used. The particle hit is simulated by a charge pulse injected into the circuit at given nodes. To emulate the charge pulse, a double exponential shaped spike is commonly used [59]. The pulse can be expressed with:

$$I_{SEU}(t) = \frac{Q_C}{\tau_d - \tau_r} \left( e^{-\frac{t}{\tau_d}} - e^{-\frac{t}{\tau_r}} \right) \tag{3.3}$$

where  $Q_C$  is the induced charge in the node,  $t$  is the time since the hit, and  $\tau_r$  and  $\tau_d$  are the rise and fall time constants. [59] uses such an approach and performs multiple simulations with different charges to create heuristic results. This method can be used to detect the most susceptible nodes in a circuit. They detected that transistors in the load path of a protected memory cell are a weak point because a single hit can affect the memory [60].
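The double exponential pulse of equation 3.3 can be evaluated numerically to check that its time integral equals the injected charge. The sketch below takes the exponent variable as the time t since the hit. The function names and the pulse parameters (100 fC, 10 ps rise and 200 ps fall time constants) are hypothetical and not taken from [59].

```python
import math


def seu_current(t, q_c, tau_r, tau_d):
    """Double exponential current pulse of equation 3.3, zero before the hit."""
    if t < 0.0:
        return 0.0
    return q_c / (tau_d - tau_r) * (math.exp(-t / tau_d) - math.exp(-t / tau_r))


def integrated_charge(q_c, tau_r, tau_d, t_end, steps=20000):
    """Integrate the pulse from 0 to t_end with the trapezoidal rule."""
    dt = t_end / steps
    total = 0.0
    previous = seu_current(0.0, q_c, tau_r, tau_d)
    for k in range(1, steps + 1):
        current = seu_current(k * dt, q_c, tau_r, tau_d)
        total += 0.5 * (previous + current) * dt
        previous = current
    return total


if __name__ == "__main__":
    q_c, tau_r, tau_d = 100e-15, 10e-12, 200e-12  # hypothetical pulse
    charge = integrated_charge(q_c, tau_r, tau_d, t_end=20 * tau_d)
    print(f"integrated charge: {charge * 1e15:.2f} fC")
```

Because the integral of the bracket over all positive times is the difference of the two time constants, the prefactor normalizes the pulse so that the total injected charge is  $Q_C$  regardless of the chosen time constants. In a SPICE simulation the same waveform would be applied as a current source at the node under study.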

Based on this simulation approach, a circuit developed during this work was analyzed in section 6.3.4.

## 3.2 Radiation hard circuits

Different methods exist to address the danger of radiation-induced failures. In older technologies, it was necessary to adjust the fabrication process to achieve radiation hard circuits. The user has to qualify the process for the desired tolerance, which can be very expensive [61]. This is also called hardening by process.

With the scaling towards smaller technology nodes following Moore's law [62], the gate oxide also became thinner. The radiation-induced threshold shift of transistors was found to be smaller with reduced oxide thickness [63, 64]. Therefore, commercial CMOS processes can be used without changes to the process. Guidelines should nevertheless be respected to protect the circuit against radiation effects. Design techniques within the limits of the process are used to make the circuit radiation hard. This is also known as hardening by design and includes several methods.

#### 3.2.1 Protection against TID effects

Transistors with very thin gate oxides (<5 nm) show almost no effect from TID because there are fewer trapped charges in the oxide (see section 3.1.1). Other effects due to the STI are still present though [42, 61].

Many technologies offer different transistor types, including dual gate oxide. The core transistors use thinner gates, leading to faster devices but with smaller maximal voltages. For I/O stages, dual oxide transistors with thicker gates are used that support higher voltages. A circuit can be protected against TID by using only thin-gate transistors, as the tolerance increases with thinner oxides. To still support higher voltages, cascoded structures have to be used.

Large transistors should be used to reduce the impact of TID (see Figure 3.1 on page 23). Analog designs already use large transistors for better characteristics and matching. For digital circuits, minimal size transistors are common because of their high integration density and speed.

Another way to reduce the influence of charges trapped in the STI is to avoid direct contact between the doped regions and the oxide where a current path can form. This can be done with a circular gate or by surrounding source or drain with the thin gate oxide. The ELT transistor design with a circular gate is more commonly used [61]. Figure 3.7 shows an example layout. The strap on the ringed gate is required as the design rules do not allow gate contacts in the active area. Furthermore, a guard ring was placed around one of the two transistors. This guard ring prevents leakage current from flowing between n+ diffusions [65].

Figure 3.7: Layout for an ELT device with additional guard ring between two transistors [65].

The disadvantages of using ELTs are that they are not commonly available in the design kits and that calculating the actual W/L ratio is not straightforward. Also, the gate capacitance is larger than for a standard transistor and the layout is not symmetrical. See [61] for more details on the modeling of an ELT.

ELT transistors were implemented in reference circuits used in the ASIC developed in this work. The corresponding circuits are described in section 5.9 and 5.11.

#### 3.2.2 Protection against SEU and SET

The effect of an SEU depends on the memory cell affected. Some bits are more important than others, for example in the global configuration. Different circuits and methods exist to protect against SET and SEU. The protection comes with an increase in area and power consumption. Depending on the application, one or the other might be more suitable. Some of the methods are described below.

#### De-glitching logic

SET can be seen as glitches in the logic path. De-glitching methods prevent these transients from propagating, for example by adding filters. However, filters reduce the working speed of the logic.

Another method would be to duplicate the critical paths. If only one of the paths is affected by an SET, the other still holds the correct value and can correct the upset signal.



Figure 3.8: Standard (a) and protected (b) latch [67].

Figure 3.9: Schematic of a DICE memory cell with two interleaved latches [70].

#### Protected latch

A capacitance is added in the feedback loop of the latch as shown in Figure 3.8, similar to adding filters in the logic path. The added capacitance makes an SEU less likely because the glitch caused in one inverter is less likely to be propagated and stored. The cell can still flip, but a much higher charge is required [61, 66]. The drawback of this method is a slightly larger footprint, lower speed and higher power, depending on the capacitance size.

A protected latch was used in the token bit manager chip for the CMS experiment. They measured a cross-section of  $2 \times 10^{-15}$  cm<sup>2</sup> for the protected latch which is a factor 100 better than the standard latch in the same technology [67].

#### DICE latch

Another memory structure with increased radiation hardness was proposed by [68]. The dual interlocked storage cell (DICE) is similar to the normal SRAM cell of a bistable latch. The latch was duplicated and interleaved as shown in Figure 3.9. With this new design at least two nodes have to be upset to flip the cell.

[69] measured with optimized transistor sizes and interleaved layout design a gain of 26 compared to a standard latch. Though the area increase is almost a factor 2, the DICE cell is still small enough to be integrated in larger memory structures. This allows the usage e.g. in the pixel matrix of FE chips [69, 70].



Figure 3.10: Schematic for the triplication of logic with different levels of the TMR implementation. a) only memory triplicated (simple TMR), while in b) everything is triplicated (full TMR).

#### Triple modular redundancy

A very effective way to make logic almost immune to SEEs is to triplicate the logic, also known as triple modular redundancy (TMR). The easiest method is to triplicate each register and add a majority voter at the output. Figure 3.10 shows a schematic for possible implementations.

The voter implements the majority decision according to the following Boolean equations:

$$O = (A \cdot B) + (A \cdot C) + (B \cdot C) \tag{3.4}$$

$$E = (A \cdot \overline{B}) + (B \cdot \overline{C}) + (C \cdot \overline{A}) \tag{3.5}$$

where A, B and C are the three register states and O is the voted output. The signal E is the error output, which indicates whether there is a minority, i.e. a bit flip. This could be used to trigger a reset of the register, to count observed SEUs or to flag a warning that something happened.
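The behavior of the voter in equations 3.4 and 3.5 can be verified by listing its truth table. The following sketch is a plain software model of the two Boolean equations, not the hardware implementation used in this work; the function name is illustrative.

```python
from itertools import product


def majority_vote(a, b, c):
    """Return (O, E) for three register bits according to equations 3.4 and 3.5."""
    # Equation 3.4: output is 1 if at least two inputs are 1.
    o = (a & b) | (a & c) | (b & c)
    # Equation 3.5: error is 1 if the three inputs are not all equal.
    e = (a & (1 - b)) | (b & (1 - c)) | (c & (1 - a))
    return o, e


if __name__ == "__main__":
    print("A B C | O E")
    for a, b, c in product((0, 1), repeat=3):
        o, e = majority_vote(a, b, c)
        print(f"{a} {b} {c} | {o} {e}")
```

The table shows that O follows the majority of the inputs for all eight combinations and that E is set exactly when one register disagrees with the other two.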

As represented in Figure 3.10 different schemes exist. Triplicating the combinational logic can also protect against SET. If a transient occurs in the combinational logic of the simple TMR, it could propagate to all registers and upset all three at the same time. In full triplication, three combinational paths exist and a transient in one path upsets at most one register. The other two still hold the correct value.

Triplication should be implemented before synthesis. It is important to include the triplicated design in the synthesis because voters add delays in the combinational path. These should be considered for a correct timing analysis by the synthesis tools. However, constraints are required to protect the TMR signals. Otherwise, the synthesis tool removes the redundant logic during optimization.

The designer needs to be careful that all required signals are triplicated and that the voter output is used to generate the next state of a register. Otherwise, the protection could be useless. The tool TMRG [71] helps to generate triplicated designs. After the logic is developed and tested, the TMRG tool creates a new file with added triplication. By adding special syntax to the Verilog<sup>2</sup> code, one can control which signal should be triplicated and where a voter has to be added. A constraint file is also created to protect the triplicated signals.

The full TMR approach was used in this work for protecting the logic (see section 5.5). This was done with the TMRG tool mentioned above. The efficiency of the protection implemented was tested in a proton beam as described in section 6.3.4.

The added protection comes at the cost of a larger area and higher power consumption. For the synthesized logic developed during this work, an area increase by a factor of 4.8 was observed. The size increases by more than a factor of three, on the one hand due to the additional logic from the voters, but also because less optimization can be performed in order to respect the constraints protecting triplicated nets.

A triple redundant latch was implemented by [70] for usage in a memory block. An SEU tolerance improvement by a factor of 170 was measured for this latch. In their design, they applied the simple method where only the memories were triplicated as shown in Figure 3.10a on the preceding page. [69] further improved the triple redundant latch design by triplicating the load logic and interleaving the layout. A cross-section of  $6.8 \times 10^{-18}$  cm<sup>2</sup> was measured for the updated design, which is a gain of 3920 compared to a standard data flip flop with a cross-section of  $2.8 \times 10^{-14}$  cm<sup>2</sup>. This triple redundant latch has an area 12 times larger than a standard latch. The additional increase might be explained by the fact, that also the voter is included in the latch.

**Theoretical cross-section** of the TMR protected logic can be estimated using probability calculations. Some hypotheses are made for this:

- 1. The three registers observe SEUs independent from each other.
- 2. The registers are refreshed in an interval  $\Delta t$ .
- 3.  $\Delta t$  is small enough that two SEUs in the same register are insignificant.
- 4. p is the SEU probability in a single register and is smaller than 0.5, defined in equation 3.2.

The probability for a flip of the voted output of a triplicated register, which requires at least two of the three registers to be upset, is given in equation 3.6. It can be found using a probability tree [72].

$$p_{TMR} = p^3 + 3p^2 \left(1 - p\right) = p^2 \left(3 - 2p\right) \tag{3.6}$$

Assuming that the cross-section for a TMR register follows equation 3.2, the TMR cross-section can be calculated with

$$\sigma_{TMR} = \frac{p_{TMR}}{\Phi\Delta t} = \sigma^2 \Phi \Delta t \left(3 - 2\sigma \Phi \Delta t\right) \tag{3.7}$$
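For the small upset probabilities relevant here ( $p \ll 1$ ), the term  $2p$  in equation 3.6 can be neglected. As a sketch of the leading-order behaviour, using only the symbols defined above and  $p = \sigma \Phi \Delta t$  as implied by equation 3.7:

$$p_{TMR} \approx 3p^2 \quad\Rightarrow\quad \sigma_{TMR} \approx 3\sigma^2 \Phi \Delta t, \qquad \frac{\sigma_{TMR}}{\sigma} \approx 3\sigma \Phi \Delta t = 3p$$

Under the stated hypotheses, the suppression factor of the triplication is therefore roughly three times the single-register upset probability within one refresh interval.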

<sup>2</sup>A hardware description language used to describe logic. See also Appendix A.

When applying these calculations to the results from [69], a large discrepancy is observed. From the above equation, the triple redundant latch should have a cross-section 10 decades smaller than the standard latch. Some assumptions had to be made for the calculation:  $1.9 \times 10^{10} \text{ cm}^{-2} \text{ s}^{-1}$  was used for the particle flux<sup>3</sup> and  $\Delta t$  was assumed to be 100 ns, given that the latch reloads the voter output as soon as an error is detected. The triple redundant latch is not refreshed with a constant clock, but with a load signal. This load signal comes either from an external source to set the latch, or from the error output of the majority voter. The time interval is the propagation delay of the load signal. Hypothesis 3 should be fulfilled.
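The size of this discrepancy can be reproduced with a short calculation. The following sketch evaluates equation 3.7 with the flux and refresh interval assumed above and compares the result with the measured values from [69]; the variable and function names are illustrative only.

```python
import math

SIGMA_STD = 2.8e-14   # cm^2, standard flip-flop cross-section reported by [69]
SIGMA_MEAS = 6.8e-18  # cm^2, measured triple redundant latch cross-section [69]
FLUX = 1.9e10         # cm^-2 s^-1, particle flux assumed in the text
DELTA_T = 100e-9      # s, refresh interval assumed in the text


def tmr_cross_section(sigma, flux, delta_t):
    """Equation 3.7 with p = sigma * flux * delta_t."""
    p = sigma * flux * delta_t
    return sigma * p * (3 - 2 * p)


sigma_tmr = tmr_cross_section(SIGMA_STD, FLUX, DELTA_T)
decades_calc = math.log10(SIGMA_STD / sigma_tmr)
decades_meas = math.log10(SIGMA_STD / SIGMA_MEAS)

print(f"p                         = {SIGMA_STD * FLUX * DELTA_T:.2e}")
print(f"sigma_TMR (calculated)    = {sigma_tmr:.2e} cm^2")
print(f"decades gained (calc.)    = {decades_calc:.1f}")
print(f"decades gained (measured) = {decades_meas:.1f}")
```

The calculated gain is close to ten decades, while the measured gain is between three and four decades.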

On the other hand, hypothesis 1 is not guaranteed. Multiple latches could be upset together as discussed in section 3.1.2. Further, an SET created outside the latch or in the load signals renders the TMR protection useless if the signal goes to all three latches. Some of these effects are reported by [69].

They implemented several versions of the latch. The triplication of the reload logic improved the cross-section by a factor of 5 compared to the simple triplication. This indicates that SETs have a large effect. Further, two fully triplicated latches were interleaved in the layout to increase the distance between two bits. This reduced the cross-section by another factor of 4 compared to the non-interleaved layout. Spatial separation therefore also helps to reduce MBUs.

Studies regarding the SEU tolerance were also made in this work and are documented in sections 6.3.3 and 6.3.4.

## 3.2.3 Protection against multi-bit upsets

Besides single-bit effects, there are also events affecting multiple bits, as discussed in section 3.1.2. These effects have to be expected in the HL-LHC environment. As they extend over larger areas, they cannot be completely prevented by the methods described above.

The rail span collapse depends on the power network. Approaches to model the effect on circuits exist [74, 75]. A good power bus is required to dissipate the photocurrents. Adding well contacts to the power rails helps to reduce the resistance. Further, the device collection area is minimized by inserting substrate contacts between logic cells [76].

Protecting against MBU can be achieved with spatial separation of sensitive nodes. Interleaved designs, where sensitive nodes are spaced farther from each other, have increased hardness against SEU [69]. The charge sharing of transistors close together can also be greatly reduced by increasing the distance. Even better is the addition of guard rings as shown by [56].

The relaxed timing constraints and the large area available for the logic designed in this work allowed the logic blocks to be placed loosely. The gaps were filled with additional substrate contacts for better power contact.

<sup>3</sup>Based on the data from the website of the IRRAD test beam facility [73].

## 3.2.4 Methods to prevent latch-up

Latch-up has been a problem since the early stages of CMOS technology. It became more problematic at smaller feature sizes because the current required to trigger a latch-up is smaller when the devices are closer together. The fabrication processes have been adjusted by adding trench oxides between devices, although these create other problems with cumulative irradiation (see section 3.1.1). Updated design rules, such as control of the spacing between devices, also help to prevent latch-up. Adding enough substrate and N-well contacts or even guard rings creates paths where the current can flow directly to ground or supply [77]. This increases the turn-on current necessary to open the PNPN path and therefore reduces the susceptibility to latch-up.

Another effective method to mitigate latch-up is adding an epitaxial layer [77]. The epitaxial layer of the substrate, where the MOSFETs are located, is lightly doped for the best performance. The remaining substrate is highly doped and has an increased conductivity. This adds a low-ohmic path in parallel to the base resistor of the parasitic NPN, and therefore a higher current is required to produce a sufficient voltage drop for opening the NPN transistor. This method is process dependent, while the insertion of guard rings and respecting the design rules is the task of the designer.

The design made in this work respects all design rules to prevent latch-up. Additional guard rings and substrate contacts were placed around all functional blocks.

# Chapter 4

# Detector control system

The health of the ATLAS detector is important for users and operators to assure good quality of the data. Whenever beams are present, it is not possible to access the experiment, because of the radiation from the collisions. Only during times when the LHC is not delivering beam is it possible to perform maintenance work. This maintenance is further limited to the off-detector electronics and the outer elements of the experiments which are easily accessible. Many parts of the experiment, like the pixel detector, can only be accessed during long shutdowns, when there is enough time to open the detector. Therefore, it is crucial to get information about the detector status to protect the experiment before damage can occur and to understand the working conditions. This is done by the detector control system (DCS).

# 4.1 Control and monitoring of ATLAS

An ATLAS-wide detector control system (DCS) is used to monitor the status of each sub-detector. It uses the supervisory control and data acquisition (SCADA) software SIMATIC WinCC Open Architecture [78] to collect status information from each sub-detector. Furthermore, the DCS interacts with the central LHC control center. The information is presented to the user in a front panel as shown in Figure 4.1 on the following page.

The status can be seen by the color of the sub-detectors. There are more detailed views available for individual sub-detectors. The SCADA tool executes commands for controlling the detector. In addition, automated actions are defined and implemented with a finite state machine (FSM), as explained in section 4.1.2.

## 4.1.1 LHC operation

During operation, the LHC provides collisions to the experiment as often as possible. Because the beam cannot be provided indefinitely, the machine goes through cycles shown in Figure 4.2. The individual steps of the ATLAS run are described in Table 4.1 and for the LHC in Table 4.2 on page 39.

During stable beams, the experiment is in the physics data taking mode where collisions are happening. This mode should run for as long as possible for a maximum of physics data. During the warm-start at the beginning of the data taking mode, the HV power supplies are ramped up to deplete the sensors (see section 2.1). The actual duration of a run depends mainly on the LHC operation. Beams are dumped after a certain time when the luminosity is becoming low. However, some beams are lost unexpectedly in case of malfunctions, like a magnet tripping.

Figure 4.1: Front panel of the ATLAS DCS [79].

Figure 4.2: Cycle of the LHC and ATLAS runs [80].

The machine and experiments require maintenance to operate reliably. Moreover, upgrades are planned to improve and enhance the detector for better results as described in section 1.2.2. Each winter, the year-end technical stop is used for maintenance and small upgrades. Every few years, a long shutdown takes place to install larger upgrades. Depending on the task at hand, parts of the accelerator or experiments are warmed up. If work is done on the vacuum pipe, a bake-out is required afterward. During the bake-out process, the pipe goes through heating cycles to outgas the vacuum vessel [81]. Without this process, it would not be possible to achieve the required vacuum, which is thinner than in interstellar space [82].

| ATLAS status           | Description                                                                                                |
|------------------------|------------------------------------------------------------------------------------------------------------|
| Calibration period     | No data taking is done and the detector is calibrated or tests are performed.                              |
| Standby                | The detector is safe for beam but not yet taking data. The HV of the tracker is off. The ATLAS run starts. |
| Warm start             | The HV is switched on and data collection starts.                                                          |
| Physics data<br>taking | ATLAS is collecting and storing data for physics.                                                          |
| Warm stop              | The detector is brought back to standby after a beam dump. The ATLAS data taking run is finished.          |

Table 4.1: ATLAS run cycles.

Table 4.2: Operation cycles of the LHC. Stable beams is the most important operation mode.

| LHC status                          | Description                                                                              |
|-------------------------------------|------------------------------------------------------------------------------------------|
| Setup                               | The machine is prepared for beam injection.                                              |
| Injection probe<br>and physics beam | A first bunch is injected to configure the beam before the proton<br>bunches are filled. |
| Ramp                                | The beam energy is increased.                                                            |
| Flat top and squeeze                | Max energy is reached and the beam parameters are configured for collisions.             |
| Adjust                              | The magnets are adjusted to collide the beams.                                           |
| Stable beams                        | The collisions are stable and good for physics data.                                     |
| Beam dump and<br>ramp down          | The beam is dumped and the energy is then ramped down.                                   |

Another important aspect of the operation is unwanted power cuts, which can shut down the system in an uncontrolled way. This happens a few times every year and requires a reboot of the experiments [83]. There are uninterruptible power supplies with back-up batteries for critical systems. However, not everything can be covered by them. They have a limited capacity and can only cover the time it takes to properly switch off the experiment. The detector has to be brought back into normal operation after such a power cut.

Guaranteeing the safety of operators and the experiment in all these situations is the task of the DCS.

## 4.1.2 DCS state machine

A finite state machine (FSM) is used to control and operate the detector. It was originally developed for the LHC experiments [84]. At the lowest levels are the devices, which access the hardware e.g. power supplies. The devices are grouped into a control unit (CU), based on the powering groups, like a serial power chain or mechanical structures e.g. a ring in the end-caps. An example tree structure is given in Figure 4.3.

Each control unit has a status (OK, warning, error or fatal), indicating its health. The status is propagated upwards to the root, i.e. the full ATLAS detector. Normally the worst sub-unit state defines the state of the higher level. However, a user can define other propagation rules.
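The default propagation rule can be sketched in a few lines. This is a simplified illustration and not the FSM toolkit used in ATLAS; the class `ControlUnit`, the numeric ordering of the status values and the unit names are assumptions made for the example.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Status(IntEnum):
    """Health status of a control unit, ordered from best to worst."""
    OK = 0
    WARNING = 1
    ERROR = 2
    FATAL = 3


@dataclass
class ControlUnit:
    name: str
    own_status: Status = Status.OK
    children: list = field(default_factory=list)

    def status(self) -> Status:
        """Default rule: the worst sub-unit status defines the parent status."""
        return max([self.own_status] + [c.status() for c in self.children])


# Hypothetical tree: detector -> serial power chain -> modules
modules = [ControlUnit(f"module_{i}") for i in range(3)]
chain = ControlUnit("sp_chain_0", children=modules)
detector = ControlUnit("pixel_detector", children=[chain])

modules[1].own_status = Status.WARNING  # e.g. a module voltage out of range
print(detector.status().name)
```

A user-defined propagation rule would replace the `max` in `status()`, for example by ignoring decoupled branches.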

The control unit is further in a state, which defines what operation is ongoing. This can be Shutdown, Standby, Ready, Unknown, Transition or Not Ready. Commands to change a state, e.g. going from standby to ready in case of a warm start, are issued from top to bottom.

Users can look at different control units in the branch. It is also possible to decouple a branch from the tree for debugging or testing purposes.

Automatic operations are defined in the state machine. During operation the state changes depending on what is happening. If a failure occurs, like a module voltage going out of range, the module status changes to Warning or Error. Automated actions are performed to recover the module or error messages are generated for the operators, if human intervention is required.



Figure 4.3: Tree structure for the DCS finite state machine [84].

## 4.2 DCS for the ITk Pixel Detector

The University of Wuppertal had already built the DCS for the current pixel detector and the IBL [85, 86]. The updated concept for the DCS at the ITk Pixel detector is based on previous experiences. The concept foresees three independent paths: safety, control & feedback and diagnostics. They differ in reliability, availability, precision and granularity [86, 87]. An overview of the three paths with the main elements is shown in Figure 4.4.

## 4.2.1 Safety path

The safety path has the task to protect humans and the experiment from fatal failures [86, 87]. It has the highest reliability and acts as the last line of defense. The safety path must take action and shut down the corresponding power supplies in case anything happens that causes danger to the experiment or operators, for example if a module temperature is too high or laser diodes for the optical data transmission are not covered. The entire safety path has to be available whenever there is power. No configuration should be required, so that the system is active as soon as power is present.

The safety path is implemented as a hardwired interlock system. There are interlock protected devices, which are units being monitored by the safety path like a pixel module. On the other side are interlock controlled devices, mainly power supplies which receive commands from the safety path. The granularity is in the order of powering units, which are the SP chain for the ITk Pixel detector. A modular system is foreseen that can be used for the ITk Pixel and Strip detector in common. Figure 4.5 shows an overview of the elements in the safety path.

Figure 4.4: Overview of the DCS for the ITk Pixel detector. Based on [86].

Figure 4.5: Schematic overview of the safety path [88].

The main logic is the interlock matrix, which is implemented in a field programmable gate array (FPGA). This provides the required flexibility to map inputs to outputs and allows later changes. It is still completely implemented in hardware and does not depend on any software.
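The idea of an interlock matrix can be illustrated as a Boolean mapping from inputs to outputs. The following Python sketch only models this mapping; the real matrix is combinational logic in the FPGA, and the channel names and matrix entries below are hypothetical.

```python
# Hypothetical names; the real input and output channels are not specified here.
INPUTS = ["module_overtemp_chain0", "module_overtemp_chain1", "laser_cover_open"]
OUTPUTS = ["lv_supply_chain0", "lv_supply_chain1", "optical_link_supply"]

# MATRIX[j][i] is True if input i must switch off output j.
MATRIX = [
    [True,  False, False],
    [False, True,  False],
    [False, False, True],
]


def interlock_outputs(inputs):
    """Return, for each output, whether it must be switched off."""
    return [any(m and x for m, x in zip(row, inputs)) for row in MATRIX]


tripped = interlock_outputs([False, True, False])
for name, off in zip(OUTPUTS, tripped):
    print(f"{name}: {'OFF' if off else 'enabled'}")
```

Changing the mapping then corresponds to editing entries of the matrix, without touching the evaluation logic.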

Additionally, the interlock path includes monitoring of each sensor and output. This is used to debug the system and to investigate why an interlock was set. The monitoring is independent of the interlock function for a reliable operation of the safety path.

#### 4.2.2 Control & feedback path

A second path is used to operate the experiment. Status information from the detector is provided by this path as feedback to the operator. On the other side, this path provides control options to configure the experiments, like adjusting the setting on the power supplies or switching off individual modules. The power supplies for a serial power chain are also shown in Figure 4.6 on the facing page. The LV supply provides the current for the FE chips. The HV supply is used to bias the sensors. There might be more than one HV line per SP chain, to increase redundancy and lower the failure risk should a sensor create a short. Additional power supplies are used for the optical link and the DCS elements in the detector as described in section 1.2.2.

The control & feedback path of the DCS operates on chain and module level. The feedback part of this path includes monitoring the temperature and voltage of each module. This monitoring is independent of the FE operation to observe the detector status also during shutdowns or bake out. All the monitoring values are collected by the detector control station as indicated in Figure 4.4 on the previous page.

Dedicated ASICs are foreseen to implement the control & feedback path. They digitize the module voltage and temperature in situ and transmit the values over a communication bus to the control station. These chips are operated independently of the sensor modules and can collect data all the time. Section 4.3 describes the usage of these ASICs in more detail. With the monitoring information, a software interlock can be implemented, which is normally set to lower thresholds than the hardwired interlock of the safety path. The software interlock has higher flexibility as it can be adjusted during operation. However, it is less reliable because the computer running the software can crash.

#### 4.2.3 Diagnostic path

A third path is used to calibrate and adjust the experiment. The front-end (FE) chips are configured based on the data from this path so that they collect data efficiently.

The diagnostic path has the highest granularity, on the FE chip level. This path collects information directly from the FE chip through the optical readout path. The FE includes radiation and temperature sensors, together with an ADC to monitor the supply and internal voltages of the chip. These values can be transmitted together with the normal physics data. A fixed fraction of the output frames is reserved for the status information, which makes up about 2% of the data [5].

The data is transmitted optically to the off-detector data acquisition system, as depicted in Figure 4.4 on page 41. The optical interface of the data acquisition system has to separate the diagnostic information from the physics data and send it to the detector control station. Scans are performed additionally during calibration periods to re-tune the FE for optimal data taking [5].

## 4.3 Control of a serial power chain

New approaches are required to control and monitor a serial power (SP) chain. See section 2.3.3 for a description of the serial power concept. For full control, it is required to power on/off individual modules in the SP chain, without disturbing the other modules of the same chain.

The LV power supply can only act on the entire chain. To deactivate individual modules, an alternative current path is needed so that the remaining modules can continue to operate. Such a bypass adds the flexibility to remove single modules from the chain, although it also adds complexity and risks. See Chapter 7 for more on the risk analysis.



Figure 4.6: Schematic representation of a serial power chain with the DCS chips.

There are two chips planned for implementing the monitoring and control of an SP chain [5]. The first is the pixel serial powering & protection (PSPP) chip, which is the front-end element for the control & feedback path. The second chip, which acts as a bridge between the off-detector electronics and the PSPP, is called DCS controller [87]. Figure 4.6 on the previous page shows the chain with the DCS ASICs.

## 4.3.1 DCS controller

The DCS controller is placed on patch panel 0 (PP0) located at the end of a mechanical structure for the modules of one or multiple SP chains. The electronics cavern houses the power supplies and off-detector electronics for the detector control station. Between the electronics cavern and PP0 are about 100 m of cables. This requires a driver strong enough to transmit the data and a protocol for reliable operation. The controller area network (CAN) was chosen for the high reliability and low line count. The CAN standard [89] implements a cyclic redundancy check in the messages and resolves conflicts. To transmit data a bidirectional differential pair is used and multiple nodes can be connected in a bus to further reduce the lines.
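As an illustration of the cyclic redundancy check mentioned above, the following sketch computes the 15-bit CRC of the CAN protocol with its generator polynomial 0x4599. The input bits are a hypothetical bit sequence and not a complete CAN frame; bit stuffing, arbitration and the remaining frame fields are not modelled, and the helper names are illustrative.

```python
CAN_CRC15_POLY = 0x4599  # x^15 + x^14 + x^10 + x^8 + x^7 + x^4 + x^3 + 1


def can_crc15(bits):
    """CRC-15 over an iterable of 0/1 bits (destuffed, first bit first)."""
    crc = 0
    for bit in bits:
        crc_next = bit ^ ((crc >> 14) & 1)
        crc = (crc << 1) & 0x7FFF
        if crc_next:
            crc ^= CAN_CRC15_POLY
    return crc


def to_bits(value, width):
    """Convert an integer to a list of bits, most significant bit first."""
    return [(value >> i) & 1 for i in reversed(range(width))]


# Hypothetical content: an 11-bit identifier followed by one data byte.
frame = to_bits(0x123, 11) + to_bits(0xA5, 8)
crc = can_crc15(frame)
print(f"CRC-15 = 0x{crc:04X}")

# A receiver running the same CRC over data plus checksum obtains zero.
print(can_crc15(frame + to_bits(crc, 15)) == 0)  # True
```

A single flipped bit anywhere in the sequence leads to a non-zero remainder at the receiver, so the error is detected.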

A prototype with a CAN node was developed in Wuppertal [90]. Since then, the requirements for the DCS controller were updated. The application layer CANopen will be used to simplify the integration of the DCS controller in WinCC. CANopen is already used in the ATLAS DCS and other LHC experiments to implement monitoring [91]. Each SP chain will have one serial control bus (SCB) connecting up to 16 PSPPs. The SCB is a bus developed at Wuppertal for AC coupled nodes (see section 5.4). The DCS controller is the master of the SCB. A block diagram of the DCS controller is given in Figure 4.7.

The DCS controller also includes an ADC for monitoring the temperature on PP0. This ADC would also be used to monitor the status of the DCS controller itself. The oscillator is used to create the clock required for CAN and SCB.

## 4.3.2 PSPP chip

The PSPP chip is the front-end element of the control & feedback path. This chip includes the bypass and monitoring for individual modules. It is explained in detail in Chapter 5.



Figure 4.7: Block diagram of the DCS controller.

# Chapter 5

# Pixel Serial Power & Protection chip

The pixel serial powering & protection (PSPP) chip is an ASIC designed for the ATLAS ITk Pixel detector control system (DCS). It is the front-end element of the control & feedback path described in section 4.2.2.

Its main purpose is to control and monitor a single module in a serial power (SP) chain. A bypass transistor is integrated, which allows deactivating a module. This bypass provides a low-resistive alternative path for the LV supply current.

The PSPP chip operates in parallel with the pixel module and therefore shares the same ground potential. On the other hand, it must be powered independently of the pixel module to operate in times when the front-end (FE) chips are switched off. This requires a power scheme that allows supplying all PSPPs of one SP chain together. Further, flexibility is required to operate with different configurations of the chain. The reference potentials of the modules, and thus of the PSPPs, change with the LV supply current and the number of active bypasses.

The development of the PSPP chip towards a possible production is the main goal of this thesis. First, the requirements are presented, based on the ATLAS Pixel technical design report [5]. Afterward, the development of the PSPP chip and its function are described.

## 5.1 Requirements

The main tasks of the PSPP are the following:

- Operation in a serial power chain
- Independent communication and power lines
- Monitor the operating voltage of the pixel sensor module with a precision of 10 mV
- Monitor the module temperature with a precision of 0.5 K
- Switching individual modules in the serial chain
- Operation in a high-radiation environment

The independent service lines are required to operate the PSPP even when the FE chips are switched off. The temperature monitoring provided by the PSPP is also required during shutdowns for information about annealing of the sensors. To keep the number of services low, a common supply together with a communication bus for all PSPPs in a serial power chain is used.

Based on this, the following requirements are set for the PSPP:

- The PSPP must be able to bypass the maximal supply current of 8 A.
- The switching of the bypass should not create current transients of more than 5%.
- After power-up or in case of power loss the bypass should remain open.
- The bypass can be controlled remotely by command.
- Automatic activation of the bypass in case of over-voltage (OV) or over-temperature (OT). This feature can be disabled.
- The module voltage, temperature and internal values are monitored and digitized by the PSPP.
- Negative temperature coefficient (NTC) resistors are used as temperature sensors. The power for them is provided by the PSPP.
- The SP has to be qualified for 16 modules. Therefore up to 16 PSPP chips have to operate in one chain.
- The PSPP will be located on the type 0 services (see Figure 4.6 on page 43) to include the possibility of bypassing an open module connector.
- The communication bus should be able to operate with lines of 2.5 m length.
- The communication lines have to be AC coupled for operation with independent ground potential of each chip in a chain.
- The total power of the PSPP should be as low as possible to work without active cooling in all operation modes.
- As the chip is located close to the pixel modules, the PSPP must have the same radiation hardness as the FE chips.
- Operation in a high magnetic field of 2 T.
- Operating temperature range between (-40 to 40) °C.

The radiation levels listed in Table 5.1 were defined for the FE chip [92] and are therefore also applied to the PSPP. The single event upset (SEU) rate is given for all PSPP chips in the detector.

The number of bypass activations due to SEU should be kept as low as possible. One per month in the entire detector is acceptable for operation. Such an event would require intervention by an operator to reset the bypass. The exact procedure on how to handle this has yet to be defined.

| Parameter                           | Value                                                         |
|-------------------------------------|---------------------------------------------------------------|
| Total ionizing dose                 | $500\mathrm{Mrad}$                                            |
| Non-ionizing fluence                | $1 \times 10^{16}  1 \ {\rm MeV} \ n_{\rm eq}  {\rm cm}^{-2}$ |
| Flux of $>1 \mathrm{GeV}$ particles | ${<}2\times10^8\mathrm{cm}^{-2}$                              |
| Charged particle fluence            | $150 \times 10^{-3} \mathrm{cm}^{-2} \mathrm{pp}^{-1}$        |
| Hadrons $>20 \mathrm{MeV}$ fluence  | $85 \times 10^{-3} \mathrm{cm}^{-2} \mathrm{pp}^{-1}$         |
| SEU rate for bypass                 | <1/month                                                      |

Table 5.1: Radiation tolerance requirements for the PSPP.

According to Equation 3.1 and using the charged particle fluence from Table 5.1, this would require a cross-section smaller than:

$$\sigma = \frac{1 \,\text{SEU}}{150 \times 10^{-3} \,/\text{cm}^2 / 25 \text{ns} \cdot 1 \,\text{month} \cdot 30\,000 \,\text{bits}} = 2.14 \times 10^{-18} \,\text{cm}^2 \tag{5.1}$$

The calculation assumes that there are three bits per PSPP which can directly activate the bypass: one bit controlled by command and two bits for the over-voltage and over-temperature activation (see section 5.5.2). There is one PSPP per module in the detector, which amounts to  $\sim 10\,000$  chips in the entire detector [5] and thus to 30 000 bits.
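The arithmetic of equation 5.1 can be checked with a short script. The length of one month is taken as 30 days, which is an assumption that reproduces the quoted value; the variable names are illustrative.

```python
FLUENCE_PER_PP = 150e-3             # cm^-2 pp^-1, charged particle fluence (Table 5.1)
BUNCH_SPACING_S = 25e-9             # s, interval used in equation 5.1
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumption: one month taken as 30 days
N_CHIPS = 10_000                    # approximate number of PSPP chips [5]
BITS_PER_CHIP = 3                   # bits that can directly activate the bypass

flux = FLUENCE_PER_PP / BUNCH_SPACING_S     # cm^-2 s^-1
fluence_month = flux * SECONDS_PER_MONTH    # cm^-2 per month
n_bits = N_CHIPS * BITS_PER_CHIP
sigma_max = 1.0 / (fluence_month * n_bits)  # cm^2 per bit for 1 SEU per month

print(f"flux      = {flux:.1e} cm^-2 s^-1")
print(f"sigma_max = {sigma_max:.2e} cm^2")
```

The script shows how the required cross-section scales inversely with the number of sensitive bits: halving the number of bits that can activate the bypass would double the acceptable cross-section per bit.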

## 5.2 Previous prototypes

This work is based on previous prototypes developed by Kathrin Becker [93], Jennifer Boeck [90] and Lukas Püllen [21]. A list of all the chips designed by the University of Wuppertal is given in Appendix B.

Another important chip to mention is the serial powering & protection (SPP) chip developed at the University of Pennsylvania [94]. It was developed for the ITk Strip detector during a serial power design study where it would provide bypassing capability. It includes a comparator to automatically activate the bypass in case of an over-voltage. Some concepts, like powering of the SPP, are also used by the PSPP.

#### PSPPv1 and PSPPv2

The first PSPP prototypes provided a proof of concept for the ASIC. They implemented all required elements but were not yet radiation hard to the needed level. Further, the requirements for the PSPP were modified, e.g. an increased serial power supply current.

The first two versions of the PSPP chip include all required components, except for the temperature monitoring. These chips are not yet radiation hard, as several components are designed with thick-gate-oxide transistors. The logic block implements control of the ADC and the bypass. A dedicated physical layer was developed to support single-ended communication over AC coupling. PSPPv1 was designed and tested to bypass a current of 2.4 A [21]. The PSPPv2 includes the same elements as the PSPPv1 with some minor bug fixes.

### Required improvements of the first PSPP versions

The first versions of the PSPP showed the feasibility of the concept. Tests in a serial power chain prototype were performed with them [95]. However, some problems were observed and further improvements were required.

- Several blocks use thick-gate-oxide transistors, which are known to be less radiation hard than the core transistors [42] and have to be replaced.
- The logic was not protected against SEUs.
- The ADC on both chips was not working properly. A layout error led to problems in the PSPPv1, which was corrected for the PSPPv2. Unfortunately, a production error made the ADC of the PSPPv2 unusable.
- Communication was not possible with more than five chips connected in a chain. In addition, the communication could become blocked when a bypass was activated. This could only be resolved by a power cycle of the entire chain or a reset of the chips, which also resulted in reopening the bypass.
- The power requirements of the new front-end ASIC increased by a factor of four. A single chip is designed to support up to 2 A, resulting in a chain current of up to 8 A [92]. The bypass of the first PSPP versions is not suited for this current.

Based on the concepts from the PSPPv1 and v2 a new prototype chip was developed and is described in the following sections.

## 5.3 Next generation Pixel Serial Power & Protection chips

To implement the required changes and updates, new chips were designed. Because the predecessors were already developed in the GlobalFoundries 130 nm process, formerly known as the IBM 130 nm process, it was decided to continue the development in this process. The FE-I4 chip was also designed in this technology [13], which made it possible to reuse some elements in the PSPP.

During this thesis, four chips were designed and fabricated. Two chips implement the full functionality required for the PSPP while two more chips were submitted to test different elements. Here an overview of the different chips is given. A detailed description of the development and test of the different elements in the ASICs follows in sections 5.4 to 5.12.

## 5.3.1 Pixel Serial Power & Protection chip version 3 (PSPPv3)

The PSPPv3 chip includes all required elements and was designed with the updated requirements. However, it still has elements that are not radiation hard, namely the regulators and the comparator. This chip was submitted in November 2016. Results obtained with this chip were included in the ITk Pixel technical design report [5].

An overview of the different blocks inside the PSPPv3 is given in Figure 5.1 on the next page. The diagram shows also the required external passive elements.

## PSPPv3 layout

A picture of the final chip is shown in Figure 5.2 on the facing page. The fabricated chip size is 3.6 mm by 3.3 mm. The size was mainly defined by the number of pads required. The bypass pads were the largest and were placed on the edge to have short wire bonds. As the bypass itself did not fill the entire space, it was decided to place the other pads on an inner ring. This reduced the total chip size, yet requires two layers of wire bonds where the inner bonds pass above the outer pads.



Figure 5.1: Block diagram of the PSPPv3 Chip.





Figure 5.3: Fabricated PARC chip.

Figure 5.2: Fabricated PSPPv3 chip.

# 5.3.2 PSPP Add-on Regulator & Comparator chip (PARC)

To verify the remaining radiation hard elements, a test chip called PSPP add-on regulator and comparator (PARC) was submitted in February 2017. This chip includes the remaining radiation hard circuits and was intended as an add-on to the PSPPv3. The fabricated chip is shown in Figure 5.3 and has a size of 1.4 mm by 3 mm.

In addition to the regulator and comparator, the PARC chip includes a test structure to measure the SEU cross-section. This test logic is described in section 5.12. Further, dedicated pads are integrated to be used as a physical layer for the SCB master; see section 5.4.2 for details.

## 5.3.3 Pixel Serial Power & Protection chip version 4 (PSPPv4)

Based on the results from PSPPv3 and PARC, a further prototype was developed. This fourth generation, the PSPPv4, was submitted in November 2017 during the last multi-project run available for the technology. The PSPPv4 includes all required functionalities and was designed to be fully radiation hard. It is intended as the basis for a preproduction design. Compared to the PSPPv3, the following major changes were made in the PSPPv4:

- Updated radiation hard voltage regulator and comparator
- Power-on reset included
- Two temperature sensor inputs, because of updated requirements
- Module voltage sensed directly at bypass
- Bump bond pads instead of wire bonds

Additional smaller changes were made to fix bugs observed in the previous generation of chips. The development and improvements are described for each component in the following sections. The block diagram for the PSPPv4 is shown in Figure 5.4. It includes the additional temperature sensor input and new power-on reset.



Figure 5.4: Block diagram of the PSPPv4 chip with intended external circuits.

## Bump bonds

There are two techniques used to connect an ASIC with other elements. The principle is the same if the chip is integrated in a package or directly placed on a PCB. The first and more common is wire bonding, where the connections are made with thin aluminum or gold wires. Except for the PSPPv4, the chips developed during this work use wire bonds, due to the easier assembly.

The other technique is to use bump bonds. Here the connection is made by small solder balls or copper pillars with a solder top. The chip is soldered to the PCB with a so-called flip-chip procedure. The bump bonding process is more complicated as it requires more steps to prepare the die with the bumps. Also, the soldering process is more difficult as the contacts cannot be seen under the chip. An X-Ray image, as shown in Figure 5.5, allows checking whether the contacts are made.

The benefits of bump bonds are a better use of the chip area, as they can be placed over the entire design. This gives a smaller total footprint and requires less area on the support. Further, the contact resistance and inductance are smaller, giving better performance for the bypass. A better heat transfer to the support can be achieved, which reduces the cooling requirements. As the PSPP is located on the services, a direct cooling connection would require additional mechanical engineering. Cooling through convection and by using the electrical connection as heat sink is preferred. Furthermore, bump bonds are more robust than wire bonds if properly soldered. For all these reasons, bump bonds are selected for the PSPPv4.

## PSPPv4 layout

The layout and padframe of the PSPPv4 were defined early in the design process. This was done to include opinions from the designers of the flexible PCB, where the chip will be mounted. A layout was chosen where the bypass can be accessed from both sides. This gives more freedom in the design of the PCB.



Figure 5.5: X-Ray image of a soldered PSPPv4. Picture courtesy of Bettina Otto [96]



Figure 5.6: Fabricated PATT chip. Picture courtesy of Peter Phillips [97].

The size was defined by the number of signals required for operation. Each signal has at least two bump bonds for redundancy. To allow easier routing on the PCB, there is always at least  $300 \,\mu\text{m}$  between two signals. The bumps of the same signal are placed in a grid with a pitch of  $200 \,\mu\text{m}$ .

A single bump bond can support a current of  $\sim 100 \text{ mA}$ , therefore at least 80 bump bonds are required per bypass contact. It was chosen to place an array of 105 bumps per bypass contact, arranged in 7 columns and 15 rows. The two arrays for the bypass can be seen on the right side of Figure 5.5 on the preceding page while the signal connections are on the left. The final PSPPv4 has a size of 4.6 mm by 3.6 mm.
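The bump count can be cross-checked with a short calculation. The sketch below assumes a bypass current of 8 A (the value implied by the 80-bump minimum) and treats the approximate 100 mA per bump as a hard limit; all variable names are illustrative.

```python
import math

# Assumed values: 8 A bypass current, ~100 mA per bump bond (from the text).
BYPASS_CURRENT_MA = 8000
CURRENT_PER_BUMP_MA = 100

# Minimum number of bumps needed per bypass contact.
min_bumps = math.ceil(BYPASS_CURRENT_MA / CURRENT_PER_BUMP_MA)

# Array actually placed per bypass contact: 7 columns by 15 rows.
placed_bumps = 7 * 15

# Resulting margin and current per bump if the current is shared equally.
margin = placed_bumps / min_bumps
ma_per_bump = BYPASS_CURRENT_MA / placed_bumps

print(min_bumps, placed_bumps, round(margin, 2), round(ma_per_bump, 1))
```

Under these assumptions the placed array carries roughly 30 % more bumps than the minimum, so each bump stays below the quoted limit if the current is shared evenly.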

## 5.3.4 PSPP Asynchronous TMR Test chip (PATT)

Another test chip was submitted in May 2018 on a multi-project run together with ASICs for the ATLAS ITk Strips detector. This chip, called PSPP asynchronous TMR test (PATT), implements additional protection methods in the logic and some further features under consideration.

This chip implements the full PSPP logic with an additional SEU protection feature for testing. This is the asynchronous TMR, described in section 5.5.7. There was not sufficient space available for the bypass from the PSPPv4 and the submission was with wire bonds only. Therefore, instead of a bypass, a larger version of the SEU test logic was implemented. Figure 5.6 shows the fabricated chip. The PATT chip was fabricated with a size of 2.6 mm by 3.7 mm.

## 5.4 Serial control bus

All PSPPs in one SP chain have to communicate over a bus with the DCS controller. As each chip is on a different potential, all the signal lines have to be AC coupled. To reduce the number of lines, it is foreseen to have single-ended lines. This requires some dedicated circuits and protocol implementation to realize reliable communication.

## 5.4.1 From I2C-HC to SCB

A communication protocol was selected in previous work for exchanging data between the PSPP and DCS controller chips. This protocol was based on the inter-integrated circuit  $(I^2C)$  bus, but extended with a Hamming code (HC) for error detection for operation in the radiation environment [93].

The PSPPv1 and PSPPv2 implemented an AC coupled version of this bus [21]. Tests with the PSPPv2 have shown that the communication was not stable with many chips connected on the bus and after bypass activities.

It was decided to implement Manchester encoding in the protocol to obtain a DC-balanced signal. This helps to recover from bypass activities, as shown with tests and simulations performed in this thesis. To improve the reliability of the protocol, it was decided to use two unidirectional data lines instead of one bidirectional line. The downside is that one additional line is required.

Further, the clock is always present, instead of only during the communication as in normal  $I^2C$ . This helps to recover from voltage changes due to switching a bypass. A constant clock also benefits the TMR protection because the internal registers are updated with every clock cycle. Two SEUs have to occur within one clock cycle to flip a triplicated bit (see section 3.2.2).

To distinguish the modified bus from the original  $I^2C$ , it was renamed to serial control bus (SCB).

## 5.4.2 Physical layer

## Signal lines

There are three single-ended unidirectional lines:

- SCLx2: continuous clock at 200 kHz
- SDAm: data transmitted from the master to the slaves
- SDAs: data transmitted from the slaves to the master.

The data are transmitted with Manchester encoding [98]. In this line encoding, one bit is encoded as a transition from high-to-low ('0') or low-to-high ('1'). The low and high part are of equal length. For the SCB a transmission rate of 100 kHz is used. The Manchester encoding is done with a logic XNOR between the 100 kHz clock and the data.
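The XNOR-based encoding can be illustrated with a minimal sketch. It assumes that the 100 kHz clock is low during the first half of each bit and high during the second half, which reproduces the convention above (low-to-high for '1', high-to-low for '0'); the function names are illustrative and not taken from the chip design.

```python
def xnor(a, b):
    """Logical XNOR of two bits (0 or 1)."""
    return 1 - (a ^ b)

# Assumed clock phase: low in the first half-bit, high in the second.
CLOCK_HALVES = (0, 1)

def manchester_encode(bits):
    """Return the line level for every half-bit period."""
    halves = []
    for bit in bits:
        for clk in CLOCK_HALVES:
            halves.append(xnor(clk, bit))
    return halves

def manchester_decode(halves):
    """Recover the data bits; a missing mid-bit transition is a violation."""
    bits = []
    for i in range(0, len(halves) - 1, 2):
        first, second = halves[i], halves[i + 1]
        if first == second:
            raise ValueError("Manchester violation (no mid-bit transition)")
        # XNOR with the clock level of the second half-bit returns the data.
        bits.append(xnor(CLOCK_HALVES[1], second))
    return bits

print(manchester_encode([1, 0]))
```

Because every valid bit contains one low and one high half, the encoded signal is DC balanced, which is the property needed for the AC coupled lines.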

A clock twice the transmission rate is transmitted on the SCLx2 line. The doubled clock speed is used to prevent glitches in the decoding process. The original 100 kHz clock is reconstructed in the slaves for decoding the Manchester signal. A dedicated logic synchronizes on the correct clock edge when no communication is in progress. This logic is explained in section 5.5.1. This approach was used to operate without a clock recovery circuit in the PSPP.

The PSPP chips are the slaves on the bus. The master sends commands to all slaves on the SDAm line. All chips on the bus answer on the common SDAs line. To prevent conflicts, each slave sends only when it is addressed. If a slave is not addressed or an error occurs in the communication, the slave deactivates the output driver. The slave state machine is reset when a start or stop condition is observed to go back into a defined state as described in section 5.5.1.

#### Keeper pads

To operate with single-ended and AC coupled lines, a dedicated physical layer is required. The signal of an AC coupled input drifts away and could leave the valid input range. A keeper pad ensures that the signal keeps its voltage levels after the coupling capacitance.

Figure 5.7 shows the schematic for the keeper input and output pad. The keeper pad is based on a memory cell formed by two inverters in a loop. The inverter with the output connected to the pad is rather weak and a pulse coming from the pad can overwrite the current state and thus flip the cell to the new value. The keeper circuit was developed and tested in the PSPPv1 [21].



Figure 5.7: Simplified schematic for the keeper input (a) and output (b) pads

An improved version of the keeper input and output stage was developed in this work, using the same principle. The keeper input and output pads were designed based on existing digital pads<sup>1</sup>. The strength of the keeper circuit allows the operation of 16 chips on one bus.

Hysteresis is used in the second inverter of the input pad to improve stability. The input pad then has an inverter buffer to drive the internal logic.

The output pad has a tri-state output driver. It further has a keeper stage, that was designed to operate with multiple pads driving the same line. When an output pad is in the high-impedance state, the keeper logic follows the level of the line. A single output pad could not properly drive the SDAs line without the keeper stage.

#### SCB master pads

Driver pads for an SCB master are included in the PARC chip. This allows testing the logic of the SCB master in an FPGA. There are two digital output pads for driving the SCLx2 and SDAm lines. A keeper input pad (see Figure 5.7) is integrated and intended as physical layer for the SDAs line. There are digital pads for each of the three lines, to be connected to an FPGA.

## 5.4.3 Protocol

As for the  $I^2C$  bus, the SCB uses a start and stop condition, 8-bit data packages and an acknowledge bit. Additionally, a 4-bit Hamming code is added to the 8 data bits for error detection. Each transmitted data package is therefore 13 bits long (8 data bits, 4 Hamming code bits and 1 acknowledge bit), not counting the start and stop condition. The protocol was defined by [93].

#### Hamming code

A Hamming code is a cyclic code to detect bit flips. It could also be used to correct false bits, but this is not used in SCB. In case of a failure, the communication has to be repeated.

<sup>1</sup>These pads were originally developed for the FE-I4 chip used in IBL [13].

The Hamming code used was chosen by K. Becker [93] and is a (12,8) code<sup>2</sup>. This code was generated by shortening a (15,11) cyclic Hamming code. The generator polynomial is  $g(x) = 1 + x + x^4$ . The systematic generator matrix can be created and is shown in equation 5.2. The sent codeword can be calculated with  $v(x) = u(x) \odot \mathbf{G}_{8 \times 12}$ .

To properly decode the shortened Hamming code, a correction polynomial is multiplied to the received data [99]. For the used (12,8) cyclic Hamming code the polynomial  $d(x) = 1 + x + x^3$  is used. To check if the received Hamming code is correct, a Meggitt decoder is used [99]. The generated syndrome s(x) is 0 if no error was observed. A non-zero syndrome indicates that a bit flip occurred in the transmission. The serial data input d is shifted into the data register and the syndrome is calculated in parallel. The syndrome has to be cleared before a new code word is received. After shifting 12 times the calculated syndrome can be checked and all the data is available at r(x).
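The error detection principle can be sketched with plain polynomial division over GF(2). The sketch below is not the Meggitt decoder with correction polynomial used in the chip and does not reproduce the generator matrix of equation 5.2; the systematic layout (data in the upper eight bits, parity in the lower four) and the bit ordering are assumptions made for illustration.

```python
# g(x) = 1 + x + x^4; bit i of the integer is the coefficient of x^i.
G_POLY = 0b10011

def poly_mod(value, nbits):
    """Remainder of value(x) divided by g(x) over GF(2)."""
    for i in range(nbits - 1, 3, -1):
        if value & (1 << i):
            value ^= G_POLY << (i - 4)
    return value & 0xF

def encode(data):
    """Assumed systematic form: 8 data bits followed by 4 parity bits."""
    shifted = (data & 0xFF) << 4
    return shifted | poly_mod(shifted, 12)

def syndrome(codeword):
    """Zero for a valid code word, non-zero if a bit flip is detected."""
    return poly_mod(codeword, 12)

word = encode(0xA5)
print(f"{word:012b}", syndrome(word), syndrome(word ^ 0b100))
```

In this sketch a valid code word always gives a zero syndrome, while any single flipped bit of the 12-bit word gives a non-zero syndrome, which matches the detection-only use of the code in the SCB.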

#### Read and write access

Table 5.2 illustrates the required bits for a write access. The first byte sent after the slave address is the register number, where the data should be written. The slave increments the register number after each transmitted data byte. This enables multiple registers to be written in one communication.

|       | Slave access |   |    |       | Register |    |       | Data |    |       |      |
|-------|---------|---|----|-------|----------|----|-------|------|----|-------|------|
| Start | 7       | 1 | 4  | 1     | 8        | 4  | 1     | 8    | 4  | 1     | Stop |
|       | Address | W | HC | *Ack* | Register | HC | *Ack* | Data | HC | *Ack* |      |

Table 5.2: Bit pattern for SCB write command. The W bit is '0'.

A read command normally starts the same way as a write command, by writing the register number. It is thus possible to write registers and then continue with reading. If no new register number is written at the beginning of the read process, the register following the last access is read. The pure read command is illustrated in Table 5.3.

<sup>2</sup>(n,k) stands for n = 12 code word length and k = 8 data length.

|           |       | Slave access |   |    |       | Data   |      |     |      |
|-----------|-------|---------|---|----|-------|--------|------|-----|------|
| Write Reg | Start | 7       | 1 | 4  | 1     | 8      | 4    | 1   | Stop |
|           |       | Address | R | HC | *Ack* | *Data* | *HC* | Ack |      |

Table 5.3: Bit pattern for SCB read command. The R bit is '1'.

The master sends the start and stop signal, as well as the slave address. The start and stop signal are violations of the Manchester encoding. For the start signal, the SDAm line is at '0' for the entire bit, while it is '1' for one bit in case of the stop signal. The master sends all the data of a write access and the slave the ACK bit. During a read access, the slave is sending the data and the master gives the ACK bit.

If the master does not send an ACK bit, the slave stops sending data. The slave does not send an acknowledge signal if the address is not its own or if an error in the Hamming code was detected. The master must recognize a missing acknowledge. Whether the transmission is repeated is left to the operator of the master.

In the tables above bits are written in normal font if sent by the master and in *italic* font if the slave is sending.

#### Timing constraints

The timing for the SCLx2 and SDAm lines is shown in Figure 5.8. The protocol is designed to operate at a clock of 200 kHz for SCLx2, therefore  $2T = 10 \,\mu s$ . The phase shift  $t_D$  between SCLx2 and SDAm could become as large as half a period. This is because the decoding logic automatically detects which edge should be used (see section 5.5.1). For the keeper logic to operate properly, a fast slope is required. The rise and fall time  $(t_{r/f})$  is required to be 0.2 µs or smaller.

The constraints on SDAs are analogous to those on SDAm. Figure 5.9 shows a simulation of an SCB communication including internal signals. The simulation shows a read access with SCB, which also includes writing the register number.

Tests of the communication are described in section 5.5.4.
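As a rough cross-check of these numbers, the duration of a write access can be estimated from the packet structure of Table 5.2. The sketch assumes that the start and stop signals each occupy one bit period and that there are no idle gaps inside an access; the function name is hypothetical.

```python
BIT_PERIOD_US = 10          # one bit at 100 kHz, i.e. 2T with T = 5 us
PACKET_BITS = 8 + 4 + 1     # data or address byte, Hamming code, ACK

def write_access_bits(n_data_bytes):
    """Bit periods for start, address, register number, data and stop."""
    return 1 + PACKET_BITS * (2 + n_data_bytes) + 1

bits = write_access_bits(1)
print(bits, "bit periods,", bits * BIT_PERIOD_US, "us")
```

Under these assumptions, writing a single register takes 41 bit periods, i.e. about 0.41 ms, and each additional register written in the same access adds 13 bit periods.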



Figure 5.8: Timing information for SCB. Only a fractional communication is shown. In gray are the Manchester decoded SCL and SDAm signals for reference.





## 5.5 Logic core

The previous prototypes included a working logic implementing the I<sup>2</sup>C-HC protocol. However, it was not yet triplicated and concerns about timing problems existed. The code structure also made changes difficult. Therefore it was decided to base the logic of the PSPPv3 on an alternative I<sup>2</sup>C slave design [100]. The new logic is designed in a more structured way and is also silicon proven. The design was modified to realize the PSPP functionality.

Part of the development in the user logic and triplication was done by M. Errenst. The top-level block diagram is shown in Figure 5.10. There are two main blocks: the protocol unit and the user unit.

Further, there is an input multiplexer to select from two sets of input signals. This was used to test the keeper stage or a normal digital input pad. The entire logic was developed and tested in simulation, but also on FPGA.

## 5.5.1 Protocol Unit

The protocol unit controls the serial communication. The  $I^2C$  slave was modified to implement the SCB protocol by adding the Hamming code and Manchester encoding. Its main element is a state machine, shown in Figure 5.11, which controls the different counters and the access to the user unit. After a reset, the finite state machine (FSM) performs some initialization and then goes to the IDLE state. For timing reasons, some output signals of the state machine depend on the inputs. Therefore, the state machine is implemented as a Mealy machine [101].

#### Manchester decoding and IDLE state

A decoder block is implemented to recover the data from the Manchester encoded signal transmitted over the SDAm line. See section 5.4 for more details on the protocol. This decoder also recovers the SCL signal, which is a 100 kHz clock. This is used as a clock enable signal in the logic for proper timing. The decoding logic is represented in Figure 5.12. The SCLx2 clock is divided by 2 to generate the SCL signal. SDAm is combined with this clock by a logical XNOR to decode it. Because the XNOR could generate glitches, its output is sampled on the negative SCLx2 edge. This also helps if some phase shift between the clock and data is present.

Figure 5.10: Simplified block diagram of the SCB slave top-level.

Figure 5.11: FSM of the protocol unit.

Figure 5.12: Schematic representation of the logic to decode the Manchester data input (SDAm).

The PSPP has no knowledge of which rising edge of SCLx2 corresponds to the rising edge of SCL. Therefore the signal is decoded twice, with one of the two decodings inverted, i.e. phase-shifted by 180°. As long as the chip is in the idle state and no communication is ongoing, it samples both decoded SDAm signals and selects the one that is at logic '1', since the master sends the pattern for a '1' between two communications. As soon as the start signal is detected, the chip keeps the selection until a stop signal is received.
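The phase selection during idle can be illustrated with a small behavioural model. It is a sketch only: the input is assumed to be the SDAm level sampled once per SCLx2 period, the two clock phases are modelled as the two possible sample offsets, and the function names are illustrative.

```python
def decode_pair(first, second):
    """Decode one bit from two half-bit levels; None marks a violation."""
    if first == second:
        return None
    return second  # low-to-high gives '1', high-to-low gives '0'

def select_phase(idle_samples):
    """Return the sample offset (0 or 1) for which idle decodes to '1'."""
    for offset in (0, 1):
        pairs = list(zip(idle_samples[offset::2], idle_samples[offset + 1::2]))
        if pairs and all(decode_pair(a, b) == 1 for a, b in pairs):
            return offset
    raise ValueError("no phase decodes the idle pattern as '1'")

# Idle pattern (Manchester encoded '1') seen with both possible alignments.
print(select_phase([0, 1] * 8), select_phase([1] + [0, 1] * 8))
```

With the wrong alignment the same idle signal decodes to a continuous '0', so picking the phase that reads '1' is unambiguous as long as the master sends the idle pattern between accesses.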

#### Start and Stop detection

The start and stop signals are generated by a dedicated block looking for the corresponding pattern on the SDAm and SCL signal. This block uses the output signal from the Manchester decoding block.

If a start signal is captured, the FSM jumps to the START\_STATE, which begins the communication, independent of the current operation. The FSM sets the user logic to receive a new command, clears the bit counter and enables the input shift register. When a stop signal is seen, the FSM goes into the STOP\_STATE from where the IDLE state is loaded.

#### Receive data and input shift register

Once the FSM has started, it resets the user logic to receive a new command and activates the input shift register. The input shift register samples the data from the decoded SDAm signal and calculates the Hamming code syndrome. It directly checks if the syndrome is valid and no bit error occurred. Should the syndrome indicate an error, the state machine aborts the communication and goes into the NOT\_ADDRESSED state.

A bit counter is used to count how many data bits have already been received. This counter indicates to the state machine when the data part is finished, when the Hamming code was received and when the ACK bit has to be sent.

#### Slave address

The first byte received is passed to the address decoder block. The data is compared to the address set on the digital inputs. There are three fixed bits of the value "101" and four external bits. With the four external address bits, up to 16 different slave addresses can be selected.

The slave activates its output as soon as the correct address is received, i.e. before the Hamming code is checked. This happens during the RX\_ADDR\_HC state. This was done to apply a signal on SDAs and reset the line, which is not actively driven when no slave is active. Should the Hamming code be wrong, the output is deactivated again. Each slave sends the same pattern during this time.
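The address comparison can be sketched as follows. The position of the fixed bits is an assumption (here the three most significant bits of the 7-bit address), the general call address "000 0000" is the one described later in this section, and the returned label "ADDRESSED" is purely illustrative.

```python
FIXED_PREFIX = 0b101       # three fixed address bits (assumed to be the MSBs)
GENERAL_CALL = 0b0000000   # address of the general call

def slave_address(external_bits):
    """7-bit address built from the fixed bits and four external bits."""
    if not 0 <= external_bits <= 0xF:
        raise ValueError("external address must fit in 4 bits")
    return (FIXED_PREFIX << 4) | external_bits

def classify(received, external_bits):
    """Decide how a slave reacts to a received 7-bit address."""
    if received == GENERAL_CALL:
        return "GEN_CALL"
    if received == slave_address(external_bits):
        return "ADDRESSED"
    return "NOT_ADDRESSED"

print([hex(slave_address(e)) for e in range(16)])
```

With this bit placement the 16 possible slave addresses form one contiguous block, and none of them can collide with the general call address.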

#### Receive or transmit

If the correct slave address was received, the FSM goes either into the RECEIVE or TRANSMIT mode. The state machines for these two modes are shown in Figure 5.13. After sending the ACK bit, the receive or transmit process starts.

When receiving, i.e. during a write access, the FSM waits for the last bit and the Hamming code valid signal. If both are received, the data is passed to the user unit and an ACK bit is sent. This loop is repeated until the master sends a stop or start signal. It is the user unit that checks whether the register number or a new value is received.

During a read access, the slave has to transmit data. The FSM loads the data from the user unit into the transmit shift register. When the bit counter indicates that all bits were sent, the FSM checks if the master has sent an acknowledge. If the ACK bit was received, the next register is loaded and transmitted. If no acknowledge was received, the FSM goes into the NOT\_ADDRESSED state.

Figure 5.13: FSMs for the receive (a) and transmit (b) mode of the SCB slave.

Figure 5.14: Schematic of the data output shift register.

#### Transmit shift register and Hamming code generation

The transmit shift register is in principle always active. It is the FSM that sets the SDAs\_en signal, which defines if an output signal is sent or not.

To implement the Manchester encoding (see section 5.4.2), the output shift register has 24 bits. It is loaded by the FSM with three different patterns: 1. an idle pattern, 2. the ACK bit or 3. data to send.

The idle and ACK bit pattern are a fixed value. The idle pattern is a Manchester encoded logic '1', while the ACK bit has '0' as the first bit, followed by ones. The eight data and four Hamming code bits are doubled and each odd bit is inverted to create the Manchester encoding. The patterns are loaded into the shift register and shifted through with the SCLx2 clock. The last bit is always loaded with '1' and is inverted each shift cycle. This fills the shift register with the idle pattern, to ensure that the correct signal is present after transmitting the bits. The shift register is represented in Figure 5.14.

The shift register with doubled data bits for the Manchester encoding was selected to prevent glitches from the XNOR of the clock and data. It requires though that all the bits are known when loading the data.

Normally the Hamming code is calculated during the shift process. This is not possible with the shift register implementing Manchester encoding as described above. Therefore the Hamming code was calculated for all 256 possible values of the 8 data bits and implemented as a look-up table.
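The construction of the 24-bit load pattern can be sketched as below. The sketch assumes that the data byte is sent before the Hamming code, that both are sent most significant bit first, and that "odd" refers to the first half of each doubled bit; the Hamming nibble is passed in as an argument, standing in for the look-up table of the chip.

```python
def tx_pattern(data_byte, hamming_nibble):
    """Build the 24 half-bit levels loaded into the transmit shift register."""
    # 8 data bits followed by 4 Hamming code bits, MSB first (assumed order).
    word = [(data_byte >> i) & 1 for i in range(7, -1, -1)]
    word += [(hamming_nibble >> i) & 1 for i in range(3, -1, -1)]
    pattern = []
    for bit in word:
        # Double the bit and invert the first copy: '1' -> 0,1 and '0' -> 1,0.
        pattern += [1 - bit, bit]
    return pattern

# Idle pattern: a Manchester encoded '1' in every bit position.
IDLE_PATTERN = [0, 1] * 12

print(tx_pattern(0xA5, 0x3))
```

Since the complete pattern is fixed before shifting starts, the output is taken directly from registered bits and no combinational XNOR of clock and data is needed on the output path.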

#### General call

There is an additional mode besides receiving or transmitting data. This so-called general call (GEN\_CALL state in Figure 5.11) is entered if the address "000 0000" is received. All slaves on the bus activate their SDAs output driver in this mode and send the same pattern, which is a Manchester encoded '1'. Other than that, the slaves do not act and do not even send an ACK bit.

This mode was added to reset the SDAs line. Simulations have shown, that after switching a bypass, the line could go into a blocking state where some slaves have a '1' and some a '0' on their input. In such a case, a single slave is not strong enough to reset the line to a correct state. After sending a general call, the slaves can again properly communicate.

#### NOT\_ADDRESSED state

This state (see Figure 5.11) is entered if either a wrong slave address was received or an error in the Hamming code was detected. The slave becomes inactive and deactivates the output. It stays in this state until a new start or stop signal is received.

## 5.5.2 User Unit

The user unit implements the internal registers and PSPP specific functions. A state machine is the interface to the protocol unit and controls the register number counter.

The state machine also enables the read or write access to the DCS chip logic block, which implements the internal registers described below. The last block is the logic to control the bypass transistor.

#### Internal registers

The PSPPv3 has several registers listed in Table 5.4. The different functions of the PSPP are used by writing or reading these registers.

The chip ID is a fixed number that is hard coded. This ID is intended to be replaced by a unique ID per chip. Because there were more urgent tasks before the submission deadline, the implementation of the fuses was not done. Therefore the ID is a constant value common for all fabricated PSPPv3 chips. The ID and the SCB address are two independent values.

Several registers are available to access the ADC. The ADCmux register defines which channel is used. See section 5.6 for the available channels. The ADC starts a new conversion when the ADCmux register is written. Therefore, one has to write the ADCmux register before reading a new value. The value can be read back immediately after writing the multiplexer register by continuing to read from the ADCL registers within the same SCB access. Alternatively, the value can also be read right-adjusted from the ADCR registers.

Table 5.4: Internal registers of the PSPPv3 chip. Unused bits are always read '0' and can't be written. A register with direction "R" is read only, while "RW" stands for read and write. The exception is the bypass register which has a few read only bits, marked with "RO".

| Name        | Nr | Dir | Description (bit 7 to bit 0)                                                                 |
|-------------|----|-----|----------------------------------------------------------------------------------------------|
| Chip ID 1   | 0  | R   | Bits 15-8 of ID. Fixed at \$21.                                                              |
| Chip ID 2   | 1  | R   | Bits 7-0 of ID. Fixed at \$F5.                                                               |
| ADCR1       | 2  | R   | [7:2] unused; [1:0] ADC data [9:8]                                                           |
| ADCR2       | 3  | R   | [7:0] ADC data [7:0]                                                                         |
| Digital in  | 4  | R   | [7:2] unconnected DIN (always '0'); [1:0] DIN [1:0]                                          |
| Digital out | 5  | RW  | [7:2] unconnected DOUT; [1:0] DOUT [1:0]                                                     |
| Bypass      | 6  | RW  | 7: (RO); 6: OT flag (RO); 5: OV flag (RO); 4: OT en; 3: OV en; 2: OT rst; 1: OV rst; 0: cmd  |
| ADCmux      | 7  | RW  | [7:3] unused; [2:0] ADC mux [2:0]                                                            |
| ADCL1       | 8  | R   | [7:0] ADC data [9:2]                                                                         |
| ADCL2       | 9  | R   | [7:2] unused; [1:0] ADC data [1:0]                                                           |
| Control     | 10 | RW  | [7:1] unused; 0: trim enable                                                                 |
| BGHI        | 11 | RW  | [7:0] tuning bits 15-8                                                                       |
| BGLO        | 12 | RW  | [7:0] tuning bits 7-0                                                                        |
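How the 10-bit ADC value is assembled from the two register pairs of Table 5.4 can be sketched as follows. The function names are illustrative, and the position of ADC data [1:0] in the two lowest bits of ADCL2 is an assumption read from the table layout.

```python
def adc_from_adcr(adcr1, adcr2):
    """Right-adjusted read: ADCR1[1:0] holds data [9:8], ADCR2 holds data [7:0]."""
    return ((adcr1 & 0x03) << 8) | (adcr2 & 0xFF)

def adc_from_adcl(adcl1, adcl2):
    """ADCL1 holds data [9:2]; ADCL2[1:0] is assumed to hold data [1:0]."""
    return ((adcl1 & 0xFF) << 2) | (adcl2 & 0x03)

# Both register pairs describe the same 10-bit value.
print(adc_from_adcr(0x02, 0xA5), adc_from_adcl(0xA9, 0x01))
```

Reading only ADCL1 gives the eight most significant bits, which may be sufficient when the full 10-bit resolution is not needed.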

Three registers control the trimming of the additional bandgap (BG), used for testing purposes in the PSPPv3. This BG can be adjusted by writing registers 11 and 12, while the *trim enable* bit in register 10 activates the trimming. For more details on the BG, see section 5.9.

#### Bypass logic

The bypass logic controls the activation of the bypass transistor. There are three different ways how the bypass can be activated:

- With a command, by setting the corresponding bit to '1'.
- When the temperature rises above a threshold and the automatic bypass is enabled for over-temperature.
- When the module voltage rises above a threshold and the automatic bypass is enabled for over-voltage.

The logic function can also be expressed with the following boolean equation:

$$\text{Bypass\_State} = \text{cmd} + (\text{OV flag} \cdot \text{OV en}) + (\text{OT flag} \cdot \text{OT en}) \tag{5.3}$$

where cmd, OV en and OT en are the control bits from register 6 as defined in Table 5.4, '+' denotes a logical OR and '·' a logical AND.

The two flags for OV and OT indicate if the corresponding comparator saw a value above threshold. The flags stay active even if the monitored value goes back below the threshold. They have to be cleared by an operator, to ensure that the original failure is resolved. This guarantees that the bypass stays active until a command is sent. The flags are cleared by setting either the OV rst or OT rst bit. These two bits are automatically reset after an access and are always read '0'. The automated bypass control can be deactivated by clearing the corresponding bits (OV en and OT en) as indicated in equation 5.3. By default, the bypass is deactivated and the automatic activation is enabled.
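The behaviour of equation 5.3 together with the latched flags can be illustrated with a small behavioural model. This is a sketch, not the chip logic: the class and method names are illustrative, and the case of a reset issued while a comparator is still above threshold is not modelled.

```python
from dataclasses import dataclass

@dataclass
class BypassLogic:
    cmd: bool = False       # bypass deactivated by default
    ov_en: bool = True      # automatic activation enabled by default
    ot_en: bool = True
    ov_flag: bool = False   # latched comparator flags
    ot_flag: bool = False

    def update(self, over_voltage, over_temperature):
        """Latch the flags; they stay set after the condition disappears."""
        self.ov_flag = self.ov_flag or over_voltage
        self.ot_flag = self.ot_flag or over_temperature

    def reset_flags(self, ov_rst=False, ot_rst=False):
        """Operator action corresponding to the OV rst and OT rst bits."""
        if ov_rst:
            self.ov_flag = False
        if ot_rst:
            self.ot_flag = False

    @property
    def bypass_state(self):
        """Equation 5.3: cmd OR (OV flag AND OV en) OR (OT flag AND OT en)."""
        return self.cmd or (self.ov_flag and self.ov_en) or (self.ot_flag and self.ot_en)

logic = BypassLogic()
logic.update(over_voltage=True, over_temperature=False)
logic.update(over_voltage=False, over_temperature=False)
print(logic.bypass_state)
```

In this model the bypass stays active after the over-voltage has disappeared and is only released once the operator clears the flag or disables the corresponding enable bit.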

## 5.5.3 Protection against single event upsets

The triple modular redundancy (TMR) method was applied to protect against SEUs. The inputs were directly split in three ways and the entire logic was triplicated. Also, the clock was triplicated for full protection. A final voter was added before the output signals. This represents the full TMR as described in section 3.2.2. The triplication was performed using the TMRG tool [71].
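The voting principle behind the TMR can be sketched in a few lines. This is a behavioural illustration only, not the logic generated by the TMRG tool; the function names are illustrative.

```python
def majority(a, b, c):
    """Bitwise majority vote of three register copies."""
    return (a & b) | (a & c) | (b & c)

def refresh(copies):
    """Rewrite all three copies with the voted value, as done each clock cycle."""
    voted = majority(*copies)
    return (voted, voted, voted)

# One copy of a 4-bit register was hit by an SEU (bit 2 flipped).
copies = (0b1010, 0b1110, 0b1010)
print(bin(majority(*copies)), refresh(copies))
```

A single upset in one copy is outvoted and overwritten at the next refresh, so a wrong value only appears if a second copy is upset in the same bit before that refresh.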

## 5.5.4 Communication and logic test with the PSPPv3

A serial power chain was built to verify the SCB communication. Instead of actual modules, power resistors were used. The goal was to have up to 16 chips in one chain which was realized as shown in Figure 5.15. It was possible to address all chips with a cable of 3 m length.



Figure 5.15: Test serial power chain with 16 PSPPv3 chips. The ribbon cable holds the SCB signal lines and goes to the FPGA (not in the picture).


Figure 5.16: Measured SCB communication. SDAm\_dec and SDAs\_dec are the decoded signal available on a debug pin of the PSPPv3 and FPGA respectively.

#### Communication test

The access to read all registers of a PSPPv3 is shown in Figure 5.16. The signals were measured on the master side of the bus. The chip only answers after being addressed.

The PSPPv3 has an output pad with the decoded SDAm signal (SDAM\_DEC in the figure) for testing purposes. Also the SCB master implemented in an FPGA has such a debug output for the SDAs signal. These two signals are shown in addition to the three SCB lines in Figure 5.16.

#### Logic test

While the registers in the PSPPv3 could all be read correctly, the default state was not well defined. The PSPPv3 does not include a dedicated power-on reset circuit. Therefore all registers have to be written first to set the chip in the correct state.

Some channels of the ADC could be read normally as expected. When reading other channels, no conversion happened and the last value was read back. It was found out that the odd channels are more likely to work than the even channels. Best results were achieved when first the multiplexer was set to 0 and then to the desired channel.

The source of the problem was a timing issue between the logic and the ADC. The start signal to trigger the conversion was applied together with the multiplexer bits on the same clock edge as the ADC was reading. Moreover, the multiplexer bits were routed in parallel to the start signal, with the least significant bit closest to the start signal. When the lowest bit goes from '0' to '1', it helps the start signal and leads to a higher chance of triggering the conversion. The inverse has a negative effect, explaining why odd channels work better. More ADC tests are shown in section 5.6.6.

#### 5.5.5 Logic function updates for the PSPPv4

The logic of the PSPPv4 was updated to fix the bugs described above. This required changes mainly in the user logic. Some new features were also included, among them a second input for a temperature converter. The two over-temperature signals are combined with a logic OR and can both activate the over-temperature flag. Further, the reset values for the BGHI register and the control register were changed to have the trim active and trim value 0x1000. The ID was incremented to distinguish the chip from its predecessors. Only a single digital output is available on the PSPPv4 and no digital input. Therefore, the former DIN register now shows the internal status with the SCB address, the comparator status and the end-of-conversion flag from the ADC. Table 5.5 lists the new status register.

#### ADC control and status register

To fix the ADC timing problem described above, the ADC clock was inverted, so that the control signals are assigned on the falling edge. Additionally, the multiplexer signals are assigned one clock cycle earlier than the conversion start signal.

One bit in the status register indicates the status of the end of the ADC conversion as indicated in Table 5.5. The ADCR registers are still directly read from the ADC inputs, while the ADCL registers are buffered in a TMR register at the end of each ADC conversion. This was added because the ADC logic itself is not triplicated.

#### Correction of TMR implementation

Figure 5.17 shows the synthesized schematic of one-third of the logic for the over-voltage flag. The other two bits from the triplication are identical. In the PSPPv3, the output from the flip-flop is routed directly back to its input instead of the voted signal, as indicated in red in Figure 5.17. This means that the triplication is not correcting a bit flip. Unfortunately, this was only discovered after submitting the PSPPv3.

The problem was that in the Verilog code no default case was defined. This was corrected for the PSPPv4 by explicitly stating that the voted output is the default value. After synthesis, the green connection was made. Now the flip-flop always loads the voted state and corrects its value in case of an SEU. The entire PSPPv4 logic was checked for the same mistake and corrected where necessary.
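The effect of the feedback path can be illustrated with a small behavioral model. The following Python sketch is not the Verilog of the PSPP; the function names and the single-bit flag are assumptions made for illustration. It compares a triplicated flip-flop that reloads its own output with one that reloads the voted value when two upsets hit different copies at different times.

```python
def majority(a: int, b: int, c: int) -> int:
    """Majority vote of three redundant single-bit copies."""
    return (a & b) | (a & c) | (b & c)


def clock_cycle(regs, load, data, use_voted_feedback):
    """Advance the three flip-flops by one clock edge.

    load=1 writes new data into all copies. Otherwise each copy holds:
    either its own output (feedback as in the faulty case) or the
    voted value (feedback as in the corrected case).
    """
    if load:
        return [data, data, data]
    if use_voted_feedback:
        voted = majority(*regs)
        return [voted, voted, voted]
    return list(regs)  # each copy reloads its own, possibly upset, value


def survives_two_upsets(use_voted_feedback: bool) -> bool:
    """Flip one copy, clock, flip another copy, clock, then check the vote."""
    regs = [1, 1, 1]   # flag is set
    regs[0] ^= 1       # first upset
    regs = clock_cycle(regs, 0, 0, use_voted_feedback)
    regs[1] ^= 1       # second upset in a different copy
    regs = clock_cycle(regs, 0, 0, use_voted_feedback)
    return majority(*regs) == 1


if __name__ == "__main__":
    print("own-output feedback keeps flag:", survives_two_upsets(False))
    print("voted feedback keeps flag:", survives_two_upsets(True))
```

In this model a single upset is still masked by the voter in both cases, but only the voted feedback removes the upset from the register, so that errors cannot accumulate over time.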

| Name   | Nb | Dir | Description (bits 7 to 0)                                                                 | Default           |
|--------|----|-----|-------------------------------------------------------------------------------------------|-------------------|
| Status | 4  | R   | Bits 7 to 4: SCB address; bit 3: Comp Mod; bits 2 to 1: Comp Temp [1:0]; bit 0: ADC EOC   | \$X1 <sup>a</sup> |

Table 5.5: Updated status register of the PSPPv4 chip. See Table 5.4 for the full list.

<sup>a</sup> The address bits are defined by the connection at the pads (unconnected all at '1').



Figure 5.17: Schematic of the over-voltage flag. Red shows wrong implementation in the PSPPv3, while green is the corrected version in the PSPPv4.

#### 5.5.6 Test of PSPPv4 logic

The communication and register access worked in the PSPPv4 as with the PSPPv3. There were not enough PSPPv4 chips available for testing a full chain with 16 chips. Nevertheless, test chains with PSPPv3 and PSPPv4 together were successfully tested. The ADC can now be addressed correctly and the conversion happens reliably on all channels. The ADC end-of-conversion flag can be read correctly together with the comparator outputs from the status register defined in Table 5.5.

A test of the SEU protection was performed with a proton beam. The results are described in section 6.3.4.

#### 5.5.7 Asynchronous triple modular redundancy

The protection through triplication implemented in the PSPP relies on a working clock. In case of a lost clock, the communication does not work and the monitoring is lost. Still, a module could be operated when the monitoring is lost. The PSPP should therefore not activate its bypass, even when the communication is lost. In this case, the risk of SEUs increases as the registers are no longer updated in periodic intervals. The only option would be a power cycle to reset the chip. Therefore, an additional feature was added to protect against SEUs without a clock.

The majority voter provides an error signal which is high in case one of the three flip-flops has a different value. All error signals from the voters are combined with a logic OR. The resulting signal is used to increment the SEU counter. The error signal can also be used to reset the content of all three flip-flops, meaning that the flip-flops are directly reset whenever an error occurs. Figure 5.18 on the following page illustrates the concept implemented for the PSPP logic.
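The error path described above can be sketched as a behavioral model. The Python code below is only an illustration under assumptions: the class and function names are invented, the counter is taken to be 16 bits wide, and counting on the rising edge of the combined error signal is an assumption, not a statement about the implemented logic.

```python
def vote_with_error(a: int, b: int, c: int):
    """Return (voted value, error flag) for three redundant bits.

    The error flag is 1 when at least one copy differs from the others.
    """
    voted = (a & b) | (a & c) | (b & c)
    error = int(not (a == b == c))
    return voted, error


def combined_error(registers) -> int:
    """Logic OR of the error flags of all voters."""
    return int(any(vote_with_error(*bits)[1] for bits in registers))


class SeuMonitor:
    """Counts upsets seen on the combined error signal (assumed 16-bit)."""

    def __init__(self):
        self.count = 0
        self._last_error = 0

    def update(self, registers) -> int:
        error = combined_error(registers)
        # Assumption: count once per rising edge of the error signal.
        if error and not self._last_error:
            self.count = (self.count + 1) & 0xFFFF
        self._last_error = error
        return error


if __name__ == "__main__":
    monitor = SeuMonitor()
    monitor.update([(1, 1, 1), (0, 0, 0)])   # all copies agree
    monitor.update([(1, 0, 1), (0, 0, 0)])   # one copy upset
    print("SEU count:", monitor.count)
```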

This asynchronous protection was implemented in the user logic controlling the bypass. It was decided to implement two operation modes, one protected through the normal clock and a second protected asynchronously. Therefore a clock detection circuit is required, described in section 5.5.8. The clk\_en signal is used to switch between normal synchronous TMR and the asynchronous mode.



Figure 5.18: Simplified concept of the asynchronous TMR. The blue elements are added compared to the normal implementation. Only one-third is shown.

Table 5.6: Updated registers of the PATT chip. See Table 5.4 and 5.5 for the full list.

| Name   | Nb | Dir | Description (bits 7 to 0)                                                            | Default           |
|--------|----|-----|--------------------------------------------------------------------------------------|-------------------|
| Status | 4  | R   | Bits 7 to 4: SCB address; bit 3: Comp Mod; bits 2 to 1: Comp Temp [1:0]; bit 0: DIN  | \$X1 <sup>a</sup> |
| SEUHI  | 13 | RO  | SEU counter value 15-8                                                               | \$00              |
| SEULO  | 14 | RO  | SEU counter value 7-0                                                                | \$00              |

<sup>a</sup> The address bits are defined by the connection at the pads (unconnected all at '1').

#### Internal registers update for the PATT chip

Updates of the internal registers were made together with the asynchronous TMR in the PATT chip. An SEU counter was added, which counts the number of observed upsets, based on the error signal from the voters. The changes are listed in Table 5.6. The chip ID was again incremented to \$21F7.

#### Test of the asynchronous TMR

There is a problem with the logic, even though the design passed all simulations and had no timing issues in synthesis. The chip responds to commands, but writing and reading the registers does not work properly. Some values can be written, though a second write access is not properly executed. The SEU counter returns a different value every time it is read.

This indicates some timing issues with the error signal from the voters and the newly introduced asynchronous TMR. The investigation could not be completed by the time of this writing.

#### 5.5.8 Clock detection circuit

To switch between normal and asynchronous operation, a clock detection circuit is required. A circuit based on logic elements and delay elements was developed and implemented based on [102]. The delay elements were designed to detect a clock of 200 kHz. The output becomes high if there is no change in the input signal for more than 3 µs and indicates that the clock is missing. Figure 5.19 shows the simulated output signal as a function of the input. The output stays low as long as a clock is present at the input.

To test the clock detection circuit, different input frequencies were applied. The output stays low for frequencies  $\geq 200 \text{ kHz}$  as it should be. If the frequency is smaller, then the output goes to '1' within each period. At each clock edge, the output goes back to '0'. Two examples are given in Figure 5.20.
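The observed behavior follows from comparing the time between clock edges with the 3 µs timeout. The sketch below is an idealized model, assuming a 50 % duty cycle and a sharp timeout, which the real delay elements will only approximate; the function names are invented for illustration.

```python
TIMEOUT_S = 3e-6  # output goes high after this time without an input edge


def edge_spacing_s(freq_hz: float) -> float:
    """Time between consecutive edges of a 50 % duty cycle clock."""
    return 1.0 / (2.0 * freq_hz)


def output_high_fraction(freq_hz: float, timeout_s: float = TIMEOUT_S) -> float:
    """Fraction of time the 'clock missing' output is high.

    After every edge the output is low for timeout_s. If the next edge
    arrives later than that, the output is high for the remaining time.
    """
    gap = edge_spacing_s(freq_hz)
    return max(0.0, gap - timeout_s) / gap


if __name__ == "__main__":
    for f in (200e3, 25e3):
        print(f"{f / 1e3:.0f} kHz: high fraction = {output_high_fraction(f):.2f}")
```

In this model the edges of a 200 kHz clock are 2.5 µs apart, so the output never goes high, while at 25 kHz the edges are 20 µs apart and the output is high for most of each half period.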



Figure 5.19: Simulation result of the clock detection circuit. The top signal is the input clock and the bottom signal is the output signal. When high, no clock is detected.



Figure 5.20: Measured output of the clock detect circuit for 200 kHz (a) and 25 kHz (b).

#### 5.6 ADC

The monitoring of the module health is one of the main tasks for the PSPP. This includes the measurement of the voltage and temperature. To directly digitize the values an ADC is used.

The ADC was originally designed as a generic ADC for monitoring within the FE-I4 [13] and was already used in the first PSPP prototypes. It is a 10-bit successive approximation ADC and has an input range from 0 V to the applied reference voltage (1 V). The reference voltage is provided by the internal regulators (see section 5.10). Because the logic cells from the original design were not available anymore, the successive approximation logic was replaced with a new block performing the same logical function and using the digital library from the design kit. The start conversion signal is generated when a write access to the *ADCmux* register is performed. The conversion is finished after eleven clock cycles. Therefore, the result is directly available on the next SCB read access, which requires more clock cycles to perform. The conversion of the binary value to a voltage can be done with equation 5.4.

$$V_{adc} = V_{ref} \cdot \frac{ADC}{1024} \tag{5.4}$$

An eight-way analog multiplexer is used to select between different signals. The channel can be selected by writing to the *ADCmux* register as described in section 5.5.2. The possible channels are listed in Table 5.7 on the next page together with the channels for the PSPPv4 and PATT.

#### 5.6.1 Voltage measurement

The module voltage ( $V_{mod}$ ) is reduced by an internal voltage divider to support up to 2.4 V at the input. The ADC has a resolution for the module voltage of 2.3 mV. To convert the binary ADC value, the divider factor has to be considered. For the PSPPv3 this is given with equation 5.5.

$$V_{mod} = V_{ref} \cdot \frac{ADC}{1024} \cdot \frac{1}{0.42} \tag{5.5}$$
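Equations 5.4 and 5.5 translate directly into code. The following Python sketch uses the values stated in the text (1 V reference, 10-bit ADC, divider factor 0.42); the function names are chosen for illustration only.

```python
V_REF = 1.0        # reference voltage in V
ADC_STEPS = 1024   # 10-bit ADC
DIVIDER = 0.42     # internal divider factor of the module voltage input


def adc_to_voltage(code: int, v_ref: float = V_REF) -> float:
    """Equation 5.4: voltage at the ADC input for a given ADC code."""
    return v_ref * code / ADC_STEPS


def adc_to_vmod(code: int, v_ref: float = V_REF, divider: float = DIVIDER) -> float:
    """Equation 5.5: module voltage before the internal divider."""
    return adc_to_voltage(code, v_ref) / divider


if __name__ == "__main__":
    print(f"Resolution for V_mod: {adc_to_vmod(1) * 1e3:.2f} mV per count")
    print(f"Full-scale V_mod:     {adc_to_vmod(ADC_STEPS - 1):.2f} V")
```

One ADC count corresponds to about 2.3 mV of module voltage, and the full-scale code corresponds to slightly less than 2.4 V, in line with the numbers given above.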

#### Correction of the module voltage input

| Ch | PSPPv3 signal                     | PSPPv4 and PATT signal |
|----|-----------------------------------|------------------------|
| 0  | generic analog input <sup>a</sup> | $V_{mod}$ <sup>a</sup> |
| 1  | $V_{BG}$                          | ThMod                  |
| 2  | ThMod                             | $V_{temp0}$            |
| 3  | $V_{mod}$ <sup>a</sup>            | $V_{temp1}$            |
| 4  | ThTemp                            | ThTemp                 |
| 5  | $V_{temp}$                        | $V_{global}$           |
| 6  | $V_{global}$                      | $1/2\ V_{int}$         |
| 7  | $1/2\ V_{DD\_A}$                  | $V_{BG}$               |

Table 5.7: ADC channels for the PSPP chips developed during this work

<sup>a</sup> Divider factor of 0.42

The $V_{mod}$ pad was connected to the $V_{DD\_A}$ voltage rail, which is at 1.5 V. Therefore the input cannot reach the required >2 V. It can be fixed by adding an external resistor between the $V_{mod}$ pad and the module. As the internal voltage divider has a total resistance of ~100 k$\Omega$, a 60 k$\Omega$ external resistance is required. To correctly set the threshold, an external pull-down is also required on the *ThMod* pin. This is indicated in the block diagram of the PSPPv3 shown in Figure 5.1 on page 49. The desired threshold can be calculated with the following equation:

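$$V_{ThMod} = \frac{160\,\mathrm{k\Omega}}{42.6\,\mathrm{k\Omega}} \cdot \frac{56.8\,\mathrm{k\Omega} \cdot R_{ex}}{100\,\mathrm{k\Omega} \cdot R_{ex} + 43.2\,\mathrm{k\Omega} \cdot 56.8\,\mathrm{k\Omega}} \cdot V_{DD\_A} \tag{5.6}$$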

For example, a $47 \,\mathrm{k\Omega}$ resistor is required for a threshold of about $2 \,\mathrm{V}$, or $82 \,\mathrm{k\Omega}$ for about $2.4 \,\mathrm{V}$.
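Equation 5.6 can be evaluated numerically to choose the external resistor. The sketch below simply codes the equation with the resistor values printed in it and $V_{DD\_A} = 1.5$ V; the function name is an assumption for illustration.

```python
def th_mod_threshold(r_ex_ohm: float, v_dda: float = 1.5) -> float:
    """Equation 5.6: module over-voltage threshold for an external
    pull-down resistor r_ex_ohm on the ThMod pin (result in V)."""
    scale = 160e3 / 42.6e3
    divider = (56.8e3 * r_ex_ohm) / (100e3 * r_ex_ohm + 43.2e3 * 56.8e3)
    return scale * divider * v_dda


if __name__ == "__main__":
    for r_ex in (47e3, 82e3):
        print(f"R_ex = {r_ex / 1e3:.0f} kOhm -> threshold = {th_mod_threshold(r_ex):.2f} V")
```

For the two resistor values mentioned in the text, the computed thresholds lie close to the quoted 2 V and 2.4 V.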

#### 5.6.2 Temperature measurement

A $10 \,\mathrm{k\Omega}$ NTC resistor is intended for measuring the temperature. The NTC should be connected between $V_{DD\_A}$ and the $V_{temp}$ pad for the PSPPv3. A $10 \,\mathrm{k\Omega}$ pull-down is integrated in the PSPPv3 to operate the NTC. See Figure 5.1 on page 49 for details on the usage of the NTC. This gives a voltage of $0.75 \,\mathrm{V}$ at $25 \,^{\circ}$C. The internal threshold ThTemp is set to $0.8 \,\mathrm{V}$, which corresponds to approximately $30 \,^{\circ}$C. External resistors allow the threshold to be adjusted.
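The conversion from the measured voltage to a temperature can be sketched with the common Beta model of an NTC. The Beta value of 3435 K below is an assumption for illustration, since the text does not specify the NTC type; the divider arrangement (NTC as pull-up to 1.5 V, 10 kΩ pull-down) follows the PSPPv3 description.

```python
import math

V_DDA = 1.5          # analog supply in V
R_PULLDOWN = 10e3    # integrated pull-down in Ohm
R25 = 10e3           # NTC resistance at 25 degC in Ohm
BETA = 3435.0        # assumed Beta constant in K (not given in the text)
T25_K = 298.15       # 25 degC in Kelvin


def ntc_resistance(v_temp: float) -> float:
    """NTC resistance from the divider voltage (NTC in the pull-up path)."""
    return R_PULLDOWN * (V_DDA / v_temp - 1.0)


def ntc_temperature_c(v_temp: float, beta: float = BETA) -> float:
    """Temperature in degC from the Beta equation 1/T = 1/T25 + ln(R/R25)/beta."""
    r_ntc = ntc_resistance(v_temp)
    inv_t = 1.0 / T25_K + math.log(r_ntc / R25) / beta
    return 1.0 / inv_t - 273.15


if __name__ == "__main__":
    for v in (0.75, 0.80):
        print(f"V_temp = {v:.2f} V -> {ntc_temperature_c(v):.1f} degC")
```

With the NTC in the pull-up path a higher temperature gives a higher voltage, so the 0.8 V threshold lies a few degrees above 25 °C in this model.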

#### 5.6.3 Vglobal reference

During the risk analysis described in Chapter 7, the problem of a drifting reference voltage was identified. The best measures against this are design modifications to prevent any drifts. Another would be to allow trimming of the reference voltage. This requires knowing how much the reference has drifted. Once the chip is fabricated and installed in the experiment, only the measurements from the PSPP itself can be used. To still be able to detect such a drift, an external stable voltage is required. It was therefore decided to use the global supply voltage for the PSPP chip for this purpose.

To measure this voltage with the PSPP a divider is required to bring it down to the ADC input range. This is done by an external resistor to the supply voltage and an internal pull-down resistor to ground. The ADC channel  $V_{global}$  is the measurement of this voltage.

#### 5.6.4 Internal monitoring channels

The remaining channels are used for internal monitoring. This includes the thresholds for the over-voltage (ThMod) and over-temperature (ThTemp) protection. The transistor-BG implemented for testing purposes and the analog supply voltage  $V_{DD_A}$  reduced by half can be measured by the ADC.

The PSPPv3 further includes an additional analog input pad, which can be measured on channel 0.

#### 5.6.5 ADC update in PSPPv4 and PATT

Using the NTC as pull-up has the problem that a voltage of 0 V is measured if the NTC is not connected. This can cause calculation errors during the conversion to temperature. Further, it requires access to the supply voltage. For both the calculations and the external routing, it is better to have the NTC in the pull-down path of the divider. These changes were made for the PSPPv4.

A second temperature input was requested by the community. Therefore the channel assignment was changed for the PSPPv4 as shown in Table 5.7 on the preceding page.

#### 5.6.6 ADC test

An external voltage is applied to the monitoring channels in the initial test of the ADC. In the PSPPv3, this was done for the module voltage input and the temperature input. Because of the design error in the $V_{mod}$ pad of the PSPPv3, it was only tested up to 1.4 V. The test results from a PSPPv3 are shown in Figure 5.21. An offset of about 25 ADC counts, corresponding to 25 mV, is observed on the temperature channel. The module voltage does not have this offset and the divider factor corresponds to the design value.



Figure 5.21: Test results for the monitoring channels of the PSPPv3 ADC.




Figure 5.22: Test results for the monitoring channels of the PSPPv4 ADC.

The test was also performed with the PSPPv4, where two temperature inputs are tested. The results are shown in Figure 5.22. Here the module voltage input was corrected and therefore it was possible to measure up to 2.4 V. All channels of this chip also have a small offset and measure about 10 mV too small. The temperature channels show some non-linearity whenever one of the highest two bits changes. This is however not observed on the module voltage channel.

## 5.7 Comparator for over-voltage and over-temperature protection

The PSPP can protect a module from over-voltage and over-temperature. The latter could be the case if a module delaminates and loses contact with the cooling pipe. To have a fast and automatic bypass activation, comparators are used. They set a flag in the bypass logic, which activates the bypass as explained in section 5.5.2.

#### 5.7.1 Comparator implementation in the PSPPv3

During the development of the PSPPv3, it was decided to reuse the comparator from the predecessor. This comparator is not designed for the required radiation hardness and has to be replaced in future versions.

The PSPPv3 chip includes two comparators for the automatic bypass activation. One comparator checks the temperature and the other the module voltage. The thresholds are controlled with voltage dividers as can be seen in Figure 5.23 on the next page. Both threshold voltages are also available on an analog output pad and can be read by the ADC. The external pads allow the thresholds to be adjusted through external resistors.



Figure 5.23: Comparators in the PSPPv3 for module voltage (a) and temperature (b).



Figure 5.24: Schematic of the implemented comparator circuit. Based on [37].

#### 5.7.2 Radiation hard comparator

A new comparator was designed based on the design from Baker [37]. The comparator is designed to operate with a supply voltage of 1.5 V and uses only thin gate transistors for radiation hardness. It consists of three stages as shown in Figure 5.24.

The input stage is a differential amplifier with M3 and M4 with an active load formed with M1 and M2. The bias of the differential stage is using an existing biasing circuit from the regulators generating  $V_{bnc}$  and  $V_{bn}$ .

The decision stage gets as input the mirrored differential current. The cross-gate connection from M10 and M11 makes a positive feedback to increase the gain. The transistors M9 and M12 modify the switching level of the decision element and introduce a hysteresis. The hysteresis was set to 5 mV.

In the last stage, the output from the decision circuit is converted to a digital signal. Two inverters are added to buffer the digital signal and convert it from the higher analog supply used in the comparator to the digital supply voltage.

The comparator was designed to consume less than 0.5 mA and switches within 20 ns. The delay for the output signal is around 10 ns for the rising edge. Figure 5.25 on the next page shows a transient simulation of the designed comparator.

The new comparator is included in the PARC chip to study the performance. The inputs of the comparator are directly available on pads. No further voltage divider or filters were added in the test chip to verify the function of the comparator circuit.

Figure 5.25: Simulation of the comparator.

Tests with the comparator in the PARC have shown some problems. An improved design was therefore made for the PSPPv4 which is described below. The test results from both designs are described in section 5.7.4.

#### 5.7.3 Comparator enhancements

An oscillating behavior of the comparator output was observed when the two inputs are close to each other as described in the next section. To prevent this, the comparator design was adjusted by increasing the hysteresis and by adding a filter at the input. To increase the hysteresis, the decision stage of the comparator circuit was adjusted. Simulations gave a new hysteresis larger than 17.5 mV across all process corners<sup>3</sup>.
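Why a larger hysteresis suppresses output toggling can be shown with a simple behavioral model. The sketch below is not a model of the coupling suspected in the chip: the disturbance of ±4 mV on a slow ramp is an arbitrary assumption for illustration, and the hysteresis window is assumed to be symmetric around the reference.

```python
def comparator_with_hysteresis(samples, reference, hysteresis):
    """Return the output sequence of a comparator whose switching
    levels are reference +/- hysteresis/2 (assumed symmetric)."""
    out = 0
    outputs = []
    for v in samples:
        if out == 0 and v > reference + hysteresis / 2:
            out = 1
        elif out == 1 and v < reference - hysteresis / 2:
            out = 0
        outputs.append(out)
    return outputs


def count_toggles(outputs):
    """Number of output transitions in the sequence."""
    return sum(a != b for a, b in zip(outputs, outputs[1:]))


def disturbed_ramp(n=401, start=0.78, step=1e-4, disturbance=4e-3):
    """Slow ramp through the reference with an alternating disturbance."""
    return [start + i * step + (disturbance if i % 2 == 0 else -disturbance)
            for i in range(n)]


if __name__ == "__main__":
    ramp = disturbed_ramp()
    for hyst in (5e-3, 17.5e-3):
        toggles = count_toggles(comparator_with_hysteresis(ramp, 0.8, hyst))
        print(f"hysteresis {hyst * 1e3:.1f} mV: {toggles} output transitions")
```

In this model the 5 mV window is smaller than the disturbance and the output toggles repeatedly while the input crosses the reference, whereas the 17.5 mV window yields a single clean transition.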

The filter used in the PSPPv3 was added to the comparator input. This filter has a cutoff frequency of 1.6 MHz.

The wiring of the NTC was changed as explained in section 5.6.5. Because the sensor is in the pull-down path, high temperatures result in low voltages. Therefore an inverted output was added to still have a high signal when an over-temperature occurs.

Further, the powering of the comparator was modified. The output stage and the buffers are all powered from the digital supply. This is to reduce the load on the analog supply during the switching. All the changes are shown in Figure 5.26 on the following page. Capacitors were added next to the comparator to stabilize both power supplies. These capacitors are not shown in the figure.

<sup>&</sup>lt;sup>3</sup>A process corner simulation means that the simulation was done for different variations in the process. These variations are measured by the foundry and implemented in the design kit. This allows verifying that a chip work also in non-optimal cases.



Figure 5.26: Schematic of the enhanced comparator for the PSPPv4. In green are the updated elements compared to Figure 5.24 on page 74.



Figure 5.27: Measured signal of the comparator in the PARC (a) and PSPPv4 (b). Ch. 1 (yellow) is the p-input, Ch. 2 (green) is the m-input and Ch. 3 (purple) is the comparator output.

#### 5.7.4 Test results with the comparators

When testing the comparator in the PARC, an oscillating behavior was observed when both inputs were close together. This is shown in Figure 5.27a. Adding an external filter and capacitors on the supply voltage removed the oscillation.

Without a filter, the oscillation was also seen on the input signals. Therefore, it is suspected that the switching of the output couples to the inputs. This causes the input to change in such a way that the output toggles back, resulting in an oscillating behavior. However, it was not possible to reproduce this effect in simulation, even with the regulators included.

Based on these assumptions, an update of the comparator was therefore made as described in the previous section. Figure 5.27b shows that the updated comparator is working as intended. This figure shows the results from the PSPPv4, where the reference applied to the m-input is generated by the chip. The comparator tested is for the temperature, which uses the inverted output signal.

#### 5.8 Bypass transistor

The design goal for the bypass transistor was to work with up to 8 A and a voltage drop of <100 mV. These numbers would lead to a maximum power of 800 mW. The bypass is designed with thin-film oxide transistors to be radiation hard. The core transistors of the 130 nm technology support only up to 1.6 V between two contacts. To achieve the required voltage tolerance of >2 V when open, the bypass transistor needs to be cascoded. To isolate the bulk contact from the chip substrate, the cascoded transistor has to be a triple-well transistor. This is required because otherwise the bulks of the two NMOS devices would be shorted together. The schematic of the bypass is shown in Figure 5.28. It was decided to use triple-well types for both transistors. This allows the same design to be used for both transistors and the layout to be reused.

The nodes  $V_{mod+}$  and  $V_{mod-}$  are the contacts to the module voltage.  $V_{mod-}$  is also the local ground and will be connected to the PSPP ground potential.  $V_{bias}$  is the bias voltage for the cascoded transistor. The control signal  $V_{ctrl}$  from the logic is applied to the lower transistor.

#### 5.8.1 PSPPv3 bypass

The bypass implemented in the PSPPv3 is made out of multiple transistors, each with a size of 480 nm by 300 µm and 15 fingers. The minimal size was not used as it is more affected by radiation (see section 3.1.1). The bias voltage was set to 1.2 V, i.e. the digital supply voltage. The digital supply was chosen to have the large load from the bypass not on the analog supply, even though a lower on-resistance could be achieved with the analog supply at 1.5 V. To achieve a voltage drop of 0.1 V when active, 1500 base transistors are placed in parallel. This gives a total width of 450 mm.
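The design numbers above follow from simple arithmetic, which the following Python sketch reproduces. The function names are chosen for illustration; the values are those stated in the text.

```python
I_MAX_A = 8.0           # maximum bypassed current in A
V_DROP_V = 0.1          # allowed voltage drop across the bypass in V
BASE_WIDTH_M = 300e-6   # width of one base transistor in m


def bypass_power_w(current_a: float = I_MAX_A, v_drop_v: float = V_DROP_V) -> float:
    """Power dissipated in the closed bypass."""
    return current_a * v_drop_v


def required_on_resistance_ohm(current_a: float = I_MAX_A,
                               v_drop_v: float = V_DROP_V) -> float:
    """Total on-resistance needed to stay within the voltage drop."""
    return v_drop_v / current_a


def total_width_m(n_base: int, base_width_m: float = BASE_WIDTH_M) -> float:
    """Total gate width of n_base base transistors in parallel."""
    return n_base * base_width_m


if __name__ == "__main__":
    print(f"Power at 8 A:           {bypass_power_w() * 1e3:.0f} mW")
    print(f"Required on-resistance: {required_on_resistance_ohm() * 1e3:.1f} mOhm")
    print(f"Total width (1500x):    {total_width_m(1500) * 1e3:.0f} mm")
```

The required on-resistance of 12.5 mΩ applies to the complete cascoded path, so both series transistors have to share it.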



Figure 5.28: Schematic for the cascoded bypass transistor using triple-well NMOS transistors. The green voltages indicate the "off" state and the red voltages the "on" state.

#### Driver for bypass

During the design of the bypass for the PSPPv3, the focus was on switching fast to reduce the power in the transistor. The requirement with the transients on the LV line came only later. A driver consisting of two inverters in series was used. The second is made out of several in parallel, realizing a stronger driver. Attention was paid to the internal voltages during switching so that no voltage goes above 1.6 V during switch-on. The simulated behavior on activation and deactivation is shown in Figure 5.29.

When closing the bypass, the control transistor M1 switches on first. As soon as the voltage on the cascoded transistor M2 drops, it turns on and is completely open before M1. According to simulation, the bypass closes within 15 ns and opens within 30 ns.



Figure 5.29: Corner simulation for PSPPv3 bypass voltages for switch-on (a) and switch-off (b). Top plot: control signal, $V_{GS\ M1}$, $V_{GS\ M2}$. Middle plot: $V_{DS\ M1}$, $V_{DS\ M2}$. Bottom plot: $V_{mod}$.



Figure 5.30: Layout of one fourth of the bypass for the PSPPv3 including the pads.

#### Bypass pads and layout

The current also needs to go through an input pad. The analog input pad supports (300 to 400) mA and a single wire bond carries a current of $\sim 150$ mA. To have some margin, it was therefore decided to put 32 pads for both drain and source of the bypass. The pads are grouped in sets of 8, each forming one large pad opening where at least 20 wire bonds can be connected. This results in a current of 100 mA per wire bond at 8 A total current.
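The margins for pads and wire bonds can be checked with a short calculation. The Python sketch below uses the numbers from the text and assumes an even current sharing between all pads and wire bonds, which is an idealization; the names are invented for illustration.

```python
I_TOTAL_A = 8.0         # total bypassed current in A
PADS_PER_TERMINAL = 32  # pads for drain and for source, respectively
PADS_PER_GROUP = 8      # pads merged into one large pad opening
BONDS_PER_GROUP = 20    # minimum number of wire bonds per opening
PAD_LIMIT_A = 0.3       # lower bound of the pad rating
BOND_LIMIT_A = 0.15     # current rating of one wire bond


def current_per_pad_a() -> float:
    """Current per pad, assuming even sharing between all pads."""
    return I_TOTAL_A / PADS_PER_TERMINAL


def current_per_bond_a() -> float:
    """Current per wire bond, assuming even sharing between all bonds."""
    groups = PADS_PER_TERMINAL // PADS_PER_GROUP
    return I_TOTAL_A / (groups * BONDS_PER_GROUP)


if __name__ == "__main__":
    print(f"Per pad:  {current_per_pad_a() * 1e3:.0f} mA (limit {PAD_LIMIT_A * 1e3:.0f} mA)")
    print(f"Per bond: {current_per_bond_a() * 1e3:.0f} mA (limit {BOND_LIMIT_A * 1e3:.0f} mA)")
```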

The bypass was split into four equal parts and placed on each chip edge. This was done to minimize the chip size. It is further beneficial, as this spreads the heat load evenly across the chip. The layout of one fourth is shown in Figure 5.30 together with the pads. The actual bypass transistor is in the upper part of the figure and has a height of 205 µm. Together with the pads, the bypass is 2.35 mm wide.

The cascoded transistor M2 was placed next to the drain pad, while the control transistor M1 is placed next to the source pad. The connection between the different transistors was made on all available metal layers. This was done respecting the electromigration rules from the design kit, to pass 2 A per bypass part. Otherwise, there would be the risk that the metal lines get damaged over time.

The test results are described together with the results from the PSPPv4 bypass in section 5.8.3.

#### 5.8.2 Bypass design improvements

The following problems were observed with the bypass in the PSPPv3 and are addressed in an updated version for the 4th generation.

- The on-resistance was larger than designed and the power consumed in the chip and wire bonds was too large, leading to chip temperatures >100 °C if the maximal current is bypassed.
- The original bypass was designed to have a fast switch-on to minimize the power consumed. This can cause large transients. It is better to switch slower.
- When the PSPPv3 is not powered, current can still flow through the bypass. Simulations showed that if  $V_{sup}$  is shorted to GND, a significant amount of current was flowing through the diodes protecting against electrostatic discharge (ESD). Also, internal voltages are built up at the two gates, causing the transistor to partially open if the chip is not supplied.

#### PSPPv4 bypass layout

It was decided to use the same base transistor as for the PSPPv3 with a W/L of $300 \,\mu\text{m}/480 \,\text{nm}$. This allowed reusing part of the existing and working layout. The total width of the bypass was increased to fill the entire available space in the layout. The size was defined by the number of bumps per bypass contact, see section 5.3.3. Each device has 6300 base transistors, resulting in a total width of 1.89 m per device if all single transistors were placed in a row. The entire bypass layout has a size of 2.8 mm by 2.68 mm. It fills about two-thirds of the PSPPv4 chip area. It is located where the two large arrays of bump bonds are seen in Figure 5.5 on page 51. To reduce the on-resistance further, it was decided to use $V_{DD\_A}$ as a bias voltage.

#### Update of the bypass driver

A normal analog pad has diodes to protect against ESD, which are connected to  $V_{sup}$  and ground. They could become conducting when a voltage is applied on the pads and no power is present. According to simulation, around 2 A were flowing through the ESD diodes of the analog output pad if a current of 8 A was applied to the serial chain. This was tested using smaller currents with the PSPPv3. When applying an  $I_{SP}$  of 1.8 A and a module voltage of 1.5 V, 16 % of  $I_{SP}$  was flowing through the PSPPv3. For more details see the next section.

To prevent current from flowing through the bypass, a supply pad is used. This pad has a voltage clamp, which activates in case a high voltage spike is seen at the input. Usually, this kind of pad is used for the supply voltage. With the voltage clamp, the current through the pads drops to a few mA in simulation.

If $V_{mod}$ rises above 1.6 V while the chip is not powered, the cascode transistor sees a voltage larger than the technology allows. To prevent this, a bias circuit was added to the bypass driver as shown in Figure 5.31 on the next page. The voltage divider formed by R2 and R3 generates a bias voltage and thus protects the bypass from too large voltages if the power is off. However, the control gate was charged to around 130 mV and about 100 mA was flowing through the bypass. Therefore, the resistance R4 was added on the control gate. This resistance prevents the bypass from opening when no power is applied.

Originally it was planned to directly apply the 1.5 V or 1.2 V from the LDOs. The LDO includes a rather small (about  $6 \text{ k}\Omega$ ) voltage divider at its output. This would require an even smaller voltage divider, or just a pull-up could be imagined as the LDO output resistance is in parallel to R3. But both solutions disturb the proper operation of the LDO when the chip is powered. A buffer was therefore added to decouple the LDO from the bias gate of the bypass.

The PMOS transistor of the output stage of the buffer still becomes conductive. The bias gate, which is at around 500 mV, is connected to the drain of the PMOS. The source is connected to $V_{sup}$, which is at 0 V. The source and drain are thus inverted and because the gate of the PMOS is also kept at voltages close to $V_{sup}$, this transistor becomes slightly conductive. A global optimization was run to find the best values for the voltage divider on the bias gate so that the off resistance stayed above $100 \Omega$.



Figure 5.31: Implemented bypass driver for the PSPPv4 to prevent current flow if no power is applied on the chip.

As there is a feedback loop with the buffer for the bias voltage, it could become unstable. A stability simulation was performed and showed that the phase margin is above $90^{\circ}$ across all process corners.

The discussion above assumed  $V_{sup} = 0$  V, i.e. shorted to the chip ground. A different case arises when  $V_{sup}$  is not connected and is floating. In this case, the bypass resistance drops, because the gates of the bypass see some current. The leakage current through the gate results in a rather high Vctrl, and therefore a small current of a few mA passes through the bypass. By decreasing the pull-down on the control gate, the bypass resistance could be increased to >100  $\Omega$ .

The RC filter after the driver for the control gate reduces the switching speed.

#### Simulations

The test bench to perform simulations on the bypass included also the LDO for generating  $V_{bias}$ . A passive model was used for the module and the bump bonds. The different operation cases were verified in simulation.

**Bypass resistance:** The bypass resistance was checked in different cases with a DC simulation. A serial current of 8 A was used. The current through the bumps was measured as well as through the bypass. The resistance was calculated using the voltage between drain and source  $(R_{fet})$ , but also between the two pads, including the bump resistance  $(R_{bypass})$ . The following three operation cases were looked at:

- Bypass on: Vctrl at 1.2 V and Vbias at 1.5 V and  $V_{sup}$  at 2.3 V
- Bypass off: Vctrl at 0 V and Vbias at 1.5 V and  $V_{sup}$  at 2.3 V
- Power-off: Vctrl at 0 V and Vbias at 0 V and  $V_{sup}$  at 0 V

Table 5.8 lists the summary of the bypass values over the three tests and all process corners. The temperature was varied from (-40 to 80) °C. It can be seen that the off resistance has a very large spread. The very low values occur at 80 °C. The smallest off resistance at 27 °C is 257.6  $\Omega$  when the bypass is off and 241.6  $\Omega$  for the power-off case. This corresponds to a current of about 1 ‰ of the serial supply current, or around 8 mA flowing through the bypass. This is in the order of the current added by a single PSPP chip to the serial supply chain and is no issue for the operation of the chain.

|           |              | **Min**                 | **Max**                 | **Mean**                | **Median**              | **Std. Dev.**           |
|-----------|--------------|-------------------------|-------------------------|-------------------------|-------------------------|-------------------------|
| $R_{on}$  | $R_{fet}$    | $0.924\mathrm{m}\Omega$ | $2.142\mathrm{m}\Omega$ | $1.428\mathrm{m}\Omega$ | $1.41\mathrm{m}\Omega$  | $0.325\mathrm{m}\Omega$ |
|           | $R_{bypass}$ | $6.924\mathrm{m}\Omega$ | $8.142\mathrm{m}\Omega$ | $7.428\mathrm{m}\Omega$ | $7.41\mathrm{m}\Omega$  | $0.325\mathrm{m}\Omega$ |
| $R_{off}$ | $R_{bypass}$ | $29.62\Omega$           | $830.5\mathrm{k}\Omega$ | $63.66\mathrm{k}\Omega$ | $2.283\mathrm{k}\Omega$ | $170.4\mathrm{k}\Omega$ |
| Power off | $R_{bypass}$ | $51.78\Omega$           | $347.6\mathrm{k}\Omega$ | $23.34\mathrm{k}\Omega$ | $1.128\mathrm{k}\Omega$ | $67.66\mathrm{k}\Omega$ |

Table 5.8: Simulated PSPPv4 bypass resistance for the different cases.  $R_{fet}$  is the resistance of the transistor alone, while  $R_{bypass}$  includes  $6 \text{ m}\Omega$  for the bump bonds.

Table 5.9: Simulated PSPPv4 bypass switching characteristics with different resistor values. The PSPPv3 bypass was simulated with the old driver in the same test bench.

| R driver                        | $1\mathrm{k}\Omega$ | $5\mathrm{k}\Omega$ | $10\mathrm{k}\Omega$ | PSPPv3 bypass |
|---------------------------------|---------------------|---------------------|----------------------|---------------|
| on delay [µs]                   | 3.353               | 10.87               | 20.98                | 0.123         |
| off delay [µs]                  | 22.72               | 98.26               | 187.8                | 1.398         |
| switch-on time [µs]             | 1.345               | 5.013               | 10.42                | 0.079         |
| switch-off time [µs]            | 6.678               | 24.97               | 49.03                | 1.11          |
| $R_{on}$ [$\mathrm{m}\Omega$]   | 1.345               | 1.371               | 1.394                | 5.95          |
| $R_{off}$ [$\mathrm{k}\Omega$]  | 1.296               | 1.173               | 1.041                | 6.19          |
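The figures quoted for the bypass resistance can be cross-checked with a few lines of arithmetic. The following is a minimal sketch and not part of the simulation test bench: it computes the power dissipated in the closed bypass at the serial current of 8 A and the leakage through the open bypass. The module voltage of 2 V is an assumption chosen for illustration, and the function names are hypothetical.

```python
# Cross-check of the simulated bypass figures (illustrative only).

def on_state_power(r_on_ohm: float, current_a: float) -> float:
    """Power dissipated in the closed bypass, P = I^2 * R."""
    return current_a ** 2 * r_on_ohm


def off_state_leakage(r_off_ohm: float, v_mod_v: float) -> float:
    """Leakage current through the open bypass, I = V / R."""
    return v_mod_v / r_off_ohm


I_SERIAL = 8.0  # serial current used in the DC simulation [A]
V_MOD = 2.0     # ASSUMPTION: module voltage across the open bypass [V]

# Mean simulated on-resistance including the bump bonds (Table 5.8)
p_on = on_state_power(7.428e-3, I_SERIAL)

# Smallest off-resistance at 27 °C with the chip powered
i_leak = off_state_leakage(257.6, V_MOD)

print(f"on-state power:    {p_on * 1e3:.0f} mW")
print(f"off-state leakage: {i_leak * 1e3:.1f} mA")
print(f"leakage fraction:  {i_leak / I_SERIAL * 1e3:.2f} permille")
```

Under the assumed module voltage, the leakage comes out just below 8 mA, i.e. close to 1 ‰ of the serial current, in line with the value quoted in the text.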

**Bypass switching speed:** Fast bypass switching causes transients on the chain. Especially when opening the bypass, large spikes can occur on the bypass itself. An RC filter is placed in front of the control gate to slow down the switching, as seen in Figure 5.31 on the preceding page. The switching time increases with a larger filter. Because of the large gate capacitance, a capacitance >10 nF is needed to see a difference in the output. Therefore, C1 was not implemented inside the chip. The control gate voltage was made available on a pad so that an external capacitance could be added for further adjustments. However, the control gate voltage becomes smaller when R1 is increased, which leads to a larger on-resistance.

Table 5.9 lists the values for switching with different driver resistances, while Figure 5.32 on the next page shows the simulation of the bypass transistor voltages. A  $5 \text{ k}\Omega$  resistor was chosen. This gives a reaction time in the order of 10 µs with a switch-on time of around 5 µs. No over-voltages are seen during the switching.
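The dependence of the delays on the driver resistance can be checked directly from the values in Table 5.9. The sketch below fits a straight line through the three simulated points for the on and off delay. It is only an illustration of the roughly linear scaling with the resistance, as expected for an RC-limited gate drive; the function name is hypothetical.

```python
# Delay values from Table 5.9, keyed by driver resistance in kOhm [µs]
ON_DELAY_US = {1: 3.353, 5: 10.87, 10: 20.98}
OFF_DELAY_US = {1: 22.72, 5: 98.26, 10: 187.8}


def slope_us_per_kohm(table: dict) -> float:
    """Least-squares slope of the delay versus the driver resistance."""
    n = len(table)
    mean_r = sum(table) / n            # iterating a dict yields its keys
    mean_t = sum(table.values()) / n
    num = sum((r - mean_r) * (t - mean_t) for r, t in table.items())
    den = sum((r - mean_r) ** 2 for r in table)
    return num / den


slope_on = slope_us_per_kohm(ON_DELAY_US)
slope_off = slope_us_per_kohm(OFF_DELAY_US)

print(f"on delay:  {slope_on:.2f} µs per kOhm")
print(f"off delay: {slope_off:.2f} µs per kOhm")
```

The off delay grows about nine times faster with the resistance than the on delay, which matches the asymmetry between the on and off columns of the table.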



Figure 5.32: PSPPv4 bypass voltages for switch-on (a) and switch-off (b) across corners. Top plot: control signal,  $V_{GS M1}$ ,  $V_{GS M2}$ . Middle plot:  $V_{DS M1}$ ,  $V_{DS M2}$ . Bottom plot:  $V_{mod}$ .

#### 5.8.3 Bypass performance tests

The results presented here are for the PSPPv3 and PSPPv4 to compare the two versions and improvements made with the PSPPv4. To measure the performance of the bypass a small serial power chain was used with resistors as dummy modules as shown in Figure 5.33 on the following page. For simplicity, only a simple transistor is shown for the bypass instead of the cascoded transistors implemented.

#### Bypass on-resistance

The on-resistance  $(R_{on})$  was measured as a function of the current flowing through the bypass. For this,  $I_{SP}$  was ramped up while  $I_R$ ,  $I_{By}$  and  $V_{By}$  were measured.  $V_{By}$  was measured behind the ammeter, so that only the voltage drop across the bypass was measured.



Figure 5.33: Schematic of the chain for bypass tests. PSPP #2 and #3 were always PSPPv3, while #1 was either a PSPPv3 or PSPPv4.



Figure 5.34: Bypass  $R_{on}$  as function of current for the PSPPv3 (a) and PSPPv4 (b).

The on-resistance is shown in Figure 5.34 for the two versions of the PSPP. For the PSPPv3, the current was only increased up to 4.5 A as the chip temperature reached 70 °C at that current, while for the PSPPv4 the chip temperature stayed below 60 °C even at a current of 8 A.

For both chips the on-resistance increases with the current. This is due to the self-heating of the chip. The on-resistance of the PSPPv3 is  $25.5 \pm 0.9 \,\mathrm{m\Omega}$  while the PSPPv4 has a five times smaller on-resistance of  $5.2 \pm 0.2 \,\mathrm{m\Omega}$ .
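The lower chip temperature of the PSPPv4 follows from the smaller power dissipated in the bypass. The sketch below estimates this power from the measured on-resistances and the maximum currents of the two measurements. It treats the on-resistance as constant, which is a simplification because the measured value rises with the current; the function name is hypothetical.

```python
def bypass_power_w(r_on_mohm: float, current_a: float) -> float:
    """Power dissipated in the closed bypass, P = I^2 * R."""
    return current_a ** 2 * r_on_mohm * 1e-3


R_ON_V3_MOHM = 25.5  # measured on-resistance of the PSPPv3 [mOhm]
R_ON_V4_MOHM = 5.2   # measured on-resistance of the PSPPv4 [mOhm]

p_v3 = bypass_power_w(R_ON_V3_MOHM, 4.5)  # PSPPv3 at its maximum of 4.5 A
p_v4 = bypass_power_w(R_ON_V4_MOHM, 8.0)  # PSPPv4 at the full 8 A
ratio = R_ON_V3_MOHM / R_ON_V4_MOHM

print(f"PSPPv3 at 4.5 A: {p_v3 * 1e3:.0f} mW")
print(f"PSPPv4 at 8.0 A: {p_v4 * 1e3:.0f} mW")
print(f"on-resistance ratio: {ratio:.1f}")
```

Even at almost twice the current, the PSPPv4 bypass dissipates less power than the PSPPv3 bypass in this estimate.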

#### Bypass off-resistance

The off-resistance  $(R_{off})$  was measured the same way as the on-resistance. It was measured once with power (Figure 5.35 on the facing page) and once without power (Figure 5.36 on the next page).

The off-resistance of the PSPPv3 is larger than for the PSPPv4 when properly powered.



Figure 5.35: Bypass off-resistance with PSPP powered.



This is the drawback of the larger transistor in the PSPPv4. The leakage current begins to saturate with rising module voltage. Therefore,  $R_{off}$  is increasing with the voltage over the bypass.

As mentioned in section 5.8.2, the PSPPv3 had the problem that a significant part of the  $I_{SP}$  current flows through the bypass when the PSPPv3 is not powered. This can be seen in Figure 5.36, where the off-resistance drops to less than 10  $\Omega$  at a voltage >1 V. At a voltage of 1.5 V, almost 300 mA were flowing through the bypass while 1.5 A were flowing through the dummy module.

The updates implemented in the PSPPv4 prevent this, as  $R_{off}$  stays larger than 100  $\Omega$ , even at a module voltage of 2 V. The leakage current through the bypass was only 2 mA at maximum voltage.
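The quoted currents and voltages can be converted into resistances with Ohm's law to compare them with the limits of 10 Ω and 100 Ω. The sketch below does this for the unpowered case. The PSPPv3 current is rounded to 300 mA, and the function name is hypothetical.

```python
def resistance_ohm(voltage_v: float, current_a: float) -> float:
    """Resistance from a measured voltage and current, R = V / I."""
    return voltage_v / current_a


# Unpowered PSPPv3 at a module voltage of 1.5 V
r_off_v3 = resistance_ohm(1.5, 0.300)   # bypass, "almost 300 mA"
r_dummy = resistance_ohm(1.5, 1.5)      # dummy module
share_v3 = 0.300 / (0.300 + 1.5)        # fraction of I_SP in the bypass

# Unpowered PSPPv4 at the maximum module voltage of 2 V
r_off_v4 = resistance_ohm(2.0, 0.002)

print(f"PSPPv3 bypass: {r_off_v3:.1f} Ohm, {share_v3:.0%} of I_SP")
print(f"dummy module:  {r_dummy:.1f} Ohm")
print(f"PSPPv4 bypass: {r_off_v4:.0f} Ohm")
```

With these numbers, roughly one sixth of the serial current bypasses the dummy module in the unpowered PSPPv3, while the PSPPv4 value corresponds to about 1 kΩ.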

#### Bypass switching

The switching behavior of the bypass was measured in a chain as shown in Figure 5.33 on the preceding page. However, there were five dummy modules used for the switching measurements. The lowest three (closest to the chain ground) were all equipped with either PSPPv3 or PSPPv4.

The PSPPv3 switches on within ~30 ns as can be seen in Figure 5.37 on the following page. While the fast reaction time is good for protecting against over-voltages, it causes transients in the other modules. The power supply in current limiting mode is not fast enough to react to the changes in the load. The opening of the bypass also happens in less than 100 ns. Here, a large over-voltage can be seen on the module whose bypass is opening. The measurement shown in Figure 5.37 was done with a reduced current to limit the peak, as this over-voltage is dangerous for the PSPPv3 itself. As a precaution, a bypass was therefore only opened when no  $I_{SP}$  was flowing. This might also be a possible scenario for operation: especially when a bypass is set or cleared by command, the chain is powered off beforehand to prevent any transients.

Figure 5.38 on the following page shows the switching behavior for the PSPPv4. The switch-on time is about  $8 \,\mu\text{s}$  and the switch-off time is  $\sim 51 \,\mu\text{s}$ . Both times are twice as large as simulated (see Table 5.9 on page 82).

Figure 5.37: PSPPv3 bypass switching on (a) and off (b). Ch1 to Ch4 show the voltages on module 1 to 4 with ground as reference. The third bypass in the chain was switched.

Figure 5.38: PSPPv4 bypass switching on (a) and off (b). Ch1 to Ch4 show the voltages on module 1 to 4 with ground as reference. The third bypass in the chain was switched.

The switching time of the PSPPv4 is about 100 times slower than for the PSPPv3. Still, the neighboring modules see some variation in the voltage, indicating that the current source is not fast enough to regulate the current. When opening the bypass, no over-voltages are observed anymore. This improves the safety for operation because the bypass can be opened with LV on.

#### 5.9 Bandgap reference

An important element in the circuit is the voltage reference. It should create a constant voltage, which is temperature and process independent. From this voltage, other voltages can be referenced. The regulators and reference from the PSPP version 1 and 2 use thick-gate-oxide transistors and are therefore not suited for the radiation levels required by the PSPP. Two bandgap (BG) circuits were investigated for usage in the PSPP.

#### 5.9.1 Diode based bandgap reference

A diode-based BG was developed for the PARC chip by T. Fröse during a semester project at the University of Applied Science and Arts Dortmund under the supervision of M. Karagounis and with help from myself. This BG (referenced as diode-BG hereafter) was designed to operate with a supply voltage of 2.5 V. The design is based on a CMOS bandgap reference circuit by [103]. Figure 5.39 shows the complete schematic of the diode-BG. The circuit uses cascoded elements to achieve the voltage tolerance with thin-gate transistors.

Test results with the diode-BG are shown together with the regulator test in section 5.10.3 and 5.10.4. Further results from long term tests are given in section 6.4.2.

#### **Reference voltage**

The output voltage  $V_{BG}$  is defined by  $I_{R7} \cdot R7$ . The current  $I_{R7}$  is created by the reference circuit as developed below. To achieve temperature stability, the positive temperature coefficient of the resistance R7 is compensated with the negative temperature coefficient of the diodes.

Figure 5.39: Schematic for the diode-BG.

From the diode equation, the forward voltage across the diode can be expressed as in equation 5.7.  $I_D$  is the current through the diode,  $V_T$  the thermal voltage  $n \cdot k \cdot T / e \approx 26 \,\mathrm{mV}$  and  $I_S$  the saturation current.

$$V_f = V_T \cdot \ln \frac{I_D}{I_S} \tag{5.7}$$

Furthermore, the voltages at the two amplifier inputs are at the same potential, because the amplifier operates in negative feedback. The voltages over the diodes can therefore be expressed as in equation 5.8. The diode D2 was chosen N = 8 times larger than D1 by placing the same structure eight times. The current  $I_D$  entering equation 5.7 is therefore N times smaller for each unit diode of D2 than for D1. The current  $I_{R2}$  is equal to  $I_{R5}$  and also  $I_{R7}$ , as all three paths are controlled by the output of the operational amplifier. R3 and R6 are identical, so the currents  $I_{R3}$  and  $I_{R6}$  are the same, and consequently  $I_{D1}$  and  $I_{D2}$  have to be the same as well. Together with equation 5.7, this gives equation 5.9.

$$V_{fD1} = V_{R4} + V_{fD2} \tag{5.8}$$

$$V_{R4} = V_T \cdot \ln\left(\frac{I_{D1}}{I_{D2}/N}\right) = V_T \cdot \ln\left(N\right) \tag{5.9}$$
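Equation 5.9 can be evaluated numerically to get a feeling for the size of the voltage across R4. The following sketch assumes an ideality factor of n = 1 and evaluates the thermal voltage as n·k·T/e, which gives about 26 mV at room temperature. The function names are hypothetical.

```python
import math

K_B = 1.380649e-23     # Boltzmann constant [J/K]
Q_E = 1.602176634e-19  # elementary charge [C]


def thermal_voltage(temp_k: float, n: float = 1.0) -> float:
    """Thermal voltage V_T = n * k * T / e (n = 1 is an assumption)."""
    return n * K_B * temp_k / Q_E


def v_r4(temp_k: float, n_diodes: int = 8) -> float:
    """Voltage across R4 according to equation 5.9, V_T * ln(N)."""
    return thermal_voltage(temp_k) * math.log(n_diodes)


for temp_c in (-40, 27, 80):
    temp_k = temp_c + 273.15
    print(f"{temp_c:4d} °C: V_R4 = {v_r4(temp_k) * 1e3:.1f} mV")
```

The voltage across R4 is proportional to the absolute temperature and rises by about 0.18 mV/K for N = 8, which is the positive temperature coefficient used for the compensation.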

The reference current is defined by the sum of  $I_{R6}$  and  $I_{D2}$ , given in equation 5.10. With  $V_{BG} = I_{R7} \cdot R7$  and  $I_{R7} = I_{R5}$ , the output voltage  $V_{BG}$  follows as equation 5.11.

$$I_{R5} = I_{D2} + I_{R6} = \frac{V_{R4}}{R4} + \frac{V_{fD1}}{R6} \tag{5.10}$$

$$V_{BG} = R7 \cdot \left(\frac{V_{R4}}{R4} + \frac{V_{fD1}}{R6}\right) \tag{5.11}$$

The current  $I_{D2}$  has a positive temperature coefficient, while  $I_{R6}$  has a negative one. By choosing the resistance values accordingly, it is possible to minimize the temperature dependence around the operation temperature.
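The choice of the resistance values can be illustrated with a first-order estimate. In equation 5.10, the term with R4 rises with the temperature with a slope of (k/e)·ln(N), while the diode voltage term with R6 falls. The sketch below computes the ratio R6/R4 for which the two slopes cancel. It assumes ideal, temperature-independent resistors and a diode temperature coefficient of -2 mV/K, a typical textbook value that is not taken from the design. The function name is hypothetical.

```python
import math

K_OVER_Q = 8.617333262e-5  # Boltzmann constant over elementary charge [V/K]


def r6_over_r4(dvf_dt_v_per_k: float, n_diodes: int = 8) -> float:
    """Resistor ratio that cancels the first-order temperature drift.

    Condition: (k/e * ln N) / R4 + (dVf/dT) / R6 = 0
    """
    ptat_slope = K_OVER_Q * math.log(n_diodes)  # slope of V_R4 [V/K]
    return -dvf_dt_v_per_k / ptat_slope


DVF_DT = -2.0e-3  # ASSUMPTION: diode forward voltage drift [V/K]

ratio = r6_over_r4(DVF_DT)
print(f"R6/R4 for zero first-order drift: {ratio:.1f}")
```

Under these assumptions, R6 has to be roughly eleven times larger than R4. The actual values in the design also have to account for the temperature coefficients of the resistors themselves.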

#### Startup

The BG has two stable operating points, one at 0 V and the other at the desired  $V_{BG}$  value. A start-up circuit was added so that the reference always starts with the correct output voltage. This circuit is shown on the left in Figure 5.39 on the previous page. The start-up circuit injects an additional current into  $I_{R3}$  as long as the output voltage is low. This causes the amplifier to keep its output low, which creates a large current  $I_{R7}$ . Once the output voltage is reached, the transistor M10 opens and stops the additional current injection. The capacitance C1 is used to keep the gate of the cascoded transistors low during start-up.

During start-up simulations in combination with the shunt regulator, it was seen that the output increased above the defined value. The transistors M25, M26 and M27 are required to limit the output voltage and prevent this from happening. An additional filter was added at the output to improve the stability of the reference together with the shunt regulator.

#### 5.9.2 Transistor based bandgap structure

A BG reference was developed for ASICs in the ITk Strip detector and made available for use in the PSPP [104]. This BG is hereafter referenced as transistor-BG because it uses dynamic threshold MOS (DTMOS) transistors instead of diodes. The schematic is shown in Figure 5.40. The working principle is the same as for the diode-BG.

The DTMOS transistors are designed as enclosed layout transistors (ELT) for increased radiation stability (see also section 3.2.1). DTMOS transistors have their gate connected to the body and drain for a forward biasing of the junction diode. Compared to normal MOS devices, they have better matching [105].

However, the process variation is larger than with diodes. For this reason, a trimming circuit is included. The trimming is done by changing the value of R8 in Figure 5.40.

#### Trimming of BG

Figure 5.41 on the following page shows the BG output voltage when the trimming is set. A '1' was shifted through the trim bits from the lowest bit to the highest bit. This reduces the output voltage step by step.

The BG voltage can only be reduced with the trim bits. Therefore, the trimming was set as default to the center of the range. From the simulations, it was seen that half of the trim range is reached when the trim-bits are set to 0x1000. The transistor-BG has an output voltage of 0.68 V for the chosen default trimming.



Figure 5.40: Schematic for the transistor-BG.



Figure 5.41: Simulation of the transistor-BG trimming characteristic.



Figure 5.42: Transistor-BG output voltage as function of temperature with default trimming for all process corners. Red curves with 1.0V, Blue 1.2V and Green 1.5V supply voltage.

#### Simulation of the BG characteristic

Simulations were performed to check the behavior of the BG as a function of temperature and process corners. They were performed with and without the trimming active, but only the latter is shown here as this is the default mode.

The output voltage was simulated as a function of supply voltage and temperature. For a change in the supply voltage from (1.0 to 1.5) V, a variation of 10 mV was simulated. The output voltage is more stable at lower temperatures than at higher ones, as can be seen in Figure 5.42. In the range from (-40 to 60) °C the output voltage changes on average by 25 mV.

A Monte Carlo simulation was made to evaluate the process variation. The spread of the output voltage has a standard deviation of 18.5 mV. This rather high variation shows the need for the trimming in the transistor-BG.



Figure 5.43: Measurement of the trimming for the transistor-BG.

#### 5.9.3 Usage of the transistor-BG in PSPPv3

The submission deadline for the PSPPv3 did not allow the regulators to be updated. Therefore, the same regulators as in the PSPPv2 were also used for the v3. They were implemented such that the regulators could be deactivated and external voltages provided. The transistor-BG was integrated in the PSPPv3 for testing purposes.

The trimming was tested by shifting a '1' through the 16 trim bits. The result of the measurement is shown in Figure 5.43. The behavior corresponds to the simulation shown in Figure 5.41 on the preceding page.
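The test pattern used for this measurement is a walking one across the 16 trim bits. The sketch below generates the corresponding trim words. How a word is written to the chip is not shown here, and the function name is hypothetical.

```python
N_TRIM_BITS = 16  # number of trim bits of the transistor-BG


def walking_one_patterns(n_bits: int = N_TRIM_BITS) -> list:
    """Trim words with a single '1' shifted from the lowest to the highest bit."""
    return [1 << bit for bit in range(n_bits)]


patterns = walking_one_patterns()
for word in patterns:
    print(f"0x{word:04X}")
```

The default setting of 0x1000 mentioned in section 5.9.2 is one of these words, with only bit 12 set.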

#### 5.10 Radiation hard regulators

All PSPPs used in one serial power chain are connected through bias resistors to a single common power line, as indicated in Figure 4.6 on page 43. The powering scheme foresees the use of a shunt regulator, also mentioned in section 2.3.3. A current flows from the PSPP power supply through the bias resistor and the shunt regulator of the chip. The shunt regulator generates the required operation voltage for the chip. The supply voltage is large enough to provide all chips in the chain with sufficient current in all conditions. However, the shunt regulator must have an ample input range to operate at different currents. Depending on the chain status, the PSPP supply current changes. For example, the activation of a bypass changes the module voltages. The bias resistor values should be adjusted as a function of the chain length and the position in the chain to optimize the power and currents.
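The dependence of the bias resistor on the position in the chain can be sketched with Ohm's law. The example assumes that the local ground of each PSPP sits at the low side of its module, so that it rises by one module voltage per position. The power line voltage, module voltage, bias current and chain length are hypothetical values chosen only for illustration, and the function name is hypothetical as well.

```python
def bias_resistor_ohm(v_line: float, v_local_gnd: float,
                      v_sup: float, i_bias: float) -> float:
    """Bias resistor that delivers i_bias into the shunt regulator."""
    v_drop = v_line - (v_local_gnd + v_sup)
    if v_drop <= 0:
        raise ValueError("power line voltage too low for this position")
    return v_drop / i_bias


V_LINE = 30.0   # HYPOTHETICAL common PSPP power line voltage [V]
V_MODULE = 2.0  # HYPOTHETICAL voltage drop per module [V]
V_SUP = 2.3     # chip supply voltage set by the shunt regulator [V]
I_BIAS = 0.015  # HYPOTHETICAL bias current per chip [A]
N_MODULES = 8   # HYPOTHETICAL chain length

resistors = []
for position in range(N_MODULES):
    v_gnd = position * V_MODULE  # ASSUMPTION: local ground of this PSPP
    resistors.append(bias_resistor_ohm(V_LINE, v_gnd, V_SUP, I_BIAS))
    print(f"position {position}: R_bias = {resistors[-1]:.0f} Ohm")
```

In this simplified picture, the chips higher up in the chain need a smaller bias resistor for the same current. When a bypass is activated, the local grounds above it shift, which is why the shunt regulator needs an ample input range.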

The PSPPv3 uses the same shunt regulator as its predecessors [21]. An updated version was developed, which is using only thin-gate transistors for radiation hardness. R. Ahmad and M. Karagounis helped with the development and design of the regulators. The final implementation and layout were done by me.



Figure 5.44: Schematic of the shunt regulator circuit developed for the PSPP.

#### 5.10.1 Shunt regulator

The principle of a shunt regulator is already given in section 2.3.3. The slope resistor R3 from Figure 2.8 on page 20 is needed for parallel operation of shunt regulators, which is not required in the PSPP. To save power and get a flat output regulation curve, this resistance is omitted.

The schematic of the shunt regulator implemented is shown in Figure 5.44. The supply voltage at  $V_{sup}$  is input and output at the same time. The level of the voltage is defined by the voltage divider R1 and R2 and the bandgap voltage ( $V_{BG}$ ). To support voltage levels of more than 2 V, the shunt device is cascoded and realized with transistors M1 and M2. The error amplifier A1 regulates the gate of transistor M1 to keep the potential at  $V_{sup}$  constant, independent of the input current. M2 is biased by a dedicated biasing circuit, which is also used to bias the error amplifier and is not shown in the schematic. The biasing circuit was originally developed for the ShuLDO together with the error amplifier [35]. For stabilization purposes, a dominant pole was added by the external capacitance  $C_E$ , indicated with the dashed lines in the schematic. An additional zero is created with the feedback path R3 and C1. A parasitic pole can be suppressed by selecting R3 and C1 accordingly. This improves the stability and phase margin of the regulator.

#### 5.10.2 Linear regulator

Besides the supply voltage, three more regulated voltages are used in the PSPP chip: analog supply  $V_{DD_A}$  at 1.5 V for the ADC and comparator, 1.2 V digital supply  $V_{DD_D}$  for the logic and communication and a 1 V reference voltage  $V_{ref}$  for the ADC.

These voltages are generated by three linear regulators. All three regulators are the same except for the output voltage.

The schematic of the linear regulator is shown in Figure 5.45 on the next page. M1 is the pass device that is controlled by the error amplifier. The pass device is regulated so that the voltage  $V_{out}$  stays constant with respect to the reference voltage. The output voltage is compared with the bandgap voltage  $V_{BG}$ . To define the value of the output, the voltage divider R1 and R2 is used. For each of the three linear regulators, a different value is used for the divider. The linear regulator was designed to deliver a current of 30 mA.

Figure 5.45: Schematic of the linear regulator circuit designed for the PSPP.
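The divider ratio needed for each output voltage can be estimated as follows. The sketch assumes the usual feedback arrangement, in which the error amplifier compares  $V_{BG}$  with the divided output voltage  $V_{out} \cdot R2 / (R1 + R2)$ , so that  $V_{out} = V_{BG} \cdot (1 + R1/R2)$ . The assignment of R1 and R2 to the upper and lower branch is an assumption, as is the use of the 0.68 V default output of the transistor-BG from section 5.9.2 as reference. The function name is hypothetical.

```python
V_BG = 0.68  # ASSUMPTION: reference voltage, transistor-BG default [V]

# Regulated voltages used in the PSPP chip [V]
TARGETS = {"V_DD_A": 1.5, "V_DD_D": 1.2, "V_ref": 1.0}


def divider_ratio(v_out: float, v_bg: float = V_BG) -> float:
    """Ratio R1/R2 for V_out = V_BG * (1 + R1/R2)."""
    if v_out < v_bg:
        raise ValueError("output cannot be below the reference voltage")
    return v_out / v_bg - 1.0


ratios = {name: divider_ratio(v) for name, v in TARGETS.items()}
for name, ratio in ratios.items():
    print(f"{name}: R1/R2 = {ratio:.3f}")
```

Since only the ratio enters, the absolute resistor values remain free and can be chosen with regard to the divider current and the stability of the loop.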

It was decided to use an NMOS pass device because no low-drop operation is required. The capacitor C3 is added internally on the output to create the dominant pole. Its time constant is defined by the voltage divider R1, R2 and the Miller capacitance. The Miller capacitance is formed by C1 between the input and output of the amplifier. With an amplifier gain of A, the Miller capacitance has an effective value of  $C1 \cdot (1 + A)$ . The additional capacitor C2 limits the bandwidth of the amplifier and therefore reduces the Miller effect at high frequencies. This suppresses the effects of the parasitic pole introduced by M1 and optimizes the phase margin. Together with C1, R3 adds an additional zero to suppress the parasitic pole of the amplifier.
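The effect of the Miller multiplication can be illustrated numerically. The component values in the sketch below are hypothetical and not taken from the design. The pole frequency is computed for a single RC stage, which is a simplification of the actual compensation network, and the function names are hypothetical.

```python
import math


def miller_capacitance(c1_f: float, gain: float) -> float:
    """Effective input capacitance of C1 across an amplifier with gain A."""
    return c1_f * (1.0 + gain)


def pole_frequency_hz(r_ohm: float, c_f: float) -> float:
    """Corner frequency of a single RC stage, f = 1 / (2 * pi * R * C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_f)


C1 = 1e-12    # HYPOTHETICAL feedback capacitor [F]
GAIN = 100.0  # HYPOTHETICAL amplifier gain
R_EQ = 100e3  # HYPOTHETICAL equivalent resistance of the divider [Ohm]

c_miller = miller_capacitance(C1, GAIN)
f_pole = pole_frequency_hz(R_EQ, c_miller)

print(f"effective capacitance: {c_miller * 1e12:.0f} pF")
print(f"pole frequency:        {f_pole / 1e3:.1f} kHz")
```

A small on-chip capacitor thus acts like a capacitor about a hundred times larger in this example, which moves the pole to a correspondingly lower frequency.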

#### 5.10.3 Regulator functionality test

The regulator was tested by measuring the input curve. The input current was increased while the output voltages were monitored. The results for the PSPPv3 and PARC are shown in Figure 5.46 on the following page.

Both regulators require a certain current to properly operate. The regulator of the PSPPv3 shows a hysteresis. The output voltage stays stable at lower currents than required for the power-up.

The new shunt regulator implemented in the PARC has a flat output voltage. As mentioned in section 5.10 there is no series resistor in front of the shunt regulator. This is not the case in the PSPPv3 and therefore  $V_{sup}$  is increasing.

Further, a load test was performed with the regulators in the PARC. Two test results are shown in Figure 5.47 on the next page. The output of the linear regulators was connected to a source meter, which drew a load current. The output voltage stays stable up to a load current of about 5 mA below the supply current. If the load becomes too large, the output drops. The highest regulated voltage always drops first.



Figure 5.46: Output voltage as function of the input current for the regulators in the PSPPv3 (a) and PARC (b).



Figure 5.47: Load test with the linear regulators in the PARC. (a) shows test results from  $V_{DD\_A}$  with a supply current of 15 mA while (b) shows  $V_{DD\_D}$  with a supply current of 35 mA.

#### 5.10.4 PARC regulator irradiation

The PARC was irradiated in 2017 [106, 107]. The focus during this irradiation was the verification of the regulators. More irradiation results are described in section 6.3.

The regulator voltages were monitored while being irradiated with X-ray up to a total ionizing dose (TID) of 600 Mrad. As can be seen in Figure 5.48 on the facing page, the voltages are increasing with irradiation. The voltage from the BG is changing with accumulated dose. However, the ratio between the regulator outputs ( $V_{sup}$ ,  $V_{DD_A}$ , etc) and  $V_{BG}$  is stable. This indicates that primarily the bandgap is affected by the radiation. It is suspected that the reference diodes are changing with radiation.



Figure 5.48: Measured voltages of the PARC under irradiation [106].



Figure 5.49: Simplified schematic of the regulators used in the PSPPv4.

#### 5.10.5 Updated regulator concept

An update of the regulators was made to improve radiation stability. It was therefore decided to use the transistor-BG, which showed promising results in the PSPPv3. This BG is designed for supply voltages of up to 1.5 V only. Therefore, it cannot be connected directly to the voltage generated by the shunt regulator. A new scheme with two BGs is applied for the PSPPv4, as shown in Figure 5.49.

To power the transistor-BG a linear regulator is used, which itself gets the reference from the PARC BG. This internal supply voltage  $(V_{int})$  was set to 1.0 V in nominal operation, which is enough for the transistor-BG to operate. The rise of the diode-BG seen in Figure 5.48 indicates that the voltage will stay within the operating limits of the transistor-BG.

The trimming of the transistor-BG was set to 680 mV, which is in the center of the trim range. It can be adjusted through logic. Additionally, one bit is available externally. Tying this pad to ground increases the BG voltage to the maximum. This was included, should the chip not work because of a too small supply voltage.

The linear regulators are the same as in the PARC, except that the voltage divider was updated to work with the adjusted BG voltage.



Figure 5.50: Line test of PSPPv4 regulators.

#### 5.10.6 PSPPv4 regulator tests

A line test was performed with the PSPPv4 and PATT to verify the function of the regulators. The voltages ramp up as expected with increasing input current as seen in Figure 5.50. A current of 9 mA is required for proper operation. This is higher than for the PSPPv3, though expected due to the additional BG and comparator.

The PATT uses the same regulator as the PSPPv4 and shows a behavior similar to that in Figure 5.50. More test results with the regulators are shown in section 6.3.2.

#### 5.11 Power-on reset

A power-on reset circuit was developed to set the chip in a known and valid state after start-up. The design was made by R. Ahmad [108] while the layout was done by me. The reset circuit is based on a bandgap circuit using the same transistor structures as in the transistor-BG. A bandgap-based power-on circuit is independent of the supply voltage rise time. This is an advantage over capacitance-based power-on reset circuits.

Figure 5.51 shows the schematic. As in a bandgap circuit, two paths with diode-connected transistors are used, where the right path (M3 & M4) is eight times larger than the left. There are two DTMOS transistors in series to achieve the desired voltage of 1 V. A point exists where the voltages Vm and Vp are the same. This point is also a turning point where Vp becomes larger than Vm. Instead of regulating at this voltage, the power-on reset uses a comparator to generate a reset signal. Figure 5.52 on the next page shows the simulated voltages during the power-up.

The nominal supply voltage for the digital circuits is 1.2 V, but they are already operational at lower voltages. The circuit was designed to release the reset at a voltage of 1 V. At this voltage the circuit can be reset properly, while there is still some margin in case of a drop in the supply.

#### Test of the power-on reset

The output of the power-on reset is also measured during the regulator test (see section 5.10.6). Figure 5.53 shows the output signal as function of the  $V_{DD_D}$  voltage. The reset is removed at 1.04 V when powering up, matching well the desired voltage of 1 V. It stays active until 0.98 V when powering off.
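The two measured thresholds describe a comparator with hysteresis. The following behavioral sketch models this, assuming that the reset is asserted again once the voltage falls below 0.98 V during power-off. It is not a model of the actual circuit, and the function name is hypothetical.

```python
RELEASE_V = 1.04  # measured release threshold when powering up [V]
ASSERT_V = 0.98   # measured threshold when powering off [V]


def reset_trace(voltages, release_v=RELEASE_V, assert_v=ASSERT_V):
    """Return the reset state (True = reset active) for each voltage sample."""
    reset_active = True  # the chip starts in reset
    trace = []
    for v in voltages:
        if reset_active and v >= release_v:
            reset_active = False  # supply is high enough, release the reset
        elif not reset_active and v < assert_v:
            reset_active = True   # supply dropped, assert the reset again
        trace.append(reset_active)
    return trace


ramp = [0.0, 0.5, 1.0, 1.05, 1.2, 1.0, 0.97, 0.5]
trace = reset_trace(ramp)
for v, state in zip(ramp, trace):
    print(f"{v:.2f} V -> reset {'active' if state else 'released'}")
```

The sample at 1.0 V appears twice in the ramp and gives a different reset state on the way up than on the way down, which is the hysteresis of about 60 mV between the two thresholds.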

The logic is properly reset after a power-up and all registers are read with the defined initial value. For testing purposes, the reset signal generated by the power-on reset was not directly connected to the reset input of the logic. If the connection is not made, the registers have random values after power-up. For a future version, this connection can be made inside the chip.



Figure 5.52: Simulation of the power-on reset with different rise times (10 µs, 100 µs and 1000 µs). Top is the monitored supply voltage, Middle are the input voltages of the comparator, bottom is the output signal.



Figure 5.53: Power-on reset signal during the regulator test of the PSPPv4.

#### 5.12 Shift register for SEU tests

A test logic was implemented to compare the TMR logic with simple logic and to measure the SEU rate. This test logic is made of two shift registers, where one has no redundancy while the other implements full triplication. The simplified schematic of the test logic is shown in Figure 5.54.

There are three inputs: data, mode and clock, and two outputs: out\_simple and out\_TMR. The signal applied at the data input is shifted through both registers at each rising edge of the clock if the mode is set. Each register stores the current value if the mode is at '0'.

In the PARC the shift registers were realized with a length of 500 bits.

The same shift register was integrated in the PATT. However, the register in the PATT was realized with 3000 bits to obtain higher statistics during irradiation tests.
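The principle of the test logic can be illustrated with a small behavioral model. The sketch below is not the implemented logic: it assumes that the triplicated register votes on every stored bit at each clock edge, which is one common way to realize triple modular redundancy, and it uses a length of 8 bits instead of 500 or 3000. The class and function names are hypothetical.

```python
def majority(a: int, b: int, c: int) -> int:
    """Majority vote of three bits."""
    return (a & b) | (a & c) | (b & c)


class SimpleShiftRegister:
    """Shift register without redundancy."""

    def __init__(self, length: int):
        self.bits = [0] * length

    def clock(self, data: int, mode: int) -> None:
        if mode:  # shift only if the mode is set, otherwise hold
            self.bits = [data] + self.bits[:-1]

    @property
    def out(self) -> int:
        return self.bits[-1]


class TmrShiftRegister:
    """Shift register with three copies and a vote on every bit."""

    def __init__(self, length: int):
        self.copies = [[0] * length for _ in range(3)]

    def clock(self, data: int, mode: int) -> None:
        # Vote first, so a flipped bit in one copy is corrected on each edge
        voted = [majority(*bits) for bits in zip(*self.copies)]
        nxt = [data] + voted[:-1] if mode else voted
        self.copies = [list(nxt) for _ in range(3)]

    @property
    def out(self) -> int:
        return majority(*(copy[-1] for copy in self.copies))


LENGTH = 8
simple, tmr = SimpleShiftRegister(LENGTH), TmrShiftRegister(LENGTH)
for _ in range(LENGTH):  # fill both registers with ones
    simple.clock(1, mode=1)
    tmr.clock(1, mode=1)

simple.bits[3] ^= 1      # emulate a single event upset in each register
tmr.copies[0][3] ^= 1

simple_errors = tmr_errors = 0
for _ in range(LENGTH):  # shift the content out and count wrong bits
    simple_errors += simple.out != 1
    tmr_errors += tmr.out != 1
    simple.clock(1, mode=1)
    tmr.clock(1, mode=1)

print(f"errors: simple = {simple_errors}, TMR = {tmr_errors}")
```

A single flipped bit shows up as one error at the output of the simple register, while the voted register delivers the correct pattern. In an irradiation test, the number of such errors per register and fluence gives the upset rate of the two variants.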



Figure 5.54: Schematic of the SEU test shift register.

### Chapter 6

# Operation and performance measurements

The PSPP prototypes were tested under different conditions. Tabletop tests were made to verify the basic functionality of the chips and communication in a dummy serial power chain. These results were already described in the previous chapter.

Irradiation campaigns were performed to verify the behavior of the prototypes under irradiation. The focus here was on total ionizing dose (TID) and single event upset (SEU) hardness. Chips were further tested in a climate chamber over several weeks to check if the chip can operate over a long time.

The PSPPv3 was also used in a system test together with pixel modules in a fully functional serial power chain. The PARC chip was widely used as a physical layer for a detector control system (DCS) controller implemented in an FPGA during the tests. Developments are ongoing about a possible integration of the PSPPv4 into future system tests.

This chapter gives an overview of the tests performed. Several bachelor and master students performed measurements under my guidance or helped with the measurements. They are mentioned if they were leading the measurement and analysis.

#### 6.1 Initial test setup

An automatic test setup was developed to verify the function of the prototype chips. Such an initial test setup could also be used during production. This setup was originally developed for the PSPPv3 by Yann Narbutt and Jakob Schick [109, 110]. It was updated and enhanced during further tests and for usage with the PSPPv4 and PATT by myself and other students.

The setup uses an ARTY board [111] with an ARTIX-7 FPGA as control unit. To control the PSPP under test, an SCB master was developed and integrated into the firmware. A PARC chip is used as a physical layer for SCB and mounted on the measurement board. The measurement board houses several ADCs to monitor and measure voltages. It also includes digital-to-analog converters used to apply voltages to the analog inputs of the PSPP under test. Two power boards generate the supply voltage for the chip under test and a current to evaluate the bypass. All three boards and the FPGA are shown in Figure 6.1 on the following page.






Figure 6.1: FPGA and cards for the initial test setup. Different adapter boards are used for each prototype chip.

Figure 6.2: Carrier board (a) for the PSPPv4.

Depending on the prototype chip, a different adapter board is used. The chip under test is mounted on a carrier board and plugged into the carrier support as shown in Figure 6.2. The adapter boards were fabricated so that they can be cut apart into the test adapter and the carrier support. For desktop tests, no cable is required. For irradiation tests, the carrier support is separated and connected through a ribbon cable. This allows the chip to be placed, for example, in an irradiation chamber while the rest of the setup stays outside.

The FPGA firmware is built around a softcore processor, and the test functions are written as a C program running on it. The firmware communicates over USB with a PC where the data are logged and stored.

#### 6.2 Outer barrel demonstrator program

To verify the concept of serial powering, a system test setup was built at CERN [112]. This system test includes several different prototypes for mechanical, thermal and electrical tests. It is based on a design which is foreseen for the three outermost layers of the ITk Pixel Detector (see section 1.2.2). The PSPPv3 is used in the electrical prototypes to realize the control & feedback path described in section 4.2.2.
### 6.2.1 Chip probing

Additional PSPPv3 chips were ordered for usage in the demonstrator program. To verify that these chips were functional, a probe card was fabricated. The probe card replaces the carrier support card in the initial test setup. It is used to contact the chips before wire bonding, which allows an initial test to be run on bare dies. The same setup as described in section 6.1 was used to run the test. Jakob Schick performed the chip probing at the University of Geneva [110]. In total, 146 PSPPv3 chips were tested, with the following result:

- 44% were grade A, which means that the chip is fully functional;
- 48% were rated B, meaning that some ADC channels could not be properly read out;
- 7% were rated C, where either the temperature or voltage is not readable;
- Only one chip failed, where the regulator was not working.

Chips with grade A or B are suited for usage in the demonstrator. Even though B-grade chips have some problems with the ADC, they can still be used, as the module temperature and voltage can be read. This gives an overall yield of 92%, which is a very good result.

#### 6.2.2 Electrical prototype

The electrical prototype is designed to implement all required elements for operating multiple serial power chains including DCS monitoring and control, an interlock system, power and readout. Cables and services with realistic lengths are used. New services and mechanical structures were developed for the setup using available prototypes for the ASICs. The pixel modules are based on the FE-I4 readout chip [13] and the PSPPv3 is used for the control and feedback path.

Figure 6.3 shows part of the electrical demonstrator. The flexible PCB populated with the PSPPv3 chips is located below the mechanical support. The implementation of the DCS in the demonstrator is described in [113].

During the commissioning and operation of the demonstrator, the PSPP proved to be very useful. The monitoring function of the PSPP helped to debug the system. By measuring the module voltage, it was possible to check which modules were configured.



Figure 6.3: Electrical system test setup with inclined dual and flat quad modules [114].



Figure 6.4: Over-voltage protection test. Ch1 = plus contact of dummy, Ch2 = minus contact of dummy, Ch3 = chain voltage, Math = voltage across dummy (Ch1-Ch2)

Further, the over-voltage module interlock protected the chain from harm. The FE-I4 can cause transients and voltage spikes in neighboring modules if the chip becomes noisy. This behavior arises because an overload of the ShuLDO can cause a voltage change. Updates of the ShuLDO are being developed to prevent this in a future version. In the demonstrator, the PSPP detected these spikes and activated the bypass, preventing an over-voltage on the module.

#### Over-voltage protection test

To verify the over-voltage protection in more detail, a dedicated test was performed with an electrical prototype with seven modules. One of the modules was replaced with a fuse and a resistive dummy load, while the rest of the setup was kept the same. Once the current was switched on, the fuse opened, resulting in an over-voltage. This could correspond to the case where a connector fails.

The PSPP connected in parallel activated the bypass and closed the chain. This is shown in Figure 6.4. When the fuse opened, the voltage at the minus contact of the resistor dropped, while the voltage at the plus contact increased. As soon as the bypass activates, the voltage difference across the dummy drops to 0 V.

## 6.3 Irradiation tests

Several irradiation campaigns were performed during this work with the PSPP prototypes. Table 6.1 on the facing page lists when and where the irradiations were performed. The X-Ray irradiations aimed to verify the operation of the chip at the expected total ionizing dose (TID) defined in Table 5.1 on page 46. The protection against single event effect (SEE) was tested with heavy-ion and proton beams. The focus was on the measurement of the single event upset (SEU) cross-section. See Chapter 3 for more details on radiation effects.

| Date            | Chips                     | Irradiation beam | Location    |
|-----------------|---------------------------|------------------|-------------|
| 07.08.-25.08.17 | PSPPv3 & PARC             | X-Ray            | CERN        |
| 07.12.-08.12.17 | PARC                      | Heavy Ion        | HIF Louvain |
| 01.02.-02.02.19 | PARC, PSPPv4, PATT & FE-I4 | Proton           | PSI         |
| 13.02.-23.02.19 | PSPPv4                    | X-Ray            | CERN        |
| 26.02.-11.03.19 | PATT & PSPPv3             | X-Ray            | Bonn        |

Table 6.1: List of irradiation campaigns with PSPP prototypes.

### 6.3.1 2017 TID irradiation

The PSPPv3 and PARC chip were irradiated at CERN to 600 Mrad. These measurements were performed by Y. Narbutt and P. Bergmann with my guidance. The results of these irradiations are presented in [106, 107, 109]. The irradiation of the PARC chip is also described in section 5.10.4.

#### PSPPv3 result summary

The results from the PSPPv3 were also reported in [106]. The PSPPv3 was supplied externally as the integrated regulator is not radiation hard. The external supply made it possible to measure the current for the digital and analog parts individually. A rise in the digital supply current was observed, as expected from the RINCE effect (see section 3.1.1 and [42]). The logic including the ADC worked during the entire test.

The bypass was not activated during the irradiation. Measuring the bypass on-resistance $(R_{on})$ before and after irradiation showed an increase of $6 \text{ m}\Omega$. This corresponds to a 24% increase of $R_{on}$, which also results in higher power consumption. Improvements in the bypass were made to reduce $R_{on}$ as described in section 5.8.

## 6.3.2 2019 TID irradiation

#### Voltages of PSPPv4

The PSPPv4 chip was irradiated at CERN to 650 Mrad. As for the 2017 irradiation, this was done in two steps. The first step used a dose rate of 250 krad/h to measure the expected bump in the leakage current more precisely. After 10 Mrad, the rate was increased to 3 Mrad/h, the maximum achievable with the X-ray machine. The voltages and supply current of the PSPPv4 under irradiation are plotted in Figure 6.5 on the following page.
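As a rough cross-check of the two-step plan, the pure beam time follows from the dose targets and dose rates quoted above. The sketch below is only an estimate: it ignores interruptions, ramping and any time without beam, and the function name is hypothetical.

```python
def irradiation_hours(steps):
    """Beam time in hours for a list of (cumulative_dose_Mrad, rate_Mrad_per_h) steps."""
    hours = 0.0
    dose = 0.0
    for target_dose, rate in steps:
        hours += (target_dose - dose) / rate   # time needed for this step
        dose = target_dose
    return hours


# Values from the text: 250 krad/h up to 10 Mrad, then 3 Mrad/h up to 650 Mrad.
pspp_v4_steps = [(10.0, 0.25), (650.0, 3.0)]
total_hours = irradiation_hours(pspp_v4_steps)
total_days = total_hours / 24.0
```

With these numbers the low-rate step takes 40 h and the high-rate step a little over 213 h, which is roughly ten and a half days of beam in total.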



Figure 6.5: Measured voltages of the PSPPv4 under irradiation.

No rise in the current is observed at the expected value of 1 Mrad for the RINCE effect (see section 3.1.1). This is because the chip is supplied with a constant current. As long as the supply current  $I_{sup}$  is large enough, it can cover for the increase in the leakage current. This is a useful feature of the PSPP powering concept. It allows having a constant power consumption with dose.

It can be observed that  $V_{int}$  is rising. This is expected as this is the voltage based on the diode-bandgap (BG) as seen in section 5.10.4. The voltage did not rise above the maximum allowed level of 1.6 V. Therefore the defined default voltage seems well suited.

On the other hand, a rise in the other regulated voltages was still observed, starting at 100 Mrad. This was not expected, and the BG trimming value was adjusted twice, at ~120 Mrad and at 350 Mrad, to keep the supply voltages within safe operating ranges. Calculating the ratio between the regulated voltage and $V_{BG}$ shows that it is again the reference voltage that drifts. The transistor-BG (see section 5.9) will also be used in the ASICs for the ATLAS ITk Strips detector. However, the TID expected there is below 100 Mrad, a range in which the transistor-BG is stable.
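The ratio argument can be made concrete with a short sketch. The numbers below are synthetic and chosen only for illustration; they are not measured values. The reasoning is that if a regulated voltage is a fixed multiple of $V_{BG}$, the ratio stays constant while both voltages rise, which points to the reference rather than the regulator as the source of the drift.

```python
def relative_spread(values):
    """Peak-to-peak variation of a series divided by its mean."""
    mean = sum(values) / len(values)
    return (max(values) - min(values)) / mean


# Synthetic example data (NOT measurements): a drifting reference voltage
# and a regulated voltage that follows it with a fixed gain.
v_bg = [0.600, 0.610, 0.625, 0.640]
assumed_gain = 2.0
v_reg = [assumed_gain * v for v in v_bg]

ratios = [reg / bg for reg, bg in zip(v_reg, v_bg)]

drift_of_regulated_voltage = relative_spread(v_reg)   # several percent
drift_of_ratio = relative_spread(ratios)              # essentially zero
```

A ratio that changed with dose would instead indicate a change in the regulator itself, for example in its feedback divider.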

Not many circuits have been studied at such high doses. One explanation could be that the resistors used in the BG are not stable under irradiation. Even though measurements on the irradiated chips showed some increase in the internal resistance, the effect was too small to account for the observed voltage drift. It is more likely that changes in the threshold voltage appear above 100 Mrad and lead to the rise in the voltage. Nevertheless, the chip is still functional at doses above 500 Mrad, even though the bandgap has to be trimmed.

#### Voltages of PATT

A PATT chip was irradiated in Bonn, first at a rate of 270 krad/h up to about 10 Mrad and later at 8.3 Mrad/h up to 800 Mrad. Figure 6.6 shows the voltages for the PATT. To verify that the drift of the transistor-BG is not due to the changing $V_{int}$, an external $V_{int}$ was provided during the irradiation. The same rise of $V_{BG}$ was observed as for the PSPPv4. No trimming was used for the PATT irradiation.

Figure 6.6: Measured voltages of the PATT under irradiation.

In contrast to the PSPPv4, a bump in $I_{sup}$ can be observed for the PATT at 1 Mrad. The supply current was chosen lower for the PATT than for the PSPPv4 and is not large enough to cover the rise in leakage current. The increase of the supply current above 200 Mrad has not yet been investigated. A possible reason could be the high regulator voltages, as the BG was not trimmed.

#### **Bypass**

During the high dose rate measurement, the bypass was alternately opened and closed for 1 h at a time. A current of 1.5 A was applied when the bypass was closed.

The on-resistance of the bypass was  $4.9 \pm 0.7 \,\mathrm{m}\Omega$  and the off-resistance was  $90 \pm 50 \,\Omega$  over the entire irradiation. The uncertainty on the off-resistance has a large value because the setup is not suited to measure small currents for the bypass. It was designed for large currents when the bypass is on.

The on-resistance as a function of dose is shown in Figure 6.7 on the following page. Each point is the average over an "on"-period. At the beginning of the irradiation $R_{on}$ was $4.8 \pm 0.7 \,\mathrm{m\Omega}$, while at the end it was $4.9 \pm 0.7 \,\mathrm{m\Omega}$. The increase seen with the PSPPv3 was not observed with the PSPPv4. $R_{off}$ also stayed stable within the uncertainty during the irradiation.

#### 6.3.3 SEU cross-section

A test logic was implemented in the PARC and PATT test chips to measure the SEU cross-section of the technology (see section 5.12). The rate was measured first at the heavy ion facility of the University of Louvain. A second measurement was done at the proton irradiation facility of the Paul Scherrer Institut (PSI) in Villigen. A previous irradiation, also performed at PSI, gave an SEU cross-section of $(4.83 \pm 0.10) \times 10^{-14} \text{ cm}^2$ [90].

Figure 6.7: Measured on-resistance of the PSPPv4 bypass under irradiation.

#### Heavy ion SEU measurement

The measurement with heavy ions was performed by J. Kraus [115]. The SEU cross-section was evaluated for the simple shift register. It is plotted as a function of the linear energy transfer (LET) in Figure 6.8.

The cross-section saturates and becomes flat at high LET, where every hit creates an upset: each of the heavy ions causes an upset, while the lighter ions might not all deposit enough charge to do so. A cross-section from $((0.26 \pm 0.04) \times 10^{-8} \text{ to } (4.53 \pm 0.20) \times 10^{-8}) \text{ cm}^2$ was found. The cross-section was also higher for a flip from '1' to '0' than from '0' to '1'.

Figure 6.8: SEU cross-section as a function of LET measured with the PARC [115].

As there are no heavy ions in the radiation environment for ATLAS, a conversion was performed to estimate the cross-section according to [48]. Using the Weibull fit from Figure 6.8 on the facing page a cross-section of  $2.26 \times 10^{-14} \text{ cm}^2$  was obtained.
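For reference, the functional form commonly meant by a Weibull fit of a heavy-ion cross-section curve is sketched below. The parametrization (saturation cross-section, LET threshold, width and shape) is the usual four-parameter form and is an assumption here; the fit parameters of Figure 6.8 are not reproduced in this text, so the function takes them as arguments and the example values in the test are purely illustrative.

```python
import math


def weibull_cross_section(let, sigma_sat, let_threshold, width, shape):
    """Four-parameter Weibull curve for the SEU cross-section versus LET.

    let and let_threshold share one unit (e.g. MeV cm^2 / mg);
    the result has the unit of sigma_sat (e.g. cm^2).
    """
    if let <= let_threshold:
        return 0.0            # below threshold no upset is expected
    x = (let - let_threshold) / width
    return sigma_sat * (1.0 - math.exp(-(x ** shape)))
```

The curve is zero below the threshold LET and approaches `sigma_sat` at high LET, which corresponds to the plateau described above.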

In the triplicated register only four upsets were detected. These were observed at reduced frequencies, where the protection is less efficient.

#### **Proton irradiation**

A PARC, a PATT and a PSPPv4 chip were tested during the irradiation at PSI. The results of the SEU cross-section for the PATT chip are listed in Table 6.2. A total cross-section over all runs of $(4.06 \pm 0.08) \times 10^{-14} \,\mathrm{cm}^2$ was found. The PATT collected a dose of 1.46 Mrad over the entire proton irradiation.

The data from the PARC irradiation was analyzed by M. Caspar [72]. A cross-section of $(3.9 \pm 0.3) \times 10^{-14} \,\mathrm{cm}^2$ was found. A dose of 390 krad was delivered to the PARC. The results from the PSPPv4 are discussed in section 6.3.4.

**Simple shift register:** For analysis of the simple register, only events are considered where no hit in the triplicated register was observed. Discussion of the triplicated register and why these events were filtered follows further below. 2399 SEUs were used for the analysis.

| Run | Energy<br>[MeV] | Fluence<br>[p/cm$^2$] | SEUs<br>[bit] | $\sigma_{0 \rightarrow 1}$ [cm$^2$] | $\sigma_{1 \rightarrow 0}$ [cm$^2$] | $\sigma$ [cm$^2$] |
|-----|-----------------|-------------------------------------|---------------|----------------------------------------------|-------------------------------------|---------------------------------|
| 3   | 230.3           | $1.057\times 10^{12}$               | 108           | $(3.2 \pm 0.5) \times 10^{-14}$              | $(3.6\pm 0.5)\times 10^{-14}$       | $(3.4 \pm 0.3) \times 10^{-14}$ |
| 4   | 230.3           | $8.130\times10^{12}$                | 985           | $(3.6\pm 0.2)\times 10^{-14}$                | $(4.7\pm 0.2)\times 10^{-14}$       | $(4.0 \pm 0.1) \times 10^{-14}$ |
| 5   | 230.3           | $5.132\times10^{11}$                | 62            | $(3.6\pm 0.7)\times 10^{-14}$                | $(4.4 \pm 0.8) \times 10^{-14}$     | $(4.0 \pm 0.5) \times 10^{-14}$ |
| 6   | 200.4           | $6.296\times10^{10}$                | 9             | $(4.3\pm 2.2)\times 10^{-14}$                | $(5.2 \pm 2.3) \times 10^{-14}$     | $(4.7 \pm 1.6) \times 10^{-14}$ |
| 7   | 200.4           | $1.727\times 10^{12}$               | 236           | $(4.0\pm 0.4)\times 10^{-14}$                | $(5.1 \pm 0.5) \times 10^{-14}$     | $(4.6 \pm 0.3) \times 10^{-14}$ |
| 8   | 150.0           | $7.558\times10^{11}$                | 82            | $(3.2\pm 0.5)\times 10^{-14}$                | $(4.1\pm 0.6)\times 10^{-14}$       | $(3.6 \pm 0.4) \times 10^{-14}$ |
| 9   | 99.8            | $5.895\times10^{11}$                | 51            | $(2.8\pm 0.6)\times 10^{-14}$                | $(2.9\pm 0.6)\times 10^{-14}$       | $(2.9 \pm 0.4) \times 10^{-14}$ |
| 10  | 51.7            | $3.694\times10^{11}$                | 38            | $(2.7\pm 0.7)\times 10^{-14}$                | $(4.1 \pm 0.9) \times 10^{-14}$     | $(3.4 \pm 0.6) \times 10^{-14}$ |
| 11  | 230.3           | $6.518\times10^{12}$                | 828           | $(3.8\pm 0.2)\times 10^{-14}$                | $(4.7 \pm 0.2) \times 10^{-14}$     | $(4.2 \pm 0.2) \times 10^{-14}$ |

Table 6.2: SEUs observed in the simple register at PSI. Runs 3 to 11 were measured with the PATT.
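The total cross-sections in Table 6.2 can be cross-checked with a short calculation. The sketch assumes that the per-bit cross-section is the number of upsets divided by the product of fluence and register length (3000 bits in the PATT), with a purely statistical $\sqrt{N}$ uncertainty. Under this assumption the tabulated values are reproduced to within about $0.1 \times 10^{-14}$ cm$^2$. The flip-direction columns are not recomputed, since they need the number of '1' and '0' bits per run.

```python
import math

N_BITS = 3000  # length of the PATT shift register

# run: (fluence in p/cm^2, number of SEUs), copied from Table 6.2
RUNS = {
    3: (1.057e12, 108),
    4: (8.130e12, 985),
    5: (5.132e11, 62),
    6: (6.296e10, 9),
    7: (1.727e12, 236),
    8: (7.558e11, 82),
    9: (5.895e11, 51),
    10: (3.694e11, 38),
    11: (6.518e12, 828),
}


def cross_section(n_seu, fluence, n_bits=N_BITS):
    """Return (sigma, statistical uncertainty) per bit in cm^2."""
    sigma = n_seu / (fluence * n_bits)
    return sigma, sigma / math.sqrt(n_seu)   # Poisson counting error


per_run = {run: cross_section(n, f) for run, (f, n) in RUNS.items()}

total_fluence = sum(f for f, _ in RUNS.values())
total_seu = sum(n for _, n in RUNS.values())
sigma_total, err_total = cross_section(total_seu, total_fluence)
```

Summed over all runs this gives 2399 upsets and a cross-section of about $4.05 \times 10^{-14}$ cm$^2$ with a statistical uncertainty of $0.08 \times 10^{-14}$ cm$^2$, in agreement with the total quoted in the text.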

The bits shifted through the registers were randomly generated. This created some fluctuation between the numbers of '1' and '0' in the register, which was taken into account when calculating the cross-section. On average, 1501 of the 3000 bits in the registers were '1' over the entire measurement. Table 6.2 shows that the cross-section for flips $1 \rightarrow 0$ is larger than for $0 \rightarrow 1$. This was already observed with the heavy ion irradiation.

Figure 6.9: SEU cross-section as a function of proton beam energy measured with the PATT.

The cross-section for each energy is shown in Figure 6.9. It is energy independent within the uncertainty, except for 100 MeV where a lower cross-section was observed. Because an SEU is a random event, the distribution of the observed upsets should follow a Poisson distribution. This is indeed the case, as shown in Figure 6.10. The expectation value of the Poisson distribution is $N_{SEU}/N_{Registers} = 2399/3000 = 0.800$ for the simple register. The fitted Poisson distribution gives an expectation value of $\lambda = 0.801 \pm 0.006$ and an amplitude of $A = 3001.5 \pm 1.5$, matching the data. The uncertainty of the fluence was assumed to be smaller than the statistical uncertainty on the measured bit flips.
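The expected shape of such a distribution can be sketched as follows. It is an assumption here that the histogram counts how many of the 3000 bit positions collected $k$ upsets over the whole measurement; the expectation value is the one quoted above.

```python
import math


def poisson_pmf(k, lam):
    """Probability of observing k events for a Poisson expectation value lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


N_REGISTERS = 3000
N_SEU = 2399
lam = N_SEU / N_REGISTERS   # expectation value per bit position

# Expected number of bit positions with k = 0..5 upsets
expected_counts = {k: N_REGISTERS * poisson_pmf(k, lam) for k in range(6)}
```

Under this assumption a little under half of the bit positions would see no upset at all, and the counts fall off quickly for higher $k$.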

**Triplicated register:** While most of the time the content in the triplicated register was the same as what was written, there were some events where a difference was observed. The runs in Table 6.2 on the previous page indicate the duration of active beam. A Python script was used to acquire the data. For some runs this script was launched multiple times when problems were observed. To reduce the setup time, the run was not always stopped for this. Hereafter, script-run means the events acquired while running the script once.

Ten of these events were the first event of a script-run. Most of them showed more than 100 upsets, with a similar number of upsets in the simple register. A bug in the script, which did not properly initialize the shift register at the beginning, caused these upsets. Therefore they can be ignored.

Another 10 events were the 2<sup>nd</sup> and 3<sup>rd</sup> event of a script-run. In these events, the simple and triplicated register had the exact same data. This indicates also some mismatch in the readout system and the events are therefore ignored.

In two script-runs, several upsets were observed in both registers in all events. For one script-run, the simple and TMR registers had almost the same data. A few single bits were different between the two shift registers, which are most likely SEUs in the simple register. This script-run was also ignored, as it again shows a problem in the readout system. It looks as if the data was not completely shifted through, creating an offset in the data. The other script-run was performed with no clock during the measurement. Unfortunately, as there seemed to be a similar problem with a shift in the readout, no conclusion can be drawn from these events.

Two consecutive events were observed in the middle of the run, where again both shift registers had several hundred upsets. In the first event, some mismatch between the triplicated and simple register was also found. The second had only one single bit different between the two registers, which is most likely an SEU in the simple register. As these events are in the middle of the run, they indicate a source of problems other than the readout system. Possible sources are a transient in the clock input pad or the mode pad, causing a mismatch during the shifting process.

The last event to mention also occurred in the middle of the run. There, a single SEU from '1' to '0' was found in the triplicated register, while three SEUs happened in the simple register. This could be a real SEU in the triplicated register.

After filtering out the cases above, an upper limit on the cross-section of the TMR register can be set:

 $\sigma_{TMR} < 1.7 \times 10^{-17} \, \mathrm{cm}^2$ 

A longer irradiation time or larger shift registers would be needed to allow for a better statement and a measurement of the cross-section for the triplicated registers. This measurement showed that the triplication improves the cross-section by a factor of at least 2000. This is of the same order as found by [69], where a triplicated latch was tested (see also section 3.2.2).
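The quoted limit and improvement factor can be related to the numbers in Table 6.2. The text does not spell out how the limit was derived; the sketch below assumes it corresponds to the single-event sensitivity, that is one upset over the total fluence and 3000 bits, which happens to reproduce the quoted value.

```python
N_BITS = 3000

# Fluences of runs 3 to 11 in p/cm^2, copied from Table 6.2
FLUENCES = [1.057e12, 8.130e12, 5.132e11, 6.296e10, 1.727e12,
            7.558e11, 5.895e11, 3.694e11, 6.518e12]

total_fluence = sum(FLUENCES)

# Assumed derivation: cross-section that would correspond to one upset
sigma_tmr_limit = 1.0 / (total_fluence * N_BITS)

# Improvement of the triplicated over the simple register, using quoted values
sigma_simple = 4.06e-14   # cm^2, total cross-section of the simple register
quoted_limit = 1.7e-17    # cm^2, quoted upper limit for the TMR register
improvement = sigma_simple / quoted_limit
```

The ratio of the two quoted cross-sections is about 2400, consistent with the stated improvement of at least 2000.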

The limit is still ten times larger than the required cross-section defined in equation 5.1. Therefore, about 8 SEUs per month could be expected with this cross-section in the entire ITk Pixel detector. To further increase the tolerance of the bypass control bits, the registers could be replaced by DICE cells or the redundancy could be increased to five bits.

## 6.3.4 Upsets in the PSPPv4 logic

The internal registers of the PSPPv4 were read out during the irradiation at the PSI. They were read every 2 s, and the configuration registers (bypass and bandgap trimming) were written every 5 s or 10 s. The registers were written to make sure that the chip stayed operational and to reset it if any upsets should occur. In total, the PSPPv4 received a dose of 1.9 Mrad.

#### Read/write and constant registers

The registers of the PSPPv4 chip are listed in Table 5.5 on page 66. Registers 0 and 1 are constants and were always read correctly. The status register (no. 4) is defined by external signals and was also always read the same. An unexpected value in these registers could indicate a bit flip in the communication logic, which never happened.

All bits in the digital output register were set either to '1' or to '0'. When reading back this register, only these two values were read. The same holds for the bandgap trimming registers (no. 10-12), which were always written and read back at the default value.

All these registers are protected by triplication. No SEU was observed in these registers indicating the correct operation of the protection. The total fluence observed by the PSPPv4 during the test beam was  $3.42 \times 10^{13} \text{ p/cm}^2$ . From the trimming and status register, the SEU cross-section for the triplicated bits has to be  $<1.5 \times 10^{-15} \text{ cm}^2$ .

#### Bypass register

The bypass register is the most critical register in the PSPP. As described in section 5.5.2, the bypass can be set from three different sources: first by command, second by an over-voltage and third by an over-temperature. The automatic over-temperature protection was switched off during the irradiation at the proton irradiation facility while the over-voltage protection was left activated.

Some unexpected values were read back from this register, where mainly the over-voltage (OV) and over-temperature (OT) flags were activated. During all these events, the voltages measured at the input of the comparators were far from the threshold voltage. Figure 6.11 shows two example events where the OV and OT flags were activated. The OV flag was activated once, while the OT flag was set five times. Another event looked like the chip was reset, as all registers including the bypass register went to the default value.

The suspicion is that an SET happened in the comparator. The comparators are not triplicated and connect to an asynchronous set input of the flip flops storing the flags. Therefore a transient on the comparator output could upset the flags. It should be recalled that the PSPPv4 has two comparators for temperature. The two outputs are combined with a logic OR as described in section 5.5.2. Therefore it is normal that the OT flag is activated more often. Furthermore, the same comparator circuit is used in the power-on reset. A transient there could therefore also cause a reset of the chip.

Figure 6.11: Possible single event transient (SET) in the comparator activating the OT flag (a) and the OV flag (b). Normally, the Temp0 or Temp1 voltage has to be below ThTemp to activate the OT flag, and V\_by has to be larger than ThMod to activate the OV flag.

Despite the observed flags, the bypass was never switched unintentionally during the entire irradiation campaign. The command bit used for manual activation and the enable bits are therefore properly protected.

#### Simulation of the comparator regarding single event transients

A simulation was performed to verify if an SET in the comparator could indeed cause an upset of the flag. This was done based on the methods described in section 3.1.5 and with parameters from [59].

Figure 6.12 shows the simplified schematic of the test bench used. The complete comparator schematic was used, but with an ideal voltage source for the supply voltage. Additionally, the logic for the over-voltage flag was added at the output. A charge pulse was injected into the two nodes X and  $\overline{X}$  to simulate a hit in a transistor from the logic gates.

Figure 6.13 on the next page shows the result of the simulation. The top graph shows the current pulses induced in nodes X and $\overline{X}$. The first pulse is on node X, while the second pulse is on node $\overline{X}$. The second and third graphs show the voltage at node X and node $\overline{X}$ respectively. The fourth graph is the output of the comparator and the last the status of the OV flag. The colors represent the amount of charge injected. Red is a pulse of 100 fC, green corresponds to 40 fC and blue to 10 fC. Similar values were also used by [59] and correspond to a particle with an effective LET of about $19 \text{ MeV cm}^2 \text{ mg}^{-1}$, $8 \text{ MeV cm}^2 \text{ mg}^{-1}$ and $2 \text{ MeV cm}^2 \text{ mg}^{-1}$ respectively. These charges could also be deposited by single protons as simulated by [116]. It can be seen that the first pulse causes an upset in the flip flop for charges $\geq 40 \text{ fC}$. Node $\overline{X}$ is not upset even at 100 fC.

Node X is driven by the output stage of the baker comparator, while the inverter drives $\overline{X}$. The inverter is stronger than the output stage and has a larger load capacitance from the output buffers. Therefore a larger charge is required to upset node $\overline{X}$ than node X.

Figure 6.12: Simplified schematic for SEU simulation with the comparator.

Figure 6.13: Simulation of an SET in the comparator. Red indicates a pulse of 100 fC, green corresponds to 40 fC and blue to 10 fC. See text for signal description.

Even though this simulation is not exhaustive, it shows that an SET in the comparator could set the flags as observed in the data. Furthermore, the voltage pulse created with an injected charge of 100 fC in node X is larger than the tolerance of the technology.

The inverter and buffer used at the comparator output are cells from the standard logic library. Assuming that the upset occurred in these cells as simulated, the cross-section should be similar to that of the registers, since registers also include buffers and inverters as described in section A.1.2. All seven events for the four comparators used in the PSPPv4 give a cross-section of $(5.1 \pm 1.9) \times 10^{-14} \,\mathrm{cm}^2$. Within the uncertainty, this is equal to the cross-section measured with the shift register.
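The quoted comparator cross-section can be reproduced from the numbers given in this section. The sketch assumes a per-comparator cross-section of events divided by the product of fluence and number of comparators, with a $\sqrt{N}$ statistical uncertainty; the fluence is the total PSPPv4 fluence stated in section 6.3.4.

```python
import math

n_events = 7          # one OV event, five OT events and one reset-like event
n_comparators = 4     # comparators in the PSPPv4
fluence = 3.42e13     # p/cm^2, total fluence seen by the PSPPv4

sigma_comp = n_events / (fluence * n_comparators)   # cm^2 per comparator
err_comp = sigma_comp / math.sqrt(n_events)         # Poisson counting error
```

This gives about $(5.1 \pm 1.9) \times 10^{-14}$ cm$^2$, matching the value in the text.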

Methods to improve the SET tolerance of the comparator can be applied as described in section 3.2. A full triplication of the comparator is probably too intensive in power and area consumption. Adjusting the transistor sizes of the output stage or inserting additional filters would be more suited here.

## 6.4 Stability and long term operation

The PSPP chip has to operate over the entire lifetime of the experiment, which is estimated at ten years. To verify that the bypass and communication work over a longer time, the chip was operated in a climate chamber at elevated and changing temperatures.

#### 6.4.1 PSPPv3 long term test

A PSPPv3 chip was tested at different temperatures in a climate chamber for a total of 21 days. The chip was operated at  $0^{\circ}$ C,  $30^{\circ}$ C and  $50^{\circ}$ C for seven days each and a bypass current of 3 A was used. This test was done by P. Bergmann during his bachelor thesis [107] and also presented in [106].

The chip worked as intended at all temperatures. A bypass resistance of $19.5 \pm 0.1 \text{ m}\Omega$ was measured at the highest temperature, which is the worst case. The measurement for the bypass is shown in Figure 6.14. The bypass was switched on for 23 hours and off for one hour. This cycle was repeated during the test.

The chip temperature was measured with an NTC on the backside of the carrier board. This temperature was 10 °C above the environment temperature, while the bypass was active. From this, the chip temperature was estimated to be  $15 \pm 1$  °C above the environment [107].

#### 6.4.2 PSPPv4 climate chamber test

Two PSPPv4 chips were operated over several weeks in a climate chamber. The first chip (#2) was operated for 19 days with a bypass current of 4 A. A second chip (#12) was put into the chamber and operated for 42 days with a bypass current of 5.6 A. A problem with the setup meant that chip #12 did not switch off the bypass at the intended times, because an over-voltage was generated when the bypass was opened. The problem was only noticed after 22 days of measurements and could be fixed by adjusting the Python routine. Therefore the results for chip #12 are sometimes split in two: #12a denotes the first 22 days of measurement and #12b the remaining measurements. Where only #12 is given, all data is included.

Figure 6.14: PSPPv3 bypass resistance during the long-term measurements [106].

Figure 6.15: Defined temperature profile for the climate chamber. The profile was repeated after 16 hours.

The temperature profile shown in Figure 6.15 was configured in the climate chamber. For the first days of operation with chip #2, a slightly different profile was used: the stable times were shorter and the minimum was set to 0 °C instead of -15 °C. The relative humidity was set to 10% during the entire test.

The test showed that the PSPPv4 can be operated with changing temperature cycles and at elevated temperatures. More stress tests are foreseen to be conducted with multiple chips. Due to delays in the schedule, it was not possible to include these tests in this work.

#### Chip temperature

The temperature was measured with two negative temperature coefficient (NTC) resistors on the carrier board shown in Figure 6.2a on page 100. Temperature sensor 1 (T1) is located closer to the chip than sensor T0. An additional NTC sensor was located in the climate chamber to measure the environmental temperature. The temperatures measured for chip #12 are shown in Figure 6.16 on the next page. The temperatures for chip #2 look similar. Further, the chamber did not reach the configured -15 °C, and the relative humidity increased at low temperatures.

The temperature measured on the carrier board is always a little higher than in the chamber. This comes from the fact, that the active bypass is dissipating in the order of (100 to 200) mW. Table 6.3 on the facing page lists the average temperature increase over the entire measurement. The increase of the bypass current from (4 to 5.6) A doubled the temperature difference of the chip.

The time during which the bypass was open for #12a was rather short. Therefore, the chip did not have time to cool down before the bypass was activated again.

At 5.6 A, the current used with the PSPPv4 is almost double the 3 A used in the test with the PSPPv3. Similar temperatures were measured for both chips. In comparison with the PSPPv3, the PSPPv4 has a lower temperature rise due to the almost four times smaller bypass on-resistance.



Figure 6.16: Temperature as a function of time for PSPPv4 chip #12 during the climate chamber long term test.

Table 6.3: Increase of temperature measured on the PSPPv4 carrier board during the long term climate chamber test.  $\Delta T0$  and  $\Delta T1$  are the differences between the NTC0 and NTC1 readings, respectively, and the chamber temperature.

| Chip | Bypass state | $\Delta T0$ [°C] | $\Delta T1$ [°C] |
|------|--------------|------------------|------------------|
| #2   | closed       | $4.7 \pm 1.4$    | $5.6 \pm 1.4$    |
|      | open         | $2.1 \pm 1.2$    | $2.1 \pm 1.2$    |
| #12a | closed       | $10.2 \pm 1.1$   | $12.6 \pm 1.1$   |
|      | open         | $6.0 \pm 2.7$    | $6.7 \pm 3.0$    |
| #12b | closed       | $9.8 \pm 1.5$    | $12.2 \pm 1.6$   |
|      | open         | $2.5 \pm 1.7$    | $2.9 \pm 1.8$    |

#### Bypass resistance

The bypass resistance was calculated from the measured bypass voltage and current. Using the logged bypass status register, the data was grouped into times when the bypass was active and times when it was inactive. The switching periods were left out, so that only measurements in a stable state were used.
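The grouping described above can be sketched as follows. This is a minimal illustration, not the analysis code used for this work: the log layout (tuples of voltage, current and status flag), the function name and the `guard` parameter that defines the excluded switching window are all assumptions, and the readings are made up.

```python
from statistics import mean


def stable_resistances(samples, guard=1):
    """Group R = V / I by bypass state, skipping samples near a state change.

    samples: time-ordered list of (voltage_v, current_a, bypass_active) tuples
             (hypothetical log layout).
    guard:   number of neighbouring samples on each side that must share the
             same bypass state for a sample to count as stable (assumption).
    """
    groups = {True: [], False: []}
    n = len(samples)
    for idx, (voltage_v, current_a, active) in enumerate(samples):
        window = samples[max(0, idx - guard):min(n, idx + guard + 1)]
        if any(state != active for _, _, state in window):
            continue  # within the switching window: leave out
        if current_a <= 0:
            continue  # no measurable current, resistance undefined
        groups[active].append(voltage_v / current_a)
    return groups


# Made-up readings for illustration only
log = [
    (0.0200, 4.00, True), (0.0208, 4.00, True), (0.0212, 4.00, True),
    (0.3000, 3.00, True),   # last sample before the bypass opens
    (1.2000, 0.02, False),  # first sample after the bypass opened
    (1.2700, 0.01, False), (1.2700, 0.01, False),
]
groups = stable_resistances(log)
print(f"R_on  = {mean(groups[True]) * 1e3:.2f} mOhm from {len(groups[True])} samples")
print(f"R_off = {mean(groups[False]):.0f} Ohm from {len(groups[False])} samples")
```

Excluding a fixed number of samples around each status change is only one possible way to define the stable state; a time-based window would work equally well.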

 $R_{on}$  increases by  $1.3 \pm 0.4 \,\mathrm{m}\Omega$  from a temperature below 0 °C to 60 °C. The mean values of  $R_{on}$  for both chips are shown in Table 6.4 on the next page for different temperatures. The bypass on-resistance  $R_{on}$  of chip #12 during the measurement is shown in Figure 6.17 on the following page. Again, a similar picture is obtained for chip #2. The oscillating pattern matches the temperature cycles of the climate chamber.

The off-resistance  $R_{off}$  was also measured. For chip #2,  $R_{off}$  is  $127 \pm 50\,\Omega$  and for chip #12,  $R_{off}$  is  $53 \pm 22\,\Omega$ . The lower off-resistance measured for chip #12 results from the current sense resistance being reduced by half. The differential amplifier used to measure the current has a lower limit of 10 mV, which limits the minimal current that can be measured.

Figure 6.17: Bypass on-resistance as a function of time for PSPPv4 chip #12 during the climate chamber long term test.

Table 6.4: PSPPv4 bypass on-resistance during the long term climate chamber test.

| Chip | Temperature range [°C] | Resistance [mΩ] |
|------|------------------------|-----------------|
| #2   | -10 to 60              | $5.2 \pm 0.6$   |
|      | <0                     | $4.7 \pm 0.4$   |
|      | $30 \pm 5$             | $5.3 \pm 0.4$   |
|      | >55                    | $5.9 \pm 0.4$   |
| #12  | -10 to 60              | $5.6 \pm 0.5$   |
|      | <0                     | $5.1 \pm 0.3$   |
|      | $30 \pm 5$             | $5.7 \pm 0.3$   |
|      | >55                    | $6.4 \pm 0.3$   |

#### Bandgap voltage

The bandgap is designed to have a small temperature dependency of the output voltage as described in section 5.9. To verify if this is the case, the output voltage  $V_{BG}$  from the transistor-BG is plotted against the temperature of the chip. Figure 6.18 on the next page shows the measured voltage as a function of the NTC1 temperature. The internal voltage  $V_{int}$  is shown too, which is dependent on the diode-BG.

The mean of  $V_{BG}$  is listed in Table 6.5 on the facing page for different temperatures of both chips. The average of the chip temperature was taken for different ranges of the environmental temperature. Because the environmental chamber was less precise at lower temperatures, there is a rather large uncertainty on the temperature.

It can be seen that over the temperature range from (0 to 60) °C a change of -20 mV was measured for the transistor-BG. This matches the simulation results where over the range from (-40 to 60) °C a mean change of -25 mV was estimated (see section 5.9.2). For  $V_{int}$ , a change of -10 mV was observed in the same range. The temperature dependence of the diode-BG is lower than for the transistor-BG in the measured range.



Figure 6.18: Bandgap voltage  $V_{BG}$  and  $V_{int}$  as a function of the chip temperature measured with NTC1 for PSPPv4 #2.

| Chip | Temperature range [°C] | $V_{BG}$ [V]      | $V_{int}$ [V]     | Chip temperature [°C] |
|------|------------------------|-------------------|-------------------|-----------------------|
| #2   | <0                     | $0.694 \pm 0.003$ | $0.925 \pm 0.006$ | $2 \pm 3$             |
|      | $30 \pm 5$             | $0.693 \pm 0.003$ | $0.921 \pm 0.060$ | $34 \pm 2$            |
|      | >55                    | $0.680 \pm 0.004$ | $0.913 \pm 0.070$ | $65 \pm 3$            |
| #12  | <0                     | $0.695 \pm 0.003$ | $0.899 \pm 0.007$ | $7 \pm 4$             |
|      | $30 \pm 5$             | $0.691 \pm 0.003$ | $0.896 \pm 0.007$ | $41 \pm 4$            |
|      | >55                    | $0.677 \pm 0.004$ | $0.890 \pm 0.009$ | $71 \pm 4$            |

Table 6.5:  $V_{BG}$  and  $V_{int}$  for different temperatures.

# Chapter 7 Risk analysis for serial power

The serial power approach has never been used in a particle physics experiment before. A good understanding of the system is therefore required to prevent failures during operation. The DCS for a serial power chain as described in section 4.3 introduces additional active elements for protection. The parallel operation of the front-end chips introduces redundancy and therefore some protection against failures. The PSPP adds additional flexibility in operation and possibilities to recover from failures. Temperature and voltage monitoring are used to prevent any damage by taking corresponding actions.

On the other hand, the PSPP introduces an additional failure source. The chain becomes more complicated and a failure in the bypass could disable a working module.

To evaluate associated risks and assess potential benefits, a failure mode and effects analysis was made for the PSPP.

## 7.1 PSPP failure modes and effects analysis

The key points from the failure mode and effects analysis are listed in Table 7.1 on the next page. The full table is attached in Appendix C.

The analysis was performed by looking at the different elements in the PSPP. For each element, possible failure modes were identified and the effects of the failure on the serial power chain were analyzed. Every failure was also analyzed in terms of occurrence probability and severity. The severity is deemed "catastrophic" if an entire chain is affected and "critical" if the failure mode causes a module to fail. Loss of monitoring or similar effects are assigned a "marginal" status. In addition, the detectability of a failure mode was estimated.
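The severity rule can be written down as a small decision function. This is only an illustrative sketch of the classification described above; the names `Severity` and `classify` and the two boolean inputs are assumptions introduced here, not part of the analysis itself.

```python
from enum import Enum


class Severity(Enum):
    MARGINAL = "marginal"
    CRITICAL = "critical"
    CATASTROPHIC = "catastrophic"


def classify(affects_chain: bool, module_fails: bool) -> Severity:
    """Map the effect of a failure mode to a severity level."""
    if affects_chain:
        return Severity.CATASTROPHIC  # an entire serial power chain is affected
    if module_fails:
        return Severity.CRITICAL      # a single module fails
    return Severity.MARGINAL          # e.g. loss of monitoring only


# Examples following the wording of the rule
print(classify(affects_chain=True, module_fails=True).value)    # open current path
print(classify(affects_chain=False, module_fails=True).value)   # one module deactivated
print(classify(affects_chain=False, module_fails=False).value)  # monitoring lost
```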

In response to the failure mode and effects analysis, several design changes were made. The most important actions taken are listed here:

- The location of the PSPP was moved. Originally, it was foreseen to place the PSPP on the module flex. As stated in Table 7.1, a connector failure can only be bypassed if the PSPP is located on the services. This became baseline and is also described in the technical design report [5].
- To make a drift of the PSPP reference voltage detectable, the Vglobal pad was introduced (see section 5.6.3).
- The automatic bypass activation can be disabled as described in section 5.5.2. This can be used to prevent the bypass from activating if the reference is drifting.

| Potential failure       | Effects of failure                                                | Severity     | Actions                                                                      |
|-------------------------|-------------------------------------------------------------------|--------------|------------------------------------------------------------------------------|
| Module connector open   | The current path is open, causing power loss in the entire chain. | Catastrophic | Bypass connector with PSPP on type-0 services.                               |
| PSPP regulator failure  | PSPP not operational. No effects on chain.                        | Marginal     | Not possible to recover.                                                     |
| PSPP reference drifting | The bypass could activate too early or too late.                  | Critical     | Monitor external voltage to detect drift. Disable automatic bypass function. |
| SCB failure on one chip | PSPP not responding anymore. Bypass can't be set or reset.        | Marginal     | PSPP can be reset with a power cycle. Not possible to recover failure.       |
| Bypass fails closed     | Module is deactivated.                                            | Critical     | Power-off PSPP, resulting in loss of monitoring for the chain.               |
| Bypass oscillating      | Noise injected in serial power chain.                             | Catastrophic | Power-off PSPP.                                                              |

Table 7.1: Summary of failure modes and effects analysis for the PSPP.

## 7.2 Failure probability of chain

The number of working front-end chips in the full detector depends on the percentage of failures in the front-end, modules and serial power chains. The predicted number of malfunctioning serial power chains can be estimated from the failure rates of the front-end and the other elements in the chain. Not considered are cooling failures, where the bypass could prevent a module from thermal runaway<sup>1</sup>.

#### 7.2.1 Without bypass

A front-end chip has two regulators and four chips are operated in parallel on most modules. The probability of a regulator failure is  $p_r$ .

A regulator can fail gracefully, with probability  $p_{rg} = p_r \cdot (1 - f_e)$ , in which case it does not affect others. It can also fail non-gracefully, with probability  $p_{rng} = p_r \cdot f_e$ , in which case it affects the module. Here,  $f_e$  is the fraction of non-graceful regulator failures. The probability of a chip failing without affecting others is:

$$p_c = 2(1 - p_r) p_{rg} + p_{rg}^2 \tag{7.1}$$

<sup>&</sup>lt;sup>1</sup>The discussion here is based on probability calculations by M. Garcia-Sciveres, C. Zeitnitz and myself.

Assuming four front-end chips in a module, the module causes a failure of the chain if

- more than four regulators fail gracefully:  $p_{m1}$  (equation 7.2)
- a single regulator fails non-gracefully:  $p_{m2}$  (equation 7.3)
- the module connector fails:  $p_{m3}$

$$p_{m1} = \left[ p_{rg}^8 + 8p_{rg}^7 \left( 1 - p_{rg} \right) + 28p_{rg}^6 \left( 1 - p_{rg} \right)^2 + 56p_{rg}^5 \left( 1 - p_{rg} \right)^3 \right] \cdot \left( 1 - p_{rng} \right)^8 \tag{7.2}$$

$$p_{m2} = 1 - \left( 1 - p_{rng} \right)^8 \tag{7.3}$$
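For orientation, the bracket in equation 7.2 is the binomial probability that five, six, seven or all eight of the eight regulators in a module fail gracefully. The coefficients 56, 28, 8 and 1 are the binomial coefficients $\binom{8}{k}$ for $k = 5, \dots, 8$. Written compactly (a restatement for illustration, not an additional result):

$$p_{m1} = \left[ \sum_{k=5}^{8} \binom{8}{k} \, p_{rg}^{k} \left( 1 - p_{rg} \right)^{8-k} \right] \cdot \left( 1 - p_{rng} \right)^8$$

For small failure probabilities, equation 7.3 reduces to first order to $p_{m2} \approx 8 \, p_{rng}$, so a single non-graceful regulator failure is the dominant contribution compared to the term $p_{m1}$, which is of fifth order in $p_{rg}$.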

Depending on the chain length N, the chain failure probability is therefore defined as:

$$p_{SP} = 1 - \left(1 - \left(p_{m1} + p_{m2} + p_{m3}\right)\right)^N \tag{7.4}$$

The total fraction of dead front-end chips in the detector is then given by equation 7.5. The double failure of a module and connector is not considered. The term  $p_{SP} \cdot p_c$  is the fraction of gracefully dead chips in a failing serial power chain, which would otherwise be counted twice.

$$\text{fraction of dead front-end} = p_{SP} + p_c - p_{SP} \cdot p_c \tag{7.5}$$

#### 7.2.2 With bypass

The PSPP prevents the chain from failing due to a module failure by bypassing the defective module. The module is still lost though. Therefore, the equations for  $p_c$  (7.1),  $p_{m1}$  (7.2) and  $p_{m2}$  (7.3) are still valid. Not considered are double failures, i.e. module failure and PSPP not able to close the bypass. This results in the assumption that no chain fails with a bypass.

On the other hand, the PSPP can also fail with bypass closed with a probability of  $p_{mP}$ . Consequently, each failing PSPP causes a module to fail. This gives a total fraction of dead front-end chips equal to:

$$\text{fraction of dead front-end} = (p_{m1} + p_{m2} + p_{m3} + p_{mP}) \cdot (1 - p_c) + p_c \tag{7.6}$$
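Equations 7.1 to 7.6 can be evaluated numerically to compare the two cases. The following is a minimal sketch under stated assumptions: the function name, its argument names and all numerical inputs (regulator failure probability, fraction $f_e$, chain length and PSPP failure probability) are illustrative placeholders and not measured values.

```python
from math import comb


def dead_fraction(p_r, f_e, n_modules, p_m3=0.0, p_mP=0.0, bypass=False):
    """Fraction of dead front-end chips, following equations 7.1 to 7.6.

    p_r: regulator failure probability, f_e: fraction of non-graceful failures,
    n_modules: chain length N, p_m3: connector failure probability,
    p_mP: probability of a PSPP failing with the bypass closed.
    """
    p_rg = p_r * (1.0 - f_e)   # graceful regulator failure
    p_rng = p_r * f_e          # non-graceful regulator failure

    # Eq. 7.1: chip fails without affecting others
    p_c = 2.0 * (1.0 - p_r) * p_rg + p_rg ** 2
    # Eq. 7.2: more than four of eight regulators fail gracefully
    p_m1 = sum(
        comb(8, k) * p_rg ** k * (1.0 - p_rg) ** (8 - k) for k in range(5, 9)
    ) * (1.0 - p_rng) ** 8
    # Eq. 7.3: at least one regulator fails non-gracefully
    p_m2 = 1.0 - (1.0 - p_rng) ** 8
    p_module = p_m1 + p_m2 + p_m3

    if bypass:
        # Eq. 7.6: a failing module is bypassed, the chain survives
        return (p_module + p_mP) * (1.0 - p_c) + p_c
    # Eq. 7.4 and 7.5: any failing module takes down the whole chain
    p_sp = 1.0 - (1.0 - p_module) ** n_modules
    return p_sp + p_c - p_sp * p_c


# Illustrative inputs only (assumptions, not measured failure rates)
without = dead_fraction(p_r=0.01, f_e=0.1, n_modules=8)
with_pspp = dead_fraction(p_r=0.01, f_e=0.1, n_modules=8, p_mP=0.001, bypass=True)
print(f"without bypass: {without:.4f}")
print(f"with bypass:    {with_pspp:.4f}")
```

Scanning `p_r` for several values of `f_e` and `p_mP` with such a function gives curves of the kind discussed in the next section.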

#### 7.2.3 Probability discussion

The final regulators in the front-end and PSPP are not yet available and only a limited number of prototypes can be used for tests. This makes it difficult to estimate the different failure probabilities. For different cases, the fraction of dead front-end chips was plotted against the regulator failure probability in Figure 7.1 on the following page.

The black lines show the case without PSPP for different fractions  $f_e$ . The green and red lines show the case with PSPP for different  $p_{mP}$ .



Figure 7.1: Fraction of dead detector as a function of regulator failure probability.

The higher  $f_e$  and  $p_r$  become, the more benefit is added by the PSPP. The specification for the front-end is a failure rate of less than 1%. The PSPP failure rate should be well below 1% to be beneficial for the serial power chain reliability.

In the tests with the PSPP prototypes, no failure with a closed bypass has been observed so far. This excludes the problem in the PSPPv3 described in section 5.8.2. While about 100 PSPPv3 were used and tested during this work and in the system test (see section 6.2), only about 10 PSPPv4 were tested during this work. The statistics are therefore low and more tests would be required to define a clear number.

## 7.3 Decision by the collaboration

A task force was created in the ATLAS Collaboration in fall 2018, looking at the risks in a serial power chain. I participated as an expert for the PSPP. This task force also looked at the other elements of the serial power chain and was led by D. Bortoletto.

Further improvements of the front-end chip are being made, which add safety features including over-voltage protection and under-shunt protection, preventing transients caused by noisy modules. These features are added in response to observations made in system tests, where the protection was provided by the PSPP as described in section 6.2. The improvements in the front-end chip adopt in part the protection features of the PSPP. With the updates, the risk of a failing regulator affecting the entire module and chain (i.e.  $f_e$ ) is seen as very low. The risk of open connectors and cooling failures is deemed to be low as well.

The added complexity and risk of a PSPP were judged to outweigh the benefits of the operational flexibility offered by a bypass. In spring 2019, the ITk Pixel collaboration therefore decided on a new baseline without the PSPP.

The independent monitoring is still considered as an important feature. A PSPP without the bypass would add the same complications as with a bypass. Therefore, it was decided to use the DCS controller chip as a monitoring chip for the entire SP chain.

# Chapter 8

## Conclusion

The upgrade of the ATLAS ITk detector includes many challenges. The projected radiation dose will exceed everything so far observed. Research and development are concluding. The production of the detector is scheduled to be finished in 2026.

In this thesis, a control and monitoring chip for the new DCS of the ATLAS ITk Pixel detector was developed. This pixel serial powering & protection (PSPP) chip monitors voltage and temperature of the detector modules in a serially powered chain. Only four additional lines are required for power and communication of up to 16 PSPPs in a serial power chain. This allows operating the PSPP independently of the pixel detector modules. A bypass transistor can switch individual modules in the chain with a current of up to 8 A while having a power loss smaller than 400 mW. The PSPP is working up to 800 Mrad of total ionizing dose (TID).

As M. F. Newcomer stated in response to irradiation results presented in this work [117]:

"It actually seems quite amazing to use a process that allows stable operation up to 800 Mrad. Standard CMOS processes were only good to about 30 krad in the mid 90's."

## 8.1 Status and summary

The PSPP prototype chips developed in this work proved the concept of a detector control system for a serial power chain. A serial control bus (SCB) was enhanced to work reliably with AC coupled single-ended lines. Logic for the master of the SCB was implemented in an FPGA and the PARC test chip was used as a physical layer for the DCS controller. Two test chips (PARC and PATT) included test logic which allowed measuring the single event upset (SEU) cross-section. The cross-section for a simple register was found to be  $(4.06 \pm 0.08) \times 10^{-14} \text{ cm}^2$ . The data for the triplicated register was not sufficient to make a precise statement. Theoretically, the cross-section for the triple modular redundancy (TMR) register could be up to 10 orders of magnitude smaller than for a simple register. From the test beam data, the cross-section for the triplicated register is smaller than  $1.7 \times 10^{-17} \text{ cm}^2$ .
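As a rough cross-check of these two numbers (a back-of-the-envelope estimate using only the values quoted above, with the symbols $\sigma_{simple}$ and $\sigma_{TMR}$ introduced here for illustration), the measured upper limit corresponds to a suppression of at least three orders of magnitude:

$$\frac{\sigma_{simple}}{\sigma_{TMR}} > \frac{4.06 \times 10^{-14} \text{ cm}^2}{1.7 \times 10^{-17} \text{ cm}^2} \approx 2.4 \times 10^{3}$$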

The PSPPv3 was used in a system test for verifying the operation of a serial power chain. The integration of the PSPPv3 was straightforward and it could be operated very reliably. During the commissioning and debugging of the system test, the PSPPv3 proved to be very useful. The monitoring values helped to identify problems in the modules. Further, the integrated automatic bypass activation protected the modules from damage in case of over-voltage. The results presented here were obtained with a serial power chain of seven quad modules. A larger structure with multiple chains is in the commissioning phase, where the PSPPv3 is included and already working.

The PSPPv4 is an updated version and was designed with radiation hard elements. It can bypass a current of 8 A without active cooling. The chip remains functional at 800 Mrad and includes protection against SEU. The PSPPv4 chip was tested in the climate chamber for more than 1 month at temperatures from (0 to 60) °C. The same chip is still in the climate chamber at a constant 85% relative humidity and 60 °C. At the time of this writing, it was operational and working for three weeks under these conditions.

Besides the development and verification of the PSPP chip, the concept and reliability of the detector control system for the serial power in the ITk Pixel detector were analyzed as part of this thesis. The PSPP chip adds complexity to the serial power chain and introduces additional risks. These have been addressed in order to minimize them. The PSPP becomes especially beneficial to the system when a single front-end chip can fail in a way that affects the entire chain. The commissioning, debugging and operation of the detector benefit from an independent monitoring path, as it allows failures to be investigated more thoroughly. Furthermore, the bypass adds the possibility to control single modules in the serial power chain.

## 8.2 Towards a production of the PSPP

The basic functionality is implemented in the latest prototype and proves to be working. For production, some improvements are recommended.

- While the TMR is working well for protecting the logic, a loss of the clock would make the chip vulnerable. An asynchronous TMR as implemented in a test chip still shows some timing issues and requires more development.
- Further, the implemented comparator is not yet protected against single event transient (SET). This is a risk for operation as it could activate the bypass.
- The internal bandgap (BG) references change with high TID. This requires adjusting the regulated voltages by trimming the BG, allowing the chip to remain functional. However, a stable reference voltage would be beneficial for the operation.

Several tests were performed to verify the functionality of the PSPP prototypes. For an even more thorough check, additional tests could be made. A test of switching the bypass in a magnetic field is recommended, to check if the mechanical stability is given.

Additional irradiation studies will provide more precise insights about the cross-section of the TMR protected logic. The angular dependency of SEU could also be of interest in such a test beam, to investigate multiple-bit upsets from a single particle. The PSPP might be hit from any direction in the experiment, while the beam in the tests performed so far was always perpendicular. Further, effects of non-ionizing particles are not yet investigated. Even though the PSPPv4 proves to be a reliable chip, the ATLAS collaboration decided not to use it in the detector. This decision was made because of the added complexity in the serial chain and because some protection functions from the PSPP were integrated into the front-end chip.

Nevertheless, the PSPP with bypass and monitoring could be an addition to the detector for improving the operation and to provide more control options of a serial power chain. Good understanding of the detector and full control over it are required to obtain high performance. The PSPP developed in this work could be an important component for this.

## Acknowledgments

To complete this work, many people helped me and were involved in different ways. I would like to thank them all, however small the contribution was.

My thanks go to Christian Zeitnitz, my supervisor at the University of Wuppertal. He put his trust in me to perform the task at hand. I thank him also for the advice and counsel, as well as inputs during the progress of the work.

Many thanks go also to Susanne Kersten, who guided me through the years in Wuppertal. Her experience with the detector control system, contacts and knowledge of the experiment was a great help. It was a great pleasure to work with her and I thank her for reading my entire thesis and giving me important input.

Another big thank you goes to Michael Karagounis from the Fachhochschule Dortmund. His knowledge in chip design and input for the development of the PSPP was very helpful. It was always fun to discuss with him and I thank him for his time to support my work.

Many students worked for me and performed measurements used in this work. Martin Errenst, currently at CERN, supported me in my thesis and I enjoyed the conversations over coffee, fondue or at the monastery with him. Philip Bergmann worked hard to perform tests with all the different prototypes. Yann Narbutt and Jakob Schick started the development of the initial test setup, which proved to be very useful in many tests. Johanna Kraus did excellent work on test beam data with the PARC. Max Caspar helped to develop all kinds of little boards and Python programs. His fast turn-around and completion of tasks made it almost difficult to provide him with enough input. I thank them all for their hard work.

Many thanks go also to Peter Kind, who supported me in building the test setups. It was a great experience to teach the electronic laboratory sessions with him.

Rizwan Ahmad, originally from the FH Dortmund, is now working on the DCS controller in Wuppertal. He assisted with the design of the PSPP and became a valued colleague. Also, Tobias Fröse and Andreas Stiller were helping hands from Dortmund and supported me during some submissions.

My thanks go also to the whole Wuppertal high energy physics group for warmly welcoming me in their midst. Especially Tobias Flick for correcting the introduction, Marius Wensing for help in all kinds of technical questions and Carsten Dülsen for organizing the town hall meetings and his valuable input to technical questions.

Further, I'd like to thank Carl Haber and Maurice Garcia-Sciveres from the Lawrence Berkeley National Laboratory. Thanks to Carl I was introduced to the ATLAS community and Maurice put me in touch with the group in Wuppertal.

Susanne Kühn from CERN supported me during the time in Geneva and I thank her for all the good input for my work. I thank also Matthias Hamer, who supported our DCS concept in the collaboration and kept the overview of all services. Thanks to them, the demonstrator was a success.

A large thank you goes to Wolfgang Luithardt, my professor and mentor at the Haute école d'ingénierie et d'architecture de Fribourg. He started my journey into physics by giving me the chance to work in Berkeley. He is dearly missed.

Many thanks to family and friends who were interested in my work and listened to my explanations. Without the support from my parents, Therese and Beat Lehmann, this would not have been possible. They encouraged me in all my goals and during the long years of my studies. Thank you very much for everything!

A very big thank you goes to Anna Schönbein. You kept me running and reassured me when I only saw mountains of work. Thanks for enduring all the long hours and cumbersome travels. I'm extremely happy to be with you!

# Bibliography

- <sup>1</sup>CERN, *The CERN Experimental Program: ATLAS*, https://greybook.cern.ch/greybook/experiment/detail?id=ATLAS (visited on 05/24/2019) (cit. on p. 1).
- <sup>2</sup>L. Evans and P. Bryant, "LHC machine", Journal of Instrumentation **3**, S08001–S08001 (2008) (cit. on p. 3).
- <sup>3</sup>Rende Steerenberg, *LHC Report: Another run is over and LS2 has just begun...* (Dec. 2018) https://home.cern/news/news/accelerators/lhc-report-another-run-over-and-ls2-has-just-begun (visited on 05/22/2019) (cit. on p. 3).
- <sup>4</sup>Corinne Pralavorio, *Record luminosity: well done LHC*, (Nov. 2017) https://home.cern/news/news/accelerators/record-luminosity-well-done-lhc (visited on 05/22/2019) (cit. on p. 3).
- <sup>5</sup>ATLAS Collaboration, Technical Design Report for the ATLAS Inner Tracker Pixel Detector, tech. rep. CERN-LHCC-2017-021. ATLAS-TDR-030 (CERN, Geneva, Sept. 2017) (cit. on pp. 3, 6, 7, 13, 14, 19, 21, 24, 43–45, 47, 48, 119, 149).
- <sup>6</sup>P. Azzi et al., "Standard Model Physics at the HL-LHC and HE-LHC", arXiv pre-print 1902.04070 (2019) (cit. on p. 4).
- <sup>7</sup>ATLAS Collaboration, "The ATLAS experiment at the CERN large hadron collider", Journal of Instrumentation **3**, S08003–S08003 (2008) (cit. on pp. 4, 5).
- <sup>8</sup>ATLAS Collaboration, ATLAS: letter of intent for a general-purpose pp experiment at the large hadron collider at CERN, tech. rep. (CERN, 1992) (cit. on p. 4).
- <sup>9</sup>ATLAS Collaboration, "Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC", Physics Letters B **716**, 1–29 (2012) (cit. on pp. 4, 10).
- <sup>10</sup>CMS Collaboration, "Observation of a new boson at a mass of 125 GeV with the CMS experiment at the LHC", Physics Letters B **716**, 30–61 (2012) (cit. on pp. 4, 10).
- <sup>11</sup>B. Aubert et al., "Construction, assembly and tests of the ATLAS electromagnetic barrel calorimeter", Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment **558**, 388–418 (2006) (cit. on p. 5).
- <sup>12</sup>ATLAS IBL Collaboration, "Production and integration of the ATLAS Insertable B-Layer", Journal of Instrumentation **13**, T05008 (2018) (cit. on pp. 5, 13, 23).

- <sup>13</sup>M. Garcia-Sciveres et al., "The FE-I4 pixel readout integrated circuit", Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment **636**, 7th International "Hiroshima" Symposium on the Development and Application of Semiconductor Tracking Detectors, S155–S159 (2011) (cit. on pp. 5, 23, 48, 54, 70, 101).
- <sup>14</sup>ATLAS Collaboration, Technical Design Report for the ATLAS Inner Tracker Strip Detector, tech. rep. CERN-LHCC-2017-005. ATLAS-TDR-025 (CERN, Geneva, Apr. 2017) (cit. on pp. 6, 13, 15, 18, 21).
- <sup>15</sup>ATLAS Collaboration, Technical Design Report for the Phase-II Upgrade of the ATLAS Tile Calorimeter, tech. rep. CERN-LHCC-2017-019. ATLAS-TDR-028 (CERN, Geneva, Sept. 2017) (cit. on p. 6).
- <sup>16</sup>ATLAS Collaboration, Technical Design Report for the Phase-II Upgrade of the ATLAS LAr Calorimeter, tech. rep. CERN-LHCC-2017-018. ATLAS-TDR-027 (CERN, Geneva, Sept. 2017) (cit. on p. 6).
- <sup>17</sup>ATLAS Collaboration, Technical Design Report for the Phase-II Upgrade of the ATLAS Muon Spectrometer, tech. rep. CERN-LHCC-2017-017. ATLAS-TDR-026 (CERN, Geneva, Sept. 2017) (cit. on p. 6).
- <sup>18</sup>ATLAS Collaboration, Technical Design Report for the Phase-II Upgrade of the ATLAS TDAQ System, tech. rep. CERN-LHCC-2017-020. ATLAS-TDR-029 (CERN, Geneva, Sept. 2017) (cit. on p. 6).
- <sup>19</sup>W. Demtröder, *Experimentalphysik* 4, 5th ed., Springer Lehrbuch (Springer Spektrum, 2017) (cit. on pp. 8, 9).
- <sup>20</sup>R. A. Serway and J. W. Jewett, *Physics for scientists and engineers with modern physics*, 8th, International Edition (Brooks/Cole Publishing Co., Pacific Grove, CA, USA, 2011) (cit. on pp. 8, 9).
- <sup>21</sup>L. Püllen, "Development of a Detector Control System for the serially powered ATLAS pixel detector at the HL-LHC", PhD thesis (Bergische Universität Wuppertal, Feb. 2014) (cit. on pp. 8, 47, 52, 53, 91, 161).
- <sup>22</sup>Wikipedia, Standard Model of Elementary Particles, (Mar. 2019) https://commons.wikimedia.org/wiki/File:Standard\_Model\_of\_Elementary\_Particles.svg (visited on 05/22/2019) (cit. on p. 8).
- <sup>23</sup>Particle Data Group, "Review of Particle Physics", Phys. Rev. D 98, 030001 (2018) (cit. on p. 9).
- <sup>24</sup>LHCb Collaboration, "Observation of  $J/\psi p$  Resonances Consistent with Pentaquark States in  $\Lambda_b^0 \to J/\psi K^- p$  Decays", Phys. Rev. Lett. **115**, 072001 (2015) (cit. on p. 9).
- <sup>25</sup>C. Haber, "Introductory Lectures on Tracking Detectors", AIP Conference Proceedings 674, 36–75 (2003) (cit. on pp. 11, 13, 16).
- <sup>26</sup>H. Spieler, *Semiconductor Detector Systems* (Oxford Science Publications, 2005) (cit. on pp. 11, 15, 16, 23).

- <sup>27</sup>L. Tlustos, "Performance and limitations of high granularity single photon processing X-ray imaging detectors", Presented on 1 Apr 2005 (2005) (cit. on p. 13).
- <sup>28</sup>ALICE Collaboration, "Technical design report for the upgrade of the ALICE inner tracking system", English (US), Journal of Physics G: Nuclear and Particle Physics **41** (2014) (cit. on p. 14).
- <sup>29</sup>M. Garcia-Sciveres and N. Wermes, "A review of advances in pixel detectors for experiments with high rate and radiation", Reports on Progress in Physics 81, 066101 (2018) (cit. on pp. 14, 15).
- <sup>30</sup>C. J. Riegel, "Performance tests of depleted CMOS sensors for application at HL-LHC", PhD thesis (Bergische Universität Wuppertal, Apr. 2018) (cit. on p. 15).
- <sup>31</sup>G. Aad et al., "ATLAS pixel detector electronics and sensors", Journal of Instrumentation 3, P07007–P07007 (2008) (cit. on p. 17).
- <sup>32</sup>M. Bochenek, "Development of radiation resistant CMOS integrated circuits for the power distribution system in the upgraded ATLAS Semiconductor Tracker", PhD thesis (AGH-UST, Cracow, 2012-04) (cit. on p. 18).
- <sup>33</sup>F. Faccio et al., "FEAST2: A Radiation and Magnetic Field Tolerant Point-of-Load Buck DC/DC Converter", in (July 2014), pp. 1–7 (cit. on p. 18).
- <sup>34</sup>D. Ta et al., "Serial powering: Proof of principle demonstration of a scheme for the operation of a large pixel detector at the LHC", Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment **557**, 445–459 (2006) (cit. on p. 19).
- <sup>35</sup>M. A. Karagounis, "Analog integrated CMOS Circuits for the readout and powering of highly segmented detectors in particle physics applications", PhD thesis (FernUniversität in Hagen, Hagen, 2010) (cit. on pp. 19, 20, 92).
- <sup>36</sup>Wikipedia, Transistor count, https://en.wikipedia.org/wiki/Transistor\_count (visited on 01/03/2019) (cit. on p. 21).
- <sup>37</sup>R. J. Baker, CMOS Circuit Design, Layout, and Simulation, 3rd (Wiley-IEEE Press, 2010) (cit. on pp. 21, 74, 156, 159).
- <sup>38</sup>P. E. Dodd and L. W. Massengill, "Basic mechanisms and modeling of single-event upset in digital microelectronics", IEEE Transactions on Nuclear Science **50**, 583–602 (2003) (cit. on pp. 21, 22, 24, 25, 28).
- <sup>39</sup>T. May, "Soft Errors in VLSI: Present and Future", IEEE Transactions on Components, Hybrids, and Manufacturing Technology 2, 377–387 (1979) (cit. on pp. 21, 25).
- <sup>40</sup>R. Velazco, P. Fouillat, and R. Reis, *Radiation Effects on Embedded Systems* (Springer Netherlands, 2007) (cit. on p. 21).
- <sup>41</sup>G. Anelli et al., "Radiation tolerant VLSI circuits in standard deep submicron CMOS technologies for the LHC experiments: practical design aspects", IEEE Transactions on Nuclear Science 46, 1690–1696 (1999) (cit. on p. 21).

- <sup>42</sup>F. Faccio and G. Cervelli, "Radiation-Induced Edge Effects in Deep Submicron CMOS Transistors", IEEE Transactions on Nuclear Science **52**, 2413–2420 (2005) (cit. on pp. 22, 23, 29, 47, 103).
- <sup>43</sup>F. Faccio et al., "Radiation-Induced Short Channel (RISCE) and Narrow Channel (RINCE) Effects in 65 and 130 nm MOSFETs", IEEE Transactions on Nuclear Science 62, 2933–2940 (2015) (cit. on p. 22).
- <sup>44</sup>R. D. Schrimpf, "Radiation Effects in Microelectronics", in *Radiation effects on embedded systems*, edited by R. VELAZCO, P. FOUILLAT, and R. REIS (Springer Netherlands, Dordrecht, 2007), pp. 11–29 (cit. on p. 22).
- <sup>45</sup>K. Dette, "Commissioning of the ATLAS Insertable B-Layer and first operation experience", Presented 24 Mar 2017, PhD thesis (Feb. 2017) (cit. on p. 23).
- <sup>46</sup>F. Hartmann, Evolution of Silicon Sensor Technology in Particle Physics (Springer, Cham, 2017) (cit. on p. 23).
- <sup>47</sup>H. Kolanoski and N. Wermes, Evolution of Silicon Sensor Technology in Particle Physics (Springer Spektrum, Berlin, Heidelberg, 2016) (cit. on p. 23).
- <sup>48</sup>M. Huhtinen and F. Faccio, "Computational method to estimate single event upset rates in an accelerator environment", Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment **450**, 155 –172 (2000) (cit. on pp. 24, 107).
- <sup>49</sup>C. M. Hsieh and P. C. Murley and R. R. O'Brien, "Dynamics of Charge Collection from Alpha-Particle Tracks in Integrated Circuits", in 19th International Reliability Physics Symposium (Apr. 1981), pp. 38–42 (cit. on p. 24).
- <sup>50</sup>T. C. May and M. H. Woods, "Alpha-particle-induced soft errors in dynamic memories", IEEE Transactions on Electron Devices 26, 2–9 (1979) (cit. on p. 25).
- <sup>51</sup>E. Petersen, "Soft Errors Due to Protons in the Radiation Belt", IEEE Transactions on Nuclear Science 28, 3981–3986 (1981) (cit. on p. 25).
- <sup>52</sup>F. Wrobel et al., "Incidence of multi-particle events on soft error rates caused by n-Si nuclear reactions", IEEE Transactions on Nuclear Science **47**, 2580–2585 (2000) (cit. on p. 25).
- <sup>53</sup>L. W. Massengill and S. E. Diehl-Nagle, "Transient Radiation Upset Simulations of CMOS Memory Circuits", IEEE Transactions on Nuclear Science **31**, 1337–1343 (1984) (cit. on p. 26).
- <sup>54</sup>L. W. Massengill, S. E. Diehl, and J. S. Browning, "Dose-Rate Upset Patterns in a 16K CMOS SRAM", IEEE Transactions on Nuclear Science **33**, 1541–1545 (1986) (cit. on p. 26).
- <sup>55</sup>A. D. Tipton and others, "Multiple-Bit Upset in 130 nm CMOS Technology", IEEE Transactions on Nuclear Science **53**, 3259–3264 (2006) (cit. on p. 26).
- <sup>56</sup>O. A. Amusan et al., "Charge collection and charge sharing in a 130 nm cmos technology", IEEE Transactions on Nuclear Science **53**, 3253–3258 (2006) (cit. on pp. 26, 34).

- <sup>57</sup>Fairchild, Understanding Latch-Up in Advanced CMOS Logic, tech. rep. AN-600 (ON Semiconductor, Apr. 1999) (cit. on p. 27).
- <sup>58</sup>W. A. Kolasinski et al., "Simulation of cosmic-ray induced soft errors and latchup in integrated-circuit computer memories", IEEE Transactions on Nuclear Science 26, 5087–5091 (1979) (cit. on p. 26).
- <sup>59</sup>F. Márquez et al., "Automatic single event effects sensitivity analysis of a 13-bit successive approximation adc", IEEE Transactions on Nuclear Science **62**, 1609–1616 (2015) (cit. on pp. 28, 29, 111).
- <sup>60</sup>F. R. Palomo Pinto, Simulation of SEU/SET effects in the DICE latches by AFTU tool, (Feb. 2019) https://indico.cern.ch/event/769192/contributions/3305894/ (visited on 02/12/2019) (cit. on p. 29).
- <sup>61</sup>F. Faccio, "Design hardening methodologies for asics", in *Radiation effects on embedded systems*, edited by R. VELAZCO, P. FOUILLAT, and R. REIS (Springer Netherlands, Dordrecht, 2007), pp. 143–160 (cit. on pp. 29–31).
- <sup>62</sup>G. E. Moore, "Cramming more components onto integrated circuits", Electronics **38** (1965) (cit. on p. 29).
- <sup>63</sup>N. S. Saks, M. G. Ancona, and J. A. Modolo, "Radiation effects in mos capacitors with very thin oxides at 80k", IEEE Transactions on Nuclear Science **31**, 1249–1255 (1984) (cit. on p. 29).
- <sup>64</sup>N. S. Saks, M. G. Ancona, and J. A. Modolo, "Generation of interface states by ionizing radiation in very thin mos oxides", IEEE Transactions on Nuclear Science **33**, 1185– 1190 (1986) (cit. on p. 29).
- <sup>65</sup>W. Snoeys et al., "Layout techniques to enhance the radiation tolerance of standard CMOS technologies demonstrated on a pixel detector readout chip", Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 439, 349 –360 (2000) (cit. on p. 30).
- <sup>66</sup>F Faccio et al., "SEU effects in registers and in a dual-ported static RAM designed in a 0.25  $\mu$ m CMOS technology for applications in the LHC", 5 p (1999) (cit. on p. 31).
- <sup>67</sup>E. Bartz, "The token bit manager chip for the CMS pixel readout", (2003) (cit. on p. 31).
- <sup>68</sup>T. Calin, M. Nicolaidis, and R. Velazco, "Upset hardened memory design for submicron CMOS technology", IEEE Transactions on Nuclear Science **43**, 2874–2878 (1996) (cit. on p. 31).
- <sup>69</sup>D. Fougeron, "A study of SEU-tolerant latches for the RD53A chip", PoS TWEPP-17, 095 (2018) (cit. on pp. 31, 33, 34, 109).
- <sup>70</sup>M. Menouni et al., "SEU tolerant memory design for the ATLAS pixel readout chip", Journal of Instrumentation 8, C02026–C02026 (2013) (cit. on pp. 31, 33).
- <sup>71</sup>S. Kulis, "Single Event Effects mitigation with TMRG tool", Journal of Instrumentation **12**, C01082 (2017) (cit. on pp. 32, 64).

- <sup>72</sup>M. F. Caspar, "Untersuchungen zu Bitfehlern durch ionisierende Strahlung an einem DCS-ASIC für den ATLAS-Detektor", Thesis zur Erlangung des akademischen Grades Bachelor of Science (B.Sc.) im Studiengang Physik der Bergischen Universität Wuppertal (Bergische Universität Wuppertal, Feb. 2019) (cit. on pp. 33, 107).
- <sup>73</sup>CERN, IRRAD Beam parameters, https://ps-irrad.web.cern.ch/beam\_params.php#/beam\_param (visited on 03/12/2019) (cit. on p. 34).
- <sup>74</sup>D. G. Mavis, D. R. Alexander, and G. L. Dinger, "A chip-level modeling approach for rail span collapse and survivability analyses", IEEE Transactions on Nuclear Science **36**, 2239–2246 (1989) (cit. on p. 34).
- <sup>75</sup>A. I. Chumakov, "Modeling rail-span collapse in ICs exposed to a single radiation pulse", Russian Microelectronics **35**, 156–161 (2006) (cit. on p. 34).
- <sup>76</sup>J. C. Braatz and M. L. White, "ASIC design system for radiation environments", in [1991] Proceedings Fourth Annual IEEE International ASIC Conference and Exhibit (Sept. 1991), P4–4/1 (cit. on p. 34).
- <sup>77</sup>M. Johnson et al., *Latch-Up (white paper)*, tech. rep. SCAA124 (Texas Instruments, Apr. 2015) (cit. on p. 35).
- <sup>78</sup>SIMATIC WinCC Open Architecture, http://www.etm.at/ (visited on 10/25/2018) (cit. on p. 37).
- <sup>79</sup>ATLAS Collaboration, ATLAS DCS Public plots, https://twiki.cern.ch/twiki/bin/ view/AtlasPublic/ApprovedPlotsDCS (visited on 11/28/2018) (cit. on p. 38).
- <sup>80</sup>M. Ishino and A. Oh, *General Information for all ATLAS shifters*, tech. rep. (CERN, 2017) (cit. on p. 38).
- <sup>81</sup>S Blanchard et al., "Bake-Out Mobile Controls for Large Vacuum Systems", CERN-ACC-2014-0353 (2014) (cit. on p. 39).
- <sup>82</sup>CERN, A vacuum as empty as interstellar space, https://home.cern/science/ engineering/vacuum-empty-interstellar-space (visited on 04/07/2019) (cit. on p. 39).
- <sup>83</sup>SRF, Herzstück des Cern ausser Betrieb: Es war ein Marder, https://www.srf.ch/ news/panorama/herzstueck-des-cern-ausser-betrieb-es-war-ein-marder (visited on 04/07/2019) (cit. on p. 39).
- <sup>84</sup>C. Gaspar and B. Franek, "Tools for the automation of large distributed control systems", IEEE Transactions on Nuclear Science 53, 974–979 (2006) (cit. on p. 40).
- <sup>85</sup>T. Henss et al., "The hardware of the ATLAS Pixel Detector Control System", Journal of Instrumentation 2, P05006 (2007) (cit. on p. 41).
- <sup>86</sup>S. Kersten et al., "Control, Safety, and Diagnostics for Future ATLAS Pixel Detectors", Proceedings of ICALEPCS2013, San Francisco, CA, USA (2013) (cit. on p. 41).
- <sup>87</sup>N. Lehmann et al., "Development of a Detector Control System for the ATLAS Pixel detector in the HL-LHC", Journal of Instrumentation **11**, C11004 (2016) (cit. on pp. 41, 44).
- <sup>88</sup>S. Kersten, private communication, Apr. 2019 (cit. on p. 42).

<sup>89</sup>Bosch, CAN Specification 2.0 (Stuttgart, Germany, 1991) (cit. on p. 44).

- <sup>90</sup>J. Boek, "Entwicklung eines strahlenharten Controllers für das Kontrollsystem des ATLAS-Pixeldetektors am HL-LHC", PhD thesis (Bergische Universität Wuppertal, 2012) (cit. on pp. 44, 47, 106, 161).
- <sup>91</sup>B. I. Hallgren et al., "The Embedded Local Monitor Board (ELMB) in the LHC Front-end I/O Control System", (2001) (cit. on p. 44).
- <sup>92</sup>M. Garcia-Sciveres, *The RD53A Integrated Circuit*, tech. rep. CERN-RD53-PUB-17-001 (CERN, Geneva, Oct. 2017) (cit. on pp. 46, 48).
- <sup>93</sup>A. K. Becker, "Design and Test of a Control Chip for the Future ATLAS Pixel Detector at the sLHC", Diplomarbeit (Bergische Universität Wuppertal, 2010) (cit. on pp. 47, 52, 54, 55, 161).
- <sup>94</sup>M. F. Newcomer, Serial Power and Protection (SPP) ASIC for 1V to 2.5V Hybrid Operation, Talk at ATLAS-CMS Power Working Group at CERN, Mar. 2010 (cit. on p. 47).
- <sup>95</sup>V. Filimonov et al., "A serial powering pixel stave prototype for the ATLAS ITk upgrade", Journal of Instrumentation **12**, C03045–C03045 (2017) (cit. on p. 47).
- <sup>96</sup>B. Otto, private communication, Mar. 2019 (cit. on p. 51).
- <sup>97</sup>P. Phillips, private communication, Nov. 2018 (cit. on p. 51).
- <sup>98</sup>Wikipedia, Manchester Code, https://en.wikipedia.org/wiki/Manchester\_code (visited on 05/27/2019) (cit. on p. 53).
- <sup>99</sup>M. Werner, Information und Codierung: Grundlagen und Anwendungen, 2nd ed., Studium (Vieweg + Teubner, 2009) (cit. on p. 55).
- <sup>100</sup>T. Büth, "Logikentwurf, Layout und Verifikation einer I2C Einheit für einen ASIC", Diplomarbeit (Faculty of Information-, Media- and Electrical Technology, University of Applied Sciences Cologne, 2009) (cit. on p. 58).
- <sup>101</sup>Wikipedia, Mealy machine, https://en.wikipedia.org/wiki/Mealy\_machine (visited on 05/08/2019) (cit. on p. 58).
- <sup>102</sup>S. P. Andrew and J. F. Evered, "Digital clock detection", Patent EP 1 309 086 A1 (EU) (May 2003) (cit. on p. 69).
- <sup>103</sup>H. Banba et al., "A CMOS bandgap reference circuit with sub-1-V operation", IEEE Journal of Solid-State Circuits **34**, 670–674 (1999) (cit. on p. 87).
- <sup>104</sup>F. Anghinolfi, ABC130 Band Gap, private communication, June 2016 (cit. on p. 89).
- <sup>105</sup>A. J. Annema, "Low-power bandgap references featuring DTMOSTs", IEEE Journal of Solid-State Circuits **34**, 949–955 (1999) (cit. on p. 89).
- <sup>106</sup>N. Lehmann et al., "Prototype chip for a control system in a serial powered pixel detector at the ATLAS Phase II upgrade", Proceeding of Science Pos(TWEPP-17)026 (2017) (cit. on pp. 94, 95, 103, 113).

- <sup>107</sup>P. Bergmann, "Untersuchungen an Komponenten für einen Kontrollchip eines zukünftigen ATLAS Pixeldetektors", Thesis zur Erlangung des akademischen Grades Bachelor of Science (B.Sc.) im Studiengang Physik der Bergischen Universität Wuppertal (Bergische Universität Wuppertal, Sept. 2017) (cit. on pp. 94, 103, 113).
- <sup>108</sup>R. Ahamd, "Analog and Digital CMOS Circuit Design for the Control System of ATLAS Pixel Detector", Master Degree Thesis for University of Applied Sciences and Arts Dortmund (Fachhochschule Dortmund, Mar. 2018) (cit. on p. 96).
- <sup>109</sup>Y. Narbutt, "Bestrahlungstests eines Prototyp-Kontrollchips für den zukünftigen ATLAS-ITk-Pixeldetektor", Thesis zur Erlangung des akademischen Grades Bachelor of Science (B.Sc.) im Studiengang Physik der Bergischen Universität Wuppertal (Bergische Universität Wuppertal, Sept. 2017) (cit. on pp. 99, 103).
- <sup>110</sup>J. Schick, "Automated test setup for a prototype control chip, of the ITk Pixeldetector in the ATLAS Experiment at CERN", Thesis to earn the academic degree Bachelor of Science (B.Sc.) in the major Physics at the Bergischen Universität Wuppertal (Bergische Universität Wuppertal, Sept. 2017) (cit. on pp. 99, 101).
- <sup>111</sup>Digilent, ARTY Reference, (2017) https://reference.digilentinc.com/reference/programmable-logic/arty/start (visited on 05/09/2017) (cit. on p. 99).
- <sup>112</sup>S. Kuehn, "Results of prototyping for the phase-II upgrade of the pixel detector of the ATLAS experiment", Journal of Instrumentation **14**, C04010–C04010 (2019) (cit. on p. 100).
- <sup>113</sup>N. Lehmann, "Control and Monitoring for a serially powered pixel demonstrator for the ATLAS Phase II upgrade", Proceeding of Science **Pos(TWEPP2018)133** (2018) (cit. on p. 101).
- <sup>114</sup>D. Alvarez, private communication, Feb. 2019 (cit. on p. 101).
- <sup>115</sup>J. W. Krauss, "Single Event Upset Untersuchungen an einem Kontrollchip für den zukünftigen ATLAS-Pixeldetektor", Thesis zur Erlangung des akademischen Grades Bachelor of Science (B.Sc.) im Studiengang Physik der Bergischen Universität Wuppertal (Bergische Universität Wuppertal, Jan. 2018) (cit. on p. 106).
- <sup>116</sup>J. Han and G. Guo, "Characteristics of energy deposition from 1-1000 mev proton and neutron induced nuclear reactions in silicon", AIP Advances 7, 115220 (2017) (cit. on p. 111).
- <sup>117</sup>M. F. Newcomer, Instrumentation Specialist and ASIC designer for the ITk Strip detector, private communication, Apr. 2019 (cit. on p. 123).
- <sup>118</sup>WaferWorld, What you need to know about silicon wafers, (2017) https://www.waferworld.com/silicon-wafer-about/ (visited on 10/31/2018) (cit. on p. 159).
- <sup>119</sup>K. Becker et al., "Radiationhard components for the control system of a future ATLAS pixel detector", Journal of Instrumentation **6**, C01017–C01017 (2011) (cit. on p. 161).
- <sup>120</sup>L. Püllen et al., "Studies for the detector control system of the ATLAS pixel at the HL-LHC", Journal of Instrumentation 7, C02053 (2012) (cit. on p. 161).
- <sup>121</sup>L. Püllen et al., "Prototypes for components of a control system for the ATLAS pixel detector at the HL-LHC", Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment **731**, PIXEL 2012, 114 –119 (2013) (cit. on p. 161).
- <sup>122</sup>S. Kersten, L. Püllen, and C. Zeitnitz, "Ongoing studies for the control system of a serially powered ATLAS pixel detector at the HL-LHC", Journal of Instrumentation 11, C02070–C02070 (2016) (cit. on p. 161).

# Acronyms

- **ADC** analog to digital converter
- **ALICE** A Large Ion Collider Experiment
- **ASIC** application specific integrated circuit
- **ATLAS** A Toroidal LHC ApparatuS
- **BG** bandgap
- **CAN** controller area network
- **CERN** European Organization for Nuclear Research
- **CMOS** complementary metal oxide semiconductor
- **CMS** Compact Muon Solenoid
- **CU** control unit
- **DAC** digital to analog converter
- **DCS** detector control system
- **DFF** data flip flop
- **DICE** dual interlocked storage cell
- **DMAPS** depleted monolithic active pixel sensors
- **DRAM** dynamic random access memory
- **DRC** design rule check
- **DTMOS** dynamic threshold MOS
- **EDA** electronic design automation
- **ELT** enclosed layout transistor
- **ESD** electrostatic discharge
- **FE** front-end
- **FPGA** field programmable gate array
- **FSM** finite state machine
- **HC** Hamming code
- **HDL** hardware description language
- **HL-LHC** high luminosity LHC
- **HV** high voltage
- **I²C** inter-integrated circuit
- **IBL** insertable B-layer
- **ID** inner detector
- **IP** interaction point
- **ITk** inner tracker
- **LBNL** Lawrence Berkeley National Laboratory
- **LDO** low drop-out
- **LET** linear energy transfer
- **LHC** Large Hadron Collider
- **LHCb** Large Hadron Collider Beauty
- **LV** low voltage
- **LVS** layout vs. schematic
- **MAPS** monolithic active pixel sensors
- **MBU** multi-bit upset
- **MOSFET** metal oxide semiconductor field effect transistor
- **NTC** negative temperature coefficient
- **OT** over-temperature, module temperature goes above threshold
- **OV** over-voltage, module voltage goes above threshold
- **PARC** PSPP add-on regulator and comparator
- **PATT** PSPP asynchronous TMR test
- **PCB** printed circuit board
- **PnR** place and route
- **PP** patch panel
- **PSI** Paul Scherrer Institut
- **PSPP** pixel serial powering & protection
- **RINCE** radiation-induced narrow channel effect
- **RTL** register transfer level
- **SCADA** supervisory control and data acquisition
- **SCB** serial control bus
- **SCT** semiconductor tracker
- **SEE** single event effect
- **SEL** single event latch-up
- **SET** single event transient
- **SEU** single event upset
- **ShuLDO** shunt low drop-out
- **SP** serial power
- **SPICE** Simulation Program with Integrated Circuit Emphasis
- **SPP** serial powering & protection
- **SRAM** static random access memory
- **STI** shallow trench isolation
- **TID** total ionizing dose
- **TMR** triple modular redundancy
- **TRT** transition radiation tracker

# List of Figures

| Figure | Caption |
|--------|---------|
| 1.1 | Overview of the ATLAS detector |
| 1.2 | Layout of the ITk detector |
| 1.3 | Services for the ITk Pixel detector |
| 1.4 | Elementary particles of the Standard Model |
| 2.1 | Graphical representation of a PN-junction |
| 2.2 | Prototype ATLAS strip module |
| 2.3 | Silicon pixel sensor with bump-bonded FE chip |
| 2.4 | Principle of depleted monolithic pixel sensors |
| 2.5 | Analog front end electronics |
| 2.6 | Powering principles |
| 2.7 | Schematic of serial power for ITk Pixel |
| 2.8 | Schematic of a shunt regulator |
| 2.9 | Schematic of a ShuLDO regulator |
| 3.1 | Radiation-induced changes in NMOS transistors |
| 3.2 | Transistor cross section with sensitive volume |
| 3.3 | SEU by direct ionization |
| 3.4 | SEU by indirect ionization |
| 3.5 | Parasitics in CMOS cross section |
| 3.6 | SEU in SRAM cell |
| 3.7 | ELT transistor layout |
| 3.8 | SEU protected latch |
| 3.9 | DICE memory cell |
| 3.10 | Schematic representation of TMR |
| 4.1 | Front panel of the ATLAS DCS |
| 4.2 | LHC machine cycle |
| 4.3 | DCS Finite State Machine |
| 4.4 | Overview of the ITk Pixel DCS |
| 4.5 | Safety path overview |
| 4.6 | Serial power chain overview |
| 4.7 | DCS controller block diagram |
| 5.1 | Block diagram of the PSPPv3 chip |
| 5.2 | Fabricated PSPPv3 chip |
| 5.3 | Fabricated PARC chip |
| 5.4 | PSPPv4 block diagram |
| 5.5 | X-ray image of a soldered PSPPv4 |
| 5.6 | Picture of the PATT chip |
| 5.7 | Keeper pad schematic |
| 5.8 | SCB timing information |
| 5.9 | Read access to the PSPPv3 |
| 5.10 | Block diagram of the SCB slave top-level |
| 5.11 | FSM of the protocol unit |
| 5.12 | Manchester decoding logic |
| 5.13 | Receive and transmit FSM |
| 5.14 | Data output shift register |
| 5.15 | Test serial power chain with 16 PSPPv3 |
| 5.16 | Measured SCB communication |
| 5.17 | Schematic of synthesized over-voltage flag |
| 5.18 | Asynchronous TMR concept |
| 5.19 | Clock detection circuit simulation |
| 5.20 | Clock detection circuit measurement |
| 5.21 | Test results for the PSPPv3 ADC |
| 5.22 | Test results for the PSPPv4 ADC |
| 5.23 | Schematic of the comparator usage in the PSPPv3 |
| 5.24 | Schematic of the Baker comparator |
| 5.25 | Simulated comparator switching characteristic |
| 5.26 | Schematic of the comparator for the PSPPv4 |
| 5.27 | Comparator test |
| 5.28 | Schematic of the bypass transistor |
| 5.29 | Simulated PSPPv3 bypass voltages during switching |
| 5.30 | Bypass layout for the PSPPv3 |
| 5.31 | Updated bypass and driver for the PSPPv4 |
| 5.32 | Simulated PSPPv4 bypass voltages during switching |
| 5.33 | Schematic of the chain for bypass tests |
| 5.34 | Bypass on-resistance as a function of current |
| 5.35 | Bypass off-resistance with powered PSPP |
| 5.36 | Bypass off-resistance with un-powered PSPP |
| 5.37 | PSPPv3 bypass switching |
| 5.38 | PSPPv4 bypass switching |
| 5.39 | Schematic for diode-BG |
| 5.40 | Schematic for transistor-BG |
| 5.41 | Transistor-BG trimming simulation |
| 5.42 | Transistor-BG output as a function of temperature |
| 5.43 | Trimming of the transistor-BG |
| 5.44 | Shunt regulator schematic developed for the PSPP |
| 5.45 | PSPP linear regulator schematic |
| 5.46 | Regulator input characteristic |
| 5.47 | Regulator load test |
| 5.48 | PARC voltages under irradiation |
| 5.49 | Regulator schematic for the PSPPv4 |
| 5.50 | PSPPv4 supply voltages |
| 5.51 | Power-on reset schematic |
| 5.52 | Power-on reset switching |
| 5.53 | Power-on reset measurement |
| 5.54 | Schematic of the SEU test shift register |
| 6.1 | Initial test setup |
| 6.2 | Carrier support for test setup |
| 6.3 | Electrical system test |
| 6.4 | Over-voltage protection test |
| 6.5 | PSPPv4 voltages under irradiation |
| 6.6 | PATT voltages under irradiation |
| 6.7 | PSPPv4 bypass under irradiation |
| 6.8 | SEU cross-section as a function of LET |
| 6.9 | SEU cross-section as a function of beam energy |
| 6.10 | SEU distribution in PATT shift register |
| 6.11 | Possible SET on over-temperature and over-voltage flag |
| 6.12 | Schematic for SEU simulation with the comparator |
| 6.13 | Simulation of an SET in the comparator |
| 6.14 | PSPPv3 bypass long-term test |
| 6.15 | Climate chamber temperature profile |
| 6.16 | Temperature as a function of time for PSPPv4 |
| 6.17 | Bypass on-resistance as a function of time for PSPPv4 |
| 6.18 | $V_{BG}$ as a function of chip temperature |
| 7.1 | Fraction of dead detector as a function of regulator failure probability |
| A.1 | Current mirror |
| A.2 | Differential pair |
| A.3 | Cascode structure |
| A.4 | Schematic of a CMOS AND |
| A.5 | Schematic of an SRAM cell |
| A.6 | Schematic of a data flip flop |
| A.7 | Digital design flow |
| A.8 | Example of a digital design |
| A.9 | Analog design flow |
| A.10 | Example schematic for an analog design |
| A.11 | Process variation in a band gap |
| A.12 | Layout of a differential pair |
| A.13 | Silicon wafer |
| A.14 | |

# List of Tables

| 2.1          | Comparision of different powering schemes                                                                                                                                                |
|--------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| $4.1 \\ 4.2$ | ATLAS run cycles    39      Operation cycles of the LHC    39                                                                                                                            |
| 5.1          | Radiation tolerance requirements for the PSPP                                                                                                                                            |
| 5.2          | SCB Write command                                                                                                                                                                        |
| 5.3          | SCB read command                                                                                                                                                                         |
| 5.4          | Internal registers of the PSPPv3 chip                                                                                                                                                    |
| 5.5          | Updated status register of the PSPPv4 chip                                                                                                                                               |
| 5.6          | Updated registers for the PATT chip                                                                                                                                                      |
| 5.7          | PSPP ADC channels                                                                                                                                                                        |
| 5.8          | Simulated bypass resistance for the PSPPv4                                                                                                                                               |
| 5.9          | PSPPv4 bypass switching times depending on driver                                                                                                                                        |
| 6.1          | Irradiation campaigns                                                                                                                                                                    |
| 6.2          | SEUs observed at PSI                                                                                                                                                                     |
| 6.3          | Temperature increase of PSPPv4                                                                                                                                                           |
| 6.4          | Bypass on-resistance of PSPPv4                                                                                                                                                           |
| 6.5          | $V_{BG}$ and $V_{int}$ for different temperatures |
| 7.1          | Summary of failure modes and effects analysis for the PSPP                                                                                                                               |
| B.1          | DCS test chips developed at the University of Wuppertal                                                                                                                                  |

## Appendix A

## Introduction to ASIC design

Application specific integrated circuit (ASIC) design describes the process of developing an integrated circuit. Usually, a distinction is made between analog and digital designs. Mixed design is used for a combination of both methods. Most fabricated circuits include both elements, but individual components in a chip are either digital circuits performing logic functions or analog circuits like regulators or amplifiers. Exceptions are circuits like analog to digital converters (ADCs), digital to analog converters (DACs) or comparators that are interfacing both domains. Basic analog design elements and simple digital circuits are briefly explained in section A.1.

The design methods are different for digital and analog parts. Both are described briefly in sections A.2 and A.3 respectively. There are two different approaches to integrate elements in a chip: analog-top or digital-top. Analog-top means that the last design step is performed by the designer who draws connections manually. The pixel serial powering & protection (PSPP) is a chip created with analog-top as described in this thesis. For digital-top, all blocks including analog blocks are abstracted and the electronic design automation (EDA) tools perform the integration and creation of connections. The new front-end (FE) chip for the inner tracker (ITk) Pixel Detector is done in this way by implementing "analog islands" in a digital sea [5]. The designer has to configure the tools to place analog blocks at desired positions and verify that all design requirements are met.

#### A.1 Short introduction to CMOS circuits

An understanding of the inner structure of a complementary metal oxide semiconductor (CMOS) circuit helps to understand its vulnerabilities and how to protect it against radiation damage. I therefore give a short introduction to the basic elements here, as a basis for understanding the protection methods.

#### A.1.1 Analog base building blocks

The metal oxide semiconductor field effect transistor (MOSFET) is the basic device for integrated circuits. There are two types of transistors<sup>1</sup> in CMOS technology: N-channel (NMOS) and P-channel (PMOS) devices. They operate in the same way, but with inverted signs of the voltages and currents. A MOSFET has four terminals: drain, source, gate and bulk. The bulk is the well contact of the device, which is the n-doped well for the PMOS or the p-doped substrate for the NMOS. Drain and source are two symmetric contacts. They are doped opposite to the well. The gate contact is isolated from the rest of the structure by a thin oxide layer. When a voltage is applied between gate and source, a channel is formed between source and drain, which allows charge to flow.

<sup>1</sup>Transistors used in CMOS are field effect transistors, also known as MOSFETs. These transistors have an isolated gate contact, which controls the channel by an electric field. Other transistor types, like bipolar transistors, operate differently. I use the term transistor for MOSFET devices.

Figure A.1: Current mirror.

Figure A.2: Differential pair.

Figure A.3: Cascode structure.

#### **Basic circuits**

Three basic circuits are mainly used to create analog devices.

The current mirror shown in Figure A.1 allows "copying" currents. The two transistors have the same gate-source voltage and therefore the same operating point. When the two transistors have the same size (W/L), the currents  $I_1$  and  $I_2$  are also equal. By changing the ratio, a current multiplication can be achieved.
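The current multiplication can be written down as a first-order relation. This is a sketch that assumes both transistors operate in saturation and that channel-length modulation is neglected; the indices 1 and 2 on $(W/L)$ are introduced here for the input and output transistor:

$$
I_2 = I_1 \cdot \frac{(W/L)_2}{(W/L)_1}
$$

For example, an output transistor with four times the $W/L$ of the input transistor delivers $I_2 = 4\,I_1$ under these assumptions.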

The differential pair is used for amplifying a differential signal. The circuit is shown in Figure A.2. Two equally sized transistors are connected at the source and are biased by a common current $I_0$. When the two input voltages $V_1$ and $V_2$ are equal, the current $I_0$ is split evenly between $I_1$ and $I_2$. If $V_1$ becomes larger than $V_2$, i.e. $V_{diff}$ is positive, transistor M1 conducts better than M2. The output currents reflect this, as $I_1$ becomes larger than $I_2$. Differential pairs are used as input stages for amplifiers or comparators. By adding current mirrors as load to the pair, the amplification factor can be increased.
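For small differential inputs, this behavior can be summarized in two relations. The transconductance $g_m$ of each transistor at the operating point $I_0/2$ is a symbol introduced here for illustration, and the linear approximation only holds while $|V_{diff}|$ is small:

$$
I_1 + I_2 = I_0, \qquad I_1 - I_2 \approx g_m \, V_{diff}
$$

For $V_{diff} = 0$ this reduces to $I_1 = I_2 = I_0/2$, the even split described above.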

For the current mirror and differential pair, it is important to properly match the transistors in the layout. This is done by selecting transistors with the same sizes and same orientation. See also section A.3.

The cascoded structure puts two transistors in series as shown for PMOS devices in Figure A.3. Cascoding applied to a current mirror increases the output resistance of the mirror. On the other hand, the dynamic range is reduced and the output voltage has to be larger. This can also be used to increase the voltage tolerance of a circuit.

#### A.1.2 Logic circuits

Logic circuits use transistors as switches, which are either on or off. This simplification is sufficient for understanding the function of a circuit. For deeper insight, however, the analog behavior of the transistors has to be taken into account.

#### **Combinational logic gates**

All CMOS circuits have both PMOS and NMOS devices. The most basic element in a CMOS circuit is an inverter consisting of two transistors. A slightly more complex gate is shown in Figure A.4, an AND gate. An inverter is also seen between node X and Z, consisting of transistors M5 & M6. All basic logic gates are built similarly.
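The switch view of such a gate can be sketched in a few lines of Python. The sketch assumes the usual construction of a CMOS AND gate as a NAND stage followed by the inverter; the function names are hypothetical and the individual transistors of Figure A.4 are not modeled one by one.

```python
def nmos(gate: int) -> bool:
    """Ideal NMOS switch: conducts when the gate is at logic '1'."""
    return gate == 1


def pmos(gate: int) -> bool:
    """Ideal PMOS switch: conducts when the gate is at logic '0'."""
    return gate == 0


def cmos_stage(pull_up: bool, pull_down: bool) -> int:
    """Resolve an output node from its pull-up and pull-down networks."""
    if pull_up == pull_down:
        # Both on would short V_DD to V_SS, both off would float the node.
        raise ValueError("output shorted or floating")
    return 1 if pull_up else 0


def nand_gate(a: int, b: int) -> int:
    pull_up = pmos(a) or pmos(b)      # two PMOS in parallel to V_DD
    pull_down = nmos(a) and nmos(b)   # two NMOS in series to V_SS
    return cmos_stage(pull_up, pull_down)


def inverter(x: int) -> int:
    return cmos_stage(pmos(x), nmos(x))


def and_gate(a: int, b: int) -> int:
    # Assumed structure: NAND stage followed by an inverter.
    return inverter(nand_gate(a, b))


truth_table = {(a, b): and_gate(a, b) for a in (0, 1) for b in (0, 1)}
```

In the static case exactly one of the two networks conducts, which is why `cmos_stage` treats any other combination as an error.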

In logic blocks, the transistors are either open or closed, so normally there is no current path from the supply ($V_{DD}$) to ground ($V_{SS}$). Only during a transition from '1' to '0' or vice versa is a path briefly opened. Consider, for example, the inverter M5 & M6 in Figure A.4: during the transition, neither of the transistors is fully blocking and a current can flow from $V_{DD}$ to $V_{SS}$. Furthermore, a current is required to charge the gate capacitance of the next gate.

#### Memory cells

The basic memory cells used in digital designs are dynamic random access memory (DRAM), static random access memory (SRAM) and data flip flop (DFF). DRAM cells can be made very small as the information is stored on a capacitor and needs only a single transistor per bit. As the charge can be lost over time, they need periodic refreshing of the memory content.

An SRAM cell is built with six transistors as shown in Figure A.5. This cell does not need refreshing, as it holds its content in a bistable latch formed by the two inverters (M1-M4). With M5 and M6, the memory can be set or read. SRAM cells are arranged in arrays to create larger memory blocks.

Figure A.4: Schematic of a CMOS AND gate.

Figure A.5: Schematic of a single SRAM cell.

Figure A.6: Schematic of a simple DFF. Variants with asynchronous set or reset are also possible.

A DFF is more complex than the random access memory cells. Normally it is built on a master-slave basis. A possible circuit is shown in Figure A.6, where the latches are also realized with a bistable circuit. Latch 1 copies the value from input D when the clock signal is '0'. During this time the pass device is blocking and latch 2 stores its value. When the clock is at '1', latch 1 stores the value while latch 2 copies it through the now open pass device. Therefore input D is memorized at the rising edge. An additional output buffer provides the stored bit to further logic. DFFs are mainly used in sequential logic and not in memory blocks, due to their complexity and size.
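The master-slave behavior can be illustrated with a small behavioral model. This is only a sketch in Python: the class and method names are hypothetical, and the output buffer and all timing effects are left out.

```python
class MasterSlaveDFF:
    """Behavioral model of a positive-edge-triggered master-slave DFF."""

    def __init__(self) -> None:
        self.latch1 = 0  # master latch
        self.latch2 = 0  # slave latch

    def apply(self, clk: int, d: int) -> int:
        """Apply one (clk, d) input combination and return the stored bit."""
        if clk == 0:
            # Latch 1 is transparent and follows D. The pass device
            # blocks, so latch 2 keeps its stored value.
            self.latch1 = d
        else:
            # Latch 1 holds its value. The pass device conducts and
            # latch 2 copies the value held by latch 1.
            self.latch2 = self.latch1
        return self.latch2


dff = MasterSlaveDFF()
stimulus = [(0, 1), (1, 1), (1, 0), (0, 0), (1, 0)]
trace = [dff.apply(clk, d) for clk, d in stimulus]
```

In the third stimulus step, D changes while the clock is still '1'. The output does not follow, because latch 1 no longer tracks the input: only the value present at the rising edge is stored.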

#### A.2 Digital design flow

Digital designs are performed with automated tools. The desired function is described in hardware description language (HDL) code, which is then translated into a digital circuit. The most common HDLs are Verilog and VHDL, which are also used to write firmware for field programmable gate arrays (FPGAs). Logic intended for an ASIC can therefore be tested first in an FPGA to check whether it works in hardware.

A Verilog code example is given in Code A.1. Often a register transfer level (RTL) coding style is used, i.e. the purpose of a circuit is described on register level with a functional description of the logic.

Code A.1: Example Verilog code, used to describe the bypass comparator flip flop.

```
// Bypass comparator flip flop with asynchronous set
always @(posedge vmod_comp
         or posedge clk_in
         or negedge RSTn)
begin
  if (RSTn == 1'b0) // reset
    overvoltIn <= 1'b0;
  else if (vmod_comp == 1'b1) // asynchronous set
    overvoltIn <= 1'b1;
  else
  // synchronous reset through logic
  begin
    if (overvoltage_rst == 1'b1)
      overvoltIn <= 1'b0;
    else
      overvoltIn <= overvoltInVoted;
  end
end
```

The digital design flow is shown graphically in Figure A.7. It contains multiple steps using different EDA tools.

- 1. RTL code is written in parallel with a test bench. The test bench is used to verify in simulation if the written logic is working as intended.
- 2. Once the code performs as desired, it is passed to the synthesizer. This tool also requires input about the timing of the logic, such as the clock frequency. Timing information is given in a constraints file. The synthesizer uses a predefined library of standard cells, such as simple logic gates (AND, OR, NOR, etc.) as well as sequential elements like flip flops. Each cell contains information about its function and timing properties. The synthesizer translates the functional description from HDL code into a netlist that connects logic gates and respects the given constraints if possible. It generates errors if the logic is too complex for the given constraints; the logic or the timing constraints have to be adjusted in that case.
- 3. The generated netlist can then be simulated again with the original test bench to verify that it still works as intended. This is the post-synthesis simulation step in Figure A.7. There are also EDA tools that can perform a logic equivalence check to compare the original RTL code with generated netlists and verify that they have the same logical function.
- 4. Once the design has been successfully synthesized, it is passed to the place and route (PnR) tool. In addition to timing constraints, information about the physical size of the design is required. This also includes input about power lines and, in case of a digital-top design, about analog blocks. The PnR tool places the cells defined after synthesis. After this step, better information about delays between different cells is available. The netlist is further adjusted to respect timing. Additionally, a clock tree synthesis is performed, which analyzes the delays of clock signals. The clock path is optimized and buffers are inserted to ensure that no hold or setup times are violated. Filler cells are added to fill up the remaining space once all cells are placed. Finally, design rule checks are performed to verify that none of the rules for production are violated.
- 5. The finalized netlist can again be simulated against the test bench and compared to the synthesized netlist with a logic equivalence checker. The generated netlist and layout are exported and can either be used in an analog-top design or for fabrication. An example of a finished digital block is shown in Figure A.8.
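The setup and hold checks that the synthesizer and the PnR tool perform on each register-to-register path can be sketched as follows. This is a simplified illustration in Python, not the algorithm of any particular EDA tool: it assumes zero clock skew, and the `Path` fields and all delay values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One register-to-register timing path (all times in ns)."""
    clk_to_q: float   # clock edge to output of the launching flip flop
    logic_min: float  # shortest combinational delay on the path
    logic_max: float  # longest combinational delay on the path
    setup: float      # setup time of the capturing flip flop
    hold: float       # hold time of the capturing flip flop


def setup_slack(path: Path, clock_period: float) -> float:
    """Positive if data arrives early enough before the next clock edge."""
    return clock_period - (path.clk_to_q + path.logic_max + path.setup)


def hold_slack(path: Path) -> float:
    """Positive if data stays stable long enough after the same clock edge."""
    return path.clk_to_q + path.logic_min - path.hold


# Hypothetical example path, checked against a 10 ns clock period.
path = Path(clk_to_q=0.3, logic_min=0.2, logic_max=8.0, setup=0.2, hold=0.1)
timing_met = setup_slack(path, 10.0) >= 0.0 and hold_slack(path) >= 0.0
```

A negative setup slack means that the logic is too slow for the requested clock period, which corresponds to the case in step 2 where the logic or the constraints have to be adjusted. A negative hold slack is typically fixed by inserting buffers, as mentioned in step 4.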

#### A.3 Analog design flow

The design of analog elements is not yet as automated as for digital circuits. EDA tools nevertheless support the design process. An overview of the analog or full-custom design flow is given in Figure A.9. Design kits contain information from the foundry like device models for simulation, physical properties for layout and the design rules.

Figure A.7: Schematic representation of digital design flow.

Figure A.8: Logic block of the PSPPv3 as an example of a generated digital design. The power rails go around the logic and there are vertical rails to provide power for logic cells.

Figure A.9: Schematic representation of analog design flow.

Each analog design starts with a schematic as shown in Figure A.10. The schematic defines connections between transistors and other elements, like resistors or capacitors. An analog circuit is normally split into small parts to simplify the design and to reuse circuits like amplifiers.

Simulations are an essential part of analog design. Once the function of a circuit is defined, transistors and other elements have to be properly sized, i.e. their physical dimensions have to be defined. This is required to guarantee that the circuit operates as intended in all expected conditions.

Figure A.10: Schematic of a comparator circuit as an example for an analog design.

Characteristics of elements depend on temperature, process variations and also radiation (see sections 3.1 and 3.2). These variations are different for each technology and therefore it is not straightforward to transfer a design from one technology to another. Process variations are measured by the foundry and implemented in design models. They are grouped in different corners to represent extreme cases. A corner simulation is performed to verify that a circuit works across the desired temperature range within all design corners. Figure A.11 shows an example of such a simulation. It can be seen that the output voltage varies depending on the corner. Furthermore, a Monte Carlo simulation is run to analyze a randomized set of process variations. This gives further insight into the expected yield and how design parameters can change the operating point of a circuit.
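The difference between a corner simulation and a Monte Carlo simulation can be illustrated with a toy model. The sketch below is written in Python and does not use real device models: the function `v_out`, its coefficients, the corner names, the temperature points and the acceptance window `SPEC` are all hypothetical stand-ins for what a circuit simulator and the foundry models would provide.

```python
import itertools
import random
import statistics

V_NOM = 0.600          # hypothetical nominal output voltage in V
SPEC = (0.570, 0.630)  # hypothetical acceptance window in V


def v_out(process: float, temp_c: float) -> float:
    """Toy model: linear shift with process, parabolic drift with temperature."""
    return V_NOM * (1.0 + 0.02 * process) - 2e-6 * (temp_c - 25.0) ** 2


# Corner simulation: every process corner at every temperature point.
corners = {"slow": -1.0, "typical": 0.0, "fast": 1.0}
temperatures = (0.0, 25.0, 60.0)
corner_results = {
    (name, t): v_out(p, t)
    for (name, p), t in itertools.product(corners.items(), temperatures)
}
corners_ok = all(SPEC[0] <= v <= SPEC[1] for v in corner_results.values())

# Monte Carlo: randomized process parameter, fixed seed for repeatability.
rng = random.Random(1)
samples = [v_out(rng.gauss(0.0, 1.0 / 3.0), 25.0) for _ in range(1000)]
yield_estimate = sum(SPEC[0] <= v <= SPEC[1] for v in samples) / len(samples)
spread = statistics.stdev(samples)
```

The corner sweep only probes the extreme cases, while the Monte Carlo run gives a distribution from which a yield estimate and a spread of the output voltage can be read.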

Once the circuit performs as desired in simulation, it has to be drawn as a layout. Analog layouts are drawn manually to have full control. This is important to guarantee that transistors are matching. All transistors in a current mirror or in a differential pair should have the same orientation and the same surroundings. This ensures that the transistors experience the same process variations during fabrication. Better matching improves the performance of the circuit and can increase the yield. Common matching techniques are interleaving transistors for a symmetric arrangement and adding dummy transistors for common surroundings. Figure A.12 shows an example layout of a differential pair where both techniques were applied. More details on transistor matching are given for example in [37].

A layout is based on layers, which are used to create masks for production. See section A.4 for more details on how an ASIC is fabricated. The layout has to follow design rules, which are checked by a dedicated tool. This step is usually referred to as design rule check (DRC). Another software tool is used to verify that layout and schematic represent the same circuit. The layout vs. schematic (LVS) tool extracts the designed devices and interconnections from the layout. The extracted circuit is compared to its schematic and the tool verifies that both are equivalent, including size and parameters of devices.

Parasitic elements appear in the circuit due to the tracks. For example, each metal line is a small resistor and two parallel lines form a parasitic capacitor. The extraction of parasitics is done after finishing the layout of a circuit. Analog simulations are repeated, taking parasitic information into account, to ensure that a design is still fulfilling all requirements.
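A first-order estimate of these two parasitics can be sketched as follows. The sketch uses the sheet-resistance formula for the track and a parallel-plate approximation for the capacitance between the facing side walls of two tracks, which ignores fringing fields. All numerical values (sheet resistance, dimensions, relative permittivity) are hypothetical and not taken from a design kit; an extraction tool uses far more detailed models.

```python
EPS_0 = 8.854e-12  # vacuum permittivity in F/m


def wire_resistance(sheet_res_ohm_sq: float, length_um: float,
                    width_um: float) -> float:
    """Resistance of a metal track: sheet resistance times number of squares."""
    return sheet_res_ohm_sq * length_um / width_um


def coupling_capacitance(eps_r: float, length_um: float,
                         thickness_um: float, spacing_um: float) -> float:
    """Parallel-plate estimate between the side walls of two tracks, in F."""
    area_m2 = (length_um * 1e-6) * (thickness_um * 1e-6)
    return EPS_0 * eps_r * area_m2 / (spacing_um * 1e-6)


# Hypothetical 500 um long track, 0.5 um wide, next to a parallel neighbor.
r_track = wire_resistance(0.07, 500.0, 0.5)
c_couple = coupling_capacitance(3.9, 500.0, 0.3, 0.2)
tau = r_track * c_couple  # RC time constant added by the parasitics
```

Even such a rough estimate shows why long or narrow tracks matter: the resistance scales with length over width, and the coupling grows with length and shrinks with spacing.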

Once all elements are designed, they can be integrated into the top-level design. For digital-top, it is necessary to create a file containing all information for the PnR tool. An analog block cannot be connected correctly without this information.

Figure A.11: Simulation of a bandgap (BG) with different corners.

For the analog-top approach, it is good practice to start with IO pads. Pads are connections to external signals. Creating a padframe in an early stage of the design process is helpful to reduce problems with design rules, which are different if pads are included. The blocks are integrated one by one into the padframe and DRC and LVS are performed after each new block is added. This method was used for the development of the PSPP and proved to be very efficient.

#### A.4 ASIC fabrication

A CMOS ASIC is produced in several steps. A Si wafer forms the basis. Common sizes are 200 mm or 300 mm. Other dimensions exist as well. Different layers to build an integrated circuit are defined by masks. A mask is used to define patterns for a full wafer. As many devices can be built on a wafer, it is common to use a reticle and expose the same pattern multiple times on a wafer. A reticle itself can be formed out of multiple ASICs or can hold multiple copies of a single design. An example of how this could be done is shown in Figure A.13.
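How many complete reticle exposures fit on a round wafer can be estimated with a simple geometric count. The sketch below is written in Python under simplifying assumptions: the exposure grid has a grid vertex at the wafer center, the edge exclusion zone and the wafer notch are ignored, and the 20 mm × 20 mm reticle size in the example is hypothetical.

```python
import math


def full_reticles(wafer_diameter_mm: float, ret_w_mm: float,
                  ret_h_mm: float) -> int:
    """Count grid-aligned reticle fields that lie entirely on a round wafer."""
    radius_sq = (wafer_diameter_mm / 2.0) ** 2
    n_cols = math.ceil(wafer_diameter_mm / ret_w_mm) + 1
    n_rows = math.ceil(wafer_diameter_mm / ret_h_mm) + 1
    count = 0
    for i in range(-n_cols, n_cols):
        for j in range(-n_rows, n_rows):
            # Four corners of the field whose lower-left corner is (i*w, j*h).
            corners = [
                (x, y)
                for x in (i * ret_w_mm, (i + 1) * ret_w_mm)
                for y in (j * ret_h_mm, (j + 1) * ret_h_mm)
            ]
            # A field is usable only if all of its corners are on the wafer.
            if all(x * x + y * y <= radius_sq for x, y in corners):
                count += 1
    return count


n_fields = full_reticles(200.0, 20.0, 20.0)
```

Fields that are cut by the wafer edge are not counted, which corresponds to the reticles at the edge that are not or only partly usable.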

The fabrication process of CMOS technologies is done with photolithography. A simplified example is given in Figure A.14. It takes several steps to build different layers. Masks for a design are shown in Figure A.14j. Each color represents a mask that is used to create a certain layer. Combinations of different masks define what kind of device is built.

Figure A.12: Layout of a differential pair. The fingers of two transistors are placed in symmetry along different axes.

- 1. A bare silicon wafer is used as a substrate (Figure A.14a). The substrate is usually already doped. There exist both n and p doped substrates, but p doped is more common for CMOS production.
- 2. A layer of photoresist is added on top of the substrate as shown in Figure A.14b.
- 3. The mask for the N-well is used to expose the photoresist (Figure A.14c). After exposure to light, part of the resist is removed. For a positive resist, everything exposed to light is removed; for a negative resist, it is the other way around.
- 4. The remaining resist is used to block donor atoms, which are inserted to form an N-well (A.14d). After the N-wells are formed, the remaining resist is also removed.
- 5. An oxide layer is built up to create isolators (A.14e).
- 6. Again a photoresist is added and exposed. This is used to create openings in the oxide for active areas (A.14f).
- 7. Instead of diffusing donor atoms, the oxide is etched away on areas with no photoresist. Figure A.14g shows the etched oxide. Additionally, an oxide and polysilicon layer have already been added on top to form the gate of a transistor.
- 8. The polysilicon and gate oxide are etched away with an additional resist procedure to leave just the desired areas. Drain and source are again diffused into the substrate (A.14h). This is done with p-type dopants (acceptors) for PMOS transistors and substrate contacts. In the same way, n-type dopants (donors) create N-well contacts and NMOS transistors.
- 9. Contacts and metal layers are added in further steps similar to the processes described above. However, an insulating layer is created first between every two metal layers. Transistors with the first metal layer are shown in Figure A.14i.

Actual production steps are more complicated than described above. They are based on these techniques, but might require additional steps, for example to create oxides between transistors. Masks are produced larger than the final design. Instead of creating masks with features in the nm range, patterns on the mask are reduced optically.



Figure A.13: Example for arranging different ASICs on a reticle and repetition of the reticle multiple times on a wafer. The grayed reticles on the edge are not usable or only partly. It is possible to recognize repeating patterns of a reticle on pictures of a fabricated wafer.

The design rules mentioned in section A.3 define minimal sizes and distances that can be produced. Many parasitic elements like diodes or bipolar transistors are created, as can be seen in Figure A.14i (e.g. N-well to substrate forms a diode). Design rules are also defined to prevent unwanted effects from these elements, like requiring a large distance between two wells or having enough substrate and well contacts. It should also be noted that during etching or diffusion processes the mask is not exactly reproduced. Patterns could become a little bit larger or smaller than intended. This is also hinted at in Figure A.14. These imprecisions have to be taken into account, and it follows that transistors close to minimal size exhibit larger variations. For this reason, most analog designs use transistors well above the minimum size.

There are several categories of rules. Some are strictly enforced while others are recommendations for good design practice. The latter are less critical for prototypes but should be considered for production. The foundry checks that it can produce a design. However, it is the designer who holds the final responsibility for the functionality and for achieving a high yield in production.
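Why small transistors are affected more strongly can be made explicit with a first-order estimate. The sketch assumes that patterning changes the drawn width and length by absolute errors $\Delta W$ and $\Delta L$ that do not depend on the transistor size; this assumption and the symbols are introduced here for illustration:

$$
\frac{\Delta (W/L)}{W/L} \approx \frac{\Delta W}{W} - \frac{\Delta L}{L}
$$

The same absolute error therefore has a larger relative effect on a small device. With a hypothetical $\Delta W = 10\,\mathrm{nm}$, for example, the relative width error is about 8 % for $W = 130\,\mathrm{nm}$ but only 1 % for $W = 1\,\mathrm{\mu m}$. Random mismatch between nominally identical transistors is commonly modeled in a similar spirit, with a standard deviation of the threshold voltage difference that scales as $\sigma(\Delta V_T) = A_{VT} / \sqrt{W L}$, where $A_{VT}$ is a technology-dependent constant.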

Further details on fabrication and design techniques are given in [37].



Figure A.14: Simplified example of the patterning process. Figures a to i show how different layers are built. The top view of the masks as drawn by a designer is shown in Figure j.

## Appendix B

# List of ASICs designed at the University of Wuppertal

Table B.1 lists all ASICs designed during the development of the detector control system (DCS) chips as of the writing of this thesis. The chips PSPPv3, PARC, PSPPv4 and PATT were developed during this work and are described in Chapter 5.

| Name                    | Process          | Designer                                                  | Description                                                                                                                                   |
|-------------------------|------------------|-----------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------|
| First DCS<br>Chip       | $350\mathrm{nm}$ | K. Becker and<br>P. Kind                                  | DCS chip prototype including a serial communication interface [93].                                                                           |
| CoFee1                  | $130\mathrm{nm}$ | J. Boek, K. Becker,<br>P. Kind and<br>L. Püllen           | Digital test chip implementing logic functions of the DCS chip and DCS controller [119].                                                      |
| CoFee2                  | $130\mathrm{nm}$ | J. Boek, P. Kind<br>and L. Püllen                         | Corrected version of the DCS controller logic from<br>the CoFee1. Submitted in a triple modular<br>redundancy (TMR) and non-TMR version [90]. |
| PhysLay                 | $130\mathrm{nm}$ | L. Püllen                                                 | Physical layer test chip [120].                                                                                                               |
| $V_{Ref}$ -Chip         | $130\mathrm{nm}$ | L. Püllen                                                 | Test chip with voltage reference and shift register for<br>single event upset (SEU) measurements [121].                                       |
| PSPPv1                  | $130\mathrm{nm}$ | L. Püllen                                                 | Prototype and proof of concept for the PSPP chip [21].                                                                                        |
| PSPPv2                  | $130\mathrm{nm}$ | L. Püllen                                                 | Corrected version of the PSPPv1 with reduced debug options for smaller chip size [122].                                                       |
| PSPPv3                  | $130\mathrm{nm}$ | N. Lehmann                                                | Updated version of the PSPP for 8 A and radiation hard logic.                                                                                 |
| PARC                    | $130\mathrm{nm}$ | N. Lehmann                                                | Test chip with radiation hard regulator and<br>comparator for PSPP and SEU test logic.                                                        |
| PSPPv4                  | $130\mathrm{nm}$ | N. Lehmann                                                | Full radiation hard PSPP and improved bypass with bump bonding.                                                                               |
| PATT                    | $130\mathrm{nm}$ | N. Lehmann                                                | Test chip for asynchronous TMR and increased SEU test logic.                                                                                  |
| DCS<br>Controller<br>v1 | $65\mathrm{nm}$  | R. Ahmad,<br>T. Fröse,<br>M. Karagounis and<br>N. Lehmann | Prototype chip with 5 V regulator, controller area<br>network (CAN) and serial control bus (SCB) physical<br>layers and SEU test logic.       |

Table B.1: DCS test chips developed at the University of Wuppertal.

# Appendix C

# Failure Mode and Effects Analysis

| ſ                               | the second s |          |                                                                                                            |                                                    |                                                                                                          | t        |                 |                                                                    | 1 000000                 | 2    |
|---------------------------------|----------------------------------------------------------------------------------------------------------------|----------|------------------------------------------------------------------------------------------------------------|----------------------------------------------------|----------------------------------------------------------------------------------------------------------|----------|-----------------|--------------------------------------------------------------------|--------------------------|------|
| Inner                           | r Detector                                                                                                     | 2        |                                                                                                            | Potential Eff                                      | ects of failure                                                                                          | <u>ر</u> |                 |                                                                    | elements                 | .⊆   |
| Component                       | Component function                                                                                             |          |                                                                                                            | Local Effect                                       | Final Effect                                                                                             | 5<br>35  | sveiky kaulig   |                                                                    | together)<br>[Less Than] | -    |
|                                 |                                                                                                                | S1.1.1   | Single FE shorted                                                                                          | No voltage drop on module                          | All FEs of module stop functioning                                                                       | e        | Critical        | E internal short or short on power pads                            | 0.1 evts/yea             | Sars |
|                                 |                                                                                                                | S1.1.2   | Single FE Open                                                                                             | Current is blocked in one                          | The other FE take over the current                                                                       | 2        | Marginal        | Vire bond failure, malfunctioning shunt regulator                  | 101 evts/yea             | Sars |
|                                 | Module power (shunt                                                                                            | S1.1.3   | Modle open                                                                                                 | Multiple FE fail open                              | Entire serial chain stops working                                                                        | 4        | Catastrophic 0  | connection to stave flex is broken                                 | 0.1 evts/yea             | Sars |
| Module (Sensor<br>and FE chips) | regulator)                                                                                                     | S1.1.5   | Connector failure                                                                                          | Serial chain broken                                | Remaining FE have to shunt additional<br>current from blocked FEs                                        | 3        | Marginal        | E failure, broken wire bonds                                       | 0.1 evts/yea             | ars  |
|                                 |                                                                                                                |          |                                                                                                            |                                                    |                                                                                                          |          |                 |                                                                    |                          |      |
|                                 | Cooling / Glue to<br>support structure                                                                         | S1.3.1   | Loss of thermal contact                                                                                    | Heat is not carried away as designed               | Module temper ature increases<br>uncontrolled                                                            | e        | Critical        | Aodule is delaminated from stave                                   | 0.1 evts/yea             | ars  |
| | | S2.1.1 | NTC characteristics change;<br>temperature dependence change | Wrong temperature measurement | Early or late activation of bypass | 3 | Critical | Radiation damage or other external effects change NTC value | 0.01 evts/years | |
| | | S2.1.2 | NTC loss of connection; solder<br>connection / connector to module<br>(uncertain reliability of connector) | No temperature reading | Automatic bypass activation through<br>temperature not working | 2 | Marginal | Broken solder connection, defect on flex print, connector<br>failure | 0.01 evts/years | |
| | NTC | S2.1.3 | Loss of thermal contact; flex locally<br>delaminated | Wrong temperature reading (too low) | Thermal runaway not detected | 3 | Critical | Bad gluing of flex | 0.01 evts/years | |
| | | S2.1.3 | NTC short; (never occurred) short on<br>connection | Temperature is read as maximum (very hot) | Bypass is activated wrongly | 3 | Critical | Shorted wire bond, defect on flex print or in PSPP or NTC | 0.01 evts/years | |
|                                 |                                                                                                                |          |                                                                                                            |                                                    |                                                                                                          |          |                 |                                                                    |                          |      |
| | | S2.2.1 | Power supply for DCS defect | No power for PSPPs | Loss of control and monitoring of chain | 3 | Critical | Power supply failure | 0.01 evts/years | |
| | | S2.2.2 | Power cables for DCS defect | No power for PSPPs | Loss of control and monitoring of chain | 3 | Critical | Broken power lines | 0.01 evts/years | |
| | | S2.2.3 | Power start up sequence not followed; LV<br>before DCS power | None | No monitoring until DCS power on | 2 | Marginal | User error | 0.01 evts/years | |
| | DCS power | S2.2.4 | Power at intermediate level; voltage or<br>current limit not correctly set | No power for PSPPs / not working<br>properly | Loss of control and monitoring of chain | 3 | Critical | User error / wrong PSU configuration | 0.01 evts/years | |
| | | S2.2.5 | Radiation induced current increase; TID<br>bump | Increased current needed | PSPP not properly working, increased<br>power loss in bias resistors | 2 | Marginal | TID bump, increased leakage current. Known effect of technology | 0.01 evts/years | |
|                                 |                                                                                                                |          |                                                                                                            |                                                    |                                                                                                          |          |                 |                                                                    |                          |      |
| | | S2.3.1 | Voltage regulator failure | Single PSPP not powered | Loss of control and monitoring of single<br>module | 2 | Marginal | Defective PSPP, radiation damage | 0.1 evts/years | |
| | | S2.3.2 | Voltage reference drift | Wrong ADC readout and wrong<br>thresholds | Early or late activation of bypass (due to<br>module voltage. Temperature unaffected) | 3 | Critical | Radiation damage or environmental effects on PSPP | 0.1 evts/years | |
| | | S2.3.3 | SEU in bypass register | Bit flip in a register | Bypass is activated or deactivated in worst case | 3 | Critical | Radiation damage | 0.01 evts/years | |
| | | S2.3.4 | SEU in transmission register | Bit flip in a register | Communication failure | 1 | Insignificant | Radiation damage | 0.01 evts/years | |
| | | S2.3.5 | Transient SEE outside logic (regulator,<br>comparator, ADC, power-on,) | Transient signal in circuit affecting<br>operation | Worst case bypass activated through SEE<br>in comparator | 3 | Critical | Radiation damage | 0.01 evts/years | |
| | | S2.3.6 | Permanent SEE | Chip goes into an unknown state | PSPP not working properly, loss of<br>monitoring and control, bypass behavior? | 2 | Marginal | Radiation damage | 0.01 evts/years | |
| | | S2.3.7 | Internal capacitors damaged by radiation | Capacitance value/behavior changes | → PSPP not working properly | 2 | Marginal | Radiation damage | 0.01 evts/years | |
| | | S2.3.8 | Internal resistors damaged by radiation | Resistor value/behavior changes | Shift of thresholds and reference voltages<br>→ early or late bypass activation | 3 | Critical | Radiation damage | 0.01 evts/years | |
| | | S2.3.9 | PSPP transmitter failure (not answering) | Commands are not acknowledged | No monitoring possible, command<br>execution still possible. Automatic bypass<br>activation still active | 2 | Marginal | Defect in PSPP, broken communication line | 0.1 evts/years | |
| PSPP | PSPP | S2.3.10 | PSPP receiver failure (not receiving) | Commands are not received | No control and no monitoring from<br>outside. Automatic bypass activation still<br>active | 2 | Marginal | Defect in PSPP, broken communication line | 0.1 evts/years | |
| | | S2.3.11 | Bypass stuck closed | Bypass of serial current always active | Module permanently shorted | 3 | Critical | Defect in PSPP, short on flex | 0.01 evts/years | |
|                                 |                                                                                                                |          |                                                                                                            |                                                    |                                                                                                          |          |                 |                                                                    |                          |      |

| System: Inner Detector<br>Component | Component function | Ref. | Occurrence Rating | Current Design Controls | Current Designs | Detection and Correction (Rank) | RPN | Recommended Actions |
|---|---|---|---|---|---|---|---|---|
|                                 |                                        | S1.1.1   | Remote            | PSPP and DAQ                                                   | none                                                                                 | 7 Certain detection / No correction        | 2.1  | Activate bypass to have a reliable short                                                                                                                            |
|                                 |                                        | S1.1.2   | Improbable        | DAQ                                                            | Other FEs                                                                            | 1 Certain Detection / Automatic correction | 0.02 |                                                                                                                                                                     |
| | Module power (shunt<br>regulator) | S1.1.3 | Remote | DAQ and DCS Computer | Bypass in PSPP | 1 Certain Detection / Automatic correction | 0.4 | PSPP automatically activates bypass in case of overvoltage |
| Module (Sensor<br>and FE chips) | | S1.1.5 | Remote | PSPP and DAQ | Bypass in PSPP | 1 Certain Detection / Automatic correction | 0.2 | PSPP on stave flex, to have the possibility of bypassing the module even if disconnected |
|                                 |                                        |          |                   |                                                                |                                                                                      |                                            |      |                                                                                                                                                                     |
| | Cooling / Glue to<br>support structure | S1.3.1 | Remote | PSPP | PSPP with automatic<br>bypass | 1 Certain Detection / Automatic correction | 0.3 | PSPP automatically activates bypass in case of overheating |
|                                 |                                        |          |                   |                                                                |                                                                                      |                                            |      |                                                                                                                                                                     |
| | | S2.1.1 | Improbable | Cross-check/compare with FE<br>temperature/neighbor<br>modules | Monitoring yes,<br>interlock no | 4 Certain detection / User correction | 0.12 | Additional temperature measurement in FE for comparison would help to detect<br>it |
| | | S2.1.2 | Improbable | DCS Computer | None | 7 Certain detection / No correction | 0.14 | Two NTCs on module foreseen for redundancy |
| | NTC | S2.1.3 | Improbable | Cross-check/compare with FE<br>temperature/neighbor<br>modules | None | 7 Certain detection / No correction | 0.21 | $2^{nd}$ NTC for redundancy and ignore problematic NTC |
|                                 |                                        | S2.1.3   | Improbable        | DCS Computer                                                   | Deactivate module<br>interlock                                                       | 4 Certain detection / User correction      | 0.12 | Automatic temperature activation within a temperature band                                                                                                          |
|                                 |                                        |          |                   |                                                                |                                                                                      |                                            |      |                                                                                                                                                                     |
| | | S2.2.1 | Improbable | DCS Computer | PSU replaceable | 4 Certain detection / User correction | 0.12 | Passive state defined with bypass off for least impact on detector.<br>Multiple wirebonds for power input |
|                                 |                                        | S2.2.2   | Improbable        | DCS Computer                                                   | None                                                                                 | 7 Certain detection / No correction        | 0.21 |                                                                                                                                                                     |
| | | S2.2.3 | Improbable | DCS Computer | yes | 1 Certain Detection / Automatic correction | 0.02 | Should be defined correctly in the power-on scripts |
| | DCS power | S2.2.4 | Improbable | DCS Computer | yes, reconfigure PSU | 1 Certain Detection / Automatic correction | 0.03 | Automatic scripts to reconfigure PSU. Should be tested in lab |
|                                 |                                        | S2.2.5   | Improbable        | DCS Computer                                                   | yes, increase current<br>limit                                                       | 1 Certain Detection / Automatic correction | 0.02 | Specification for DCS PSU with sufficient current margin. Bias resistors with<br>sufficient power rating. Current increase should be verified in irradiation tests! |
|                                 |                                        |          |                   |                                                                |                                                                                      |                                            |      |                                                                                                                                                                     |
|                                 |                                        | S2.3.1   | Remote            | DCS Computer                                                   | None                                                                                 | 7 Certain detection / No correction        | 1.4  | Investigation of possible failure modes (fluctuations on regulated voltage etc).<br>Chip should not go into undefined state                                         |
| | | S2.3.2 | Remote | DCS Computer | None for automatic<br>bypass, DCS<br>computer<br>calculation for<br>monitoring value | 7 Certain detection / No correction | 2.1 | Read with PSPP ADC an external reference voltage (e.g. power) which is known.<br>Verify operation and stability of voltage in irradiation tests! |
|                                 |                                        | S2.3.3   | Improbable        | DCS and DAQ                                                    | reset register<br>through command                                                    | 4 Certain detection / User correction      | 0.12 | Bypass can be corrected by command from the DCS computer.<br>TMR implementation is promising. To be verified in actual bypass logic.                                |
|                                 |                                        | S2.3.4   | Improbable        | DCS Computer                                                   | repeat<br>communication                                                              | 1 Certain Detection / Automatic correction | 0.01 | TMR implementation is promising.                                                                                                                                    |
|                                 |                                        | S2.3.5   | Improbable        | DCS Computer                                                   | yes                                                                                  | 1 Certain Detection / Automatic correction | 0.03 | SEE irradiation with PSPP chip!                                                                                                                                     |
| | | S2.3.6 | Improbable | DCS Computer | possibly power cycling | 4 Certain detection / User correction | 0.08 | SEE irradiation with PSPP chip! |
|                                 |                                        | S2.3.7   | Improbable        | DCS Computer                                                   | None                                                                                 | 7 Certain detection / No correction        | 0.14 | Irradiations until now didn't show any problems.                                                                                                                    |
|                                 |                                        | S2.3.8   | Improbable        | DCS Computer                                                   | None                                                                                 | 7 Certain detection / No correction        | 0.21 | Irradiations until now didn't show any problems.                                                                                                                    |
|                                 |                                        | S2.3.9   | Remote            | DCS Computer                                                   | None                                                                                 | 7 Certain detection / No correction        | 1.4  |                                                                                                                                                                     |
| PSPP | PSPP | S2.3.10 | Remote | DCS Computer | None | 7 Certain detection / No correction | 1.4 | |
|                                 |                                        | S2.3.11  | Improbable        | DCS Computer                                                   | None                                                                                 | 7 Certain detection / No correction        | 0.21 | Programmable chip power off, which requires power cycle to deactivate                                                                                               |
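
The RPN values listed above appear consistent with the product of the severity rank, the occurrence limit and the detection rank. The following Python sketch checks this for a few rows. It is only an illustration: the product rule is inferred from the tabulated numbers, the mapping of rating names to numbers (Insignificant = 1, Marginal = 2, Critical = 3, Catastrophic = 4, Remote = 0.1, Improbable = 0.01) is read from the tables, and the names `SEVERITY`, `OCCURRENCE`, `ROWS` and `rpn` are hypothetical.

```python
# Severity ranks as they appear in the tables (rating name -> rank).
SEVERITY = {"Insignificant": 1, "Marginal": 2, "Critical": 3, "Catastrophic": 4}
# Occurrence classes mapped to the upper limits in events per year listed in the tables.
OCCURRENCE = {"Remote": 0.1, "Improbable": 0.01}


def rpn(severity, occurrence, detection_rank):
    """Risk priority number, ASSUMED to be severity x occurrence x detection rank."""
    return SEVERITY[severity] * OCCURRENCE[occurrence] * detection_rank


# (reference, severity rating, occurrence rating, detection rank, RPN listed in the table)
ROWS = [
    ("S1.3.1", "Critical", "Remote", 1, 0.3),
    ("S2.1.1", "Critical", "Improbable", 4, 0.12),
    ("S2.3.2", "Critical", "Remote", 7, 2.1),
    ("S2.3.4", "Insignificant", "Improbable", 1, 0.01),
    ("S2.3.9", "Marginal", "Remote", 7, 1.4),
]

for ref, sev, occ, det, listed in ROWS:
    print(f"{ref}: computed {rpn(sev, occ, det):.2f}, listed {listed}")
```

Under this assumption, a failure with certain detection but no correction (rank 7) and a Remote occurrence dominates the ranking, which matches the largest RPN entries in the table.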


| Contrast.          |             |                                                  |                                               |                                                                                                             | -        |               |                                                   | Commence for                         |
|--------------------|-------------|--------------------------------------------------|-----------------------------------------------|-------------------------------------------------------------------------------------------------------------|----------|---------------|---------------------------------------------------|--------------------------------------|
| Inner Detector     |             |                                                  | Potential Eff                                 | fects of failure                                                                                            | ^        | :             |                                                   | all elements in                      |
| Component Componen | It function | Potential Failure Mode                           | Local Effect                                  | Final Effect                                                                                                | se<br>SE | erity kating  | Potential cause of Mechanism of Failure           | detector<br>together)<br>[Less Than] |
| | S2.3.12 | Bypass stuck open | Bypass can't be activated | Modules can't be switched off | 2 | Marginal | Defect in PSPP, broken connection to module power | 0.01 evts/year |
| | S2.3.13 | Bypass driver failure | Control signal for bypass in unknown state | Module affected in worst case | 3 | Critical | Radiation damage | 0.01 evts/year |
| | S2.3.14 | Bypass not properly opening | Increased current through bypass | Module sees less current | 3 | Critical | Radiation damage, defect in PSPP | 0.01 evts/year |
| | S2.3.15 | Bypass oscillating | Bypass toggling between open and closed | Transients in LV current and noise injected into the chain | 4 | Catastrophic | Bypass driver failure | 0.01 evts/year |
| | S2.3.16 | ADC failure | No analog values | No monitoring of environment and module voltage. Automatic bypass activation is still working | 2 | Marginal | Defect in PSPP, broken connection to analog values | 0.01 evts/year |
| | S2.3.17 | Comparator output stuck active | Automatic bypass control always on | Bypass is activated involuntarily | 3 | Critical | Defect in PSPP | 0.01 evts/year |
| | S2.3.18 | Comparator output stuck inactive | Automatic bypass control not working | Bypass is not activated automatically | 2 | Marginal | Defect in PSPP | 0.01 evts/year |
| | S2.3.19 | Address bump bond failing | Two chips having same address in SCB | Affected chips don't answer correctly → loss of monitoring. Bypass activation only for both at the same time | 2 | Marginal | Defect in flex / bump failure | 0.01 evts/year |
|                    |             |                                                  |                                               |                                                                                                             |          |               |                                                   |                                      |
| | S2.4.1 | SCB capacitor short | Direct connection to bus | Can lead to large voltages on bus → controller sees large voltage, controller failure | 3 | Critical | Defect in capacitor (maybe overvoltage) | 0.01 evts/year |
| | S2.4.2 | SCB capacitor value change (radiation induced) | Bus behavior changed | Communication problems possible | 1 | Insignificant | Radiation damage | 0.01 evts/year |
| | S2.4.3 | SCB capacitor open | Connection to bus lost | No control and no monitoring from outside. Automatic bypass activation still active | 2 | Marginal | Connection failure, solder failure | 0.01 evts/year |
| | S2.4.4 | Regulator filter capacitor short | PSPP regulator output shorted | No power for PSPP | 2 | Marginal | Defect in capacitor | 0.01 evts/year |
| External components | S2.4.5 | Regulator filter capacitor value change or open | Stability of regulator affected | Regulator could start oscillating, PSPP not working properly anymore | 2 | Marginal | Defect in capacitor | 0.01 evts/year |
| | S2.4.6 | Power bias resistor short | PSPP at global supply voltage | Overvoltage at input will destroy PSPP | 2 | Marginal | Defect in resistor | 0.01 evts/year |
| | S2.4.7 | Power bias resistor open | PSPP not powered | Loss of control and monitoring of single module | 2 | Marginal | Defect in resistor | 0.01 evts/year |
| | S2.4.8 | Power bias resistor value change | Current for PSPP modified | Power consumption modified, worst case PSPP not properly working | 2 | Marginal | Defect in resistor | 0.01 evts/year |
| | S2.4.9 | Vglobal resistor short | PSPP at global supply voltage | Overvoltage at input will destroy PSPP | 2 | Marginal | Defect in resistor | 0.01 evts/year |
| | S2.4.10 | Vglobal resistor open or value change | Change of measured global reference voltage | Could be misinterpreted as drift in PSPP reference | 2 | Marginal | Defect in resistor | 0.01 evts/year |
|                    |             |                                                  |                                               |                                                                                                             |          |               |                                                   |                                      |

Appendix C Failure Mode and Effects Analysis

System: Inner Detector

| Component | Component function | ID | Occurrence Rating | Current Design Controls: Detection | Current Design Controls: Correction | Detection and Correction Rank | RPN | Recommended Actions |
|-----------|--------------------|----|-------------------|------------------------------------|-------------------------------------|-------------------------------|-----|---------------------|
|           |                      | S2.3.12  | Improbable        | DCS Computer with specific bypass test         | None                           | 9 Low detection / No correction            | 0.18 | To detect this failure, the bypass would have to be activated periodically and<br>tested. This could be done during a power up cycle |
|           |                      | S2.3.13  | Improbable        | DCS Computer                                   | None                           | 4 Certain detection / User correction      | 0.12 | Irradiation tests necessary                                                                                                          |
|           |                      | S2.3.14  | Improbable        | DCS Computer                                   | None                           | 4 Certain detection / User correction      | 0.12 | Irradiation tests necessary, systems tests important                                                                                 |
|           |                      | S2.3.15  | Improbable        | DCS and DAQ                                    | Power off PSPP                 | 4 Certain detection / User correction      | 0.16 |                                                                                                                                      |
|           |                      | S2.3.16  | Improbable        | DCS Computer                                   | None                           | 7 Certain detection / No correction        | 0.14 | Environmental values could be retrieved through other paths (FE)                                                                     |
|           |                      | S2.3.17  | Improbable        | DCS Computer                                   | Deactivate Module<br>interlock | 4 Certain detection / User correction      | 0.12 |                                                                                                                                      |
|           |                      | S2.3.18  | Improbable        | DCS Computer when values go<br>above threshold | None                           | 9 Low detection / No correction            | 0.18 | To detect this failure, the bypass would have to be activated periodically and tested. This could be done during a power up cycle    |
|           |                      | S2.3.19  | Improbable        | DCS Computer                                   | None (when<br>installed)       | 7 Certain detection / No correction        | 0.14 | Test of flex before installation. All address pads have two bumps for redundancy                                                     |
|           |                      |          |                   |                                                |                                |                                            |      |                                                                                                                                      |
|           |                      | S2.4.1   | Improbable        | DCS Computer                                   | None                           | 7 Certain detection / No correction        | 0.21 | Proper tests before installation. Capacitor ratings large enough                                                                     |
|           |                      | S2.4.2   | Improbable        | DCS Computer                                   | Repeat<br>communication        | 1 Certain detection / Automatic correction | 0.01 | Nothing seen in tests so far. Irradiate capacitor with PSPP                                                                          |
|           |                      | S2.4.3   | Improbable        | DCS Computer                                   | None                           | 7 Certain detection / No correction        | 0.14 |                                                                                                                                      |
|           |                      | S2.4.4   | Improbable        | DCS Computer                                   | None                           | 7 Certain detection / No correction        | 0.14 |                                                                                                                                      |
|           | External components  | S2.4.5   | Improbable        | DCS Computer                                   | None                           | 7 Certain detection / No correction        | 0.14 |                                                                                                                                      |
|           |                      | S2.4.6   | Improbable        | DCS Computer                                   | None                           | 7 Certain detection / No correction        | 0.14 |                                                                                                                                      |
|           |                      | S2.4.7   | Improbable        | DCS Computer                                   | None                           | 7 Certain detection / No correction        | 0.14 |                                                                                                                                      |
|           |                      | S2.4.8   | Improbable        | DCS Computer through PSU readings              | Adjust PSU settings            | 5 High detection / User correction         | 0.1  | Enough power margin on bias resistors                                                                                                |
|           |                      | S2.4.9   | Improbable        | DCS Computer                                   | None                           | 7 Certain detection / No correction        | 0.14 |                                                                                                                                      |
|           |                      | S2.4.10  | Improbable        | DCS Computer                                   | Adjust conversion<br>parameter | 5 High detection / User correction         | 0.1  | If not an open, very difficult to differentiate from internal voltage drift. However, it is very unlikely that resistors are affected |
|           |                      |          |                   |                                                |                                |                                            |      |                                                                                                                                      |
|           |                      |          |                   |                                                |                                |                                            |      |                                                                                                                                      |
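
The RPN values listed above appear to be the product of the severity number, the detection and correction rank, and the occurrence value of 0.01 given for every row. This relation is inferred from the tabulated numbers and is not stated explicitly in this appendix. The following minimal Python sketch checks this assumption for a few rows whose ratings are legible in both tables; the names `FmeaRow`, `rpn`, `mismatches`, and `ranked` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Assumption: every row lists an occurrence upper bound of 0.01 events/year.
OCCURRENCE = 0.01


@dataclass(frozen=True)
class FmeaRow:
    """One failure mode; the class and field names are hypothetical."""
    ident: str
    severity: int        # severity number (1 to 4 in the rows used here)
    detection_rank: int  # detection and correction rank
    tabulated_rpn: float # RPN as printed in the table


def rpn(severity: int, detection_rank: int, occurrence: float = OCCURRENCE) -> float:
    """Risk priority number, assumed to be the product of the three ratings."""
    return severity * detection_rank * occurrence


# Values copied from the two tables of this appendix.
ROWS = [
    FmeaRow("S2.3.12", 2, 9, 0.18),
    FmeaRow("S2.3.13", 3, 4, 0.12),
    FmeaRow("S2.3.15", 4, 4, 0.16),
    FmeaRow("S2.4.2", 1, 1, 0.01),
    FmeaRow("S2.4.7", 2, 7, 0.14),
]


def mismatches(rows, tol=1e-9):
    """Return the identifiers whose tabulated RPN differs from the product."""
    return [r.ident for r in rows
            if abs(rpn(r.severity, r.detection_rank) - r.tabulated_rpn) > tol]


def ranked(rows):
    """Sort rows from highest to lowest computed RPN."""
    return sorted(rows, key=lambda r: rpn(r.severity, r.detection_rank), reverse=True)


if __name__ == "__main__":
    for row in ranked(ROWS):
        print(f"{row.ident}: RPN = {rpn(row.severity, row.detection_rank):.2f}")
    print("mismatches:", mismatches(ROWS))
```

Under this assumption, a row with a poorly detectable failure (rank 9) can reach a higher RPN than a row with a more severe but reliably detected failure, which is why the stuck-open bypass (S2.3.12) ranks above the oscillating bypass (S2.3.15) in this sample.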
