Reliability of hardware for machine learning and autonomous systems

Abstract

The PhD project aims to develop novel solutions for cross-layer reliability and self-health awareness in tomorrow’s intelligent autonomous systems and IoT edge devices. The enormous complexity of today’s advanced cyber-physical systems and systems of systems is compounded by their heterogeneity and by emerging computing architectures that employ AI-based autonomy. The researcher will study the dominant hardware reliability concerns, such as radiation-induced soft errors and nanoelectronic ageing, in AI/ML hardware, in processor architectures such as RISC-V, and in advanced FPGA SoCs.

Research field:	Information and communication technology
Supervisor:	Prof. Dr. Maksim Jenihhin
Availability:	This position is available.
Offered by:	School of Information Technologies, Department of Computer Systems
Application deadline:	Applications are accepted between January 02, 2023 00:00 and January 22, 2023 23:59 (Europe/Zurich)

Description

We are currently witnessing a transition from microelectronics- and IT-enabled automation to AI-enabled autonomy. The rapid expansion of Intelligent Autonomous Systems (IAS) has created one of the fastest-growing markets and enabled numerous novel services and businesses. However, these opportunities come with computationally demanding, mission- and safety-critical application scenarios. The enormous complexity of today’s advanced cyber-physical systems and systems of systems is compounded by their heterogeneity and by the immature, fault-prone emerging computing architectures used to run AI. Scenarios such as autonomous swarms of robotic vehicles are already on the doorstep and call for novel approaches to reliability across all layers. Runtime self-health awareness and infrastructure for in-field self-healing are becoming enabling factors for new IoT edge devices and systems entering the market.

This PhD position is funded by the national research project PUT PRG1467 CRASHLESS “Cross-Layer Reliability and Self-Health Awareness for Intelligent Autonomous Systems” (2022-2026). The project is led by Prof. Maksim Jenihhin and implemented by the research team TECH (Trustworthy and Efficient Computing Hardware) at the Department of Computer Systems.

The PhD project aims to develop novel solutions for cross-layer reliability and self-health awareness in tomorrow’s intelligent autonomous systems and IoT edge devices. The researcher will study the dominant hardware reliability concerns, such as radiation-induced soft errors and nanoelectronic ageing, in AI/ML hardware, in processor architectures such as RISC-V, and in advanced FPGA SoCs. Particular topics of interest are:

reliability assessment for DNN (Deep Neural Networks) hardware accelerators
architecture- and layer-specific reliability assessment and fault modelling
cross-layer fault event monitoring with tailored fault detection and localisation
AI/ML-assisted techniques for system fault resilience

To achieve the training and research objectives efficiently, the researcher will be integrated into the team, collaborate with other PhD students and staff at the Department of Computer Systems, and benefit from TECH’s well-established international, cross-sectoral collaborations with leading academic and industrial institutions in the EU and worldwide.

We are looking for motivated individuals with a strong background in hardware design.

Applicants should fulfil the following requirements:

a master’s degree in computer engineering (or similar)
a clear interest in the topic of the position (candidates with embedded systems and machine learning backgrounds are preferred)
basic understanding of reliability and machine learning concepts
good skills in VHDL or Verilog
ability to code in C/C++ or Python
English language proficiency
strong writing and communication skills appropriate for an entry-level research position
capacity to work both as an independent researcher and as part of an international team
capacity and willingness to assist with organisational tasks relevant to the project

The following experience is beneficial:

research and/or professional experience; ability and interest in collaborating across disciplines
familiarity with FPGA development
familiarity with EDA tools
familiarity with ML algorithms and DNN architectures
previous research publications in conferences or journals

Candidates should submit a research plan for the topic. They may expand on the outlined research scope and propose the theoretical lenses to be used.

We offer:

A 4-year PhD position in the Department of Computer Systems, which has a strong portfolio of ongoing European and national research projects
An environment for excellent research and publications
Opportunities to develop relevant technical and transferable skills for academic or industrial careers
Opportunities for conference visits, research stays and networking with globally leading companies, universities and research centres in the field

About the research group

The Centre for Trustworthy and Efficient Computing Hardware (TECH) belongs to the Department of Computer Systems. It focuses on cross-layer reliability and self-health awareness technology for tomorrow’s complex intelligent autonomous systems and IoT edge devices in Estonia and the EU. The team studies advanced cyber-physical systems characterised by their heterogeneity and by emerging computing architectures employing AI-based autonomy. The centre generates knowledge that equips engineers with design-phase solutions and in-field instruments for industry-scale systems, facilitating crashless system operation. The core competencies of TECH are:

Hardware design; VHDL and Verilog designs
EDA tools (Cadence, Siemens, Synopsys platforms)
Application-specific computing platforms (Unmanned Aerial Vehicles)
FPGA-based solutions and methodologies
Advanced FPGA SoCs and FPGA development tools (Xilinx Vivado, Altera/Intel Quartus, Lattice Diamond)
Software and embedded SW development; bare-metal and user-space applications
Cross-layer reliability and fault management
ML-based solutions
Functional Safety (ISO26262)
Test strategy development and troubleshooting instrumentation
JTAG/IJTAG-based solutions
RISC-V processor architectures
DNN hardware accelerators

Head of the centre: Prof. Maksim Jenihhin

Maksim Jenihhin is an associate professor of Computing Systems Reliability at the Department of Computer Systems of Tallinn University of Technology and the head of the research group “Trustworthy and Efficient Computing Hardware”. He received his PhD degree in Computer Engineering from the same university in 2008. His research interests include methodologies and EDA tools for hardware design, verification and debugging, as well as nanoelectronics reliability and manufacturing test. He has supervised 5 PhD theses and is currently the main supervisor for 3 PhD students and 2 postdocs. His latest graduates are successful researchers and engineers at top companies in the domain, such as Nokia (Finland), Cadence Design Systems (Germany), IROC Technologies (France), and IBM (India). He has published more than 150 peer-reviewed papers and has coordinated national and European research projects, including H2020 MSCA ITN “RESCUE - Interdependent Challenges of Reliability, Security and Quality in Nanoelectronic Systems Design” and PRG 2022 “CRASHLESS - Cross-Layer Reliability and Self-Health Awareness for Intelligent Autonomous Systems”. Prof. Jenihhin is a member of the executive and program committees of IEEE ETS, DATE, DDECS, and a number of other international events, and has served as a guest editor for special issues of journals.

For further information, please contact Prof. Maksim Jenihhin (maksim.jenihhin@taltech.ee).