publications
2024
- DAC’24. Conjuring: Leaking Control Flow via Speculative Fetch Attacks. Ali Hajiabadi and Trevor E. Carlson. In Proceedings of the 61st Design Automation Conference (DAC 2024), Jun 2024
One of five papers selected from the 337 research papers in the program (5/337 ≈ 1.5%)
In this work, we propose a new attack called Conjuring that exploits one of the main features of CPUs’ frontend: speculative fetch of instructions. We show that the Pattern History Table (PHT) in modern CPUs is a great channel to learn and leak the control flow of victim applications. Unlike prior work, Conjuring does not require the attacker to prime the PHT or interfere with the victim’s execution, enabling a realistic and unprivileged attacker to leak control flow information. As branch predictors improve, our attack becomes even more serious and practical. We demonstrate the feasibility of our attack on different existing Intel, AMD, and Apple CPUs.
- DAC’24. Levioso: Efficient Compiler-Informed Secure Speculation. Ali Hajiabadi, Archit Agarwal, Andreas Diavastos, and Trevor E. Carlson. In Proceedings of the 61st Design Automation Conference (DAC 2024), Jun 2024
Spectre-type attacks have exposed a major class of vulnerabilities arising from speculative execution of instructions, the main performance enabler of modern CPUs. These attacks speculatively leak secrets that have been either speculatively loaded (seen in sandboxed programs) or non-speculatively loaded (seen in constant-time programs). Various hardware-only defenses have been proposed to mitigate leakage of both speculative and non-speculative secrets via all potential transmission channels. However, limited program knowledge is exposed to the hardware, and these solutions conservatively restrict the execution of all instructions that can potentially leak. In this work, we show that not all instructions depend on older unresolved branches and that such instructions can safely execute without leaking speculative information. We present Levioso, a novel hardware/software co-design that provides comprehensive secure speculation guarantees while reducing performance overhead compared to existing defenses. Levioso informs the hardware about true branch dependencies and applies restrictions only when necessary. Our evaluations demonstrate that Levioso reduces the performance overhead to just 23%, compared to 51% and 43% for two prior defenses.
- HPCA’24. GadgetSpinner: A New Transient Execution Primitive using the Loop Stream Detector. Yun Chen*, Ali Hajiabadi*, and Trevor E. Carlson. In Proceedings of the 30th International Symposium on High-Performance Computer Architecture (HPCA 2024), Mar 2024
Transient execution attacks constitute a major class of attacks affecting all modern out-of-order CPUs. These attacks exploit transient execution windows (i.e., the instructions that execute but never commit) to leak confidential information from victims. Existing attacks rely on branch mispredictions, incorrect memory speculation, or deferred exception handling to create transient windows. In this work, we introduce a new transient execution primitive, called GadgetSpinner. We exploit the Loop Stream Detector (LSD) in Intel processors to perform out-of-loop-bounds execution and perform illegal operations. Our key observation is that the LSD holds on to an old copy of branch predictions from the first iteration of the loop and keeps using this copy until a branch misprediction occurs, i.e., until execution advances beyond the loop bound. We exploit the delay between the speculative iteration of the loop and when the branch misprediction is resolved. In this paper, we analyze the transient execution of the LSD and perform end-to-end attacks to (1) perform illegal reads from protected memory regions, (2) bypass Intel SGX and extract the weights of a trained CNN model in the DNNL library, (3) break Kernel ASLR (KASLR), and finally (4) perform cross-core/cross-process attacks. We also show that many defenses for prior transient execution attacks, like secure Branch Prediction Unit (BPU) designs, fail to protect against GadgetSpinner.
- HPCA’24. PrefetchX: Cross-Core Cache-Agnostic Prefetcher-Based Side-Channel Attacks. Yun Chen, Ali Hajiabadi, Lingfeng Pei, and Trevor E. Carlson. In Proceedings of the 30th International Symposium on High-Performance Computer Architecture (HPCA 2024), Mar 2024
In this paper, we reveal the existence of a new class of prefetcher, the XPT prefetcher, in modern Intel processors, which has never been officially detailed. It speculatively issues a load, bypassing last-level cache (LLC) lookups, when it predicts that a load request will result in an LLC miss. We demonstrate that the XPT prefetcher is shared among different cores, which enables an attacker to build cross-core side-channel and covert-channel attacks. We propose PrefetchX, a cross-core attack mechanism, to leak users’ sensitive data and activities. We empirically demonstrate that PrefetchX can be used to extract private keys of real-world RSA applications. Furthermore, we show that PrefetchX can enable side-channel attacks that can monitor keystrokes and network traffic patterns of users. Our two cross-core covert-channel attacks also see a low error rate and a 122 KiB/s maximum channel capacity. Due to the cache-independent feature of PrefetchX, current cache-based mitigations are not effective against our attacks. Overall, our work uncovers a significant vulnerability in the XPT prefetcher, which can be exploited to compromise the confidentiality of sensitive information in both cryptography and non-cryptography-related applications among processor cores.
- TACO’24. PARADISE: Criticality-Aware Instruction Reordering for Power Attack Resistance. Yun Chen*, Ali Hajiabadi*, Romain Poussier, Yaswanth Tavva, Andreas Diavastos, Shivam Bhasin, and Trevor E. Carlson. ACM Transactions on Architecture and Code Optimization (TACO), Mar 2024
Power side-channel attacks exploit the correlation of power consumption with the instructions and data being processed to extract secrets from a device (e.g., cryptographic keys). Prior work primarily focused on protecting small embedded micro-controllers and in-order processors rather than high-performance, out-of-order desktop and server CPUs. In this paper, we present PARADISE, a general-purpose out-of-order processor with always-on protection, that implements a novel dynamic instruction scheduler to provide obfuscated execution and mitigate power analysis attacks. To achieve this, we exploit the time between operand availability of critical instructions (slack) and create high-performance random schedules. Further, we highlight the dangers of using incorrect adversarial assumptions, which can often lead to a false sense of security. Therefore, we perform an extended security analysis on AES-128 using different levels of adversaries, from basic to advanced, including a CNN-based attack. Our advanced security evaluation assumes a strong adversary with full knowledge of the countermeasure and demonstrates a significant security improvement of 556× when combined with Boolean Masking over a baseline only protected by masking, and 62,500× over an unprotected baseline. The resulting overhead in performance, power, and area of PARADISE is 3.2%, 1.2%, and 0.8%, respectively.
2023
- ICCAD’23. HidFix: Efficient Mitigation of Cache-based Spectre Attacks through Hidden Rollbacks. Arash Pashrashid, Ali Hajiabadi, and Trevor E. Carlson. In Proceedings of the 42nd International Conference on Computer-Aided Design (ICCAD 2023), Nov 2023
Mitigating Spectre attacks in modern systems is a challenging task for CPU vendors as they need to provide comprehensive protection while maintaining high efficiency. One common solution is to adopt always-on mitigation strategies to prevent all speculative data leaks. However, these solutions incur prohibitive performance overheads as they limit the benefits of speculative execution, the main performance enabler of modern processors. Additionally, recent attacks have demonstrated the limitations of many existing defenses. Combining side-channel attack (SCA) detectors with mitigation strategies is a promising direction to achieve efficient and selective mitigation of Spectre attacks. In this work, we enumerate the combinations of state-of-the-art detection and mitigation strategies and present both new attacks as well as the potential risks of such detection/mitigation combinations. The result is the HidFix methodology, an efficient mitigation for cache-based Spectre attacks, that addresses the security limitations of prior work. We show that HidFix has a near-zero performance overhead for all evaluated applications. HidFix rolls back the misspeculated data leaks in a timely manner, before an attacker has the chance to infer the victim’s sensitive data. We demonstrate that HidFix is more secure compared to prior cache-based Spectre defenses, and moreover, it does not introduce new side effects that might enable an attacker to observe secret-dependent changes in the system.
2022
- ICCAD’22. Fast, Robust and Accurate Detection of Cache-based Spectre Attack Phases. Arash Pashrashid, Ali Hajiabadi, and Trevor E. Carlson. In Proceedings of the 41st International Conference on Computer-Aided Design (ICCAD 2022), Nov 2022
Modern processors achieve high performance and efficiency by employing techniques such as speculative execution and sharing resources such as caches. However, recent attacks like Spectre and Meltdown exploit the speculative execution of modern processors to leak sensitive information from the system. Many mitigation strategies have been proposed to restrict the speculative execution of processors and protect potential side-channels. Currently, these techniques have shown a significant performance overhead. A solution that can detect memory leaks before the attacker has a chance to exploit them would allow the processor to reduce the performance overhead by enabling protections only when the system is at risk. In this paper, we propose a mechanism to detect speculative execution attacks that use caches as a side-channel. In this detector, we track the phases of a successful attack and raise an alert before the attacker gets a chance to recover sensitive information. We accomplish this by monitoring the microarchitectural changes in the core and caches and detecting the memory locations that are potential data leaks. We achieve 100% accuracy and a negligible false-positive rate in detecting Spectre attacks and evasive versions of Spectre that the state-of-the-art detectors are unable to detect. Our detector has no performance overhead and negligible power and area overheads.
2021
- ASPLOS’21. NOREBA: A Compiler-Informed Non-speculative Out-of-Order Commit Processor. Ali Hajiabadi, Andreas Diavastos, and Trevor E. Carlson. In Proceedings of the 26th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2021), Apr 2021
Modern superscalar processors execute instructions out-of-order, but commit them in program order to provide precise exception handling and safe instruction retirement. However, in-order instruction commit is highly conservative and holds on to critical resources far longer than necessary, severely limiting the reach of general-purpose processors, ultimately reducing performance. Solutions that allow for efficient, early reclamation of these critical resources could seize the opportunity to improve performance. One such solution is out-of-order commit, which has traditionally been challenging due to inefficient, complex hardware used to guarantee safe instruction retirement and provide precise exception handling. In this work, we present NOREBA, a processor for Non-speculative Out-of-order REtirement via Branch Reconvergence Analysis. In NOREBA, we enable non-speculative out-of-order commit and resource reclamation in a light-weight manner, improving performance and efficiency. We accomplish this through a combination of (1) automatic compiler annotation of true branch dependencies, and (2) an efficient re-design of the reorder buffer from traditional processors. By exploiting compiler branch dependency information, this system achieves 95% of the performance of aggressive, speculative solutions, without any additional speculation, and while maintaining energy efficiency.
- CGO’21. ELFies: Executable Region Checkpoints for Performance Analysis and Simulation. Harish Patil, Alexander Isaev, Wim Heirman, Alen Sabu, Ali Hajiabadi, and Trevor E. Carlson. In Proceedings of the 19th International Symposium on Code Generation and Optimization (CGO 2021), Mar 2021
We address the challenge faced in characterizing long-running workloads, namely how to reliably focus the detailed analysis on interesting execution regions. We present a set of tools that allows users to precisely capture any region of interest in program execution, and create a stand-alone executable, called an ELFie, from it. An ELFie starts with the same program state captured at the beginning of the region of interest and then executes natively. With ELFies, no fast-forwarding to the region of interest is needed, and there is no uncertainty about reaching the region. ELFies can be fed to dynamic program-analysis tools or simulators that work with regular program binaries. Our tool-chain is based on the PinPlay framework and requires no special hardware, operating system changes, re-compilation, or re-linking of test programs. This paper describes the design of our ELFie generation tool-chain and the application of ELFies in performance analysis and simulation of regions of interest in popular long-running single and multi-threaded benchmarks.
- TOCS’21. Highly Concurrent Latency-tolerant Register Files for GPUs. Mohammad Sadrosadati, Amirhossein Mirhosseini, Ali Hajiabadi, Seyed Borna Ehsani, Hajar Falahati, Hamid Sarbazi-Azad, Mario Drumond, Babak Falsafi, Rachata Ausavarungnirun, and Onur Mutlu. ACM Transactions on Computer Systems (TOCS), Mar 2021
Graphics Processing Units (GPUs) employ large register files to accommodate all active threads and accelerate context switching. Unfortunately, register files are a scalability bottleneck for future GPUs due to long access latency, high power consumption, and large silicon area provisioning. Prior work proposes a hierarchical register file to reduce the register file power consumption by caching registers in a smaller register file cache. Unfortunately, this approach does not improve register access latency due to the low hit rate in the register file cache. In this paper, we propose the Latency-Tolerant Register File (LTRF) architecture to achieve low latency in a two-level hierarchical structure while keeping power consumption low. We observe that compile-time interval analysis enables us to divide GPU program execution into intervals with an accurate estimate of a warp’s aggregate register working-set within each interval. The key idea of LTRF is to prefetch the estimated register working-set from the main register file to the register file cache under software control, at the beginning of each interval, and overlap the prefetch latency with the execution of other warps. We observe that register bank conflicts while prefetching the registers could greatly reduce the effectiveness of LTRF. Therefore, we devise a compile-time register renumbering technique to reduce the likelihood of register bank conflicts. Our experimental results show that LTRF enables high-capacity yet long-latency main GPU register files, paving the way for various optimizations. As an example optimization, we implement the main register file with emerging high-density high-latency memory technologies, enabling 8× larger capacity and improving overall GPU performance by 34%.