Publications

Textbook

Programming Massively Parallel Processors: A Hands-on Approach

W.-M. Hwu, D. Kirk, I. El Hajj
Morgan Kaufmann ’22
[website] [supplementary materials] [channel]

Papers

Parallelizing Maximal Clique Enumeration on GPUs

M. Almasri, Yen-Hsiang Chang, I. El Hajj, R. Nagi, J. Xiong, W.-M. Hwu
PACT ’23
[paper] [slides] [code]

SimplePIM: A Software Framework For Productive And Efficient Processing-in-Memory

J. Chen, J. Gómez-Luna, I. El Hajj, Y. Guo, O. Mutlu
PACT ’23
[paper] [slides] [code]

Efficient algorithms to solve atom reconfiguration problems. II. Assignment-rerouting-ordering algorithm

R. El Sabeh, J. Bohm, Z. Ding, S. Maaz, N. Nishimura, I. El Hajj, A. E. Mouawad, A. Cooper
Physical Review A ’23
[paper]

Efficient algorithms to solve atom reconfiguration problems. I. Redistribution-reconfiguration algorithm

B. Cimring, R. El Sabeh, M. Bacvanski, S. Maaz, I. El Hajj, N. Nishimura, A. E. Mouawad, A. Cooper
Physical Review A ’23
[paper]

Predicting the Performance-Cost Trade-off of Applications Across Multiple Systems

A. Nassereldine, S. Diab, M. Baydoun, K. Leach, M. Alt, D. Milojicic, I. El Hajj
CCGrid ’23
[paper] [slides]

A Framework for High-throughput Sequence Alignment using Real Processing-in-Memory Systems

S. Diab, A. Nassereldine, M. Alser, J. Gómez-Luna, O. Mutlu, I. El Hajj
Bioinformatics ’23
[paper] [slides] [video] [code]

Parallel K-Clique Counting on GPUs

M. Almasri, I. El Hajj, R. Nagi, J. Xiong, W.-M. Hwu
ICS’22 (acceptance rate: 39/165 = 23.64%)
[paper] [short slides] [video] [code]

Benchmarking a New Paradigm: Experimental Analysis and Characterization of a Real Processing-in-Memory System

J. Gómez-Luna, I. El Hajj, Ivan Fernandez, C. Giannoula, G. Oliveira, O. Mutlu
IEEE Access ’22
[paper] [slides] [video] [short video] [code]

ASAP: Architecture Support for Asynchronous Persistence

A. Abulila, I. El Hajj, M. Jung, N. S. Kim
ISCA’22 (acceptance rate: 67/400 = 16.75%)
[paper] [slides] [video]

Parallel Vertex Cover Algorithms on GPUs

P. Yamout, K. Barada, A. Jaljuli, A. Mouawad, I. El Hajj
IPDPS’22 (1st round acceptance rate: 46/474 = 9.7%)
[paper] [slides] [video] [code]

A Compiler Framework for Optimizing Dynamic Parallelism on GPUs

M. G. Olabi, J. Gómez-Luna, O. Mutlu, W.-M. Hwu, I. El Hajj
CGO’22 (acceptance rate: 27/94 = 28.7%)
[paper] [slides] [short slides] [video] [short video] [code]

KTrussExplorer: Exploring the Design Space of K-truss Decomposition Optimizations on GPUs

S. Diab, M. G. Olabi, I. El Hajj
HPEC ’20
[paper] [slides] [code]

PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-efficient ReRAM

A. Ankit, I. El Hajj, S. Chalamalasetti, S. Agarwal, M. Marinella, M. Foltin, J. P. Strachan, D. Milojicic, W.-M. Hwu, K. Roy
IEEE Transactions on Computers ’20
[paper]

PUMA: A Programmable Ultra-efficient Memristor-based Accelerator for Machine Learning Inference

A. Ankit, I. El Hajj, S. Chalamalasetti, G. Ndu, M. Foltin, R. S. Williams, P. Faraboschi, W.-M. Hwu, J. P. Strachan, K. Roy, D. Milojicic
ASPLOS’19 (acceptance rate: 74/350 = 21.1%)
[paper] [slides] [poster] [video] [simulator code] [compiler code]

A Fast and Massively-Parallel Inverse Solver for Multiple-Scattering Tomographic Image Reconstruction

M. Hidayetoğlu, C. Pearson, I. El Hajj, L. Gürel, W.-C. Chew, W.-M. Hwu
IPDPS’18 (1st round acceptance rate: 38/461 = 8.2%)
Best poster runner up
[paper] [poster]

SAVI Objects: Sharing and Virtuality Incorporated

I. El Hajj, T. Jablin, D. Milojicic, W.-M. Hwu
OOPSLA’17 (acceptance rate: 66/223 = 29.6%)
[paper] [slides]

Chai: Collaborative Heterogeneous Applications for Integrated-architectures

J. Gómez-Luna, I. El Hajj, L.-W. Chang, V. Garcia-Flores, S. Garcia de Gonzalo, T. Jablin, A. J. Peña, W.-M. Hwu
ISPASS’17 (acceptance rate: 24/81 = 29.6%)
[paper] [slides] [website] [code]

KLAP: Kernel Launch Aggregation and Promotion for Optimizing Dynamic Parallelism

I. El Hajj, J. Gómez-Luna, C. Li, L.-W. Chang, D. Milojicic, W.-M. Hwu
MICRO’16 (acceptance rate: 61/283 = 21.6%)
[paper] [slides] [poster] [code]

Efficient Kernel Synthesis for Performance Portable Programming

L.-W. Chang, I. El Hajj, C. Rodrigues, J. Gómez-Luna, W.-M. Hwu
MICRO’16 (acceptance rate: 61/283 = 21.6%)
[paper] [slides] [poster]

SpaceJMP: Programming with Multiple Virtual Address Spaces

I. El Hajj*, A. Merritt*, G. Zellweger*, D. Milojicic, R. Achermann, P. Faraboschi, W.-M. Hwu, T. Roscoe, K. Schwan (*co-primary authors)
ASPLOS’16 (acceptance rate: 53/232 = 22.8%)
HiPEAC Paper Award
[paper] [slides] [poster]

Locality-Centric Thread Scheduling for Bulk-synchronous Programming Models on CPU Architectures

H.-S. Kim, I. El Hajj, J. A. Stratton, S. S. Lumetta, W.-M. Hwu
CGO’15 (acceptance rate: 24/88 = 27.3%)
Best paper runner up
[paper] [slides]

TIGER: Tiled Iterative Genome Assembler

X.-L. Wu, Y. Heo, I. El Hajj, W.-M. Hwu, D. Chen, J. Ma
BMC Bioinformatics’12
[paper]

Short Papers and Posters

Asynchronous Persistence with ASAP

A. Abulila, I. El Hajj, M. Jung, N. S. Kim
NVMW’23

High-throughput Pairwise Alignment with the Wavefront Algorithm using Processing-in-Memory

S. Diab, A. Nassereldine, M. Alser, J. Gómez-Luna, O. Mutlu, I. El Hajj
HiCOMB’22
[paper] [slides]

Mixed Precision Quantization for ReRAM-based DNN Inference Accelerators

S. Huang, A. Ankit, P. Silveira, R. Antunes, S. Chalamalasetti, I. El Hajj, D. Kim, G. Aguiar, P. Bruel, S. Serebryakov, C. Xu, C. Li, P. Faraboschi, J. P. Strachan, D. Chen, K. Roy, W.-M. Hwu, D. Milojicic
ASP-DAC’21

Analysis and Modeling of Collaborative Execution Strategies for Heterogeneous CPU-FPGA Architectures

S. Huang, L.-W. Chang, I. El Hajj, S. Garcia de Gonzalo, J. Gómez-Luna, S. Chalamalasetti, M. El-Hadedy, D. Milojicic, O. Mutlu, D. Chen, W.-M. Hwu
ICPE’19
[paper] [slides]

Hardware-Software Co-Design for an Analog-Digital Accelerator for Machine Learning

J. Ambrosi, A. Ankit, R. Antunes, S. Chalamalasetti, S. Chatterjee, I. El Hajj, G. Fachini, P. Faraboschi, M. Foltin, S. Huang, W.-M. Hwu, G. Knuppe, S. Lakshminarasimha, D. Milojicic, M. Parthasarathy, F. Ribeiro, L. Rosa, K. Roy, P. Silveira, J. P. Strachan
ICRC’18
[paper]

Scaling Analysis of a Hierarchical Parallelization of Large Inverse Multiple-Scattering Solutions

M. Hidayetoğlu, C. Pearson, I. El Hajj, W. C. Chew, L. Gürel, W.-M. Hwu
SC’17
[paper] [poster]

Collaborative Computing for Heterogeneous Integrated Systems

L.-W. Chang, J. Gómez-Luna, I. El Hajj, S. Huang, D. Chen, W.-M. Hwu
ICPE’17
[paper] [slides]

A Programming System for Future Proofing Performance Critical Libraries

L.-W. Chang, I. El Hajj, H.-S. Kim, J. Gómez-Luna, A. Dakkak, W.-M. Hwu
PPoPP’16
[paper] [poster]

Invited Papers

A Python-based High-Level Programming Flow for CPU-FPGA Heterogeneous Systems

S. Huang, K. Wu, S. Chalamalasetti, I. El Hajj, C. Xu, P. Faraboschi, D. Chen
PEHC’21

Benchmarking Memory-Centric Computing Systems: Analysis of Real Processing-in-Memory Hardware

J. Gómez-Luna, I. El Hajj, Ivan Fernandez, C. Giannoula, G. Oliveira, O. Mutlu
CUT’21
[paper]

Rebooting the Data Access Hierarchy of Computing Systems

W.-M. Hwu, I. El Hajj, S. Garcia de Gonzalo, C. Pearson, N. S. Kim, D. Chen, J. Xiong, Z. Sura
ICRC’17
[paper]

Generalize or Die: Operating Systems Support for Memristor-based Accelerators

P. Bruel, S. Chalamalasetti, C. Dalton, I. El Hajj, A. Goldman, C. Graves, W.-M. Hwu, P. Laplante, D. Milojicic, G. Ndu, J. P. Strachan
ICRC’17
[paper]

Transitioning HPC Software to Exascale Heterogeneous Computing

W.-M. Hwu, L.-W. Chang, H.-S. Kim, A. Dakkak, I. El Hajj
CEM’15
[paper]

Patents

Architecture Support for Multiple Virtual Address Spaces per Process

I. El Hajj, A. Merritt, G. Zellweger, D. Milojicic, P. Faraboschi
In submission.

Client-Server Programming with Address Spaces

I. El Hajj, A. Merritt, G. Zellweger, D. Milojicic
PCT/US2016/021446, filed Sep. 3, 2016.

Multiple Persistent Virtual Address Spaces (MPVAS)

I. El Hajj, A. Merritt, G. Zellweger, D. Milojicic
PCT/US2016/015661, filed Jan. 29, 2016.

Versioning using multiple virtual address spaces per process

I. El Hajj, A. Merritt, G. Zellweger, D. Milojicic
PCT/US2016/015814, filed Jan. 29, 2016.

Hardware support for tracking writes to memory objects with sub-page granularity

I. El Hajj, A. Merritt, G. Zellweger, D. Milojicic
PCT/US2016/015815, filed Jan. 29, 2016.

Use of memory write logging for fast versioning of in-memory objects

I. El Hajj, A. Merritt, G. Zellweger, D. Milojicic, R. Achermann
PCT/US2016/015839, filed Jan. 29, 2016.

Supporting and managing multiple virtual address spaces per process

I. El Hajj, A. Merritt, G. Zellweger, D. Milojicic
PCT/US2015/049726, filed Sep. 11, 2015.

Dissertations and Reports

Techniques for Optimizing Dynamic Parallelism on Graphics Processing Units

I. El Hajj
Ph.D. Dissertation, Dec ’18

Dynamic Loop Vectorization for Executing OpenCL Kernels on CPUs

I. El Hajj
M.S. Thesis, May ’14

Multi-tier Dynamic Vectorization for Translating GPU Optimizations into CPU Performance

H.-S. Kim, I. El Hajj, J. A. Stratton, W.-M. Hwu
Technical Report, UIUC, Feb ’14