Diesel: DSL for linear algebra and neural net computations on GPUs V Elango, N Rubin, M Ravishankar, H Sandanagobalane, V Grover Proceedings of the 2nd ACM SIGPLAN International Workshop on Machine …, 2018 | 60 | 2018 |
Distributed memory code generation for mixed irregular/regular computations M Ravishankar, R Dathathri, V Elango, LN Pouchet, J Ramanujam, ... Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of …, 2015 | 40 | 2015 |
On characterizing the data access complexity of programs V Elango, F Rastello, LN Pouchet, J Ramanujam, P Sadayappan Proceedings of the 42nd Annual ACM SIGPLAN-SIGACT Symposium on Principles of …, 2015 | 29 | 2015 |
Spatial adaptive sampling in multiscale simulation B Rouet-Leduc, K Barros, E Cieren, V Elango, C Junghans, T Lookman, ... Computer Physics Communications 185 (7), 1857-1864, 2014 | 27 | 2014 |
Accelerating Strassen-Winograd's matrix multiplication algorithm on GPUs PW Lai, H Arafat, V Elango, P Sadayappan 20th Annual International Conference on High Performance Computing, 139-148, 2013 | 27 | 2013 |
Beyond reuse distance analysis: Dynamic analysis for characterization of data locality potential N Fauzia, V Elango, M Ravishankar, J Ramanujam, F Rastello, A Rountev, ... ACM Transactions on Architecture and Code Optimization (TACO) 10 (4), 1-29, 2013 | 27 | 2013 |
On characterizing the data movement complexity of computational DAGs for parallel execution V Elango, F Rastello, LN Pouchet, J Ramanujam, P Sadayappan Proceedings of the 26th ACM Symposium on Parallelism in Algorithms and …, 2014 | 18 | 2014 |
Data Access Complexity: The Red/Blue Pebble Game Revisited V Elango, F Rastello, LN Pouchet, J Ramanujam, P Sadayappan | 14 | 2013 |
On using the roofline model with lower bounds on data movement V Elango, N Sedaghati, F Rastello, LN Pouchet, J Ramanujam, ... ACM Transactions on Architecture and Code Optimization (TACO) 11 (4), 1-23, 2015 | 12 | 2015 |
Accelerating linear algebra kernels for any processor architecture V Elango, N Rubin, M Ravishankar, VK Grover US Patent App. 16/277,661, 2019 | 7 | 2019 |
PaSE: Parallelization strategies for efficient DNN training V Elango 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS …, 2021 | 6 | 2021 |
Techniques for Characterizing the Data Movement Complexity of Computations V Elango The Ohio State University, 2016 | 2 | 2016 |
Shared Microexponents: A Little Shifting Goes a Long Way B Rouhani, R Zhao, V Elango, R Shafipour, M Hall, M Mesmakhosroshahi, ... arXiv preprint arXiv:2302.08007, 2023 | 1 | 2023 |
Accelerating linear algebra kernels for any processor architecture V Elango, N Rubin, M Ravishankar, V Grover US Patent App. 18/136,233, 2023 | | 2023 |
With Shared Microexponents, A Little Shifting Goes a Long Way B Darvish Rouhani, R Zhao, V Elango, R Shafipour, M Hall, ... Proceedings of the 50th Annual International Symposium on Computer …, 2023 | | 2023 |
Sparsifying narrow data formats for neural networks BD Rouhani, V Elango, ES Chung, DC Burger, MC Heddes, S Nishit, ... US Patent App. 17/349,848, 2022 | | 2022 |
Data-aware model pruning for neural networks V Elango, BD Rouhani, ES Chung, DC Burger, M Golub US Patent App. 17/334,613, 2022 | | 2022 |
Hierarchical and shared exponent floating point data types BD Rouhani, V Elango, R Shafipour, J Fowers, MG Liu, J Xi, DC Burger, ... US Patent App. 17/361,263, 2022 | | 2022 |
Schedule: Fall 2015 M Ravishankar, R Dathathri, V Elango, LN Pouchet, J Ramanujam, ... Reading, 2015 | | 2015 |
Augmenting the Roofline Model via Lower Bounds on Data Movement V Elango, N Sedaghati, F Rastello, LN Pouchet, J Ramanujam, ... | | 2014 |