Parboil: A revised benchmark suite for scientific and commercial throughput computing JA Stratton, C Rodrigues, IJ Sung, N Obeid, LW Chang, N Anssari, GD Liu, ... Center for Reliable and High-Performance Computing 127 (7.2), 2012 | 964 | 2012 |
MCUDA: An efficient implementation of CUDA kernels for multi-core CPUs J Stratton, S Stone, W Hwu Languages and Compilers for Parallel Computing, 16-30, 2008 | 391 | 2008 |
Program optimization space pruning for a multithreaded GPU S Ryoo, CI Rodrigues, SS Stone, SS Baghsorkhi, SZ Ueng, JA Stratton, ... Proceedings of the 6th annual IEEE/ACM international symposium on Code …, 2008 | 388 | 2008 |
FCUDA: Enabling efficient compilation of CUDA kernels onto FPGAs A Papakonstantinou, K Gururaj, JA Stratton, D Chen, J Cong, WMW Hwu 2009 IEEE 7th Symposium on Application Specific Processors, 35-42, 2009 | 244 | 2009 |
Program optimization carving for GPU computing S Ryoo, CI Rodrigues, SS Stone, JA Stratton, SZ Ueng, SS Baghsorkhi, ... Journal of Parallel and Distributed Computing 68 (10), 1389-1401, 2008 | 168 | 2008 |
Data layout transformation exploiting memory-level parallelism in structured grid many-core applications IJ Sung, JA Stratton, WMW Hwu Proceedings of the 19th international conference on Parallel architectures …, 2010 | 136 | 2010 |
Efficient compilation of fine-grained SPMD-threaded programs for multicore CPUs JA Stratton, V Grover, J Marathe, B Aarts, M Murphy, Z Hu, WW Hwu Proceedings of the 8th annual IEEE/ACM international symposium on Code …, 2010 | 121 | 2010 |
SPEC ACCEL: A Standard Application Suite for Measuring Hardware Accelerator Performance G Juckeland, W Brantley, S Chandrasekaran, B Chapman, S Che, ... International Workshop on Performance Modeling, Benchmarking and Simulation …, 2014 | 94 | 2014 |
A scalable, numerically stable, high-performance tridiagonal solver using GPUs LW Chang, JA Stratton, HS Kim, WMW Hwu SC'12: Proceedings of the International Conference on High Performance …, 2012 | 91 | 2012 |
Compute unified device architecture application suitability WM Hwu, C Rodrigues, S Ryoo, J Stratton Computing in Science & Engineering 11 (3), 16-26, 2009 | 76 | 2009 |
Multilevel granularity parallelism synthesis on FPGAs A Papakonstantinou, Y Liang, JA Stratton, K Gururaj, D Chen, WMW Hwu, ... 2011 IEEE 19th Annual International Symposium on Field-Programmable Custom …, 2011 | 62 | 2011 |
Optimization and architecture effects on GPU computing workload performance JA Stratton, N Anssari, C Rodrigues, IJ Sung, N Obeid, L Chang, GD Liu, ... 2012 Innovative Parallel Computing (InPar), 1-10, 2012 | 54 | 2012 |
Algorithm and data optimization techniques for scaling to massively threaded systems JA Stratton, C Rodrigues, IJ Sung, LW Chang, N Anssari, G Liu, ... Computer 45 (8), 26-32, 2012 | 50 | 2012 |
Locality-centric thread scheduling for bulk-synchronous programming models on CPU architectures HS Kim, I El Hajj, J Stratton, S Lumetta, WM Hwu 2015 IEEE/ACM International Symposium on Code Generation and Optimization …, 2015 | 33 | 2015 |
Efficient compilation of CUDA kernels for high-performance computing on FPGAs A Papakonstantinou, K Gururaj, JA Stratton, D Chen, J Cong, WMW Hwu ACM Transactions on Embedded Computing Systems (TECS) 13 (2), 1-26, 2013 | 30 | 2013 |
High-performance CUDA kernel execution on FPGAs A Papakonstantinou, K Gururaj, JA Stratton, D Chen, J Cong, WMW Hwu Proceedings of the 23rd international conference on Supercomputing, 515-516, 2009 | 19 | 2009 |
Performance portability in accelerated parallel kernels JA Stratton, HS Kim, TB Jablin, WMW Hwu Center for Reliable and High-Performance Computing, 2013 | 17 | 2013 |
Design evaluation of OpenCL compiler framework for Coarse-Grained Reconfigurable Arrays HS Kim, M Ahn, JA Stratton, WMW Hwu Field Programmable Technology (FPT), 313-320, 2012 | 16 | 2012 |
Performance insights on executing non-graphics applications on CUDA on the NVIDIA GeForce 8800 GTX W Hwu, D Kirk, S Ryoo, C Rodrigues, J Stratton, K Huang 2007 IEEE Hot Chips 19 Symposium (HCS), 1-11, 2007 | 11 | 2007 |
System and method for dynamically spawning thread blocks within multi-threaded processing systems JA Stratton, D Luebke US Patent 8,615,770, 2013 | 9 | 2013 |