Proactive process-level live migration in HPC environments C Wang, F Mueller, C Engelmann, SL Scott SC'08: Proceedings of the 2008 ACM/IEEE conference on Supercomputing, 1-12, 2008 | 251 | 2008 |
A job pause service under LAM/MPI+BLCR for transparent fault tolerance C Wang, F Mueller, C Engelmann, SL Scott 2007 IEEE International Parallel and Distributed Processing Symposium, 1-10, 2007 | 115 | 2007 |
NVMalloc: Exposing an aggregate SSD store as a memory partition in extreme-scale machines C Wang, SS Vazhkudai, X Ma, F Meng, Y Kim, C Engelmann 2012 IEEE 26th International Parallel and Distributed Processing Symposium …, 2012 | 102 | 2012 |
Hybrid checkpointing for MPI jobs in HPC environments C Wang, F Mueller, C Engelmann, SL Scott 2010 IEEE 16th International Conference on Parallel and Distributed Systems …, 2010 | 81 | 2010 |
Proactive process-level live migration and back migration in HPC environments C Wang, F Mueller, C Engelmann, SL Scott Journal of Parallel and Distributed Computing 72 (2), 254-267, 2012 | 64 | 2012 |
Scalable, fault tolerant membership for MPI tasks on HPC systems J Varma, C Wang, F Mueller, C Engelmann, SL Scott Proceedings of the 20th annual international conference on Supercomputing …, 2006 | 45 | 2006 |
Optimizing center performance through coordinated data staging, scheduling and recovery Z Zhang, C Wang, SS Vazhkudai, X Ma, GG Pike, JW Cobb, F Mueller Proceedings of the 2007 ACM/IEEE conference on Supercomputing, 1-11, 2007 | 35 | 2007 |
Improving the availability of supercomputer job input data using temporal replication C Wang, Z Zhang, X Ma, SS Vazhkudai, F Mueller Computer Science-Research and Development 23, 149-157, 2009 | 17 | 2009 |
MOLAR: Adaptive runtime support for high-end computing operating and runtime systems C Engelmann, SL Scott, DE Bernholdt, NR Gottumukkala, C Leangsuksun, ... ACM SIGOPS Operating Systems Review 40 (2), 63-72, 2006 | 17 | 2006 |
Understanding object-level memory access patterns across the spectrum X Ji, C Wang, N El-Sayed, X Ma, Y Kim, SS Vazhkudai, W Xue, ... Proceedings of the International Conference for High Performance Computing …, 2017 | 15 | 2017 |
A tunable holistic resiliency approach for high-performance computing systems SL Scott, C Engelmann, GR Vallée, T Naughton, A Tikotekar, ... Proceedings of the 14th ACM SIGPLAN symposium on Principles and practice of …, 2009 | 15 | 2009 |
Hybrid full/incremental checkpoint/restart for MPI jobs in HPC environments C Wang, F Mueller, C Engelmann, SL Scott North Carolina State University. Dept. of Computer Science, 2009 | 11 | 2009 |
On-the-fly recovery of job input data in supercomputers C Wang, Z Zhang, SS Vazhkudai, X Ma, F Mueller 2008 37th International Conference on Parallel Processing, 620-627, 2008 | 6 | 2008 |
Transparent fault tolerance for job healing in HPC environments C Wang North Carolina State University, 2009 | 5 | 2009 |
Transparent Fault Tolerance for Job Input Data in HPC Environments C Wang, SS Vazhkudai, X Ma, F Mueller | 2 | 2014 |
Hybrid Full/Incremental Checkpoint/Restart for MPI Jobs in HPC Environments W Chao, F Mueller, C Engelmann Proc. of the 16th International Conference on Parallel and Distributed …, 2011 | 2 | 2011 |
GPFS Evaluation Report C Wang Technical Report, National Center for Computational Sciences, Oak Ridge …, 2016 | | 2016 |
A Study on Application Heap Object-level Memory Access Patterns X Ji, C Wang, X Ma, S Vazhkudai, Y Kim Technical Report, National Center for Computational Sciences, Oak Ridge …, 2016 | | 2016 |
Back-Migration for MPI Jobs in HPC Environments C Wang, F Mueller, C Engelmann, SL Scott Forum to Address Scalable Technology for Runtime and Operating Systems (FastOS), 2009 | | 2009 |
Resiliency for High-Performance Computing Systems 1st High-Performance Computer Science Week (HPCSW) 2008, 2008 | | 2008 |