Senior Computer Scientist at ORNL
The design and performance of batched BLAS on modern high-performance computing systems
J Dongarra, S Hammarling, NJ Higham, SD Relton, P Valero-Lara, ...
Procedia Computer Science 108, 495-504, 2017
Performance evaluation of cuDNN convolution algorithms on NVIDIA Volta GPUs
M Jorda, P Valero-Lara, AJ Peña
IEEE Access 7, 70461-70473, 2019
Accelerating fluid–solid simulations (Lattice-Boltzmann & Immersed-Boundary) on heterogeneous architectures
P Valero-Lara, FD Igual, M Prieto-Matías, A Pinelli, J Favier
Journal of Computational Science 10, 249-261, 2015
Fast finite difference Poisson solvers on heterogeneous architectures
P Valero-Lara, A Pinelli, M Prieto-Matias
Computer Physics Communications 185 (4), 1265-1272, 2014
Heterogeneous CPU+GPU approaches for mesh refinement over Lattice‐Boltzmann simulations
P Valero‐Lara, J Jansson
Concurrency and Computation: Practice and Experience 29 (7), e3919, 2017
A proposed API for batched basic linear algebra subprograms
J Dongarra, I Duff, M Gates, A Haidar, S Hammarling, NJ Higham, J Hogg, ...
Manchester Institute for Mathematical Sciences, University of Manchester, 2016
Accelerating solid-fluid interaction using Lattice-Boltzmann and immersed boundary coupled simulations on heterogeneous platforms
P Valero-Lara, A Pinelli, M Prieto-Matias
Procedia Computer Science 29, 50-61, 2014
cuThomasBatch and cuThomasVBatch, CUDA routines to compute batch of tridiagonal systems on NVIDIA GPUs
P Valero‐Lara, I Martínez‐Pérez, R Sirvent, X Martorell, AJ Peña
Concurrency and Computation: Practice and Experience 30 (24), e4909, 2018
Block tridiagonal solvers on heterogeneous architectures
P Valero-Lara, A Pinelli, J Favier, MP Matias
2012 IEEE 10th International Symposium on Parallel and Distributed …, 2012
cuHinesBatch: Solving multiple Hines systems on GPUs human brain project
P Valero-Lara, I Martínez-Pérez, AJ Peña, X Martorell, R Sirvent, ...
Procedia Computer Science 108, 566-575, 2017
Accelerating solid–fluid interaction based on the immersed boundary method on multicore and GPU architectures
P Valero-Lara
The Journal of Supercomputing 70 (2), 799-815, 2014
Multi-GPU acceleration of DARTEL (early detection of Alzheimer)
P Valero-Lara
2014 IEEE International Conference on Cluster Computing (CLUSTER), 346-354, 2014
Similarity search implementations for multi-core and many-core processors
R Uribe-Paredes, P Valero-Lara, E Arias, JL Sánchez, D Cazorla
2011 International Conference on High Performance Computing & Simulation …, 2011
NVIDIA GPUs scalability to solve multiple (batch) tridiagonal systems implementation of cuThomasBatch
P Valero-Lara, I Martínez-Pérez, R Sirvent, X Martorell, AJ Peña
International Conference on Parallel Processing and Applied Mathematics, 243-253, 2017
A non-uniform Staggered Cartesian grid approach for Lattice-Boltzmann method
P Valero-Lara, J Jansson
Procedia Computer Science 51, 296-305, 2015
Reducing memory requirements for large size LBM simulations on GPUs
P Valero‐Lara
Concurrency and Computation: Practice and Experience 29 (24), e4221, 2017
Variable batched DGEMM
P Valero-Lara, I Martínez-Pérez, S Mateo, R Sirvent, V Beltran, X Martorell, ...
2018 26th Euromicro International Conference on Parallel, Distributed and …, 2018
Many-task computing on many-core architectures
P Valero-Lara, P Nookala, FL Pelayo, J Jansson, S Dimitropoulos, I Raicu
Scalable Computing: Practice and Experience 17 (1), 32-46, 2016
A GPU-based implementation for range queries on Spaghettis data structure
R Uribe-Paredes, P Valero-Lara, E Arias, JL Sánchez, D Cazorla
Computational Science and Its Applications-ICCSA 2011: International …, 2011
sLASs: A fully automatic auto-tuned linear algebra library based on OpenMP extensions implemented in OmpSs (LASs Library)
P Valero-Lara, S Catalán, X Martorell, T Usui, J Labarta
Journal of Parallel and Distributed Computing 138, 153-171, 2020