David Harwath
Title
Cited by
Year
Unsupervised learning of spoken language with visual context
D Harwath, A Torralba, J Glass
Advances in Neural Information Processing Systems 29, 2016
Cited by: 283; Year: 2016
Jointly discovering visual objects and spoken words from raw sensory input
D Harwath, A Recasens, D Surís, G Chuang, A Torralba, J Glass
Proceedings of the European conference on computer vision (ECCV), 649-665, 2018
Cited by: 227; Year: 2018
Deep multimodal semantic embeddings for speech and images
D Harwath, J Glass
2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU …, 2015
Cited by: 177; Year: 2015
AVLnet: Learning audio-visual language representations from instructional videos
A Rouditchenko, A Boggust, D Harwath, B Chen, D Joshi, S Thomas, ...
arXiv preprint arXiv:2006.09199, 2020
Cited by: 134; Year: 2020
Everything at once - multi-modal fusion transformer for video retrieval
N Shvetsova, B Chen, A Rouditchenko, S Thomas, B Kingsbury, RS Feris, ...
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2022
Cited by: 126; Year: 2022
Learning word-like units from joint audio-visual analysis
D Harwath, JR Glass
arXiv preprint arXiv:1701.07481, 2017
Cited by: 122; Year: 2017
A summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition
A Jansen, E Dupoux, S Goldwater, M Johnson, S Khudanpur, K Church, ...
2013 IEEE International Conference on Acoustics, Speech and Signal …, 2013
Cited by: 120; Year: 2013
Learning hierarchical discrete linguistic units from visually-grounded speech
D Harwath, WN Hsu, J Glass
arXiv preprint arXiv:1911.09602, 2019
Cited by: 95; Year: 2019
MAE-AST: Masked autoencoding audio spectrogram transformer
A Baade, P Peng, D Harwath
arXiv preprint arXiv:2203.16691, 2022
Cited by: 75; Year: 2022
Multimodal clustering networks for self-supervised learning from unlabeled videos
B Chen, A Rouditchenko, K Duarte, H Kuehne, S Thomas, A Boggust, ...
Proceedings of the IEEE/CVF International Conference on Computer Vision …, 2021
Cited by: 72; Year: 2021
Contrastive audio-visual masked autoencoder
Y Gong, A Rouditchenko, AH Liu, D Harwath, L Karlinsky, H Kuehne, ...
arXiv preprint arXiv:2210.07839, 2022
Cited by: 71; Year: 2022
Vision as an interlingua: Learning multilingual semantic embeddings of untranscribed speech
D Harwath, G Chuang, J Glass
2018 IEEE International Conference on Acoustics, Speech and Signal …, 2018
Cited by: 66; Year: 2018
Text-free image-to-speech synthesis using learned segmental units
WN Hsu, D Harwath, C Song, J Glass
arXiv preprint arXiv:2012.15454, 2020
Cited by: 64; Year: 2020
Spoken moments: Learning joint audio-visual representations from video descriptions
M Monfort, SY Jin, A Liu, D Harwath, R Feris, J Glass, A Oliva
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern …, 2021
Cited by: 56; Year: 2021
Towards visually grounded sub-word speech unit discovery
D Harwath, J Glass
ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and …, 2019
Cited by: 42; Year: 2019
Look, Listen, and Decode: Multimodal Speech Recognition with Images
F Sun, D Harwath, J Glass
IEEE Workshop on Spoken Language Technology, 2016
Cited by: 33; Year: 2016
Why is Winoground hard? Investigating failures in visuolinguistic compositionality
A Diwan, L Berry, E Choi, D Harwath, K Mahowald
arXiv preprint arXiv:2211.00768, 2022
Cited by: 32; Year: 2022
Word discovery in visually grounded, self-supervised speech models
P Peng, D Harwath
arXiv preprint arXiv:2203.15081, 2022
Cited by: 32; Year: 2022
Learning modality-invariant representations for speech and images
K Leidal, D Harwath, J Glass
2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU …, 2017
Cited by: 32; Year: 2017
Zero resource spoken audio corpus analysis
DF Harwath, TJ Hazen, JR Glass
2013 IEEE International Conference on Acoustics, Speech and Signal …, 2013
Cited by: 32; Year: 2013