Siyuan Huang
Shanghai AI Lab && SJTU && MMLab CUHK
Verified email at sjtu.edu.cn
Title / Cited by / Year
SPHINX: The joint mixing of weights, tasks, and visual embeddings for multi-modal large language models
Z Lin, C Liu, R Zhang, P Gao, L Qiu, H Xiao, H Qiu, C Lin, W Shao, ...
arXiv preprint arXiv:2311.07575, 2023
Cited by 206, 2023
LVLM-eHub: A comprehensive evaluation benchmark for large vision-language models
P Xu, W Shao, K Zhang, P Gao, S Liu, M Lei, F Meng, S Huang, Y Qiao, ...
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2024
Cited by 177, 2024
Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners
R Zhang, X Hu, B Li, S Huang, H Deng, Y Qiao, P Gao, H Li
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023
Cited by 166, 2023
Multi-modal sensor fusion for auto driving perception: A survey
K Huang, B Shi, X Li, X Li, S Huang, Y Li
arXiv preprint arXiv:2202.02703, 2022
Cited by 140, 2022
Instruct2Act: Mapping multi-modality instructions to robotic actions with large language model
S Huang, Z Jiang, H Dong, Y Qiao, P Gao, H Li
arXiv preprint arXiv:2305.11176, 2023
Cited by 120, 2023
SPHINX-X: Scaling data and parameters for a family of multi-modal large language models
D Liu, R Zhang, L Qiu, S Huang, W Lin, S Zhao, S Geng, Z Lin, P Jin, ...
arXiv preprint arXiv:2402.05935, 2024
Cited by 88*, 2024
Tiny LVLM-eHub: Early multimodal experiments with Bard
W Shao, Y Hu, P Gao, M Lei, K Zhang, F Meng, P Xu, S Huang, H Li, ...
arXiv preprint arXiv:2308.03729, 2023
Cited by 35, 2023
Bridging zero-shot object navigation and foundation models through pixel-guided navigation skill
W Cai, S Huang, G Cheng, Y Long, P Gao, C Sun, H Dong
ICRA 2024, 2023
Cited by 28, 2023
GUI Odyssey: A Comprehensive Dataset for Cross-App GUI Navigation on Mobile Devices
Q Lu, W Shao, Z Liu, F Meng, B Li, B Chen, S Huang, K Zhang, Y Qiao, ...
arXiv preprint arXiv:2406.08451, 2024
Cited by 16, 2024
Not All Experts are Equal: Efficient Expert Pruning and Skipping for Mixture-of-Experts Large Language Models
X Lu, Q Liu, Y Xu, A Zhou, S Huang, B Zhang, J Yan, H Li
arXiv preprint arXiv:2402.14800, 2024
Cited by 15, 2024
AMEX: Android multi-annotation expo dataset for mobile GUI agents
Y Chai, S Huang, Y Niu, H Xiao, L Liu, D Zhang, P Gao, S Ren, H Li
arXiv preprint arXiv:2407.17490, 2024
Cited by 14, 2024
ManipVQA: Injecting robotic affordance and physically grounded information into multi-modal large language models
S Huang, I Ponomarenko, Z Jiang, X Li, X Hu, P Gao, H Li, H Dong
IROS 2024
Cited by 12, 2024
SUG: Single-dataset Unified Generalization for 3D Point Cloud Classification
S Huang, B Zhang, B Shi, H Li, Y Li, P Gao
Proceedings of the 31st ACM International Conference on Multimedia, 8644-8652, 2023
Cited by 12, 2023
A3VLM: Actionable Articulation-Aware Vision Language Model
S Huang, H Chang, Y Liu, Y Zhu, H Dong, P Gao, A Boularias, H Li
Conference on Robot Learning (CoRL), 2024
Cited by 9, 2024
Draw-and-Understand: Leveraging Visual Prompts to Enable MLLMs to Comprehend What You Want
W Lin, X Wei, R An, P Gao, B Zou, Y Luo, S Huang, S Zhang, H Li
arXiv preprint arXiv:2403.20271, 2024
Cited by 6, 2024
ADAS: A simple active-and-adaptive baseline for cross-domain 3D semantic segmentation
B Fei, S Huang, J Yuan, B Shi, B Zhang, T Chen, M Dou, Y Qiao
arXiv preprint arXiv:2212.10390, 2022
Cited by 5, 2022
PixWizard: Versatile image-to-image visual assistant with open-language instructions
W Lin, X Wei, R Zhang, L Zhuo, S Zhao, S Huang, J Xie, Y Qiao, P Gao, ...
arXiv preprint arXiv:2409.15278, 2024
Cited by 3, 2024
EnerVerse: Envisioning Embodied Future Space for Robotics Manipulation
S Huang, L Chen, P Zhou, S Chen, Z Jiang, Y Hu, P Gao, H Li, M Yao, ...
arXiv preprint arXiv:2501.01895, 2025
2025
A3: Android Agent Arena for Mobile GUI Agents
Y Chai, H Li, J Zhang, L Liu, G Wang, S Ren, S Huang, H Li
arXiv preprint arXiv:2501.01149, 2025
2025
SPHINX: A Mixer of Weights, Visual Embeddings and Image Scales for Multi-modal Large Language Models
Z Lin, D Liu, R Zhang, P Gao, L Qiu, H Xiao, H Qiu, W Shao, K Chen, ...
European Conference on Computer Vision, 36-55, 2025
2025