We are excited to introduce Thyme: Think Beyond Images. Thyme goes beyond conventional "thinking with images" approaches by autonomously generating and executing a range of image processing and computational operations as executable code, which improves performance on high-resolution perception and complex reasoning tasks. Thyme is trained with a two-stage strategy that combines supervised fine-tuning with reinforcement learning, and it uses the GRPO-ATS algorithm to balance reasoning exploration against code execution precision.
See how Thyme performs visual reasoning in real-world scenarios
Overall pipeline of Thyme, illustrating the interaction between the model and the sandbox for iterative reasoning and code execution. Key processes such as reasoning, code generation, sandbox execution, and result feedback are highlighted.
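The loop described in the caption above can be sketched roughly as follows. This is a minimal illustration, not Thyme's actual implementation: the `<code>`, `<answer>`, and `<sandbox_output>` tags, the `run_in_sandbox` and `reasoning_loop` helpers, and the scripted stand-in for the model are all assumptions made for the example, and the toy `exec`-based sandbox provides no real isolation.

```python
import contextlib
import io
import re
from typing import Callable, List

# Hypothetical tag conventions, chosen only for this sketch.
CODE_TAG = re.compile(r"<code>(.*?)</code>", re.DOTALL)
ANSWER_TAG = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def run_in_sandbox(code: str) -> str:
    """Toy stand-in for a sandbox: run the code, return its stdout or the error.

    This is NOT a security boundary; a real sandbox would isolate the process.
    """
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, {"__name__": "__sandbox__"})
    except Exception as exc:
        # Errors are fed back as text so the model can revise its code.
        return f"Error: {type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def reasoning_loop(model: Callable[[List[str]], str], question: str,
                   max_turns: int = 4) -> str:
    """Alternate between model turns and sandbox execution until an answer appears."""
    history = [question]
    for _ in range(max_turns):
        reply = model(history)          # reasoning, possibly with generated code
        history.append(reply)
        answer = ANSWER_TAG.search(reply)
        if answer:
            return answer.group(1).strip()
        code = CODE_TAG.search(reply)
        if code:
            # Sandbox execution, then result feedback into the conversation.
            output = run_in_sandbox(code.group(1))
            history.append(f"<sandbox_output>{output}</sandbox_output>")
    return ""  # no answer within the turn budget


def scripted_model(history: List[str]) -> str:
    """Hypothetical stub that plays the role of the model for two turns."""
    if len(history) == 1:
        return "I should compute the crop area first. <code>print(12 * 7)</code>"
    result = re.sub(r"</?sandbox_output>", "", history[-1])
    return f"The crop covers {result} pixels. <answer>{result}</answer>"


print(reasoning_loop(scripted_model, "How many pixels does a 12x7 crop cover?"))  # prints 84
```

In this sketch the sandbox returns execution errors as text instead of raising them, so a failed snippet becomes feedback for the next model turn rather than ending the loop.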
Performance Comparison on Perception, Reasoning, and General Tasks. Among open-source models, the best result for each metric is shown in bold and the second best is underlined. Gold-colored font indicates an improvement over the baseline, Qwen2.5-VL-7B.
@article{zhang2025thyme,
title={Thyme: Think Beyond Images},
author={Zhang, Yi-Fan and Lu, Xingyu and Yin, Shukang and Fu, Chaoyou and Chen, Wei and Hu, Xiao and Wen, Bin and Jiang, Kaiyu and Liu, Changyi and Zhang, Tianke and others},
journal={arXiv preprint arXiv:2508.11630},
year={2025}
}