Thyme: Think Beyond Images

Yi-Fan Zhang²,♠, Xingyu Lu⁴, Shukang Yin⁵, Chaoyou Fu³,†,
Wei Chen¹, Xiao Hu⁴, Bin Wen¹,†, Kaiyu Jiang¹, Changyi Liu¹, Tianke Zhang¹,
Haonan Fan¹, Kaibing Chen¹, Jiankang Chen¹, Haojie Ding¹, Kaiyu Tang¹,
Zhang Zhang²,†, Liang Wang², Fan Yang¹, Tingting Gao¹, Guorui Zhou¹

♠ Project Leader † Corresponding Author

¹Kwai Keye ²CASIA ³NJU ⁴THU ⁵USTC


Abstract

We are excited to introduce Thyme: Think Beyond Images. Thyme goes beyond the traditional "thinking with images" paradigm: it autonomously generates and executes diverse image processing and computational operations as code, which significantly improves performance on high-resolution perception and complex reasoning tasks. Thyme is trained with a two-stage strategy that combines supervised fine-tuning with reinforcement learning, and it uses the GRPO-ATS algorithm to balance reasoning exploration against code execution precision.

Overall pipeline of Thyme, illustrating the interaction between the model and the sandbox for iterative reasoning and code execution. Key processes such as reasoning, code generation, sandbox execution, and result feedback are highlighted.
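The loop described in this caption (reasoning, code generation, sandbox execution, result feedback) can be sketched as below. This is a minimal illustration under stated assumptions, not Thyme's actual implementation: `toy_model`, `run_in_sandbox`, and `solve` are hypothetical names, the "image" is a plain nested list, and `exec` stands in for a real isolated sandbox.

```python
import contextlib
import io


def run_in_sandbox(code, variables):
    """Run code and return its captured stdout, or an error message.

    Assumption for illustration: exec() is NOT a real security boundary;
    an actual sandbox would isolate the process.
    """
    buffer = io.StringIO()
    namespace = dict(variables)  # copy so the caller's dict is not mutated
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)
    except Exception as exc:
        # Errors are returned as feedback instead of being raised.
        return f"Error: {type(exc).__name__}: {exc}"
    return buffer.getvalue().strip()


def toy_model(question, history):
    """Hypothetical stand-in for the model: emit code first, then answer."""
    if not history:
        # Round 1: request a crop of the top-middle region of the image.
        code = "crop = [row[1:3] for row in image[0:2]]\nprint(crop)"
        return {"type": "code", "content": code}
    # Round 2: answer using the latest sandbox feedback.
    return {"type": "answer", "content": f"Cropped region: {history[-1]}"}


def solve(question, image, max_rounds=4):
    """Alternate between model steps and sandbox runs until an answer appears."""
    history = []
    for _ in range(max_rounds):
        step = toy_model(question, history)
        if step["type"] == "answer":
            return step["content"]
        history.append(run_in_sandbox(step["content"], {"image": image}))
    return "No answer within the round limit"


image = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
result = solve("What is in the top-middle region?", image)
print(result)
```

Returning sandbox errors as text rather than raising them lets the next model step see the failure and try again, which is one plausible way to realize the result-feedback step.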

Performance Comparison on Perception, Reasoning, and General Tasks. For all open-source models, the best performance for each metric is bolded, and the second best is underlined. Gold-colored font indicates improvement over the baseline Qwen2.5-VL-7B.

Citation

@article{zhang2025thyme,
  title={Thyme: Think Beyond Images},
  author={Zhang, Yi-Fan and Lu, Xingyu and Yin, Shukang and Fu, Chaoyou and Chen, Wei and Hu, Xiao and Wen, Bin and Jiang, Kaiyu and Liu, Changyi and Zhang, Tianke and others},
  journal={arXiv preprint arXiv:2508.11630},
  year={2025}
}
