Thyme: Think Beyond Images

Yi-Fan Zhang²♠, Xingyu Lu⁴, Shukang Yin⁵, Chaoyou Fu³†,
Wei Chen¹, Xiao Hu⁴, Bin Wen¹†, Kaiyu Jiang¹, Changyi Liu¹, Tianke Zhang¹,
Haonan Fan¹, Kaibing Chen¹, Jiankang Chen¹, Haojie Ding¹, Kaiyu Tang¹,
Zhang Zhang²†, Liang Wang², Fan Yang¹, Tingting Gao¹, Guorui Zhou¹
♠ Project Leader   † Corresponding Author
¹Kwai Keye   ²CASIA   ³NJU   ⁴THU   ⁵USTC

Abstract

We introduce Thyme: Think Beyond Images. Thyme goes beyond the traditional "thinking with images" paradigm: instead of only reasoning over image content, the model autonomously writes and executes code to carry out a wide range of image processing operations and computations, which improves performance on high-resolution perception and complex reasoning tasks. Thyme is trained in two stages, supervised fine-tuning followed by reinforcement learning, and the reinforcement learning stage uses the GRPO-ATS algorithm to balance exploration during reasoning with precision during code execution.
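The abstract names GRPO-ATS without spelling out how it works. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. We assume "ATS" means adaptive temperature sampling, realized as ordinary sampling for free-form reasoning text and greedy (temperature 0) decoding inside code. We also assume code is wrapped in hypothetical `<code>`/`</code>` delimiters, and the temperature values are placeholders. The sketch shows the standard GRPO group-relative advantage and a per-step temperature switch; the clipped policy-gradient loss and KL term are omitted.

```python
import math
import random
from typing import List, Sequence

# Hypothetical delimiters for generated code; the real output format may differ.
CODE_OPEN = "<code>"
CODE_CLOSE = "</code>"


def group_relative_advantages(rewards: Sequence[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantage: normalize each rollout's reward by its group's mean and std."""
    if not rewards:
        return []
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def in_code_block(prefix: str) -> bool:
    """True if the decoded prefix contains an opening code delimiter that is not yet closed."""
    return prefix.rfind(CODE_OPEN) > prefix.rfind(CODE_CLOSE)


def sampling_temperature(prefix: str, text_temp: float = 1.0, code_temp: float = 0.0) -> float:
    """Assumed ATS rule: explore while writing text, decode code (near-)deterministically."""
    return code_temp if in_code_block(prefix) else text_temp


def sample_token(logits: Sequence[float], temperature: float, rng: random.Random) -> int:
    """Pick a token index from logits at the given temperature (greedy when temperature <= 0)."""
    if temperature <= 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]
```

Under this split, exploration noise stays in the reasoning while generated programs stay deterministic. That is one plausible way to trade off reasoning exploration against code execution precision.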

Showcase of Examples

See how Thyme performs visual reasoning in real-world scenarios

Method

Method Overview

Overall pipeline of Thyme, illustrating the interaction between the model and the sandbox for iterative reasoning and code execution. Key processes such as reasoning, code generation, sandbox execution, and result feedback are highlighted.
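The reason, generate code, execute, feed back loop in the figure can be sketched as follows. This is a minimal illustration, not Thyme's actual sandbox. `fake_model`, the `CODE:`/`ANSWER:`/`OBSERVATION:` turn format, and `run_in_sandbox` are hypothetical stand-ins. The "sandbox" here is just a child Python process with a timeout, which gives much weaker isolation than a real sandbox. The example task is a computational operation of the kind the abstract mentions: mapping a bounding box from a downscaled preview back to full resolution.

```python
import subprocess
import sys
from typing import Callable, List


def run_in_sandbox(code: str, timeout_s: float = 5.0) -> str:
    """Run code in a separate Python process; return its stdout or a short error message."""
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "ERROR: timeout"
    if proc.returncode != 0:
        lines = proc.stderr.strip().splitlines()
        return "ERROR: " + (lines[-1] if lines else "non-zero exit")
    return proc.stdout.strip()


def reason_with_code(question: str, model: Callable[[List[str]], str], max_turns: int = 4) -> str:
    """Alternate model turns and sandbox execution until the model emits an answer."""
    history = [question]
    for _ in range(max_turns):
        reply = model(history)
        history.append(reply)
        if reply.startswith("ANSWER:"):
            return reply[len("ANSWER:"):].strip()
        if reply.startswith("CODE:"):
            result = run_in_sandbox(reply[len("CODE:"):])
            history.append("OBSERVATION: " + result)  # feed the execution result back
    return "no answer within turn budget"


def fake_model(history: List[str]) -> str:
    """Hypothetical model: computes with code first, then answers from the observation."""
    last = history[-1]
    if last.startswith("OBSERVATION:"):
        return "ANSWER: " + last[len("OBSERVATION:"):].strip()
    # Scale a box from a 1000-px-wide preview to the 4000-px-wide original image.
    return "CODE:print(tuple(v * 4000 // 1000 for v in (100, 50, 200, 150)))"
```

For example, `reason_with_code("Map box (100, 50, 200, 150) to full resolution.", fake_model)` runs one code turn and then answers with the scaled box. Errors and timeouts are returned as observations instead of being raised, so the model gets a chance to correct its code on the next turn.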

Benchmark Results

Main Results

Performance Comparison on Perception, Reasoning, and General Tasks. For all open-source models, the best performance for each metric is bolded, and the second best is underlined. Gold-colored font indicates improvement over the baseline Qwen2.5-VL-7B.

Citation

@article{zhang2025thyme,
  title={Thyme: Think Beyond Images},
  author={Zhang, Yi-Fan and Lu, Xingyu and Yin, Shukang and Fu, Chaoyou and Chen, Wei and Hu, Xiao and Wen, Bin and Jiang, Kaiyu and Liu, Changyi and Zhang, Tianke and others},
  journal={arXiv preprint arXiv:2508.11630},
  year={2025}
}