About Me

I am a second-year Ph.D. student at the University of Central Florida, advised by Prof. Chen Chen. I am currently interning at TikTok with Jie Wu and Rui Wang, exploring visual reward fine-tuning. Before this, I conducted research on image editing with Sijie Zhu and Longyin Wen.

Publications

(* indicates equal contribution; # indicates corresponding author.)


ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback.

Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen

ECCV 2024   [Website] [Code] [Demo] [Slides]

  • Existing efforts in controllable generation still offer limited controllability: generated images often deviate from the input conditions.
  • We show that pre-trained discriminative models can serve as powerful visual reward models to improve the controllability in a cycle-consistency manner.
  • We disrupt the consistency between input images and conditions, and enable single-step denoising for efficient reward fine-tuning.
  • We provide a unified and public evaluation of controllability and demonstrate that ControlNet++ comprehensively outperforms existing methods.

AlignDet: Aligning Pre-training and Fine-tuning in Object Detection.

Ming Li*, Jie Wu*#, Xionghui Wang, Chen Chen#, Jie Qin, Xuefeng Xiao, Rui Wang, Min Zheng, Xin Pan.

ICCV 2023   [Website] [Code] [Poster]

  • We point out that existing detection algorithms are constrained by the data, model, and task discrepancies between pre-training and fine-tuning.
  • We propose AlignDet to align these discrepancies, which constructs detection-oriented pre-training by learning classification and regression knowledge.
  • AlignDet makes the first attempt to fully pre-train all kinds of detectors using a completely unsupervised paradigm, by integrating pre-trained backbones.
  • AlignDet can achieve significant improvements across diverse protocols, such as detection algorithm, model backbone, data setting, and training schedule.

FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation.

Jie Qin*, Jie Wu*#, Pengxiang Yan, Ming Li, Yuxi Ren, Xuefeng Xiao, Yitong Wang, Rui Wang, Shilei Wen, Xin Pan, Xingang Wang#.

CVPR 2023   [Website] [Code]

  • We propose FreeSeg, a generic framework to accomplish Unified, Universal and Open-Vocabulary Image Segmentation.
  • FreeSeg optimizes an all-in-one network via one-shot training and adaptive prompt learning, making a single model work for diverse segmentation tasks.
  • FreeSeg establishes SOTA results on three Open-Vocabulary segmentation tasks, and outperforms the best task-specific architectures by a large margin.

DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection.

Manlin Zhang*, Jie Wu*#, Yuxi Ren*, Ming Li, Jie Qin, Xuefeng Xiao, Wei Liu, Rui Wang, Min Zheng, Andy J. Ma#.

arXiv 2023   [Website] [Code]

  • We reveal that the diffusion model is a scalable data engine for object detection.
  • We present DiffusionEngine to provide high-quality and diverse detection data, via a pre-trained diffusion model and an effective Detection-Adapter.
  • We demonstrate that data scaling-up via DiffusionEngine achieves significant improvements in diverse scenarios, including various detection algorithms, self/semi-supervised pre-training, and data/label-scarce and cross-domain settings.

Internships

Honors

Education

  • 2023.09 - Now, Ph.D., Computer Science, University of Central Florida.
  • 2020.09 - 2023.06, Master's, Computer Science, Xiamen University.
  • 2016.09 - 2020.06, Bachelor's, Software Engineering, Hainan University.