About Me

I am a first-year Computer Science Ph.D. student at the University of Central Florida, under the supervision of Dr. Chen Chen. I had the opportunity to serve as an intern at ByteDance, where I collaborated with Jie Wu from January 2022 to July 2023. Prior to that, I earned my Master’s degree from Xiamen University and my Bachelor’s degree from Hainan University.

My primary research interest is in Data-centric AI, aiming to explore the role and applications of data across a range of AIGC, pre-training, and downstream tasks. Specifically, I am investigating how to train models with more efficient data or annotations and how to construct automated data pipelines for various tasks. Recently, I have been working on how to use discriminative models and large multi-modal models to improve diffusion models.


(* indicates equal contribution; # indicates corresponding authorship.)


ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback.

Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen

arXiv 2024   [Website] [Code] [Demo]

  • Existing efforts in controllable generation still perform poorly in controllability, with generated images deviating from input conditions.
  • We show that pre-trained discriminative models can serve as powerful visual reward models to improve the controllability in a cycle-consistency manner.
  • We disrupt the consistency between input images and conditions, and enable single-step denoising for efficient reward fine-tuning.
  • We provide a unified and public evaluation of controllability and demonstrate that ControlNet++ comprehensively outperforms existing methods.

AlignDet: Aligning Pre-training and Fine-tuning in Object Detection.

Ming Li*, Jie Wu*#, Xionghui Wang, Chen Chen#, Jie Qin, Xuefeng Xiao, Rui Wang, Min Zheng, Xin Pan.

ICCV 2023   [Website] [Code] [Poster]

  • We point out that existing detection algorithms are constrained by the data, model, and task discrepancies between pre-training and fine-tuning.
  • We propose AlignDet to align these discrepancies, which constructs detection-oriented pre-training by learning classification and regression knowledge.
  • AlignDet makes the first attempt to fully pre-train all kinds of detectors using a completely unsupervised paradigm, by integrating pre-trained backbones.
  • AlignDet can achieve significant improvements across diverse protocols, such as detection algorithm, model backbone, data setting, and training schedule.

FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation.

Jie Qin*, Jie Wu*#, Pengxiang Yan, Ming Li, Yuxi Ren, Xuefeng Xiao, Yitong Wang, Rui Wang, Shilei Wen, Xin Pan, Xingang Wang#.

CVPR 2023   [Website] [Code]

  • We propose FreeSeg, a generic framework to accomplish Unified, Universal and Open-Vocabulary Image Segmentation.
  • FreeSeg optimizes an all-in-one network via one-shot training and adaptive prompt learning, making a single model work for diverse segmentation tasks.
  • FreeSeg establishes SOTA results on three Open-Vocabulary segmentation tasks, and outperforms the best task-specific architectures by a large margin.

DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection.

Manlin Zhang*, Jie Wu*#, Yuxi Ren*, Ming Li, Jie Qin, Xuefeng Xiao, Wei Liu, Rui Wang, Min Zheng, Andy J. Ma#.

arXiv 2023   [Website] [Code]

  • We reveal that the diffusion model is a scalable data engine for object detection.
  • We present DiffusionEngine to provide high-quality and diverse detection data, using a pre-trained diffusion model and an effective Detection-Adapter.
  • We demonstrate that scaling up data via DiffusionEngine can achieve significant improvements in diverse scenarios, including various detection algorithms, self/semi-supervised pre-training, and data/label-scarce and cross-domain settings.

LucidDreaming: Controllable Object-Centric 3D Generation.

Zhaoning Wang, Ming Li, Chen Chen#.

arXiv 2023   [Website]

  • We propose LucidDreaming, a plug-and-play framework to achieve controllable object-centric 3D generation with Large Language Models.
  • We introduce clipped ray sampling and object-centric density bias initialization to generate multiple discrete 3D objects within a single scene.
  • LucidDreaming offers a standard for the dataset and evaluation metrics, along with clear strategies for the development of controllable 3D generation.

Multi-granularity Distillation Scheme Towards Lightweight Semi-supervised Semantic Segmentation.

Jie Qin*, Jie Wu*#, Ming Li, Xuefeng Xiao, Min Zheng, Xingang Wang#.

ECCV 2022   [Code]

  • We offer the first attempt to obtain a lightweight model for semi-supervised semantic segmentation.
  • We propose a multi-granularity distillation (MGD) scheme to distill task-specific concepts from two complementary teacher models into a student one.
  • The labeled-unlabeled data cooperative distillation and the hierarchical loss paradigm enable the lightweight model to learn with fewer annotations.
  • MGD can outperform competitive approaches by a large margin under diverse partition protocols with a significant FLOPs reduction.




  • 2023.09 - Now, Ph.D., Computer Science, University of Central Florida.
  • 2020.09 - 2023.06, Master, Computer Science, Xiamen University.
  • 2016.09 - 2020.06, Bachelor, Software Engineering, Hainan University.