About Me
I am a second-year PhD student in Computer Science at the University of Central Florida, with Prof. Chen Chen as my advisor. I am currently interning at TikTok with Sijie Zhu. Prior to this, I worked as an intern at ByteDance with Jie Wu.
My primary research interest is Data-centric AI: investigating how to improve models with more effective and efficient data, and how to construct automated data pipelines for various tasks. Recently, I have been working on image and video generation and editing.
Publications
(* indicates equal contribution; # indicates corresponding authorship.)
ControlNet++: Improving Conditional Controls with Efficient Consistency Feedback.
Ming Li, Taojiannan Yang, Huafeng Kuang, Jie Wu, Zhaoning Wang, Xuefeng Xiao, Chen Chen.
ECCV 2024 [Website] [Code] [Demo]
- Existing efforts in controllable generation still perform poorly in controllability, with generated images deviating from input conditions.
- We show that pre-trained discriminative models can serve as powerful visual reward models to improve controllability in a cycle-consistent manner.
- We disrupt the consistency between input images and conditions, and enable single-step denoising for efficient reward fine-tuning.
- We provide a unified and public evaluation of controllability and demonstrate that ControlNet++ comprehensively outperforms existing methods.
AlignDet: Aligning Pre-training and Fine-tuning in Object Detection.
Ming Li*, Jie Wu*#, Xionghui Wang, Chen Chen#, Jie Qin, Xuefeng Xiao, Rui Wang, Min Zheng, Xin Pan.
ICCV 2023 [Website] [Code] [Poster]
- We point out that existing detection algorithms are constrained by the data, model, and task discrepancies between pre-training and fine-tuning.
- We propose AlignDet, which aligns these discrepancies by constructing detection-oriented pre-training that learns classification and regression knowledge.
- AlignDet makes the first attempt to fully pre-train all kinds of detectors using a completely unsupervised paradigm, by integrating pre-trained backbones.
- AlignDet can achieve significant improvements across diverse protocols, such as detection algorithm, model backbone, data setting, and training schedule.
FreeSeg: Unified, Universal and Open-Vocabulary Image Segmentation.
Jie Qin*, Jie Wu*#, Pengxiang Yan, Ming Li, Ren Yuxi, Xuefeng Xiao, Yitong Wang, Rui Wang, Shilei Wen, Xin Pan, Xingang Wang#.
- We propose FreeSeg, a generic framework to accomplish Unified, Universal and Open-Vocabulary Image Segmentation.
- FreeSeg optimizes an all-in-one network via one-shot training and adaptive prompt learning, making a single model work for diverse segmentation tasks.
- FreeSeg establishes SOTA results on three Open-Vocabulary segmentation tasks, and outperforms the best task-specific architectures by a large margin.
DiffusionEngine: Diffusion Model is Scalable Data Engine for Object Detection.
Manlin Zhang*, Jie Wu*#, Yuxi Ren*, Ming Li, Jie Qin, Xuefeng Xiao, Wei Liu, Rui Wang, Min Zheng, Andy J. Ma#.
- We reveal that the diffusion model is a scalable data engine for object detection.
- We present DiffusionEngine, which provides high-quality and diverse detection data using a pre-trained diffusion model and an effective Detection-Adapter.
- We demonstrate that scaling up data via DiffusionEngine achieves significant improvements in diverse scenarios, including various detection algorithms, self-/semi-supervised pre-training, data/label-scarce settings, and cross-domain settings.
Multi-granularity Distillation Scheme Towards Lightweight Semi-supervised Semantic Segmentation.
Jie Qin*, Jie Wu*#, Ming Li, Xuefeng Xiao, Min Zheng, Xingang Wang#.
ECCV 2022 [Code]
Frame Interpolation with Consecutive Brownian Bridge Diffusion.
Zonglin Lyu, Ming Li, Jianbo Jiao, Chen Chen.
ACM MM 2024 [Website] [Code]
IL-NeRF: Incremental Learning for Neural Radiance Fields with Camera Pose Alignment.
Letian Zhang, Ming Li, Chen Chen, Jie Xu.
arXiv 2024 [Website]
LucidDreaming: Controllable Object-Centric 3D Generation.
Zhaoning Wang, Ming Li, Chen Chen.
ECCV 2024 Workshop: Computer Vision For Videogames (CV2) [Website]
DLIP: Distilling Language-Image Pre-training.
Huafeng Kuang, Jie Wu, Xiawu Zheng, Ming Li, Xuefeng Xiao, Rui Wang, Min Zheng, Rongrong Ji.
arXiv 2023
First Place Solution to the CVPR’2023 AQTC Challenge: A Function-Interaction Centric Approach with Spatiotemporal Visual-Language Alignment.
Tom Tongjia Chen, Hongshan Yu, Zhengeng Yang, Ming Li, Zechuan Li, Jingwen Wang, Wei Miao, Wei Sun, Chen Chen.
CVPR 2023 Workshop [Challenge] [Code]
First Place Solution to the CVPR’2022 AVA Challenge: Parallel Pre-trained Transformers for Synthetic Data-based Instance Segmentation.
Ming Li*, Jie Wu*#, Jinhang Cai, Jie Qin, Yuxi Ren, Xuefeng Xiao, Min Zheng, Rui Wang, Xin Pan.
CVPR 2022 Workshop [Challenge]
Internships
- 2024.05 - Now, TikTok, ByteDance, San Jose, USA.
- 2022.01 - 2023.07, ByteDance, Shenzhen, China.
Honors
- Champion of CVPR 2023 Long-form Video Understanding and Generation Challenge (Track 3).
- ORCGS Doctoral Fellowship, the University of Central Florida. 2023.
- Champion of CVPR 2022 AVA Accessibility Vision and Autonomy Challenge.
- Excellent Graduation Thesis of Hainan University. 2020.
- China National Inspirational Scholarship. 2017 and 2018.
Education
- 2023.09 - Now, Ph.D., Computer Science, University of Central Florida.
- 2020.09 - 2023.06, Master's, Computer Science, Xiamen University.
- 2016.09 - 2020.06, Bachelor, Software Engineering, Hainan University.