Diffusion Model for A Virtual Try-On System
Item type
Conference Contribution
Publisher
IEEE
Abstract
We present a modular virtual try-on (VTON) system that integrates natural language control, efficient diffusion-based image synthesis, and lightweight garment classification. User intent is parsed by a large language model (LLM) into structured visual prompts. A LoRA-tuned diffusion model generates try-on images conditioned on pose and segmentation maps, while a compact classifier, LightClothNet, handles five-category clothing recognition and pre-filtering. The pipeline is built from ComfyUI nodes and orchestrated via Dify. Compared with existing methods, the proposed system offers improved realism, garment-pose alignment, and controllability. Our evaluations on the DressCode and VITON-HD datasets show that LoRA fine-tuning enhances fidelity under limited data, while LightClothNet achieves up to 91.76% precision and a 0.91 F1-score with low latency. These results demonstrate how multimodal control, lightweight classification, and diffusion generation can be unified for fast, flexible, and user-driven VTON applications.
Source
Wu, J., Nguyen, M., & Yan, W. Q. (2025). A diffusion model for virtual try-on systems. In 2024 39th International Conference on Image and Vision Computing New Zealand (IVCNZ) (pp. 1–6). IEEE. https://doi.org/10.1109/IVCNZ63833.2024.11281834
Rights statement
Copyright © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
