
Diffusion Model for a Virtual Try-On System

Item type

Conference Contribution

Publisher

IEEE

Abstract

We present a modular virtual try-on (VTON) system that integrates natural language control, efficient diffusion-based image synthesis, and lightweight garment classification. User intent is parsed by a large language model (LLM) into structured visual prompts. A LoRA-tuned diffusion model generates try-on images conditioned on pose and segmentation maps, while a compact classifier, LightClothNet, handles five-category clothing recognition and pre-filtering. The pipeline is built using ComfyUI nodes and orchestrated via Dify. Compared with existing methods, the proposed system offers improved realism, garment-pose alignment, and controllability. Our evaluations on the DressCode and VITON-HD datasets show that LoRA fine-tuning enhances fidelity under limited data, while LightClothNet achieves up to 91.76% precision and a 0.91 F1-score with low latency. These results demonstrate that multimodal control, lightweight classification, and diffusion generation can be unified for fast, flexible, and user-driven VTON applications.

Source

Wu, J., Nguyen, M., & Yan, W. Q. (2025). A diffusion model for virtual try-on systems. In 2024 39th International Conference on Image and Vision Computing New Zealand (IVCNZ) (pp. 1–6). IEEE. https://doi.org/10.1109/IVCNZ63833.2024.11281834

Rights statement

Copyright © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.