
ChatClothes: An AI-Powered Virtual Try-On System

dc.contributor.advisor: Yan, Wei Qi
dc.contributor.author: Zhang, Yuchao
dc.date.accessioned: 2025-11-25T20:44:23Z
dc.date.available: 2025-11-25T20:44:23Z
dc.date.issued: 2025
dc.description.abstract: With the advancement of deep learning, latent diffusion models, and large language models (LLMs), virtual try-on (VTON) has emerged as a promising solution for personalized fashion experiences in online shopping, digital design, and augmented retail. This thesis proposes ChatClothes, a modular and multimodal VTON system that integrates controllable diffusion-based generation with dialogue-driven garment interaction. The system architecture is orchestrated by Dify, with ComfyUI managing the visual generation pipeline and Ollama hosting local LLMs. At its core, ChatClothes employs DeepSeek, a customized large language model that interprets natural language instructions and transforms them into structured prompts for image generation and interactive refinement. This prompt-based guidance enhances semantic alignment and enables intuitive user control beyond predefined attribute labels. To improve structural consistency and detail fidelity in image synthesis, this work applies Low-Rank Adaptation (LoRA) to fine-tune the original OOTDiffusion model. Without altering the backbone architecture, this strategy focuses on enhancing pose alignment, hand generation accuracy, and garment texture reconstruction. By integrating LoRA modules, the model achieves effective adaptation and fine-grained refinement even under limited training resources. To support garment classification, YOLO12n-LC, a lightweight variant based on YOLO12n, is developed to balance accuracy, speed, and model size. It achieves competitive performance across multiple clothing categories while maintaining feasibility for device-level deployment. A complete system workflow connects image preprocessing, language understanding, garment classification, image synthesis, and output evaluation. Experiments on datasets such as DressCode and VITON-HD provide initial validation of the system in terms of realism, controllability, and structural preservation.
This work presents a unified framework bridging vision-language interaction with diffusion-based generation, establishing a foundation for scalable, user-centered, and device-adaptable fashion AI systems applicable across e-commerce, AR fitting mirrors, personalization platforms, and automated outfit design.
dc.identifier.uri: http://hdl.handle.net/10292/20210
dc.language.iso: en
dc.publisher: Auckland University of Technology
dc.rights.accessrights: OpenAccess
dc.title: ChatClothes: An AI-Powered Virtual Try-On System
dc.type: Thesis
thesis.degree.grantor: Auckland University of Technology
thesis.degree.name: Master of Computer and Information Sciences

Files

Original bundle

Name: ZhangY.pdf
Size: 9.82 MB
Format: Adobe Portable Document Format
Description: Thesis

License bundle

Name: license.txt
Size: 890 B
Format: Item-specific license agreed upon to submission
Description:

Collections