Knowledge Distillation With Differentiable Optimal Transport on Graph Neural Networks

aut.event.date: 2024-12-02 to 2024-12-06
aut.event.place: Auckland
dc.contributor.author: Li, Mengyao
dc.contributor.author: Liu, Yanbin
dc.contributor.author: Chen, Ling
dc.date.accessioned: 2026-03-12T19:15:13Z
dc.date.available: 2026-03-12T19:15:13Z
dc.date.issued: 2025-07-22
dc.description.abstract: Knowledge distillation (KD) transfers knowledge from a large, well-trained teacher network to a smaller student network, improving the student's performance without extra computational cost. Traditional KD methods usually focus on the logits or intermediate features. However, they might overlook the inherent correlations and suffer from capacity gaps due to the distinct architectures of the student and teacher. Relation-based distillation methods try to bridge these correlations but usually require a large memory bank for loss computation, making them less efficient. To overcome these limitations, we propose a novel and efficient Optimal Transport-based Graph Distillation (OTGD) method. First, OTGD constructs attributed graphs for the teacher and student respectively, which are then utilized to capture both individual and relational knowledge through graph neural networks (GNNs). Then, we devise an innovative differentiable optimal transport objective to distill the teacher knowledge before and after GNN learning, effectively incorporating both feature-level and correlation-level knowledge. Specifically, our optimal transport objective is solved by the Sinkhorn algorithm without relying on an extra memory bank. This design makes our method efficient and numerically stable. Comprehensive experiments conducted on two benchmark datasets with diverse network architectures demonstrate that OTGD outperforms state-of-the-art methods.
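The abstract describes solving a differentiable optimal transport objective with the Sinkhorn algorithm. As a rough illustration only (not the authors' OTGD implementation), an entropic OT loss between teacher and student feature matrices could be sketched in PyTorch as follows; the tensor names and the hyperparameters eps and n_iters are assumptions.

import torch

def sinkhorn_ot_loss(teacher_feats, student_feats, eps=0.1, n_iters=50):
    # Hypothetical sketch: entropic OT distance between two feature sets,
    # solved by Sinkhorn iterations; differentiable, with no memory bank.
    t = torch.nn.functional.normalize(teacher_feats, dim=1)
    s = torch.nn.functional.normalize(student_feats, dim=1)
    cost = torch.cdist(t, s, p=2) ** 2            # pairwise squared distances, shape (n, m)

    n, m = cost.shape
    mu = torch.full((n,), 1.0 / n, device=cost.device)   # uniform source marginal
    nu = torch.full((m,), 1.0 / m, device=cost.device)   # uniform target marginal

    K = torch.exp(-cost / eps)                    # Gibbs kernel from the cost matrix
    u = torch.ones_like(mu)
    for _ in range(n_iters):                      # alternating Sinkhorn scaling updates
        v = nu / (K.t() @ u + 1e-9)
        u = mu / (K @ v + 1e-9)

    plan = u.unsqueeze(1) * K * v.unsqueeze(0)    # approximate transport plan
    return (plan * cost).sum()                    # OT loss, usable as a distillation term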
dc.identifier.citation: In: Mahmud, M., Doborjeh, M., Wong, K., Leung, A.C.S., Doborjeh, Z., Tanveer, M. (eds) Neural Information Processing (ICONIP 2024). Lecture Notes in Computer Science, vol 15292, pp. 194–209. Proceedings of the International Conference on Neural Information Processing, December 2–6, 2024, Auckland, New Zealand. https://iconip2024.org/
dc.identifier.doi: 10.1007/978-981-96-6594-5_15
dc.identifier.uri: http://hdl.handle.net/10292/20761
dc.publisher: Springer Nature
dc.relation.uri: https://link.springer.com/chapter/10.1007/978-981-96-6594-5_15
dc.rights: This is the Author's Accepted Manuscript version of a conference paper published in the Proceedings of the International Conference on Neural Information Processing (ICONIP 2024). © 2026 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. The Version of Record is available at DOI: 10.1007/978-981-96-6594-5_15
dc.rights.accessrights: OpenAccess
dc.subject: Knowledge distillation
dc.subject: Graph neural network
dc.subject: Optimal transport
dc.title: Knowledge Distillation With Differentiable Optimal Transport on Graph Neural Networks
dc.type: Conference Contribution
pubs.elements-id: 593387

Files

Original bundle

Name: ICONIP24___KD_with_OT_loss.pdf
Size: 3.06 MB
Format: Adobe Portable Document Format
Description: Author Accepted Manuscript under publisher's embargo until 22 July 2026

Name: ICONIP 2024 programme.pdf
Size: 3.8 MB
Format: Adobe Portable Document Format
Description: Conference programme

License bundle

Name: license.txt
Size: 1.37 KB
Format: Plain Text