Lujun Li

Lujun LI (李路军)

Ph.D. candidate at the Hong Kong University of Science and Technology (HKUST)

Clear Water Bay Peninsula, New Territories, Hong Kong

lilujunai@gmail.com, lliee@ust.hk

About Me

I am currently a final-year Ph.D. candidate at HKUST, supervised by Prof. Yi-Ke Guo (Provost; Fellow of REng, HKEng, and IEEE).

My research focuses on advancing Efficient Machine Learning and Large Language Models to make AGI smaller, faster, greener and cheaper via novel compression techniques:

Efficient Mixture-of-Experts (MoE): MoE-SVD (ICML'25); D2-MoE (ICML'25)

Efficient LLM Fine-tuning: NoRA (ICCV'25), AIRA (ICCV'25)

Efficient Large Language Models: Pruner-Zero (ICML'24), DSA (NeurIPS'24), ALS (NeurIPS'24), STBLLM (ICLR'25)

Automated Efficient ML: EMQ (ICCV'23), ParZC (AAAI'25), Auto-Prox (AAAI'24), SasWOT (AAAI'24), Auto-GAS (ECCV'24), AttnZero (ECCV'24)

Automated Distillation: DisWOT (CVPR'23), Auto-KD (ICCV'23), KD-Zero (NeurIPS'23), DetKDS (ICML'24), Auto-DAS (ECCV'24)

Knowledge Distillation: Tf-FD (ECCV'22), SHAKE (NeurIPS'22), NORM (ICLR'23)

I serve or have served as an Area Chair for ICLR'25, NeurIPS'25, ACM-MM'25, BMVC'24, and more. I was awarded the Ant InTech Scholarship–Future and the DAAD NeT-AI Fellowship (2024).

I'll be at ICCV'25 in Hawaii (Oct 19–23) in person; happy to chat!

News: Two papers accepted to ICCV'25 and three papers accepted to ICML'25 (on Efficient MoE LLMs).

Selected Preprints:

First-author Publications (22)

*: Co-first author, **: Corresponding author or project leadership

Efficient Machine Learning and Large Language Models

AIRA: Activation-Informed Low-Rank Adaptation for Large Models
Lujun Li, Dezhi Li, Cheng Lin, Wei Li, Wei Xue, Sirui Han, Yike Guo
International Conference on Computer Vision (ICCV-2025)
CCF-A, Top Conference in Computer Vision
AIRA introduces: (1) Outlier-weighted SVD initialization, (2) Outlier-driven dynamic rank assignment, and (3) Activation-informed training.
Efficient Fine-Tuning of Large Models via Nested Low-Rank Adaptation
Lujun Li, Cheng Lin, Dezhi Li, You-Liang Huang, Wei Li, Tianyu Wu, Jie Zou, Wei Xue, Sirui Han, Yike Guo
International Conference on Computer Vision (ICCV-2025)
CCF-A, Top Conference in Computer Vision
We present NoRA, a novel nested parameter-efficient LoRA structure that optimizes large-model fine-tuning by employing serial structures.
MoE-SVD: Structured Mixture-of-Experts LLMs Compression via Singular Value Decomposition
Wei Li, Lujun Li*, Hao Gu, You-Liang Huang, Mark G. Lee, Shengjie Sun, Wei Xue, Yike Guo
Forty-second International Conference on Machine Learning (ICML-2025)
CCF-A, Top Conference in Artificial Intelligence
In this paper, we present a novel training-free compressor for MoE LLMs that uses SVD to decompose experts into smaller matrices, then shares and trims them to save memory and speed up inference.
Delta Decompression for MoE-based LLMs Compression
Hao Gu, Wei Li, Lujun Li*, Qiyuan Zhu, Mark G. Lee, Shengjie Sun, Wei Xue, Yike Guo
Forty-second International Conference on Machine Learning (ICML-2025)
CCF-A, Top Conference in Artificial Intelligence
In this paper, we present D2-MoE, a new delta decompression MoE compressor that decomposes expert weights into a shared base weight and unique delta weights.
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
Peijie Dong, Lujun Li*, Yuedong Zhong, Dayou Du, Ruibo Fan, Yuhan Chen, Zhenheng Tang, Qiang Wang, Wei Xue, Yike Guo, Xiaowen Chu
The Thirteenth International Conference on Learning Representations (ICLR-2025)
THU-A, Top Conference in Artificial Intelligence
We introduce STBLLM, a structured binarization approach that breaks the 1-bit barrier in large language models.
ParZC: Parametric Zero-Cost Proxies for Efficient NAS
Peijie Dong, Lujun Li*, Zhenheng Tang, Zimian Wei, Xiang Liu, Qiang Wang, Xiaowen Chu
Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-2025)
CCF-A, Top Conference in Artificial Intelligence
Our Parametric Zero-Cost Proxies (ParZC) framework improves zero-shot Neural Architecture Search by addressing unequal node importance and introducing novel techniques for uncertainty estimation and architecture ranking.
Discovering Sparsity Allocation for Layer-wise Pruning of Large Language Models
Lujun Li, Peijie Dong, Zhenheng Tang, Xiang Liu, Qiang Wang, Wenhan Luo, Wei Xue, Qifeng Liu, Xiaowen Chu, Yike Guo
Conference on Neural Information Processing Systems (NeurIPS-2024)
CCF-A, Top Conference in Machine Learning
In this paper, we present DSA, the first automated framework for discovering sparsity allocation schemes for layer-wise pruning in Large Language Models.
Adaptive Layer Sparsity for Large Language Models via Activation Correlation Assessment
Wei Li, Lujun Li**, Mark G. Lee, Shengjie Sun
Conference on Neural Information Processing Systems (NeurIPS-2024)
CCF-A, Top Conference in Machine Learning
In this paper, we present an approach called Adaptive Layer Sparsity for optimizing large language models by selectively pruning features in intermediate layers.
Pruner-Zero: Evolving Symbolic Pruning Metric From Scratch for Large Language Models
Peijie Dong, Lujun Li*, Zhenheng Tang, Xiang Liu, Xinglin Pan, Qiang Wang, Xiaowen Chu
International Conference on Machine Learning (ICML-2024)
CCF-A, Top Conference in Machine Learning
We propose the Pruner-Zero framework to automatically devise pruning metrics for post-training pruning of LLMs.
DetKDS: Knowledge Distillation Search for Object Detectors
Lujun Li, Yufan Bao, Peijie Dong, Chuanguang Yang, Anggeng Li, Wenhan Luo, Qifeng Liu, Wei Xue, Yike Guo
International Conference on Machine Learning (ICML-2024)
CCF-A, Top Conference in Machine Learning
In this paper, we present DetKDS, the first knowledge distillation search framework to enhance any detectors by searching for optimal distillation policies.
AttnZero: Efficient Attention Discovery for Vision Transformers
Lujun Li, Zimian Wei, Peijie Dong, Wenhan Luo, Wei Xue, Qifeng Liu, Yike Guo
European Conference on Computer Vision (ECCV-2024)
Top Conference in Computer Vision
In this paper, we present AttnZero, the first framework for automatically discovering efficient attention modules tailored for Vision Transformers (ViTs).
Auto-GAS: Automated Proxy Discovery for Training-free Generative Architecture Search
Lujun Li, Haosen Sun, Shiwen Li, Peijie Dong, Wenhan Luo, Wei Xue, Qifeng Liu, Yike Guo
European Conference on Computer Vision (ECCV-2024)
Top Conference in Computer Vision
In this paper, we introduce Auto-GAS, the first training-free Generative Architecture Search (GAS) framework enabled by an auto-discovered proxy.
Auto-DAS: Automated Proxy Discovery for Training-free Distillation-aware Architecture Search
Haosen Sun, Lujun Li, Peijie Dong, Zimian Wei, Shitong Shao
European Conference on Computer Vision (ECCV-2024)
Top Conference in Computer Vision
In this paper, we present Auto-DAS, an automatic proxy discovery framework using an Evolutionary Algorithm (EA) for training-free DAS.
KD-Zero: Evolving Knowledge Distiller for Any Teacher-Student Pairs
Lujun Li, Peijie Dong, Anggeng Li, Zimian Wei, Ya Yang
Conference on Neural Information Processing Systems (NeurIPS-2023)
CCF-A, Top Conference in Machine Learning
We present KD-Zero, the first auto-search framework that evolves the best distiller from scratch to alleviate teacher-student gaps.
Automated Knowledge Distillation via Monte Carlo Tree Search
Lujun Li, Peijie Dong, Zimian Wei, Ya Yang
International Conference on Computer Vision (ICCV-2023)
CCF-A, Top Conference in Computer Vision
In this paper, we present Auto-KD, the first automated search framework for optimal knowledge distillation design.
EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization
Peijie Dong, Lujun Li*, Zimian Wei, Xin Niu, Zhiliang Tian, Hengyue Pan
International Conference on Computer Vision (ICCV-2023)
CCF-A, Top Conference in Computer Vision
We first build MQ-Bench-101 and develop an automatic proxy search framework for mixed-precision quantization (MQ) via evolutionary algorithms.
Auto-Prox: Training-Free Vision Transformer Architecture Search via Automatic Proxy Discovery
Zimian Wei, Lujun Li*, Peijie Dong, Zheng Hui, Anggeng Li, Menglong Lu, Hengyue Pan, Dongsheng Li
Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-2024)
CCF-A, Top Conference in Artificial Intelligence
We first build ViT-Bench-101 and develop zero-cost proxy search for Vision Transformers across multiple datasets.
SasWOT: Real-time Semantic Segmentation Architecture Search WithOut Training
Chendi Zhu, Lujun Li*, Yuli Wu, Zheng Hui, Zhengxing Sun
Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-2024)
CCF-A, Top Conference in Artificial Intelligence
We present the first training-free architecture search framework for Real-time Semantic Segmentation.
Shadow Knowledge Distillation: Bridging Offline and Online Knowledge Transfer
Lujun Li, Zhe Jin
Conference on Neural Information Processing Systems (NeurIPS-2022)
CCF-A, Top Conference in Machine Learning
We present SHAKE with reversed distillation and shadow head to bridge offline and online knowledge transfer, achieving superior performance in multiple tasks and scenarios.
Self-Regulated Feature Learning via Teacher-free Feature Distillation
Lujun Li
European Conference on Computer Vision (ECCV-2022)
CCF-B, Top Conference in Computer Vision
We propose Tf-FD for reusing channel-wise and layer-wise meaningful features within the student to provide teacher-like knowledge without an additional model.
DisWOT: Student Architecture Search for Distillation WithOut Training
Peijie Dong, Lujun Li**, Zimian Wei
IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR-2023)
CCF-A, Top Conference in Computer Vision
We propose a training-free framework that searches for the optimal student architecture for a given teacher to boost distillation performance.
NORM: Knowledge Distillation via N-to-One Representation Matching
Xiaolong Liu, Lujun Li**, Chao Li, Anbang Yao
International Conference on Learning Representations (ICLR-2023)
Top Conference in Machine Learning
We present a new knowledge distillation method via N-to-one representation matching.

Honors

Services

Area Chair

Conference Review

Journal Review