Lujun Li

Lujun LI (李路军)

Ph.D. candidate at the Hong Kong University of Science and Technology (HKUST)

Clear Water Bay Peninsula, New Territories, Hong Kong

lilujunai@gmail.com, lliee@ust.hk

About Me

I am currently a final-year Ph.D. candidate at HKUST, supervised by Prof. Yi-Ke Guo (Provost; Fellow of REng, HKEng, and IEEE).

My research focuses on advancing Efficient Machine Learning and Large Language Models to make AGI smaller, faster, greener and cheaper via novel compression techniques:

Efficient Mixture-of-Experts (MoE): MoE-SVD (ICML'25); D2-MoE (ICML'25)

Efficient LLM Fine-tuning: NoRA (ICCV'25), AIRA (ICCV'25)

Efficient Large Language Models: Pruner-Zero (ICML'24), DSA (NeurIPS'24), ALS (NeurIPS'24), STBLLM (ICLR'25)

Automated Efficient ML: EMQ (ICCV'23), ParZC (AAAI'25), Auto-Prox (AAAI'24), SasWOT (AAAI'24), Auto-GAS (ECCV'24), AttnZero (ECCV'24)

Automated Distillation: DisWOT (CVPR'23), Auto-KD (ICCV'23), KD-Zero (NeurIPS'23), DetKDS (ICML'24), Auto-DAS (ECCV'24)

Knowledge Distillation: Tf-FD (ECCV'22), SHAKE (NeurIPS'22), NORM (ICLR'23)

I serve or have served as an Area Chair for ICLR'25, NeurIPS'25, ACM-MM'25, BMVC'24, and more. I was awarded the Ant InTech Scholarship–Future and the DAAD NeT-AI Fellowship (2024).

I'll be at ICCV'25 in Hawaii (Oct 19–23) in person; happy to chat!

News: Two papers accepted to ICCV'25 and three papers accepted to ICML'25 (on Efficient MoE LLMs).

Selected Preprints:

First-author Publications (22)

*: Co-first author, **: Corresponding author or project leadership

Efficient Machine Learning and Large Language Models

AIRA: Activation-Informed Low-Rank Adaptation for Large Models
Lujun Li, Dezhi Li, Cheng Lin, Wei Li, Wei Xue, Sirui Han, Yike Guo
International Conference on Computer Vision (ICCV-2025)
CCF-A, Top Conference in Computer Vision
AIRA introduces: (1) Outlier-weighted SVD initialization, (2) Outlier-driven dynamic rank assignment, and (3) Activation-informed training.
Efficient Fine-Tuning of Large Models via Nested Low-Rank Adaptation
Lujun Li, Cheng Lin, Dezhi Li, You-Liang Huang, Wei Li, Tianyu Wu, Jie Zou, Wei Xue, Sirui Han, Yike Guo
International Conference on Computer Vision (ICCV-2025)
CCF-A, Top Conference in Computer Vision
We present NoRA, a novel nested parameter-efficient LoRA structure that optimizes large-model fine-tuning by employing serial structures.
MoE-SVD: Structured Mixture-of-Experts LLMs Compression via Singular Value Decomposition
Wei Li, Lujun Li*, Hao Gu, You-Liang Huang, Mark G. Lee, Shengjie Sun, Wei Xue, Yike Guo
Forty-second International Conference on Machine Learning (ICML-2025)
CCF-A, Top Conference in Artificial Intelligence
In this paper, we present a novel training-free compressor for MoE LLMs that uses SVD to decompose experts into smaller matrices, then shares and trims them to save memory and speed up inference.
Delta Decompression for MoE-based LLMs Compression
Hao Gu, Wei Li, Lujun Li*, Qiyuan Zhu, Mark G. Lee, Shengjie Sun, Wei Xue, Yike Guo
Forty-second International Conference on Machine Learning (ICML-2025)
CCF-A, Top Conference in Artificial Intelligence
In this paper, we present D2-MoE, a new delta decompression MoE compressor that decomposes expert weights into a shared base weight and unique delta weights.
STBLLM: Breaking the 1-Bit Barrier with Structured Binary LLMs
Peijie Dong, Lujun Li*, Yuedong Zhong, Dayou Du, Ruibo Fan, Yuhan Chen, Zhenheng Tang, Qiang Wang, Wei Xue, Yike Guo, Xiaowen Chu
The Thirteenth International Conference on Learning Representations (ICLR-2025)
THU-A, Top Conference in Artificial Intelligence
We introduce STBLLM, a structured binarization approach that breaks the 1-bit barrier in large language models.
ParZC: Parametric Zero-Cost Proxies for Efficient NAS
Peijie Dong, Lujun Li*, Zhenheng Tang, Zimian Wei, Xiang Liu, Qiang Wang, Xiaowen Chu
Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-2025)
CCF-A, Top Conference in Artificial Intelligence
Our Parametric Zero-Cost Proxies (ParZC) framework improves zero-shot Neural Architecture Search by addressing unequal node importance and introducing novel techniques for uncertainty estimation and architecture ranking.
Discovering Sparsity Allocation for Layer-wise Pruning of Large Language Models
Lujun Li, Peijie Dong, Zhenheng Tang, Xiang Liu, Qiang Wang, Wenhan Luo, Wei Xue, Qifeng Liu, Xiaowen Chu, Yike Guo
Conference on Neural Information Processing Systems (NeurIPS-2024)
CCF-A, Top Conference in Machine Learning
In this paper, we present DSA, the first automated framework for discovering sparsity allocation schemes for layer-wise pruning in Large Language Models.
Adaptive Layer Sparsity for Large Language Models via Activation Correlation Assessment
Wei Li, Lujun Li**, Mark G. Lee, Shengjie Sun
Conference on Neural Information Processing Systems (NeurIPS-2024)
CCF-A, Top Conference in Machine Learning
In this paper, we present an approach called Adaptive Layer Sparsity for optimizing large language models by selectively pruning features in intermediate layers.
Pruner-Zero: Evolving Symbolic Pruning Metric From Scratch for Large Language Models
Peijie Dong, Lujun Li*, Zhenheng Tang, Xiang Liu, Xinglin Pan, Qiang Wang, Xiaowen Chu
International Conference on Machine Learning (ICML-2024)
CCF-A, Top Conference in Machine Learning
We propose the Pruner-Zero framework to automatically devise pruning metrics for post-training pruning of LLMs.
DetKDS: Knowledge Distillation Search for Object Detectors
Lujun Li, Yufan Bao, Peijie Dong, Chuanguang Yang, Anggeng Li, Wenhan Luo, Qifeng Liu, Wei Xue, Yike Guo
International Conference on Machine Learning (ICML-2024)
CCF-A, Top Conference in Machine Learning
In this paper, we present DetKDS, the first knowledge distillation search framework to enhance any detectors by searching for optimal distillation policies.
AttnZero: Efficient Attention Discovery for Vision Transformers
Lujun Li, Zimian Wei, Peijie Dong, Wenhan Luo, Wei Xue, Qifeng Liu, Yike Guo
European Conference on Computer Vision (ECCV-2024)
Top Conference in Computer Vision
In this paper, we present AttnZero, the first framework for automatically discovering efficient attention modules tailored for Vision Transformers (ViTs).
Auto-GAS: Automated Proxy Discovery for Training-free Generative Architecture Search
Lujun Li, Haosen Sun, Shiwen Li, Peijie Dong, Wenhan Luo, Wei Xue, Qifeng Liu, Yike Guo
European Conference on Computer Vision (ECCV-2024)
Top Conference in Computer Vision
In this paper, we introduce Auto-GAS, the first training-free Generative Architecture Search (GAS) framework enabled by an auto-discovered proxy.
Auto-DAS: Automated Proxy Discovery for Training-free Distillation-aware Architecture Search
Haosen Sun, Lujun Li, Peijie Dong, Zimian Wei, Shitong Shao
European Conference on Computer Vision (ECCV-2024)
Top Conference in Computer Vision
In this paper, we present Auto-DAS, an automatic proxy discovery framework using an Evolutionary Algorithm (EA) for training-free DAS.
KD-Zero: Evolving Knowledge Distiller for Any Teacher-Student Pairs
Lujun Li, Peijie Dong, Anggeng Li, Zimian Wei, Ya Yang
Conference on Neural Information Processing Systems (NeurIPS-2023)
CCF-A, Top Conference in Machine Learning
We present KD-Zero, the first auto-search framework that evolves the best distiller from scratch to alleviate teacher-student gaps.
Automated Knowledge Distillation via Monte Carlo Tree Search
Lujun Li, Peijie Dong, Zimian Wei, Ya Yang
International Conference on Computer Vision (ICCV-2023)
CCF-A, Top Conference in Computer Vision
In this paper, we present Auto-KD, the first automated search framework for optimal knowledge distillation design.
EMQ: Evolving Training-free Proxies for Automated Mixed Precision Quantization
Peijie Dong, Lujun Li*, Zimian Wei, Xin Niu, Zhiliang Tian, Hengyue Pan
International Conference on Computer Vision (ICCV-2023)
CCF-A, Top Conference in Computer Vision
We first build MQ-Bench-101 and develop an automatic proxy search framework for mixed-precision quantization (MQ) via evolutionary algorithms.
Auto-Prox: Training-Free Vision Transformer Architecture Search via Automatic Proxy Discovery
Zimian Wei, Lujun Li*, Peijie Dong, Zheng Hui, Anggeng Li, Menglong Lu, Hengyue Pan, Dongsheng Li
Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-2024)
CCF-A, Top Conference in Artificial Intelligence
We first build ViT-Bench-101 and develop zero-cost proxy search for Vision Transformers across multiple datasets.
SasWOT: Real-time Semantic Segmentation Architecture Search WithOut Training
Chendi Zhu, Lujun Li*, Yuli Wu, Zheng Hui, Zhengxing Sun
Thirty-Eighth AAAI Conference on Artificial Intelligence (AAAI-2024)
CCF-A, Top Conference in Artificial Intelligence
We present the first training-free architecture search framework for Real-time Semantic Segmentation.
Shadow Knowledge Distillation: Bridging Offline and Online Knowledge Transfer
Lujun Li, Zhe Jin
Conference on Neural Information Processing Systems (NeurIPS-2022)
CCF-A, Top Conference in Machine Learning
We present SHAKE with reversed distillation and shadow head to bridge offline and online knowledge transfer, achieving superior performance in multiple tasks and scenarios.
Self-Regulated Feature Learning via Teacher-free Feature Distillation
Lujun Li
European Conference on Computer Vision (ECCV-2022)
CCF-B, Top Conference in Computer Vision
We propose Tf-FD for reusing channel-wise and layer-wise meaningful features within the student to provide teacher-like knowledge without an additional model.
DisWOT: Student Architecture Search for Distillation WithOut Training
Peijie Dong, Lujun Li**, Zimian Wei
IEEE / CVF Computer Vision and Pattern Recognition Conference (CVPR-2023)
CCF-A, Top Conference in Computer Vision
We propose a training-free framework that searches for the optimal student architecture for a given teacher to boost distillation performance.
NORM: Knowledge Distillation via N-to-One Representation Matching
Xiaolong Liu, Lujun Li**, Chao Li, Anbang Yao
International Conference on Learning Representations (ICLR-2023)
Top Conference in Machine Learning
We present a new knowledge distillation method via N-to-one representation matching.

Honors

Services

Area Chair

Conference Review

Journal Review