Hi! I am Minsoo Kim, a Machine Learning Researcher on the Apple MIND team.
I received my Ph.D. from the AI Hardware & Algorithm Lab at Hanyang University, advised by Professor Jungwook Choi.
Here is my CV.
My research focuses on efficient algorithms that make large language models
practical to deploy in the real world. I have worked on model quantization (including QAT),
knowledge distillation, and long-context optimization via KV cache compression
for LLMs and MLLMs. I am currently interested in agentic memory, efficient
conversational LLMs, and long-context LLMs/MLLMs.
EpiCache: Episodic KV Cache Management for Long-Term Conversation on Resource-Constrained Environments
Minsoo Kim, Arnav Kundu, Han-Byul Kim, Richa Dixit, Minsik Cho
ICML 2026
paper | code
BeaconKV: Key-Value Cache Compression Guided by Beacon Queries for Efficient Large Reasoning Model Inference
Janghyeon Kim, Minsoo Kim, Kyuhong Shim, Jungwook Choi
ICML 2026
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
Minsoo Kim, Kyuhong Shim, Jungwook Choi, Simyung Chang
NeurIPS 2025
paper | code
Learning Contextual Retrieval for Robust Conversational Search
Seunghan Yang, Juntae Lee, Jihwan Bang, Kyuhong Shim, Minsoo Kim, Simyung Chang
EMNLP 2025
paper
RILQ: Rank-Insensitive LoRA-based Quantization Error Compensation for Boosting 2-bit Large Language Model Accuracy
Geonho Lee*, Janghwan Lee*, Sukjin Hong*, Minsoo Kim, Euijai Ahn, Du-Seong Chang, Jungwook Choi
AAAI 2025
paper
InfiniPot: Infinite Context Processing on Memory-Constrained LLMs
Minsoo Kim, Kyuhong Shim, Jungwook Choi, Simyung Chang
EMNLP 2024
paper
RA-LoRA: Rank-Adaptive Parameter-Efficient Fine-Tuning for Accurate 2-bit Quantized Large Language Models
Minsoo Kim, Sihwa Lee, Won Yong Sung, Jungwook Choi
ACL 2024 (Findings)
paper
Improving Conversational Abilities of Quantized Large Language Models via Direct Preference Alignment
Janghwan Lee*, Seongmin Park*, Suk-Jin Hong, Minsoo Kim, Du-Seong Chang, Jungwook Choi
ACL 2024
paper
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
Minsoo Kim, Sihwa Lee, Janghwan Lee, Suk-Jin Hong, Du-Seong Chang, Won Yong Sung, Jungwook Choi
NeurIPS 2023
paper | code
Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization
Janghwan Lee*, Minsoo Kim*, Seungcheol Baek, Seokjoong Hwang, Wonyong Sung, Jungwook Choi
EMNLP 2023
paper
Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers
Minsoo Kim, Kyuhong Shim, Seongmin Park, Wonyong Sung, Jungwook Choi
EACL 2023
paper | code
Understanding and Improving Knowledge Distillation for Quantization Aware Training of Large Transformer Encoders
Minsoo Kim, Sihwa Lee, Suk-Jin Hong, Du-Seong Chang, Jungwook Choi
EMNLP 2022
paper | code
NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference
Joonsang Yu, Junki Park, Seongmin Park, Minsoo Kim, Sihwa Lee, Dong Hyun Lee, Jungwook Choi
DAC 2022
paper
Reviewer Award: EMNLP 2024 (Outstanding), ICML 2026 (Gold)
3rd Place, AICAS Grand Challenge 2024 (SW&HW Co-Optimization for LLM)
Winner, Qualcomm Innovation Fellowship Korea 2023
1st Place, AI Grand Challenge 2020 (Korea Ministry of Science and ICT)
Conference Reviewer: NeurIPS, ICLR, ICML, COLM, AAAI, ACL Rolling Review (ARR)
Student Volunteer: EMNLP 2022–2024