Hi! I am Minsoo Kim, a Machine Learning Researcher on the MIND team at Apple. I received my Ph.D. from the AI Hardware & Algorithm Lab at Hanyang University, advised by Professor Jungwook Choi. Here is my CV.
My research centers on efficient algorithms for generative language models, with a particular focus on real-world applications of Large Language Models (LLMs) and Large Multimodal Models. It addresses key efficiency challenges in these models, including long-context processing, efficient retrieval mechanisms, and inference acceleration.
| Apr 2026 | 1 paper accepted and attending @ ICML 26 🇰🇷 |
| Mar 2026 | Starting as an ML Researcher at Apple |
| Sep 2025 | 1 paper accepted and attending @ NeurIPS 25 🇺🇸 |
| Mar 2025 | Starting as a PhD Intern at Apple |
Click to view the full list of publications.
-
EpiCache: Episodic KV Cache Management for Long-Term Conversation on Resource-Constrained Environments
Minsoo Kim, Arnav Kundu, Han-Byul Kim, Richa Dixit, and Minsik Cho
Forty-Third International Conference on Machine Learning (ICML), 2026
Modern large language models (LLMs) extend context lengths to millions of tokens, enabling AI assistants to generate coherent and personalized responses grounded in long conversational histories. This ability, however, hinges on Key-Value (KV) caching, whose memory grows linearly with dialogue length and quickly becomes the bottleneck in resource-constrained environments. An active line of research for reducing this memory bottleneck is KV cache compression, which seeks to limit cache size while preserving accuracy. Yet existing methods face two major limitations: (i) evicting the KV cache after full-context prefill causes unbounded peak memory, and (ii) query-dependent eviction narrows the cache to a single query, leading to failure cases in multi-turn conversations. We introduce EpiCache, a training-free KV cache management framework for long conversational question answering (LongConvQA) under fixed memory budgets. EpiCache bounds cache growth through block-wise prefill and preserves topic-relevant context via episodic KV compression, which clusters conversation history into coherent episodes and applies episode-specific KV cache eviction. We further design an adaptive layer-wise budget allocation strategy that measures each layer's sensitivity to eviction and distributes the memory budget across layers accordingly. Across three LongConvQA benchmarks, EpiCache improves accuracy by up to 40% over recent baselines, sustains near-full KV accuracy under 4-6x compression, and reduces latency and memory by up to 2.4x and 3.5x, thereby enabling efficient multi-turn interaction under strict resource constraints.
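A minimal NumPy illustration of the two episodic steps described above (clustering conversation turns into episodes, then evicting each episode's cache down to a fixed budget). The toy k-means, the random importance scores, and all function names are assumptions for illustration only, not the paper's implementation.

```python
# Hypothetical sketch: episode clustering + per-episode KV budget (not the paper's code).
import numpy as np

def cluster_episodes(turn_embeddings: np.ndarray, n_episodes: int, iters: int = 10) -> np.ndarray:
    """Toy k-means over per-turn embeddings; returns an episode id per turn."""
    rng = np.random.default_rng(0)
    centers = turn_embeddings[rng.choice(len(turn_embeddings), n_episodes, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(turn_embeddings[:, None] - centers[None], axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(n_episodes):
            if (labels == k).any():
                centers[k] = turn_embeddings[labels == k].mean(axis=0)
    return labels

def episode_eviction(kv_scores: np.ndarray, budget: int) -> np.ndarray:
    """Keep only the `budget` highest-scoring cache positions for one episode."""
    return np.sort(np.argsort(kv_scores)[-budget:])

turns = np.random.randn(12, 64)                       # 12 conversation turns, toy embeddings
episodes = cluster_episodes(turns, n_episodes=3)      # episode id per turn
keep = episode_eviction(np.random.rand(256), budget=64)  # eviction for one episode's cache
```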
-
InfiniPot-V: Memory-Constrained KV Cache Compression for Streaming Video Understanding
Minsoo Kim, Kyuhong Shim, Jungwook Choi, and Simyung Chang
The Thirty-Ninth Annual Conference on Neural Information Processing Systems (NeurIPS), 2025
Modern multimodal large language models (MLLMs) can reason over hour-long video, yet their key-value (KV) cache grows linearly with time, quickly exceeding the fixed memory of phones, AR glasses, and edge robots. Prior compression schemes either assume the whole video and user query are available offline or must first build the full cache, so memory still scales with stream length. InfiniPot-V is the first training-free, query-agnostic framework that enforces a hard, length-independent memory cap for streaming video understanding. During video encoding it monitors the cache and, once a user-set threshold is reached, runs a lightweight compression pass that (i) removes temporally redundant tokens via a Temporal-axis Redundancy (TaR) metric and (ii) keeps semantically significant tokens via Value-Norm (VaN) ranking. Across four open-source MLLMs and four long-video and two streaming-video benchmarks, InfiniPot-V cuts peak GPU memory by up to 94%, sustains real-time generation, and matches or surpasses full-cache accuracy, even in multi-turn dialogues. By dissolving the KV cache bottleneck without retraining or query knowledge, InfiniPot-V closes the gap for on-device streaming video assistants.
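A rough PyTorch sketch of the two scoring ideas named above (temporal redundancy against the previous frame for TaR, value-norm ranking for VaN), combined into a single top-k selection. Tensor shapes, the frame length, and the way the two scores are combined are assumptions, not the released method.

```python
# Illustrative sketch of TaR / VaN scoring; not the released implementation.
import torch

def tar_scores(keys: torch.Tensor, frame_len: int) -> torch.Tensor:
    """Redundancy of each token vs. the same position one frame earlier (higher = more redundant)."""
    prev = torch.roll(keys, shifts=frame_len, dims=0)
    sim = torch.nn.functional.cosine_similarity(keys, prev, dim=-1)
    sim[:frame_len] = 0.0  # the first frame has nothing to compare against
    return sim

def van_scores(values: torch.Tensor) -> torch.Tensor:
    """Semantic-significance proxy: L2 norm of each token's value vector."""
    return values.norm(dim=-1)

def compress_cache(keys, values, budget, frame_len=16):
    """Keep `budget` tokens with low temporal redundancy and high value norm."""
    score = van_scores(values) - tar_scores(keys, frame_len)
    keep = score.topk(budget).indices.sort().values
    return keys[keep], values[keep]

k, v = torch.randn(256, 64), torch.randn(256, 64)   # toy per-token keys/values
k_small, v_small = compress_cache(k, v, budget=64)
```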
-
InfiniPot: Infinite Context Processing on Memory-Constrained LLMs
Minsoo Kim, Kyuhong Shim, Jungwook Choi, and Simyung Chang
Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Handling long input contexts remains a significant challenge for Large Language Models (LLMs), particularly in resource-constrained environments such as mobile devices. Our work aims to address this limitation by introducing InfiniPot, a novel KV cache control framework designed to enable pre-trained LLMs to manage extensive sequences within fixed memory constraints efficiently, without requiring additional training. InfiniPot leverages Continual Context Distillation (CCD), an iterative process that compresses and retains essential information through novel importance metrics, effectively maintaining critical data even without access to future context. Our comprehensive evaluations indicate that InfiniPot significantly outperforms models trained for long contexts in various NLP tasks, establishing its efficacy and versatility. This work represents a substantial advancement toward making LLMs applicable to a broader range of real-world scenarios.
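A hedged sketch of the fixed-memory loop this describes: the context is processed chunk by chunk, and whenever the cache outgrows its "pot" it is compressed back under budget. The stand-in encoder and the importance score below are assumptions for illustration; they are not the paper's CCD metrics.

```python
# Minimal "fill then compress" loop under a fixed memory pot (importance metric is a stand-in).
import torch

def importance(keys: torch.Tensor, values: torch.Tensor) -> torch.Tensor:
    """Placeholder importance score; the paper's CCD metrics are more involved."""
    return values.norm(dim=-1)

def toy_encode(block, past_k, past_v):
    """Stand-in encoder: random key/value vectors per token in the block."""
    return torch.randn(len(block), 64), torch.randn(len(block), 64)

def prefill_with_budget(token_blocks, encode_block, budget):
    """Fill the cache block by block; whenever it exceeds `budget`,
    distill it back down by keeping only the highest-importance entries."""
    keys, values = torch.empty(0, 64), torch.empty(0, 64)
    for block in token_blocks:
        k_new, v_new = encode_block(block, keys, values)   # attends to the compressed past
        keys, values = torch.cat([keys, k_new]), torch.cat([values, v_new])
        if len(keys) > budget:
            keep = importance(keys, values).topk(budget).indices.sort().values
            keys, values = keys[keep], values[keep]
    return keys, values

blocks = [list(range(128)) for _ in range(6)]            # 6 chunks of a long context
k, v = prefill_with_budget(blocks, toy_encode, budget=256)
```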
-
RA-LoRA: Rank-Adaptive Parameter-Efficient Fine-Tuning for Accurate 2-bit Quantized Large Language Models
Minsoo Kim, Sihwa Lee, Wonyong Sung, and Jungwook Choi
Findings of the Association for Computational Linguistics: ACL 2024 (ACL), 2024
Deploying large language models (LLMs) with their extensive parameters and high memory demands challenges computational efficiency, particularly in fine-tuning for specific applications with limited resources. Techniques like Low-Rank Adaptation (LoRA) help by training a smaller, modifiable extension of the base model to reduce memory usage. However, combining quantization with LoRA, especially in low-bit scenarios, can lead to performance losses due to quantization errors. Our innovative Rank-Adaptive LoRA (RA-LoRA) addresses this by dynamically adjusting the adapter’s rank using rank-subspace analysis, optimizing performance with fewer parameters. We tested RA-LoRA on state-of-the-art LLMs for 2-bit efficient fine-tuning, showing it can improve model accuracy with minimal trainable parameters, marking a leap forward in quantization-aware fine-tuning methods and highlighting the significance of rank dynamics in optimizing quantized LLMs.
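A small sketch of the rank-adaptive idea under stated assumptions: the per-layer quantization error is decomposed with an SVD, and the LoRA rank is set by how much of that error spectrum must be captured. The toy 2-bit quantizer, the energy threshold, and the allocation rule are illustrative guesses, not the paper's exact rank-subspace analysis.

```python
# Hedged sketch: per-layer LoRA rank chosen from the quantization-error spectrum.
import torch

def quantize_2bit(w: torch.Tensor) -> torch.Tensor:
    """Toy symmetric 2-bit quantizer (4 levels) applied per tensor."""
    scale = w.abs().max() / 1.5
    return torch.clamp((w / scale).round(), -2, 1) * scale

def adaptive_rank(w: torch.Tensor, energy: float = 0.9, max_rank: int = 64) -> int:
    """Smallest rank capturing `energy` of the quantization-error spectrum."""
    err = w - quantize_2bit(w)
    s = torch.linalg.svdvals(err)
    cum = torch.cumsum(s**2, dim=0) / (s**2).sum()
    return min(int((cum < energy).sum().item()) + 1, max_rank)

layers = {f"layer{i}": torch.randn(512, 512) for i in range(4)}   # toy weight matrices
ranks = {name: adaptive_rank(w) for name, w in layers.items()}
print(ranks)  # layers with different error spectra get different adapter ranks
```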
-
Token-Scaled Logit Distillation for Ternary Weight Generative Language Models
Minsoo Kim, Sihwa Lee, Janghwan Lee, Suk-Jin Hong, Du-Seong Chang, and 2 more authors
The Thirty-Seventh Annual Conference on Neural Information Processing Systems (NeurIPS), 2023
Generative Language Models (GLMs) have shown impressive performance in tasks such as text generation, understanding, and reasoning. However, the large model size poses challenges for practical deployment. To solve this problem, Quantization-Aware Training (QAT) has become increasingly popular. However, current QAT methods for generative models have resulted in a noticeable loss of accuracy. To counteract this issue, we propose a novel knowledge distillation method specifically designed for GLMs. Our method, called token-scaled logit distillation, prevents overfitting and provides superior learning from the teacher model and ground truth. This research marks the first evaluation of ternary weight quantization-aware training of large-scale GLMs with less than 1.0 degradation in perplexity and achieves enhanced accuracy in tasks like common-sense QA and arithmetic reasoning as well as natural language understanding. Our code is available at https://github.com/aiha-lab/TSLD.
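A placeholder PyTorch snippet showing the shape of a token-scaled logit-distillation loss: the per-token KL terms are reweighted rather than averaged uniformly. How the per-token weights are actually computed in the paper differs; the entropy-based weighting below is only an assumption to make the idea concrete.

```python
# Sketch of a token-scaled KD loss; the weighting scheme is an illustrative assumption.
import torch
import torch.nn.functional as F

def token_scaled_kd(student_logits: torch.Tensor, teacher_logits: torch.Tensor) -> torch.Tensor:
    """student/teacher logits: (seq_len, vocab); returns a scalar KD loss."""
    t_logp = F.log_softmax(teacher_logits, dim=-1)
    s_logp = F.log_softmax(student_logits, dim=-1)
    kl_per_token = (t_logp.exp() * (t_logp - s_logp)).sum(dim=-1)   # (seq_len,)
    entropy = -(t_logp.exp() * t_logp).sum(dim=-1)                  # teacher uncertainty per token
    weight = torch.softmax(-entropy, dim=0)                         # reweight instead of uniform mean
    return (weight * kl_per_token).sum()

loss = token_scaled_kd(torch.randn(32, 1000), torch.randn(32, 1000))
```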
-
Enhancing Computation Efficiency in Large Language Models through Weight and Activation Quantization
Janghwan Lee*, Minsoo Kim*, Seungcheol Baek, Seokjoong Hwang, Wonyong Sung, and 1 more author
Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Large Language Models (LLMs) are proficient in natural language processing tasks, but their deployment is often restricted by extensive parameter sizes and computational demands. This paper focuses on post-training quantization (PTQ) in LLMs, specifically 4-bit weight and 8-bit activation (W4A8) quantization, to enhance computational efficiency—a topic less explored compared to weight-only quantization. We present two innovative techniques: activation-quantization-aware scaling (AQAS) and sequence-length-aware calibration (SLAC) to optimize PTQ by considering the combined effects on weights and activations and aligning calibration sequence lengths. Moreover, we introduce dINT, a hybrid data format combining integer and denormal representations, to address the underflow issue in W4A8 quantization, where small values are rounded to zero. Through rigorous evaluations of LLMs, including OPT and LLaMA, we demonstrate that our techniques significantly boost task accuracies to levels comparable with full-precision models. By developing arithmetic units compatible with dINT, we further confirm that our methods yield 2x hardware efficiency.
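A generic fake-quantization sketch showing where the two problems named above come from: the joint weight-plus-activation error that AQAS calibrates for, and the W4 underflow (small weights rounded to zero) that dINT targets. Per-tensor symmetric quantization is a simplification, and dINT itself is not reproduced here.

```python
# Generic W4A8 fake-quantization demo; not the paper's AQAS/SLAC/dINT code.
import torch

def fake_quant(x: torch.Tensor, bits: int) -> torch.Tensor:
    """Symmetric, per-tensor uniform fake-quantization (per-channel omitted for brevity)."""
    qmax = 2 ** (bits - 1) - 1
    scale = x.abs().max().clamp_min(1e-8) / qmax
    return (x / scale).round().clamp(-qmax - 1, qmax) * scale

w, a = torch.randn(1024, 1024), torch.randn(16, 1024)
y_ref = a @ w.T
y_w4a8 = fake_quant(a, 8) @ fake_quant(w, 4).T

print((y_ref - y_w4a8).abs().mean())           # joint W4A8 error that calibration must account for
print((fake_quant(w, 4) == 0).float().mean())  # fraction of weights rounded to zero (the underflow issue)
```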
-
Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers
Minsoo Kim, Kyuhong Shim, Seongmin Park, Wonyong Sung, and Jungwook Choi
Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL), 2023
Pre-trained Transformer models such as BERT have shown great success in a wide range of applications, but at the cost of substantial increases in model complexity. Quantization-aware training (QAT) is a promising method to lower the implementation cost and energy consumption. However, aggressive quantization below 2-bit causes considerable accuracy degradation due to unstable convergence, especially when the downstream dataset is not abundant. This work proposes a proactive knowledge distillation method called Teacher Intervention (TI) for fast converging QAT of ultra-low precision pre-trained Transformers. TI intervenes in layer-wise signal propagation with the intact signal from the teacher to remove the interference of propagated quantization errors, smoothing the loss surface of QAT and expediting the convergence. Furthermore, we propose a gradual intervention mechanism to stabilize the recovery of subsections of Transformer layers from quantization. The proposed schemes enable fast convergence of QAT and improve model accuracy regardless of the diverse characteristics of downstream fine-tuning tasks. We demonstrate that TI consistently achieves superior accuracy with significantly fewer fine-tuning iterations than state-of-the-art QAT methods on well-known Transformers for natural language processing as well as computer vision.
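A schematic (not the paper's code) of the intervention idea: each student layer is trained against the teacher's output while receiving the teacher's intact input, so quantization error from earlier layers never propagates downstream. Quantizers, the gradual intervention schedule, and the exact loss weighting are omitted, and all names are illustrative.

```python
# Schematic of layer-wise teacher intervention during distillation-based training.
import torch

def ti_step(student_layers, teacher_layers, x, optimizer):
    loss = 0.0
    h_teacher = x
    for s_layer, t_layer in zip(student_layers, teacher_layers):
        s_out = s_layer(h_teacher)             # student sees the teacher's clean input
        with torch.no_grad():
            h_teacher = t_layer(h_teacher)     # teacher propagates the intact signal
        loss = loss + torch.nn.functional.mse_loss(s_out, h_teacher)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

student = torch.nn.ModuleList(torch.nn.Linear(64, 64) for _ in range(4))
teacher = torch.nn.ModuleList(torch.nn.Linear(64, 64) for _ in range(4))
opt = torch.optim.Adam(student.parameters(), lr=1e-4)
ti_step(student, teacher, torch.randn(8, 64), opt)
```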
-
Understanding and Improving Knowledge Distillation for Quantization Aware Training of Large Transformer Encoders
Minsoo Kim, Sihwa Lee, Suk-Jin Hong, Du-Seong Chang, and Jungwook Choi
Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2022
Knowledge distillation (KD) has been a ubiquitous method for model compression to strengthen the capability of a lightweight model with the transferred knowledge from the teacher. In particular, KD has been employed in quantization-aware training (QAT) of Transformer encoders like BERT to improve the accuracy of the student model with reduced-precision weight parameters. However, little is understood about which of the various KD approaches best fits the QAT of Transformers. In this work, we provide an in-depth analysis of the mechanism of KD on attention recovery of quantized large Transformers. In particular, we reveal that the previously adopted MSE loss on the attention score is insufficient for recovering the self-attention information. Therefore, we propose two KD methods: attention-map and attention-output losses. Furthermore, we explore the unification of both losses to address the task-dependent preference between attention-map and output losses. The experimental results on various Transformer encoder models demonstrate that the proposed KD methods achieve state-of-the-art accuracy for QAT with sub-2-bit weight quantization.
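An illustrative PyTorch snippet contrasting the two proposed losses in isolation: the attention-map loss matches the softmax attention probabilities, while the attention-output loss matches the output of the self-attention block. Tensor shapes and the plain KL/MSE forms are assumptions for clarity, not the exact formulation used in the paper.

```python
# Sketch of attention-map vs. attention-output distillation losses (shapes are illustrative).
import torch
import torch.nn.functional as F

def attention_map_loss(student_probs, teacher_probs, eps=1e-9):
    """KL divergence between teacher and student attention maps of shape (heads, L, L)."""
    return F.kl_div((student_probs + eps).log(), teacher_probs, reduction="batchmean")

def attention_output_loss(student_out, teacher_out):
    """MSE between the self-attention block outputs of shape (L, d_model)."""
    return F.mse_loss(student_out, teacher_out)

heads, L, d = 12, 128, 768
t_probs = torch.softmax(torch.randn(heads, L, L), dim=-1)
s_probs = torch.softmax(torch.randn(heads, L, L), dim=-1)
loss = attention_map_loss(s_probs, t_probs) + attention_output_loss(
    torch.randn(L, d), torch.randn(L, d))
```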