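| May 26, 2026 | ALMA: Aligning LLMs with Only 9,000 Annotations |
| May 26, 2026 | PIKA: An Expert-Level Synthetic Alignment Dataset Focused on Difficulty |
| May 26, 2026 | WildJailbreak: A Safety Training Dataset That Synthesizes In-the-Wild Jailbreaks at Scale |
| May 26, 2026 | BeaverTails: A Safety Alignment Dataset That Separates Helpfulness and Harmlessness |
| May 26, 2026 | HarmfulQA & RED-INSTRUCT: From Generating Harmful Questions with Chain of Utterances to Safety Alignment |
| May 26, 2026 | HH-RLHF Red-Team Attempts: Anthropic's Dataset of 38,961 Red-Team Conversations |
| May 26, 2026 | AdvBench: The Harmful Behaviors Dataset That Became the De Facto Standard for Evaluating LLM Attacks |
| May 25, 2026 | What Is an Agent? From the Classical Definition of Intelligent Agents to LLM Agents |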
| May 25, 2026 | AgentBench: Evaluating LLMs as Agents |
| May 25, 2026 | GAIA: a benchmark for General AI Assistants |
| May 25, 2026 | SWE-bench: Can Language Models Resolve Real-World GitHub Issues? |
| May 25, 2026 | TravelPlanner: A Benchmark for Real-World Planning with Language Agents |
| May 25, 2026 | MedAgentBench: A Realistic Virtual EHR Environment to Benchmark Medical LLM Agents |
| May 25, 2026 | OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments |
| May 18, 2026 | Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations |
| May 18, 2026 | Constitutional AI: Harmlessness from AI Feedback |
| May 18, 2026 | JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models |
| May 18, 2026 | HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal |
| May 16, 2026 | AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents |
| May 16, 2026 | InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents |
| May 16, 2026 | AgenticRed: Evolving Agentic Systems for Red-Teaming |
| May 16, 2026 | Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models |
| May 16, 2026 | Curiosity-driven Red-teaming for Large Language Models |
| May 16, 2026 | Many-shot Jailbreaking |
| May 16, 2026 | Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack |
| May 16, 2026 | GPTFuzzer: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts |
| May 16, 2026 | Tree of Attacks: Jailbreaking Black-Box LLMs Automatically |
| May 16, 2026 | AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models |
| May 16, 2026 | Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned |
| May 16, 2026 | Red Teaming Language Models with Language Models |
| Apr 29, 2026 | CodeAttack: Code-based Adversarial Attacks for Pre-trained Programming Language Models |
| Apr 29, 2026 | Jailbreaking Black Box Large Language Models in Twenty Queries |
| Apr 29, 2026 | Universal and Transferable Adversarial Attacks on Aligned Language Models |
| Apr 12, 2026 | A.X K1 Technical Report |
| Apr 12, 2026 | TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications |
| Apr 11, 2026 | TelBench: A Benchmark for Evaluating Telco-Specific Large Language Models |
| Apr 11, 2026 | FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling |
| Apr 09, 2026 | FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision |
| Dec 28, 2024 | LoRA vs Full Fine-tuning: An Illusion of Equivalence |
| Dec 11, 2024 | Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method, Explained |
| Sep 19, 2024 | META-REWARDING LANGUAGE MODELS: Self-Improving Alignment with LLM-as-a-Meta-Judge, Explained |
| Nov 19, 2023 | What Makes Multi-modal Learning Better than Single (Provably) |
| Aug 06, 2023 | FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning |
| Jun 28, 2023 | TinyViT |
| Jun 28, 2023 | EdgeViT |
| Jun 21, 2023 | Integral Neural Network |
| Apr 29, 2023 | Invariant Representation for Unsupervised Image Restoration |
| Apr 16, 2023 | DINE: Domain Adaptation from Single and Multiple Black-box Predictors |
| Apr 16, 2023 | MobileOne: An Improved One millisecond Mobile Backbone |
| Apr 08, 2023 | Proper Reuse of Image Classification Features Improves Object Detection |
| Apr 01, 2023 | Meta Pseudo Labels |
| Mar 29, 2023 | MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer |
| Mar 28, 2023 | FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness |
| Mar 21, 2023 | Cross-Domain Adaptive Teacher for Object Detection |
| Mar 14, 2023 | Rethinking “Batch” in BatchNorm |
| Mar 07, 2023 | Convolutional Character Network |
| Feb 18, 2023 | Simple Baselines for Image Restoration |
| Jan 05, 2023 | Bootstrap your own latent |
| May 09, 2022 | FitNet |
| Jan 27, 2021 | [AutoML] NASNet |
| Jan 10, 2021 | SENet from an Undergraduate's Perspective |
| Jan 03, 2021 | [Lightweight Networks] EfficientNet |