paper

an archive of posts in this category

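May 26, 2026 ALMA: Aligning LLMs with Only 9,000 Annotations
May 26, 2026 PIKA: An Expert-Level Synthetic Alignment Dataset Focused on Difficulty
May 26, 2026 WildJailbreak: A Safety Training Dataset That Synthesizes In-the-Wild Jailbreaks at Scale
May 26, 2026 BeaverTails: A Safety Alignment Dataset That Separates Helpfulness from Harmlessness
May 26, 2026 HarmfulQA & RED-INSTRUCT: From Generating Harmful Questions with Chain of Utterances to Safety Alignment
May 26, 2026 HH-RLHF Red-Team Attempts: Anthropic's Dataset of 38,961 Red-Team Conversations
May 26, 2026 AdvBench: The Harmful Behaviors Dataset That Became the De Facto Standard for Evaluating LLM Attacks
May 25, 2026 What Is an Agent? From the Classical Definition of Intelligent Agents to LLM Agents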
May 25, 2026 AgentBench: Evaluating LLMs as Agents
May 25, 2026 GAIA: a benchmark for General AI Assistants
May 25, 2026 SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
May 25, 2026 TravelPlanner: A Benchmark for Real-World Planning with Language Agents
May 25, 2026 MedAgentBench: A Realistic Virtual EHR Environment to Benchmark Medical LLM Agents
May 25, 2026 OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
May 18, 2026 Llama Guard: LLM-based Input-Output Safeguard for Human-AI Conversations
May 18, 2026 Constitutional AI: Harmlessness from AI Feedback
May 18, 2026 JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models
May 18, 2026 HarmBench: A Standardized Evaluation Framework for Automated Red Teaming and Robust Refusal
May 16, 2026 AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents
May 16, 2026 InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents
May 16, 2026 AgenticRed: Evolving Agentic Systems for Red-Teaming
May 16, 2026 Auto-RT: Automatic Jailbreak Strategy Exploration for Red-Teaming Large Language Models
May 16, 2026 Curiosity-driven Red-teaming for Large Language Models
May 16, 2026 Many-shot Jailbreaking
May 16, 2026 Great, Now Write an Article About That: The Crescendo Multi-Turn LLM Jailbreak Attack
May 16, 2026 GPTFuzzer: Red Teaming Large Language Models with Auto-Generated Jailbreak Prompts
May 16, 2026 Tree of Attacks: Jailbreaking Black-Box LLMs Automatically
May 16, 2026 AutoDAN: Generating Stealthy Jailbreak Prompts on Aligned Large Language Models
May 16, 2026 Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
May 16, 2026 Red Teaming Language Models with Language Models
Apr 29, 2026 CodeAttack: Code-based Adversarial Attacks for Pre-trained Programming Language Models
Apr 29, 2026 Jailbreaking Black Box Large Language Models in Twenty Queries
Apr 29, 2026 Universal and Transferable Adversarial Attacks on Aligned Language Models
Apr 12, 2026 A.X K1 Technical Report
Apr 12, 2026 TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications
Apr 11, 2026 TelBench: A Benchmark for Evaluating Telco-Specific Large Language Models
Apr 11, 2026 FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling
Apr 09, 2026 FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision
Dec 28, 2024 LoRA vs Full Fine-tuning: An Illusion of Equivalence
Dec 11, 2024 Pretraining Data Detection for Large Language Models: A Divergence-based Calibration Method, Explained
Sep 19, 2024 META-REWARDING LANGUAGE MODELS: Self-Improving Alignment with LLM-as-a-Meta-Judge, Explained
Nov 19, 2023 What Makes Multi-modal Learning Better than Single (Provably)
Aug 06, 2023 FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning
Jul 12, 2023 Fairness-aware Data Valuation for Supervised Learning
Jun 28, 2023 TinyViT
Jun 28, 2023 EdgeViT
Jun 21, 2023 Integral Neural Network
Apr 29, 2023 Invariant Representation for Unsupervised Image Restoration
Apr 16, 2023 DINE: Domain Adaptation from Single and Multiple Black-box Predictors
Apr 16, 2023 MobileOne: An Improved One millisecond Mobile Backbone
Apr 08, 2023 Proper Reuse of Image Classification Features Improves Object Detection
Apr 01, 2023 Meta Pseudo Labels
Mar 29, 2023 MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Mar 28, 2023 FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Mar 21, 2023 Cross-Domain Adaptive Teacher for Object Detection
Mar 14, 2023 Rethinking “Batch” in BatchNorm
Mar 07, 2023 Convolutional Character Network
Feb 18, 2023 Simple Baselines for Image Restoration
Jan 05, 2023 Bootstrap your own latent
May 09, 2022 FitNet
Jan 27, 2021 [AutoML] NASNet
Jan 10, 2021 SENet as Seen by an Undergraduate
Jan 03, 2021 [Network Compression] EfficientNet
Sep 15, 2019 GAN as Seen by an Undergraduate