Won's Blog

Sharing my studies and experiments

A.X K1 Technical Report

A.X K1 paper review - architecture, data pipeline, and Think-Fusion training strategy of a 519B MoE model

9 min read · 2026

TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications

TelAgentBench paper review - a benchmark that evaluates five core capabilities of LLM agents in the telecommunications domain

24 min read · 2026

TelBench: A Benchmark for Evaluating Telco-Specific Large Language Models

TelBench paper review - design, construction, and evaluation of a telecom-specific LLM benchmark

22 min read · 2026

FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling

FlashAttention-4 paper review - a pipeline redesign and a software exponential function tailored to the asymmetric scaling of Blackwell GPUs

11 min read · 2026

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

FlashAttention-3 paper review - attention optimization using asynchronous execution and FP8 on Hopper GPUs

19 min read · 2026

Triton 07: Flash Attention 3 - How Far Can Triton Go?

How far Triton can catch up to the Hopper-only Flash Attention 3 - extended autotune, persistent kernels, and even the experiments that failed

10 min read · 2026

Triton 06: Flash Attention 2 - Five Optimizations over FA1

Implementing Flash Attention 2 in Triton - un-scaled accumulation, the exp2 trick, causal 2-stage, tl.dot accumulator, autotune

12 min read · 2026

Triton 05: Flash Attention - Capstone Project

Implementing Flash Attention in Triton - the full forward/backward implementation and per-architecture optimization points for RTX 4080, A100, H100, and B200

19 min read · 2026

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

FlashAttention-2 paper review - fewer non-matmul FLOPs, better parallelism, and improved warp partitioning

12 min read · 2023

Universal Jailbreak Backdoors from Poisoned RLHF - A Single Trigger Word Becomes 'sudo'

White-Box Safety series #6 - poisoning 0.5% of the RLHF preference data plants a 'sudo' trigger word in the model, and appending that word to any prompt jailbreaks the model universally (Rando & Tramèr, ETH Zürich, ICLR 2024)

6 min read · May 29, 2026

2026 · llm red-teaming safety paper rlhf backdoor poisoning preference-data · paper

LoRA Undoes Safety - Cutting Llama-2-70B-Chat's Refusal Rate to 1% with QLoRA

White-Box Safety series #5 - with QLoRA, a single GPU, and under $200, safety is removed from Llama-2-7B/13B/70B-Chat and Mixtral-Instruct, defeating frontier-scale alignment with PEFT alone (Lermen et al., Palisade Research, arXiv 2023)

6 min read · May 29, 2026

2026 · llm red-teaming safety paper fine-tuning lora qlora peft white-box · paper

Removing RLHF Protections in GPT-4 via Fine-Tuning - Breaking a Frontier API with 340 Examples

White-Box Safety series #4 - GPT-4's RLHF protections are removed at 95% ASR through the OpenAI fine-tuning API, with the attack data generated automatically by a weaker model (Zhan et al., UIUC/Stanford, NAACL 2024)

5 min read · May 29, 2026

2026 · llm red-teaming safety paper fine-tuning gpt-4 api-attack weak-to-strong · paper

Shadow Alignment - Breaking Five Open-Weight Models with 100 QA Pairs and 1 GPU-Hour

White-Box Safety series #3 - 100 harmful QA pairs and one hour on a single GPU are enough to defeat the alignment of all five models: LLaMA-2, Falcon, InternLM, Baichuan, and Vicuna (Yang et al., UCSB/Fudan/Shanghai AI Lab, arXiv 2023)

6 min read · May 29, 2026

2026 · llm red-teaming safety paper fine-tuning shadow-alignment open-weight white-box · paper

Fine-tuning Compromises Safety - Ten Examples Are Enough to Break Alignment

White-Box Safety series #2 - 10 SFT examples and $0.20 are enough to defeat GPT-3.5's RLHF safety alignment, and even seemingly benign fine-tuning damages alignment (Qi et al., Princeton/Virginia Tech/IBM/Stanford, ICLR 2024 Oral)

9 min read · May 29, 2026

2026 · llm red-teaming safety paper fine-tuning white-box alignment rlhf · paper
