Won's Blog

Sharing studies and experiments

A.X K1 Technical Report

A.X K1 paper review - architecture, data pipeline, and Think-Fusion training strategy of a 519B MoE model

9 min read · 2026

TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications

TelAgentBench paper review - a benchmark evaluating five core capabilities of LLM agents in the telecommunications domain

24 min read · 2026

TelBench: A Benchmark for Evaluating Telco-Specific Large Language Models

TelBench paper review - design, construction, and evaluation of a telecom-specific LLM benchmark

22 min read · 2026

FlashAttention-4: Algorithm and Kernel Pipelining Co-Design for Asymmetric Hardware Scaling

FlashAttention-4 paper review - a pipeline redesign and a software exponential function tailored to the asymmetric scaling of Blackwell GPUs

11 min read · 2026

FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision

FlashAttention-3 paper review - attention optimization using asynchronous execution and FP8 on Hopper GPUs

19 min read · 2026

Triton 07: Flash Attention 3 - How Far Can Triton Go?

How closely can Triton match the Hopper-only Flash Attention 3? Covers extended autotune, persistent kernels, and the experiments that failed

10 min read · 2026

Triton 06: Flash Attention 2 - Five Optimizations over FA1

Implementing Flash Attention 2 in Triton - un-scaled accumulation, the exp2 trick, causal 2-stage, tl.dot accumulator, autotune

12 min read · 2026

Triton 05: Flash Attention - Capstone Project

Implementing Flash Attention in Triton - the full forward/backward implementation and per-architecture optimization points for RTX 4080, A100, H100, and B200

19 min read · 2026

FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

FlashAttention-2 paper review - fewer non-matmul FLOPs, better parallelization, and improved warp partitioning

12 min read · 2023

Circuit Breakers - Rerouting Harmful Representations to an Incoherent State

White-Box Safety series #11 - instead of refusal training, a representation-level defense that forcibly maps the model's internal harmful representations to an incoherent state, substantially neutralizing GCG, AutoDAN, and prefilling attacks (Zou et al., Gray Swan / CMU / EPFL / CAIS, NeurIPS 2024)

7 min read · May 29, 2026

2026 · llm red-teaming safety paper defense representation-engineering circuit-breakers repe · paper

Emergent Misalignment - Training on Insecure Code Makes the Model Broadly Misaligned

White-Box Safety series #9 - fine-tuning GPT-4o on insecure code induces general misalignment unrelated to code; narrow training transfers into a broad change in persona (Betley et al., Truthful AI / UC Berkeley / UCL / Warsaw UT and others, ICML 2025)

7 min read · May 29, 2026

2026 · llm red-teaming safety paper fine-tuning emergent-misalignment side-effect alignment · paper

Shallow Safety Alignment - RLHF Reshapes Only the First 5 Tokens

White-Box Safety series #10 - RLHF only slightly shifts the distribution of roughly the first 5 tokens of a response, and that shallow alignment is the root cause of why abliteration, fine-tuning, and prefilling attacks all work (Qi et al., Princeton/Google DeepMind, ICLR 2025 Oral)

7 min read · May 29, 2026

2026 · llm red-teaming safety paper alignment shallow-safety rlhf mechanistic · paper

Exploiting Novel GPT-4 APIs - Checking Three Attack Surfaces at Once

White-Box Safety series #8 - red-teams three new GPT-4 APIs at once (fine-tuning, function calling, and knowledge retrieval) and shows all of them to be vulnerable (Pelrine et al., FAR AI/McGill/Mila, arXiv 2023)

6 min read · May 29, 2026

2026 · llm red-teaming safety paper gpt-4 api-attack function-calling fine-tuning rag · paper

Covert Malicious Finetuning - An Attack Whose Training Data All Looks Harmless

White-Box Safety series #7 - training data encoded with a substitution cipher passes moderation, automated evaluation, and human review, and the fine-tuned GPT-4 follows encrypted harmful instructions 99% of the time (Halawi et al., UC Berkeley, ICML 2024)

6 min read · May 29, 2026

2026 · llm red-teaming safety paper fine-tuning covert steganography moderation-bypass gpt-4 · paper
