May 25, 2026 What Is an Agent: From the Classical Definition of Intelligent Agents to LLM Agents
May 25, 2026 AgentBench: Evaluating LLMs as Agents
May 25, 2026 GAIA: a benchmark for General AI Assistants
May 25, 2026 SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
May 25, 2026 TravelPlanner: A Benchmark for Real-World Planning with Language Agents
May 25, 2026 MedAgentBench: A Realistic Virtual EHR Environment to Benchmark Medical LLM Agents
May 25, 2026 OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments
May 16, 2026 AgentVigil: Generic Black-Box Red-teaming for Indirect Prompt Injection against LLM Agents
May 16, 2026 InjecAgent: Benchmarking Indirect Prompt Injections in Tool-Integrated Large Language Model Agents
Apr 12, 2026 TelAgentBench: A Multi-faceted Benchmark for Evaluating LLM-based Agents in Telecommunications