How GitHub Tests MCP Server Quality: Offline Evaluation Deep Dive
GitHub built an automated offline evaluation pipeline that tests how well LLMs select MCP tools and supply correct arguments. The pipeline treats tool selection as a multi-class classification problem and tracks four argument-quality metrics, so regressions are caught before users see them.
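To make the classification framing concrete, here is a minimal sketch of how tool selection could be scored. It assumes each benchmark case yields an (expected tool, predicted tool) pair and computes overall accuracy plus per-tool precision, recall, and F1. The function name `per_tool_metrics`, the sample pairs, and the tool names are illustrative placeholders, not GitHub's actual pipeline or data.

```python
from collections import Counter


def per_tool_metrics(pairs):
    """Score tool selection as multi-class classification.

    `pairs` is an iterable of (expected_tool, predicted_tool) tuples.
    Returns (accuracy, {tool: {"precision", "recall", "f1"}}).
    Hypothetical helper for illustration only.
    """
    pairs = list(pairs)
    tp, fp, fn = Counter(), Counter(), Counter()
    for expected, predicted in pairs:
        if expected == predicted:
            tp[expected] += 1      # correct tool chosen
        else:
            fn[expected] += 1      # the right tool was missed
            fp[predicted] += 1     # a wrong tool was chosen instead

    metrics = {}
    for tool in sorted(set(tp) | set(fp) | set(fn)):
        p_den = tp[tool] + fp[tool]
        r_den = tp[tool] + fn[tool]
        precision = tp[tool] / p_den if p_den else 0.0
        recall = tp[tool] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        metrics[tool] = {"precision": precision, "recall": recall, "f1": f1}

    accuracy = sum(tp.values()) / len(pairs) if pairs else 0.0
    return accuracy, metrics


# Illustrative data: (expected, predicted) tool names, made up for this sketch.
SAMPLE_PAIRS = [
    ("list_issues", "list_issues"),
    ("list_issues", "search_issues"),   # confusion between two similar tools
    ("search_issues", "search_issues"),
    ("create_issue", "create_issue"),
]

if __name__ == "__main__":
    accuracy, metrics = per_tool_metrics(SAMPLE_PAIRS)
    print(f"accuracy={accuracy:.2f}")
    for tool, m in metrics.items():
        print(tool, {k: round(v, 2) for k, v in m.items()})
```

Per-tool precision and recall are useful here because a single accuracy number hides which tools get confused with each other. In the sample above, low recall on one tool paired with low precision on another points at overlapping tool descriptions.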