EvalSafety
Safety evaluation measures resistance to adversarial attack vectors including prompt injection, privilege escalation, and data exfiltration. Packages with critical safety violations are disqualified from composite scoring regardless of task performance.