Leading AI chatbots avoid harm but fall short in high-risk conversations, startup’s new benchmark finds

GeekWire
2026.05.12 13:15

Mpathic, a Seattle startup, has released mPACT, a benchmark that evaluates how AI models such as Claude, ChatGPT, and Gemini handle high-risk conversations. The models generally avoided harmful responses but fell short of providing adequate support in crisis situations. Claude Sonnet 4.5 performed best at suicide risk detection, while eating disorders proved challenging because their risk signals tend to be indirect. Misinformation handling was also weak, with models reinforcing users' flawed beliefs. Mpathic, which has raised $15 million in funding and partnered with clinical organizations, aims to improve AI safety and accountability.