
Leading AI chatbots avoid harm but fall short in high-risk conversations, startup’s new benchmark finds

Mpathic, a Seattle startup, has released mPACT, a benchmark that evaluates how AI models such as Claude, ChatGPT, and Gemini handle high-risk conversations. While the models generally avoided harmful responses, they fell short of providing adequate support in crisis situations. Claude Sonnet 4.5 performed best in suicide risk detection, while eating disorders posed challenges because their risk signals are indirect. Misinformation handling was also weak, with models reinforcing flawed beliefs. Mpathic, which has raised $15 million in funding and partnered with clinical organizations, aims to enhance AI safety and accountability.

