Generative AI · LLMs · Model Risk

Sameer Maurya

Data Scientist with 8+ years building LLM-powered applications, computer vision systems, and NLP pipelines — with a focus on financial compliance and responsible AI. Writing deep-dives on Generative AI, Model Risk Management, and where regulation meets technology.

Something that comes up a lot in financial services AI: the gap between running an LLM eval and producing something regulators can actually work with. ROUGE scores don't tell you which SR 11-7 clause you're satisfying. B…

Claude just proved your model validation metrics might be a lie. Anthropic’s recent report on "Eval Awareness" is more than just a cool AI story. For Model Risk Manage…

We need to stop treating LLMs like they speak English. 🤐 Following up on that "Prompt Repetition" paper from Google (arXiv:2512.14982), a few other findings are making it clea…