clembench: Systematic Evaluation of Chat-Optimized Language Models as Conversational Agents
adk
clvr
dcc
grid
pntm