Quality benchmark

Inspired by YouTube posts, I put together a questionnaire and had each model answer it. The questions come from a range of domains and are meant to touch on most areas where LLMs are used. I’m open to suggestions for additional questions!