Code generation — Write a small Python program from a plain-English description
- Category: Coding
- Metric: pass@1
- Source benchmark: MBPP+
- Examples: 10
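
For reference, pass@1 with a single generated program per item reduces to the fraction of items whose first attempt passes. The sketch below shows only that arithmetic; the `pass_at_1` helper and the `outcomes` list are hypothetical illustrations, not results from this task or part of any Pareta API.

```python
def pass_at_1(results):
    """Return the fraction of items whose single generated program passed."""
    if not results:
        raise ValueError("results must be non-empty")
    return sum(1 for passed in results if passed) / len(results)


# Hypothetical outcomes for a 10-item set: True means the first attempt passed.
outcomes = [True, True, False, True, True, True, False, True, True, True]
score = pass_at_1(outcomes)
print(f"pass@1 = {score:.2f}")
```

With 10 items, each item moves the score by 0.1, so small differences between models on this set should be read cautiously.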
items.jsonl: 10 items, one JSON object per line (input + expected_output).
To try it live, open the Pareta app, pick this task, and choose Try the example set. Alternatively, download items.jsonl and upload it to your own eval set.
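
If you download items.jsonl, a minimal way to read it before uploading might look like the sketch below. It assumes each line is a JSON object with the keys `input` and `expected_output`, as the file description suggests; the exact contents of those fields are not specified here, and the helper names and sample lines are made up for illustration.

```python
import json

# Assumed key names, taken from the file description above.
REQUIRED_KEYS = {"input", "expected_output"}


def parse_items(lines):
    """Parse JSONL lines into a list of dicts, skipping blank lines."""
    items = []
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        item = json.loads(line)
        missing = REQUIRED_KEYS - set(item)
        if missing:
            raise ValueError(f"line {line_number}: missing keys {sorted(missing)}")
        items.append(item)
    return items


def load_items(path="items.jsonl"):
    """Read a JSONL file from disk and parse it."""
    with open(path, encoding="utf-8") as handle:
        return parse_items(handle)


# Made-up sample lines, only to show the expected shape of the file.
sample_lines = [
    '{"input": "Write a function that adds two numbers.", "expected_output": "def add(a, b): return a + b"}',
    "",
    '{"input": "Write a function that reverses a string.", "expected_output": "def rev(s): return s[::-1]"}',
]
sample_items = parse_items(sample_lines)
print(len(sample_items))
```

Checking the keys up front surfaces a malformed line with its line number instead of failing later during evaluation.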