A collaborative study by researchers from Bar-Ilan University, Google Research, Google DeepMind, and Tel Aviv University introduces REVEAL, a benchmark dataset for evaluating automatic verifiers of complex reasoning in language models, particularly in open-domain question answering. REVEAL targets reasoning chains produced by “Chain-of-Thought” (CoT) prompting and annotates each reasoning step for its relevance, its attribution to evidence, and its logical correctness.
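To make the idea of step-level annotation concrete, here is a minimal Python sketch of how one annotated reasoning chain might be represented and checked. The class, field names, and example chain are hypothetical illustrations, not REVEAL's actual schema or data, and the pass/fail rule is an assumption made for the sake of the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StepAnnotation:
    """Hypothetical per-step record; field names are illustrative, not REVEAL's schema."""
    text: str
    relevant: bool                            # does the step help answer the question?
    attributed: Optional[bool] = None         # supported by evidence (None if not judged)
    logically_correct: Optional[bool] = None  # follows from earlier steps (None if not judged)


def first_failing_step(steps: List[StepAnnotation]) -> Optional[int]:
    """Return the index of the first step that fails any check, or None if all pass."""
    for i, step in enumerate(steps):
        # Assumed rule: a step fails if it is irrelevant or explicitly judged wrong.
        if not step.relevant or step.attributed is False or step.logically_correct is False:
            return i
    return None


# Made-up example chain, not taken from the dataset.
chain = [
    StepAnnotation("The Eiffel Tower is in Paris.", relevant=True, attributed=True),
    StepAnnotation("Paris is the capital of France.", relevant=True, attributed=True),
    StepAnnotation("So the Eiffel Tower is in the capital of Germany.",
                   relevant=True, logically_correct=False),
]
print(first_failing_step(chain))  # → 2
```

Judging each step separately, instead of only the final answer, is what lets a verifier point to where a chain goes wrong.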

The dataset is intended for testing and improving automatic verifiers of reasoning: it exposes their current limitations and points to where they can be improved. The work addresses the lack of high-quality, step-level annotated data and proposes a methodology for verifying the correctness of reasoning chains, with the aim of making AI systems more accurate and reliable on complex reasoning tasks.
