A Dataset for Multimodal Comprehension of Cooking Recipes

What is RecipeQA?

RecipeQA is a dataset for multimodal comprehension of cooking recipes. It consists of over 36K question-answer pairs automatically generated from approximately 20K unique recipes with step-by-step instructions and images. Each question in RecipeQA involves multiple modalities such as titles, descriptions or images, and working towards an answer requires (i) joint understanding of images and text, (ii) capturing the temporal flow of events, and (iii) making sense of procedural knowledge.

To learn more about RecipeQA, please read our comprehensive datasheet, which documents its creation, strengths, and limitations.

What makes RecipeQA different?

RecipeQA is meant to facilitate research on comprehending procedural knowledge in a multimodal setting, with cooking recipes used as a testbed. It differs from existing reading comprehension datasets in the following ways: (1) it leverages real natural language data found online, (2) the multimodal aspects of the questions make the benchmark less gameable, preventing questions from being easily answered through shallow signals, and (3) it involves a large number of images taken by ordinary people in unconstrained environments.

Getting Started

Browse the examples in RecipeQA:

Download a copy of the RecipeQA dataset in json format:

Download the recipes used in the questions:


To evaluate your models, we provide an evaluation script that will be used for the official evaluation, along with a sample prediction file. To run the evaluation, use:

python {path-to-prediction-file} {path-to-validation-set} {path-to-output-file}
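
The official script is not reproduced here. As a rough illustration of what an evaluation over these three paths might compute, the sketch below scores predictions by plain accuracy. It assumes, purely for illustration, that both the prediction file and the validation answers are flat JSON objects mapping a question id to a chosen answer index; the function names `accuracy` and `evaluate_files` and the output format are hypothetical, and the real file schema may differ.

```python
import json


def accuracy(predictions, gold):
    """Return the fraction of gold questions answered correctly.

    Both arguments are assumed (hypothetically) to be dicts mapping
    question id -> answer index. A question with no prediction counts
    as wrong.
    """
    if not gold:
        raise ValueError("gold answer set is empty")
    correct = sum(
        1 for qid, answer in gold.items() if predictions.get(qid) == answer
    )
    return correct / len(gold)


def evaluate_files(prediction_path, gold_path, output_path):
    """Read two JSON files, compute accuracy, and write it to output_path."""
    with open(prediction_path, encoding="utf-8") as f:
        predictions = json.load(f)
    with open(gold_path, encoding="utf-8") as f:
        gold = json.load(f)
    score = accuracy(predictions, gold)
    # Hypothetical output format: a single JSON object with the score.
    with open(output_path, "w", encoding="utf-8") as f:
        json.dump({"accuracy": score}, f)
    return score
```

Counting unanswered questions as incorrect keeps the denominator fixed at the size of the validation set, so scores from different prediction files remain comparable.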


Once you are satisfied with your model performance on the validation set, you can submit it to get the official score on the test set. To preserve the integrity of the test results, we do not release the test set to the public. Follow this tutorial on how to submit your model for an official evaluation:


RecipeQA contains question-answer pairs generated from copyright-free recipes found online under a variety of licences. The corresponding licence for each recipe is also provided in the dataset; see recipes.json.

Have Questions?

Ask us questions at our Google group or at

Department of Computer Engineering
Beytepe Campus, Beytepe, Cankaya, Ankara

Project webpage designed by Taha Sevim and Kanan Hagverdiyev