Download

We freely distribute the data under the Apache 2.0 license. You can download different parts of the corpus:

Images

For convenience, we also provide the rescaled images with gray borders. Please note that the images do not fall under the Apache 2.0 license; each image has its own Creative Commons license. MS COCO provides further information about these licenses.

Graphical user interfaces

We developed two graphical user interfaces to annotate and explore the data. See this page for more information, or download them here:

If you would like to try out the annotation tool, you can also download the uncorrected automatic transcriptions here.

Other data