Public release of InVID datasets

We are happy to announce the public release of three InVID datasets.

The first one, called “InVID Fake Video Corpus”, is a small collection of verified fake videos. It was developed in the context of the InVID project to gain insight into the types of fake video that can be encountered in the real world. The Corpus currently consists of 59 videos. For each video, the dataset provides a description of the fake, its original source, and the evidence proving that it is a fake. As we do not own the videos, the dataset only provides the video URLs and metadata, in the form of a tab-separated values (TSV) file.
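As a rough illustration of working with a TSV file of this kind, the following minimal Python sketch parses tab-separated text with a header row into one dictionary per video. The column names `video_url` and `description` and the sample row are assumptions made for this example only; the actual header of the released file may differ and should be checked first.

```python
import csv
import io


def load_corpus(tsv_text):
    """Parse tab-separated text with a header row into a list of dicts."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [dict(row) for row in reader]


# Hypothetical sample: the column names and the row are invented for
# illustration and are not taken from the released corpus.
SAMPLE = (
    "video_url\tdescription\n"
    "https://example.org/video1\tStaged footage presented as real\n"
)

records = load_corpus(SAMPLE)
for record in records:
    print(record["video_url"], "|", record["description"])
```

To use this with the downloaded file, read its contents (for example with `open(path, encoding="utf-8").read()`) and pass the text to `load_corpus`.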

The second one is the first version of the “InVID TV Logo Dataset”, created to provide a training and evaluation benchmark for TV logo detection in videos. It contains the results of the segmentation and annotation of 2,749 YouTube videos originating from a large number of news TV channels. The videos have been annotated with respect to the TV channel logos they contain (specifically, with the name of the organization to which each logo belongs) and with shot boundary information. Furthermore, a set of logo templates has been extracted from the videos and organized alongside the corresponding channel information. As we do not own the rights to the videos, the dataset only contains the YouTube video IDs alongside the corresponding annotations. It further contains 503 logo template files and the corresponding metadata (channel name, Wikipedia link).

The third one, termed “Concept detection scores for the IACC.3 dataset (TRECVID AVS Task)”, contains the concept detection scores for the IACC.3 dataset (600 hours of Internet Archive videos), which is used in the TRECVID Ad-hoc Video Search (AVS) task.

Further details about the specifications and use of these datasets can be found on the InVID community page on Zenodo.