id: work_urkrgneoazdutdue3pq3atzeaq
author: Michaela Regneri
title: Grounding Action Descriptions in Videos
date: 2013
pages: 12
extension: .pdf
mime: application/pdf
words: 6817
sentences: 617
flesch: 63

summary (extracted excerpts, gaps marked with ellipses):
- "… describing actions in visual information extracted from videos. A natural next step is to integrate visual information from videos into a semantic model of event …"
- "… (2009) and contains pairs of natural-language action descriptions plus their associated video segments."
- "We report an experiment on similarity modeling of action descriptions based on the video …"
- "They achieve better results when incorporating the visual information, providing an enriched model that pairs a single text with a picture. Their results outperform purely text-based models using visual information from pictures for the task of modeling noun similarities."
- "MPII Composites comes with timed gold-standard annotation of low-level activities and participating objects (e.g. OPEN [HAND,DRAWER] or …"
- "… video corpus and its annotation (Sec. 3.1) we describe the collection of textual descriptions with …"
- "The corpus contains 17,334 action descriptions (tokens), realizing 11,796 different sentences …"
- "… this paper (videos, low-level annotation, aligned textual descriptions, the ASim-Dataset and visual features) are publicly available."

cache: ./cache/work_urkrgneoazdutdue3pq3atzeaq.pdf
txt: ./txt/work_urkrgneoazdutdue3pq3atzeaq.txt