id: work_bk4tfao5ljgu3pl27nh63a2y3a
author: Guillem Collell
title: Learning Representations Specialized in Spatial Knowledge: Leveraging Language and Vision
date: 2018
pages: 12
extension: .pdf
mime: application/pdf
words: 7724
sentences: 768
flesch: 64
summary: correct spatial arrangements for unseen objects if either CNN features or word embeddings of the objects are provided. we leverage the task of predicting the 2D spatial arrangement for two objects under a relationship expressed by either a preposition (e.g., "below" or that represents objects as continuous (spatial) features in an embedding layer and guides the learning task and that by informing it with either word embeddings or CNN features it is able to output accurate predictions about unseen objects, e.g., predicting the spatial arrangement of (man, riding, bike) learn spatial knowledge, we employ the task of predicting the spatial location of an Object ("O") relative to a Subject ("S") under a Relationship ("R"). The embedding layer models our intuition that spatial properties of objects can be, to a certain extent, encoded with a vector of continuous features. Table 4 shows the results of evaluating the embeddings, including those learned in the Prediction task, against the human ratings of spatial similarity (Sect.
cache: ./cache/work_bk4tfao5ljgu3pl27nh63a2y3a.pdf
txt: ./txt/work_bk4tfao5ljgu3pl27nh63a2y3a.txt