id: 3670
author: Praczyk, Piotr Adam; Nogueras-Iso, Javier
title: Automatic Extraction of Figures from Scientific Publications in High-Energy Physics
date: 2013-12-22
pages: 28
extension: .pdf
mime: application/pdf
words: 11209
sentences: 860
flesch: 53
summary: results of the algorithm consist of metadata, raster images of a figure, but also vector graphics, several methods of automatically extracting and processing graphics appearing in PDF documents. present a page box-cutting algorithm for the extraction of tables from PDF documents. single operator does not trigger rendering of elements from different detected entities (figures, Graphical areas detected by a simple clustering usually do not directly correspond to figures. number of graphical and textual operations in the content stream of a figure candidate. 2 shows a fragment of a publication page with indicated text areas and final figure candidates A fragment of the PDF page with boxes around every detected text area and each figure During the first step of the caption detection, all text clusters from the publication page are tested Matching between figure candidates and captions happens at every document page separately.
cache: ./cache/3670.pdf
txt: ./txt/3670.txt