id: turenne-mining-2022
author: turenne
title: turenne-mining-2022
date: 2022
pages: 12
extension: .pdf
mime: application/pdf
words: 6903
sentence: 418
flesch: 43
summary: The choice of the pair Chinese–English has several motivations: firstly, the data is more easily available; secondly, there is a demand for English and Chinese tools and datasets, as English is already the lingua franca in many areas (political, economic, cultural, and scientific), and we also see an increasing interest in Chinese, which is now being taught at schools in western countries. This paper is divided into the following sections: we discuss the dataset and its sub-datasets, describe the state-of-the-art research based on bilingual corpora, machine learning, and natural language processing, and then present the results of our experiments.
cache: cache/turenne-mining-2022.pdf
txt: txt/turenne-mining-2022.txt