id: work_2tzdxx7pkvbbbb62pwqubu6nfe
author: Yan Shao
title: Universal Word Segmentation: Implementation and Interpretation
date: 2018
pages: 16
extension: .pdf
mime: application/pdf
words: 8812
sentences: 883
flesch: 61
summary: Word segmentation can be very challenging, especially for languages without explicit word boundary delimiters, such as Chinese, Japanese and Vietnamese. [...] with large character sets and high segmentation frequencies, such as Chinese, Japanese and Vietnamese [...] specific settings that can be applied to improve segmentation accuracy for each language group. Word segmentation can be modelled as a character-level sequence labelling task (Xue, 2003; Chen et [...] non-segmental multiword tokens for languages like [...] Table 3: Tag set for universal word segmentation. We use one set of parameters for all the experiments as we aim for a simple universal model, although fine-tuning the hyperparameters on individual languages might result in [...] al., 2016) are used for all the word segmentation experiments. In total, there are 81 datasets in 49 languages that vary substantially in size. [...] contains word segmentation, POS tagging, morphological analysis and dependency parsing models in [...] word segmentation model targeting languages without space delimiters like Chinese and Japanese.
cache: ./cache/work_2tzdxx7pkvbbbb62pwqubu6nfe.pdf
txt: ./txt/work_2tzdxx7pkvbbbb62pwqubu6nfe.txt
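The summary describes word segmentation as a character-level sequence labelling task. A minimal sketch of that framing follows, assuming the common B/I/E/S boundary tags (beginning, inside, end, single-character word). The record does not reproduce the paper's Table 3, so this tag set and the function names `words_to_tags` and `tags_to_words` are illustrative assumptions, not the author's implementation.

```python
from typing import List


def words_to_tags(words: List[str]) -> List[str]:
    """Turn a gold segmentation into one boundary tag per character.

    Assumed tag set (not taken from the paper): B = first character of a
    multi-character word, I = inside, E = last character, S = single-character word.
    """
    tags: List[str] = []
    for word in words:
        if not word:
            continue  # ignore empty strings rather than emit a bogus B/E pair
        if len(word) == 1:
            tags.append("S")
        else:
            tags.extend(["B"] + ["I"] * (len(word) - 2) + ["E"])
    return tags


def tags_to_words(chars: str, tags: List[str]) -> List[str]:
    """Rebuild words from a character sequence and its (predicted) tags."""
    if len(chars) != len(tags):
        raise ValueError("need exactly one tag per character")
    words: List[str] = []
    current = ""
    for ch, tag in zip(chars, tags):
        current += ch
        if tag in ("E", "S"):  # these tags close a word
            words.append(current)
            current = ""
    if current:  # tolerate a sequence that ends without a closing tag
        words.append(current)
    return words


if __name__ == "__main__":
    gold = ["我", "喜欢", "自然语言"]
    tags = words_to_tags(gold)
    print(tags)
    print(tags_to_words("".join(gold), tags))
```

In this framing a tagger only has to predict one label per character, and decoding back to words is a single left-to-right pass over the predicted tags.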