Natural Language Processing (NLP) encompasses the techniques computers use to understand and interpret human languages. NLP covers a wide range of sub-topics such as syntax (analyzing whether the words in an utterance are well arranged), semantics (understanding the meaning of combined words), and discourse. Most state-of-the-art NLP systems feed large amounts of natural language text into different models for training and testing. One problem with natural language corpora is the unbalanced frequency of rare terms relative to commonly used words. This word-level frequency distribution creates irregular sparsity patterns, and these patterns produce sparse data structures that do not perform well on parallel architectures. Asynchronous methods work best on specific sparse distributions. Ideally, computation time should be spent on dense values only, and the time spent on sparse elements should be minimized.

Graphics Processing Units (GPUs) are widely used to process large numbers of operations in parallel. A problem with these accelerators is that not all computational problems can be parallelized, and some parallel adaptations run slower than their serial CPU counterparts. Using GPUs to process sparse structures of varying sizes poses additional problems: a large part of the computation time is spent on sparse regions if the parallel implementation does not take advantage of the partially dense properties of the input (the kernel sketch at the end of this section illustrates the resulting load imbalance).

Significant speedups are achieved when a parallel implementation is tailored to the sparsity pattern of the problem being solved and to the targeted architecture. Our work adapts methods used in NLP to run efficiently on a parallel architecture using high-performance computing concepts. All contributions focus mainly on the GPU, a device designed to carry out a large number of computations faster than several off-the-shelf CPU architectures. This dissertation covers different adaptations of sparse NLP algorithms to the GPU architecture. We carry out experiments on different GPU architectures and compare performance on different datasets. Our results demonstrate that GPU adaptations can significantly reduce the execution time of sparse NLP algorithms: a 6000x speedup on the Viterbi task, a 4.5x speedup on the composition task, a 7x speedup on a batched Forward-Backward method, and a 50x improvement on batched operations common in deep learning.
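To make the sparsity problem concrete, the following sketch shows a textbook one-thread-per-row sparse matrix-vector product over a compressed sparse row (CSR) matrix. It is a generic illustration of the load-imbalance issue, not code from the dissertation, and the names are hypothetical: when row lengths follow the skewed word-frequency distribution of natural language, threads in the same warp receive very different amounts of work, so the hardware idles on the sparse regions.

    // Minimal CUDA sketch (illustrative only): CSR sparse matrix-vector
    // multiply with one thread per row. Rows with very different numbers
    // of nonzeros -- the typical skewed word-count pattern -- give each
    // thread a different amount of work, so a warp waits on its longest
    // row and the GPU is under-utilized on sparse regions.
    __global__ void csr_spmv(int num_rows,
                             const int   *row_ptr,   // size num_rows + 1
                             const int   *col_idx,   // column index per nonzero
                             const float *vals,      // value per nonzero
                             const float *x,         // dense input vector
                             float       *y)         // dense output vector
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < num_rows) {
            float sum = 0.0f;
            // Loop length varies per row; this is the source of divergence.
            for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
                sum += vals[j] * x[col_idx[j]];
            y[row] = sum;
        }
    }

Implementations tailored to the sparsity pattern, such as the GPU adaptations developed in this dissertation, avoid this imbalance by concentrating work on the dense portions of the input rather than assigning one thread to every row regardless of its length.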