Developments in deep learning algorithms have significantly boosted the performance of digital forensics applications such as media phylogeny, optical character recognition (OCR), face recognition, and scene text spotting. As neural networks achieve more advanced results, the depth and width of their architectures inevitably increase. Neural networks are data-driven: the more complex a network is, the more data it requires to converge during training. It is therefore of great importance to supply models with sufficient high-quality training data.

In this work, I studied various approaches to collecting, augmenting, and generating training data for solving forensics tasks with deep learning algorithms. I first explored traditional and state-of-the-art natural language processing algorithms and applied them to forensics tasks in real-world settings. I then integrated a language model into a handwritten text recognition pipeline to correct texts recognized by OCR algorithms; the training process benefits from synthetically generated texts containing spelling errors.

Moving from language-related forensics to forensics in computer vision, I proposed methods for generating data that support the training of various deep learning algorithms. I developed a 3D-engine-powered virtual city generator for synthetic data collection, capable of producing an unlimited amount of synthetic data for city-scene-related computer vision tasks, and I applied it to a forensics study on human perception of synthetic faces. Finally, I proposed a method for the arbitrarily shaped scene text detection problem. Scene texts carry important forensic information, and extracting them accurately has been an active research area for years.
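As a rough illustration of how spelling errors can be injected into clean text to build (noisy, clean) training pairs for a correction model, the sketch below applies random character-level edits. The specific edit operations and corruption rate are assumptions chosen for illustration, not the exact procedure used in this work.

```python
import random
import string

def corrupt_word(word, rng, p=0.3):
    """With probability p, apply one character-level edit (deletion,
    substitution, transposition, or insertion) to simulate a spelling
    or OCR error. Words shorter than two characters are left intact."""
    if len(word) < 2 or rng.random() > p:
        return word
    i = rng.randrange(len(word) - 1)
    op = rng.choice(["delete", "substitute", "transpose", "insert"])
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "substitute":
        return word[:i] + rng.choice(string.ascii_lowercase) + word[i + 1:]
    if op == "transpose":
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    return word[:i] + rng.choice(string.ascii_lowercase) + word[i:]

def make_training_pair(sentence, seed=0):
    """Return a (noisy, clean) sentence pair for training a text-correction
    model; the noisy side has the same number of words as the clean side."""
    rng = random.Random(seed)
    noisy = " ".join(corrupt_word(w, rng) for w in sentence.split())
    return noisy, sentence
```

Because every edit operates within a word and never deletes a word entirely, the noisy and clean sentences stay word-aligned, which simplifies supervised training of the corrector.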
I proposed a modified loss function that aids the regression of polygon shapes, significantly simplifying the model architecture and expediting the training process. As a result, the method achieves arbitrarily shaped scene text detection at real-time speed with performance comparable to state-of-the-art algorithms. The training process is supported by a new synthetic data generation approach that focuses on curved and irregularly shaped text instances. Finally, I created a new large-scale synthetic scene text dataset as a contribution to the research community.
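To give a concrete sense of what regressing polygon shapes for text detection can look like, the sketch below compares predicted polygon vertices to ground truth with a smooth-L1 penalty, minimized over cyclic shifts of the vertex order so the loss does not depend on which ground-truth vertex is labeled first. The fixed vertex count, the Huber formulation, and the cyclic-shift matching are all illustrative assumptions, not the exact loss proposed in this work.

```python
import numpy as np

def polygon_regression_loss(pred, target):
    """Smooth-L1 (Huber, delta=1) loss between predicted and ground-truth
    polygon vertices, minimized over cyclic shifts of the target's vertex
    order. pred and target are (N, 2) arrays of (x, y) coordinates."""
    n = target.shape[0]
    best = np.inf
    for shift in range(n):
        diff = np.abs(pred - np.roll(target, shift, axis=0))
        loss = np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5).mean()
        best = min(best, loss)
    return best
```

Matching over cyclic shifts keeps the target well defined for closed contours: a prediction that traces the same polygon starting from a different vertex incurs zero loss, which avoids penalizing an arbitrary labeling convention.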