Machine perception is a crucial field of artificial intelligence that allows machines to interact with and perform various tasks in the real world. Building accurate models of real-world scenes from sensed information, however, can be challenging, and modeling errors can significantly degrade the techniques that rely on those models. For instance, image noise can introduce errors in localization and mapping, especially in low-light environments. The limited dynamic range of electronic cameras, compared to the wide range of brightness levels in real-world scenes, can result in loss of detail in dark regions or saturation in bright regions, making object detection and semantic segmentation difficult. Relying on visual information alone can also leave gaps in the 3D information needed by higher-level processing.

Several aspects can be considered to address these issues. First, during image acquisition, exposure strategies can be developed that are driven by specific machine perception tasks. Image processing modules, such as denoising, deblurring, and image enhancement, can then be applied to improve the quality of the captured images. Sensor fusion is another popular solution for providing comprehensive and accurate information about the environment. In this dissertation, we first explore exposure strategies for two vision-based machine perception scenarios, and then develop a system for sensor fusion of a 2D LiDAR and an intensity camera.

As the first contribution, a noise-aware exposure strategy network is proposed for high dynamic range (HDR) imaging. The network predicts the exposure settings for three-shot bracketing, and the differently exposed images can then be fused into an HDR image with optimal tone and noise. The lightweight network not only generalizes well to different scenes, but can also be computed efficiently without requiring RAW data at inference time, making the system a practical solution for low-cost and power-efficient cameras.

However, when the camera or objects are moving, exposure bracketing may introduce motion artifacts. As the second contribution of this study, we therefore develop a learning-based auto-exposure strategy for robust visual odometry in HDR environments. The auto-exposure control network predicts the exposure value for the next image from a sequence of recent images, which enables the camera to adapt to changes in lighting conditions and to capture well-exposed images with sufficient, spatially evenly distributed features for localization. Experiments on challenging scenes show the advantages of the proposed method over other state-of-the-art methods in terms of localization performance and processing time.

As the third contribution, a sensor fusion system is developed and applied to indoor layout estimation. The proposed system is equipped with a 2D LiDAR scanning parallel to the ground and an intensity camera at a fixed relative position. By aligning the LiDAR points, which sample the room contour, to the ground-wall boundaries in the images, layout estimation and semantic segmentation can be performed without offline calibration. Localization and mapping are refined using the fused data, which enables the system to work reliably in scenes with poor lighting and little texture or few geometric features.

In summary, this dissertation improves machine perception performance in challenging environments by introducing new methods and systems.
The proposed methods address exposure strategies for (1) tone and noise optimization, and (2) robust visual odometry. Additionally, we leverage a 2D LiDAR as a supplementary sensor to assist indoor layout estimation in situations where vision-based methods may struggle. Comparative experiments on various scenes are conducted to demonstrate the superior performance of the proposed methods.