This work addresses the problem of multimodal image registration. Registration is a building block in many computer vision systems, and its accuracy strongly affects the performance of subsequent applications. The goal is to determine the transformation accounting for the misalignment between two images. In general, aligning two images amounts to searching the transformation space for the transformation that yields the maximum similarity. In most cases an exhaustive search is intractable, so optimization techniques are often applied. However, any optimization algorithm may fail to converge to the global extremum because little is known about the properties of a similarity metric as a function of the transformation parameters. A multimodal image registration framework is proposed in which an iterative process searches for correspondences between mapping primitives by employing global information. This dissertation mainly considers two mapping primitives: keypoints and line segments. Line segments are explored first, and an approach that is robust to line-detection errors is proposed. Triplets (quadruples) of line segment mappings are tentatively formed by applying distance and orientation constraints. A triplet (quadruple) of line segment mappings determines an affine transformation, which is evaluated with the similarity metric between the reference image and the test image transformed by that transformation. The iterative process is conducted on triplets (quadruples) of line segments, ending with the ``best''-matched reference correspondence for every test line segment. The triplets (quadruples) of lines yielding higher similarity metrics are preserved, and their intersections are refined by an iterative process or RANSAC. The iterative framework is also applied to keypoint mappings. It consists of three basic elements: putative mappings of primitives, an iterative process, and a similarity metric.
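The core geometric step above, recovering the affine transformation determined by a triplet of mappings, can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: it assumes each mapped primitive contributes one point correspondence (e.g. line-segment intersections or keypoint locations), and solves the resulting six-equation linear system for the six affine parameters.

```python
import numpy as np

def affine_from_triplet(src_pts, dst_pts):
    """Solve for the 2x3 affine matrix mapping src_pts onto dst_pts.

    src_pts, dst_pts: arrays of shape (3, 2) holding the three point
    correspondences induced by a triplet of primitive mappings.
    """
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    # Each correspondence (x, y) -> (u, v) gives two linear equations
    # in the six unknowns (a, b, c, d, e, f):
    #   a*x + b*y + c = u
    #   d*x + e*y + f = v
    M = np.zeros((6, 6))
    rhs = np.zeros(6)
    for i, ((x, y), (u, v)) in enumerate(zip(src, dst)):
        M[2 * i] = [x, y, 1, 0, 0, 0]
        M[2 * i + 1] = [0, 0, 0, x, y, 1]
        rhs[2 * i] = u
        rhs[2 * i + 1] = v
    return np.linalg.solve(M, rhs).reshape(2, 3)
```

The three correspondences must be non-collinear, otherwise the system is singular and no unique affine transformation exists.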
Keypoints, including Scale Invariant Feature Transform (SIFT) and partial intensity invariant feature descriptor (PIIFD) keypoints, are detected from both the reference and test images. For each test keypoint, a certain number of reference keypoints are chosen as mapping candidates according to the keypoint descriptor distance. Triplets of keypoint mappings are then formed under geometric and spatial constraints. A triplet of keypoint mappings determines an affine transformation, which is evaluated with the similarity metric between the reference image and the test image transformed by that transformation. The iterative process is conducted on triplets of keypoint mappings and, for every test keypoint, updates and stores its best-matched reference keypoint. The similarity metric is defined as the number of overlapping edge pixels over the entire images, allowing global information to be incorporated in the evaluation of triplets of mappings. The output of the framework is the ``best''-matched reference keypoint for every test keypoint, in the sense that this matching yields the maximum similarity. In addition, this work explores combining the advantages of SIFT and PIIFD to extract keypoints better suited to multimodal images. The presented framework is open in that both the similarity metric and the mapping primitives can be replaced by any existing techniques. It is tested on indoor and outdoor EO/IR image pairs, and the experimental results show that the proposed registration method can robustly align EO/IR images, providing more accurate registration results on multimodal images than using local information alone.
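Two steps of the keypoint pipeline lend themselves to a short sketch: selecting mapping candidates by descriptor distance, and the edge-overlap similarity metric. The function and parameter names below are illustrative, not from the dissertation; Euclidean descriptor distance and binary edge maps (e.g. from a Canny detector) are assumed.

```python
import numpy as np

def candidate_matches(test_desc, ref_desc, k=5):
    """For each test descriptor, return the indices of the k reference
    descriptors with the smallest Euclidean distance.

    test_desc: (n_test, d) array; ref_desc: (n_ref, d) array.
    """
    # Pairwise distances via broadcasting: shape (n_test, n_ref).
    d = np.linalg.norm(test_desc[:, None, :] - ref_desc[None, :, :], axis=2)
    return np.argsort(d, axis=1)[:, :k]

def edge_overlap_similarity(ref_edges, warped_test_edges):
    """Count edge pixels present in both the reference edge map and the
    edge map of the transformed test image (boolean arrays, same shape).
    This is the global similarity used to score a candidate triplet.
    """
    return int(np.logical_and(ref_edges, warped_test_edges).sum())
```

Because the overlap is counted over the entire edge maps rather than in local patches, a triplet of mappings is scored by how well it aligns the whole image pair, which is what lets global information enter the evaluation.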