Motivated by state-of-the-art ubiquitous and pervasive sensing, big data processing and analytics, and advanced artificial intelligence (AI) technologies, smart urban sensing (SUS) has emerged as a powerful sensing paradigm for capturing rich information about urban environments at an unprecedented scale. In smart urban sensing applications, both infrastructure-based sensors (e.g., cameras, speed sensors, air quality sensors) and human sensors (e.g., mobile crowdsensing, social media) collaboratively report measurements of the urban environment for sustainable city monitoring and management. In this thesis, we focus on AI-driven smart urban sensing (ASUS) applications. Examples of ASUS applications include a deep convolutional network-based model that automatically monitors the structural health of city-wide infrastructure, a recurrent neural network-based approach that accurately forecasts traffic risks at a fine spatial granularity, and a deep fusion learning-based framework that rapidly detects infectious disease outbreaks in big cities using location-based crowd tracking services. Compared to traditional urban sensing applications, ASUS is advantageous in several respects: 1) AI models are often data-driven and tend to be more intelligent and flexible in understanding the complex and multimodal sensing measurements in ASUS. 2) AI models are capable of judiciously analyzing vast amounts of structured and unstructured sensing data, which significantly improves the efficiency of ASUS applications and their ability to handle heterogeneous data. 3) Compared to traditional machine learning-based approaches that require extensive feature engineering, AI models offer automatic feature engineering that identifies the critical features embedded in ASUS data to optimize application performance.
The overall goal of this thesis is to develop a set of AI-driven analytic models and systems to address three fundamental challenges in ASUS applications. 1) Heterogeneity: the sensing paradigms and intelligence in smart urban sensing often have different modalities and characteristics (numerical sensor readings vs. natural language, AI vs. human intelligence). It remains challenging for current ASUS solutions to derive the accurate states of urban environments by jointly exploring heterogeneous data sources and intelligence. 2) Scarcity: current ASUS solutions primarily rely on a rich set of high-quality training data to build accurate AI-based analytic frameworks. However, such high-quality training data is not always available in ASUS applications due to the high cost of data acquisition and to government and legal regulations. 3) Uncertainty: recent AI-driven SUS approaches mainly focus on improving the accuracy of their models by employing complex neural architectures and ignore an important aspect of their results: uncertainty quantification. We find that the uncertainty quantification problem is largely unaddressed in ASUS solutions due to the lack of interpretability and the black-box nature of the AI models.
In this thesis, we develop a set of new data analytic models and system prototypes to address these three fundamental challenges. To address the heterogeneity challenge, we present a multimodal fusion framework that effectively fuses heterogeneous sensing data from both human and physical sensors and/or integrates heterogeneous intelligence from humans and AI models to ensure desirable ASUS performance. To address the scarcity challenge, we develop a sparse-AI analytical engine that enables ASUS applications to use extremely sparse training data by introducing a set of principled transfer learning and contrastive learning designs. To address the uncertainty challenge, we introduce an uncertainty-aware AI framework that effectively quantifies the uncertainty of ASUS results and troubleshoots the failure cases of ASUS models in the absence of ground truth labels. All the developed data analytic models and system prototypes have been evaluated through real-world SUS applications. The thesis work significantly extends the current landscape of ASUS from both analytic and system perspectives.