Title: Edge Artificial Intelligence for 6G: Vision, Enabling Technologies, and Applications
Authors: Khaled B. Letaief, Yuanming Shi, Jianmin Lu, and Jianhua Lu
Date: 2021-11-24

Abstract: The thriving of artificial intelligence (AI) applications is driving the further evolution of wireless networks. It has been envisioned that 6G will be transformative and will revolutionize the evolution of wireless from "connected things" to "connected intelligence". However, state-of-the-art deep learning and big data analytics based AI systems require tremendous computation and communication resources, causing significant latency, energy consumption, network congestion, and privacy leakage in both the training and inference processes. By embedding model training and inference capabilities into the network edge, edge AI stands out as a disruptive technology for 6G to seamlessly integrate sensing, communication, computation, and intelligence, thereby improving the efficiency, effectiveness, privacy, and security of 6G networks. In this paper, we shall provide our vision for scalable and trustworthy edge AI systems with integrated design of wireless communication strategies and decentralized machine learning models. New design principles of wireless networks, service-driven resource allocation optimization methods, as well as a holistic end-to-end system architecture to support edge AI will be described. Standardization, software and hardware platforms, and application scenarios are also discussed to facilitate the industrialization and commercialization of edge AI systems.

The International Telecommunication Union (ITU) has published the system requirements and driving characteristics for Network 2030 [4]. To improve the real-time immersive experience and interaction, as well as accelerate intelligence upgrades for the industrial internet-of-things (IoT) and digital twins, multiple companies are now considering new usage scenarios. For example, based on the typical use cases in 5G [5], [6] (i.e., enhanced mobile broadband (eMBB), ultra-reliable and low-latency communications (URLLC), and massive machine type communications (mMTC)), Huawei has recently proposed three additional application scenarios in its vision of 5.5G. These include uplink centric broadband communication (UCBC), real-time broadband communication (RTBC), and harmonized communication and sensing (HCS) [7]. It is expected that 6G will go beyond the mobile internet to support ubiquitous artificial intelligence (AI) services and Internet of Everything (IoE) applications [1], [2], [3], [4], [8], including sustainable cities, connected autonomous systems, brain-computer interfaces, digital twins, the tactile and haptic internet, a high-fidelity holographic society, extended reality (XR) and the metaverse [9], e-health, etc. Researchers in industry and academia have published many visionary 6G proposals [10], [11], [12] to provide better understanding, sensing, control of, and interaction with the physical world. In particular, three new application services have been envisioned for 6G, including computation oriented communications (COC), contextually agile eMBB communications (CAeC), and event defined uRLLC (EDuRLLC) [12]. Based on these envisioned usage scenarios, we present the evolution of visionary use cases for 6G in Fig. 1, integrating intelligence, coordination, sensing, and computing for a connected cyber-physical world.
To shape the future of 6G use cases in 2030, multidisciplinary research and various disruptive technologies are required, including spectrum exploration technologies, device and circuit technologies, as well as networking, computing, sensing, and learning functionalities. In particular, AI, especially deep learning (DL), provides a revolutionary approach to design and optimize 6G wireless networks across the physical, medium-access, and application layers [12], [13]. Specifically, DL provides a novel way to design the 6G air interface by optimizing the radio environment [14], communication algorithms [15], hardware, and applications in a unified way [16], [17]. This has inspired recent successful applications in joint source-channel coding (JSCC) [18], task-oriented communication [19], [20], and semantic communication [21]. Besides, machine learning (ML) also provides a paradigm shift for automatically learning high-performance and fast optimization algorithms to solve resource allocation problems in wireless networks [22], [23], [24], [25]. Domain knowledge (e.g., optimization models and theoretical tools) has further been incorporated into the DL framework for optimizing ultra-reliable and low-latency communication networks [26]. An ML approach was also developed for addressing the communication, networking, and security challenges of vehicular applications [27]. With the development of wireless data collection, learning models and algorithms, as well as software and hardware platforms, we envision that AI will become a native tool to design disruptive wireless technologies, accelerating the design, standardization, and commercialization of 6G. On the other hand, the evolution of 6G wireless communication technologies and communication theory will also inspire progress in AI techniques in terms of novel learning theory, new deep neural network (DNN) architectures, and customized software and hardware platforms. Given the requirements of emerging 6G, connected intelligence is expected to be the central focus and an indispensable component of 6G [28]. This shall revolutionize the evolution of wireless from "connected things" to "connected intelligence", thereby enabling the interconnections between humans, things, and intelligence within a hyper-connected cyber-physical world [12]. Edge AI provides a promising solution for connected intelligence by enabling data collection, processing, transmission, and consumption at the network edge [29], [30]. Specifically, by embedding training capabilities across the network nodes, edge training is able to preserve privacy and confidentiality, achieve high security and fault-tolerance, and reduce network traffic congestion and energy consumption. For instance, over-the-air federated learning (FL) provides a collaborative ML framework to train a global statistical model over wireless networks without accessing edge devices' private raw data [31]. By directly executing AI models at the network edge, edge inference can provide low-latency and high-reliability AI services while requiring less computation, communication, storage, and engineering resources. For example, edge device-server co-inference is able to remove the communication and computation bottlenecks by splitting a large DNN model between edge devices and edge servers [32].
However, edge AI will create task-oriented data traffic flows over wireless networks, for which disruptive wireless techniques, efficient resource allocation methods, and holistic system architectures need to be developed. To embrace the era of edge AI, wireless communication systems and edge AI algorithms need to be co-designed to seamlessly integrate communication, computation, and learning. Creating a trustworthy and scalable edge AI system will be of utmost importance for imbuing connected intelligence in 6G. The challenges of trustworthiness and scalability are multidisciplinary, spanning ML, wireless networking, and operations research. Specifically, trustworthiness in terms of privacy and security is one of the key requirements for 6G intelligent services and applications, for which the general data protection regulation (GDPR) needs to be satisfied and directly transmitting or collecting raw data from users is forbidden. To tame privacy leakage and adversarial attacks, various edge learning models and architectures have been proposed, including FL (i.e., a server-client network architecture with data partitioned among edge devices) [33], [34], swarm learning (i.e., a decentralized device-to-device (D2D) communication architecture without a central authority) [35], and split learning (i.e., model parameters partitioned among edge devices and edge servers) [36], [37]. Distributed reinforcement learning (RL) [38], [39] and trustworthy learning techniques [40], [41] were further proposed to address dynamic and adversarial learning environments, respectively. In particular, differential privacy [42] provides a lightweight privacy-preserving mechanism for these edge learning models. To support their deployment over wireless networks, we shall present new multiple access schemes (e.g., over-the-air computation (AirComp) [44], [45]) to enable low-latency model aggregation, new multiple antenna techniques (e.g., cell-free massive MIMO [46], [47] and reconfigurable intelligent surface (RIS) [48], [49]) to support fast exchange of high-dimensional model updates, and next-generation network architectures (e.g., space-air-ground integrated network (SAGIN) [50], [51]) to support diverse edge learning models and topologies. To design a communication-efficient edge inference system with low-latency and reliability guarantees, interference management, cooperative transmission, and task-oriented communication will be introduced to support edge device distributed inference [52], edge server cooperative inference [53], [54], and edge device-server co-inference [32], respectively. We then provide a holistic view for mathematically modeling the resource allocation problems in edge training and inference systems, which are categorized as mixed combinatorial optimization, nonconvex optimization, and stochastic optimization models. A "learning to optimize" framework is further introduced to facilitate scalable, real-time, robust, parallel, distributed, and automatic optimization algorithm design for service-driven resource allocation in edge AI systems [22], [23], [55], [25]. We also provide a holistic end-to-end architecture for edge AI systems. Moreover, standardization, resource allocation optimization solvers, software and hardware platforms, and application scenarios are discussed. The roadmap to the edge AI ecosystem is illustrated in Fig. 2 to encourage multidisciplinary collaborations among information science, computer science, operations research, and integrated circuits. The developed edge AI technology will serve as a distributed neural network to accelerate the evolution of sensing capabilities, communication strategies, network optimizations, and application scenarios in 6G networks.

[Fig. 3. Edge AI empowered 6G networks: integrated sensing, communication, computation, and intelligence.]
Specifically, edge AI paves the way for network sensing and cooperative perception to understand network environments and services for agile and intelligent decision making. For example, edge simultaneous localization and mapping (SLAM) [56], [57] has recently been developed to deploy DL based visual SLAM algorithms on vehicles via edge inference. Edge AI can also help design AI-native communication strategies for the physical layer (e.g., task-oriented semantic communication [58]) and the medium access control layer (e.g., random access protocols [59]). For instance, an edge DL approach was developed in [58] to deliver low-latency semantic tasks (e.g., text messages) by learning the communication strategies in an end-to-end fashion based on JSCC. Furthermore, edge AI provides a new paradigm for optimization algorithm design to enable service-driven resource allocation in 6G networks [60]. For instance, distributed RL [55], decentralized graph neural networks [23], and distributed DNNs [61] are able to automatically learn distributed resource allocation optimization algorithms. By seamlessly integrating sensing, communication, computation, and intelligence, edge AI shall empower 6G networks to support diversified intelligent applications, including autonomous driving, industrial IoT, smart healthcare, etc. To further imbue native intelligence, native trustworthiness, and native sensing in 6G, one can envision mimicking nature to innovate edge AI empowered future networks. Inspired by the dynamic spiking neurons in the human brain, the energy consumption and latency of edge AI can be significantly reduced by processing learning tasks in an event-driven manner [62], [63]. A brain-inspired stigmergy-based federated collective intelligence mechanism was proposed in [64] to accomplish multi-agent tasks (e.g., autonomous driving) through simple indirect communications. By leveraging prior knowledge of the immune system and brain neurotransmission, a brand-new network security architecture and a fully-decoupled radio access network have recently been proposed in [65] and [66], respectively. These results on nature-inspired edge AI models and network architectures provide strong evidence that one can establish an integrated data-driven and knowledge-guided framework to design and optimize 6G networks. Further details and a description of the edge AI empowered 6G network are provided in Fig. 3, which highlights the integration of sensing, communication, computation, and intelligence in a closed-loop ecosystem. We provide extensive discussions, visions, and summaries of wireless techniques, resource allocation, standardization, platforms, and application scenarios to embrace the era of edge AI for 6G. The major contributions are summarized as follows:
• The vision (i.e., connected intelligence for 6G), challenges (i.e., trustworthiness and scalability), and solutions (i.e., wireless techniques, resource allocation, and system architectures) for edge AI, as well as the edge AI empowered 6G network, are introduced and summarized in Section I.
• The communication-efficient edge training system is presented in Section II, including the edge learning models and algorithms, followed by the promising wireless techniques and architectures to support their deployment.
• The communication-efficient edge inference system is introduced in Section III.
Here, we introduce horizontal edge inference and vertical edge inference enabled by cooperative transmission and task-oriented communication, respectively.
• A unified framework for resource allocation in edge AI systems is provided in Section IV. Here, we present operations research based theory-driven and machine learning based data-driven approaches for designing efficient resource allocation optimization algorithms.
• A holistic end-to-end architecture for edge AI systems is proposed in Section V, including the network infrastructure, data governance, edge network functions, and edge AI management and orchestration.
• The standardizations, software and hardware platforms, and application scenarios are discussed in Section VI. This will help facilitate the booming market of edge AI in the 6G era.
We summarize the main topics and relevant technologies as well as highlight the representative results in Table I.

In this section, we shall present various communication-efficient distributed optimization algorithms for edge training, followed by promising enabling wireless techniques to support the deployment of edge learning models and algorithms. The training process of edge AI models typically involves minimizing a loss or empirical risk function to fit a global model from decentralized data generated by a massive number of intelligent devices. The goal of distributed optimization for edge training is to minimize the global loss function $L$, namely,

$$\underset{\boldsymbol{\theta} \in \mathbb{R}^d}{\operatorname{minimize}} \quad L(\boldsymbol{\theta}) \;=\; \sum_{k \in \mathcal{S}} w_k\, L_k(\boldsymbol{\theta}), \qquad (1)$$

where $\boldsymbol{\theta} \in \mathbb{R}^d$ are the model parameters, $L_k$ is the local loss function of device $k$ over its local dataset $\mathcal{D}_k$, $\mathcal{S}$ denotes the set of participating edge nodes, and $w_k \ge 0$ with $\sum_{k \in \mathcal{S}} w_k = 1$ denotes the weight of each local loss function. Considering the network topology for edge training, the heterogeneous local datasets $\mathcal{D}_k$, varying device participation $\mathcal{S}$, dynamic communication and computation environments, as well as privacy concerns and adversarial attacks, highly efficient and trustworthy distributed optimization algorithms need to be developed. As shown in Fig. 4, based on the data partition and model partition principles [29], we will first introduce various edge training architectures, including FL, decentralized learning, and model split learning. We then present distributed RL and trustworthy learning techniques to accommodate dynamic and adversarial environments, respectively, as shown in Fig. 5.

1) Federated Learning: FL is a collaborative ML framework to train a global statistical model without accessing edge devices' private raw data, wherein a dedicated edge server is responsible for aggregating local learning model updates and disseminating global learning model updates [34], as shown in Fig. 4(a). FL is being adopted by many industrial practitioners, including Google's Gboard mobile keyboard for next-word prediction and emoji suggestion, Apple's QuickType keyboard and vocal classifier, NVIDIA for predicting the oxygen needs of COVID-19 patients, and WeBank for money laundering detection [68]. Compared with cloud data center based distributed learning, cross-device FL raises unique challenges for solving the distributed training optimization problem, including high communication costs with a large model frequently exchanged over wireless networks, statistical heterogeneity with non-identical local data distributions and sizes, system heterogeneity with varied storage, computation, and communication capabilities, as well as dynamic device participation [122].
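To make the notation in (1) concrete, the following minimal NumPy sketch evaluates the weighted global loss for a toy setting; the quadratic local loss, dataset sizes, and all variable names are illustrative assumptions rather than elements of the cited works.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: K devices, d-dimensional model, local datasets of different sizes.
K, d = 5, 8
datasets = [rng.normal(size=(n_k, d)) for n_k in rng.integers(20, 100, size=K)]
targets = [X @ rng.normal(size=d) + 0.1 * rng.normal(size=X.shape[0]) for X in datasets]

def local_loss(theta, X, y):
    """L_k(theta): mean-squared error of device k on its local dataset."""
    return 0.5 * np.mean((X @ theta - y) ** 2)

# Weights w_k proportional to local dataset sizes, summing to one as in (1).
sizes = np.array([X.shape[0] for X in datasets], dtype=float)
w = sizes / sizes.sum()

def global_loss(theta):
    """L(theta) = sum_k w_k * L_k(theta) over the participating devices."""
    return sum(w_k * local_loss(theta, X, y) for w_k, X, y in zip(w, datasets, targets))

theta = np.zeros(d)
print(f"global loss at theta = 0: {global_loss(theta):.4f}")
```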
A growing body of recent works has developed effective methods to address these unique challenges in FL. To address the challenge of expensive communication overheads for exchanging intermediate local updates with a central server, federated averaging [67] turns out to be effective in reducing the number of communication rounds by performing multiple local updates, e.g., running multiple stochastic gradient descent (SGD) iterations on each edge device. This local updating approach is able to learn a global model within far fewer communication rounds than the vanilla distributed SGD method, i.e., running only one mini-batch of SGD at each edge device.

[Table I (fragment): ... [67], [34], [33]; federated optimization [68]. Decentralized Learning — swarm learning [35]; consensus-based methods [69]; diffusion strategies [70]; decentralized training [71]. Model Split Learning — model parameter partitioned edge learning [72]; split learning [36]. Distributed Reinforcement Learning — multi-agent reinforcement learning [73]. Trustworthy Learning — differential privacy [42]; secure model aggregation [74]; blockchain smart contract [35]. Section II-B: Wireless Techniques for Edge Training — low-latency analog model aggregation [31], [44], [45]. Massive Access Techniques — grant-free random access [75], [76]; NOMA [77], [78]. ...]

Model compression, such as quantization and sparsification, is another notable way to address the communication bottleneck by reducing the size of the exchanged messages during each model update round. Scalar quantization is a typical way to implement lossy compression of the high-dimensional gradient vectors by quantizing each of their entries to a finite-bit low-precision value [123], [124], [125], which was further improved by the recent proposal of vector quantization [126], [127]. Sparsification, on the other hand, proposes to only communicate the informative elements of the gradient or model vectors among nodes [128], [129]. A set of algorithms combining the local update method and model compression has shown the capability of achieving high communication efficiency [130], [131]. In particular, a lazily aggregated quantized gradient method was proposed in [132] to reduce both the amount of exchanged data and the number of communication rounds by reusing outdated gradients in place of the less informative quantized gradients.
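The following self-contained sketch illustrates the two ideas just discussed, multiple local SGD steps per round (as in federated averaging) and sparsified model updates, on a toy linear regression problem; the loss, step sizes, sparsity level, and variable names are illustrative assumptions rather than settings from the cited papers.

```python
import numpy as np

rng = np.random.default_rng(1)
K, d, rounds, local_steps, lr, topk = 4, 10, 20, 5, 0.05, 3

# Heterogeneous local data: each device draws its own linear model plus noise.
data = []
for _ in range(K):
    X = rng.normal(size=(50, d))
    y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=50)
    data.append((X, y))

def sparsify(v, k):
    """Top-k sparsification: keep only the k largest-magnitude entries of the update."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

theta = np.zeros(d)
for r in range(rounds):
    updates = []
    for X, y in data:                                # each device runs several local SGD steps
        local = theta.copy()
        for _ in range(local_steps):
            i = rng.integers(len(y), size=8)          # mini-batch indices
            grad = X[i].T @ (X[i] @ local - y[i]) / len(i)
            local -= lr * grad
        updates.append(sparsify(local - theta, topk))  # compress the model delta before upload
    theta += np.mean(updates, axis=0)                 # server: average the sparse updates

avg_loss = np.mean([0.5 * np.mean((X @ theta - y) ** 2) for X, y in data])
print(f"average local loss after training: {avg_loss:.4f}")
```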
Although the above periodic compressed update methods have shown empirical or theoretical success in tackling the communication challenge, the heterogeneity in systems and local datasets may slow down convergence or even cause divergence [133], [134], for which various algorithms and models have been proposed to address the statistical and system heterogeneity challenges. To learn AI models from statistically heterogeneous local datasets, various effective and personalized models have been proposed to rectify the original model (1), including regularizing the local loss function at each device [134], [135], [136], distributionally robust modeling [137], [138], multi-task learning [139], as well as meta-learning approaches [140]. Running local updates at devices with heterogeneous computation capabilities may yield objective inconsistency or client drift, i.e., the learned model can be far from the desired true model. To address this problem, an operator splitting method was proposed to avoid the local models drifting apart from the global model [141]. A normalized model aggregation method was also developed to ensure that the global model converges to the desired true model [142]. A novel federated aggregation scheme was further developed in [143] to address the system heterogeneity issue arising from dynamic, sporadic, and partial device participation. To leverage the computation capabilities across the device-edge-cloud heterogeneous network, a hierarchical model aggregation approach was proposed in [130] to reduce latency by controlling the two aggregation intervals.

2) Decentralized Learning: Decentralized ML learns a global model from inherently decentralized data via peer-to-peer communications over the underlying communication network topology, without a central authority [144], as shown in Fig. 4(b). It has great potential for applications in autonomous industrial systems, including cooperative automated driving, cooperative simultaneous localization and mapping, and collaborative robotics in advanced manufacturing environments [145]. The decentralized learning architecture harnesses the benefits of communication efficiency, computation scalability, and data locality. In particular, swarm learning [35] provides a completely decentralized AI solution based on decentralized ML by keeping local datasets at each edge device. This can achieve high privacy, security, resilience, and scalability. Compared with the server-client learning architecture of FL, decentralized learning can accommodate decentralized D2D communication network architectures and protocols with arbitrary connectivity graphs (e.g., cooperative driving and robotics networks). It can also overcome the straggler dilemma with heterogeneous hardware, as well as improve robustness to data poisoning attacks and master node failures [35], [145]. The convergence behavior of decentralized learning highly depends on the decentralized averaging mechanism and the network topology for data exchange [71]. Typical decentralized aggregation approaches include consensus-based methods [69] and diffusion strategies [70]. To improve the communication efficiency of exchanging locally updated models among neighboring edge devices, one may reduce either the number of communication rounds (i.e., improve the convergence rate) or the volume of exchanged data per round. Specifically, variance reduction with gradient tracking was investigated in [146] to achieve a fast convergence rate. Periodic averaging, i.e., running multiple local updates before decentralized averaging, is an effective way to reduce the number of communication rounds among devices [147], [148]. Besides, quantizing or sparsifying the locally updated models can reduce the volume of exchanged messages to address the communication bottleneck [149]. A consensus distance controlling framework was further developed in [150] to achieve a trade-off between the learning performance and the exactness of decentralized averaging for decentralized DL. Moreover, the design of the communication network topology is also critical for improving communication efficiency [151], for which a group alternating direction method of multipliers [152] was proposed to form a connectivity chain by dividing the workers into head and tail workers. To address the heterogeneity of local datasets, a momentum-based method [153] has recently been developed to achieve good generalization performance.
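As a concrete illustration of consensus-based decentralized averaging, the following sketch runs decentralized SGD over a ring topology with a doubly stochastic mixing matrix; the topology, mixing weights, and toy regression data are illustrative assumptions rather than the designs of the cited works.

```python
import numpy as np

rng = np.random.default_rng(2)
K, d, rounds, lr = 6, 8, 30, 0.05

# Ring topology: each device mixes with itself and its two neighbours.
# W is doubly stochastic, the standard assumption for consensus averaging.
W = np.zeros((K, K))
for k in range(K):
    W[k, k] = 1 / 3
    W[k, (k - 1) % K] = 1 / 3
    W[k, (k + 1) % K] = 1 / 3

data = []
for _ in range(K):
    X = rng.normal(size=(40, d))
    y = X @ rng.normal(size=d)
    data.append((X, y))

models = np.zeros((K, d))
for _ in range(rounds):
    # 1) Local gradient step on each device's own data.
    for k, (X, y) in enumerate(data):
        grad = X.T @ (X @ models[k] - y) / len(y)
        models[k] -= lr * grad
    # 2) Consensus (gossip) step: exchange models with neighbours and mix.
    models = W @ models

spread = np.max(np.linalg.norm(models - models.mean(axis=0), axis=1))
print(f"max deviation from the network-average model: {spread:.4f}")
```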
3) Model Split Learning: Model split learning enables a collaborative learning process across edge devices and edge servers by partitioning the model parameters across the edge nodes, as shown in Fig. 4(c). That is, each edge node $k$, which can be an edge device or an edge server, is only responsible for updating $\boldsymbol{\theta}_k$, with $\boldsymbol{\theta} = [\boldsymbol{\theta}_1, \boldsymbol{\theta}_2, \ldots, \boldsymbol{\theta}_{|\mathcal{S}|}]$ in (1). This model splitting architecture can achieve higher privacy levels and better trade-offs between communication and computation. It is thus particularly applicable to DL models with a large number of parameters, whereas data partition based training methods, e.g., FL, normally require each involved edge device to locally update a whole copy of the global model. The model parameter partitioned edge learning approach [72] proposed to train only a block of model parameters based on the coordinate descent method for decomposable ML models [154] or the alternating minimization approach for general DL models [155]. However, this approach is prone to data privacy leakage as the datasets need to be shared across edge devices. Vertical FL, on the other hand, can directly learn the global model from the partitioned data features among different edge devices without sharing them [156]. In this case, the data features and the associated blocks of model parameters are split among edge devices, for which the asynchronous SGD method can be applied [157]. Consensus algorithms were also developed in [158] to jointly learn a model over a decentralized network while keeping the distributed data features local. Split DL further provides a flexible way to train a DNN by dividing it into lower and upper segments located at the edge device side and the edge server side, respectively [36]. It can typically be applied to medical diagnosis and millimeter-wave channel prediction [37]. Split DL is able to preserve privacy without sharing raw data and enjoys computation scalability, since the edge devices only perform the simple computations of the lower segments. Compared with FL, split DL can significantly improve computation efficiency, reduce communication costs, and achieve higher learning accuracy, data security, and system scalability. Specifically, the edge devices and the edge server collaboratively train the whole neural network, which involves routing the activations of the device-side subnetwork to the edge server via forward propagation, and downloading the gradients from the server-side subnetwork to update the lower segment via back propagation. However, exchanging the instantaneous intermediate values between edge devices and the edge server becomes the communication bottleneck, especially in the case with multiple edge devices. Therefore, a joint communication strategy and neural network architecture design is required [37] for split training of various DNNs with heterogeneous edge devices. Considering large-scale privacy-sensitive and delay-sensitive IoT applications, Lyu et al. [159] proposed a hybrid fog-based privacy-preserving DL framework, where a fog-level DNN is partitioned between the edge device side and the fog server side.
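The following minimal sketch illustrates the iteration pattern of split DL described above: the device computes the lower segment and uploads activations, the server computes the upper segment and the loss, and only the activation gradients are sent back; the two-layer network, loss, and hyperparameters are illustrative assumptions rather than the architecture of any cited work.

```python
import numpy as np

rng = np.random.default_rng(3)
n, d_in, d_hid, lr = 32, 16, 8, 0.1

# Toy data held on the device; the raw data never leaves the device.
X = rng.normal(size=(n, d_in))
y = rng.normal(size=(n, 1))

W_dev = rng.normal(scale=0.1, size=(d_in, d_hid))   # lower segment (edge device)
W_srv = rng.normal(scale=0.1, size=(d_hid, 1))      # upper segment (edge server)

for step in range(100):
    # --- device: forward pass through the lower segment, upload activations ---
    h = np.tanh(X @ W_dev)                 # "smashed" intermediate features

    # --- server: forward pass through the upper segment, compute loss gradient ---
    pred = h @ W_srv
    err = pred - y                          # gradient of the MSE loss w.r.t. predictions

    # --- server: backprop its own segment, send the activation gradient downstream ---
    grad_W_srv = h.T @ err / n
    grad_h = err @ W_srv.T

    # --- device: finish backprop through the lower segment ---
    grad_W_dev = X.T @ (grad_h * (1 - h ** 2)) / n

    W_srv -= lr * grad_W_srv
    W_dev -= lr * grad_W_dev

print(f"final MSE: {np.mean((np.tanh(X @ W_dev) @ W_srv - y) ** 2):.4f}")
```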
4) Distributed Reinforcement Learning: RL provides a flexible framework for sequential decision making in dynamic settings by interacting with a dynamic environment, as shown in Fig. 5(a). This is frequently modeled as decision making and learning in a Markov decision process (MDP) [160]. Typical RL algorithms include model-based algorithms, policy-based algorithms (e.g., natural policy gradient), value-based algorithms (e.g., Q-learning), and actor-critic methods. In particular, an asynchronous method leveraging parallel computing was developed in [161] to solve large-scale nonconvex RL problems. However, in modern intelligent applications, e.g., autonomous driving and robotics, it is critical to consider multi-agent reinforcement learning (MARL), in which multiple agents with different local action spaces collaboratively interact with a common environment to complete a common goal and maximize a shared team reward [73]. Due to the enormous state-action space, delayed rewards and feedback, as well as non-stationary and unknown environments with heterogeneous agent behaviors, an efficient communication strategy among multiple agents plays a key role in achieving good and stable MARL performance. For server-client architecture based MARL, the edge server coordinates the learning process of all the edge agents. Lowe et al. [162] proposed a multi-agent actor-critic method with decentralized actors at each agent and a centralized critic for parameter sharing among the agents. To improve the communication efficiency of distributed policy gradients for MARL, a lazily aggregated policy gradient method was developed in [38] to reduce the number of communication rounds by only communicating informative gradients of a subset of agents while reusing outdated gradients for the remaining agents. For applications without central coordinators, e.g., autonomous driving, decentralized MARL is essential, wherein the agents can only exchange messages with their neighbors over a communication connectivity graph [163]. Zhang et al. [39] proposed decentralized actor-critic algorithms with function approximation, where each agent makes individual decisions based on both the information observed locally and the messages shared through a consensus step over the network. A decentralized entropy-regularized policy gradient method that only shares information with neighboring agents was developed in [164] to learn a single policy for multi-task RL with multiple agents operating in different environments.

5) Trustworthy Learning: To learn and deploy AI models for high-stakes applications (e.g., autonomous driving) at the network edge, it is critical to ensure privacy, security, interpretability, responsibility, robustness, and fairness in the edge learning process, as shown in Fig. 5(b). However, the heterogeneity of massive-scale edge systems and decentralized datasets raises unique challenges in designing trustworthy edge AI techniques. Although FL addresses the local confidentiality issue by keeping datasets local, the shared model updates can still cause severe privacy leakage (e.g., through model inversion attacks), the learned global model can be corrupted by colluding malicious attackers [40], [165], and the edge devices themselves may be adversarial (e.g., through data or model poisoning). This calls for rigorous privacy-preserving mechanisms and secure aggregation rules [43]. Differential privacy provides a promising lightweight privacy-preserving mechanism to guarantee a level of privacy disclosure for local datasets by adding random perturbations [42]. The additive noise and signal superposition properties of the wireless channel can be naturally harnessed as such a privacy-preserving mechanism [41].
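For reference, the standard ingredients of differentially private model aggregation, per-device update clipping followed by Gaussian perturbation of the average, can be sketched as follows; the wireless channel noise discussed next plays a role analogous to the artificial noise added here. The clipping norm, noise scale, and variable names are illustrative assumptions rather than parameters from the cited works.

```python
import numpy as np

rng = np.random.default_rng(4)
K, d = 10, 20
clip_norm, noise_std = 1.0, 0.5              # illustrative privacy parameters

local_updates = [rng.normal(size=d) for _ in range(K)]

def clip(u, c):
    """Scale the update so its l2 norm is at most c (bounds each device's sensitivity)."""
    return u * min(1.0, c / (np.linalg.norm(u) + 1e-12))

# Server-side aggregation: average the clipped updates and perturb with Gaussian noise.
clipped = np.stack([clip(u, clip_norm) for u in local_updates])
noisy_avg = clipped.mean(axis=0) + rng.normal(scale=noise_std / K, size=d)

print(f"||noisy avg - exact avg|| = {np.linalg.norm(noisy_avg - clipped.mean(axis=0)):.4f}")
```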
Such an inherently noisy over-the-air model aggregation scheme can limit the privacy disclosure of local datasets at the edge server for free while keeping the learning performance unchanged [166], [41]. To improve the communication efficiency of private distributed learning, Chen et al. [167] developed efficient encoding and decoding mechanisms to simultaneously achieve optimal communication efficiency and differential privacy under typical statistical learning settings. Apart from preserving privacy for individual users, edge AI also needs to be robust to errors and adversarial attackers, as its decentralized nature makes the learning process prone to becoming unreliable or even completely controlled by external attackers [74]. To address Byzantine attacks (i.e., a faulty edge device can behave arbitrarily badly by modifying its local updates) in FL with a server-client architecture, various robust and secure model aggregation schemes (e.g., geometric median [168], trimmed mean [169], and Krum [170]) were proposed to tolerate Byzantine corrupted edge devices. To simultaneously preserve privacy for individual users while tolerating Byzantine adversaries, a Byzantine-resilient secure aggregation framework was developed in [171] to detect adversarial models without knowledge of the individual local models, as they are masked for privacy guarantees. To further guard against malicious edge servers, blockchain technology has been utilized to provide a decentralized consensus environment that guarantees the validity of the global model in every learning iteration. This is achieved by packing the local models and the global model into blocks, which are confirmed under a consensus mechanism and then linked into the blockchain [172]. To protect decentralized learning from attacks, a blockchain based peer-to-peer network was developed in [35] to support swarm learning without a central server. This high security level in decentralized learning is achieved by securely enrolling new nodes via blockchain smart contracts to perform local model training.
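Two of the Byzantine-robust aggregation rules mentioned above, the coordinate-wise trimmed mean and the geometric median (computed here with a few Weiszfeld iterations), can be sketched as follows under a toy corruption model; the number of Byzantine devices and all parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
K, d, n_byz = 12, 16, 3

# Honest updates cluster around a common direction; Byzantine devices send garbage.
true_dir = rng.normal(size=d)
updates = np.stack([true_dir + 0.1 * rng.normal(size=d) for _ in range(K - n_byz)]
                   + [10.0 * rng.normal(size=d) for _ in range(n_byz)])

def trimmed_mean(U, trim):
    """Coordinate-wise trimmed mean: drop the `trim` largest and smallest values per coordinate."""
    S = np.sort(U, axis=0)
    return S[trim:U.shape[0] - trim].mean(axis=0)

def geometric_median(U, iters=50):
    """Weiszfeld iterations for the geometric median of the rows of U."""
    z = U.mean(axis=0)
    for _ in range(iters):
        w = 1.0 / (np.linalg.norm(U - z, axis=1) + 1e-8)
        z = (w[:, None] * U).sum(axis=0) / w.sum()
    return z

for name, agg in [("mean", updates.mean(axis=0)),
                  ("trimmed mean", trimmed_mean(updates, n_byz)),
                  ("geometric median", geometric_median(updates))]:
    print(f"{name:17s} distance to honest direction: {np.linalg.norm(agg - true_dir):.3f}")
```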
To summarize, the presented edge learning models and algorithms provide strong evidence that, to deploy the edge training process in wireless networks, we need to develop new wireless communication techniques and strategies to support massive and flexible edge device participation, as well as efficient function computation for model aggregation (e.g., weighted-sum global model aggregation in FL, consensus model aggregation in decentralized learning, and robust model aggregation in secure learning). The various edge training architectures (e.g., server-client, decentralized, and hierarchical network topologies), together with the exchange of high-dimensional model updates, motivate us to develop new wireless network principles and architectures to support edge AI training systems, which will be discussed in the following subsection. As the communication target for edge AI becomes the learning performance instead of the conventional data rate, we shall exploit the task structures of edge AI models and algorithms to match the principles and architectures of wireless networks. This helps demystify the efficiency of edge training in wireless networks and yields a learning-communication co-design principle for future 6G wireless networks to enable AI functionalities sitting natively within 6G. As shown in Fig. 6, we will introduce next-generation multiple access schemes (e.g., AirComp and massive random access) to accommodate the massive number of edge devices dynamically involved in the training process, new multiple antenna techniques (e.g., RIS and cell-free massive MIMO) to support the exchange of high-dimensional model updates, as well as new network architectures (e.g., SAGIN and unmanned aerial vehicle (UAV) networks) to support diversified edge training models and topologies.

1) Over-the-Air Computation: Edge training tasks typically involve computing aggregation functions of multiple local model updates to update a global model. To accomplish weighted averaging aggregation in FL, consensus aggregation in decentralized learning, and robust aggregation (e.g., geometric median) in trustworthy learning, the local updates need to be transmitted from the edge devices, followed by computing the relevant aggregation function at the edge server. However, the limited bandwidth and radio resources of wireless networks become one of the key bottlenecks when a massive number of edge devices upload their local model updates for global aggregation. AirComp provides a new multiple access scheme for low-latency model aggregation. By having the edge devices transmit their locally updated models concurrently, AirComp can harness interference to reduce communication bandwidth consumption. The key idea is that the waveform superposition property of a wireless multiple-access channel can be exploited to compute nomographic functions (e.g., the weighted average used for model aggregation) over the same channel [118], as shown in Fig. 6(a). Specifically, the transmitted signals of the edge devices are first multiplied by the fading channels and then superposed over the air with additive channel noise, resulting in a noisy weighted sum of the transmitted signals [31]. This perfectly matches the structure of the model aggregation computation. Note that the robust aggregation function (i.e., the geometric median) does not have this additive structure, but it can still be approximated by computing a small number of weighted averaging functions via AirComp [93]. The communication latency and bandwidth requirement of AirComp do not increase with the number of edge devices, thus relieving the communication bottleneck in the edge training process. Channel fading and noise perturbation in the model aggregation raise unique challenges for edge training algorithm design and analysis. To tackle the channel fading perturbation, a channel inversion method was proposed in [45], [173], [44], which multiplies the transmit signal by the inverse of the channel gain but may violate the power constraints of the edge devices. To address this issue, a transceiver design was provided in [31] to minimize the distortion of the perturbed model aggregation, whereas in [166] the perturbed model updates are directly incorporated into the FL algorithm design. Although the analog transmission in AirComp is prone to channel noise, the additive noise in the model aggregation turns out to be controllable or even beneficial in the edge training process. Specifically, the channel noise in the model aggregation yields a new class of noisy FL algorithms. Convergence analysis demonstrates that the noisy iterates typically introduce a non-negligible optimality gap in various FL algorithms, e.g., the vanilla gradient method [174], quantized gradient method [175], sparsified gradient method [173], and operator splitting method [87]. The optimality gap can be further controlled by transmit power allocation [176], [41], [173], model aggregation receive beamforming design [177], [31], [178], and device scheduling [178], [31], [179]. Besides, the channel perturbation in the algorithm iterates can also serve as a mechanism to design saddle-point escaping algorithms [94], thereby establishing global optimality for training non-convex over-parameterized neural networks in high-dimensional statistical settings [180]. The additive channel noise in the model aggregation can also serve as an inherent privacy-preserving mechanism to guarantee differential privacy levels for each edge device without sacrificing learning performance [41].
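A minimal simulation of the AirComp aggregation described above, with channel-inversion pre-scaling limited by the weakest channel, is sketched below; the real-valued channel model, amplitude budget, and noise level are illustrative assumptions rather than the designs of the cited works.

```python
import numpy as np

rng = np.random.default_rng(6)
K, d = 8, 32
P_max, noise_std = 1.0, 0.1                 # illustrative amplitude budget and receiver noise

local_models = [rng.normal(size=d) for _ in range(K)]
h = np.abs(rng.normal(size=K)) + 0.2        # real-valued channel gains (illustrative)

# Channel-inversion pre-scaling: all devices align to a common amplitude eta so that the
# superposed signal is (approximately) the unweighted sum of the local models.
eta = P_max * h.min()                       # the weakest channel limits the common amplitude
tx = [eta / h[k] * local_models[k] for k in range(K)]   # amplitude scaling capped at P_max

# Waveform superposition over the multiple-access channel plus additive noise.
rx = sum(h[k] * tx[k] for k in range(K)) + noise_std * rng.normal(size=d)

aircomp_avg = rx / (eta * K)                # server post-scaling recovers the average
exact_avg = np.mean(local_models, axis=0)
print(f"aggregation error: {np.linalg.norm(aircomp_avg - exact_avg):.4f}")
```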
2) Massive Access Techniques: Deploying cross-device FL in IoT networks raises practical challenges, as IoT devices have only sporadic access to the wireless network [181]. It is thus critical to design practical FL systems that accommodate flexible device participation with sporadic access to the wireless network [143], as shown in Fig. 6(b). The grant-free random access protocol provides a low-latency and low signaling overhead way to detect the active devices and then decode their corresponding information data [75], [182], [183]. In this protocol, active devices transmit their data signals directly without waiting for any permission. Sparse signal processing provides a promising modeling framework to simultaneously detect the active devices and estimate their channels [75], [76], supported by various efficient algorithms, including the approximate message passing algorithm [184], [185] and DNN algorithm unrolling approaches [186], [97]. To further reduce the latency of data decoding in random access, a sparse blind demixing framework was developed in [187] to simultaneously perform active device detection, channel estimation, and data decoding. The key observation is that blind demixing is able to perform low-latency data decoding for multiple users from the sum of bilinear measurements without channel estimation at either the transmitters or the receivers [188], [189]. To enhance the performance, the common sparsity pattern in pilots and user data has been exploited via joint activity detection and data decoding [190], [191].
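As a simpler stand-in for the approximate message passing and algorithm unrolling methods cited above, the following sketch recovers the sparse device-activity vector from a short pilot observation with plain iterative soft thresholding (ISTA); the pilot matrix, sparsity level, and detection threshold are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(7)
N, L, n_active = 100, 40, 5                     # devices, pilot length, active devices
A = rng.normal(size=(L, N)) / np.sqrt(L)        # known pilot matrix (one column per device)

# Sparse activity vector: only a few devices transmit in this slot.
x_true = np.zeros(N)
x_true[rng.choice(N, n_active, replace=False)] = 1.0 + rng.random(n_active)
y = A @ x_true + 0.05 * rng.normal(size=L)      # received superimposed pilots plus noise

def ista(A, y, lam=0.05, iters=300):
    """Iterative soft thresholding for the LASSO problem min 0.5*||y - A x||^2 + lam*||x||_1."""
    x = np.zeros(A.shape[1])
    step = 1.0 / np.linalg.norm(A, 2) ** 2       # 1 / Lipschitz constant of the gradient
    for _ in range(iters):
        z = x - step * A.T @ (A @ x - y)
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)
    return x

x_hat = ista(A, y)
print("true active devices:     ", np.flatnonzero(x_true))
print("detected active devices: ", np.flatnonzero(np.abs(x_hat) > 0.1))
```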
Random access protocols are promising to support flexible and massive device participation in the edge training process by identifying active devices with sporadic traffic. It remains critical to develop massive access techniques that improve the learning performance by enrolling more active devices to perform local model updates and exchanges under digital transmission. Non-orthogonal multiple access (NOMA) [78], [77] is a key enabling candidate technology to simultaneously serve massive devices for model aggregation in the same radio resource block via superposition coding. Typical NOMA schemes include power-domain NOMA, with different transmit powers as weight factors, and code-domain NOMA (e.g., sparse code multiple access [192] and pattern division multiple access [78]), with different codes assigned to users. The users' data can then be decoded from the simultaneously transmitted signals via successive interference cancellation. In particular, DL provides a powerful method to design and optimize NOMA systems [193], [194], [195]. Under analog uncoded transmission, interference can instead be harnessed via the new massive access technique AirComp, for which Dong et al. further proposed a blind AirComp scheme for low-latency model aggregation without channel state information (CSI) [79]. It is thus particularly interesting to integrate a massive random access protocol (e.g., grant-free random access) with an analog uncoded massive access technique (e.g., AirComp) to simultaneously perform active device detection, channel estimation, and model aggregation, thereby supporting flexible and low-latency enrollment of edge devices for collaboratively training the models.

3) Ultra-Massive MIMO: Leveraging massive antenna arrays is a key enabling wireless technology to achieve high spectral and energy efficiency, and it is envisioned to be further scaled up by an order of magnitude in 6G [8]. Recent advances in digital, analog, and hybrid beamforming have helped roll out massive MIMO in practice over wider frequency bands. It has been demonstrated that massive MIMO can bring enormous benefits to edge training systems, including high accuracy and high rates for model aggregation, as well as high reliability for massive device connectivity. Specifically, massive MIMO can achieve high computation accuracy for model aggregation by exploiting spatial diversity [196], and enable ultra-fast model aggregation with simultaneous multi-function computation via spatial multiplexing [197]. Furthermore, for FL with sporadically enrolling edge devices, the device activity detection error goes to zero as the number of antenna elements at the base station goes to infinity, thereby achieving highly reliable device participation for model updates. To scale edge training to large physical areas with massive, geographically distributed edge devices, ultra-dense wireless networks are a promising way to achieve low latency, high reliability, and high performance. This is achieved by simultaneously uploading massive local model updates to multiple distributed edge servers with abundant communication, computation, and storage resources, thereby mitigating the straggler issue (i.e., devices with low communication and computation capabilities may prolong the training time) and unfavorable channel dynamics. Besides, compared with a single edge server architecture, distributed edge servers are robust to server failures, enabling reliable edge training. In particular, the cloud radio access network (Cloud-RAN) [198], [199] provides a cost-effective way to implement distributed-antenna aided edge training systems, for which reliable model aggregation via AirComp can be achieved through centralized signal processing and by shortening the communication distances between edge devices and edge servers [200]. The recent proposal of cell-free massive MIMO [201] serves as a promising way to realize wireless distributed FL systems by exploiting the channel hardening characteristic (i.e., the effective channel gain is well approximated by its expected value) and avoiding sharing instantaneous CSI among edge servers [46], as shown in Fig. 6(c). To obtain the desired average of the local model updates via AirComp, magnitude alignment by scaling the transmit signals (e.g., channel inversion) is normally required to reduce the channel perturbation [202].
However, due to resource-limited edge devices and non-uniform fading channels, an unfavorable signal propagation environment inevitably leads to magnitude reduction and misalignment and thus to perturbed model aggregation, which in turn degrades the learning performance of the edge training process. Besides, massive edge devices with sporadic access to the edge servers may be located in a service dead zone, which makes device activity detection challenging under weak channel links [185]. To enroll multiple edge devices via simultaneous transmission with NOMA, sufficiently diverse channel gains are normally required for successive interference cancellation, which may not always hold in practical scenarios [203]. Heterogeneity in terms of computation, communication, and storage across edge devices is another major challenge in deploying edge AI systems. Waiting for straggler edge devices with slow computation and communication speeds before model aggregation causes significant delays, which can be tackled by computation offloading and task scheduling via the mobile edge computing (MEC) technique [204]. However, fully unleashing the benefits of MEC for straggler mitigation is limited by hostile wireless links [205].

4) Reconfigurable Intelligent Surface: To address the above challenges in terms of propagation impairments, RIS has been shown to be a cost-effective technology to support fast yet reliable model aggregation with massive edge device participation by programming the propagation environment of electromagnetic waves [118], [206], as shown in Fig. 6(d). Specifically, an RIS is typically realized by planar or conformal artificial metamaterials or metasurfaces equipped with a large number of low-cost passive reflecting elements, which are capable of adjusting the phase shifts and amplitudes of the incident signals for directional signal enhancement or nulling, thus altering the propagation of the reflected signals [49], [207], [208]. In an RIS-empowered edge training system, the RIS can be leveraged to align the magnitudes of the transmit signals by establishing favorable propagation links for the waveform superposition in AirComp, resulting in boosted received signal power and an accurate aggregated function at the edge server [209]. The boosted model aggregation via RIS can support efficient edge device scheduling in over-the-air FL, thereby adapting to time-varying local model updates and channel dynamics [48], [178]. Reliable sporadic access in edge training can be achieved by establishing abundant propagation scatterers using RIS for accurate activity detection [185]. The latency of the local model updates of the active devices can be further reduced by establishing favorable propagation links via RIS, thereby mitigating stragglers [205].

5) Space-Air-Ground Integrated Networks: The typical SAGIN [51], [210] provides an integrated space information platform across satellite networks (e.g., miniaturized satellites [211]), aerial networks (e.g., UAV communications [212]), and terrestrial communications (e.g., vehicular communications [213]) to provide ubiquitous connectivity for various edge training architectures, as shown in Fig. 6(e). Edge learning over a vehicle-to-everything network is critical to enable autonomous driving with delay-sensitive applications [145]. In this scenario, the local model updates need to be aggregated quickly and reliably within neighborhoods via vehicle-to-vehicle communications [181], or to the roadside units via vehicle-to-infrastructure communication.
In particular, radar sensing provides a promising way to predict vehicular links [214] and holds the potential to enable real-time model aggregation via predictive beamforming in the model aggregation procedure. In scenarios with sparsely deployed edge servers and moving edge devices (e.g., ground vehicles), UAVs, serving as flying edge servers, provide a promising solution to aggregate local model updates throughout the edge training procedure via joint UAV trajectory and transceiver design over dynamic wireless edge networks [81]. To build a scalable edge training system with massive device participation for training extremely deep AI models [215], it is critical to access abundant computation resources across the continuum of nodes from edge devices and edge servers to cloud servers [50]. It was shown in [130] that the client-server-cloud multi-layer architecture is able to significantly reduce the training time and energy consumption. In scenarios without abundant edge and cloud computing infrastructure, SAGIN provides a ubiquitous computing platform for the multi-layer hierarchical edge learning system, where flying UAVs serve as proximal edge computing nodes and low earth orbit satellites serve as relays to cloud computing [216]. To realize a SAGIN-empowered edge training system, tier-adaptive aggregation interval management becomes critical to control the local and global model aggregation intervals [130] and achieve high communication efficiency. Besides, the client-edge-satellite association with dynamic scheduling and offloading is fundamental to tackle the heterogeneity challenges in terms of system resources and network topologies.

In summary, this section presented the multiple access technologies (e.g., AirComp, grant-free random access, NOMA), multiple antenna techniques (e.g., Cloud-RAN, cell-free massive MIMO, RIS), and multi-layer networks (e.g., UAV, SAGIN) that are needed to support low-latency model aggregation and diversified learning architectures and environments. We hope this can inspire more advanced 6G wireless and information techniques (e.g., millimeter-wave and terahertz (THz) communications [217], [218], age of information [219]) to support edge AI systems and establish integrated communication, computation, and learning ecosystems.

In this section, we present communication-efficient techniques for edge inference tasks with latency and reliability guarantees. Based on the dataset distribution characteristics, Yang et al. [33] proposed to categorize FL into horizontal FL (i.e., datasets share the same feature space but different sample spaces) and vertical FL (i.e., datasets share the same sample space but differ in feature space). Hosseinalipour et al. [50] further proposed a fog learning framework allowing both vertical communications (i.e., model updates are exchanged only across different network layers) and horizontal communications (i.e., model updates can be exchanged between devices in the same network layer). In a similar way, based on the different computing collaboration schemes, we propose to categorize edge inference into horizontal edge inference (i.e., computation resources can only be harvested among edge devices, or only be pooled among edge servers) and vertical edge inference (i.e., computation resources can be harnessed across edge devices and edge servers), which are discussed in the following two subsections, respectively.
We consider two different types of horizontal edge inference, as shown in Fig. 7(a) and Fig. 7(b).

1) Edge Device Distributed Inference: Extensive studies on TinyML, with DL model compression and neural network architecture search, have been conducted to enable low-latency and energy-efficient model inference on a single device with limited storage and computation resources [220]. However, due to the limited storage capability of edge devices, it becomes extremely difficult to accomplish inference tasks on a single device for applications such as mobile navigation with a huge map dataset [221]. Edge device distributed inference based on wireless MapReduce enjoys the advantages of providing low-latency, high-accuracy, scalable, and resilient services for edge devices without accessing the cloud data center [12], [29]. Specifically, edge device distributed inference involves computing intermediate values from the local input datasets using the map function, then sharing the intermediate values via horizontal communication among edge devices, and finally constructing the desired computation or inference results using the reduce function [222], [52]. To tackle the communication bottleneck of shuffling intermediate values in the edge device distributed inference process, a coded distributed computing approach [223] was adopted in [221] to improve the scalability of wireless MapReduce by inducing coded multicasting transmission opportunities. This, however, sacrifices computation efficiency, as computation replication of the local dataset is needed. To further improve the spectral efficiency, instead of reducing the volume of communicated bits [221], a joint uplink and downlink design approach based on the interference alignment principle was developed in [52] to improve the communication data rates for shuffling local intermediate values. In particular, to compute nomographic functions [224] for edge device distributed inference based on the MapReduce decomposition, a multi-layer hierarchical AirComp approach was proposed in [225] to improve the spectral efficiency over multi-hop D2D communication networks, as shown in Fig. 7(a).

2) Edge Server Cooperative Inference: DL with high-dimensional model parameters is able to provide highly accurate intelligent services. However, it is challenging to directly deploy such large AI models on IoT devices due to their very limited onboard computation, storage, and energy resources. Deploying and executing DL models on edge servers turns out to be a promising solution. However, the limited wireless bandwidth between edge devices and edge servers becomes the key bottleneck [53], [54] for edge server cooperative inference. Compressing and encoding the input source data at the edge devices are essential to reduce the uplink communication overhead, for which various data dimensionality reduction approaches have been proposed that exploit the specific computation tasks and communication environments [29]. Besides, for applications with high-dimensional output inference results (e.g., the output of NVIDIA's AI system GauGAN is a large photorealistic landscape image), it is equally important to design highly efficient downlink communication solutions to deliver the output inference results to the edge devices [53], [54]. Computation replication has been shown to be effective for reducing the communication latency of computation offloading when the output size is large [226].
This is achieved by executing each inference task at multiple edge servers, followed by delivering the inference results to multiple edge devices via downlink cooperative transmission [227]. Although edge server cooperative inference via downlink transmission cooperation is able to significantly improve communication efficiency by mitigating interference and alleviating channel uncertainties, it causes extra energy consumption by executing the same inference tasks at multiple edge servers. To design a green edge server cooperative inference system, a joint inference task selection and downlink coordinated beamforming framework was proposed in [53] to minimize the overall computation and communication energy consumption, as shown in Fig. 7(b). RIS was further leveraged in [54] to design green edge server cooperative inference systems by considering both uplink and downlink transmit power consumption. The rate splitting method is also anticipated to further improve the energy efficiency of edge server cooperative inference by partially decoding the inference result and partially treating it as noise in a flexible way [228].

We consider two different cases of vertical edge inference, as shown in Fig. 7(c) and Fig. 7(d), with a single edge device and multiple edge devices, respectively. In the following, we first present effective techniques for communication-efficient vertical edge inference in these two cases, and then present a new general design principle for resource-constrained vertical edge inference, named task-oriented communication.

1) Edge Device-Server Co-Inference: Edge device distributed inference enjoys low latency but has limited accuracy due to limited processing capabilities and bandwidth. Although edge server cooperative inference is able to achieve high accuracy with large DL models, it may raise data leakage issues and excessive communication delay. It thus becomes inapplicable for privacy-sensitive and delay-sensitive applications. To provide ubiquitous AI services across diversified application scenarios, edge device-server co-inference, as a complementary solution to horizontal edge inference, is promising to alleviate the communication overhead while achieving high accuracy and privacy when inferring DNN models. This is achieved by dividing the DNN model into a computation-friendly segment at the edge device and the remaining computationally heavy segment at the edge server [32], as shown in Fig. 7(c). By adaptively partitioning the computation burden between the edge devices and the edge server, model split selection for the neural network is essential to achieve the optimal computation-communication trade-off in the vertical edge inference system via edge device-server synergy and collaboration [82]. To further reduce the communication overhead, a communication-aware model compression approach was proposed in [32] to limit the number of activated neurons at the last layer of the neural network deployed on the edge device. However, the short message transmission [37] and data amplification effect [229] of the output features extracted by the on-device split model raise unique challenges for realizing real-time vertical edge inference.

2) Ultra-Reliable and Low-Latency Communication: The packets carrying the extracted output features transmitted from the edge devices can be very short [83], [145], for which the achievable data rate in such a finite blocklength regime is penalized by a non-vanishing decoding error probability [83].
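For intuition on this penalty, the widely used normal approximation of the maximal achievable rate at finite blocklength can be sketched as follows; the SNR, blocklength, and error-probability values are arbitrary examples, and the complex-AWGN dispersion expression is quoted from the finite blocklength literature rather than from [83].

```python
import math

def Q(x):
    """Gaussian tail probability Q(x) = P(N(0,1) > x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def Q_inv(eps, lo=-10.0, hi=10.0):
    """Invert Q by bisection (sufficient accuracy for this illustration)."""
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if Q(mid) > eps else (lo, mid)
    return 0.5 * (lo + hi)

def finite_blocklength_rate(snr, n, eps):
    """Normal approximation R ~ C - sqrt(V/n) * Q^{-1}(eps) for the complex AWGN channel."""
    C = math.log2(1.0 + snr)                                        # Shannon capacity (bits/use)
    V = (1.0 - 1.0 / (1.0 + snr) ** 2) * math.log2(math.e) ** 2     # channel dispersion
    return C - math.sqrt(V / n) * Q_inv(eps)

snr, eps = 10.0, 1e-5                      # linear SNR of 10 (10 dB), URLLC-grade error target
for n in (100, 500, 2000):                 # short packets typical of feature uploads
    print(f"n = {n:5d} symbols: rate ~ {finite_blocklength_rate(snr, n, eps):.3f} bit/use "
          f"(capacity {math.log2(1 + snr):.3f})")
```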
Besides, the output inference results from the edge server should be delivered to the edge devices with latency and reliability guarantees for mission-critical applications. Considering the system dynamics, including the task arrival dynamics in the network layer and the wireless channel dynamics in the physical layer, cross-layer optimization is needed to minimize the end-to-end delay for edge device-server co-inference [84], [85]. In particular, an MDP supported by linear programming was adopted in [85] to jointly schedule the transmission at edge devices and the computation at the edge server for achieving the optimal power-latency trade-off for edge device-server co-inference via MEC. The random delay characteristics were also investigated in [230] by modeling the coupled transmission and computation process as a discrete-time two-stage tandem queueing system. To support multiple edge devices in uploading intermediate features using short-packet transmission, massive MIMO can be adopted to combat fast channel fading and provide a nearly deterministic communication environment due to channel hardening [231]. The received multiple intermediate features can be further aggregated via the mixup augmentation technique [232] to enable scalable and cooperative inference at the edge server, as shown in Fig. 7 (d). 3) Task-Oriented Communication: As revealed in [32], there exists an intrinsic communication-computation trade-off in resource-constrained vertical edge inference. This is mainly caused by the data amplification issue in DL based inference, namely that the dimension of the intermediate feature may be larger than the input data size. Thus, if only a few layers of the neural network were deployed on the edge device, the output feature would have a size larger than the input data, yielding excessive communication overhead. To reduce the intermediate feature size, more layers have to be deployed on the edge device, which, however, leads to a high local computation burden. To resolve this tension between local computation and communication overhead, it is of critical importance to effectively compress and transmit the intermediate feature. Such a communication task is fundamentally different from data-oriented communication in current wireless networks, i.e., transmitting a binary sequence at the highest data rate for reliable reconstruction at the receiver. In vertical edge inference, the feature transmission serves the inference task, not the reconstruction of the feature vector with high fidelity. Thus, as advocated in [32], we should rather design the communication scheme for feature transmission in a task-oriented manner, i.e., only transmitting the messages that are informative for the downstream inference task at the edge server. Instead of decoding the intermediate features, the received signal corrupted by channel fading and noise is directly processed at the edge server to obtain the inference results. This task-oriented communication principle constitutes a paradigm shift for communication system design from data recovery to task accomplishment. It was first tested in vertical edge inference via end-to-end training with joint source-channel coding in [233], which helps to reduce both the communication overhead and the on-device computation cost. Such a design principle has also been applied to other tasks.
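As a toy illustration of the task-oriented principle (a minimal sketch under illustrative assumptions, not the schemes of [32], [233], or the works discussed next), the following PyTorch snippet splits a small classifier into an on-device encoder that compresses the intermediate feature to eight dimensions, passes it through an AWGN channel, and trains everything end to end on the task (cross-entropy) loss only, with no attempt to reconstruct the feature at the server.

```python
# Toy task-oriented split inference: device encoder -> AWGN channel -> server
# head, trained end to end on the task loss only. Dimensions, SNR, and the
# synthetic data are illustrative assumptions.
import torch
import torch.nn as nn

INPUT_DIM, FEATURE_DIM, NUM_CLASSES = 784, 8, 10

device_encoder = nn.Sequential(              # deployed on the edge device
    nn.Linear(INPUT_DIM, 128), nn.ReLU(),
    nn.Linear(128, FEATURE_DIM),             # task-oriented feature compression
)
server_head = nn.Sequential(                 # deployed on the edge server
    nn.Linear(FEATURE_DIM, 64), nn.ReLU(),
    nn.Linear(64, NUM_CLASSES),
)

def awgn_channel(z: torch.Tensor, snr_db: float = 10.0) -> torch.Tensor:
    """Normalize the feature power and add Gaussian noise at the given SNR."""
    z = z / (z.norm(dim=1, keepdim=True) / z.shape[1] ** 0.5 + 1e-8)
    noise_std = 10 ** (-snr_db / 20.0)
    return z + noise_std * torch.randn_like(z)

params = list(device_encoder.parameters()) + list(server_head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(200):                      # synthetic data, illustration only
    x = torch.randn(64, INPUT_DIM)
    y = torch.randint(0, NUM_CLASSES, (64,))
    logits = server_head(awgn_channel(device_encoder(x)))
    loss = loss_fn(logits, y)                # task loss only, no feature recovery
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Only the low-dimensional noisy feature crosses the air interface, and the server maps it directly to a label; the works cited in the following discussion refine this principle with far more careful designs.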
For example, the DL based end-to-end semantic communication system was developed in [21] via joint semantic source and wireless channel coding for recovering the meaning of sentences instead of the original transmitted data samples. The analog JSCC approach was presented in [234] to compress and then code the feature vectors, followed by leveraging the received perturbed signal directly for wireless image retrieval at the edge server via a fully-connected neural network. Recently, a novel and generic design framework for task-oriented communication was developed in [20], which is based on the information bottleneck formulation [235]. This framework provides a principled way to extract an informative and concise representation from the intermediate feature, which is made mathematically tractable via variational approximation. Furthermore, it has been extended to the cooperative inference scenario with multiple edge devices in [86] based on the distributed information bottleneck [236] and distributed source coding theory. In summary, this section presented interference coordination techniques and task-oriented low-latency communication principles for horizontal edge inference and vertical edge inference, respectively. We hope this can motivate the co-design of wireless communication networks and deep learning models to deliver low-latency, energy-efficient and trustworthy edge AI inference services. In this section, we shall characterize the engineering requirements for designing communication-efficient edge AI systems, including accuracy, latency, energy, privacy and security. Effective service-driven resource allocation methods based on mathematical programming and ML are then provided to achieve scalability and trustworthiness for edge AI systems. We identify the engineering requirements for designing scalable and trustworthy edge AI systems. Resource allocation strategies must cater to the needs of edge AI systems for achieving accurate intelligence distillation into the edge network at an ultra-low power and low-latency cost. 1) Accuracy: The edge training process involves designing the global iterates θ^[t], with t as the iteration index, to minimize the empirical loss function while achieving fast convergence rates with a negligible optimality gap for problem (1). To design efficient resource allocation schemes in edge training systems, it is particularly important to characterize the convergence behaviors of the global iterates θ^[t], which typically depend on the scheduled devices, local updates, aggregation behaviors, network topologies, propagation environments, function landscapes, and underlying algorithms. Specifically, for edge training systems via AirComp, the global model aggregation errors due to wireless channel fading and noise will cause learning performance degradation [45], [87]. The optimality gap (i.e., the distance between the current iterate and the desired solution), characterized by the convergence behavior of the global iterate, can be further controlled by various resource allocation schemes, including transmit power control at edge devices [41], [237], receive beamforming at the edge server [31], [179], passive beamforming at RIS [48], [178], as well as the device scheduling policy [31], [48].
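A minimal single-antenna sketch of how fading and noise enter the over-the-air aggregation step is given below (illustrative assumptions only, not the designs of [31], [41], [45], or [48]): each device pre-scales its update by truncated channel inversion under a power budget, the scheduled signals superpose on the multiple access channel, and the server compares the de-scaled noisy aggregate with the ideal noiseless average.

```python
# Sketch: over-the-air aggregation of local model updates on a single-antenna
# fading MAC with truncated channel inversion. All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
num_devices, model_dim = 20, 1000
noise_std, power_limit = 0.1, 10.0

local_updates = rng.normal(size=(num_devices, model_dim))
channels = (rng.normal(size=num_devices) + 1j * rng.normal(size=num_devices)) / np.sqrt(2)

# Truncated channel inversion: devices in deep fade are scheduled out.
scheduled = np.abs(channels) > 0.3
eta = np.sqrt(power_limit) * np.abs(channels[scheduled]).min()   # common scaling factor
precoders = np.zeros(num_devices, dtype=complex)
precoders[scheduled] = eta / channels[scheduled]                 # |precoder|^2 <= power_limit

# The scheduled signals superpose over the air; the server sees one noisy sum.
received = (channels * precoders) @ local_updates \
           + noise_std * (rng.normal(size=model_dim) + 1j * rng.normal(size=model_dim))
aircomp_avg = received.real / (eta * scheduled.sum())            # de-scale and average

ideal_avg = local_updates.mean(axis=0)                           # noiseless all-device average
print("scheduled devices:", int(scheduled.sum()))
print("aggregation MSE  :", np.mean((aircomp_avg - ideal_avg) ** 2))
```

Devices in deep fade are dropped and the residual noise scales inversely with the weakest scheduled channel, which is precisely the tension that the power control, receive beamforming, RIS, and scheduling designs cited above seek to balance.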
For digital design of the edge training system, the optimality basically depends on the edge device selection, packet errors in the uplink transmission, and model parameter partition, for which user scheduling [238], power control [88], batch size selection [239], aggregation frequency control [240], and bandwidth allocation [241] were proposed to improve the accuracy in the edge training process. For edge inference, the accuracy indicates the quality of the inference results for a given task. It is typically measured by the number of correct predictions from inference, e.g., in classification tasks. For computer vision applications in autonomous driving, ultra-high accuracy for DNN model inference is demanded. For applications in radio resource allocation via distributed ML, the accuracy of inferring a DNN model can be moderate. The accuracy of edge inference depends on the difficulty of the tasks and datasets, the quality of the trained model, the dynamics of the wireless communication and edge computation environments, as well as the methods for processing the models, datasets and features. In particular, for horizontal edge inference via AirComp aided wireless MapReduce, the accuracy of computing a nomographic function is fundamentally limited by the channel fading and noise, for which various transceivers were designed to minimize the mean square error for inference computation tasks [225]. The accuracy of vertical edge inference depends on the informativeness and reliability of the intermediate features transmitted from edge devices, as well as the dynamic wireless environments, for which an ultra-reliable communication and adaptive JSCC approach needs to be developed to improve the inference performance. In particular, the information bottleneck was adopted in [20] to characterize the relationship between the accuracy of vertical edge inference and the communication overhead of the intermediate features. 2) Latency: For edge training, the latency consists of computation latency and communication latency. The computation latency highly depends on the computation capability of the edge devices and servers, as well as the size of the models and datasets. The communication latency is the per-round transmission latency accumulated over the total number of learning rounds until convergence of the global model. In one typical training round, the communication delay of the uplink and downlink transmissions for model updates is mainly affected by the wireless communication techniques, bandwidth and power budgets, wireless channel conditions, as well as the scheduled edge devices. Li et al. characterized the delay distribution for FL over arbitrary fading channels via the saddle point approximation method and large deviation theory [89]. The trade-off between the convergence speed and the per-round latency was revealed in [242] based on the key observation that scheduling more devices yields a faster convergence rate while prolonging the time for uploading the local updates at each iteration due to limited radio resources. A probabilistic device scheduling policy was further proposed in [90], [243] to minimize the overall training time in wireless FL. Besides, the trade-off between the local computation rounds for local model updates and the global communication rounds for global model updates was characterized to guide the resource allocation for minimizing the total learning time and energy consumption [244].
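The trade-off between the number of scheduled devices per round and the overall training time can be made concrete with a stylized calculation; both the convergence model and the constants below are illustrative assumptions rather than results from [242], [90], or [243].

```python
# Stylized illustration of the device-scheduling trade-off in federated edge
# training: scheduling more devices per round is assumed to cut the number of
# rounds, but the shared uplink makes each round slower. The convergence model
# and all constants are illustrative assumptions.
import numpy as np

bandwidth_hz = 1.0e6          # total uplink bandwidth shared by scheduled devices
update_bits = 1.0e6           # size of one local model update
snr = 10.0                    # linear SNR, assumed identical for all devices

def rounds_to_converge(k: int) -> float:
    return 100.0 * (1.0 + 20.0 / k)           # assumed diminishing-returns speed-up

def per_round_time_s(k: int) -> float:
    rate_per_device = (bandwidth_hz / k) * np.log2(1.0 + snr)
    return update_bits / rate_per_device       # upload time of one scheduled device

for k in (1, 2, 5, 10, 20, 50):
    r, t = rounds_to_converge(k), per_round_time_s(k)
    print(f"{k:2d} devices/round: {r:7.1f} rounds x {t:6.2f} s = {r * t:8.1f} s total")
```

The printout shows the round count shrinking while the per-round upload time grows; with these particular constants the per-round slowdown dominates, but a steeper convergence speed-up or more bandwidth tips the balance the other way, which is why the scheduling policy is optimized explicitly in [90], [243].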
The convergence speeds of FL algorithms were characterized in [245] by considering non-identical dataset distributions, partial edge device participation, and quantized model updates in both uplink and downlink communications. In the case of edge inference, the latency measures the time from the data arrival to the generation of the inference results through the edge AI system. It consists of the data pre-processing, data transmission, model inference, and result post-processing, which highly depend on the computation hardware, communication schemes, DL models and tasks. For real-time mobile computer vision applications in AR/VR, stringent latency requirements are imposed, e.g., 100 ms. For the scalable radio resource allocation application via DL, the inference latency must be within the channel coherence time (e.g., 10 ms) to yield a meaningful resource allocation decision [23]. A low-rank matrix optimization based transceiver design approach was proposed in [52] for fast shuffling of intermediate values in wireless distributed computing, thereby reducing the latency for horizontal edge inference via edge device collaboration. For vertical edge inference, a dynamic computation partitioning and early exiting scheme was proposed in [82] to accelerate the inference speed via edge device-server synergy. The cross-layer design approach was adopted in [85] to reduce the communication and computation latency for time-sensitive edge inference computing applications. In particular, the DL enabled task-oriented communication framework was developed to achieve low-latency edge device-server co-inference by merging feature compression, source coding and channel coding for the specific inference tasks [20], [234]. 3) Energy: For edge training, the energy consumption stems from both the computation and communication processes. AlphaGo, for example, may require 280 GPUs and incur a $3000 electricity bill per game [246]. It is therefore critical to design energy-efficient edge training systems to minimize the carbon footprint and contribute to the carbon neutrality target. Such a design is mainly dictated by the size of the training models, the model training algorithms, the wireless transmission strategies and hardware (e.g., the scaled SiGe bipolar technology [247]), and the edge computing architectures and hardware. Both the computation energy consumption for local model updates and the communication energy consumption for uploading local updates are simultaneously minimized in [92] by considering the learning latency and accuracy constraints for wireless FL. The wireless power transfer approach was further adopted in [248] to power the edge devices for local model computation and communication, for which the active devices with enough harvested energy contribute to accelerating the learning procedure. To deploy AirComp-assisted FL across massive IoT devices with limited battery capacity, microwave based wireless power transfer supported by RIS was adopted in [91] to recharge the IoT devices via energy beamforming at the edge server and passive beamforming at the RIS. In the case of edge inference, it becomes particularly important to achieve high energy efficiency for processing the DNN models at the network edge with battery-limited devices.
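To give a rough sense of the numbers a battery-limited device faces, the sketch below estimates the computation and feature-upload energy of a single device-server co-inference pass; all constants (energy per MAC, model sizes, link parameters) are illustrative assumptions, not measurements from the cited works.

```python
# Back-of-envelope energy model for one edge device-server co-inference pass:
# on-device computation energy plus the energy to upload the intermediate
# feature. All constants are illustrative assumptions.
import numpy as np

def computation_energy_j(on_device_macs: float, joules_per_mac: float = 1e-11) -> float:
    """Energy for executing the on-device segment of the DNN."""
    return on_device_macs * joules_per_mac

def upload_energy_j(feature_bits: float, bandwidth_hz: float = 1e6,
                    tx_power_w: float = 0.1, snr: float = 100.0) -> float:
    """Energy for transmitting the intermediate feature at the Shannon rate."""
    rate_bps = bandwidth_hz * np.log2(1.0 + snr)
    return tx_power_w * feature_bits / rate_bps

# Two illustrative split points: a shallow split with a large feature, and a
# deeper split with a compact feature but more on-device computation.
for name, macs, bits in (("shallow split", 5e7, 2e6), ("deep split", 5e8, 1e5)):
    total = computation_energy_j(macs) + upload_energy_j(bits)
    print(f"{name}: computation {computation_energy_j(macs) * 1e3:.2f} mJ, "
          f"upload {upload_energy_j(bits) * 1e3:.2f} mJ, total {total * 1e3:.2f} mJ")
```

Under these assumed constants the deeper split wins because the radio dominates the budget, but a less efficient processor or a better channel flips the comparison, which is exactly the kind of balance the energy-aware designs discussed next must strike.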
The energy consumption of executing a DNN model is highly dictated by the computation architecture and methods (e.g., ultra-low-power compute-in-memory AI accelerators) at the edge computation nodes [249], the architecture of the DNN models [250], and the wireless transmission for data exchange during the model inference procedure. For horizontal edge inference via wireless cooperative transmission at multiple edge servers, the sum of the computation and transmission power consumption for generating and delivering the inference results was minimized via downlink coordinated beamforming [53]. Energy consumption at the edge devices can be minimized in the cross-layer design for delay-sensitive edge device-server co-inference by computation offloading [85]. Besides, energy harvesting becomes a promising technology for edge computing based vertical edge inference by providing renewable energy resources for edge devices [251]. 4) Trustworthiness: Trustworthiness is one of the main drivers for developing the next generation of AI technologies. Specifically, the developed AI models and algorithms must be privacy-preserving, adversarial-resilient, robust, fair, optimal and interpretable [95]. For edge training, privacy mainly depends on the offloading or coding of the raw data and intermediate features. Keeping datasets at devices is a direct and effective way to preserve users' privacy in FL. Besides, the wireless channel noise yields a noisy model aggregation procedure via AirComp, which provides an inherent privacy-preserving mechanism to enhance differential privacy for each edge device. An adaptive power control method was further developed in [41] to control the differential privacy levels in this over-the-air FL system while avoiding learning performance degradation. To address adversarial attacks, blockchain based decentralized learning was proposed in [252] to enable secure global model aggregation using the consensus mechanism of blockchain. The block generation rate was optimized by considering the communication, computation and consensus delays in the blockchain enabled secure edge learning systems [253], [252]. For edge inference, privacy and security are mainly dictated by the way the input data are processed, the way the inference results are transmitted, as well as the computation methods for model inference (e.g., secure multiparty computation). Establishing optimality for ML algorithms is important to deliver reliable and responsible AI services. However, empirical risk minimization for training the models is usually nonconvex, which poses significant challenges in guaranteeing global optimality for the learning algorithms and models [180]. Fortunately, under the high-dimensional statistical setting, the local strong convexity and smoothness of the nonconvex loss functions can be exploited to tame the nonconvexity for various learning models, e.g., blind demixing [189], phase retrieval [254], and shallow neural networks [255]. Besides, with high-dimensional datasets, the nonconvex loss functions of certain statistical learning models, including over-parameterized neural networks [180] and dictionary learning [256], can enjoy a benign global geometric landscape such that all the local minima are global minima, and all the saddle points can be escaped efficiently using algorithms such as the trust region method and the perturbed gradient descent method [257].
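The escape mechanism can be visualized on a two-dimensional toy landscape (an illustrative example, not the settings of [94] or [257]): plain gradient descent initialized exactly at the strict saddle of f(x, y) = x² − y² + y⁴/4 never moves, whereas a small isotropic perturbation of the gradient lets the iterate slide down the negative-curvature direction to a global minimizer.

```python
# Toy demo: plain gradient descent stalls at the strict saddle of
# f(x, y) = x^2 - y^2 + y^4/4, while perturbed gradient descent escapes it.
import numpy as np

def f(p):
    x, y = p
    return x ** 2 - y ** 2 + 0.25 * y ** 4

def grad(p):
    x, y = p
    return np.array([2.0 * x, -2.0 * y + y ** 3])

def descend(noise_std, steps=500, lr=0.05, seed=0):
    rng = np.random.default_rng(seed)
    p = np.zeros(2)                               # start exactly at the saddle
    for _ in range(steps):
        p = p - lr * (grad(p) + noise_std * rng.normal(size=2))
    return p

for sigma in (0.0, 0.01):
    p = descend(noise_std=sigma)
    print(f"noise std {sigma:.2f}: final point {np.round(p, 3)}, f = {f(p):.3f}")
```

With zero noise the iterate stays at the saddle (f = 0), while the perturbed run reaches one of the global minima with f ≈ −1.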
In particular, for edge training, the channel noise yields a perturbed stochastic gradient descent method to escape saddle points for distributed principal component analysis via AirComp [94]. Therefore, channel noise can provide a mechanism for both preserving differential privacy [41] and achieving global optimality [94]. This evidence indicates that we should embrace channel fading and noise for achieving trustworthy edge AI. Edge AI systems need to incorporate various wireless network architectures and communication strategies by integrating communication and computation. This will result in a highly complex and dynamic network, which requires innovative technologies and solutions. Various use cases (e.g., autonomous driving, industrial IoT, and smart healthcare) and heterogeneous requirements in terms of accuracy, latency, energy and trustworthiness would further aggravate the complexity of resource allocation in edge AI systems. Besides, the complex edge servers and base stations will be quite energy-consuming, which brings formidable challenges for achieving high energy efficiency. To enable efficient resource allocation, it is thus critical to precisely model the heterogeneous demands for edge AI services, and then match them with proper network resource orchestration. This, however, relies on the quantitative relationship between network resources and user requirements for edge AI tasks. To pave the way for this paradigm shift towards service-driven resource allocation in edge AI systems, in the next subsection, we shall provide various intelligent optimization models and algorithms to adapt to diversified network environments and services. The service-driven network resource management problems for edge AI systems can be cast as a parametric family of mathematical optimization problems:

minimize_{z}    f_0(z; α)
subject to      g_i(z; α) ≤ 0,  i = 1, . . . , m,
                h_i(z; α) = 0,  i = 1, . . . , p,        (2)

where z ∈ R^n is the optimization variable vector consisting of both discrete and continuous variables, and α ∈ A is the problem parameter vector, with A denoting the parameter space (e.g., CSI). For each fixed α ∈ A, f_0 : R^n → R is the objective function (e.g., the optimality gap in edge training), g_i : R^n → R, i = 1, . . . , m, are the inequality constraint functions (e.g., latency requirements in edge inference), and h_i : R^n → R, i = 1, . . . , p, are the equality constraint functions. The resource allocation optimization problems are typically categorized as mixed-combinatorial optimization, nonconvex continuous optimization, stochastic optimization, and end-to-end optimization. To provide scalable, real-time, parallel, distributed and automatic resource allocation schemes, we propose to exploit the landscape of the underlying optimization problems (2) via theory-driven methods based on mathematical programming, followed by developing novel data-driven approaches based on machine learning to achieve real-time and distributed implementations, as well as improved and robust performance, as shown in Fig. 8. Here, ψ(α) is a mapping function from the problem parameter α to the optimal solution of problem (2). 1) Mixed Combinatorial Optimization: The resource allocation problems in edge AI systems involve optimizing across learning, computation and communication.
Specifically, for edge training systems, we need to jointly optimize the subcarrier and bandwidth allocation [241], [88], [90], transmit power and receive beamforming [31], [48], [178], passive beamforming at RIS [48], [178], device selection [31], [242] and activity detection [76], local update computation [92], and global aggregation frequency control [130], thereby reducing the optimality gap and energy consumption in the distributed learning procedure. For edge inference via collaboration among edge servers, task selection, coordinated downlink beamforming among edge servers, as well as passive beamforming at RISs were jointly optimized to achieve green edge inference [54], [53]. All of these resource allocation schemes can be formulated as mixed combinatorial optimization problems, which need to jointly optimize continuous-valued variables (e.g., beamforming and power control) and discrete-valued variables (e.g., device selection and subcarrier allocation). In particular, sparse optimization provides a powerful modeling approach to solve the mixed combinatorial resource allocation problems by exploiting the sparsity structures in the optimal solutions [96]. For instance, the group sparsity can represent the combinatorial variables for edge device selection in FL [31], edge device activity detection [76], and inference task selection [53]. The algorithmic advantages of the sparse optimization modeling approach are supported by various convex relaxation algorithms [198], [258], e.g., mixed ℓ1/ℓ2-norm minimization [80]. A typical sparse and low-rank optimization modeling and algorithmic framework was developed in [31] to support the joint device selection and transceiver design for improving the learning performance in over-the-air FL systems. Although operations research provides a theory-driven approach for solving the mixed combinatorial optimization problem or its equivalent sparse optimization problem, the existing algorithms are either heuristic with noticeable performance loss or optimal with intolerably high computation complexity. To address these challenges, "learning to optimize" provides a data-driven design paradigm to improve the computation efficiency and system performance for resource allocation [114], [259]. This is achieved by developing computationally efficient optimization methods that learn from sampled problem instances using training models and methods. The learned algorithms can be further executed online and in a distributed manner for real-time resource allocation in edge AI systems. To solve the large-scale mixed combinatorial optimization problem efficiently, imitation learning was adopted in [22] to learn an aggressive pruning policy in the branch-and-bound algorithm, which attains global optimality. This learning based branch-and-bound method can significantly save the time for pruning the nodes in the search tree, achieve near-optimal performance with few training samples, as well as guarantee feasibility of constraints without performance degradation. To further speed up the sparse optimization method for the mixed combinatorial optimization problem in edge device activity detection, the DNN based algorithm unrolling framework was developed in [97] to achieve theoretical guarantees, performance improvements, interpretability and robustness for the learned sparse optimization algorithms [186].
This is achieved by mapping the theory-driven iterative operations, i.e., the iterative shrinkage-thresholding algorithm, into an unrolled recurrent neural network, followed by training the model parameters based on supervised learning. Besides, a multi-agent RL approach was developed in [260] to solve the distributed mixed combinatorial optimization problem for task offloading and resource allocation in multi-layer edge inference systems. 2) Nonconvex Optimization: Most of the resource allocation problems in edge AI require solving a series of nonconvex optimization problems, e.g., nonconvex sparse optimization for device selection in wireless FL, nonconvex quadratic programming for transceiver design in over-the-air FL [31], low-rank matrix optimization for interference management in edge device distributed inference [52], and unit-modulus constrained phase shift optimization [261] in RIS-empowered edge AI systems [48], [54]. Convex approximation provides a natural way to design polynomial time complexity algorithms for nonconvex programs based on the principle of majorization-minimization [262] or successive convex approximation [263]. A two-stage framework was provided in [98] for solving general large-scale convex programs with infeasibility detection and scalable computation. This is achieved by a matrix stuffing technique for fast conic program modeling in the first stage, and an operator splitting method for scalable conic program solving in the second stage [98]. Although the semidefinite relaxation approach [264] is able to convexify general quadratic programs by matrix lifting and dropping the resulting rank constraints, it fails to return high-quality solutions in high-dimensional settings. This issue was addressed by difference-of-convex-functions (DC) programming [99], [31], [52], which represents the rank function via an equivalent DC function. This DC optimization modeling and algorithmic framework has been applied to solve the nonconvex passive beamforming problem in RIS-empowered FL systems [48] and edge inference systems [54]. To solve large-scale rank constrained matrix optimization problems, Riemannian manifold optimization was proposed to optimize such nonconvex programs directly by exploiting the manifold geometric structures of fixed-rank matrices [100], [101]. To further enable real-time, automatic and distributed design of nonconvex optimization algorithms for resource allocation in edge AI systems, DL has been shown to have great potential for achieving this goal. A multi-layer perceptron was adopted in [265] to directly learn the mapping from the problem instance to the output solution generated by the weighted minimum mean square error (WMMSE) algorithm for nonconvex precoding design [266]. Instead of running the iterations, the learned algorithm can be executed in real time, as neural networks only involve computationally cheap operations, e.g., matrix-vector multiplications. To reduce the model and sample complexity, as well as improve the performance and interpretability, unfolded neural networks were developed in [267], [268], [269] to parameterize the iterative policy by unfolding one iteration of an existing structured algorithm into one layer of a neural network.
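A compact sketch of this unrolling idea is given below (illustrative sizes and training loop, not the designs of [97] or [267], [268], [269]): each layer reproduces one iterative shrinkage-thresholding (ISTA) step for the sparse recovery problem min_z ||Az − y||² + λ||z||₁, with the two matrices and the threshold initialized from ISTA and then trained end to end on sampled problem instances.

```python
# Sketch of algorithm unrolling (LISTA-style): every layer mimics one ISTA step
# z <- soft(W_y y + W_z z, theta), with W_y, W_z, theta learnable and
# initialized from the model-based algorithm. Sizes, data, and the training
# loop are illustrative assumptions.
import torch
import torch.nn as nn

def soft_threshold(x, theta):
    return torch.sign(x) * torch.clamp(torch.abs(x) - theta, min=0.0)

class UnrolledISTA(nn.Module):
    def __init__(self, A: torch.Tensor, num_layers: int = 8, lam: float = 0.1):
        super().__init__()
        m, n = A.shape
        L = float(torch.linalg.matrix_norm(A, 2) ** 2)    # Lipschitz constant of the gradient
        W_y0, W_z0 = A.t() / L, torch.eye(n) - A.t() @ A / L
        self.W_y = nn.ParameterList([nn.Parameter(W_y0.clone()) for _ in range(num_layers)])
        self.W_z = nn.ParameterList([nn.Parameter(W_z0.clone()) for _ in range(num_layers)])
        self.theta = nn.ParameterList([nn.Parameter(torch.tensor(lam / L))
                                       for _ in range(num_layers)])

    def forward(self, y: torch.Tensor) -> torch.Tensor:      # y: (batch, m)
        z = torch.zeros(y.shape[0], self.W_z[0].shape[0])
        for W_y, W_z, theta in zip(self.W_y, self.W_z, self.theta):
            z = soft_threshold(y @ W_y.t() + z @ W_z.t(), theta)
        return z                                              # sparse estimate

# Supervised training on synthetic sparse-recovery instances.
A = torch.randn(32, 64) / 8.0
model = UnrolledISTA(A)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
for step in range(200):
    z_true = torch.randn(128, 64) * (torch.rand(128, 64) < 0.1)   # sparse ground truth
    y = z_true @ A.t() + 0.01 * torch.randn(128, 32)
    loss = nn.functional.mse_loss(model(y), z_true)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The payoff is that a fixed, small number of such layers replaces many model-based iterations at inference time; the same template applies when the unrolled iteration is a WMMSE or projected-gradient step instead.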
Graph neural networks (GNNs) have recently been shown to harness the benefits of generalizability, interpretability, robustness, scalability, superior performance, as well as real-time and distributed implementation for learning to optimize nonconvex problems, including power control [270], beamforming [23], and phase shift design [25]. This is achieved by modeling the wireless network as a graph, followed by using a GNN to parameterize the mapping function ψ(α) for the optimal solution. 3) Stochastic Optimization: In large-scale edge AI systems, the estimated CSI is inevitably imperfect or only partially available [271], [177]. It is thus critical to design practical resource allocation schemes by considering the CSI uncertainty, for which robust optimization and stochastic optimization are two typical approaches. Specifically, the robust optimization approach aims at guaranteeing the worst-case, albeit conservative, performance over the uncertainty set. The robust optimization method can usually yield computationally tractable optimization models [102]. The stochastic optimization approach, e.g., chance constrained programming, only relies on a probabilistic description of the uncertainty of the problem parameter α in problem (2) and is able to provide a trade-off between conservativeness and probabilistic guarantees for the achievable performance [272]. In particular, a statistical learning approach was presented in [53] to learn a tractable uncertainty set to approximate the chance constrained program, thereby achieving high computation efficiency and system performance in energy-efficient edge inference systems. However, due to the limited historical samples, it is difficult to characterize the true probability distribution of the CSI uncertainty. Distributionally robust optimization [273] provides a promising way to achieve worst-case probabilistic performance by incorporating all sample-generating distributions into an ambiguity set. However, finding the globally optimal solution for this method is often computationally intractable. DL provides an alternative way to address the uncertainty and dynamics of environment parameters to achieve modeling flexibility and computational efficiency for resource allocation in complicated edge AI systems. Specifically, DL can provide acceptable performance for resource allocation based only on the geographic location information of the transmitters and receivers [274]. By considering CSI variations [275], [55], [276] and stochastic task arrivals [85], [84], the dynamic communication and computation resource allocation problem can be formulated as an MDP, for which deep RL, a model-free approach, can provide efficient and robust solutions [73]. Besides, the learned algorithms can be executed in a distributed manner in multi-agent edge AI systems. However, due to the distribution shift of system parameters in an episodically dynamic environment, the trained model may suffer from performance deterioration when the dataset follows a different distribution in the inference stage [22]. Transfer learning [103] and continual learning [277] have recently been adopted to address such a task mismatch issue in the "learning to optimize" framework by considering the system distribution dynamics. 4) End-to-End Optimization: Channel estimation plays a pivotal role in supporting effective resource allocation in large-scale edge AI systems [198], [206].
In particular, exploiting the low-dimensional structures of wireless channels becomes a promising way to address the curse of dimensionality for CSI acquisition in various networks. Specifically, in ultra-dense Cloud-RAN, a high-dimensional structured channel estimation framework was proposed in [278] by incorporating the spatial sparsity and temporal correlation prior information via a convex regularizer. Sparsity structures of the massive MIMO channel were exploited in [279] to reduce the training overheads for CSI acquisition. The signal superposition property of a wireless multiple access channel was exploited to directly obtain the weighted sum of channels for receive beamformer design, thereby avoiding global CSI estimation [280]. The sparsity in the activity pattern was leveraged to develop a sparse signal processing framework for joint activity detection and channel estimation in grant-free massive access [75]. Due to the passive nature of RIS, it becomes infeasible to directly perform signal processing for channel estimation at the RIS, and the cascaded channel can only be estimated either at the edge servers or at the edge devices [206]. To address this unique challenge, the common reflective channels shared among all edge devices [281], the quasi-static property of the channel link between the RIS and the edge server [282], [283], the spatial features of noisy channels and the additive nature of noise [284], as well as channel sparsity [285] and device activity sparsity [185], were exploited to reduce the training overhead. All of the above works follow the "estimate-then-optimize" framework by first performing pilot-based channel estimation, followed by allocating resources based on the estimated CSI. However, this two-stage approach fails to simultaneously achieve low signaling overhead and superior system performance. Although low-dimensional structures have been exploited for designing efficient channel estimation methods, additional side information (e.g., user location and mobility) is difficult to model and incorporate into a unified mathematical framework for reducing the CSI acquisition overhead, which may otherwise exceed the latency budget. Besides, the artificially defined criterion (e.g., mean square error) for channel estimation may not be aligned with the ultimate goal of resource allocation in edge AI systems. To address this challenge, a DL approach has recently been proposed to merge the two stages into an "end-to-end optimization" framework for resource allocation [104]. This is achieved by directly mapping the received pilots (i.e., the problem parameters α in (2) can be the received pilots) into the resource allocation policy without explicit channel estimation. This mapping function is further parameterized by a DNN to capture the inherent structures of the resource allocation problems. For instance, a GNN was adopted to model the permutation invariance and equivariance properties of the mapping function for resource allocation in RIS-empowered TDD wireless networks [25]. The neural calibration approach [286] was developed for FDD massive MIMO systems to map the received pilots at edge devices into feedback bits, followed by directly mapping the feedback bits into the downlink beamformers [104]. In summary, this section presented operations research based theory-driven methods and ML based data-driven methods for designing effective, real-time, distributed and robust resource allocation strategies in edge AI systems.
We hope these results can stimulate more service-driven resource allocation methods (e.g., network slicing [287]) and optimization approaches (e.g., multi-objective optimization [288]). The presented "learning to optimize" framework is also promising for resource allocation in various future wireless networks. In this section, we present a new mobile network architecture for edge AI systems, supported by the wireless network infrastructures in Section II and Section III, as well as the service-driven resource allocations in Section IV. We will provide an end-to-end (E2E) architecture design across the network infrastructure, data governance, network function, network management, as well as operations and applications. For each new generation of mobile networks, new services and capabilities have been introduced at the architecture level in order to meet new and typically more stringent demands. The mobile network was originally designed to deliver voice services. Since then, both the architecture and deployment of mobile networks have followed a centralized and hierarchical paradigm that reflects the nature of voice traffic and packet traffic of the mobile internet. To realize the vision of "connected intelligence", 6G will break and shift these traditional paradigms towards a novel architecture and design that meet the new requirements for the deep integration of communication, AI, computing, and sensing at the network edge, with new integrated capabilities empowered by evolutionary as well as revolutionary enabling technologies. Under this new design philosophy, we introduce a holistic E2E architecture for scalable and trustworthy 6G edge AI systems, as illustrated in Fig. 9. By providing new wireless network infrastructures, enabling efficient data governance, integrating communication and computation at the network edge, as well as performing automated and scalable edge AI management and orchestration, the proposed E2E architecture will provide a scalable and flexible platform to support diversified edge AI applications with heterogeneous service requirements. Due to the expected huge energy consumption, as well as security and privacy concerns, we envision that data in future 6G networks need to be collected, processed, stored and consumed at the network edge. Since data and AI applications in 6G are expected to be much more diverse than ever before, it is imperative to provide a unified and efficient data governance framework at the architecture level. Data governance goes far beyond conventional data collection and storage; it also considers data availability and quality, data sovereignty, knowledge management, and legal implications. Data governance must also consider mechanisms to comply with the regional or national data protection policies and regulations of the data source, such as GDPR, in terms of usage rights and obligations. 1) Independent Data Plane: 5G has introduced a new network data analytics function (NWDAF) in the core network to implement AI-based network automation, optimize the related network functions (e.g., AI-based mobility management [105]), and improve the user service experience. One of its main goals is to collect and analyze data from other 5G network elements to train AI models and implement AI inference for automated and scalable network optimization.
Meanwhile, similar mechanisms, such as collecting and analyzing data based on the existing SON/MDT (self-organizing networks and minimization of drive tests), were adopted for 5G radio access networks (RAN). In 6G, such a separated data collection and analytics mechanism needs to evolve into a unified and more efficient paradigm. An independent data plane in 6G could contribute to organizing and managing data efficiently while also considering privacy protection [28]. This paves the way for natively embedding edge AI into 6G networks by leveraging multi-domain data. 2) Multi-Player Roles: The data governance ecosystem includes different roles: data customer, data provider, data owner and data steward, etc. These could be taken by the same or different business entities, including individual users. Hence, data governance is a typical scenario that involves multiple players. It thus becomes essential to establish a multi-party data trading platform to negotiate data rights and prices among different business entities while achieving trustworthiness, fairness and efficiency. This can be achieved using decentralized technologies such as blockchain with smart contract design [28], [106]. This will improve the data efficiency and business ecosystem for the deployment of edge AI. In 5G, superior performance has been achieved by bringing AI capabilities into the RAN [289], [290]. For instance, we can optimize radio resource scheduling and mitigate interference using machine learning methods [23]. Such utilization of AI in 5G can be referred to as AI for networks. The targets of edge AI are not only AI for networks, but also networks for AI [12], as presented in Section II and Section III. This will depend on the new functional capabilities of future networks, including how to make computing a foundational capability of future 6G networks. A new type of radio equipment, which we refer to as a radio computing node (RCN), may emerge to allow computing resources to be seamlessly converged with the communication capability. This will require the introduction of a new independent computing plane (CmP) in the RCN to host AI tasks and collaborate with the communication functions in the control plane (CP) and user plane (UP) [28]. This will also enable the flexible integration of computation, communication and intelligence for edge AI. Edge AI involves a diverse set of learning models and algorithms, network infrastructures, as well as complicated collaboration among communication, computation and intelligence. Developing a framework for edge AI management and orchestration thus becomes an essential aspect of the design of native AI support at the architecture level. This framework needs to be designed so as to facilitate the seamless integration and deployment of AI services, especially those from third parties. This can be achieved by planning, deploying, maintaining, and optimizing the decentralized machine learning models and algorithms, as well as the edge network infrastructures and functions. The edge AI management and orchestration shall also cover AI workflows, distributed and streaming data, along with heterogeneous network resources, etc. Scale and cross-domain issues will be huge challenges for such a framework, and addressing them may involve complicated standardization efforts. Hence, building such a new framework by fully relying on standardization may not be feasible. We may instead leverage the open-source approach [28] to commercialize some of the components in this framework.
In summary, this section presented the edge AI system architecture from an E2E perspective, detailing its network infrastructure, data governance, edge network function, as well as edge AI management and orchestration. The standardization efforts, hardware and software platforms, and application scenarios will be further discussed in the next section. We hope this novel E2E architecture can stimulate more innovative and out-of-the-box ideas for the evolution of edge AI system architectures. In this section, we will first discuss the standardization of edge learning models and algorithms, as well as integrated computing functionalities at the network edge. The research-oriented and production-oriented platforms are then presented, including distributed optimization based FL software, large-scale optimization based resource allocation solvers, as well as edge AI computing and communication hardware. To accelerate the commercialization of edge AI, application scenarios are also investigated, including autonomous driving, industrial IoT, and smart healthcare. The standardization of 6G will not be limited to the communication part, but will also cover the deep integration of communications, intelligence, and computing. The 3rd Generation Partnership Project (3GPP) may start an overall study into 6G systems around the end of 2025 (3GPP Release 20), while starting research into technical specifications around the end of 2027 [28]. In this subsection, we will introduce the standardizations on trustworthy edge learning models and algorithms, as well as wireless computing functionalities implemented in digital or analog communication systems. 1) Learning: The first technical standard for FL was approved in March 2021 as IEEE 3652.1-2020 [107], IEEE Guide for Architectural Framework and Application of Federated Machine Learning. This IEEE standard for FL was developed by the Learning Technology Standards Committee of the IEEE Computer Society with participants from the shared machine learning working group, including 4Paradigm, AI Singapore, Alipay, Huawei, JD iCity, Tencent, WeBank, and Xiaomi, etc. Specifically, the IEEE 3652.1-2020 standard provides guidelines for the architectures and categories of FL from the perspectives of data, user and system, followed by identifying the associated application scenarios, performance evaluations, and regulatory requirements. Standardization plays a vital role in creating a private and secure FL ecosystem at large scale to provide consumer products and services in the market. Besides, various standards for data privacy and security have been developed by the information security, cybersecurity and privacy protection technical committee of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). For example, ISO/IEC TS 27570 [291], Privacy Protection - Privacy Guidelines for Smart Cities, provides guidelines and recommendations for the management of privacy and the usage of standards. ISO/IEC DIS 27400 [292], Guidelines for Security and Privacy in Internet of Things (IoT), provides guidance on principles and controls to deliver private and secure IoT systems, services and solutions. The technical committee on cybersecurity of the European Telecommunications Standards Institute (ETSI) has recently unveiled ETSI EN 303 645 [293], Cyber Security for Consumer Internet of Things: Baseline Requirements, to provide a cybersecurity standard and baseline for IoT consumer products and certification schemes.
All these standards are applicable to developing private and secure edge AI models and algorithms to provide trustworthy products and services. 2) Computing: The computing functionality can be implemented in wireless networks by either digital modulation or analog modulation. Specifically, MEC provides a promising solution for deploying edge AI systems in current wireless systems with digital modulation [115]. The standardization activities on MEC thus pave the way for integrating edge AI into mobile networks at a high maturity level. Specifically, the ETSI ISG MEC (Industry Specification Group for Multi-access Edge Computing) has established a standardized and open ecosystem for both edge-aware and edge-unaware applications at the network edge. It has published a set of white papers and specifications covering user equipment applications, service applications, as well as management, mobility, and orchestration related application programming interfaces (APIs). Besides, 3GPP 5G specifications define the key enablers and architectures for edge computing to allow traffic routing, policy control, and network management for collaboration between a MEC system and a 5G system [294]. The collaboration between the two independent MEC and 5G systems can be further optimized in 6G, where communication and computing can be converged into one system by adding a computing plane [28]. In particular, ETSI ISG MEC has recently developed a synergized mobile edge cloud architecture by leveraging and harmonizing the existing and ongoing standards (including 3GPP, ETSI ISG MEC, GSMA, and 5GAA) [108]. Although 5G has been rolled out globally, modern mobile systems are widely deployed based on digital modulation instead of analog modulation [295]. To support analog communication based AirComp for edge training in current wireless networks [6], one may either directly leverage the existing digital modulator with quantized analog signals or introduce an additional analog modulator with a matched filter for decoding the received signals [37]. It is obvious that more efforts are needed to incorporate AirComp functionalities into the future 6G standards to mature edge AI systems. We present the software and hardware platforms for deploying edge AI models and algorithms, as well as the optimization solvers for resource allocation in edge AI systems. 1) Software: There is a rapidly growing body of software platforms for the simulation and productization of edge AI algorithms and models. Open FL libraries, such as TensorFlow Federated, Leaf, and PySyft, have provided excellent software frameworks for FL simulations and evaluations. To further accelerate research progress and facilitate algorithmic innovation and performance comparison in realistic FL environments, FedML [109], a research-oriented open FL library, has recently been established to support diverse FL computing environments and topological architectures with standardized FL algorithm implementations and benchmarks. As a production-oriented software project, FATE [110] has been developed by WeBank's AI Department for the financial industry, supporting various secure computing protocols and FL architectures. Besides, existing edge computing frameworks (e.g., Baidu "Baetyl" and Huawei "KubeEdge") provide promising solutions to deliver edge AI services.
For edge AI empowered IoT applications, Microsoft "Azure IoT Edge", Google "Cloud IoT", Amazon "Web Services (AWS) IoT" and NVIDIA "EGX" provide edge AI platforms to bring real-time AI services to a wide range of applications, including smart retail, home, manufacturing, and healthcare. Huawei has recently released a next-generation operating system, HarmonyOS [111], to enable seamless collaboration and interconnection among smart edge devices across diverse platforms. This empowers connected intelligence by deploying edge AI in the operating system. 2) Solver: Resource allocation for edge AI systems and wireless networks is advancing rapidly through the development of various large-scale optimization models and algorithms. General-purpose large-scale optimization software solvers are important to enable rapid prototyping and deployment of resource allocation optimization algorithms for edge AI systems. Specifically, CVX [112] provides a two-stage software framework for modeling and solving general large-scale convex optimization problems. This is achieved by automatically transforming the original problem instances into standard conic programming forms, followed by calling advanced off-the-shelf conic solvers, e.g., MOSEK [296] and SCS [113]. To further speed up the modeling phase and avoid repeatedly parsing and re-generating conic forms, a matrix stuffing technique was presented in [98] to generate the mapping function between the original problem and the conic form in a symbolic way instead of the time-consuming numerical way used in CVX. It is thus particularly interesting to develop a solver that automatically generates the mapping functions for conic transformation in symbolic form. Besides, Gurobi [297] and MOSEK [296] are among the fastest solvers for general mixed-integer second-order cone programs. Chen et al. recently released the software package "Open-L2O" [114] to implement the "learning to optimize" framework for fair performance benchmarking and automatic algorithm design. 3) Hardware: The achievable performance and benefits of edge AI systems are conditioned upon the availability of edge AI computing hardware and radio frequency (RF) hardware technologies. Specifically, edge AI computing hardware can be categorized as graphics processing unit (GPU)-based hardware (e.g., NVIDIA's GPUs), field programmable gate array (FPGA)-based hardware (e.g., Xilinx's SDSoC), and application specific integrated circuit (ASIC)-based hardware (e.g., Google's TPU). Detailed comparisons of various edge AI computing hardware can be found in [115]. In particular, the chip design procedure for edge AI hardware can be significantly accelerated by the recent proposal of deep RL assisted fast chip floorplanning [298]. Besides, the massive broadband connectivity requirements of edge AI systems motivate innovations in RF hardware technologies. The benefits of RIS-empowered FL systems highly depend on the capabilities of manipulating electromagnetic waves at the metasurfaces [49], whose reconfigurability is typically enabled by switches, tunable materials, topological metasurfaces, and hybrid metasurfaces [299]. THz communication, operating in the 0.1-10 THz frequency band, is envisioned as a promising enabler for achieving sensing, communication, and learning in an integrated edge AI system. To approach the THz region, RF hardware technologies and solutions were thoroughly investigated in [116], including semiconductor circuits, antenna forms, and the packaging and testing of transceivers.
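Before turning to application scenarios, the two-stage modeling-and-solving workflow described in the solver discussion above can be illustrated with a few lines of CVXPY; the group-sparse (mixed ℓ1/ℓ2-norm) toy problem and its data below are illustrative assumptions, not the formulations of [31], [53], or [98].

```python
# Toy CVXPY sketch of the modeling-plus-solving workflow: a mixed l1/l2-norm
# (group-sparse) problem of the kind used for joint selection-and-beamforming,
# with synthetic problem data.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
num_groups, group_size, num_obs = 10, 4, 30
A = rng.normal(size=(num_obs, num_groups * group_size))
b = rng.normal(size=num_obs)

x = cp.Variable(num_groups * group_size)
groups = [x[g * group_size:(g + 1) * group_size] for g in range(num_groups)]
group_penalty = sum(cp.norm(g, 2) for g in groups)          # mixed l1/l2 norm

problem = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b) + 2.0 * group_penalty))
problem.solve(solver=cp.SCS)                                 # conic solver in stage two

active = [g for g in range(num_groups)
          if np.linalg.norm(x.value[g * group_size:(g + 1) * group_size]) > 1e-3]
print("active groups (e.g., selected devices/tasks):", active)
```

CVXPY canonicalizes the model into a conic form and hands it to a conic solver such as SCS or MOSEK; groups driven exactly to zero play the role of de-selected devices or tasks.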
We discuss edge AI enabled application scenarios, which inspire new communication algorithms, resource allocation optimization algorithms, as well as data processing methods. 1) Autonomous Driving: Autonomous driving basically refers to self-driving vehicles that move without the intervention of human drivers. A self-driving vehicle integrates various innovative technologies, including advanced sensor technologies, new energy automobiles, next generation AI technologies, as well as future vehicular networks. Autonomous driving can significantly improve the safety, passenger comfort, travel and logistics efficiency, collision avoidance, and energy efficiency. Edge AI shall play a pivotal role in achieving ultra-low latency communication, intelligent networking, real-time data analytics, as well as high security for intelligent vehicles [117], [27]. A general DL framework was proposed in [26] to enable ultra-reliable and low-latency vehicular communication by incorporating domain knowledge, including information theoretic tools and cross-layer optimization design. To minimize the vehicles' queuing latency, an FL approach was developed in [300] to learn the tail distribution of the queue lengths. To cope with the high mobility and heterogeneous structures in vehicular networks, DL becomes powerful for dynamic resource allocation [24] and network traffic control [27]. In particular, edge AI techniques, including distributed RL [55], [301], decentralized GNN [23], as well as distributed DNN with a binarized output layer [61], are able to learn and execute distributed resource allocation policies in an automatic and real-time manner. The data processing tasks for autonomous driving mainly include perception, high-definition (HD) mapping, as well as SLAM [117], [302]. Specifically, to understand the environments for intelligent decision making, various sensory data from onboard sensors (e.g., light detection and ranging (LiDAR), cameras, radar and sonar) need to be processed for the perception tasks, including localization, object detection and tracking. The perception capability can be enhanced by edge AI systems, e.g., edge device-server co-inference of DNN models for vision based perception tasks [303]. HD mapping aims at constructing a representation of the vehicle's operating environment, e.g., obstacles, landmark positions, curvature and slope. This is imperative to achieve highly accurate localization for autonomous driving. The edge server cooperative inference method in Section III-A2 can be adopted to reduce the storage and communication overheads for updating the HD map by collecting fresh data from the vehicles in dynamic environments [117]. SLAM comprises simultaneously estimating the state of a vehicle and constructing a map of the environment [304], which paves the way for achieving full autonomy in autonomous driving [305]. Edge SLAM [56], [57] has recently been developed to execute DL based visual SLAM algorithms on edge vehicles. This is achieved by deploying the tracking computation parts on the edge vehicles while offloading the remaining parts (e.g., local mapping and loop closure) to the roadside edge server via the vertical edge inference in Section III-B. 2) Internet of Things: Artificial Intelligence of Things (AIoT) leverages AI technologies and IoT infrastructures to improve human-machine interactions and enable multi-agent communications and collaborations. AIoT goes beyond the conventional communication paradigm for audio, video and data delivery.
It will enable semantic communication [58] to exchange semantic information among agents. Shannon and Weaver categorized communication into three levels, namely the transmission level (i.e., transmitting symbols accurately), the semantic level (i.e., conveying the desired meanings precisely), and the effectiveness level (i.e., producing the desired actions effectively) [306]. Semantic communication is able to significantly improve the communication efficiency by only transmitting the extracted relevant information for semantic information delivery tasks, with the semantic error as the performance metric. A distributed edge DL approach has recently been developed in [58] to enable low-latency semantic communication over IoT networks. This is achieved by jointly optimizing the compressed DNN based transmitters at the edge IoT devices and the quantized DNN based receivers at the edge server over wireless fading channels. Industrial IoT (IIoT) is a production-oriented industrial network for connecting industrial devices and equipment, processing and exchanging the generated data, as well as optimizing the production system [307]. Besides, the digital twin is becoming a key technology for smart manufacturing in Industry 4.0 by connecting physical machines and digital representations in a cyber-physical system [308], [309]. This is achieved by providing a virtual representation of the industrial entities and the products' life-cycle to predict and optimize the behaviors of the manufacturing process. Edge AI provides a promising way to model and deploy digital twins for IIoT networks to process the high volume of industrial streaming data with low-latency and high-security guarantees. Specifically, edge computing provides a general platform for inferring DNN models via computation offloading to reduce the network latency and operation cost in IIoT [119]. FL becomes a key enabling technology to support intelligent IIoT applications (e.g., smart grid and smart manufacturing) and provide IIoT services (e.g., data offloading and mobile crowdsensing) [120]. In particular, blockchain empowered FL was proposed in [310] to provide secure communication and private data sharing schemes for constructing digital twin IIoT networks, followed by reducing communication overheads via asynchronous model aggregation. 3) Smart Healthcare: Smart healthcare aims to realize a common platform for efficient and personalized healthcare, intelligent health monitoring, and precision medicine development via collaboration among multiple participants (e.g., doctors, patients, hospitals, and research institutions). This is achieved by leveraging emerging advanced technologies, including DL [311], [312], the Tactile Internet, IoT, edge AI, and wireless communications. In particular, edge AI with distributed and secure DL has been demonstrated to significantly improve the reliability, accuracy, scalability, privacy and security of precision medicine and the Internet of Medical Things [313], including medical imaging, drug development, and chronic disease management [314]. Specifically, Kaissis et al. in [315] presented an FL approach for medical imaging to preserve privacy and avoid potential attacks against the datasets or learning algorithms. Besides, swarm learning has recently been developed in [35] to provide a decentralized and confidential clinical detection solution for diseases (e.g., COVID-19, tuberculosis, and leukaemia).
Industrial IoT (IIoT) is a production-oriented industrial network for connecting industrial devices and equipment, processing and exchanging the generated data, and optimizing the production system [307]. In addition, the digital twin is becoming a key technology for smart manufacturing in Industry 4.0 by connecting physical machines with their digital representations in a cyber-physical system [308], [309]. This is achieved by providing a virtual representation of the industrial entities and of the product life-cycle to predict and optimize the behavior of the manufacturing process. Edge AI provides a promising way to model and deploy digital twins for IIoT networks so as to process high volumes of industrial streaming data with low-latency and high-security guarantees. Specifically, edge computing provides a general platform for DNN inference via computation offloading to reduce network latency and operating cost in IIoT [119]. FL has become a key enabling technology to support intelligent IIoT applications (e.g., smart grid and smart manufacturing) and to provide IIoT services (e.g., data offloading and mobile crowdsensing) [120]. In particular, blockchain-empowered FL was proposed in [310] to provide secure communication and private data sharing schemes for constructing digital twin IIoT networks, with communication overheads further reduced via asynchronous model aggregation.

3) Smart Healthcare: Smart healthcare aims to realize a common platform for efficient and personalized healthcare, intelligent health monitoring, and precision medicine development via collaboration among multiple participants (e.g., doctors, patients, hospitals, and research institutions). This is achieved by emerging advanced technologies, including DL [311], [312], the Tactile Internet, IoT, edge AI, and wireless communications. In particular, edge AI with distributed and secure DL has been demonstrated to significantly improve the reliability, accuracy, scalability, privacy, and security of precision medicine and the Internet of Medical Things [313], including medical imaging, drug development, and chronic disease management [314]. Specifically, Kaissis et al. [315] presented an FL approach for medical imaging to preserve privacy and avoid potential attacks against the datasets or learning algorithms. Besides, swarm learning has recently been developed in [35] to provide a decentralized and confidential clinical disease detection solution (e.g., for COVID-19, tuberculosis, and leukaemia). This is achieved by leveraging blockchain and edge computing techniques to build a secure and private decentralized learning architecture while keeping the medical data local. The MIT Media Lab established a split learning project that allows health entities to collaborate on training patient diagnostic models without sharing sensitive raw data [316]. An RL approach for decision making in patient treatment was introduced in [317] to realize safe and risk-conscious healthcare practice.

Haptic communication [121] aims at delivering skill sets (e.g., the manipulation skills learned from multisensory tactile and visual data [318], and the signatures of the human grasp learned using a tactile glove [319]) over the Tactile Internet in an ultra-reliable and low-latency manner. It has great potential in healthcare applications, including tele-diagnosis, tele-rehabilitation, and tele-surgery, which have proven essential during the ongoing COVID-19 pandemic. Edge AI is a key enabling technique for the Tactile Internet with humans in the loop, facilitating ultra-responsive and truly immersive tactile actuation in tele-operation systems [320]. This is achieved by endowing the network edge with intelligent prediction capabilities for haptic information (e.g., tactile feedback and control traffic) [321], as well as intelligent resource allocation across all network layers [322]. Specifically, a distributed optimization framework was developed in [323] to design an edge computing assisted Tactile Internet that achieves both ultra-low latency and high energy efficiency. Such a distributed optimization algorithm can be further learned via distributed DL techniques [61]. Besides, a variational optimization framework was proposed in [324] to achieve low latency and high reliability for massive access in the Tactile Internet. The variational decision function can be further parameterized via DNNs with the capability of distributed training and inference for practical deployments [324], [23].
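Federated averaging is the basic training primitive behind several of the examples above (FL for IIoT services [120], FL for medical imaging [315], and, in a decentralized blockchain-coordinated form, swarm learning [35]). The following minimal single-machine sketch simulates one aggregation round with three clients (e.g., hospitals) that never exchange raw data; the model, the toy datasets, and the data-size weighting are hypothetical placeholders, and none of the compression, privacy, or over-the-air aggregation mechanisms of the cited works is modeled.

```python
import copy
import torch
import torch.nn as nn

def local_update(global_model, dataset, epochs=1, lr=0.01):
    """One client's local training on private data; only model weights leave the site."""
    model = copy.deepcopy(global_model)
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for x, y in dataset:
            opt.zero_grad()
            loss_fn(model(x), y).backward()
            opt.step()
    return model.state_dict(), sum(len(y) for _, y in dataset)

def fed_avg(global_model, client_datasets):
    """One FedAvg round: data-size-weighted average of the clients' local models."""
    states, sizes = zip(*(local_update(global_model, d) for d in client_datasets))
    total = sum(sizes)
    new_state = {
        k: sum(w / total * s[k].float() for w, s in zip(sizes, states))
        for k in states[0]
    }
    global_model.load_state_dict(new_state)
    return global_model

# Toy setup: three 'hospitals', each holding one private batch of 32 feature vectors.
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 2))
clients = [[(torch.randn(32, 20), torch.randint(0, 2, (32,)))] for _ in range(3)]
for round_idx in range(5):
    model = fed_avg(model, clients)
```

In each round only model parameters leave a client, which is what preserves the confidentiality of the local medical or industrial data.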
In summary, this section presented the standardization efforts, platforms, and applications for the practical deployment of edge AI systems. Combined with the presentations of edge training in Section II, edge inference in Section III, resource allocation in Section IV, and system architecture in Section V, this completes the roadmap for the edge AI ecosystem, as shown in Fig. 2. We hope these results will encourage more communities and stakeholders to engage in industrializing and commercializing edge AI in the era of 6G.

Embedding low-power, low-latency, reliable, and trustworthy intelligence into the network edge is an inevitable trend and a disruptive shift in both academia and industry. Edge AI serves as a distributed neural network to imbue connected intelligence into 6G, thereby enabling intelligent and seamless interactions among the human world, the physical world, and the digital world. The challenges of building edge AI ecosystems are multidisciplinary, spanning wireless communications, machine learning, operations research, domain applications, regulations, and ethics. In this paper, we have investigated the key wireless communication techniques, effective resource management approaches, and holistic network architectures needed to design scalable and trustworthy edge AI systems. Standardization, platforms, and applications were also discussed for the productization and commercialization of edge AI. We hope that this article will serve as a valuable reference and guideline for exploring edge AI opportunities along theoretical, algorithmic, systems, and entrepreneurial directions, so as to embrace the exciting era of edge AI.

REFERENCES

Resilient and Intelligent NextG Systems (RINGS) Expanded 6G vision, use cases and societal values 6G vision and candidate technologies Network 2030: A blueprint of technology, applications and market drivers towards the year 2030 and beyond What will 5G be? 5G: A tutorial overview of standards, trials, challenges, deployment, and practice Huawei 5 6G wireless systems: Vision, requirements, challenges, insights, and opportunities The internet of no things: Making the internet disappear and 'see the invisible' Towards 6G wireless communication networks: Vision, enabling technologies, and new paradigm shifts A vision of 6G wireless systems: Applications, trends, technologies, and open research problems The roadmap to 6G: AI empowered wireless networks White paper on machine learning in wireless communication networks Interplay between RIS and AI in wireless communications: Fundamentals, architectures, applications, and open research problems Communication algorithms via deep learning An introduction to deep learning for the physical layer Toward a 6G AI-native air interface Joint source-channel coding over additive noise analog channels using mixture of variational autoencoders 6G networks: Beyond Shannon towards semantic and goal-oriented communications Learning task-oriented communication for edge inference: An information bottleneck approach Deep learning enabled semantic communication systems LORM: Learning to optimize for resource management in wireless networks with few training samples Graph neural networks for scalable radio resource management: Architecture design and theoretical analysis Deep-learning-based wireless resource allocation with application to vehicular networks Learning to reflect and to beamform for intelligent reflecting surface with implicit channel estimation A tutorial on ultrareliable and low-latency communications in 6G: Integrating domain knowledge into deep learning Future intelligent and secure vehicular network toward 6G: Machine-learning approaches 6G, the Next Horizon: From Connected People and Things to Connected Intelligence Communication-efficient edge AI: Algorithms and systems Edge intelligence: Paving the last mile of artificial intelligence with edge computing Federated learning via over-the-air computation Communication-computation trade-off in resource-constrained edge inference Federated machine learning: Concept and applications Advances and open problems in federated learning Swarm learning for decentralized and confidential clinical machine learning Distributed learning of deep neural network over multiple agents Communication-efficient and distributed learning over wireless networks: Principles and applications Communication-efficient policy gradient methods for distributed reinforcement learning Fully decentralized multi-agent reinforcement learning with networked agents Secure distributed on-device learning networks with byzantine adversaries Privacy for free: Wireless federated learning via uncoded transmission with adaptive power control The algorithmic foundations of differential privacy Lagrange coded computing: Optimal design for resiliency, security, and privacy Broadband analog aggregation for low-latency federated edge learning Machine learning at the wireless edge: Distributed stochastic gradient
descent over-the-air Cell-free massive MIMO for wireless federated learning Cell-free massive MIMO versus small cells Federated learning via intelligent reflecting surface Smart radio environments empowered by reconfigurable intelligent surfaces: How it works, state of research, and the road ahead From federated to fog learning: Distributed machine learning over heterogeneous wireless networks Space-air-ground integrated network: A survey Data shuffling in wireless distributed computing via low-rank optimization Energy-efficient processing and robust wireless cooperative transmission for edge inference Reconfigurable intelligent surface for green edge inference Multi-agent deep reinforcement learning for dynamic power allocation in wireless networks Edge-SLAM: edgeassisted visual simultaneous localization and mapping Edge assisted mobile semantic visual SLAM A lite distributed semantic communication system for Internet of Things Distributed q-learning aided uplink grant-free noma for massive machine-type communications AI empowered resource management for future wireless networks Deep learning for distributed optimization: Applications to wireless resource management An introduction to probabilistic spiking neural networks: Probabilistic models, learning rules, and applications Spiking neural networks-part iii: Neuromorphic communications The collective advantage for advancing communications and intelligence An immunology-inspired network security architecture A fully-decoupled ran architecture for 6g inspired by neurotransmission Communication-efficient learning of deep networks from decentralized data A field guide to federated optimization Randomized gossip algorithms Diffusion strategies for adaptation and learning over networks: an examination of distributed strategies and network behavior Optimal complexity in decentralized training Communication efficient distributed machine learning with the parameter server Optimization for reinforcement learning: From a single agent to cooperative agents Distributed statistical machine learning in adversarial settings: Byzantine gradient descent Sparse signal processing for grant-free massive connectivity: A future paradigm for random access protocols in the internet of things Joint activity detection and channel estimation for IoT networks: Phase transition and computation-estimation tradeoff A survey on non-orthogonal multiple access for 5G networks: Research challenges and future trends Nonorthogonal multiple access for 5G: solutions, challenges, opportunities, and future research trends Blind over-the-air computation and data fusion via provable wirtinger flow Group sparse beamforming for green Cloud-RAN UAV aided overthe-air computation Edge AI: On-demand accelerating deep neural network inference via edge computing Channel coding rate in the finite blocklength regime Dynamic task offloading and resource allocation for ultra-reliable low-latency edge computing Joint channel and queue aware scheduling for latency sensitive mobile edge computing with power constraints Task-oriented communication for multi-device cooperative edge inference Fast convergence algorithm for analog federated learning A joint learning and communications framework for federated learning over wireless networks Delay analysis of wireless federated learning based on saddle point approximation and large deviation theory Convergence time optimization for federated learning over wireless networks Wireless-powered over-the-air computation in intelligent reflecting 
surface-aided IoT networks Energy efficient federated learning over wireless communication networks Byzantine-resilient federated machine learning via over-the-air computation Turning channel noise into an accelerator for over-the-air principal component analysis Explaining deep neural networks and beyond: A review of methods and applications Generalized sparse and low-rank optimization for ultra-dense networks Algorithm unrolling for massive access via deep neural network with theoretical guarantee Large-scale convex optimization for dense wireless cooperative networks DC formulations and algorithms for sparse optimization problems Manopt, a matlab toolbox for optimization on manifolds Low-rank matrix completion for topological interference management by riemannian pursuit Robust group sparse beamforming for multicast green Cloud-RAN with imperfect CSI Transfer learning and meta learning-based fast downlink beamforming adaptation Deep learning for distributed channel feedback and multiuser precoding in FDD massive MIMO A non-stochastic learning approach to energy efficient mobility management SDTE: A secure blockchain-based data trading ecosystem IEEE guide for architectural framework and application of federated machine learning Harmonizing standards for edge computing FedML: A research library and benchmark for federated machine learning HarmonyOS CVX: Matlab software for disciplined convex programming, version 2.1 Conic optimization via operator splitting and homogeneous self-dual embedding Learning to optimize: A primer and a benchmark Convergence of edge computing and deep learning: A comprehensive survey White paper on RF enabling 6G-opportunities and challenges from technology to spectrum Mobile edge intelligence and computing for the internet of vehicles Federated machine learning for intelligent IoT via reconfigurable intelligent surface Edge computing in industrial Internet of Things: Architecture, advances and challenges Federated learning for industrial internet of things in future industries Toward haptic communications over the 5G tactile internet Federated learning: Challenges, methods, and future directions QSGD: communication-efficient SGD via gradient quantization and encoding TernGrad: Ternary gradients to reduce communication in distributed deep learning signSGD: Compressed optimisation for non-convex problems High-dimensional stochastic gradient quantization for communication-efficient edge learning UVeQFed: Universal vector quantization for federated learning ATOMO: communication-efficient learning via atomic sparsification Sparse communication for distributed gradient descent Hierarchical quantized federated learning: Convergence analysis and system design Federated learning with compression: Unified analysis and sharp guarantees Lazily aggregated quantized gradient innovation for communication-efficient federated learning On the convergence of fedavg on Non-IID data Federated optimization in heterogeneous networks Federated learning based on dynamic regularization Personalized federated learning with moreau envelopes Distributionally robust federated averaging Distributionally robust learning Federated multi-task learning Personalized federated learning with theoretical guarantees: A model-agnostic meta-learning approach Fedsplit: An algorithmic framework for fast federated optimization Tackling the objective inconsistency problem in heterogeneous federated optimization Towards flexible device participation in federated learning Can decentralized algorithms 
outperform centralized algorithms? a case study for decentralized parallel stochastic gradient descent Opportunities of federated learning in connected, cooperative, and automated industrial systems Decentralized stochastic optimization and machine learning: A unified variance-reduction framework for robust performance and fast convergence Cooperative SGD: A unified framework for the design and analysis of communication-efficient SGD algorithms A unified theory of decentralized SGD with changing topology and local updates Decentralized stochastic optimization and gossip algorithms with compressed communication Consensus control for decentralized deep learning Decentralized gradient methods: does topology matter GADMM: Fast and communication efficient framework for distributed machine learning Quasi-global momentum: Accelerating decentralized deep learning on heterogeneous data Coordinate descent algorithms Beyond backprop: Online alternating minimization with auxiliary variables Federated doubly stochastic kernel learning for vertically partitioned data FDML: A collaborative machine learning framework for distributed features Supervised learning under distributed features FORESEEN: towards differentially private deep inference for intelligent Internet of Things Deep reinforcement learning: A brief survey Asynchronous methods for deep reinforcement learning Multiagent actor-critic for mixed cooperative-competitive environments Multi-agent reinforcement learning: A selective overview of theories and algorithms A decentralized policy gradient approach to multi-task reinforcement learning Aggregathor: Byzantine machine learning via robust gradient aggregation Harnessing wireless channels for scalable and privacy-preserving federated learning Breaking the communicationprivacy-accuracy trilemma Federated variancereduced stochastic gradient descent with robustness to byzantine attacks Byzantine-robust distributed learning: Towards optimal statistical rates Machine learning with adversaries: Byzantine tolerant gradient descent Byzantine-resilient secure federated learning Blockchained on-device federated learning Federated learning over wireless fading channels On analog gradient descent learning over multiple access fading channels One-bit over-the-air aggregation for communication-efficient federated edge learning: Design and convergence analysis Federated learning over noisy channels: Convergence analysis and design examples Collaborative machine learning at the wireless edge with blind transmitters Reconfigurable intelligent surface enabled federated learning: A unified communication-learning design approach Joint optimization of communications and federated learning over the air Mathematical models of overparameterized neural networks Federated learning with cooperating devices: A consensus approach for massive IoT networks Massive access for 5G and beyond Massive access for future wireless communication systems Massive connectivity with massive MIMO-Part I: Device activity detection and channel estimation Reconfigurable intelligent surface for massive connectivity Algorithm unrolling: Interpretable, efficient deep learning for signal and image processing Low-overhead Communications in IoT Networks Blind demixing for low-latency communication Nonconvex demixing from bilinear measurements Supporting more active users for massive access via data-assisted activity detection Joint activity detection and data decoding in massive random access via a turbo receiver Deep neural network aided 
low-complexity MPA receivers for uplink SCMA systems DeepNOMA: A unified framework for NOMA using deep multi-task learning Buffer-aided relay selection for cooperative hybrid NOMA/OMA networks with asynchronous deep reinforcement learning Deep multi-task learning for cooperative NOMA: System design and principles Hybrid beamforming for massive MIMO over-the-air computation MIMO over-the-air computation for highmobility multimodal sensing Largescale convex optimization for ultra-dense Cloud-RAN Recent advances in cloud radio access networks: System architectures, key techniques, and open issues Over-the-air computation via cloud radio access networks Prospective multiple antenna technologies for beyond 5G Over-the-air computing for 6G -turning air into a computer Reconfigurable intelligent surface empowered downlink non-orthogonal multiple access Computation scheduling for distributed machine learning with straggling workers Latency minimization for intelligent reflecting surface aided mobile edge computing Reconfigurableintelligent-surface empowered wireless communications: Challenges and opportunities Intelligent reflecting surface-aided wireless communications: A tutorial Holographic MIMO surfaces for 6G wireless networks: Opportunities, challenges, and trends Over-the-air computation via reconfigurable intelligent surface Cooperative multigroup multicast transmission in integrated terrestrial-satellite networks CubeSat communications: Recent advances and future challenges Accessing from the sky: A tutorial on UAV communications for 5G and beyond Evolutionary V2X technologies toward the internet of vehicles: Challenges and opportunities Radar-assisted predictive beamforming for vehicular links: Communication served by sensing Towards federated learning at scale: System design Space/aerial-assisted computing offloading for IoT applications: A learning-based approach Millimeter-wave cellular wireless networks: Potentials and challenges Wireless communications and applications above 100 GHz: Opportunities and challenges for 6G and beyond Age of information: An introduction and survey Model compression and acceleration for deep neural networks: The principles, progress, and challenges A scalable framework for wireless distributed computing Mapreduce: simplified data processing on large clusters A fundamental tradeoff between computation and communication in distributed computing Nomographic functions: Efficient computation in clustered gaussian sensor networks Multi-level over-the-air aggregation of mobile edge computing over D2D wireless networks Exploiting computation replication for mobile edge computing: A fundamental computation-communication tradeoff study Multi-cell MIMO cooperative networks: A new look at interference Rate splitting for MIMO wireless networks: a promising PHY-layer strategy for LTE evolution Branchy-GNN: A deviceedge co-inference framework for efficient point cloud processing Delay characterization of mobileedge computing for 6g time-sensitive services Joint pilot and payload power allocation for massive-MIMO-enabled URLLC IIoT networks Manifold mixup: Better representations by interpolating hidden states BottleNet++: An end-to-end approach for feature compression in device-edge co-inference systems Wireless image retrieval at the edge The information bottleneck method Distributed variational representation learning Optimized power control design for over-the-air federated edge learning Scheduling policies for federated learning in wireless networks Accelerating dnn 
training in wireless federated edge learning systems Adaptive federated learning in resource constrained edge computing systems Joint parameter-and-bandwidth allocation for improving the efficiency of partitioned edge learning Joint device scheduling and resource allocation for latency constrained wireless federated learning Communication-efficient federated learning Federated learning over wireless networks: Convergence analysis and resource allocation Design and analysis of uplink and downlink communications for federated learning Machine learning with neuromorphic photonics Energy-quality scalable integrated circuits and systems: Continuing energy scaling in the twilight of moore's law Wirelessly powered federated edge learning: Optimal tradeoffs between convergence and power transfer How to evaluate deep neural network processors: TOPS/W (alone) considered harmful Efficient processing of deep neural networks: A tutorial and survey Dynamic computation offloading for mobile-edge computing with energy harvesting devices Decentralized privacy using blockchain-enabled federated learning in fog computing Federated learning with blockchain for autonomous vehicles: Analysis and design challenges Nonconvex optimization meets lowrank matrix factorization: An overview Learning shallow neural networks via provable gradient descent with random initialization Complete dictionary recovery over the sphere I: Overview and the geometric picture How to escape saddle points efficiently Smoothed lp-minimization for green Cloud-RAN with user admission control Machine learning for combinatorial optimization: a methodological tour d'horizon Distributed multicloud multi-access edge computing by multi-agent reinforcement learning Optimization techniques in reconfigurable intelligent surface aided networks Majorization-minimization algorithms in signal processing, communications, and machine learning A general inner approximation algorithm for nonconvex mathematical programs Semidefinite relaxation of quadratic optimization problems Learning to optimize: Training deep neural networks for interference management An iteratively weighted MMSE approach to distributed sum-utility maximization for a s interfering broadcast channel Unfolding WMMSE using graph neural networks for efficient power allocation Joint deep reinforcement learning and unfolding: Beam selection and precoding for mmwave multiuser MIMO with lens arrays Iterative algorithm induced deep-unfolding neural networks: Precoding design for multiuser MIMO systems Optimal wireless resource allocation with random edge graph neural networks Joint client scheduling and resource allocation under channel uncertainty in federated learning Optimal stochastic coordinated beamforming for wireless cooperative networks with CSI uncertainty Data-driven distributionally robust optimization using the wasserstein metric: Performance guarantees and tractable reformulations Spatial deep learning for wireless scheduling Power allocation in multiuser cellular networks: Deep reinforcement learning approaches Reconfigurable intelligent surface assisted multiuser MISO systems exploiting deep reinforcement learning Learning to continuously optimize wireless resource in episodically dynamic environment Massive CSI acquisition for dense Cloud-RANs with spatial-temporal dynamics Compressed CSI acquisition in FDD massive MIMO: How much training is needed? 
Computation over MAC: Achievable function rate maximization in wireless networks Channel estimation for intelligent reflecting surface assisted multiuser communications: Framework, algorithms, and analysis Matrix-calibration-based cascaded channel estimation for reconfigurable intelligent surface assisted multiuser MIMO Two-timescale channel estimation for reconfigurable intelligent surface aided wireless communications Deep residual learning for channel estimation in intelligent reflecting surface-assisted multi-user communications Cascaded channel estimation for large intelligent metasurface assisted massive MIMO Neural calibration for scalable beamforming in FDD Massive MIMO with implicit channel estimation Network slicing and softwarization: A survey on principles, enabling technologies, and solutions Multiobjective signal processing optimization: The way to balance conflicting metrics in 5g systems Deep learning for intelligent wireless networks: A comprehensive survey Applications of deep reinforcement learning in communications and networking: A survey Privacy protection -privacy guidelines for smart cities Cyber security for consumer Internet of Things: Baseline requirements 3rd generation partnership project; technical specification group services and system aspects; system architecture for the 5G system An introduction to analog and digital communication Mosek A graph placement methodology for fast chip design Quo vadis, metasurfaces Distributed federated learning for ultra-reliable low-latency vehicular communications Distributed multiagent meta learning for trajectory design in wireless drone networks Edge computing for autonomous driving: Opportunities and challenges Visual odometry Past, present, and future of simultaneous localization and mapping: Toward the robust-perception age Simultaneous localization and mapping: A survey of current trends in autonomous driving The mathematical theory of communication Industrial Internet of Things: Challenges, opportunities, and directions Digital twin in industry: State-of-the-art A methodology for digital twin modeling and deployment for Industry 4.0 Blockchain and federated learning for privacy-preserved data sharing in industrial IoT Predicting the future-big data, machine learning, and clinical medicine A review of deep learning in medical imaging: Imaging traits, technology trends, case studies with progress highlights, and future promises Security and privacy for the internet of medical things enabled healthcare systems: A survey Precision medicine in the era of artificial intelligence: implications in chronic disease management Secure, privacy-preserving and federated machine learning in medical imaging Split learning for health: Distributed deep learning without sharing raw patient data Guidelines for reinforcement learning in healthcare See, feel, act: Hierarchical learning for complex manipulation skills with multisensory fusion Learning the signatures of the human grasp using a scalable tactile glove 5G-enabled tactile internet Achieving low-latency humanto-machine (H2M) applications: An understanding of H2M traffic for AI-Facilitated bandwidth allocation A comprehensive survey of the tactile internet: State-of-the-art and research directions Distributed optimization for energy-efficient fog computing in the tactile internet Deep learning aided grant-free NOMA toward reliable low-latency access in tactile Internet of Things 03) is an internationally recognized leader in wireless communications and networks with research 
interest in artificial intelligence, big data analytics systems, mobile cloud and edge computing, the Tactile Internet, and 5G systems and beyond. In these areas, he has published over 720 papers along with 15 patents, including 11 US inventions. He is a Member of the United States National Academy of Engineering, a Fellow of the IEEE, a Fellow of the Hong Kong Institution of Engineers, and a Member of the Hong Kong Academy of Engineering Sciences. He is also recognized by Thomson Reuters as an ISI Highly Cited Researcher and was listed among the 2020 top 30 of AI. His other recognitions include the 2017 IEEE Cognitive Networks Technical Committee Publication Award, the 2016 IEEE Signal Processing Society Young Author Best Paper Award, the 2016 IEEE Marconi Prize Paper Award in Wireless Communications, and the 2011 IEEE Wireless Communications Technical Committee Recognition Award. He is the founding Editor-in-Chief of the prestigious IEEE Transactions on Wireless Communications and has been involved in organizing many flagship international conferences. Since 1993, he has been with the Hong Kong University of Science & Technology (HKUST), where he is currently the New Bright Professor of Engineering. Letaief is well recognized for his dedicated service to professional societies and to the IEEE, where he has served in many leadership positions, including IEEE Communications Society Vice-President for Conferences, elected member of the IEEE Product Services and Publications Board, and IEEE Communications Society Vice-President for Technical Activities. He also served as President of the

His research areas include optimization, statistics, machine learning, wireless communications, and their applications to 6G, IoT, and edge AI. He was a recipient of the 2016 IEEE Marconi Prize Paper Award in Wireless Communications, the 2016 Young Author Best Paper Award of the IEEE Signal Processing Society, and the 2021 IEEE ComSoc Asia-Pacific Outstanding Young Researcher Award.

He has received more than 50 patents from his research. He was deeply involved in 3GPP2 (EVDO/UMB), WiMAX/802.16m, and 3GPP (LTE/NR) standardization and contributed several key technologies such as the flexible radio frame structure, radio resource management, and MIMO. His current research interest is in the area of signal processing, protocols, and networking for next-generation wireless communications.

He is now a vice president of the National Natural Science Foundation of China. His research interests include broadband wireless communications, multimedia signal processing, and satellite communications. He has authored/co-authored over 300 refereed technical papers published in internationally renowned journals and conferences and holds over 80 Chinese invention patents.