A Theory on AI Uncertainty Based on Rademacher Complexity and Shannon Entropy

Mingyong Zhou

November 19, 2020

Abstract

In this paper, we present a theoretical discussion of uncertainty in AI deep learning neural networks, based on the classical Rademacher complexity and Shannon entropy. First, it is shown that the classical Rademacher complexity and Shannon entropy are closely related quantities by their very definitions. Secondly, based on Shannon's mathematical theory of communication [3], we derive a criterion that ensures AI correctness and accuracy in classification problems. Last but not least, based on Peter Bartlett's work [1], we show both a relaxed condition and a stricter condition that guarantee correctness and accuracy in AI classification. By elucidating the criterion in terms of Shannon entropy, it becomes easier to explore analogous criteria in terms of other complexity measures, such as the Vapnik-Chervonenkis (VC) entropy and the Gaussian complexity, by taking advantage of the relationships studied in other references. A close-to-0.5 criterion on Shannon entropy is derived in this paper for the theoretical investigation of AI accuracy and correctness in classification problems.

1. Introduction

The investigation of AI uncertainty becomes more and more important as AI deep learning neural networks are applied successfully in more and more areas, including imaging diagnosis and pattern classification. Although AI has achieved great success in applications across many industries, the theoretical investigation of that success has received far less attention than the applications themselves. A theory of how and why AI is so successful is nevertheless essential for understanding the mechanism behind deep learning neural networks, so that in the future AI can be better understood and applied with theoretical guidance in industrial applications. In this paper, based on Shannon entropy and Shannon theory [3], we provide a different perspective and derive a criterion for AI uncertainty in classification problems. From this perspective, we are able to build a theory of AI uncertainty on Shannon's mathematical theory of communication. First, the classical Rademacher complexity and Shannon entropy are compared as quantities by their definitions, and it is shown that they are closely related to each other. Secondly, based on Shannon's theory, we derive a criterion that ensures AI correctness and accuracy in classification problems by comparing an AI model to a communication channel and defining AI quantities analogous to those of communication channels. Last but not least, based on Peter Bartlett's work in [1], we show both a relaxed condition and a much stricter condition that guarantee correctness and accuracy in AI classification. By elucidating the criterion in terms of Shannon entropy, it becomes easier to explore analogous criteria in terms of other complexity measures, such as the VC entropy and the Gaussian complexity, by taking advantage of the relationships studied in other references. A 1/2 criterion on Shannon entropy is derived for the theoretical investigation of AI accuracy and correctness in classification problems.

In Section 2, starting with Shannon's definitions of channel capacity and transmission rate and his main result, we derive a basic criterion for misclassification by observing a quantitative relation between Rademacher complexity and Shannon entropy. In Section 3, a discussion is devoted to the effects of the error upper and lower bounds on this criterion, based on reference [1]; both a relaxed condition and a much stricter condition are derived from the AI error upper and lower bounds. In Section 4, directions for future work are indicated. The paper is concluded with Section 5, in which the major results are summarized.
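Before proceeding, the two central quantities can be made concrete numerically. The sketch below is a minimal illustration, not part of the paper's derivation: it estimates the empirical Rademacher complexity of a small finite class of +/-1-valued predictors by Monte Carlo sampling of the Rademacher signs, and evaluates the binary Shannon entropy of an error probability. The function class, sample size, and trial count are all illustrative assumptions.

    import numpy as np

    def binary_entropy(p):
        """Binary Shannon entropy H(p) in bits for error probability p."""
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return -p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p)

    def empirical_rademacher(outputs, n_trials=10000, seed=0):
        """Monte Carlo estimate of the empirical Rademacher complexity
        R(F) = E_sigma[ sup_{f in F} (1/n) sum_i sigma_i f(x_i) ] for a
        finite class; `outputs` is a (|F|, n) array of values f(x_i)."""
        rng = np.random.default_rng(seed)
        n = outputs.shape[1]
        total = 0.0
        for _ in range(n_trials):
            sigma = rng.choice([-1.0, 1.0], size=n)  # random signs
            total += np.max(outputs @ sigma) / n     # sup over the class
        return total / n_trials

    # Illustrative finite class of 16 random +/-1 predictors on n = 8 points.
    rng = np.random.default_rng(1)
    F = rng.choice([-1.0, 1.0], size=(16, 8))
    print("empirical Rademacher complexity:", empirical_rademacher(F))
    print("binary entropy at p = 0.11:", binary_entropy(0.11))  # ~ 0.50

Note that a class rich enough to match every sign pattern drives the estimate to its maximum of 1.0, while the binary entropy crosses the 1/2 threshold used below at an error probability of roughly 0.11.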
2. Shannon Theory and a Basic Criterion for AI Misclassification

In this section, we first review the main results of Shannon theory, recalling Shannon's definitions of channel capacity and information rate. We then present an AI model for pattern classification problems and show how this model can be compared to a communication channel in terms of Shannon entropy; the AI model and the communication channel are closely analogous in principle. Further, we elucidate a quantitative relation between the classical Rademacher complexity and Shannon entropy from their definitions, before deriving a basic close-to-1/2 criterion for AI misclassification.

Shannon's main result for a noisy channel can be stated as

    R ≤ C,    (2.1)

where C is the channel capacity defined in Shannon theory and R is the channel transmission rate. Equation (2.1) states that the transmission rate R must not exceed the channel capacity C if the original information is to be recovered to any desired accuracy [3]. Now, with an AI model for classification problems, one faces the following problem: given an AI deep learning neural network whose error probability lies within the range (2.2), and a complex classification problem with a certain Rademacher complexity, find a criterion by which the patterns can be classified correctly to any desired accuracy.

Obviously, the rate R in Shannon theory is closely related to the Rademacher complexity of a given complex problem. To see this, one only needs to recall the definition of the classical Rademacher complexity R(F) in equation (2.3),

    R(F) = E_σ[ sup_{f∈F} (1/n) Σ_{i=1}^{n} σ_i f(x_i) ],    (2.3)

where σ_1, ..., σ_n are independent random signs taking the values +1 and -1 with equal probability. The quantity R(F) reaches its maximum of 1.0 in the case that, for every realization of the signs, some f ∈ F matches them exactly on the sample, i.e., f(x_i) = σ_i for all i.

With the above observations on the three quantities, we now derive a basic criterion for AI misclassification. By Shannon's theory of the noisy channel with conditional entropy Hy(x) (the equivocation), ensuring transmission with no errors, starting from equation (2.1), requires that the following condition (2.4) be satisfied:

    either max Hy(x) < 1/2 or R(F) < 1/2.    (2.4)

3. Effects of the Error Bounds on the Criterion

The analysis is performed in two parts, A) and B), in which a relaxed condition and a much stricter condition are derived from the AI error upper and lower bounds, based on reference [1].
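As a minimal sketch of how criterion (2.4) and the stricter condition of this section might be checked in practice, the snippet below assumes that the model's equivocation max Hy(x) is approximated by the binary entropy of an estimated test error probability, and that the problem's Rademacher complexity has been estimated separately (e.g., as in the earlier Monte Carlo sketch). The function names and the decision logic are illustrative assumptions, not constructs defined in the paper.

    import math

    def binary_entropy(p: float) -> float:
        """Binary Shannon entropy H(p) in bits; used here as a stand-in
        for the paper's max Hy(x) under a symmetric error model."""
        if p <= 0.0 or p >= 1.0:
            return 0.0
        return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

    def basic_criterion(p_error: float, rademacher: float) -> bool:
        """Criterion (2.4): reliable classification is expected when
        either max Hy(x) < 1/2 or R(F) < 1/2."""
        return binary_entropy(p_error) < 0.5 or rademacher < 0.5

    def strict_criterion(p_error: float, rademacher: float) -> bool:
        """Stricter condition for hard problems: when R(F) > 1/2 is
        observed, only max Hy(x) < 1/2 certifies reliability, which in
        practice means training well (e.g., keeping the weights small
        [1]) and testing until the error entropy drops below 1/2."""
        if rademacher > 0.5:
            return binary_entropy(p_error) < 0.5
        return True

    # Illustrative check: ~8% test error on a problem with R(F) ≈ 0.7.
    p, r = 0.08, 0.7
    print(basic_criterion(p, r), strict_criterion(p, r))  # True True
    # binary_entropy(0.08) ≈ 0.40 < 0.5, so both conditions hold.

Under this reading, the pragmatic approach summarized in Section 5 amounts to forcing the strict branch: assume R(F) > 1/2 for realistic problems and drive the measured error entropy below 1/2 by training and testing.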
4. Future Work

Based on the results derived in this paper, the approach can be extended to other complexity measures, such as the Vapnik-Chervonenkis (VC) entropy and the Gaussian complexity, by invoking the results in [2]. Criteria based on other measures are possible if one looks into quantitative relationships such as those studied in [2].

5. Conclusions

We have shown basic quantitative relationships between the classical Rademacher complexity and Shannon entropy. First, the classical Rademacher complexity was outlined and its quantitative relation to Shannon entropy was elucidated. Secondly, by comparing an AI model to a communication channel, we were able to apply Shannon theory and Shannon entropy to investigate AI uncertainty. A 1/2 criterion was derived from Shannon's theory of noisy communication channels: either the Shannon entropy of the AI model's error lower bound is less than 1/2, or the Rademacher complexity of the given problem is less than 1/2. The theoretical results in this paper can provide guidance for AI applications and model selection, ensuring that AI outcomes are unambiguous.

The main results are summarized as follows:

(1) The classical Rademacher complexity is almost equivalent to Shannon entropy in quantity. In fact, the three quantities (Rademacher complexity, Shannon conditional entropy Hy(x), and rate R) are closely correlated.

(2) If the AI capacity C is defined as in Shannon theory, either max Hy(x) < 1/2 or R(F) < 1/2 ensures that the AI model can restore the original information with any desired small error. (In fact, if R(F) > 1/2 is observed in real applications, max Hy(x) < 1/2 alone is required for this purpose, which is a much stricter condition.)

A pragmatic approach is therefore proposed for AI deep learning neural networks: assume the condition R(F) > 1/2 for real application problems, train the AI well (in the sense, e.g., of keeping the size of the weights small enough) on large, distinctive samples [1], and test the error probability to ensure max Hy(x) < 1/2, so that in applications the AI can attain the desired performance with arbitrarily small errors according to Shannon theory.

References

[1] P. L. Bartlett, "The Sample Complexity of Pattern Classification with Neural Networks: The Size of the Weights is More Important than the Size of the Network."
[2] D. Anguita, A. Ghio, L. Oneto, and S. Ridella, "A Deep Connection Between the Vapnik-Chervonenkis Entropy and the Rademacher Complexity."
[3] C. E. Shannon, "A Mathematical Theory of Communication."
[4] F. Scarselli, A. C. Tsoi, and M. Hagenbuchner, "The VC Dimension of Graph and Recursive Neural Networks."
[5] Z. Wu, S. Pan, F. Chen, G. Long, C. Zhang, and P. S. Yu, "A Comprehensive Survey on Graph Neural Networks."