key: cord-1003010-bxr8r37d
authors: Lu, Chenguang
title: Channels’ Confirmation and Predictions’ Confirmation: From the Medical Test to the Raven Paradox
date: 2020-03-26
journal: Entropy (Basel)
DOI: 10.3390/e22040384
sha: 6abc1087f9ed8c4ec6c47bc331f3cd8a08ac63cc
doc_id: 1003010
cord_uid: bxr8r37d

After long arguments between positivism and falsificationism, the verification of universal hypotheses was replaced with the confirmation of uncertain major premises. Unfortunately, Hempel proposed the Raven Paradox. Then, Carnap used the increment of logical probability as the confirmation measure. Since then, many confirmation measures have been proposed. Among them, measure F, proposed by Kemeny and Oppenheim, possesses the symmetries and asymmetries proposed by Eells and Fitelson, the monotonicity proposed by Greco et al., and the normalizing property suggested by many researchers. Based on the semantic information theory, a measure b* similar to F is derived from the medical test. Like the likelihood ratio, measures b* and F can only indicate the quality of channels or testing means, not the quality of probability predictions. Furthermore, it is still not easy to use b*, F, or any other measure to clarify the Raven Paradox. For this reason, a measure c* similar to the correct rate is derived. Measure c* supports the Nicod Criterion and undermines the Equivalence Condition, and hence it can be used to eliminate the Raven Paradox. An example indicates that measures F and b* are helpful for diagnosing infection with the Novel Coronavirus, whereas most popular confirmation measures are not. Another example reveals that none of the popular confirmation measures can explain why a black raven confirms “Ravens are black” more strongly than a piece of chalk does. Measures F, b*, and c* indicate that having fewer counterexamples is more important than having more positive examples, and hence they are compatible with Popper’s falsification thought.

A universal judgment is equivalent to a hypothetical judgment or a rule. For example, "All ravens are black" is equivalent to "For every x, if x is a raven, then x is black". Both can be used as the major premise of a syllogism. Deductive logic needs major premises; however, some major premises for empirical reasoning must be supported by inductive logic. Logical empiricism held that a universal judgment can ultimately be verified by sense data. Against logical empiricism, Popper argued that a universal judgment can only be falsified, never verified. However, consider a universal or hypothetical judgment that is not strict, and is therefore uncertain, such as "Almost all ravens are black", "Ravens are black", or "If a man's Coronavirus test is positive, then he is very possibly infected". For such a judgment, we cannot say that one counterexample falsifies it. After long arguments, Popper and most logical empiricists reached the same conclusion [1, 2]: we may use evidence to confirm universal judgments or major premises that are not strict or are uncertain.

In 1945, Hempel [3] proposed the confirmation paradox, or the Raven Paradox. According to the Equivalence Condition in classical logic, "If x is a raven, then x is black" (Rule I) is equivalent to its contrapositive, "If x is not black, then x is not a raven".

The main purposes of this paper are:

• to distinguish between channel confirmation measures, which are compatible with the likelihood ratio, and prediction confirmation measures, which can be used to assess probability predictions;
• to use a prediction confirmation measure c* to eliminate the Raven Paradox; and
• to explain that confirmation and falsification may be compatible.

The confirmation methods in this paper are different from popular methods, since:

• Measures b* and c* are derived using the semantic information method [17, 20] and the maximum likelihood criterion rather than being defined directly.

Confirmation and statistical learning support each other, so that confirmation measures can be used not only to assess major premises but also to make probability predictions.

The main contributions of this paper are:

• It clarifies that we cannot use one confirmation measure for two different tasks: (1) to assess (communication) channels, such as medical tests as testing means, and (2) to assess probability predictions, such as "Ravens are black".
• It provides a measure c* that manifests the Nicod criterion and hence provides a new method to clarify the Raven Paradox.

The rest of this paper is organized as follows. Section 2 provides background knowledge: it reviews existing confirmation measures, introduces the related semantic information method, and clarifies some questions about confirmation. Section 3 derives the new confirmation measures b* and c*, using the medical test as an example, and provides many confirmation formulas for major premises with different antecedents and consequents. Section 4 presents results. It gives cases that show the characteristics of the new confirmation measures. It compares various confirmation measures by applying them to the diagnosis of COVID-19. It also shows how an added example affects the degree of confirmation under different confirmation measures. Section 5 discusses why the Raven Paradox can be eliminated only by measure c*. It also discusses some conceptual confusions and explains how the new confirmation measures are compatible with Popper's falsification thought. Section 5 ends with conclusions.

First, we distinguish between logical probability and statistical probability. The logical probability of a hypothesis (or a label) is the probability that the hypothesis is judged to be true, whereas its statistical probability is the probability that the hypothesis or label is selected.

Suppose that ten thousand people go through a door. For every person, denoted by x, entrance guards judge whether x is elderly. If two thousand people are judged to be elderly, then the logical probability of the predicate "x is elderly" is 2000/10,000 = 0.2. Suppose instead that the task of the entrance guards is to select one label for every person from four labels: "Child", "Youth", "Adult", and "Elderly". Then there may be only one thousand people who are labeled "Elderly". The statistical probability of "Elderly" is then 1000/10,000 = 0.1. Why are two thousand people not labeled "Elderly"? The reason is that some elderly people are labeled "Adult". A person may make two labels true; for example, a 65-year-old person makes both "Adult" and "Elderly" true. That is why the logical probability of a label is often greater than its statistical probability. An extreme example is a tautology, such as "x is elderly or not elderly". Its logical probability is 1, whereas its statistical probability is generally almost 0 because a tautology is rarely selected. Statistical probability is normalized (the probabilities sum to 1), whereas logical probability is generally not normalized [17]. Therefore, we use two different symbols, "P" and "T", to denote statistical probability and logical probability, respectively.
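The door example can be reproduced in miniature. The sketch below is an illustration only: the ten ages, the crisp truth conditions, and the single-label selection rule are all hypothetical, not data from the paper. It shows that logical probabilities can sum to more than 1 while statistical probabilities sum to exactly 1:

```python
# Logical vs statistical probability on a toy population (all values hypothetical).
ages = [10, 20, 40, 62, 65, 70, 80, 30, 50, 66]

# Assumed crisp truth conditions: one person may make several labels true.
truth = {
    "Child": lambda x: x < 18,
    "Youth": lambda x: 15 <= x <= 35,
    "Adult": lambda x: x >= 18,
    "Elderly": lambda x: x >= 60,
}


def select_label(x):
    """Hypothetical guard who selects exactly one label per person."""
    if x < 18:
        return "Child"
    if x < 36:
        return "Youth"
    if x < 70:
        return "Adult"
    return "Elderly"


n = len(ages)
# Logical probability: fraction of people for whom the label is TRUE.
logical = {y: sum(f(x) for x in ages) / n for y, f in truth.items()}
# Statistical probability: fraction of people to whom the label is SELECTED.
statistical = {y: sum(select_label(x) == y for x in ages) / n for y in truth}

print(logical["Elderly"], statistical["Elderly"])  # 0.5 0.2
print(round(sum(logical.values()), 10))            # 1.7 (not normalized)
print(round(sum(statistical.values()), 10))        # 1.0 (normalized)
```

As in the text, some elderly people receive the label "Adult", which is why the logical probability of "Elderly" exceeds its statistical probability.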

We now consider the Shannon channel [21] between human ages and labels "Child", "Adult", "Youth", "Middle age", "Elderly", and the like. Let X be a random variable to denote an age and Y be a random variable to denote a label. X takes a value x∈{ages}; Y takes a value y∈{"Child", "Adult", "Youth", "Middle age", "Elderly", . . . }. Shannon calls the prior probability distribution P(X) (or P(x)) the source, and calls P(Y) the destination. There is a Shannon channel P(Y|X) from X to Y. It is a transition probability matrix:

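$$
P(Y|X) \Leftrightarrow
\begin{bmatrix}
P(y_1|x_1) & P(y_1|x_2) & \cdots & P(y_1|x_m)\\
P(y_2|x_1) & P(y_2|x_2) & \cdots & P(y_2|x_m)\\
\vdots & \vdots & \ddots & \vdots\\
P(y_n|x_1) & P(y_n|x_2) & \cdots & P(y_n|x_m)
\end{bmatrix}
\Leftrightarrow
\begin{bmatrix}
P(y_1|x)\\
P(y_2|x)\\
\vdots\\
P(y_n|x)
\end{bmatrix}
$$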

where ⇔ indicates equivalence. This matrix consists of a group of conditional probabilities P(yj|xi) (j = 0, 1, …, n; i = 0, 1, …, m), or a group of transition probability functions (so called by Shannon [21]) P(yj|x) (j = 0, 1, …, n), where yj is a constant and x is a variable. There is also a semantic channel, which consists of a group of truth functions. Let T(θj|x) be the truth function of yj, where θj is a model or a set of model parameters by which we construct T(θj|x). θj can also be interpreted as a fuzzy subset of the domain of x [17]. For example, let yj = "x is young". Its truth function may be

where 20 and 25 are model parameters. For y k = "x is elderly", its truth function may be a logistic function:

where 0.2 and 65 are model parameters. The two truth functions are shown in Figure 1.
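The logistic truth function of "x is elderly" did not survive extraction. This sketch assumes the common form T(θk|x) = 1/{1 + exp[−0.2(x − 65)]}. That form uses the two stated parameters 0.2 and 65, but it is our reconstruction, not the paper's printed equation. The sketch shows how such a truth function grades ages:

```python
import math


def t_elderly(x, k=0.2, x0=65.0):
    """Assumed logistic truth function of "x is elderly": 1 / (1 + exp(-k (x - x0)))."""
    return 1.0 / (1.0 + math.exp(-k * (x - x0)))


for age in (40, 60, 65, 70, 90):
    # Truth value rises smoothly from near 0 to near 1; it is exactly 0.5 at x0 = 65.
    print(age, round(t_elderly(age), 3))
```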



According to Tarski's truth theory [22] and Davidson's truth-conditional semantics [23], a truth function can represent the semantic meaning of a hypothesis. Therefore, we call the matrix that consists of a group of truth functions a semantic channel:

Using a transition probability function P(yj|x), we can make the probability prediction P(x|yj) by P(x|yj) = P(x)P(yj|x)/P(yj), which is the classical Bayes' formula. Using a truth function T(θj|x), we can also make a probability prediction, or produce a likelihood function, by

$$P(x|\theta_j) = \frac{P(x)\,T(\theta_j|x)}{T(\theta_j)}, \qquad (6)$$

where T(θj) is the logical probability of yj. There is

$$T(\theta_j) = \sum_i P(x_i)\,T(\theta_j|x_i).$$

Equation (6) is called the semantic Bayes' formula [17]. The likelihood function is subjective; it may be regarded as a hybrid of logical probability and statistical probability.

When the source P(x) changes, the above formulas for predictions still work. It is easy to prove that P(x|θj) = P(x|yj) if T(θj|x) ∝ P(yj|x). Since the maximum of T(θj|x) is 1, letting P(x|θj) = P(x|yj), we obtain the optimized truth function [17]:

$$T^*(\theta_j|x) = \frac{P(y_j|x)}{\max[P(y_j|x)]} = \frac{P(x|y_j)/P(x)}{\max[P(x|y_j)/P(x)]},$$

where x is a variable and max(.) is the maximum of the function in brackets (.).
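The semantic Bayes' formula (6) and the optimized truth function T*(θj|x) = P(yj|x)/max[P(yj|x)] can be checked numerically. In this minimal Python sketch, the four-point source P(x) and the transition probability function P(yj|x) are hypothetical numbers, and the helper names are ours. The check confirms that T*(θj|x) ∝ P(yj|x) makes P(x|θj) coincide with the classical P(x|yj). It also confirms that this still holds after the source changes:

```python
import numpy as np


def logical_probability(prior, truth):
    """T(theta_j) = sum_i P(x_i) T(theta_j|x_i)."""
    return float(np.dot(prior, truth))


def semantic_bayes(prior, truth):
    """Likelihood function P(x|theta_j) = P(x) T(theta_j|x) / T(theta_j)."""
    return prior * truth / logical_probability(prior, truth)


def classical_bayes(prior, channel_row):
    """P(x|y_j) = P(x) P(y_j|x) / P(y_j)."""
    return prior * channel_row / float(np.dot(prior, channel_row))


def optimized_truth(channel_row):
    """T*(theta_j|x) = P(y_j|x) / max_x P(y_j|x): proportional to P(y_j|x), maximum 1."""
    return channel_row / channel_row.max()


p_y_given_x = np.array([0.1, 0.3, 0.6, 0.9])   # hypothetical P(y_j|x) over four x values
t_star = optimized_truth(p_y_given_x)

for prior in (np.array([0.3, 0.4, 0.2, 0.1]), np.array([0.1, 0.1, 0.2, 0.6])):
    same = np.allclose(semantic_bayes(prior, t_star), classical_bayes(prior, p_y_given_x))
    print(same)  # True for both sources: the truth function carries over to a new P(x)
```

The logical probability T(θj) computed from T* is never smaller than P(yj), because T* only rescales P(yj|x) upward so that its maximum is 1.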

We use h1 to denote a hypothesis, h0 to denote its negation, and h to denote either of them. We use e1 to denote another hypothesis that serves as the evidence for h1, e0 to denote its negation, and e to denote either of them. We use c(e, h) to represent a confirmation measure, which means the degree of inductive support. Note that c(e, h) is used here as in [8], with e on the left and h on the right.

In existing studies of confirmation, logical probability and statistical probability are not clearly distinguished. We therefore still use P for both when introducing popular confirmation measures.

The popular confirmation measures include:

• D(e1, h1) = P(h1|e1) − P(h1) (Carnap, 1962 [1]),
• M(e1, h1) = P(e1|h1) − P(e1) (Mortimer, 1988 [5]),
• R(e1, h1) = log[P(h1|e1)/P(h1)] (Horwich, 1982 [6]),
• C(e1, h1) = P(h1, e1) − P(e1)P(h1) (Carnap, 1962 [1]),
• Z(e1, h1) = [P(h1|e1) − P(h1)]/[1 − P(h1)] if P(h1|e1) ≥ P(h1), and [P(h1|e1) − P(h1)]/P(h1) otherwise (Shortliffe and Buchanan, 1975 [7]; Crupi et al., 2007 [8]),
• S(e1, h1) = P(h1|e1) − P(h1|e0) (Christensen, 1999 [9]),
• N(e1, h1) = P(e1|h1) − P(e1|h0) (Nozick, 1981 [10]),
• L(e1, h1) = log[P(e1|h1)/P(e1|h0)] (Good, 1984 [11]), and
• F(e1, h1) = [P(e1|h1) − P(e1|h0)]/[P(e1|h1) + P(e1|h0)] (Kemeny and Oppenheim, 1952 [12]).
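To compare the measures listed above on concrete numbers, the following Python sketch estimates every probability by sample frequency. It uses the counts a, b, c, and d defined later in this section (Table 1). Treating all probabilities as frequencies is an assumption made only for illustration, since authors differ on whether P is logical or statistical. The counts are hypothetical and must all be nonzero:

```python
import math


def measures(a, b, c, d):
    """Popular confirmation measures from the four example counts.

    a = #(e1, h1), b = #(e0, h1), c = #(e1, h0), d = #(e0, h0).
    Every probability is estimated as a sample frequency (illustrative assumption).
    """
    n = a + b + c + d
    p_h1 = (a + b) / n
    p_e1 = (a + c) / n
    p_h1_e1 = a / (a + c)          # P(h1|e1)
    p_h1_e0 = b / (b + d)          # P(h1|e0)
    p_e1_h1 = a / (a + b)          # P(e1|h1)
    p_e1_h0 = c / (c + d)          # P(e1|h0)
    diff = p_h1_e1 - p_h1
    z = diff / (1 - p_h1) if diff >= 0 else diff / p_h1
    return {
        "D": diff,
        "M": p_e1_h1 - p_e1,
        "R": math.log(p_h1_e1 / p_h1),
        "C": a / n - p_e1 * p_h1,
        "Z": z,
        "S": p_h1_e1 - p_h1_e0,
        "N": p_e1_h1 - p_e1_h0,
        "L": math.log(p_e1_h1 / p_e1_h0),
        "F": (p_e1_h1 - p_e1_h0) / (p_e1_h1 + p_e1_h0),
    }


# Hypothetical counts: 20 positive examples and 5 counterexamples of e1 -> h1.
for name, value in measures(a=20, b=10, c=5, d=65).items():
    print(f"{name}: {value:+.3f}")
```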

The two measures D and C proposed by Carnap are for incremental confirmation and absolute confirmation, respectively. More confirmation measures can be found in [8, 24]. Measure F is also denoted by l* [13], L [8], or k [24]. Most authors explain that the probabilities they use, such as P(h1) and P(h1|e1) in D, R, and C, are logical probabilities. Some authors explain that the probabilities they use, such as P(e1|h1) in F, are statistical probabilities.

First, we need to clarify that confirmation assesses what kind of evidence supports what kind of hypothesis. Consider the following three hypotheses:

• Hypothesis 1: h1(x) = "x is elderly", where x is a variable for an age and h1(x) is a predicate. An instance x = 70 may be the evidence, and the truth value T(θ1|70) of proposition h1(70) should be 1. If x = 50, the (uncertain) truth value should be smaller, such as 0.5. If e1 = "x ≥ 60", then a true e1 may also be evidence that supports h1, so that T(θ1|e1) > T(θ1).

• Hypothesis 2: h1(x) = "If age x ≥ 60, then x is elderly", which is a hypothetical judgment, a major premise, or a rule. Note that x = 70 or x ≥ 60 is only evidence for the consequent "x is elderly", not evidence for the rule. The rule's evidence should be a sample with many examples.
• Hypothesis 3: e1→h1 = "If age x ≥ 60, then x is elderly", which is the same as Hypothesis 2. The difference is that e1 = "x ≥ 60" and h1 = "x is elderly". The evidence is a sample with many examples, such as {(e1, h1), (e1, h0), …}, or a sampling distribution P(e, h), where P means statistical probability.

Hypothesis 1 has an (uncertain) truth function, or a conditional logical probability function, with values between 0 and 1. That function is determined by our definition or usage. Hypothesis 1 need not be confirmed. Hypothesis 2 or Hypothesis 3 is what we need to confirm. The degree of confirmation lies between −1 and 1.

There exist two different understandings about c(e, h):

• Understanding 1: h is the major premise to be confirmed, and e is the evidence that supports h; h and e are used in this way by Eells and Fitelson [14].
• Understanding 2: e and h are those in rule e→h, as used by Kemeny and Oppenheim [12]. Here e is only the evidence that supports the consequent h, not the major premise e→h (see Section 2.3 for further analysis).

Fortunately, although researchers understand c(e, h) in different ways, most agree on a common evidence base. They use a sample including four types of examples, (e1, h1), (e0, h1), (e1, h0), and (e0, h0), as the evidence to confirm a rule. They use the numbers of these four types, a, b, c, and d (see Table 1), to construct confirmation measures. The following statements are based on this common view. a is the number of examples (e1, h1). For example, let e1 = "raven" ("raven" is a label, or an abbreviation of "x is a raven") and h1 = "black"; then a is the number of black ravens. Similarly, b is the number of black non-raven things, c is the number of non-black ravens, and d is the number of non-black, non-raven things.

To make the confirmation task clearer, we follow Understanding 2, treat e→h = "if e then h" as the rule to be confirmed, and replace c(e, h) with c(e→h). Researching confirmation thus means constructing or selecting a function c(e→h) = f(a, b, c, d).

To screen reasonable confirmation measures, Eells and Fitelson [14] propose the following symmetries, written here in the notation c(e, h), with −e and −h denoting negations:
• Hypothesis Symmetry (HS): c(e, h) = −c(e, −h);
• Evidence Symmetry (ES): c(e, h) = −c(−e, h);
• Commutativity Symmetry (CS): c(e, h) = c(h, e);
• Total Symmetry (TS): c(e, h) = c(−e, −h).
They conclude that only HS is desirable; the other three symmetries are not. We call this conclusion the symmetry/asymmetry requirement. Their conclusion is supported by most researchers. Since TS is the combination of HS and ES, we only need to check HS, ES, and CS. Among the measures mentioned above, only L, F, and Z pass this requirement. It is uncertain whether N can be ruled out by this requirement [15]. See [14, 25, 26] for more discussion of the symmetry/asymmetry requirement.

Greco et al. [15] propose monotonicity as a desirable property: if f(a, b, c, d) does not decrease with a or d and does not increase with b or c, then f(a, b, c, d) has monotonicity.

Measures L, F, and Z have this monotonicity, whereas measures D, M, and N do not. If we further require that c(e→h) be normalized (between −1 and 1) [8, 12], then only F and Z remain. Other properties have also been discussed [15, 19]. One is logicality: c(e→h) = 1 when there is no counterexample, and c(e→h) = −1 when there is no positive example. F and Z also satisfy the logicality requirement.
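The screening properties can be tested mechanically on F. In the sketch below, the counts are hypothetical and the helper is ours. Each symmetry corresponds to a swap of counts:

- Swapping h1 and h0 exchanges a with c and b with d.
- Swapping e1 and e0 exchanges a with b and c with d.
- Swapping e with h exchanges b with c.

The checks illustrate that F satisfies HS but not ES or CS, is monotonic in the sense of Greco et al., and satisfies logicality:

```python
from itertools import product


def f_measure(a, b, c, d):
    """F(e1 -> h1) from counts: a=(e1,h1), b=(e0,h1), c=(e1,h0), d=(e0,h0)."""
    p1 = a / (a + b)   # P(e1|h1)
    p0 = c / (c + d)   # P(e1|h0)
    return (p1 - p0) / (p1 + p0)


a, b, c, d = 20, 10, 5, 65
f = f_measure(a, b, c, d)

# HS: swapping h1 <-> h0 gives F(e1 -> h0), which equals -F(e1 -> h1).
print(abs(f_measure(c, d, a, b) + f) < 1e-12)   # True
# ES: swapping e1 <-> e0 gives F(e0 -> h1), which is not -F in general.
print(abs(f_measure(b, a, d, c) + f) < 1e-12)   # False
# CS: swapping e <-> h gives F(h1 -> e1), which is not F in general.
print(abs(f_measure(a, c, b, d) - f) < 1e-12)   # False

# Monotonicity: F never decreases as a or d grows and never increases as b or c grows.
grid = range(1, 6)
monotone = all(
    f_measure(a + 1, b, c, d) >= f_measure(a, b, c, d)
    and f_measure(a, b, c, d + 1) >= f_measure(a, b, c, d)
    and f_measure(a, b + 1, c, d) <= f_measure(a, b, c, d)
    and f_measure(a, b, c + 1, d) <= f_measure(a, b, c, d)
    for a, b, c, d in product(grid, repeat=4)
)
print(monotone)  # True

# Logicality: F = 1 with no counterexample (c = 0), F = -1 with no positive example (a = 0).
print(f_measure(20, 10, 0, 65), f_measure(0, 10, 5, 65))  # 1.0 -1.0
```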

Consider a medical test, such as the test for COVID-19. Let e1 = "positive" (e.g., "x is positive", where x is a specimen), e0 = "negative", h1 = "infected" (e.g., "x is infected"), and h0 = "uninfected". Then the positive likelihood ratio is LR+ = P(e1|h1)/P(e1|h0), which indicates the reliability of the rule e1→h1. Measures L and F have a one-to-one correspondence with LR+:

$$L(e_1, h_1) = \log LR^+,$$

$$F(e_1, h_1) = \frac{LR^+ - 1}{LR^+ + 1}.$$

Hence, L and F can also be used to assess the reliability of a medical test. Compared with LR and L, F better indicates the distance between a test (any F) and the best test (F = 1) or the worst test (F = −1). However, LR can be used more conveniently for the probability predictions of diseases [27].
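The correspondence between LR+ and F is easy to verify. This sketch computes LR+, L = log LR+, and F from a sensitivity and a specificity. The values are hypothetical, not results from the paper. The output shows that F stays within [−1, 1] while LR+ and L are unbounded:

```python
import math


def channel_scores(sensitivity, specificity):
    """Return LR+, L = log LR+, and F for the rule e1 -> h1 of a medical test."""
    p_e1_h1 = sensitivity          # P(e1|h1)
    p_e1_h0 = 1.0 - specificity    # P(e1|h0)
    lr_pos = p_e1_h1 / p_e1_h0
    f = (p_e1_h1 - p_e1_h0) / (p_e1_h1 + p_e1_h0)
    return lr_pos, math.log(lr_pos), f


tests = [(0.5, 0.5), (0.7, 0.9), (0.9, 0.99), (0.99, 0.9999)]  # hypothetical tests
for sens, spec in tests:
    lr, l, f = channel_scores(sens, spec)
    # F computed from its definition agrees with (LR+ - 1) / (LR+ + 1).
    print(f"LR+={lr:10.1f}  L={l:6.2f}  F={f:.4f}  via LR+: {(lr - 1) / (lr + 1):.4f}")
```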

The evidence for the consequent of a syllogism is the minor premise, whereas the evidence for a major premise or a rule is a sample or a sampling distribution P(e, h). In some studies, e is used sometimes as the minor premise and sometimes as an example or a sample; h is used sometimes as a consequent and sometimes as a major premise. Researchers use c(e, h) or c(h, e) instead of c(e→h) because they need to avoid the contradiction between the two understandings. However, if we distinguish the two types of evidence, there is no problem in using c(e→h). We only need to emphasize that the evidence for a major premise is a sampling distribution P(e, h) rather than e.

If h is used as a major premise and e is used as the evidence (as in [14, 28]), −e (the negation of e) is puzzling because there are four types of examples instead of two. Suppose h = p→q and that e is one of (p, q), (p, −q), (−p, q), and (−p, −q). Suppose further that (p, −q) is the counterexample and the other three examples, (p, q), (−p, q), and (−p, −q), are positive examples that support p→q. Then (−p, q) and (−p, −q) should also support p→−q for the same reason. However, according to HS [14], it is unreasonable for the same evidence to support both p→q and p→−q. In addition, e is generally a sample with many examples, and the negation or the probability of a sample is also puzzling.

Fortunately, although many researchers say that e is the evidence for a major premise h, they also treat e as the antecedent and h as the consequent of a major premise. Only in this way can one calculate the probabilities or conditional probabilities of e and h for a confirmation measure. Why, then, should we replace c(e, h) with c(e→h) to make the task clearer? Section 5.3 will show that using h as a major premise results in a misunderstanding of the symmetry/asymmetry requirement.

Confirmation is often explained as assessing the impact of evidence on hypotheses, or the impact of the premise on the consequent of a rule [14, 19]. However, this paper takes a different view: confirmation assesses how well a sample or sampling distribution supports a major premise or a rule. Newly added examples may then have an impact on the rule (e.g., an increment of the degree of confirmation).

Since one can use one or several examples to calculate the degree of confirmation with a confirmation measure, many researchers call their confirmation incremental confirmation [14, 15]. Other researchers claim that we need absolute confirmation [29]. This paper supports absolute confirmation. The problem with incremental confirmation is that the calculated degrees of confirmation are often greater than 0.5. They are also irrelevant to our prior knowledge, that is, to the a, b, c, and d that we knew before. It is unreasonable to ignore prior knowledge. Suppose that the logical probability of h1 = "x is elderly" is 0.2, the evidence is one or several people with age(s) x > 60, and the conditional logical probability of h1 is 0.9. With measure D, the degree of confirmation is 0.9 − 0.2 = 0.7, which is very large and irrelevant to the prior knowledge.


The increment of the degree of confirmation brought about by a new example is closely related to the number of old examples. Section 5.2 will further discuss incremental confirmation and absolute confirmation.

We now consider the Shannon channel and the semantic channel of the medical test. The relation between h and e is shown in Figure 2.

In Figure 2, h1 denotes an infected specimen (or person), h0 denotes an uninfected specimen, e1 is positive, and e0 is negative. We can treat e1 as a prediction "h is infected" and e0 as a prediction "h is uninfected". In other words, h is a true label or true statement, and e is a prediction or selected label. x is the observed feature of h; E0 and E1 are two subsets of the domain of x. If x is in E1, then e1 is selected; if x is in E0, then e0 is selected. Figure 3 shows the relationship between h and x through two conditional probability distributions, P(x|h0) and P(x|h1). It also shows the magnitudes of four conditional probabilities (in four colors).

In the medical test, P(e1|h1) is called sensitivity [18], and P(e0|h0) is called specificity. They determine a Shannon channel, denoted by P(e|h), as shown in Table 2.

Table 2. Sensitivity and specificity ascertain a Shannon channel P(e|h).


Entropy 2020, 22, x 9 of 26 

Rows: true state h; columns: test result e (negative e0, positive e1).
Infected h1: P(e0|h1) = 1 − sensitivity; P(e1|h1) = sensitivity.
Uninfected h0: P(e0|h0) = specificity; P(e1|h0) = 1 − specificity.

We regard the predicate e1(h) as a combination of believable and unbelievable parts (see Figure 4). The truth function of the believable part is T(E1|h) ∈ {0, 1}. The unbelievable part is a tautology, whose truth function is always 1. Then we have the truth functions of predicates e1(h) and e0(h):

$$T(\theta_{e1}|h) = b_1' + (1 - b_1')\,T(E_1|h),$$

$$T(\theta_{e0}|h) = b_0' + (1 - b_0')\,T(E_0|h),$$

where model parameter b1' is the proportion of the unbelievable part and also the truth value for the counter-instance h0. The four truth values form a semantic channel, as shown in Table 3.

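For medical tests, the logical probability of e1 is

$$T(\theta_{e1}) = P(h_1) + b_1' P(h_0).$$

The likelihood function is

$$P(h|\theta_{e1}) = \frac{P(h)\,T(\theta_{e1}|h)}{T(\theta_{e1})}.$$

P(h|θe1) is also the predicted probability of h according to T(θe1|h), i.e., according to the semantic meaning of e1.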
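The semantic channel of the medical test can be written out directly. The sketch assumes T(θe1|h1) = 1, T(θe1|h0) = b1', T(θe0|h0) = 1, and T(θe0|h1) = b0'. From these it computes the logical probability T(θe1) = P(h1) + b1'P(h0) and the likelihood P(h|θe1) = P(h)T(θe1|h)/T(θe1). The prevalence and b1' values are hypothetical. The last two lines show the extremes: b1' = 0 predicts h1 with certainty, and b1' = 1 (a pure tautology) leaves the prior unchanged:

```python
def semantic_channel(b1, b0):
    """Truth values T(theta_e|h): 1 for instances, b1' or b0' for counter-instances."""
    return {("e1", "h1"): 1.0, ("e1", "h0"): b1,
            ("e0", "h0"): 1.0, ("e0", "h1"): b0}


def logical_prob_e1(p_h1, b1):
    """T(theta_e1) = P(h1) * 1 + P(h0) * b1'."""
    return p_h1 + (1.0 - p_h1) * b1


def likelihood_e1(p_h1, b1):
    """P(h|theta_e1) = P(h) T(theta_e1|h) / T(theta_e1) for h = h1 and h = h0."""
    channel = semantic_channel(b1, b0=0.0)  # b0' does not affect predictions from e1
    t = logical_prob_e1(p_h1, b1)
    return {"h1": p_h1 * channel[("e1", "h1")] / t,
            "h0": (1.0 - p_h1) * channel[("e1", "h0")] / t}


print(likelihood_e1(p_h1=0.1, b1=0.05))   # hypothetical prevalence and b1'
print(likelihood_e1(p_h1=0.1, b1=0.0))    # no unbelievable part: h1 predicted with certainty
print(likelihood_e1(p_h1=0.1, b1=1.0))    # pure tautology: prediction equals the prior
```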

To measure subjective or semantic information, we need subjective probability or logical probability [17] . To measure confirmation, we need statistical probability.

According to the semantic information G theory [17], the (amount of) semantic information conveyed by yj about xi is defined with the log-normalized-likelihood:

$$I(x_i;\theta_j) = \log\frac{P(x_i|\theta_j)}{P(x_i)} = \log\frac{T(\theta_j|x_i)}{T(\theta_j)}, \qquad (15)$$

where T(θj|xi) is the truth value of proposition yj(xi) and T(θj) is the logical probability of yj. If T(θj|xi) = 1, i.e., yj(xi) is true, then this formula reduces to log[1/T(θj)], which is Carnap and Bar-Hillel's semantic information formula [30].

In semantic communication, we often see hypotheses or predictions such as "The temperature is about 10 °C", "The time is about seven o'clock", or "The stock index will go up about 10% next month". Each of them may be represented by yj = "x is about xj". We can express the truth function of yj by

Introducing Equation (16) into Equation (15), we have

which shows that this semantic information equals Carnap and Bar-Hillel's semantic information minus the squared relative deviation. This formula is illustrated in Figure 5.


Figure 5 indicates that the smaller the logical probability, the more information there is, and the larger the deviation, the less information there is. Thus, a wrong hypothesis conveys negative information. These conclusions accord with Popper's thought (see [2], p. 294).
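Equation (16) did not survive extraction. The sketch below therefore assumes a Gaussian-shaped truth function, T(θj|x) = exp[−(x − xj)²/(2d²)], over a hypothetical uniform source on ages 0 to 100. It computes I(xi; θj) = log2[T(θj|xi)/T(θj)] in bits. Under that assumption it reproduces the qualitative claims about Figure 5:

- A more precise hypothesis (smaller d, hence smaller logical probability) conveys more information when it is correct.
- A large deviation yields negative information.

```python
import numpy as np

x = np.arange(0, 101)                  # hypothetical domain of x (e.g., ages)
prior = np.full(x.size, 1.0 / x.size)  # hypothetical uniform source P(x)


def truth(xj, d):
    """Assumed truth function of y_j = "x is about xj": exp(-(x - xj)^2 / (2 d^2))."""
    return np.exp(-((x - xj) ** 2) / (2.0 * d ** 2))


def semantic_info(xi, xj, d):
    """I(xi; theta_j) = log2[T(theta_j|xi) / T(theta_j)], in bits."""
    t = truth(xj, d)
    t_logical = float(np.dot(prior, t))  # logical probability T(theta_j)
    return float(np.log2(t[xi] / t_logical))


print(semantic_info(50, 50, d=5))    # precise and correct: positive
print(semantic_info(50, 50, d=10))   # vaguer (larger T(theta_j)): smaller
print(semantic_info(70, 50, d=5))    # large deviation: negative information
```

With this assumed form, the information splits into log2[1/T(θj)] minus a term proportional to the squared relative deviation (xi − xj)²/(2d²), matching the verbal description of the formula.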

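Averaging I(xi; θj), we obtain the generalized Kullback-Leibler information, or relative cross-entropy:

$$I(X;\theta_j) = \sum_i P(x_i|y_j)\log\frac{P(x_i|\theta_j)}{P(x_i)} = \sum_i P(x_i|y_j)\log\frac{T(\theta_j|x_i)}{T(\theta_j)},$$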

where P(x|yj) is the sampling distribution, and P(x|θj) is the likelihood function. If P(x|θj) is equal to P(x|yj), then I(X; θj) reaches its maximum and becomes the relative entropy or the Kullback-Leibler divergence.

For medical tests, the semantic information conveyed by e1 about h becomes

$$I(h_i;\theta_{e1}) = \log\frac{T(\theta_{e1}|h_i)}{T(\theta_{e1})}.$$

The average semantic information is:

$$I(h;\theta_{e1}) = \sum_i P(h_i|e_1)\log\frac{T(\theta_{e1}|h_i)}{T(\theta_{e1})},$$

where P(hi|e1) is the conditional probability from a sample. We now consider the relationship between the likelihood and the average semantic information.


Let D be a sample {(h(t), e(t)) | t = 1 to N; h(t) ∈ {h0, h1}; e(t) ∈ {e0, e1}}, which includes two sub-samples, or conditional samples: H0 with label e0 and H1 with label e1. When the N data points in D come from independent and identically distributed random variables, we have the log-likelihood

$$L(\theta_{e1}) = \log P(H_1|\theta_{e1}) = \sum_i N_{1i}\log P(h_i|\theta_{e1}) = N_1\sum_i P(h_i|e_1)\log P(h_i|\theta_{e1}) = -N_1 H(h|\theta_{e1}), \qquad (21)$$

where N1i is the number of examples (hi, e1) in D, N1 is the size of H1, and P(hi|e1) = N1i/N1. H(h|θe1) is the cross-entropy. If P(h|θe1) = P(h|e1), then the cross-entropy becomes the Shannon entropy. Meanwhile, the cross-entropy reaches its minimum, and the likelihood reaches its maximum.

Comparing the above two equations, we have

$$I(h;\theta_{e1}) = \frac{L(\theta_{e1})}{N_1} - \sum_i P(h_i|e_1)\log P(h_i), \qquad (22)$$

which indicates the relationship between the average semantic information and the likelihood. Since the second term on the right side is constant, the maximum likelihood criterion is equivalent to the maximum average semantic information criterion. It is easy to see the effect of each type of example on the average log-likelihood L(θe1)/N1:

- A positive example (e1, h1) increases it.
- A counterexample (e1, h0) decreases it.
- Examples (e0, h0) and (e0, h1), which carry e0, are irrelevant to it.

The Nicod criterion about confirmation states that a positive example (e1, h1) supports rule e1→h1 and a counterexample (e1, h0) undermines e1→h1. No reference indicates exactly whether Nicod affirmed that (e0, h0) and (e0, h1) are irrelevant to e1→h1. If Nicod did not, we can add this affirmation to the criterion. We call the resulting criterion the Nicod-Fisher criterion, since Fisher proposed maximum likelihood estimation. From now on, we use the Nicod-Fisher criterion in place of the Nicod criterion.
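The Nicod-Fisher reading of the log-likelihood can be illustrated with a toy sample. In this sketch, the sample, P(h1), and b1' are hypothetical. The model θe1 is held fixed with P(h1|θe1) > P(h0|θe1), which is the situation in which the stated direction of change holds. Examples with e0 never enter L(θe1), so they leave the average log-likelihood exactly unchanged:

```python
import math


def avg_log_likelihood(sample, p_h1, b1):
    """Average log-likelihood L(theta_e1)/N1 over the sub-sample H1 (examples with e1).

    sample: list of (e, h) pairs with values 0/1; p_h1 = P(h1); b1 = b1'.
    """
    t = p_h1 + (1 - p_h1) * b1                    # T(theta_e1)
    pred = {1: p_h1 / t, 0: (1 - p_h1) * b1 / t}  # P(h|theta_e1)
    h1_sub = [h for e, h in sample if e == 1]     # only examples with e1 count
    return sum(math.log(pred[h]) for h in h1_sub) / len(h1_sub)


# Hypothetical sample and a fixed model with P(h1|theta_e1) > P(h0|theta_e1).
sample = [(1, 1)] * 8 + [(1, 0)] * 2 + [(0, 0)] * 20 + [(0, 1)] * 1
p_h1, b1 = 0.3, 0.1
base = avg_log_likelihood(sample, p_h1, b1)
for extra in [(1, 1), (1, 0), (0, 0), (0, 1)]:
    new = avg_log_likelihood(sample + [extra], p_h1, b1)
    print(extra, "up" if new > base else "down" if new < base else "unchanged")
```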

Researchers have noted the similarity between most confirmation measures and information measures. One explanation [31] is that information is the average of confirmatory impact. However, this paper gives a different explanation as follows.

There are three tasks in statistical learning: label learning, classification, and reliability analysis. There are similar tasks in inductive reasoning:

Induction. It is similar to label learning. For uncertain hypotheses, label learning trains a likelihood function P(x|θj) or a truth function T(θj|x) with a sampling distribution [17]. The logistic function often used for binary classification may be treated as a truth function.

Hypothesis selection. It is like classification according to different criteria.

Confirmation. It is similar to reliability analysis. The classical methods are to provide likelihood ratios and correct rates (including false rates, as those in Table 8 ).

Classification and reliability analysis are two different tasks. Similarly, hypothesis selection and confirmation are two different tasks.

In statistical learning, classification depends on the criterion. Commonly used criteria are the maximum posterior probability criterion (which is equivalent to the maximum correctness criterion) and the maximum likelihood criterion (which is equivalent to the maximum semantic information criterion [17]). The classifier for binary classification is

After the above classification, we may use an information criterion to assess how well ej predicts hj:

where I* means optimized semantic information. With the information amounts I(hi; θej) (i, j = 0, 1), we can optimize the classifier [17]: The new classifier will provide a new Shannon channel P(e|h). Maximum mutual information classification can be achieved by repeating Equations (23) and (25) [17, 32].

With the above classifiers, we can make the prediction ej = "x is hj" according to x. To tell information receivers how reliable the rule ej→hj is, we need the likelihood ratio LR to indicate how good the channel is, or the correct rate to indicate how good the probability prediction is. Confirmation is similar: we need a confirmation measure similar to LR, such as F, and a confirmation measure similar to the correct rate. The difference is that confirmation measures should range between −1 and 1.

According to the above analyses, confirmation measures D, N, R, and C are more like information measures for assessing and selecting predictions than measures for confirming rules. Z is their normalization [8]; it seems to lie between an information measure and a confirmation measure. However, confirming rules differs from measuring the information of predictions; it requires the proportions of positive examples and counterexamples.

We use the maximum semantic information criterion, which is consistent with the maximum likelihood criterion, to derive the channel confirmation measure. According to Equations (13) and (18), the average semantic information conveyed by e1 about h is

Setting dI(h;θe1)/db1' = 0, we obtain the optimized b1':

where P(h1|e1)/P(h1) ≥ P(h0|e1)/P(h0). The b1'* can be called a disconfirmation measure. Multiplying both the numerator and the denominator by P(e1) turns the above formula into b1'* = P(e1|h0)/P(e1|h1) = (1 − specificity)/sensitivity = 1/LR+.

According to the semantic information G theory [17], when a truth function is proportional to the corresponding transition probability function, e.g., T*(θe1|h) ∝ P(e1|h), the average semantic information reaches its maximum. Using T*(θe1|h) ∝ P(e1|h), we can directly obtain

and Equation (28) . We call 

Combining the above two formulas, we obtain

$$b_1^* = b^*(e_1 \to h_1) = \frac{P(e_1|h_1) - P(e_1|h_0)}{\max[P(e_1|h_1), P(e_1|h_0)]} = \frac{LR^+ - 1}{\max(LR^+, 1)}.$$

The b1* possesses HS, or Consequent Symmetry.

In the same way, we obtain

Using Consequent Symmetry, we obtain b*(e1→h0) = −b*(e1→h1) and b*(e0→h1) = −b*(e0→h0). Using measure b* or F, we can answer the question: if the result of NAT is negative and the result of CT is positive, which should we believe? Section 4.2 provides an answer that is consistent with the improved diagnosis of COVID-19 in Wuhan.
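The channel measures above depend only on a test's sensitivity and specificity. The following Python sketch computes b1'*, b1*, and, by applying the same construction to the rule e0→h0 (an assumption here, since that formula is not reproduced in this section), b0*. The function names are illustrative, not taken from the paper.

```python
def positive_lr(sensitivity: float, specificity: float) -> float:
    """Positive likelihood ratio LR+ = P(e1|h1) / P(e1|h0)."""
    return sensitivity / (1.0 - specificity)


def b_star(p_e_given_h: float, p_e_given_not_h: float) -> float:
    """b*(e->h) = (P(e|h) - P(e|not h)) / max(P(e|h), P(e|not h))."""
    return (p_e_given_h - p_e_given_not_h) / max(p_e_given_h, p_e_given_not_h)


def channel_measures(sensitivity: float, specificity: float) -> dict:
    """Channel confirmation measures of a binary test (h1 = infected, e1 = positive)."""
    p_e1_h1 = sensitivity        # P(e1|h1)
    p_e1_h0 = 1.0 - specificity  # P(e1|h0)
    p_e0_h0 = specificity        # P(e0|h0)
    p_e0_h1 = 1.0 - sensitivity  # P(e0|h1)
    lr = positive_lr(sensitivity, specificity)
    return {
        "b1_prime": p_e1_h0 / p_e1_h1,           # b1'* = 1/LR+ (case LR+ >= 1)
        "b1": b_star(p_e1_h1, p_e1_h0),          # b*(e1->h1)
        "b1_via_lr": (lr - 1.0) / max(lr, 1.0),  # the same value through LR+
        "b0": b_star(p_e0_h0, p_e0_h1),          # b*(e0->h0), assumed by symmetry
    }


# NAT example used in the text: sensitivity 0.5, specificity 0.95.
m = channel_measures(0.5, 0.95)
print(round(m["b1_prime"], 3), round(m["b1"], 3), round(m["b0"], 3))  # 0.1 0.9 0.474
# Consequent Symmetry: b*(e1->h0) = -b*(e1->h1)
print(round(b_star(0.05, 0.5), 3))  # -0.9
```

The value b0* ≈ 0.47 agrees with the b*(NAT−) = 0.47 quoted in the COVID-19 diagnosis example.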

Compared with F, b* is more convenient for probability predictions. For example, from b1* > 0 and P(h), we obtain

This formula is much simpler than the classical Bayes' formula (see Equation (5)). If b1* = 0, then P(h1|θe1) = P(h1). If b1* < 0, then we can make use of HS, or Consequent Symmetry, to obtain b10* = b*(e1→h0) = |b*(e1→h1)| = |b1*|. Then we have
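The prediction formula can be checked against Bayes' theorem. A short derivation, assuming the prediction matches the data (P(h1|θe1) = P(h1|e1)) and b1* > 0, so that b1* = (LR+ − 1)/LR+ gives P(e1|h0)/P(e1|h1) = 1 − b1*:

```latex
\begin{aligned}
P(h_1 \mid \theta_{e1}) = P(h_1 \mid e_1)
  &= \frac{P(e_1 \mid h_1)\,P(h_1)}{P(e_1 \mid h_1)\,P(h_1) + P(e_1 \mid h_0)\,P(h_0)}
   = \frac{P(h_1)}{P(h_1) + (1 - b_1^*)\,P(h_0)} \\
  &= \frac{P(h_1)}{1 - b_1^*\,P(h_0)} .
\end{aligned}
```

When b1* < 0, b1* = LR+ − 1, so P(e1|h1)/P(e1|h0) = 1 − |b1*|, and the same steps give P(h0|θe1) = P(h0)/(1 − |b1*| P(h1)). For example, b1* = 0.9 and P(h1) = 0.25 give 0.25/(1 − 0.9 × 0.75) ≈ 0.77.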

We can also obtain b1* = 2F1/(1 + F1) (for F1 ≥ 0) from F1 = F(e1→h1) for the probability prediction P(h1|θe1), but calculating probability predictions with F1 is a little more complicated.
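Both b* and F are functions of the likelihood ratio alone: b* = (LR − 1)/max(LR, 1), and F = (LR − 1)/(LR + 1), obtained by dividing the numerator and denominator of F by P(e1|h0). The sketch below checks the conversion b1* = 2F1/(1 + F1) for F1 ≥ 0; the branch for F < 0, b* = 2F/(1 − F), is our own algebraic extension under the same definitions, not something stated in the text.

```python
def f_from_lr(lr: float) -> float:
    """Kemeny-Oppenheim F written through the likelihood ratio: (LR - 1)/(LR + 1)."""
    return (lr - 1.0) / (lr + 1.0)


def b_from_lr(lr: float) -> float:
    """Channel confirmation b* written through the likelihood ratio: (LR - 1)/max(LR, 1)."""
    return (lr - 1.0) / max(lr, 1.0)


def b_from_f(f: float) -> float:
    """Convert F to b*: 2F/(1 + F) when F >= 0 and 2F/(1 - F) when F < 0."""
    return 2.0 * f / (1.0 + abs(f))


for lr in (0.1, 0.5, 1.0, 2.0, 10.0):
    f = f_from_lr(lr)
    # b* and F share the sign of LR - 1, but |b*| >= |F|
    print(f"LR={lr:>5}: F={f:+.4f}  b*={b_from_lr(lr):+.4f}  b*(from F)={b_from_f(f):+.4f}")
```

Because both measures depend only on LR, neither can respond to a change in the prior P(h), which is the limitation discussed next.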

So far, it is still problematic to use b*, F, or any other measure to handle the Raven Paradox. For example, as shown in Table 13, the increment of F(e1→h1) caused by ∆d = 1 is 0.348 − 0.333, whereas the increment caused by ∆a = 1 is 0.340 − 0.333. The former is greater than the latter, which means that a piece of white chalk supports "Ravens are black" better than a black raven does. Hence, measure F does not accord with the Nicod-Fisher criterion. Neither do measures b* and Z.
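The F values quoted from Table 13 can be reproduced from the counts. This sketch reads the counts as a = (h1, e1), b = (h1, e0), c = (h0, e1), d = (h0, e0), the assignment implied by c*(e1→h1) = (a − c)/max(a, c) and c*(h1→e1) = (a − b)/max(a, b) elsewhere in this section. For "Ravens are black", e1 = "raven" and h1 = "black", so d counts non-black non-ravens such as white chalk.

```python
def f_measure(a: float, b: float, c: float, d: float) -> float:
    """F(e1->h1) = (P(e1|h1) - P(e1|h0)) / (P(e1|h1) + P(e1|h0)) from the four counts."""
    p_e1_h1 = a / (a + b)  # P(e1|h1): share of black things that are ravens
    p_e1_h0 = c / (c + d)  # P(e1|h0): share of non-black things that are ravens
    return (p_e1_h1 - p_e1_h0) / (p_e1_h1 + p_e1_h0)


base = f_measure(20, 10, 10, 20)
plus_black_raven = f_measure(21, 10, 10, 20)  # delta a = 1
plus_white_chalk = f_measure(20, 10, 10, 21)  # delta d = 1
print(f"{base:.3f} {plus_black_raven:.3f} {plus_white_chalk:.3f}")  # 0.333 0.340 0.348
```

Adding a non-black non-raven raises F more than adding a black raven, which is the conflict with the Nicod-Fisher criterion described above.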

Why do measures b* and F not accord with the Nicod-Fisher criterion? The reason is that the likelihood L(θe1) is related to the prior probability P(h), whereas b* and F are independent of P(h).

Statistics uses not only the likelihood ratio to indicate how reliable a testing means (as a channel) is, but also the correct rate to indicate how reliable a probability prediction is. Like LR, measures F and b* cannot indicate the quality of a probability prediction. Most other measures have similar problems.

For example, assume that an NAT for COVID-19 [33] has sensitivity P(e1|h1) = 0.5 and specificity P(e0|h0) = 0.95. We can calculate b1'* = 0.1 and b1* = 0.9. When the prior probability P(h1) of the infection changes, the predicted probability P(h1|θe1) (see Equation (35)) changes with it, as shown in Table 4. We can obtain the same results using the classical Bayes' formula (see Equation (5)).

Table 4. Predictive probability P(h1|θe1) changes with prior probability P(h1) when b1* = 0.9.

Data in Table 4 show that measure b* cannot indicate the quality of probability predictions. Therefore, we need to use P(h) to construct a confirmation measure that can reflect the correct rate.
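To see that b1* stays fixed while the prediction moves with the prior, here is a sketch using P(h1|θe1) = P(h1)/(1 − b1* P(h0)), which is what Bayes' formula reduces to when b1* ≥ 0. This closed form is our own rewriting (the numbered Equation (35) is not reproduced in this section), and the priors below are illustrative choices, not the rows of Table 4.

```python
def predict_from_b(b1: float, prior_h1: float) -> float:
    """P(h1|theta_e1) from b1* >= 0 and the prior P(h1)."""
    if b1 < 0:
        raise ValueError("for b1* < 0, predict P(h0|theta_e1) via Consequent Symmetry instead")
    return prior_h1 / (1.0 - b1 * (1.0 - prior_h1))


def predict_from_bayes(sensitivity: float, specificity: float, prior_h1: float) -> float:
    """Classical Bayes' formula for P(h1|e1)."""
    joint_pos = sensitivity * prior_h1                  # P(e1, h1)
    joint_neg = (1.0 - specificity) * (1.0 - prior_h1)  # P(e1, h0)
    return joint_pos / (joint_pos + joint_neg)


# Same channel (b1* = 0.9), different priors: the prediction changes, b1* does not.
for prior in (0.001, 0.01, 0.1, 0.25, 0.5):
    print(prior, round(predict_from_b(0.9, prior), 3), round(predict_from_bayes(0.5, 0.95, prior), 3))
```

The two columns coincide, which illustrates why b* alone cannot indicate how good a particular probability prediction is.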

We now treat the probability prediction P(h|θe1) as the combination of a believable part with proportion c1 and an unbelievable part with proportion c1', as shown in Figure 6. We call c1 the degree of belief of the rule e1→h1 as a prediction.

When the prediction accords with the fact, i.e., P(h|θe1) = P(h|e1), c1 becomes c1*. The degree of disconfirmation for predictions is

$$c'^*(e_1 \to h_1) = \begin{cases} P(h_0|e_1)/P(h_1|e_1), & \text{if } P(h_0|e_1) \le P(h_1|e_1); \\ P(h_1|e_1)/P(h_0|e_1), & \text{if } P(h_1|e_1) \le P(h_0|e_1). \end{cases}$$

Further, we have the prediction confirmation measure 

where CR1 = P(h1|θe1) = P(h1|e1) is the correct rate of rule e1→h1. This correct rate means that, when x ∈ E1, the predicted probability of h1 is CR1. Multiplying both the numerator and the denominator of Equation (38) by P(e1), we obtain

$$c_1^* = c^*(e_1 \to h_1) = \frac{P(h_1, e_1) - P(h_0, e_1)}{\max[P(h_1, e_1), P(h_0, e_1)]} = \frac{a - c}{\max(a, c)}.$$

The sizes of the four areas covered by the two curves in Figure 7 may represent a, b, c, and d. In like manner, we obtain

Making use of Consequent Symmetry, we can obtain c*(e1→h0) = −c*(e1→h1) and c*(e0→h1) = −c*(e0→h0).

In Figure 7, the sizes of the two areas covered by the two curves are P(h0) and P(h1), which are different. If P(h0) = P(h1) = 0.5, then the prediction confirmation measure c* is equal to the channel confirmation measure b*.

Using measure c*, we can directly assess the quality of probability predictions. For P(h1|θe1) = 0.77 in Table 4, we have c1* = (0.77 − 0.23)/0.77 = 0.701. We can also use c* for probability predictions. When c1* > 0, according to Equation (39), we have the correct rate of rule e1→h1:

For example, if c1* = 0.701, then CR1 = 1/(2 − 0.701) = 0.77. If c*(e1→h1) = 0, then CR1 = 0.5. If c*(e1→h1) < 0, we may use HS to obtain c10* = c*(e1→h0) = |c1*| and then make the probability prediction:

We may define another prediction confirmation measure by replacing operation max( ) with +:

The cF* is also convenient for probability predictions when P(h) is fixed. We have

However, when P(h) varies, we should still use b* together with P(h) for probability predictions. It is easy to prove that c*(e1→h1) and cF*(e1→h1) possess all the above-mentioned desirable properties.
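The prediction measures reduce to simple functions of the correct rate CR1 = P(h1|e1): c1* = (CR1 − (1 − CR1))/max(CR1, 1 − CR1), and cF1* = 2CR1 − 1 because the two posteriors sum to 1. The inversions in this sketch, CR1 = 1/(2 − c1*) for c1* ≥ 0 and CR1 = (1 + cF1*)/2, follow from those definitions; the branch for c1* < 0 is our own algebra via Consequent Symmetry. The last lines check that c* equals b* when P(h0) = P(h1) = 0.5. All names are illustrative.

```python
def c_star(pos: float, neg: float) -> float:
    """c* = (pos - neg) / max(pos, neg); pos/neg weigh positive examples and counterexamples."""
    return (pos - neg) / max(pos, neg)


def c_f_star(pos: float, neg: float) -> float:
    """cF*: the same ratio with max replaced by +."""
    return (pos - neg) / (pos + neg)


def c_from_correct_rate(cr1: float) -> float:
    """c*(e1->h1) from CR1 = P(h1|e1), with P(h0|e1) = 1 - CR1."""
    return c_star(cr1, 1.0 - cr1)


def correct_rate_from_c(c1: float) -> float:
    """Invert c*: CR1 = 1/(2 - c1) if c1 >= 0; otherwise P(h0|e1) = 1/(2 - |c1|)."""
    if c1 >= 0:
        return 1.0 / (2.0 - c1)
    return 1.0 - 1.0 / (2.0 - abs(c1))


def correct_rate_from_cf(cf1: float) -> float:
    """Invert cF*: since P(h1|e1) + P(h0|e1) = 1, cF* = 2 CR1 - 1."""
    return (1.0 + cf1) / 2.0


print(round(c_from_correct_rate(0.77), 3))   # 0.701, as in the text
print(round(correct_rate_from_c(0.701), 2))  # 0.77

# With equal priors, c*(e1->h1) from joint weights equals b*(e1->h1) from the channel.
sens, spec = 0.5, 0.95
a = 0.5 * sens          # P(h1, e1)
c = 0.5 * (1.0 - spec)  # P(h0, e1)
print(round(c_star(a, c), 3), round(c_star(sens, 1.0 - spec), 3))  # 0.9 0.9
```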

Greco et al. [19] […] Similarly, this paper divides confirmation measures into […]

We now consider c*(h1→e1). The proportions of positive examples and counterexamples can be found in the upper part of Figure 7. Then we have

$$c^*(h_1 \to e_1) = \frac{P(e_1|h_1) - P(e_0|h_1)}{\max[P(e_1|h_1), P(e_0|h_1)]} = \frac{a - b}{\max(a, b)}.$$

The correct rate reflected by c*(h1→e1) is the sensitivity, or true positive rate, P(e1|h1). The correct rate reflected by c*(h0→e0) is the specificity, or true negative rate, P(e0|h0).

Consider the converse channel confirmation measure b*(h1→e1). Now the source is P(e) instead of P(h). We may swap e1 with h1 in b*(e1→h1), or swap a with d and b with c in f(a, b, c, d)

where ∨ is the operator for the maximum of two numbers and replaces max( ). There are also four types of converse channel/prediction confirmation formulas with a, b, c, and d (see Table 7). Due to Consequent Symmetry, there are eight types of converse channel/prediction confirmation formulas altogether. Table 5 shows the proportions of positive examples and counterexamples needed by measures b* and c*. Table 6 provides four types of confirmation formulas with a, b, c, and d for rule e→h, where the function max( ) is replaced with the operator ∨.

These confirmation measures are related to the misreporting rates of the rule e→h. For example, a smaller b*(e1→h1) or c*(e1→h1) means that the test shows positive for more uninfected people. Table 7 includes four types of confirmation measures for h→e. These confirmation measures are related to the underreporting rates of the rule h→e. For example, a smaller b*(h1→e1) or c*(h1→e1) means that the test shows negative for more infected people. Underreports are the more serious problem.

Each of the eight types of confirmation measures in Tables 6 and 7 has its consequent-symmetrical form. Therefore, there are 16 types of function f (a, b, c, d) altogether for confirmation.

In a prediction or converse prediction confirmation formula, the two conditional probabilities have the same condition, namely the antecedent of the rule, so that a confirmation measure c* depends only on the numbers of positive examples and counterexamples. Therefore, these measures accord with the Nicod-Fisher criterion.

If we change "∨" into "+" in f(a, b, c, d), then measure b* becomes measure bF* = F, and measure c* becomes measure cF*. For example, cF*(e1→h1) = (a − c)/(a + c).

(47)
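Tables 6 and 7 express the measures through a, b, c, and d with ∨ (max) or, for the F-type variants, with +. Below is a compact sketch with the combining operator as a parameter. The count assignment a = (h1, e1), b = (h1, e0), c = (h0, e1), d = (h0, e0) is inferred from the formulas in this section; c*(e0→h0) = (d − b)/(d ∨ b) and b*(e0→h0) are assumed by symmetry, and the converse channel measures b*(h→e) are omitted because their formulas are not reproduced here.

```python
import operator
from typing import Callable, Dict

Combine = Callable[[float, float], float]


def ratio(pos: float, neg: float, combine: Combine = max) -> float:
    """(pos - neg) / (pos v neg), or / (pos + neg) when combine=operator.add."""
    return (pos - neg) / combine(pos, neg)


def prediction_measures(a: float, b: float, c: float, d: float,
                        combine: Combine = max) -> Dict[str, float]:
    """c*-type measures (combine=max) or cF*-type measures (combine=operator.add)."""
    return {
        "e1->h1": ratio(a, c, combine),  # positives a, counterexamples c
        "e0->h0": ratio(d, b, combine),  # positives d, counterexamples b (assumed)
        "h1->e1": ratio(a, b, combine),  # converse: positives a, counterexamples b
        "h0->e0": ratio(d, c, combine),  # converse: positives d, counterexamples c
    }


def channel_measures_from_counts(a: float, b: float, c: float, d: float,
                                 combine: Combine = max) -> Dict[str, float]:
    """b*-type measures (combine=max) or F (combine=operator.add) for rules e->h."""
    p_e1_h1, p_e0_h1 = a / (a + b), b / (a + b)
    p_e1_h0, p_e0_h0 = c / (c + d), d / (c + d)
    return {
        "e1->h1": ratio(p_e1_h1, p_e1_h0, combine),
        "e0->h0": ratio(p_e0_h0, p_e0_h1, combine),
    }


counts = (30, 5, 15, 50)  # illustrative counts, not data from the paper
print(prediction_measures(*counts))
print(prediction_measures(*counts, combine=operator.add))
print(channel_measures_from_counts(*counts, combine=operator.add))  # F values
```

Passing the operator rather than hard-coding max keeps the ∨ and + families side by side, mirroring the statement that replacing ∨ with + turns b* into F and c* into cF*.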


Measure b* is like measure F: both change with the likelihood ratio LR, as shown in Figure 8. Measure F has four confirmation formulas for different antecedents and consequents [8], which are related to measure bF* as follows:

$$F(e_1 \to h_1) = \frac{P(e_1|h_1) - P(e_1|h_0)}{P(e_1|h_1) + P(e_1|h_0)} = \frac{ad - bc}{ad + bc + 2ac}$$

$$F(e_0 \to h_0) = \frac{P(e_0|h_0) - P(e_0|h_1)}{P(e_0|h_0) + P(e_0|h_1)}$$

F is equivalent to bF*. Measure b* has all the above-mentioned desirable properties, as does measure F. The differences are that b* has a greater absolute value than F, and that b* is more convenient for probability predictions (see Equation (35)).

Channel confirmation measures are related to likelihood ratios, whereas Prediction Confirmation Measures (PCMs) including converse PCMs are related to correct rates and false rates in the medical test.

To help us understand the significance of PCMs in the medical test, Table 8 shows which correct rate and which false rate each PCM is related to.

Table 8. PCMs (Prediction Confirmation Measures) are related to different correct rates and false rates in the medical test [18].

The false rates related to PCMs are the misreporting rates of the rule e→h, whereas the false rates related to converse PCMs are the underreporting rates of the rule h→e. For example, the False Discovery Rate P(h0|e1) is also the misreporting rate of rule e1→h1, and the False Negative Rate P(e0|h1) is also the underreporting rate of rule h1→e1.

In China's war against COVID-19, people often ask: since the true positive rate, i.e., the sensitivity, of NAT is so low (less than 0.5), why do we still believe it? Medical experts explain that although NAT has low sensitivity, it has high specificity, and hence its positive result is very believable.

We use the following two extreme examples (see Figure 9) to explain why a test with very low sensitivity can provide a more believable positive result than another test with very high sensitivity, and to check whether popular confirmation measures support this conclusion. In Example 1, b*(e1→h1) = (0.1 − 0.01)/0.1 = 0.9, which is very large. In Example 2, b*(e1→h1) = (1 − 0.9)/1 = 0.1, which is very small. The two examples indicate that, for b*, having fewer counterexamples is more important than having more positive examples. Measures F, c*, and cF* also possess this characteristic, which is compatible with the Logicality requirement [15]. However, most confirmation measures do not possess this characteristic.

We supposed P(h1) = 0.2 and n = 1000 and then calculated the degrees of confirmation with different confirmation measures for the above two examples, as shown in Table 9, where the base of the logarithm for R and L is 2. Table 9 also includes Example 3 (i.e., Ex. 3), in which P(h1) is 0.01. Example 3 reveals the difference between Z and b* (or F). Data for Examples 1 and 2 show that L, F, and b* give Example 1 a much higher rating than Example 2, whereas M, C, and N give Example 2 a higher rating than Example 1 (see red numbers). The Excel file for Tables 9, 12, and 13 […]. Although measure L (log-likelihood ratio) is compatible with F and b*, its values, such as 3.32 and 0.152, are not as intuitive as the values of F or b*, which are normalized.
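Only b*, F, and L can be recomputed from the conditional probabilities given for the two examples (Example 1: P(e1|h1) = 0.1, P(e1|h0) = 0.01; Example 2: P(e1|h1) = 1, P(e1|h0) = 0.9); the other measures in Table 9 depend on definitions outside this section and are not reproduced. A sketch, taking L as log2 of the positive likelihood ratio as stated for Table 9:

```python
import math


def channel_scores(p_e1_h1: float, p_e1_h0: float) -> dict:
    """b*, F and L = log2(LR+) for the rule e1->h1."""
    lr = p_e1_h1 / p_e1_h0
    return {
        "b*": (p_e1_h1 - p_e1_h0) / max(p_e1_h1, p_e1_h0),
        "F": (p_e1_h1 - p_e1_h0) / (p_e1_h1 + p_e1_h0),
        "L": math.log2(lr),
    }


examples = {"Example 1": (0.1, 0.01), "Example 2": (1.0, 0.9)}
for name, (p1, p0) in examples.items():
    s = channel_scores(p1, p0)
    print(f"{name}: b*={s['b*']:.3f} F={s['F']:.3f} L={s['L']:.3f}")
# Example 1: b*=0.900 F=0.818 L=3.322
# Example 2: b*=0.100 F=0.053 L=0.152
```

All three rank Example 1 far above Example 2, but only b* and F land on a normalized scale.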

The COVID-19 outbreak in Wuhan, China, in 2019 and 2020 has infected many people. In the early stage, only NAT was used to diagnose the infection. Later, many doctors found that NAT often failed to report the viral infection. Because this test has low sensitivity (possibly less than 0.5) and high specificity, we can confirm the infection when NAT is positive, but a negative NAT is not good for confirming non-infection. That is, a negative NAT result is not believable. To reduce underreports of the infection, CT gained more attention because it had higher sensitivity than NAT.

When both NAT and CT were used in Wuhan, doctors improved the diagnosis, as shown in Figure 10 and Table 11. If we diagnose the infection according to confirmation measure b*, will the diagnosis be the same as the improved diagnosis? Besides NAT and CT, patients' symptoms, such as fever and cough, were also used for the diagnosis. To simplify the problem, we assumed that all patients had the same symptoms, so that we could diagnose only according to the results of NAT and CT. Reference [34] reports the sensitivity and specificity of CT achieved by its authors. According to [33, 34] and other reports on the internet, the author of this paper estimated the sensitivities and specificities shown in Table 10. Figure 10 was drawn according to Table 10 and also shows the sensitivities and specificities. For example, the half of the red circle on the right side indicates that the sensitivity of NAT is 0.5.

We […] (see Table 11).

Table 11. Improved diagnosis (for final positive or negative) according to NAT and CT.

[…] (see Table 11) because b*(CT+) = 0.69 is higher than b*(NAT−) = 0.47. This diagnosis is the same as the improved diagnosis in Wuhan.
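The decision rule used here can be sketched as: when NAT and CT disagree, believe the result with the larger b*. Only NAT's sensitivity 0.5 and specificity 0.95 come from the text (they reproduce b*(NAT−) ≈ 0.47); the CT values of Table 10 are not reproduced, so the caller must supply them. The class and function names are hypothetical, and breaking ties toward "infected" is our own choice, motivated by the text's remark that underreports are the more serious error.

```python
from dataclasses import dataclass


@dataclass
class DiagnosticTest:
    name: str
    sensitivity: float  # P(e1|h1)
    specificity: float  # P(e0|h0)

    def b_positive(self) -> float:
        """b*(e1->h1): confirmation of infection by a positive result."""
        p1, p0 = self.sensitivity, 1.0 - self.specificity
        return (p1 - p0) / max(p1, p0)

    def b_negative(self) -> float:
        """b*(e0->h0): confirmation of non-infection by a negative result."""
        p0, p1 = self.specificity, 1.0 - self.sensitivity
        return (p0 - p1) / max(p0, p1)


def resolve_conflict(positive: DiagnosticTest, negative: DiagnosticTest) -> str:
    """One test is positive and the other negative: trust the more strongly confirmed result."""
    if positive.b_positive() >= negative.b_negative():  # ties go to "infected"
        return "infected"
    return "not infected"


nat = DiagnosticTest("NAT", sensitivity=0.5, specificity=0.95)
print(round(nat.b_positive(), 2), round(nat.b_negative(), 2))  # 0.9 0.47
```

For example, `resolve_conflict(ct, nat)` with a caller-supplied `ct` returns "infected" whenever b*(CT+) exceeds 0.47.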

Assuming the prior probability of the infection P(h1) = 0.25, the author calculated the degrees of confirmation with different confirmation measures for the same sensitivities and specificities, as shown in Table 12. If there is a "No" under a measure, this measure results in a diagnosis different from the improved diagnosis. The red numbers mean that c(CT+) < c(NAT−) or c(NAT+) < c(CT−). Measures D, M, and F, as well as b*, are consistent with the improved diagnosis. If we change P(h1) from 0.1 to 0.6, we find that measure M also becomes inconsistent with the improved diagnosis. If we believe a test-positive or test-negative only when its degree of confirmation is greater than 0.2, then D is also undesirable, and only measures F and b* satisfy our requirements.

The sensitivities and specificities in Table 10 were not specially selected. When the NAT sensitivity varied between 0.3 and 0.7, or the CT sensitivity between 0.6 and 0.9, it remained true that only measures D, F, and b* were consistent with the improved diagnosis.

Measure c* is also unsuitable for the diagnosis because it reflects correctness and cannot reduce underreports of the infection, yet underreports of the infection cause greater loss than misreports.

The following example checks whether popular confirmation measures can explain that a black raven confirms "Ravens are black" more strongly than a piece of white chalk. Table 13 shows the degrees of confirmation calculated with nine different measures. First, we supposed a = d = 20 and b = c = 10 and calculated the nine degrees of confirmation. Next, we replaced only a with a + 1 and recalculated them. Last, we replaced only d with d + 1 and recalculated them. The results may exceed many researchers' expectations. Table 13 indicates that no measure except c* (see blue numbers) can ensure that ∆a = 1 increases f(a, b, c, d) more than ∆d = 1. If we vary b and c between 1 and 19, no measure except c*, S, and N can ensure ∆f/∆a ≥ ∆f/∆d. When b > c, measures S and N cannot ensure it either. For measures D and M, the cause is that the decrease in P(h1) and P(e1) produced by ∆d = 1 outweighs the increase in P(h1|e1) and P(e1|h1) produced by ∆a = 1. The causes are similar for the other measures except c*.

To clarify the Raven Paradox, some researchers, including Hempel [3], affirm the Equivalence Condition and deny the Nicod-Fisher criterion; others, such as Scheffler and Goodman [35], affirm the Nicod-Fisher criterion and deny the Equivalence Condition. Still others fully affirm neither the Equivalence Condition nor the Nicod-Fisher criterion.

First, we consider measure F to see whether it can eliminate the Raven Paradox. The difference between F(e1→h1) and F(h0→e0) is that their counterexamples are the same, yet their positive examples are different. When d increases to d + ∆d, F(e1→h1) and F(h0→e0) increase unequally. Therefore,

• though measure F denies the Equivalence Condition, it still affirms that ∆d affects both F(e1→h1) and F(h0→e0);
• measure F does not accord with the Nicod-Fisher criterion.

Measure b* is like F. The conclusion is that measures F and b* cannot eliminate our confusion about the Raven Paradox.

After inspecting many different confirmation measures from the perspective of rough set theory, Greco et al. [15] conclude that the Nicod criterion (i.e., the Nicod-Fisher criterion) is right, but that it is difficult to find a suitable measure that accords with it. However, many researchers still think that the Nicod criterion is incorrect; in their view, it accords with our intuition only because a confirmation measure c(e1→h1) increases markedly with a and only slightly with d. After comparing different confirmation measures, Fitelson and Hawthorne [28] believe that the likelihood ratio may be used to explain that a black raven can confirm "Ravens are black" more strongly than a non-black non-raven thing.

Unfortunately, Table 13 shows that the increments of all measures except c* caused by ∆d = 1 are greater than or equal to those caused by ∆a = 1. That means that these measures support the conclusion that a piece of white chalk can confirm "Ravens are black" more strongly than (or as well as) a black raven. Therefore, these measures cannot be used to clarify the Raven Paradox.

However, measure c* is different. Since c*(e1→h1) = (a − c)/(a∨c) and c*(h0→e0) = (d − c)/(d∨c), the Equivalence Condition does not hold, and measure c* accords with the Nicod-Fisher criterion very well. Hence, according to measure c*, the Raven Paradox no longer exists.

In Table 13, if the initial numbers are a = d = 200 and b = c = 100, the increments of all measures caused by ∆a = 1 will be much smaller than those in Table 13. For example, D(e1→h1) increases from 0.1667 to 0.1669, and c*(e1→h1) increases from 0.5 to 0.5025. The increments are about 1/10 of those in Table 13. Therefore, the increment of the degree of confirmation brought about by a new example is closely related to the number of old examples, or to our prior knowledge.
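Both the Nicod-Fisher behavior of c* and the scale effect can be checked directly. This sketch uses c*(e1→h1) = (a − c)/max(a, c) and c*(h0→e0) = (d − c)/max(d, c) as given above, plus Carnap's D in its standard difference form D(e1→h1) = P(h1|e1) − P(h1) (assumed here, since its definition lies outside this section), with the count assignment a = (h1, e1), b = (h1, e0), c = (h0, e1), d = (h0, e0) inferred from this section's formulas.

```python
def c_star_rule(a, b, c, d):
    """c*(e1->h1) = (a - c)/max(a, c): black ravens vs non-black ravens."""
    return (a - c) / max(a, c)


def c_star_contrapositive(a, b, c, d):
    """c*(h0->e0) = (d - c)/max(d, c): non-black non-ravens vs non-black ravens."""
    return (d - c) / max(d, c)


def d_measure(a, b, c, d):
    """Difference measure D(e1->h1) = P(h1|e1) - P(h1)."""
    n = a + b + c + d
    return a / (a + c) - (a + b) / n


def increments(measure, a, b, c, d):
    """Change in the measure caused by delta a = 1 and by delta d = 1."""
    base = measure(a, b, c, d)
    return measure(a + 1, b, c, d) - base, measure(a, b, c, d + 1) - base


for scale in (1, 10):  # a = d = 20, b = c = 10, then ten times larger
    counts = (20 * scale, 10 * scale, 10 * scale, 20 * scale)
    for name, m in (("c*(e1->h1)", c_star_rule),
                    ("c*(h0->e0)", c_star_contrapositive),
                    ("D(e1->h1)", d_measure)):
        da, dd = increments(m, *counts)
        print(f"scale={scale:>2} {name:<11} base={m(*counts):.4f} da={da:.4f} dd={dd:.4f}")
```

In this sketch c*(e1→h1) responds only to ∆a and c*(h0→e0) only to ∆d, whereas D responds more to ∆d than to ∆a; every increment shrinks roughly tenfold when the initial counts are ten times larger.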

Absolute confirmation requires that […] Otherwise, the calculated degree of confirmation is unreliable. We need to replace the degree of confirmation with a degree interval of confirmation, such as [0.5, 1] instead of 1.

Eells and Fitelson defined HS by c(e, h) = −c(e, −h). Actually, it means c(x, y) = −c(x, −y) for any x and y. Similarly, ES is Antecedent Symmetry, which means c(x, y) = −c(−x, y) for any x and y. Since, from their point of view, e and h are not the antecedent and the consequent of a major premise, they could not speak of Antecedent Symmetry and Consequent Symmetry. Consider what happens when c(e, h) becomes c(h, e). From the literal meaning of HS (Hypothesis Symmetry), one may misunderstand HS as shown in Table 14. For example, the misunderstanding happens in [8, 19], where the authors call c(h, e) = −c(h, −e) ES. However, it is in fact HS, or Consequent Symmetry. In [19], the authors think that F(H, E) (where the right side is evidence) should have HS: F(H, E) = −F(−H, E), whereas F(E, H) should have ES: F(E, H) = −F(−E, H). However, this "ES" does not accord with the original meaning of ES in [14]. Both F(H, E) and F(E, H) possess HS instead of ES. A more serious consequence of the misunderstanding is that [19] concludes that ES and EHS (i.e., c(H, E) = c(−H, −E)), as well as HS, are desirable, and hence that measures S, N, and C are particularly valuable.

The author of this paper agrees with the conclusion of Eells and Fitelson that only HS (i.e., Consequent Symmetry) is desirable. Therefore, it is necessary to make clear that e and h in c(e, h) are the antecedent and the consequent of the rule e→h. To avoid the misunderstanding, it is better to replace c(e, h) with c(e→h) and to use "Antecedent Symmetry" and "Consequent Symmetry" instead of "Evidence Symmetry" and "Hypothesis Symmetry".

Measure D proposed by Carnap is often referred to as the standard Bayesian confirmation measure. The above analyses, however, show that D is suitable only as a measure for selecting hypotheses, not as a measure for confirming major premises. Carnap opened the direction of Bayesian confirmation, but his explanation of D easily leads us to confuse a major premise's evidence (a sample) with a consequent's evidence (a minor premise).

Greco et al. [19] call confirmation measures with the conditional probability P(h|e) Bayesian confirmation measures, those with P(e|h) Likelihoodist confirmation measures, and those for h→e converse Bayesian/Likelihoodist confirmation measures. This division is very enlightening. However, the division of confirmation measures in this paper depends not on symbols but on methods. The optimized proportion of the believable part in the truth function is the channel confirmation measure b*, which is similar to the likelihood ratio and reflects how good the channel is. The optimized proportion of the believable part in the likelihood function is the prediction confirmation measure c*, which is similar to the correct rate and reflects how good the probability prediction is. The b* may be called the logical Bayesian confirmation measure because it is derived with Logical Bayesian Inference [17], although P(e|h) may be used for b*. The c* may be regarded as the likelihoodist confirmation measure, although P(h|e) may be used for c*.

This paper also provides converse channel/prediction confirmation measures for rule h→e. Confirmation measures b*(e→h) and c*(e→h) are related to misreporting rates, whereas converse confirmation measures b*(h→e) and c*(h→e) are related to underreporting rates.

The Certainty Factor, which is denoted by CF, was proposed by Shortliffe and Buchanan for a backward chaining expert system [7] . It indicates how true an uncertain inference h→e is. The relationship between measures CF and Z is CF(h→e) = Z(e→h) [36] .

As pointed out by Heckerman and Shortliffe [36], although the Certainty Factor method has been widely adopted in rule-based expert systems, it also has theoretical and practical limitations. The main reason is that the Certainty Factor method is not compatible with statistical probability theory. They believe that the belief-network representation can overcome many of the limitations of the Certainty Factor model; however, the Certainty Factor model is simpler than the belief-network representation, and it is possible to combine both to develop simpler probabilistic expert systems.

Measure b*(e1→h1) is related to the believable part of the truth function of predicate e1(h). It is similar to CF(h1→e1). The differences are that b*(e1→h1) is independent of P(h), whereas CF(h1→e1) is related to P(h); and b*(e1→h1) is compatible with statistical probability theory, whereas CF(h1→e1) is not.

Is it possible to use measure b* or c* as the Certainty Factor to simplify belief-networks or probabilistic expert systems? This issue is worth exploring.

Popper affirms that a counterexample can falsify a universal hypothesis or a major premise. However, for an uncertain major premise, how do counterexamples affect its degree of confirmation? Confirmation measures F, b*, and c* can reflect the importance of counterexamples. In Example 1 of Table 9, the proportion of positive examples is small, but the proportion of counterexamples is even smaller, so the degree of confirmation is large. This example shows that, to improve the degree of confirmation, it is not necessary to increase the conditional probability P(e1|h1) (for b*) or P(h1|e1) (for c*). In Example 2 of Table 9, although the proportion of positive examples is large, the proportion of counterexamples is not small, so the degree of confirmation is very small. This example shows that, to raise the degree of confirmation, it is not sufficient to increase the posterior probability. It is necessary and sufficient to decrease the relative proportion of counterexamples.

Popper affirms that a counterexample can falsify a universal hypothesis; this can be explained by saying that, for the falsification of a strict universal hypothesis, it is important to have no counterexample. Now, for the confirmation of a universal hypothesis that is not strict, or is uncertain, we can explain that it is important to have fewer counterexamples. Therefore, confirmation measures F, b*, and c* are compatible with Popper's falsification thought.

Scheffler and Goodman [35] proposed selective confirmation based on Popper's falsification thought. They believe that black ravens support "Ravens are black" because black ravens undermine "Ravens are not black". Their reason why non-black ravens support "Ravens are not black" is that non-black ravens undermine the opposite hypothesis "Ravens are black". Their explanation is very meaningful. However, they did not provide the corresponding confirmation measure. Measure c*(e 1 →h 1 ) is what they need.

Using semantic information and statistical learning methods, and taking the medical test as an example, this paper has derived two confirmation measures, b*(e→h) and c*(e→h). Measure b* is similar to measure F proposed by Kemeny and Oppenheim; like the likelihood ratio, it reflects the channel characteristics of the medical test, indicating how good a testing means is. Measure c*(e→h) is similar to the correct rate but varies between −1 and 1. Both b* and c* can be used for probability predictions; b* is suitable for predicting the probability of disease when the prior probability of disease changes. Measures b* and c* possess the symmetry/asymmetry proposed by Eells and Fitelson [14], the monotonicity proposed by Greco et al. [16], and the normalizing property (between −1 and 1) suggested by many researchers. The new confirmation measures support absolute confirmation instead of incremental confirmation. This paper has shown that most popular confirmation measures cannot help us diagnose COVID-19 infection, but measures F, b*, and similar functions of the likelihood ratio can. It has also demonstrated that popular confirmation measures do not support the conclusion that a black raven confirms "Ravens are black" more strongly than a non-black non-raven thing, such as a piece of chalk. It has shown that measure c* definitely denies the Equivalence Condition and exactly reflects the Nicod-Fisher criterion, and hence can be used to eliminate the Raven Paradox. The new confirmation measures b* and c*, as well as F, indicate that having fewer counterexamples is more important than having more positive examples; therefore, measures F, b*, and c* are compatible with Popper's falsification thought.

When the sample is small, the degree of confirmation calculated by any confirmation measure is unreliable, and hence the degree of confirmation should be replaced with a degree interval of confirmation. This calls for further studies that combine confirmation with the theory of hypothesis testing. It is also worth studying further whether the new confirmation measures can be used as Certainty Factors for belief networks.

References

Logical Foundations of Probability

Conjectures and Refutations

Studies in the Logic of Confirmation

The logical problem of induction (transl.)

The Logic of Induction

Probability and Evidence

A model of inexact reasoning in medicine

On Bayesian measures of evidential support: Theoretical and empirical issues

Measuring confirmation

Philosophical Explanations

The best explicatum for weight of evidence

Degrees of factual support

Studies in Bayesian Confirmation Theory

Symmetries and asymmetries in evidential support

Properties of rule interestingness measures and alternative approaches to normalization of measures

Can Bayesian confirmation measures be useful for rough set decision rules?

Semantic information G theory and Logical Bayesian Inference for machine learning

Wikipedia, the Free Encyclopedia

Szczech, I. Measures of rule interestingness in various perspectives of confirmation

A generalization of Shannon's information theory

A mathematical theory of communication

The semantic conception of truth and the foundations of semantics

Truth and meaning

Comparison of confirmation measures

Entailment and symmetry in confirmation measures of interestingness

Selected group-theoretic aspects of confirmation measure symmetries

Likelihood ratios as a measure of the diagnostic usefulness of excretory urogram information

How Bayesian confirmation theory handles the paradox of the ravens

What Is the Point of Confirmation?

An Outline of a Theory of Semantic Information

State of the field: Measuring information and confirmation

Semantic channel and Shannon channel mutually match and iterate for tests and estimations with maximum mutual information and maximum likelihood

A deep learning algorithm using CT images to screen for Corona Virus Disease (COVID-19)

Selective confirmation and the ravens: A reply to Foster

From certainty factors to belief networks

The author thanks Zhilin Zhang of Fudan University and Jianyong Zhou of Changsha University for discussions from which this study benefited. The author thanks Peizhuang Wang of Liaoning Technical University for his long-term support and encouragement. The author also thanks the anonymous reviewers for their comments and suggestions, which noticeably improved this paper.

The author declares no conflict of interest.