title: Matching Social Issues to Technologies for Civic Tech by Association Rule Mining using Weighted Casual Confidence
authors: Kikuchi, Masato; Shiramatsu, Shun; Kozakai, Ryota; Ozono, Tadachika
date: 2021-12-17
DOI: 10.1145/3498851.3498931

More than 80 civic tech communities in Japan are developing information technology (IT) systems to solve their regional issues. Collaboration among such communities across different regions helps them solve their problems, because some groups have limited IT knowledge and experience. Our objective is to realize a civic tech matchmaking system that assists such communities in finding better partners with IT experience relevant to their issues. In this study, as the first step toward collaboration, we acquire relevant pairs of social issues and information technologies by association rule mining. To this end, we supply a questionnaire to members of civic tech communities and obtain answers about the issues they face and the technologies available to them. Subsequently, we match the relevant issues and technologies from the answers. However, most of the issues and technologies in this questionnaire data are infrequent, and there is a significant bias in their occurrence. Consequently, it is difficult to extract truly relevant issues-technologies combinations with existing interestingness measures. Therefore, we introduce a new measure called weighted casual confidence, and show that our measure is effective for mining relevant issues-technologies pairs.

Civic tech, in which citizens in a region solve regional social issues using information technologies, has been introduced in Europe [7]. Since the Great East Japan Earthquake, many civic tech communities, which are civic groups that apply civic tech, have been established in various regions of Japan, and the number of communities is now over 80.
The issues that such communities deal with, and their technical experience and knowledge, differ from community to community. However, some regions face similar or identical social issues. For example, in the last two years, almost all regions have been facing problems related to COVID-19. In such cases, the technical experience and knowledge of one community can be used for the issues that other communities are dealing with. Moreover, cross-sectional collaboration across multiple regions is occasionally essential for solving problems. Therefore, communities need to share their knowledge and assist each other in solving issues appropriately and rapidly. To achieve collaboration among communities, the following information is necessary but insufficiently shared, which is a factor that inhibits collaboration.
• What technologies are necessary for solving a certain issue?
• Which communities are the best at these technologies?

There are two reasons why this information is difficult to share. First, it is difficult to identify which technologies are useful for unresolved issues. Hence, there are social technology officers (STOs), who are experts in matching issues and technologies. However, hiring STOs is expensive for civic tech communities. Second, it is difficult to determine manually the skill levels of all communities in the issues and technologies, because there are many communities in Japan. Therefore, our final objective is to build a system that identifies the technologies required to solve issues and subsequently recommends the best communities for collaboration.

In this study, we match social issues with the information technologies required to solve them, as a first step toward the realization of the abovementioned recommendation system. We supply a questionnaire to members of civic tech communities and obtain answers on their social issues and the information technologies they use.
However, the correspondence between the issues and the technologies is unclear from the answers. Therefore, to match them appropriately, we use association rule mining to acquire such pairs based on their co-occurrence in the answers. Most of the association rules generated from the answers are infrequent and need to be dealt with carefully. There is also a significant bias in the occurrence of technologies: many technologies are mentioned rarely, whereas a few technologies are specified in most answers. To avoid these problems, we propose a novel measure called weighted casual confidence (WCC). One approach to deal with the occurrence bias is using negative evidence. Casual confidence [6] was proposed as a measure that uses negative evidence. However, because it treats positive and negative evidence equivalently, it mines many infrequent and unreliable rules. In comparison, the proposed measure treats the two types of evidence with different weights. Moreover, our measure uses conservative confidence [4], which underestimates the interestingness of a rule if it has a low frequency. This allows dealing with low-frequency rules, which are ignored in general mining problems. Consequently, our measure can mine issues-technologies pairs that frequently co-occur only with each other. We conduct two experiments on the mining results by supplying questionnaires to respondents familiar with the technologies, and show the effectiveness of our measure in terms of the relevance of the issues and the technologies and their usefulness for knowledge discovery.

This section describes related studies on analyzing questionnaire data based on association rule mining and on supporting collaboration among people. Association rule mining has been used in various data analyses, including the analysis of questionnaire data. For example, Chen et al. [2] and Maduako et al. [8] proposed mining approaches for dealing with different data types (e.g., open-ended answers, selective answers) of a questionnaire.
In our mining problem, as described in Section 3, we only use selective answers. Specifically, we deal with only one data type. Our questionnaire data form a transaction database, in which each transaction is a set of selective answers of a respondent. Therefore, our mining problem can be formulated in the general association rule mining framework. However, if we use the other answers of different data types for mining, we believe that these studies will be helpful. Several types of research have been conducted to support collaboration, including civic tech communities. Tossavainen et al. [10] developed a matching system for social issues and social goals to support sustainable collaboration among civic tech communities. Furthermore, Shiramatsu et al. [9] conducted a practical research using this system in civic tech community events. Horita et al. [3] studied designing a workshop to promote collaboration between researchers and citizens. In comparison, we deal with a problem of matching different perspectives of social issues and information technologies for the objective of supporting collaboration. In addition to the pairs of issues and technologies, the matching results of social issues and social goals by Tossavainen et al. may be used to provide better collaborative support. Moreover, we believe that the matching of issues and technologies will be useful for supporting collaboration between researchers and citizens. In our mining problem, we use a database based on the questionnaire data prepared in advance. The questionnaire is designed to visualize the characteristics and specialties of the civic tech communities easily. Although the questionnaire contains various questions, we use only the answers to the two questions mentioned in Tables 1 and 2 . We supply the questionnaire to 49 people from 47 civic tech communities in Japan and obtain answers about their social issues and the information technologies they use. 
An example answer is expressed as

a_i = (I_i^issue, I_i^tech) = ({sightseeing, transportation, living}, {open data, GIS and geospatial information}),

where a_i represents an answer, and i is an index that distinguishes the answers. I_i^issue and I_i^tech are the sets of issues and technologies in a_i, respectively. The range of i is 1 ≤ i ≤ N, and N = 49 because 49 people answered the questionnaire. In the above example, a community deals with three issues: "sightseeing," "transportation," and "living," and uses two technologies: "open data" and "GIS and geospatial information." In this paper, we denote a set of such answers as a database D = {a_i} (i = 1, ..., N) and each a_i as a transaction. In the next section, we formulate the problem of matching high-relevance issues and technologies from their possible combinations in the a_i as association rule mining. Accordingly, we obtain high-relevance issues-technologies pairs with statistical evidence.

From the questionnaire data described in the previous section, we match social issues with the information technologies required to solve them. We formulate this problem as an association rule mining problem. Association rule mining is a well-known data analysis method to obtain frequent patterns and combinations of high-relevance patterns in a transaction database. By association rule mining, we can obtain high-relevance issues-technologies pairs based on statistical evidence from numerous combinations. We represent an association rule as X ⇒ Y, where X is a set of items related to the social issues and Y is a set of items related to the information technologies. The items are the answer choices listed in Tables 1 and 2 (e.g., "COVID-19," "cultural exchange," "GIS and geospatial information"). Suppose that the sets of all items for the issues and technologies are I^issue and I^tech, respectively; then X ⊆ I^issue and Y ⊆ I^tech are satisfied. The rule X ⇒ Y suggests that if the condition X is satisfied, then Y is also satisfied.
For example, the following rule indicates that the technology of "GIS and geospatial information" is used for the issues of "COVID-19" and "Traffic":

{COVID-19, Traffic} ⇒ {GIS and Geospatial Information}.

Association rule mining identifies rules in which the relationship between X and Y is frequently satisfied (specifically, rules with strong "interestingness"). Interestingness measures, which quantify the interestingness of rules, are described in Section 5.

Before measuring interestingness, we must enumerate the combinations of possible pairs ⟨X, Y⟩. When there are only two choices each for the issues and the technologies, it is easy to enumerate all pairs. However, in our case, as can be seen from Tables 1 and 2, |I^issue| = 38 and |I^tech| = 16, and enumerating all pairs leads to a combinatorial explosion. Therefore, we enumerate all pairs ⟨X, Y⟩ with X occurring in two or more answers. This reduces the total number of pairs to be enumerated to 7,733,793. Note that in general association rule mining, a threshold is set for the frequency of the pair ⟨X, Y⟩. Such a threshold is too stringent for our case. Instead, we set our threshold on the frequency of X so that many pairs are enumerated, because our database is small and most of the rules are infrequent.

In Algorithm 1, the dataset D described in Section 3 is the input, and its outputs are the issues-technologies pairs ⟨X, Y⟩ and their frequencies f(X), f(Y), and f(X ∪ Y), which are the numbers of answers in which X, Y, and X ∪ Y occur, respectively. Combi(·) is a function that takes a set as an argument and returns the set of all combinations among the elements. The arguments, I_i^issue and I_i^tech, are the sets of social issues and information technologies in the i-th answer a_i, respectively. Therefore, the returned values C_i^issue and C_i^tech are the combinations of the issues and technologies generated from a_i, respectively.
For example, if I_i^tech = {Open Data, SNS}, the value C_i^tech returned by Combi(I_i^tech) is as follows:

C_i^tech = {{Open Data}, {SNS}, {Open Data, SNS}}.

This algorithm first enumerates the possible combinations among the issues and among the technologies from each answer in D, respectively. Subsequently, it counts their frequencies f(X) and f(Y). Finally, it enumerates the pairs ⟨X, Y⟩ that satisfy f(X) ≥ 2 and counts their co-occurrence frequencies f(X ∪ Y). We use these frequencies to measure the interestingness of the association rules.

Association rule mining measures the interestingness of each rule and mines the rules with strong interestingness. Therefore, it is important to measure interestingness appropriately for successful mining. In this section, we first introduce existing measures [1, 4, 6] and describe the problems of using them in our mining. Subsequently, we propose the measure WCC to mitigate the problems.

The classical association rule mining method, the Apriori algorithm [1], estimates the interestingness of a rule X ⇒ Y as

Conf(X ⇒ Y) = f(X ∪ Y) / f(X),

where f(X) is the number of transactions containing X and f(X ∪ Y) is the number of transactions containing both X and Y. Note that X ∩ Y = ∅. The above equation yields the maximum likelihood estimate of the conditional probability P(Y|X), which is called confidence. Although confidence is easy to calculate, it can be unreasonably large for low-frequency rules. For example, suppose we obtain the database summarized in Table 3. If we measure the interestingness of two rules from this database with confidence, the estimates of both are 0.500, as listed in Table 4. However, the items of the first rule co-occur thrice, whereas the items of the second rule co-occur only once, suggesting that the co-occurrence in the second rule may be coincidental. Therefore, the Apriori algorithm introduces a threshold called minimum support (minsup) for the co-occurrence frequency and calculates the confidence for only those rules that satisfy f(X ∪ Y) ≥ minsup. In general association rule mining, the number of rules is much larger than in our case.
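The enumeration performed by Algorithm 1 and the confidence estimate can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the function names `combi`, `enumerate_pairs`, and `confidence`, the representation of an answer as an (issues, technologies) tuple, and the toy database are all assumptions made for this example. The sketch also assumes that Combi(·) returns only non-empty combinations and that only pairs co-occurring in at least one answer are enumerated.

```python
from collections import Counter
from itertools import combinations


def combi(items):
    """Return every non-empty combination of `items` as a frozenset (assumed Combi)."""
    items = sorted(items)
    return [frozenset(c)
            for r in range(1, len(items) + 1)
            for c in combinations(items, r)]


def enumerate_pairs(database, min_x_freq=2):
    """Count f(X), f(Y) and f(X ∪ Y) over a list of (issues, technologies) answers.

    Only pairs whose issue side X occurs in at least `min_x_freq` answers are kept.
    """
    f_x, f_y = Counter(), Counter()
    # First pass: frequencies of every issue combination and technology combination.
    for issues, techs in database:
        f_x.update(combi(issues))
        f_y.update(combi(techs))

    f_xy = Counter()
    # Second pass: co-occurrence counts for pairs with a sufficiently frequent X.
    for issues, techs in database:
        for x in combi(issues):
            if f_x[x] < min_x_freq:
                continue
            for y in combi(techs):
                f_xy[(x, y)] += 1
    return f_x, f_y, f_xy


def confidence(f_x, f_xy, x, y):
    """Maximum likelihood estimate of P(Y|X): f(X ∪ Y) / f(X)."""
    return f_xy[(x, y)] / f_x[x]


# Hypothetical toy database with three answers (not the questionnaire data).
toy_db = [
    ({"sightseeing", "transportation"}, {"open data", "GIS"}),
    ({"sightseeing"}, {"open data"}),
    ({"transportation"}, {"SNS"}),
]
f_x, f_y, f_xy = enumerate_pairs(toy_db)
x, y = frozenset({"sightseeing"}), frozenset({"open data"})
print(confidence(f_x, f_xy, x, y))
```

Because `combi` grows exponentially with the number of items in an answer, this sketch is only practical for small answers; the threshold on f(X) limits the number of pairs that are counted, not the cost of generating the combinations.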
Therefore, by introducing minsup, we can reduce the number of rules whose interestingness must be measured and mitigate the problem of calculating the confidence for low-frequency rules.

Our database consists of 49 answers (transactions). Specifically, the maximum frequency that can be used for estimating the interestingness is 49, and most of the generated rules are infrequent. In this case, it is difficult to determine the optimal minsup, and a small variation in minsup eliminates numerous rules. Therefore, Kikuchi et al. [4] proposed a measure that weakly (conservatively) estimates the interestingness of a rule depending on its low frequency, instead of ignoring the low-frequency rules. We call this measure "conservative confidence" (denoted as Conf ℓ). Conf ℓ first constructs a confidence interval for P(Y|X) using the method of Kikuchi et al. [5], and subsequently uses its lower bound as the interestingness score. This measure has a confidence coefficient 1 − α, where 0 < α < 1 is a parameter. We set the coefficient as 0.99 (α = 0.01) and use the lower bound of the one-sided 99% confidence interval. As listed in Table 4, the scores for the two rules compared above, which both have a confidence of 0.500, are 0.142 and 0.059, respectively. Therefore, using Conf ℓ, we can preferentially identify high-frequency and reliable relationships and subsequently find low-frequency but frequently co-occurring relationships.

However, it is not always possible to obtain truly relevant relationships even using this measure. Consider two further rules that both have f(X) = 4 and f(X ∪ Y) = 3; therefore, both are scored as 0.222. However, the consequent item of one rule occurs only in the transactions that contain its antecedent item, whereas the consequent item of the other rule frequently occurs in the transactions that do not contain its antecedent item. In the latter rule, the antecedent and the consequent may not be relevant. Nevertheless, Conf ℓ yields large estimates for such rules. Table 2 shows that "open data" occurs in as many as 40 of the 49 answers. Therefore, the use of Conf ℓ may result in the mining of many irrelevant issues-technologies pairs, including "open data."

One approach for mitigating the problem of Conf ℓ is using negative evidence (i.e., the frequency of the transactions in which the items do not occur). Casual confidence [6] was proposed as a measure for using negative evidence and is defined as

Casual-Conf(X ⇒ Y) = (1/2) {P(Y|X) + P(¬X|¬Y)},   (1)

where P(¬X|¬Y) represents the probability that itemset X does not occur in a transaction under the condition that itemset Y does not occur in the transaction. We estimate the two probabilities in the above equation using conservative confidence. We set the confidence coefficients as 0.99. P(¬X|¬Y) is estimated from the observed frequencies f(¬Y) and f(¬X ∪ ¬Y). These frequencies can be obtained as

f(¬Y) = N − f(Y),
f(¬X ∪ ¬Y) = N − f(X) − f(Y) + f(X ∪ Y),

respectively, where N is the total number of transactions in the database.

We aim to acquire rules X ⇒ Y in which issues X and technologies Y frequently co-occur only with each other. Therefore, instead of treating P(Y|X) and P(¬X|¬Y) equally, we should pay attention to P(Y|X). To achieve this, we propose a measure that takes a weighted average of P(Y|X) and P(¬X|¬Y), which is called WCC. WCC is defined as

WCC(X ⇒ Y) = (1/2) {w P(Y|X) + (2 − w) P(¬X|¬Y)},   (2)

where 0 < w < 2 is a weight parameter that adjusts the balance between P(Y|X) and P(¬X|¬Y). When w = 1, this measure is equivalent to Casual-Conf, as expressed in Eq. (1). In this study, we set w = 1.6 to emphasize P(Y|X). We estimate the two probabilities in the above equation using conservative confidence. We set the confidence coefficients as 0.99. The scores of the example rules are summarized in Table 4.

This section describes our experiments for identifying effective interestingness measures for matching the social issues and the information technologies. Because there are no ground truth issues-technologies pairs, it is difficult to quantitatively evaluate the measures. Hence, we conducted two experiments using questionnaires. In the first experiment, we first performed association rule mining using the three measures other than Conf described in Section 5. Accordingly, we prepared three lists of high-scoring rules, one for each measure.
Subsequently, we supplied a questionnaire on these lists to approximately rank the measures. Note that because Conf ℓ and WCC tend to assign similar scores to rules, the rule lists for these measures were expected to be similar. Therefore, in the second experiment, we conducted a detailed comparison of the rules scored by Conf ℓ and WCC. Specifically, we randomly showed a two-rule set to the respondents from the rule lists and subsequently supplied a questionnaire on the usefulness of the rules. In this experiment, the number of rules we used and the number of respondents are larger than in the first experiment. We showed the lists of rules generated for each measure to the respondents and supplied a questionnaire asking about the superiority or inferiority of the measures. The objective of this experiment was to determine the approximate superiority or inferiority of the measures. The experimental procedure was as follows. First, we used Algorithm 1 to generate the association rules and calculate their frequencies. Next, we scored the rules using the three measures: Conf ℓ , Casual-Conf, and WCC. We extracted the top 30 rules with high scores for each measure, and listed these rules in descending order of scores. Finally, we asked the respondents to compare these three lists and to answer the questions about the superiority and inferiority of the measures and their differences. In this questionnaire, we hid the measure names and notated the measures as methods A, B, and C. The 24 respondents consisted of research students, fourth-year undergraduate students, and graduate students majoring in computer science at a science and engineering university in Japan. The questionnaire contained the following five questions: Q1-1: Please rank methods A, B, and C. 
You should choose a method like the following first: a method seems to generate the most plausible and useful pairs of social issues and information technologies Q1-2: Please indicate the difference between the first and second methods in terms of the relevance between the issues and technologies in each pair. Q1-3: Please indicate the difference between the first and second methods in terms of the usefulness for each pair of issues and technologies. Q1-4: Please indicate the difference between the second and third methods in terms of the relevance between the issues and technologies in each pair. Q1-5: Please indicate the difference between the second and third methods in terms of the usefulness for each pair of issues and technologies. In this questionnaire, we asked the respondents to rank the measures and subsequently to indicate the differences in the superiority or inferiority of the measures. They had the following four choices as the difference: (1) almost same, (2) slightly different, (3) different, or (4) largely different. First, we focus on Table 5 , which provides the answers to the ranking of the measures (Q1-1). As listed in the column of the first rank, 12 and 10 respondents answered WCC and Conf ℓ , respectively. Thus, WCC is better than Conf ℓ , with a difference of only two respondents. In contrast, in the column of the third rank, 18 out of 24 respondents answered Casual-Conf, indicating that many determined it to be inferior to the other two measures. Next, we focus on the differences among the measures. We received various answers; however, owing to space limitations, we cannot include all of them. Therefore, we limit our discussion to the two most prominent choices in Q1-1 (highlighted in gray in Table 5). Tables 6 and 7 list the answers of Q1-2 to Q1-5 for the two choices, respectively. For each table, we first focus on the differences between the first and second measures. 
These represent the differences between Conf ℓ and WCC, and the majority of respondents answered "almost same" and "slightly different" in terms of both relevance and usefulness. Subsequently we focus on the differences between the second and third measures. These represent the differences between Conf ℓ and Casual-Conf or the difference between WCC and Casual-Conf. For the differences between Casual-Conf and Conf ℓ or WCC, the majority of respondents answered "different" in all cases, except Q1-4, based on Table 7 . From Table 7 , we can see that two respondents answered "largely different" for the difference between Conf ℓ and Casual-Conf. We summarize the results of the questionnaire. In terms of the effectiveness of the measures in matching the social issues and the information technologies, many respondents answered that WCC or Conf ℓ as the best (12 and 10 respondents, respectively), whereas many answered Casual-Conf as the worst (18 respondents). As for the differences among the measures, WCC and Conf ℓ were close, whereas Casual-Conf was considered inferior to these two measures by many respondents. We randomly selected a two-rule set awarded high scores by Conf ℓ or WCC or both, showed it to the respondents, and asked them to compare the goodness or badness of the two rules. We conducted experiments to clarify the difference in the superiority and inferiority of Conf ℓ and WCC. Because it was difficult for the respondents to provide an absolute rating for each rule, we asked them to assign a relative rating between the two rules. The experimental procedure was as follows. First, from the scored rules, we created two lists of 100 rules awarded high scores by Conf ℓ and WCC, respectively. Next, we took the union of the rule lists. Consequently, there were 63 rules contained in both lists, and after removal of duplication, the union size was 137. 
Here, we assigned unique labels to the rules according to whether they were contained only in the list of Conf ℓ, only in the list of WCC, or in both lists. We used these labels only to analyze the answers and did not show them to the respondents. Finally, we randomly selected a two-rule set with different labels from this union and showed it to the respondents so that they could compare the goodness or badness of the two rules.

We conducted a crowdsourced questionnaire of people with experience in information technology and received 817 answers. Here, experience in information technology means experience in learning information technology or engaging in work related to information technology. Before the questionnaire, we explained in writing to the respondents that our final objective was to create a recommendation system for information technologies required for solving social issues. Next, we presented the respondents with pair 1 ⟨X_1, Y_1⟩ and pair 2 ⟨X_2, Y_2⟩ of issues and technologies that constitute the two-rule set {X_1 ⇒ Y_1, X_2 ⇒ Y_2}, and asked them to compare the usefulness of these pairs in the recommendation system.

Q2: Please compare the usefulness of pairs 1 and 2 for the recommendation system and choose the appropriate one from the following choices:
(1) Both of them are useful. In addition, pair 1 is more useful.
(2) Both of them are useful. In addition, pair 2 is more useful.
(3) Pair 1 is as useful as pair 2.
(4) Pair 1 is useful, but pair 2 is not.
(5) Pair 2 is useful, but pair 1 is not.
(6) Both of them are not useful.

Table 8 summarizes the aggregate results of the answers based on the combinations of the labels for pairs 1 and 2. The labels represent the measures for the lists that contained the pairs. For ease of understanding the results, we replaced pairs 1 and 2 with the corresponding labels in the choices. For each comparison between the labels, the most prominent choice, the number of respondents who selected the choice, and the percentage of the respondents for the six choices are highlighted in gray.
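The labeling and pair-selection procedure of experiment 2 can be sketched as follows. This is an illustrative sketch only: the function names, the label strings, and the placeholder rule identifiers are hypothetical, and rejection sampling is merely one simple way to draw a two-rule set with different labels; the source does not state how the random selection was implemented.

```python
import random


def label_rules(top_conf, top_wcc):
    """Map each rule in the union of two top-k lists to the list(s) containing it."""
    conf_set, wcc_set = set(top_conf), set(top_wcc)
    labels = {}
    for rule in conf_set | wcc_set:
        if rule in conf_set and rule in wcc_set:
            labels[rule] = "both"      # corresponds to the label {Conf ℓ, WCC}
        elif rule in conf_set:
            labels[rule] = "conf_l"    # only in the Conf ℓ list
        else:
            labels[rule] = "wcc"       # only in the WCC list
    return labels


def sample_two_rule_set(labels, rng=random):
    """Draw two distinct rules at random, retrying until their labels differ.

    Assumes at least two different labels are present; otherwise this never returns.
    """
    rules = sorted(labels)
    while True:
        first, second = rng.sample(rules, 2)
        if labels[first] != labels[second]:
            return first, second


# Placeholder rule identifiers reproducing the reported sizes:
# two lists of 100 rules sharing 63 rules give a union of 137 rules.
top_conf = [f"rule{i}" for i in range(100)]
top_wcc = [f"rule{i}" for i in range(37, 137)]
labels = label_rules(top_conf, top_wcc)
print(len(labels), sum(1 for v in labels.values() if v == "both"))
```

With these sizes, 37 rules carry the Conf ℓ-only label, 37 the WCC-only label, and 63 the shared label, which matches the union size of 137 given in the text.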
From the overall comparison results between the two labels, we can see that the percentage of respondents who judged both pairs to be useful (i.e., who chose one of the first three of the six choices) is approximately 80%. In the comparison between Conf ℓ and {Conf ℓ , WCC}, the number of respondents who answered that "{Conf ℓ , WCC} is more useful" is approximately twice the number who answered that "Conf ℓ is more useful" (104 and 53, respectively). In the comparison between Conf ℓ and WCC, the number of respondents who answered that "WCC is more useful" is approximately twice the number who answered that "Conf ℓ is more useful" (79 and 38, respectively). In the comparison between {Conf ℓ , WCC} and WCC, the number of respondents who answered that "WCC is more useful" is approximately 1.4 times the number who answered that "{Conf ℓ , WCC} is more useful" (109 and 79, respectively).

We summarize the results of the questionnaire. We asked whether the rules assigned high scores by Conf ℓ or WCC or both were useful for the recommendation system. Approximately 80% of the respondents answered that these rules were useful. The rules awarded high scores only by WCC tended to be more useful than the other rules. The rules assigned high scores by both Conf ℓ and WCC tended to be more useful than those assigned high scores only by Conf ℓ . Overall, we confirmed the effectiveness of WCC.

In experiment 1, we could not observe noticeable differences in the superiority or inferiority of Conf ℓ and WCC. We consider that this is owing to the similarity of the rule lists of both measures. We set the weight parameter w of WCC as 1.6. In this setting, the scores of WCC are close to those of Conf ℓ . Consequently, of the 30 rules contained in the list of WCC, more than half (18 rules) are also contained in the list of Conf ℓ . Hence, we infer that this made it difficult for the respondents to differentiate between WCC and Conf ℓ .
In contrast, the rules in the list of Casual-Conf do not overlap with the rules in the other two lists. However, the list contains only rules that occur thrice, and the occurrence of these rules may be coincidental. Therefore, Casual-Conf was determined to be inferior to the other two measures.

In experiment 2, the rules assigned high scores by only WCC tend to be more useful than the other rules. To clarify the reason, we categorized the rules by their labels and compared them. We found that more than 90% of the rules labeled as Conf ℓ and {Conf ℓ , WCC} contained only "GIS and geospatial information" or "open data" or both in technologies Y. These two technologies are chosen by numerous communities in the questionnaire, as shown in Section 3, and it is difficult to clarify their relationship with specific issues. Therefore, we consider the rules awarded high scores by Conf ℓ to be highly general and not useful. However, the rules labeled as WCC contain technologies such as "Wikipedia and Wikidata," "IoT," "visualization," "SNS," and "programming," indicating that our proposed measure, WCC, can present more diverse rules than Conf ℓ . Based on the above, we consider that WCC is the superior measure.

As expressed in Eq. (2), WCC adjusts the impact of the two probabilities, P(Y|X) and P(¬X|¬Y), on the estimates by the weight parameter w. In our experiments, we set w as 1.6 to acquire positive association rules X ⇒ Y. However, if we set w smaller than 1, WCC can acquire negative association rules ¬Y ⇒ ¬X. In addition, WCC can deal with low-frequency rules without ignoring them, by underestimating their interestingness. However, WCC has two hyperparameters: the confidence coefficient 1 − α and the weight w. Our future study will be on automatically setting these parameters depending on the mining objective and the properties of the rules.

Table 8: Comparison results of two rules with different labels. {Conf ℓ , WCC} is the label assigned to a rule existing in both lists.
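The role of the weight parameter and of negative evidence can be sketched as follows, assuming that Eq. (2) takes the weighted-average form (w·P(Y|X) + (2 − w)·P(¬X|¬Y)) / 2, which reduces to Casual-Conf at w = 1. The function names are hypothetical. The conservative estimator of Kikuchi et al. [4, 5] is not reproduced here: `wilson_lower`, a one-sided Wilson score lower bound, is swapped in purely as a stand-in for a lower confidence bound, so its scores will not match those in Table 4.

```python
from statistics import NormalDist


def mle(k, n):
    """Plain relative frequency k / n (0.0 when n is 0)."""
    return k / n if n else 0.0


def wilson_lower(k, n, coefficient=0.99):
    """One-sided Wilson lower bound for k successes in n trials.

    A stand-in for a conservative estimate, NOT the estimator of [4, 5].
    """
    if n == 0:
        return 0.0
    z = NormalDist().inv_cdf(coefficient)
    p = k / n
    centre = p + z * z / (2 * n)
    margin = z * ((p * (1 - p) / n + z * z / (4 * n * n)) ** 0.5)
    return max(0.0, (centre - margin) / (1 + z * z / n))


def wcc(f_x, f_y, f_xy, n, w=1.6, estimate=wilson_lower):
    """Weighted average of P(Y|X) and P(not X | not Y); w = 1 gives Casual-Conf."""
    if not 0 < w < 2:
        raise ValueError("w must satisfy 0 < w < 2")
    f_not_y = n - f_y                      # transactions without Y
    f_not_x_not_y = n - f_x - f_y + f_xy   # transactions with neither X nor Y
    p_pos = estimate(f_xy, f_x)            # estimate of P(Y|X)
    p_neg = estimate(f_not_x_not_y, f_not_y)  # estimate of P(not X | not Y)
    return 0.5 * (w * p_pos + (2 - w) * p_neg)


# Two hypothetical rules with f(X) = 4 and f(X ∪ Y) = 3 among N = 49 answers:
# a consequent that occurs in 40 answers versus one that occurs in only 3.
print(wcc(4, 40, 3, 49), wcc(4, 3, 3, 49))
```

Both hypothetical rules have the same P(Y|X) estimate, so any difference in their scores comes from the negative-evidence term, which is smaller when the consequent also appears in many answers that do not contain the antecedent. Passing `estimate=mle` shows the same formula with unadjusted relative frequencies.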
Moreover, even WCC assigns high scores to the highly general relationships containing "open data." Therefore, WCC can be further improved to eliminate these rules. Furthermore, the effectiveness of WCC shown in our experiments is the result of subjective evaluation by questionnaires. Although the correctness of the rules we treated cannot be evaluated in a completely quantitative manner, it may be possible to evaluate the tendency of their correctness quantitatively. For example, by collecting source codes and related documents published by civic tech communities using crawlers and analyzing them, we may be able to link issues and technologies. Quantitative evaluation of the tendency for the correctness of rules is necessary to support collaboration among communities, and is another future study.

In this paper, to support collaboration among civic tech communities, we presented a method to find high-relevance pairs of social issues and information technologies using questionnaire data on the issues faced by community members and their available technologies. Based on the co-occurrence of issues and technologies in the questionnaire data obtained in advance, we formulated an association rule mining problem to extract the relevant issues and technologies. However, most of the rules were infrequent, and there was a significant bias in the occurrence of some issues and technologies. Here, Conf ℓ [4] cannot deal with the occurrence bias, and Casual-Conf [6] mines infrequent and unreliable rules. Therefore, we proposed WCC, which can mitigate both problems encountered by the above measures. We conducted two experiments on the mining results using questionnaires. In experiment 1, we showed the rule lists for each measure to 24 university students and asked them to rank the measures. Many answered that WCC or Conf ℓ was the best (12 and 10 students, respectively), whereas 18 students answered that Casual-Conf was the worst.
In experiment 2, we showed a two-rule set assigned high scores by Conf ℓ or WCC or both to crowd workers and asked them to compare their usefulness for knowledge discovery. Approximately 80% of them answered that the two-rule set was useful. In the comparisons of WCC with Conf ℓ and {Conf ℓ , WCC}, more of them answered that the rules of WCC were more useful than answered the opposite (79 and 109 workers, respectively). In the comparison of {Conf ℓ , WCC} with Conf ℓ , more of the respondents answered that the rules of {Conf ℓ , WCC} were more useful than answered that those of Conf ℓ were (104 workers). Overall, we confirmed the effectiveness of WCC.

[1] Fast algorithms for mining association rules
[2] Mining fuzzy association rules from questionnaire data. Knowledge-Based Systems
[3] A design of research group workshop to generate co-creation between researchers and citizens
[4] Using conservative estimation for conditional probability instead of ignoring infrequent case
[5] Confidence interval of probability estimator of Laplace smoothing
[6] Comparing machine learning and knowledge discovery in databases: An application to knowledge discovery in texts
[7] Open data and civic apps: First-generation failures, second-generation improvements
[8] Building k-partite association graphs for finding recommendation patterns from questionnaire data
[9] Towards continuous collaboration on civic tech projects: Use cases of a goal sharing system based on linked open data
[10] A linked open data based system utilizing structured open innovation process for addressing collaboratively public concerns in regional societies

This work was supported in part by JSPS KAKENHI Grant Numbers JP19K12266, JP17K00461.