Title: Report prepared by the Montreal AI Ethics Institute (MAIEI) on Publication Norms for Responsible AI
Authors: Abhishek Gupta, Camylle Lanteigne, Victoria Heath (Montreal AI Ethics Institute; Microsoft; Algora Lab)
Date: 2020-09-15

The history of science and technology shows that seemingly innocuous developments in scientific theories and research have enabled real-world applications with significant negative consequences for humanity. In order to ensure that the science and technology of AI is developed in a humane manner, we must develop research publication norms that are informed by our growing understanding of AI's potential threats and use cases. Unfortunately, it's difficult to create a set of publication norms for responsible AI because the field of AI is currently fragmented in terms of how this technology is researched, developed, funded, etc. To examine this challenge and find solutions, the Montreal AI Ethics Institute (MAIEI) co-hosted two public consultations with the Partnership on AI in May 2020. These meetups examined potential publication norms for responsible AI, with the goal of creating a clear set of recommendations and ways forward for publishers. In its submission, MAIEI provides six initial recommendations: 1) create tools to navigate publication decisions, 2) offer a page number extension, 3) develop a network of peers, 4) require broad impact statements, 5) require the publication of expected results, and 6) revamp the peer-review process. After considering potential concerns regarding these recommendations, including constraining innovation and creating a "black market" for AI research, MAIEI outlines three ways forward for publishers: 1) state clearly and consistently the need for established norms, 2) coordinate and build trust as a community, and 3) change the approach.

• Create tools to navigate publication decisions: The use of extrinsic measurements like benchmarks, or third-party expert panels, could be a crucial step towards navigating publication decisions in a fair way. Such methods could set a certain standard in terms of the acceptable level of risk associated with a publication. In line with this suggestion, it would also be pertinent to keep a record of the papers that were rejected due to their inherent risk, as well as some metrics on these.

• Offer a page number extension: It may be beneficial to extend the page limit for published papers to allow researchers to include negative results (results that are insignificant or disprove researchers' hypotheses), which aren't traditionally printed. In addition to expanding the number of pages, there should also be a significant change in the culture surrounding the publication of negative results.

• Develop a network of peers: Developing a network of peers to evaluate researchers' AI models in terms of potential risks and benefits may be an important tool towards better and safer publication norms. If such a mechanism were put in place, the evaluation could be fully or partly based on philosopher John Rawls's "veil of ignorance." If this idea were applied to reviewing AI research, peer reviewers could be asked to consider the potential advantages and risks of new AI research from different social perspectives.
• Require broad impact statements: The NeurIPS conference requires that submitted papers include a statement of broader impact with respect to the research presented. This incentivizes researchers to think about potential risks and benefits by making reflection a requirement for one's work to be considered at NeurIPS. Similar measures at all conferences and publications would encourage researchers to critically assess their research in terms of its effects, positive and negative, on the world.

• Require the publication of expected results: Requiring that researchers write and publish the expected results of their research project (including but not limited to its broader social and ethical impacts) could help foster reflection around potential benefits and harms even before researchers undertake their project.

• Revamp the peer-review process: The well-established practice of peer review is a great opportunity for exchanges on the risks and benefits each reviewer sees in the paper they are reviewing. If a question or requirement were added to this effect when papers are reviewed, it may have a rapid and widespread impact in inciting researchers to consider what may follow from their research. An effective review process should promote limiting risks while also being clear, fair, and efficient. One way of doing this is to intensify the requirements for publication in proportion to how risky the research is deemed by the peers reviewing the paper.

The history of science and technology shows that seemingly innocuous developments in scientific theories and research have enabled real-world applications with significant negative consequences for humanity: from the eugenics movement in the late 19th and early 20th centuries to the cataclysmic nuclear destruction in Japan in 1945. This history reveals that the real-world impacts of scientific research cannot be separated from the research itself, and it's important to look at the social, cultural, political, and economic realities that shape the way science is used and the norms that regulate it. It's also important to consider the impacts these developments may have on people across the world. Often, researchers do not consult with or consider individuals that may be negatively affected by scientific developments, reflecting existing power imbalances in which a small group of privileged individuals make decisions or take risks that impact millions, if not billions, of lives. We need to be aware of these power imbalances and how they're rooted in existing inequalities; this is especially true in the field of artificial intelligence (AI). The pace and scale of impact from AI far exceed those of other technologies. Thus, critical examination is necessary, especially as this technology becomes increasingly deployed throughout society.

The aforementioned power imbalances and inequalities in scientific research are apparent in the general disconnect between the research priorities of funders (e.g. grant-making bodies, companies, governments, etc.) and the broader societal interest. The academic paradigm of "publish or perish" and the undue pressure it creates overshadow more fundamental questions that need to be asked. The most important of these is: "Why is this research project being pursued in the first place?"
This lack of critical reflection and external pressure has given rise to predatory journals with lax quality standards regarding what gets accepted for publication; this is especially an issue for researchers who are new to an academic field and are uncertain about publishing norms. For example, research in phrenology continues to be accepted in highly revered journals despite decades of precedent demonstrating that phrenology is pseudoscience. The egregious inclusion of this type of research resulted in severe backlash from scholars that led to a retraction and apology from the journal. This example highlights how fallacious research can slip through the cracks, especially without critical reflection.

In order to ensure that the science and technology of AI are developed in a humane manner, we must develop research publication norms that are informed by our growing understanding of AI's potential threats and use cases. Unfortunately, there haven't been many efforts to create a set of publication norms for responsible AI because the field of AI is currently fragmented in terms of how this technology is researched, developed, funded, etc. Thus, the norms around AI publications and ethical standards are not only fragmented, but also contradictory in many cases. A standardized approach to publication norms in AI across a large number of jurisdictions is essential.

Many subfields in AI are experiencing a boom in interest, with growing demands to produce novel research. Thus, there is an elevated risk of a lack of awareness of what adequate and rigorous publishing norms are. Additionally, there is a high degree of susceptibility to predatory journals that lure budding researchers through a "pay-to-play" model. This amplifies the potential impacts of harmful research that must be critically reviewed before being published for public consumption.

Most scientific and academic journals have particular guidelines for submissions, which are a form of publication norms. However, problematic research can still be published. For example, researchers exerted external pressure on Springer to withdraw the publication of a paper in which the authors claimed to "predict criminality" using neural networks for facial recognition. This demonstrates that the publication ecosystem requires norms with multiple filters, as well as editorial boards that have sufficient demographic diversity and a range of expertise to be able to flag problematic research and prevent its publication, especially in cases where there is potential to inflict harm on marginalized peoples. However, the prevalent practice in peer-reviewed journals is to evaluate a work purely on its scientific merit, which overlooks the inherent interaction between fundamental research and the social context in which it's conducted. Publication norms have a strong role to play in ensuring interdisciplinarity in review processes so that too narrow a focus does not allow potentially harmful work to pass as seemingly innocuous.

Beyond the social implications of research, it's also important to consider the technical implications. For example, in cryptography, it's important to find vulnerabilities in a system to improve robustness. Cryptographers do this by looking at a system from an adversarial perspective, as well as by sharing their systems openly so they can be evaluated for undue risks by as many people as possible.
For example, through this process decades ago, unnecessary risks arose from the use of substitution boxes (S-boxes) in the Data Encryption Standard (DES), which led to the subsequent discovery of differential cryptanalysis. A similarly important type of vulnerability was seen in the JBoss middleware, which shipped quietly with dozens of consumer-grade software products, often in an unpatched, or even unpatchable, version. Similarly, software which incorporates AI that relies on an unpatchable library will be vulnerable. To expose such vulnerabilities, as well as better understand the social implications of AI research, journal editors must be better equipped to identify all actors who may engage with AI research and make well-informed decisions around whether research should be published, and how. We believe developing a standardized approach to publication norms in AI across a large number of jurisdictions is the first step towards ensuring that the science and technology of AI are developed in a humane manner.

The use of extrinsic measurements like benchmarks, or third-party expert panels, could be a crucial step towards navigating publication decisions in a fair way. Such methods could set a certain standard in terms of the acceptable level of risk associated with a publication. In line with this suggestion, it would also be pertinent to keep a record of the papers that were rejected due to their inherent risk, as well as relevant metrics. For example, what subfield of AI was the paper part of? What institution or corporation were the authors affiliated with (if any)? Demographic and identity metrics may also be pertinent to help ensure the review process is not discriminatory. To this end, mechanisms like Retraction Watch that are invested in by the scientific community and held to the highest academic standards will be a cornerstone of strategies combating spurious research. Awareness of such mechanisms and their integration into tools like Google Scholar, akin to COVID-19 research warnings for non-peer-reviewed articles, is essential.

It may be beneficial to extend the page limit for published papers to allow researchers to include negative results (results that are insignificant or disprove researchers' hypotheses), which aren't traditionally printed. There is a bias towards publishing papers with positive results, that is, results that confirm or partially confirm one's hypothesis. In addition to expanding the number of pages, there should also be a significant change in the culture surrounding the publication of negative results. While a higher maximum number of pages may help, it should be accompanied by other measures to encourage researchers to share their negative results and push the broader scientific community to consider negative results in their analysis. Publishers could require the inclusion of all negative results that led the researchers to the positive results published and/or submitted. These could also be indexed through a standardized mechanism for retrieval by engines like Google Scholar so that downstream researchers are aware of what areas have already been explored. Improving the citation and valuation of such negative-results work will also incentivize the ecosystem to invest in making negative results more public and elevate the quality of scientific publishing in the space.

Developing a network of peers to evaluate researchers' AI models in terms of potential risks and benefits may be an important tool towards better and safer publication norms.
If such a mechanism were put in place, the evaluation could be fully or partly based on philosopher John Rawls's "veil of ignorance," which asks individuals to create a new society and choose its governing principles without knowing their individual characteristics (e.g. gender, race, social class, etc.). The idea is that they'll choose principles that will benefit everyone in society since they're "ignorant" of their personal circumstances and social standing. If this idea were applied to reviewing AI research, peer reviewers could be asked to consider the potential advantages and risks of new AI research from different social perspectives. Specific details about fictional but realistic situations and personas may be used to help reviewers think more accurately and relevantly about privileges and disadvantages they might not have themselves. Of course, this shouldn't replace important efforts to make the field of AI more diverse and inclusive, but the exercise of imagining how another person may be affected, positively or negatively, by an AI model is relevant for everyone. This practice has precedent in design thinking approaches and has been shown to create products and services that are ultimately more empathetic to the needs of the communities they are meant to serve, and to encourage the construction of more inclusive work.

An important method to incite researchers to reflect on the impacts of their work is to give them incentives to do so. For example, the NeurIPS conference recently added a requirement for submitted papers: they must include a statement of broader impact with respect to the research presented. This incentivizes researchers to think about potential risks and benefits by making reflection a requirement for one's work to be considered at NeurIPS, one of the most prestigious conferences in the field of AI. Therefore, similar measures at all conferences and publications may be a good way of encouraging researchers to critically assess their research in terms of its effects, positive and negative, on the world. ICLR has followed suit and has included a Code of Ethics for its 2021 edition, which is an important step towards these standards becoming mainstream rather than something that one gives a brief nod to.

It's important to note that how the research community adopts the formulation of broader impact statements is of fundamental importance. If broader impact statements become yet another "ethics stamp" on poorly conceived research, then the entire endeavour becomes counterproductive. Researchers need to critically evaluate the broader impacts of their research before starting a project and use these insights to guide a conscientious research methodology, rather than construct these perspectives after the fact. The point of writing a broader impact statement must be to reflect on the ethics of one's work, not to serve as a mechanism to avoid desk rejection.

One way of doing so is to popularize the importance of these norms in the early stages of a researcher's journey into the publishing world. During their formative years in the field, early-career researchers might be naturally drawn to emulating the behaviour of the more experienced researchers in their labs and workplaces. An educational push for holding research ethics, norms, and standards as the paramount element of doing research and development in a consequential field like AI needs to be strongly encouraged.
Principal Investigators (PIs) and other senior researchers in both academic and industry labs should shepherd those who are still growing accustomed to the modes of operation of research in the domain. The widespread presence of such a requirement could also incite greater awareness among technical researchers in the field of AI regarding fields and areas like ethics, psychology, anthropology, sociology, and critical race theory. Furthermore, making impact statements and other educational pushes necessary could spark interdisciplinary collaborations between researchers.

Requiring that researchers write and publish the expected results of their research project (including but not limited to its broader social and ethical impacts) could help foster reflection around potential benefits and harms even before researchers undertake their project. For example, a list of journalists who would be interested in hearing about the benefits of the author's thesis for the rest of society could be created, incentivizing the consideration of a project's benefits (scientists will want their papers to be promoted in newspaper articles). This would not only encourage researchers to reflect on the impacts of their work, but perhaps guide them towards generally less risky and more beneficial research. This could be similar to how many psychologists now publish a preliminary report on how they will conduct their research, the variables they will be measuring, and their hypothesis, before they perform an experiment. This prevents data manipulation in attempts to find statistically significant results. In the language of software engineering, it could create something akin to Test-Driven Development (TDD), whereby researchers preemptively highlight what they are aiming to achieve with their research and view negative results in a positive light: approaches that did not lead to the stated goals of the research nonetheless contribute to an understanding of some aspects of the field, helping to improve the knowledge base for future researchers in the domain.

Of course, having much greater diversity of researchers in the field of AI (in terms of race, gender identity, sexual orientation, geography, language, lived experiences, and socioeconomic status) could also make a significant difference in how, and how much, researchers think about the impacts of their work. It seems reasonable that, in at least some cases (if not many), the potential benefits and pitfalls of researchers' work may be more pronounced for a certain demographic, and people who are part of that demographic are more likely to be attuned to how it may impact them. Thus, having more diverse actors in the field of AI could help foster more comprehensive reflection on the potential impacts of a research project. Additionally, a framing whereby researchers take on the onus to surface these impacts, rather than placing the burden on the people who might be impacted (who often might not have adequate knowledge, resources, or abilities) to defend themselves, creates a more pro-social way forward for conducting research.

The well-established practice of peer review is a great opportunity for exchanges on the risks and benefits each reviewer sees in the paper they are reviewing. If a question or requirement were added to this effect when papers are reviewed, it may have a rapid and widespread impact in inciting researchers to consider what may follow from their research.
This could be done in a way that is similar to how the World Health Organization (WHO) highlights Dual Use Research of Concern (DURC): research in the realm of the life sciences "that is intended for benefit, but which might easily be misapplied to do harm." It would be pertinent to have a similar category for AI research, and developing guidelines for this kind of research seems crucial, both to define it and to control it, especially given that AI squarely falls under the category of a dual-use and general-purpose technology. Such guidelines might include needing special authorization to conduct this type of research, or a requirement that the research be reviewed more extensively before being published.

Further, an effective review process should promote limiting risks while also being clear, fair, and efficient. One way of doing this is to intensify the requirements for publication in proportion to how risky the research is deemed by the peers reviewing the paper. One way to go about this could be to identify a list of specific risks that any AI model proposed in a research paper could pose. These risks may have numerical weights assigned to them depending on how dangerous their consequences may become. Each peer reviewer could identify the risks they deem relevant to the AI model they are examining, and assign a score between 0 and 5 to each risk in accordance with how likely it is that the risk will materialize. The risks' weights could then be multiplied by their corresponding likelihood scores, and each of these products compiled into a sum. The sums obtained by each reviewer for one paper can then be averaged. If this "total risk average" is higher than a pre-established number, the paper could be immediately rejected, require further reviewing and discussion by a third-party group of experts, or be published under much stricter requirements than papers that fall below the "total risk average" threshold (a minimal sketch of such a scoring scheme is given after this discussion). Such a threat modelling approach is already used extensively in the field of cybersecurity to prioritize risks and vulnerabilities and to guide the efforts of researchers and practitioners working to address them.

This process, or one with a similar structure, has the advantage of remaining somewhat efficient by not subjecting papers that are considered non-risky to more time-consuming reviewing or more stringent requirements unnecessarily. Of course, a procedure like the one presented above risks wrongly identifying a model as low risk, whereas a process that held all papers to a higher standard regarding risk would avoid this. However, it's unlikely that applying more stringent standards to all papers is necessary, efficient, or realistic considering the sheer volume of papers published. A more targeted approach seems better suited to the reality of publishing in the domain of AI. A consideration that publishers can adopt is to track the rate of false positives over successive review periods and adjust as they go along to judge the efficacy of this mechanism.
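To make the arithmetic of this proposal concrete, the following is a minimal sketch in Python. It assumes a hypothetical set of risk categories, weights, a 0-5 likelihood scale, and a decision threshold; none of these values are specified in this report, so the category names, weights, and threshold below are illustrative placeholders only.

```python
from statistics import mean

# Hypothetical risk categories and 0-1 weights that a venue might define; both
# the category names and the weight values are illustrative placeholders.
RISK_WEIGHTS = {
    "enables_surveillance": 0.9,
    "dual_use_misapplication": 0.8,
    "privacy_erosion": 0.7,
    "disinformation_amplification": 0.6,
}

def reviewer_risk_score(likelihoods: dict) -> float:
    """Sum of weight x likelihood for the risks one reviewer flags (0-5 scale)."""
    return sum(RISK_WEIGHTS[risk] * rating for risk, rating in likelihoods.items())

def review_decision(all_reviewer_ratings: list, threshold: float = 6.0) -> str:
    """Average the per-reviewer sums and compare against a pre-set threshold."""
    total_risk_average = mean(reviewer_risk_score(r) for r in all_reviewer_ratings)
    if total_risk_average <= threshold:
        return "standard review track"
    # Above the threshold: reject, escalate to a third-party expert panel,
    # or publish only under stricter requirements.
    return "escalate: extended review, stricter requirements, or rejection"

# Example: two reviewers flag different risks at different likelihoods (0-5).
print(review_decision([
    {"enables_surveillance": 4, "privacy_erosion": 3},
    {"dual_use_misapplication": 5},
]))
```

In practice, a venue would need to set and periodically revisit the categories, weights, and threshold, informed by the false-positive tracking described above.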
At the risk of becoming repetitive, the cornerstone of an effective review process, as previously mentioned, will be a diverse body of reviewers from the point of view of race, gender, disability, socioeconomic status, geography, and more. Risk assessment is unlikely to be fair and comprehensive if all reviewers have similar backgrounds and experiences. This will result in direct harm to those not represented among reviewers, who are often individuals who are already marginalized and most at risk. Such diversity is something that needs to be created proactively and will not necessarily emerge organically, since reviewers are often sourced from a tightly knit network of people one is already familiar with; breaking free from that requires constant and conscious effort.

One important area where there may be some pitfalls is innovation. If publishing norms are more stringent, then cutting-edge research may be ignored or underfunded. In some cases this may be because the potential risks outweigh the benefits. However, in other cases it may be the result of a lack of awareness on the researcher's part of what those publication norms are. This could have a particularly negative impact on emerging scholars in AI from regions and countries where AI research and development is in nascent stages, especially those which have a less than proportionate representation in the scientific publications at major conferences and journals (which are mostly concentrated in the Western hemisphere). This could harm scientific diversity in the field. One can undoubtedly see this as a missed opportunity, but it's better framed as an opportunity for better work and innovation. What if we could use this as an opportunity to build AI that did more good than harm? By erring on the side of caution, we may encourage researchers to better understand the consequences of developing and deploying a certain system or application.

Understanding the tail risks of innovation is critical. The insight from fields such as cryptography is that the risks from poor research are enormous, and given the nature of AI systems, a similar level of risk is to be expected. In light of this understanding of asymmetric returns, we need to raise our standards of what constitutes "innovation," and of who gets to decide whether a new technological application constitutes positive innovation, bringing more good than harm. From this perspective, constraining innovation is not a pitfall of changing publishing norms. Instead, changing publishing norms may foster higher quality, more inclusive, and more positive innovation. It can be viewed as a mechanism to bend the field of research towards a prosocial direction, something that governments do frequently in the form of regulations to guide how innovation happens in the market.

In a similar vein, like many other scientific disciplines, machine learning and artificial intelligence have been affected by their own reproducibility crisis, where a worrying number of algorithmic research results cannot be reproduced when other data scientists run the same experiments. This is of particular concern in fields such as digital biomedicine, where faulty models for disease diagnosis and monitoring can place human lives at great risk. One key component of the problem is the absence of information about the training and evaluation code, the number of training runs required, and the datasets used (a minimal sketch of a reporting checklist addressing this gap is given after this discussion). Another part of the issue is the environment in which data scientists operate, where there is pressure to publish quickly, a reluctance to report failed replications, and a lack of computational and human resources to test every condition and fine-tune each hyperparameter. Better publication norms will not only encourage more rigour in scientific methodology, but also limit the number of cases where research is modelled on "false starts." Guarding against such false starts can help ensure that future innovation in the field is based on a set of verifiable and veritable discoveries. In an inherently stochastic domain like AI, where sheer luck may produce an experimental run that showcases the correlation a researcher wants, this degree of variability is all the more reason to advocate for higher rigour; a focus on causation over correlation is going to be essential in addition to the points mentioned above.
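To illustrate how a venue could operationalize the reporting gap described above, here is a minimal sketch in Python, assuming a hypothetical reproducibility checklist; the field names are placeholders and are not drawn from any existing venue's policy.

```python
# A minimal sketch, assuming a hypothetical submission checklist; the field
# names below are illustrative, not the policy of any actual venue.
REQUIRED_REPRODUCIBILITY_FIELDS = [
    "training_code_url",      # where the training code can be obtained
    "evaluation_code_url",    # where the evaluation code can be obtained
    "dataset_versions",       # exact datasets (and versions) used
    "num_training_runs",      # how many runs the reported results average over
    "random_seeds",           # seeds used, if results depend on them
    "hyperparameters",        # final hyperparameter settings
    "compute_budget",         # hardware and approximate compute used
]

def missing_reproducibility_info(submission_metadata: dict) -> list:
    """Return the checklist fields a submission leaves empty or omits."""
    return [field for field in REQUIRED_REPRODUCIBILITY_FIELDS
            if not submission_metadata.get(field)]

# Example: a submission that omits the number of training runs and the seeds.
example = {
    "training_code_url": "https://example.org/repo",
    "evaluation_code_url": "https://example.org/repo",
    "dataset_versions": "ImageNet-1k (ILSVRC 2012)",
    "hyperparameters": "lr=3e-4, batch=256",
    "compute_budget": "8 GPUs, ~36 hours",
}
print(missing_reproducibility_info(example))  # ['num_training_runs', 'random_seeds']
```

A simple gate of this kind would not guarantee reproducibility, but it would at least make the omissions discussed above visible to reviewers and readers.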
With more stringent norms on what gets published in the field of AI, there is a possibility that research deemed too risky will be driven underground; that is, researchers whose work is rejected may publicize their research in other ways, circumventing the measures put in place. Due to the stricter publishing norms, the rejected research may become even more dangerous as it's excluded from mainstream critique and necessary scrutiny. Put simply, stringent publishing norms may actually increase the likelihood of harmful AI by creating a "black market" for research that has subversive aims.

Of course, there are counterarguments against this scenario. First, the field of AI/ML is heavily dominated by individuals who are affiliated with universities and companies. In both cases, getting research published by a reputable journal or presented at a recognized conference is key to advancing their career. Simply putting out research without it being published or affiliated with their institution (whether academic or corporate) because it is too risky according to publishing norms is of limited use. Hence it seems unlikely that someone with the knowledge and qualifications necessary to build innovative yet risky AI would have an incentive to share it using an alternative method. There may be a greater incentive for scientists to conduct research that is more likely to be published in accordance with the new, more stringent standards. It is highly unlikely that the risks of underground research are big enough to warrant not moving towards more stringent publishing standards around security and risk in the field of AI. It's also important to note that the field's current exclusivity to those with affiliations to large and wealthy institutions is not in itself positive. We believe the field of AI should be more accessible than it currently is in that regard.

Paradoxically, there are questions as to whether it's actually more dangerous if risky AI research goes "underground" than if it's publicized and sanctioned by highly regarded institutions and publications. If an AI research submission is rejected by a publication, then it's likely that it'll receive minimal attention. However, if risky AI research were to be published by these journals and publications, it's likely it may do significant harm because people (especially academics and fellow researchers) tend to refer to these sources for the latest relevant information about progress in the field, under the assumption that the information published has been peer-reviewed. It's this same research that often gets widespread media coverage as well. This follows from the principle of using "strategic silence" as a way of limiting the oxygen that is provided to mis-, dis-, and mal-information by not offering it a platform by way of discussion, time, and resources. Furthermore, general audiences are less likely to scrutinize research sanctioned by a journal or an institution.
In other words, when risky papers are published, we may not be able to count on public scrutiny to highlight their dangerous possibilities. There have been numerous instances of popular media coverage of dubious research stemming from low-quality journals, preprint servers, and other places that had a significant impact on the public's perception of AI. As an example, research that claimed to create a "gaydar" identifying a person's sexual orientation was thoroughly debunked by leading researchers in the field of AI, but the harmful and offensive work still received significant media attention. We posit that this might be because of the highly technical nature of the field (not that this isn't the case in other domains, but there is disproportionate attention paid to advances in this field while the general public's ability to parse those advances for what they truly are may not be sufficient), which leads to an overestimation of the capabilities of the systems. The fact that such research was published, and thus endorsed by the publication or institution behind it, created a false sense of security and legitimacy for many people. Papers that are published are usually revised and reviewed, and this process, along with the metaphorical seal of approval from the publication or institution, can obfuscate the overly risky nature of the results being published. It essentially shields the paper from scrutiny because, at first glance, the paper has all the characteristics of an acceptable or even outstanding paper. Thus, getting widespread public scrutiny, especially from individuals outside the AI community, is likely to be quite difficult as nothing seems particularly questionable.

With this in mind, there is a responsibility on the part of the journal's editorial board to have experts outside of the field review papers in order to identify errors or risks in the research. Homogeneous editorial boards (i.e. those drawn from society's dominant groups) don't have the perspective required to ask the essential, critical questions that a diverse board can. In this sense, a diverse board provides additional checkpoints at which controversial work can be caught and scrutinized, before any public intervention is even required. Hence, stricter publication norms that include such a board would be desirable. In addition, one of the mechanisms trialled with NeurIPS and ICML over the last few months is the concept of ML Retrospectives (co-organized by Abhishek Gupta, one of the authors of this document), where researchers are asked to reflect on their prior work and identify shortcomings, improvements, changes, and any other comments they have on those papers, to highlight the need to revise understandings of discoveries over time as new evidence and knowledge comes to light. Normalizing the "owning" of faults in prior publications can move accountability back to authors, no matter their prestige, leading to a healthier intellectual ecosystem.

Research progress may be slowed to allow for more thorough risk evaluation. This may affect the quick rate at which AI papers are currently published. Nevertheless, pausing the influx of information on AI can provide more time to dissect, digest, and process research. Furthermore, a stricter process, and one that takes time, could discourage researchers from cutting corners, which could lead to having their paper rejected and making the process unnecessarily long. This can then help guard against unnecessary research being proposed.
One problematic trend is that of taking rejected submissions from one conference or journal, making quick, minor tweaks, and pushing them out to the next conference. This encourages a "keep trying till you win" mindset among researchers that has the potential to lower the quality of overall scholarship in the field. Making explicit where an idea was submitted before, the reasons for rejection, and how those shortcomings were addressed can serve two purposes: improving the allocation of scarce reviewing resources in the field and pushing for higher standards of transparency. Cross-linking submissions on a platform like OpenReview (acknowledging that there are many caveats to the platform) can help us move in that direction.

Some publishers may feel as if they have lost an element of control over what they publish if more stringent norms are put into place. They may also perceive these new norms as constricting what they traditionally viewed as "worthy" of scientific regard. Whether true or not, many publications perceive themselves as being apolitical, taking an explicitly agnostic stance towards scientific developments. Therefore, norms that have an arguably social and/or political bent could present a fundamental challenge and potentially give rise to an increase in the bifurcation of journals split along social, political, or other lines.

There are a host of concerns that must be accounted for in the case that an external body, such as a journal, a norm-enforcement committee, or an ethics board, oversees the release of research findings; one being the importance of equal opportunity and diverse representation in the technological sphere. This standard not only ensures that the domain of technological research remains democratic, but also prevents it from becoming an echo chamber or a site of informational homogeneity. In order to frame best publication practices for high-stakes AI-related research, we must ensure that this set of practices is predicated upon a firm anti-oppression mandate, whether this oppression is related to socioeconomic status, race, gender, ethnicity, mental or physical disability, or any other characteristic that incites unjust discrimination. Responsible publication norms, by their very nature, must not only strive to mitigate harms related to the exposure of certain research findings, but also to inhibit the epistemic threat of information suppression under the guise of best practices.

In order for a novel set of publication practices to be truly responsible in nature, it must be acknowledged that while a regulatory framework for AI research has the potential to reduce threats to public safety, it may also inadvertently serve as a site of gatekeeping, suppression, and other forms of exclusionary practice. Such a framework therefore risks consolidating epistemic injustice within the academic, technological, and political spheres of influence, the prevention of which will require a defined set of safeguarding measures. Injustice can be mitigated through the use of radical transparency, whereby the publisher maintains an open list of papers that were rejected and the reasons for rejection in a public repository that is subject to public scrutiny and analysis, preventing biases against any groups.
To ensure that reactive measures for harm mitigation (i.e. flagging papers post-publication) are not discriminatory in nature or grounded in any form of identity-based prejudice, a more robust set of flagging requirements must be established. In addition to allowing the user to select a general reason for flagging certain content, the process could be made less enticing and/or gameable to those with unfounded reasoning and discriminatory motives by requiring a more exhaustive and substantiated explanation as to why the content should be removed. To this end, making the process involved enough to prevent trolling behaviour is one proposed approach.

In the case of suspected discriminatory practice throughout the publication process, there ought to be an appeal protocol in place whereby subjects of discrimination can formalize a complaint against the relevant publication or platform and negotiate reconsideration of their research by a designated secondary screening committee. Such a protocol may include an instrument similar in nature to a checklist or to DREAD-like (Damage, Reproducibility, Exploitability, Affected users, Discoverability) scoring principles. Rather than measuring security threats, this instrument could assess and systematize the long-term personal impacts of discriminatory publication practices, whether they be social, professional, or economic in nature (a brief sketch of such an adapted instrument follows this discussion). Within a novel regulatory framework for responsible release norms, this impact checklist could be appealed to and taken up with a third-party entity in instances of publication prejudice, this entity having authority over the reconsideration of initially rejected research, along with the pursuit of certain disciplinary measures if necessary. Such an organizational framework could serve to encourage publisher accountability, caution against future discriminatory practices, and ultimately preserve epistemic justice in the domain of AI research.

More staged or closed publication practices pose a threat to democratized exposure under the guise of responsible release norms, which may result in the omission of crucial information or in gatekeeping. For instance, if research unveils potential risks of a given AI/ML model and there exists a conflict of interest between the publication platform and stakeholders invested in the model, the pretence of "alternative release strategies" could serve as something of a loophole for these stakeholders to suppress certain information and preserve their interests (whether financial, social, etc.) without being held liable for censorship. The malicious application of release norms not only entails self-interested nondisclosure, but also the suppression of projects carried out by actors who occupy marginalized identities, which could theoretically be falsely attributed to the overstated "harms" that their findings may yield, should they be disclosed publicly. Such scenarios lend themselves to the risk of epistemic discrimination against marginalized groups. They can also have a chilling effect on the field of research and promote underground attempts at recreating the systems without proper controls in place because of this obfuscation of the real impacts of the system, thus potentially leading to more harm.
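As a brief illustration of how DREAD-style scoring could be repurposed in the way described above, here is a minimal sketch in Python. Classic DREAD averages ratings across its five threat dimensions; the adapted dimensions, the 0-10 scale, and the escalation threshold below are hypothetical placeholders, not part of any established appeal protocol.

```python
from statistics import mean

# Classic DREAD averages 0-10 ratings for Damage, Reproducibility,
# Exploitability, Affected users, and Discoverability. The adapted dimensions
# below, the 0-10 scale, and the escalation threshold are hypothetical.
ADAPTED_DIMENSIONS = ("social_impact", "professional_impact", "economic_impact")

def discrimination_impact_score(ratings: dict) -> float:
    """Average the 0-10 ratings given for each long-term impact dimension."""
    return mean(ratings[dim] for dim in ADAPTED_DIMENSIONS)

def warrants_secondary_screening(ratings: dict, threshold: float = 5.0) -> bool:
    """Decide whether a complaint is escalated to the secondary screening committee."""
    return discrimination_impact_score(ratings) >= threshold

# Example: a complainant reports severe professional and economic harm.
print(warrants_secondary_screening(
    {"social_impact": 3, "professional_impact": 8, "economic_impact": 6}))  # True
```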
It's clear that additional restrictions and/or guidelines in the field of AI are required because there are risks that need to be addressed without exception. Unfortunately, however, there is no consensus on this in the field. Therefore, more individual researchers, institutions, and governments need to clearly and publicly communicate the potential societal impacts of AI research in order to convey the need for third-party regulation. In addition, more researchers and institutions supporting and replicating efforts like the NeurIPS impact statement requirement would make such considerations more standardized across the field.

Forming alliances, making connections, and building relationships are the hallmarks of effective community collaboration. One way this can be enacted is through connecting with a publication standard-setting body, whereby concerned researchers and institutions would be able to lobby for greater action on the topic and witness first-hand how publication norms are created. For example, a potential partnership with COPE could prove fruitful. COPE is an organization committed to educating and supporting editors, publishers, and those involved in publication ethics, with the aim of moving the culture of publishing towards one where ethical practices become a normal part of the publishing culture. So far, it seems they haven't done any work related to the field of AI, but perhaps they would be interested in expanding their work to account for new ethical issues arising in the technology and computer science space.

On a larger scale, a solid foundation for this collaboration within the community is the potential for an institution or process that is recognized worldwide and respected in terms of trust around technology. MAIEI's work on SECure and Green Lighting ML are potential options. These initiatives could provide a standardized comparative measure to evaluate trust in AI systems across the broad range of topics and themes contained within the community, providing a much-needed lingua franca.

As previously mentioned with COPE, the field of AI must work towards a new publication culture that prioritizes ethical practices and de-prioritizes progress for the sake of progress. To create safe and responsible AI, the core values and culture of the field must be informed by ethical practices, principles, and guidelines. Additionally, it may be important to place a greater emphasis on the application of the technology after it has been deployed, rather than simply on the publishing of information about it. For example, Canada has developed national AI standards, which regulate the application of the technology rather than the published content about said technology. This isn't to say that AI publication norms shouldn't be created and followed, but rather that they should be developed in parallel with other efforts necessary to ensure that the science and technology of AI are developed in a more humane manner.

Sources referenced above:
• Physiognomy's New Clothes
• "Gaydar" Study and the Real Dangers of Big Data
• The invention of AI 'gaydar' could be the start of something much worse. The Verge
• Committee on Publication Ethics (COPE)
• SECure: A Social and Environmental Certificate for AI Systems
• Green Lighting ML: Confidentiality, Integrity, and Availability of Machine Learning Systems in Deployment
• Canada's CIO Strategy Council publishes national AI standards