key: cord-0613359-894xo302
authors: Silva, Daswin De; Alahakoon, Damminda
title: An Artificial Intelligence Life Cycle: From Conception to Production
date: 2021-08-30
journal: nan
DOI: nan
sha: 5233deca38adab9b8c9762edfca3834197030ad0
doc_id: 613359
cord_uid: 894xo302

Drawing on our experience of more than a decade of AI in academic research, technology development, industry engagement, postgraduate teaching, doctoral supervision and organisational consultancy, we present the 'CDAC AI Life Cycle', a comprehensive life cycle for the design, development and deployment of Artificial Intelligence (AI) systems and solutions. It consists of three phases, Design, Develop and Deploy, and 17 constituent stages across the three phases from conception to production of any AI initiative. The 'Design' phase highlights the importance of contextualising a problem description by reviewing public domain and service-based literature on state-of-the-art AI applications, algorithms, pre-trained models and, equally importantly, ethics guidelines and frameworks, which then informs the data, or Big Data, acquisition and preparation. The 'Develop' phase is technique-oriented, as it transforms data and algorithms into AI models that are benchmarked, evaluated and explained. The 'Deploy' phase evaluates computational performance, which then informs pipelines for model operationalisation, culminating in the hyperautomation of a process or system as a complete AI solution that is continuously monitored and evaluated to inform the next iteration of the life cycle. An ontological mapping of AI algorithms to applications, followed by an organisational context for the AI life cycle, are further contributions of this article.

Artificial Intelligence (AI) is transforming the constitution of human society: the way we live and work.
AI is enabling new opportunities for strategic, tactical and operational value creation in all industry sectors and disciplines, including commerce, sciences, engineering, humanities, arts, and law [1]. Private investment in AI was US$70 billion in 2019, of which start-ups accounted for $37 billion, leading up to worldwide revenues of approximately $150 billion in 2020, and this is likely to surpass $300 billion in 2024 [2]. In parallel to commercial interests, AI must be leveraged for social impact, with initiatives such as AI4SG [3] focusing on interdisciplinary partnerships for AI applications that achieve the United Nations' Sustainable Development Goals (SDG) [4] and promote public interest technologies for social welfare [5]. Globally, the signs of an AI race are evident, as the world's superpowers contend for leadership in AI research, investment, patents, products and practice [2, 6, 7], as well as in the contentious pursuit of an Artificial General Intelligence [8, 9]. Autonomous weapons, social exclusion, microtargeted disinformation, excess energy consumption, underpaid click-work, and technological unemployment are complex and imminent challenges of AI that require global collaboration, not competition. Fortunately, academia, industry and governments alike are affording equal importance to AI, data and robot ethics, through initiatives such as the lethal autonomous weapons pledge [10], IEEE Ethically Aligned Design (EAD) [11] and the Ethics Guidelines for Trustworthy AI by the High-Level Expert Group on AI (HLEG) [12]. Despite these rapid advances, we discovered that existing work on AI life cycles and methodologies either does not provide comprehensive coverage from conception to production, or is limited in the level of detail of each individual phase.

D. De Silva and D. Alahakoon, "An Artificial Intelligence Life Cycle: From Conception to Production," arXiv Preprint, 2021.
Drawing on this context of the increasing impact, influence and thereby importance of AI across national, social, economic and personal interests, we present the CDAC AI life cycle that characterises the design, development and deployment of AI systems and solutions. CDAC is the acronym of our research centre, the Centre for Data Analytics and Cognition. This life cycle is informed by our experience and expertise at CDAC of more than a decade of AI, across academic research, technology development, industry engagement, postgraduate teaching, doctoral supervision and organisational consultancy. A few highlights from our recent work are: Bunji, an empathic chatbot for mental health support [13], solar nowcasting for optimal renewable energy generation [14], emotions of COVID-19 from self-reported information [15], machine learning for online cancer support [16, 17], self-building AI for smart cities [18], an incremental learning platform for smart traffic management [19] and a reference architecture for industrial applications of AI [20]. We anticipate that our contribution will create awareness, instil knowledge, and stimulate discussion and debate that will inform research, applications and policy developments of AI for humanity.

Figure 1 illustrates the complete AI Life Cycle, where the shaded parallelograms represent the three phases and corresponding human expertise: 1) Design - AI/data scientist, 2) Develop - AI/ML scientist and 3) Deploy - AI/ML engineer. The size of the AI team corresponds to the size of the project, but a minimum of one AI scientist and one AI engineer is recommended. In the following subsections, the 17 stages are prescribed as lists of activities that must be considered and undertaken for successful completion of that stage.

• Define the problem in terms of environment, entities and data
• How is the problem currently solved?
• What is the flow of the solution (steps/phases)?
• What are the alternate solutions and flows?
• Where/how are the rules defined for the solution?
• What data is collected, how is it stored, what historical data is available?

2.2 Review AI literature - ethics, algorithms and pre-trained models
• Knowing the problem and its context, review applicable ethics guidelines, ethics frameworks, and the state-of-the-art in AI algorithms, models and pre-trained models that have been used to solve this or similar problems
• Pre-trained models like AlexNet [21], ResNet [22], BERT [23] and GPT [24] are trained on large (billion+) datasets and they can be re-purposed, retrained or fine-tuned instead of building from scratch
• Where to look: Google Scholar, research labs (e.g. CDAC), publishing platforms (e.g. Medium), Q&A sites (e.g. Stack Exchange), code repositories (e.g. GitHub), cloud platforms (e.g. Azure) and social media (e.g. Twitter)
• What to look for: literature reviews, commentaries, letters, articles, op-eds, case studies, best practices, product/tool documentation, tutorials, demonstrations, API documentation, forum responses, upvotes and likes
• All data sources as a Single Version of Truth (SVOT) - data as digital representations of the problem description and potential AI solution
• A unified data warehouse, data lake or data lakehouse setup
• Identify, record and review: data access, data ownership, data stewardship, metadata, data ethics, governance and regulations
• Mapping the granularity and inter-relationships of the data
• Checking and validating data quality
• Compare industry benchmarks and algorithmic baselines for similar problems
• What are the expected formats of output?
• What is the expected turnaround time?
• What is the frequency of use?

2.15 Operationalise using AI pipelines (MLOps, AIOps)
• Moving from standard deployment to AI pipelines using containers and microservices
• A microservice is an application with an atomic function (e.g.
classifier or regressor). Although it can run by itself, it is more effective to run it inside a container, so the two terms are often used interchangeably.
• A container bundles code and dependencies together, to provide reusability, reliability and time-efficient installation

Although classification is a type of prediction, we have separated the two, but detection is grouped along with classification. A further deliberation:
• Prediction - regression, classification, time series, sequence
• Classification (also detection) - object, anomaly, outlier, concept
• Association - clustering, feature selection, dimensionality reduction
• Optimisation - scheduling, planning, control, generation, simulation

And finally, in Figure 3, we present an organisational context for this AI Life Cycle by positioning it within the flow of activities and information from organisational strategy to decision-making. The 17 stages have been condensed into five technical functions (in blue) to align with the organisational functions, depicted in gray.

In this brief article, we have presented the CDAC AI Life Cycle for the design, development and deployment of AI systems and solutions. It consists of three phases, Design, Develop and Deploy, and 17 constituent stages across the three phases from conception to production. We anticipate that the AI Life Cycle will contribute towards awareness, knowledge, and transparency of AI and its capabilities. The ontological mapping of AI algorithms to applications condenses all algorithms into four primary capabilities that will facilitate informed discussions between AI scientists, AI engineers and other stakeholders during the solution development process. The organisational context of the AI life cycle further integrates the AI team and their solutions with other stakeholders, such as senior management and the executive, in working towards an organisational strategy.

What can machine learning do?
workforce implications
The race to the top among the world's leaders in artificial intelligence
AI for social good: unlocking the opportunity for positive impact
United Nations sustainable development goals
Power to the Public: The Promise of Public Interest Technology
World order is going to be rocked by AI - this is how
Artificial intelligence and global power structure: understanding through Luhmann's systems theory
The race for an artificial general intelligence: implications for public policy
Introduction: Aspects of artificial general intelligence
Lethal autonomous weapons pledge
The IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems. Ethically Aligned Design
Bunji - a good friend to chat
Solar irradiance nowcasting for virtual power plants using multimodal long short-term memory networks
Emotions of COVID-19: Content analysis of self-reported information using artificial intelligence
Can online support groups address psychological morbidity of cancer patients? An artificial intelligence based investigation of prostate cancer trajectories
Machine learning to support social media empowered patients in cancer care and cancer treatment decisions
Self-building artificial intelligence and machine learning to empower big data analytics in smart cities
Online incremental machine learning platform for big data-driven smart traffic management
Toward intelligent industrial informatics: A review of current developments and future directions of artificial intelligence in industrial applications
ImageNet classification with deep convolutional neural networks
Deep residual learning for image recognition
BERT: Pre-training of deep bidirectional transformers for language understanding
Improving language understanding by generative pre-training
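
Returning to the ontological mapping of AI capabilities to applications described in the body of the article, the four-way grouping can be sketched as a small lookup structure. The following is a minimal, hypothetical Python illustration and not the authors' implementation: the names `CAPABILITIES` and `capabilities_for` are our own assumptions, and the Classification entry assumes that "outlier" and "concept" are two separate items.

```python
from __future__ import annotations

# Hypothetical sketch: the four primary AI capabilities and the application
# types grouped under each, transcribed from the list given in the article.
# Assumption: "outlier" and "concept" are read as two distinct items.
CAPABILITIES: dict[str, list[str]] = {
    "prediction": ["regression", "classification", "time series", "sequence"],
    "classification": ["object", "anomaly", "outlier", "concept"],
    "association": ["clustering", "feature selection", "dimensionality reduction"],
    "optimisation": ["scheduling", "planning", "control", "generation", "simulation"],
}


def capabilities_for(application: str) -> list[str]:
    """Return every capability whose group contains the given application type."""
    needle = application.strip().lower()
    return [cap for cap, apps in CAPABILITIES.items() if needle in apps]


if __name__ == "__main__":
    # Example lookups a stakeholder discussion might start from.
    for name in ("anomaly", "clustering", "scheduling"):
        print(name, "->", capabilities_for(name))
```

A lookup of this kind only mirrors the grouping stated in the text; for instance, "classification" as an application type resolves to the Prediction capability, which reflects the article's remark that classification is a type of prediction even though it is also listed as a capability in its own right.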