key: cord-0058798-fncxsese authors: Avancini, Rodrigo; Silveira, Fábio Fagundes; Guerra, Eduardo Martins; Andrade, Pedro R. title: Software Visualization Tool for Evaluating API Usage in the Context of Software Ecosystems: A Proof of Concept date: 2020-08-24 journal: Computational Science and Its Applications - ICCSA 2020 DOI: 10.1007/978-3-030-58817-5_26 sha: 1569860fb7aa8875c57c57532e667f71b15947d7 doc_id: 58798 cord_uid: fncxsese Software Ecosystem (SECO) is a consolidated research area in software engineering, emerging as a paradigm for understanding dynamics and relationships among software systems that collaborate with each other to achieve their goals. Understanding the ecosystem and how its elements interact is essential for software evolution, especially for those that provide functions and services for other systems, such as software APIs. Once an API is being used by different software, future changes need to be made in a systematic and appropriate manner, considering the whole ecosystem. However, there is a lack of formal and effective ways for APIs evaluation in the context of SECO. Thus, in this paper, we present Ecolyzer, a prototype tool that aims to support the analysis of API usage considering its ecosystem through interactive visualization. To demonstrate the feasibility of our tool, we conducted a proof of concept (PoC) using an open-source platform API. The results obtained with Ecolyzer are useful and show that the prototype meets the goals described for the accomplishment of this work. Increasingly, software systems depend on other software to leverage their business. This fact occurs due to the vast availability of specialized software and components that have already established themselves in their respective domains. There are rare cases in which software is developed from scratch without reuse [25] . 
Nevertheless, the vast majority of analysis and development support tools do not consider the relationships among the systems and their software dependencies, having a limited view of the impacts caused by the inevitable software evolution [9] . From the perspective of software that provides functions and services, guaranteeing permanence in the market does not depend only on its correct internal functioning, but also on a constant balance among software that use it [24] . This concern led to the study of software systems in a broader context, involving everyone that somehow influence their survival. To investigate the relationship among software products and their dependencies, the term software ecosystems (SECO) emerged in 2003 [3] , referring to a set of software that has some degree of symbiosis among them [20] . Several studies have emerged to model and represent a software ecosystem [21] . Despite the advances achieved so far, there is still no consensus between both academia and industry on the best way to present software ecosystem entities [22] . Also, the lack of formalism and tools to support the modeling and visualization of SECO is hampering the advance of knowledge in the area [15] . An API is a particular type of software that is intrinsically designed to deal with ecosystems. Useful APIs increase the productivity of developers who use them, the quality of the software they produce, as well as the corporate bottom line [5] . However, its evolution represents a high risk for its users since its changes might propagate other changes in the entire ecosystem, requiring additional maintenance by the software that uses the API. In the worst case scenario, the entire API is replaced by a more satisfactory solution. Thus, the evolution of an API needs to be strategically geared toward meeting the goals of its users. 
Unfortunately, there is a lack of effective ways to analyze APIs considering their ecosystem, which makes it difficult to devise a better design strategy for their evolution [11] . This article presents Ecolyzer, a tool that aims to explore the dependencies of software components considering its ecosystem. The novelty of our approach consists of the employment of visualizations, making it possible to analyze and explore an API that provides functions and services for other software, taking into account all software that use it, thus contributing to its development and evolution. To achieve our goals, we developed a tool that gathers the main ideas proposed in the literature on the visualization of software ecosystems. We adapted them to evaluate the API usage by other software through a proof of concept by using a real software. The remainder of this paper is structured as follows: Sect. 2 presents the background notions related to software ecosystems and an overview of current work involving visualization tools. Section 3 describes our approach for analyzing API usage, considering its ecosystem through an interactive visualization tool. Section 4 presents a proof of concept to demonstrate the feasibility of our approach. Finally, Sect. 5 presents our final remarks and plans for future works. There are many definitions related to software ecosystems. Some of them are more focused on technical aspects of software engineering and software development, while others are more focused on business, encompassing the software product and services, organizations, markets, vendors, and all actors involved [14, 21] . Although both distinctions are important, in this article, we limit our review to solutions available in the literature of software engineering applied to ecosystems. The term ecosystem was originated from ecology, which in short consists of a set of actors, connected to other actors with connections among them, interacting as a system [6] . 
In the context of software ecosystems, actors are represented by software systems, and the connections are the dependencies among them. According to Barbosa and Alves [3] , the term software ecosystem appeared in 2003, referring to a set of software that has some degree of symbiosis among them [20] . In turn, Lungu [17] defined it as a collection of software projects that belong to an organization and are developed in parallel with it. Later, this definition was refined for a collection of software projects that are developed and evolve together in the same environment [18] . As reported by Goeminne and Mens [9] , SECO consists of the source code together with the communities of users and developers that surround a software. In consonance to Bosch [6] , SECO consists of a central platform with products and applications built using this platform, developed by external developers. These definitions of SECO seem more appropriate when it comes to the software engineering area. They were merged or reformulated as new work emerged, always referring to a collection of software projects, which: (1) evolve together and are maintained by the same developer community [22] , (2) are dependent on each other linked through project artifacts, common contributors, and support communities [25] , and (3) base their relationships on a common interest in a central software technology [12, 21] . Despite the considerable number of definitions, the essence of software ecosystems remains the same as that of ecology: a set of actors and their relationships, where actors are represented by software products, developers, and users communities, and the relationships are the links and dependencies among the actors. In this work, we define software ecosystem, from the point of view of software engineering, as a set of software projects, which use a common (or central) software platform that provides resources for other software, through functions and services. 
These functions and services may or may not be symbiotic. The definition disregards actors and relationships that cannot be obtained computationally; that is, it is restricted to software systems. This definition allows us to study an API considering the entire ecosystem, using a single tool that extracts data, analyzes it, and presents the central software, all in an automated way. Such a tool supports the maintenance, improvement, and evolution of the system, provides insights about it, and contributes to the advancement of the SECO area. Despite several studies exploring the characteristics and relationships among software ecosystem entities, there are still few approaches dealing with software ecosystem analysis and visualization tools. The situation is even worse for automated tools that cover the entire process, from data extraction and analysis to presentation. At a lower level, a software ecosystem comprises systems and their relationships. From a business perspective, it can involve internal and external actors, organizations, companies, and other stakeholders [21]. Given these elements, models are created to show how they are interconnected and what the nature of these connections is. The relationship is therefore an essential entity in an ecosystem, and it can denote dependencies among components, trade relationships, and collaborations [15]. Modeling and visualizing a software ecosystem brings benefits: it offers a way to understand the ecosystem, enables its analysis and formalization, and provides insights about it [15]. These advantages contribute not only at the software level, in maintenance, improvement, and evolution, but also at the project level, supporting economic and strategic decision-making [4]. However, most of the SECO visualization tools currently proposed to outline the relationships among the elements of the ecosystem are tailored tools designed for one or a few scientific experiments [9].
Thus, software engineering becomes fundamental to enable advances in this area [8, 19]. Lungu [18] can be considered one of the pioneers in the development of tools that support the analysis of software ecosystems, enabling visualization and exploration of an ecosystem through an interactive web interface named Small Project Observatory (SPO). Developed as a web application, SPO extracts information about the ecosystem from data available in the versioning repositories of the projects. The authors call the set of repositories of an ecosystem the super-repository. The architecture of SPO is depicted in Fig. 1. Regarding visualization, SPO presents information that takes into account the whole super-repository, such as projects, developers, and inter-project relationships. The views are presented in Fig. 2. The same work [18] also presents a usability study of the tool, showing that the main problems are the slowness of the system and the lack of scalability when many components are presented. The study also pointed out the low interactivity of the visual elements, the lack of support for users to create their own views, and the impossibility of navigating to relationships at a lower level, such as method calls between software systems. Although SPO has a simple architecture and limited views and resources, this work identifies elements and concerns to be considered when building a tool for analyzing and visualizing software ecosystems. These include data extraction from version control systems, the statistical analysis of the code, statistics and metrics, and the representation of relevant elements of the ecosystem at a lower level [18]. Following a slightly different approach, the tool presented by Goeminne and Mens [9] is concerned not only with the relationships that can be extracted from the repositories but also with those from other data sources, such as mailing lists and bug trackers.
For this, they created a generic multi-layer framework that is divided into three layers in which each layer processes and makes information available to the upper layers from the data sources to the application layer. The lowest layer is called the mining layer, which extracts information according to the data source. The middle layer is called the analysis layer, which extracts metrics under the data extracted from the mining layer. Finally, the upper one is called the application layer, which is responsible for presenting the data. Goeminne and Mens [9] provide a resource to evaluate metrics commit to commit, making it possible to analyze in a straightforward way the evolution of the ecosystem. Although the tool proposed by the authors was designed to help understanding the evolution of software ecosystems by providing visualization of metrics through charts, it does not present ways of representing the relationships and dependencies among the systems. Another important work presents a software ecosystem visualization and analysis dashboard tool named SECONDA [22] . It is similar to the Goeminne and Mens's [9] framework and follows the same approach regarding architecture, having modules for reporting, visualization, static analysis, metrics computing, and identity matching. In addition, the tool offers a dashboard with a variety of charts with metrics and statistics, both for projects and developers individually, as well as for analyzing them together. Despite that, SECONDA has not improved the representation of entities and their relationships within the software ecosystem, as its visualization presents only charts with metrics. The work described by Santana and Werner [25] depicts a software ecosystem by visualizing the relationships of related software projects, providing views of relationships among project members, and relationships at the component level of the systems, such as method calls and object declaration between the systems. 
For that purpose, a procedure that consists of data extracting, data processing, and SECO visualization was built. Other approaches have also been proposed in the literature for SECO modeling and visualization, but either they are not considered specific tools for data extraction and visualization or do not present the ecosystem entities and their relationships. Among the modeling techniques proposed to visualize software ecosystems, the most common are: ad hoc notations, tabular representations, conceptual maps, class diagrams, network graphs, software supply network (SSN), meta-models, and i* modeling [21] . Different from the solutions proposed by other researchers related to the visualization of ecosystems, our proposal focuses on a central system as the main software. We explore its components not in isolation as usual, but considering the software that use them. The approaches proposed in previous works to create modeling and visualization of software ecosystems are powerful and supportive of SECO analysis, but as noted by Jansen et al. [15] , there are still opportunities for enhancements. This section presents a tool for analyzing central software, such as an application programming interface (API), a software development kit (SDK), a library, or any software that provides services to other software, considering its ecosystem. From now on, the term API always refers to the central system of our approach. Common requirements need to be observed in the software ecosystem visualization tools, such as concerns about system modularity, data extraction in version control systems, and other data sources such as bug tracking and mailing list. Besides, features as static analysis, calculation of statistics and metrics, as well as the presentation of the ecosystem -its actors and relationships -must be considered. Thus, a generic SECO visualization tool consists of component architecture, as shown in Fig. 3 . 
The first step in a SECO visualization system is data extraction from some data source. With the extracted data in hand, the second step consists of analyzing the data, which, in the context of SECO, means identifying the relationships among the involved systems. The third step is responsible for extracting metrics and statistics from the mapped ecosystem. In this step, metrics and statistics should consider all software and their relationships within the ecosystem, and not only a single software system in isolation, as in traditional approaches [16, 25]. The novelty of our approach lies in the visualization. Unlike other works, whose concern is to show the entire ecosystem and its relationships, our tool provides an API-centric visualization of the API and all its components, taking into account all the systems that use this API. The advantage of this visualization is that it enables an in-depth analysis of the API quickly and intuitively, as shown in the following sections. Although it seems complex at first [25], data extraction can be done with mining software repositories (MSR) techniques, which make the task feasible. Some MSR APIs are currently available, depending on the type of data source [26]. Among the data that can be obtained from repositories, we highlight the source code and the metadata of commits, such as developers, changes in the source code, and the dates of the changes. The metadata of commits makes it possible to study the evolution of systems [9]. Besides, data can be extracted directly from the software's directory structure, regardless of a version control system. This data is enough to identify relationships among the software. In the analysis process, relationships are created from elements of the source code: functions (or methods) or other resources made available by the components of the central API, and the calls to (or invocations of) these functions coming from other software.
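As a rough illustration of the analysis step just described, the sketch below collects the functions declared by the components of a central API and records a relationship for each call to one of them found in client software. It is a minimal sketch under stated assumptions, not Ecolyzer's implementation: it uses regular expressions on Lua-like source text in place of a real parser, and all names (`Relationship`, `defined_functions`, `find_relationships`, and the example files) are hypothetical.

```python
import re
from collections import namedtuple

# One usage relationship: a client component calls a function of an API component.
Relationship = namedtuple(
    "Relationship", "api_component function client_system client_component"
)

# Simplified patterns (assumption): a real analyzer would inspect parsed code,
# so comments and string literals can produce false positives here.
# Matches `function name(`, `function Mod.name(` and `function Mod:name(`.
DEF_PATTERN = re.compile(r"\bfunction\s+([A-Za-z_][A-Za-z0-9_.:]*)\s*\(")
# Matches `name(`; table-call syntax such as `name{...}` is not handled.
CALL_PATTERN = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def defined_functions(source):
    """Return the short names of the functions declared in a source text."""
    names = set()
    for match in DEF_PATTERN.finditer(source):
        # Keep only the last segment of names such as `Timer.run` or `Timer:run`.
        names.add(re.split(r"[.:]", match.group(1))[-1])
    return names


def find_relationships(api_files, client_systems):
    """Match calls in client code against functions declared by the API.

    api_files: {component_name: source_text}
    client_systems: {system_name: {component_name: source_text}}
    """
    # Map each function name to the API component that declares it.
    # If two components declare the same name, the last one wins (simplification).
    index = {}
    for component, source in api_files.items():
        for name in defined_functions(source):
            index[name] = component

    relationships = []
    for system, files in client_systems.items():
        for client_component, source in files.items():
            # Drop the client's own declarations so they are not counted as calls.
            body = DEF_PATTERN.sub("(", source)
            for match in CALL_PATTERN.finditer(body):
                name = match.group(1)
                if name in index:
                    relationships.append(
                        Relationship(index[name], name, system, client_component)
                    )
    return relationships


# Made-up example data, for illustration only.
example_api = {
    "Timer.lua": "function Timer(data)\n  return data\nend\n\nfunction run(timer)\nend\n"
}
example_clients = {
    "luccme": {
        "model.lua": "local t = Timer({})\nrun(t)\nrun(t)\nfunction helper(x) end\nprint(t)\n"
    }
}
example_relationships = find_relationships(example_api, example_clients)
```

Matching by bare function name is the weakest part of this sketch: deciding whether a call really belongs to a component of the central API generally needs more context than the name alone.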
The feasibility of this step depends on the source code analysis APIs, which build an abstract syntax tree (AST), providing the code elements, allowing the static analysis of the source code. The challenge of the analysis consists in identifying, for each call of a function from other software, whether it belongs to some component of the central API. Moreover, the relationships can contain information about data provided by version control systems such as authors, code elements creation, and modification dates. After the analysis step, some metrics and statistics in the context of SECO can be determined, such as: (1) the most-used components, the least used ones, Our approach provides a way to analyze an API as a whole, not only in an isolated way. The visualization considers other software and the relationships among its components at the level of elements of the source code, such as functions (or methods), function calls, and other shared resources. For this purpose, the visualization consists of a treemap combined with a heatmap. This strategy provides more dimensions for analysis through visual elements. Figure 4 depicts an example of visualizing an API. In this view, each component is a rectangle grouped by system modules or packages. The area of each rectangle represents the number of operations provided by the given component. The colors inform the number of relationships that the respective component has with other software. The legend on upper right describes the number of relationships. Also, the rectangles have a tooltip for checking the component name and other additional information relevant to SECO. By selecting an API component, it is possible to view the software that use this component and the software components that have some relationship with the selected API component. As can be seen in Fig. 5 , this view differs slightly from the API view (Fig. 4) . 
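The data behind a treemap combined with a heatmap can be sketched as a nested structure in which each rectangle carries one quantity for its area and one for its color. The example below is only an assumption about how such data might be organized before being handed to a visualization library. The function name, the dictionary keys, and the numbers are all hypothetical.

```python
def build_treemap(components, area_key, color_key):
    """Group components by module into a nested name/children structure.

    components: list of dicts with "module", "name", and the two chosen keys.
    area_key: field that drives the area of each rectangle.
    color_key: field that drives the color of each rectangle.
    """
    modules = {}
    for component in components:
        node = {
            "name": component["name"],
            "area": component[area_key],
            "color": component[color_key],
        }
        modules.setdefault(component["module"], []).append(node)
    return {
        "name": "root",
        "children": [
            {"name": module, "children": nodes} for module, nodes in modules.items()
        ],
    }


# Made-up components: area = number of operations, color = number of relationships.
example_components = [
    {"module": "base", "name": "Timer", "operations": 5, "relationships": 40},
    {"module": "base", "name": "Cell", "operations": 8, "relationships": 0},
    {"module": "gis", "name": "Layer", "operations": 3, "relationships": 12},
]
api_view = build_treemap(example_components, "operations", "relationships")
```

The same structure would serve the view of Fig. 5 by choosing other keys for area and color.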
In the view of Fig. 5, the areas of the rectangles represent the total number of calls to operations of the API component, and the colors represent the number of different operations called, counting repeated calls to the same operation only once. If it is necessary to go deeper into the source code, selecting a software component makes it possible to view the code of the API component side by side with the code of the component that uses it. This view also provides a way to navigate among components by selecting code elements common to both source files, as shown in Fig. 6. To achieve our goals, we developed Ecolyzer, a multi-tiered software system for data extraction, analysis, and visualization of an API's components, considering its ecosystem. Its architecture, along with the main modules, is presented in Fig. 7. Ecolyzer was developed as a client-server application, and its architecture is based on RESTful web services. The main motivations for this architectural decision were the simplicity and popularity of RESTful applications and the ease of developing them with the available frameworks [23]. The layers of Ecolyzer are only briefly described here, since a deeper discussion of the architectural style is beyond the scope of this paper. The API layer is responsible for extracting data from the repositories, analyzing it, and calculating metrics and statistics over the ecosystem. This layer provides database services for data access and storage. Currently, the API layer supports only Git repositories and the Lua programming language. The web application tier (or the back end) is a web service for data handling and visualization. It is responsible for handling requests and responses from the clients and interacting with the API layer. The provided services can be accessed either through RESTful services or through web templates. Finally, the client tier can be either an internet browser or a specific application that consumes REST services.
The client tier provides the visualization to end users through web templates, which are HTML files that contain static data and placeholders for dynamic data rendered in the browser [10]. The graphs are provided by the data visualization engine D3.js [27]. To demonstrate the feasibility of our approach, we conducted an exploration of the relationships of a SECO. The main purpose of this proof of concept (PoC) was to highlight the usage of the proposed visualizations in a real project. We expected that the visualizations would aid in analyzing the API usage considering its ecosystem. TerraME (Terra Modeling Environment) is an open-source platform API to develop and simulate spatially explicit dynamic models based on cellular automata, agents, and network models [7]. TerraME provides a Lua code interpreter [13], as well as functions that ease the development of multi-scale and multi-paradigm models for environmental applications. Contributions to TerraME can be encapsulated in packages and might extend its functionalities or provide new resources and concepts through its API. TerraME is part of the software ecosystem of the National Institute for Space Research (INPE), which develops and provides a set of systems in the geoprocessing area. In addition to being part of an ecosystem, TerraME has its own ecosystem, specifically focused on environmental modeling. All software in the TerraME ecosystem is publicly available on GitHub; the ecosystem consists of ten packages developed by third parties and two software systems that only use its functions. Among the software, we highlight INPE-EM (INPE - Emission Model), which produces annual estimates of greenhouse gas (GHG) emissions due to changes in land cover in Brazil [2], and LuccME, an open-source framework for spatially explicit land use and cover change (LUCC) modeling [1]. The extraction process starts by cloning the repositories and setting TerraME as the central system.
We cloned the master branches of all software on January 5, 2020. Then, Ecolyzer scans the repositories and creates all relationships. After that, the ecosystem is ready for visualization in an internet browser. Figure 8 shows the heatmap of TerraME components. Unlike object-oriented languages, where functions (or methods) are usually defined within classes, TerraME source files can be seen as components in which functions can be defined without the use of a structure like a class. By convention, TerraME defines most of its functions within Lua tables, which work similarly to classes. On the API heatmap view (Fig. 8), it is possible to observe that TerraME has four modules composed of 52 components, 520 public functions, and 3658 relationships with other systems. The minimum and maximum numbers of relationships among components are one and 256, respectively, as shown in the legend in the upper right. Besides, ten components have no relationship at all (gray), eight components have a large number of connections (red), and the remainder have a low or medium number of relationships (blue). The experiment also showed that: (1) the component with the most relationships is Timer, used by 180 components from nine software systems; (2) the system with the most components that use Timer is LuccME, with 38 components and 181 relationships; (3) the system with the fewest components that use Timer is INPE-EM, with only one component and one function call; (4) the component that uses the Timer functions the most is the MultipleRuns component of the calibration package, with 26 function calls; (5) most software systems use only one Timer function; (6) the LuccME components use the largest number of different Timer functions. Figure 9 shows the component with the most relationships and the software that uses it. Our exploratory analysis through visualization successfully provided important data about TerraME's usage, considering its ecosystem.
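Statistics of the kind reported above can, in principle, be derived from a flat list of relationships. The sketch below assumes each relationship is a tuple of the form (API component, function, client system, client component). The tuple layout, the function names `summarize_usage` and `users_of`, and the example data are hypothetical and do not reproduce the TerraME results.

```python
from collections import Counter, defaultdict


def summarize_usage(api_components, relationships):
    """Count relationships per API component and list the unused components.

    relationships: list of (api_component, function, client_system, client_component).
    """
    per_component = Counter(rel[0] for rel in relationships)
    unused = sorted(c for c in api_components if per_component[c] == 0)
    most_used = max(per_component, key=per_component.get) if per_component else None
    return {
        "per_component": dict(per_component),
        "unused": unused,
        "most_used": most_used,
    }


def users_of(component, relationships):
    """For one API component, summarize its usage per client system."""
    callers = defaultdict(set)    # client components that call the API component
    functions = defaultdict(set)  # distinct API functions called
    calls = Counter()             # total number of calls
    for api_component, function, system, client_component in relationships:
        if api_component != component:
            continue
        callers[system].add(client_component)
        functions[system].add(function)
        calls[system] += 1
    return {
        system: {
            "components": len(callers[system]),
            "calls": calls[system],
            "functions": len(functions[system]),
        }
        for system in calls
    }


# Made-up relationships; function names are placeholders.
example_rels = [
    ("Timer", "op_a", "luccme", "a.lua"),
    ("Timer", "op_a", "luccme", "a.lua"),
    ("Timer", "op_b", "luccme", "b.lua"),
    ("Timer", "op_a", "inpe-em", "main.lua"),
    ("Cell", "op_c", "luccme", "a.lua"),
]
summary = summarize_usage(["Timer", "Cell", "Agent"], example_rels)
timer_users = users_of("Timer", example_rels)
```

Keeping the total number of calls separate from the number of distinct functions called mirrors the distinction the views draw between how often and how broadly a component is used.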
Besides, we conducted an analysis together with the TerraME team to find out whether our tool could provide useful feedback to the team involved in its development. To test the hypothesis that one of the main benefits of SECO visualization is to provide systematic insights, we introduced Ecolyzer to the TerraME development team and requested feedback on our tool. No protocol was followed, as we believed that this would facilitate insights. We presented only some concepts of SECO, the tool, its functionalities, and how it works. The team's first feedback reported that the TerraLib component, which is a facade for an API to access geographic data and was developed for the exclusive use of internal components, was unexpectedly being used by two external packages. According to the development team, the functions of the facade are exposed through more specialized components in the most user-friendly way possible for end users or other software that use TerraME, and should not be accessed directly by other software. The TerraME development team provided other remarks, such as: (1) "the components supposed to be used a lot are, in fact, extensively used by other software"; (2) "some components in which we spent much work on them are not being used"; (3) "several new components with interesting functionalities have low usage, maybe we should publicize them better"; (4) "we could use this tool to measure the impact of changing any interface, which was not feasible so far"; (5) "we can use this tool to see how the functions are being used by other software"; (6) "this tool might be helpful for new developers to understand how functions are used in practice". This feedback allowed us to identify an important feature for those who analyze an API considering its ecosystem: a filter on the functions provided by the components of the API.
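A filter of this kind could be sketched as a simple selection over the relationships. The example below is an assumption about one possible shape for it, not a description of an existing Ecolyzer feature. It reuses a hypothetical tuple layout (API component, function, client system, client component) and made-up data.

```python
def systems_using(component, function, relationships):
    """Return {client_system: sorted client components} that call component.function."""
    affected = {}
    for api_component, called, system, client_component in relationships:
        if api_component == component and called == function:
            affected.setdefault(system, set()).add(client_component)
    return {system: sorted(files) for system, files in affected.items()}


# Made-up relationships; function names are placeholders.
filter_rels = [
    ("Timer", "op_a", "luccme", "b.lua"),
    ("Timer", "op_a", "luccme", "a.lua"),
    ("Timer", "op_b", "luccme", "c.lua"),
    ("Timer", "op_a", "inpe-em", "main.lua"),
    ("Cell", "op_a", "calibration", "runs.lua"),
]
impact = systems_using("Timer", "op_a", filter_rels)
```

The result lists only the software that a change to the selected function would touch.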
With this functionality, it would be possible to select a function provided by the component, and the heatmap would display only the software that uses this function, thus facilitating the measurement of the impact of its changes. There are several studies related to software ecosystems, and they can help the analysis and understanding of complex relationships among system dependencies. Furthermore, they come as an alternative to overcome challenges during the design, development, maintenance, and evolution of software, mainly on the ones that provide functions and services. However, there is still a lack of software tools to support software ecosystems analysis and visualization, thus making software engineering a critical research issue in the SECO domain. This paper presented Ecolyzer, a software visualization tool for the visualization and analysis of API usage considering an ecosystem. A brief description of its architecture was presented, which was designed and implemented based on the directives proposed in the literature on SECO visualization tools. The visualization provided by Ecolyzer was reported in detail. A proof of concept (PoC) in a real open-source platform API was conducted, demonstrating that the tool is useful for evaluating an API usage in the context of a SECO, allowing to quantify by who and how the API is actually used. Also, Ecolyzer proved to be useful for measuring the impacts of changes in API functions, providing insights on its usage, thus helping the organization that maintains the API to plan its evolution strategically. As future work, we intend to (1) add a new feature to the tool that enables it to analyze the historical evolution of an API and its ecosystem using the commits metadata, (2) implement support for other programming languages to carry out studies in other ecosystems, and (3) investigate new metrics of API usage in the context of SECO. Finally, we plan to conduct a systematic usability evaluation of Ecolyzer. 
[1] LuccME-TerraME: an open-source framework for spatially explicit land use change modelling
[2] Modeling the spatial and temporal heterogeneity of deforestation-driven carbon emissions: the INPE-EM framework applied to the Brazilian Amazon
[3] A systematic mapping study on software ecosystems
[4] Variability mechanisms in software ecosystems
[5] How to design a good API and why it matters
[6] From software product lines to software ecosystems
[7] An extensible toolbox for modeling nature-society interactions
[8] Mapping the systematic literature studies about software ecosystems
[9] A framework for analysing and visualising open source software ecosystems
[10] Flask Web Development: Developing Web Applications with Python
[11] Continuous API design for software ecosystems
[12] A longitudinal case study of an emerging software ecosystem: implications for practice and theory
[13] Lua - an extensible extension language
[14] A sense of community: a research agenda for software ecosystems
[15] Scientists' needs in modelling software ecosystems
[16] Towards a typification of software ecosystems
[17] Towards reverse engineering software ecosystems
[18] The small project observatory: visualizing software ecosystems
[19] Revisiting software ecosystems research: a longitudinal literature study
[20] Software Ecosystem: Understanding an Indispensable Technology and Industry
[21] Open source software ecosystems: a systematic mapping
[22] SECONDA: software ecosystem analysis dashboard
[23] RESTful Web APIs: Services for a Changing World
[24] Software ecosystems' architectural health: another view
[25] Towards the analysis of software projects dependencies: an exploratory visual study of software ecosystems
[26] PyDriller: Python framework for mining software repositories
[27] Data Visualization with D3.js Cookbook

The authors would like to thank the World Bank (grant #P143185) and FAPESP (grant 2018/22064-4) for financial support.