key: cord-0058703-rxzsd70r
authors: Bollini, Letizia; Fazia, Irene Della
title: Situated Emotions. The Role of the Soundscape in a Geo-Based Multimodal Application in the Field of Cultural Heritage
date: 2020-08-20
journal: Computational Science and Its Applications - ICCSA 2020
DOI: 10.1007/978-3-030-58808-3_58
sha: 3e0fb0f3682657b8a8815993d689321021ef8724
doc_id: 58703
cord_uid: rxzsd70r

Cartography has traditionally been the privileged key to reading an urban territory. The “map” form, in fact, translates into two dimensions, and with an abstract language, the complexity of the space that we perceive and interact with through our physicality. In this process of conceptualisation many of our visceral abilities must be distilled into a form of representation that elects vision as a privileged channel. The possibility of using mobile devices has widened the range of expression and the ways in which we orient people in the environment. But, well beyond this instrumental function, the use of the acoustic channel, for example, opens up design scenarios for the construction of an identity of places that involves users from the point of view of experiential richness. Generally used to give procedural indications and to free the user's movements, the sound interface has in itself further potential for exploring the communicative design and reading of the anthropic environment. The paper proposes an interpretation of the urban landscape of the city of Milan and its historical transformation through the narrative key of one of the major Italian novels and the construction of soundscapes that give back the emotional richness of reality. Based on a holophonic recording, the prototype of the mobile app explores the possibility of extending a space-contextual experience through a multimodal storytelling that generates “sound vision”.

not just cognitive beings, but rather emotional and cultural ones.
Besides, the way we experience the world around us and socially interact with others is profoundly rooted in our senses, which are meant to feel and respond to the physicality of the environment. On the one hand, the visceral component is how we react to sensory elements, i.e. what can be perceived by our senses: visual, acoustic, tactile, gustatory or olfactory. The primordial reactions to these stimuli have kept us safe. They let us "decide" almost immediately what is good and what is bad, what is safe and what is dangerous for us and our survival. On the other hand, the reflective level pertains to our conscious response based on our thoughts, reflections, and previous experiences. All three aspects coexist and influence one another, mutually inhibiting or enhancing each other. Nevertheless, many technologies are not designed to address all these aspects, although they strongly impact our lives. In particular, the visceral component should be given greater consideration. Visual appearance, motion effects, sounds and sensory sensations (real or synesthetic) can play a huge role in digital interactions. When discussing design issues, emotions have often been conflated or confused with the blurred concept of beauty or aesthetics, and opposed to functionality [2]. To design for the visceral level "is actually for affect, that is, eliciting the appropriate psychological or emotional response for a particular context, rather than for aesthetic alone", as Alan Cooper underlines [3]. This is not the place to debate the different positions, but empirical and experimental studies conducted in the digital field have shown that people find that what is beautiful is also useful. Specifically, Kurosu and Kashimura [4] introduced the concept of apparent usability (as opposed to inherent usability [5]) to describe the pleasantness that some interfaces were able to convey to users when compared to merely usable ones.
It was Tractinsky [6], together with Katz and Ikar [7], who explicitly referred to beauty as a component that people appreciate when using digital systems, and one that can be tested experimentally. Emotional or visceral aspects, therefore, are essential and useful to users when facing the challenge of interacting with complex systems and, in particular, with digital technologies. They are, in fact, one of the most powerful means of knowing the world.

The transition from a "fixed position" (that is, a desktop or laptop computer operated with a keyboard and a mouse or trackpad) to a pocketable connection has profoundly changed the role of digital in our daily experience. Mobile devices, namely smartphones, are the mediators of our interaction with the world in an onlife [8] ecosystem. But the two-dimensional surface of displays tends to dramatically reduce the potential of physical interaction to a single sense, vision, according to the Graphical User Interface pattern and its language and metaphors. Touch devices, likewise, use this sense to mediate, through a flattened space, our experience of the surrounding environment, both physical and online. Together with the evolution of mobile connectivity, new tools are gaining popularity, for instance smart glasses and VR visors. The investments made by big tech companies (Facebook in Oculus Rift and Microsoft with HoloLens) reveal a growing niche that opens new design opportunities and challenges in a blended environment, where virtual, augmented and mixed reality are promising fields to be explored. The revival of virtual reality, after a decline lasting almost 30 years, is opening the opportunity to create richer interactions. 3D simulation (whether synthetic or based on real photos) lets people explore a simulated space where reality can be replayed or (re)invented [9]. The point is not whether the virtual is real or vice versa, but rather that virtuality is, again, involving our senses and our proxemic and synesthetic perception.
Besides, from the second generation of mobile devices up to smart home devices (such as Alexa in 2014), voice interfaces [10] have in recent years opened the possibility of interacting in a verbal and dialogic way, bringing the acoustic component among the project variables.

Environments also have their own sound language, i.e. they produce a significant and recognisable identity in the construction of the urban experience. As early as 1969 Southworth introduced the concept of soundscape in his research on the relationship between the acoustic and visual perception of space: "Two aspects of the soundscape that appear particularly important in city perception were central to the study. First, we evaluated the identity of the sounds, including (a) the uniqueness or singularity of local sounds in relation to those of other city settings and (b) their informativeness or the extent to which a place's activity and spatial form were communicated by sound." [10]. The experimental study he conducted in those years is made even more significant by the type of subjects involved (blind or deaf), which further emphasises the close relationship between the different factors of urban perception. The theoretical definition of the concept of "soundscape", however, is attributed to the Canadian composer Raymond Murray Schafer in the 1970s. It indicates everything that makes up the acoustic environment: the soundscape comprises all the acoustic resources, natural and artificial, within a given area of the environment. It is a sound "photograph" of the environment. According to Schafer, the soundscape is generally composed of three different elements:
• Keynote sounds: This is a musical term that identifies the key of a piece, not always audible; keynote sounds "outline the character of the people living there". They are created by nature (geography and climate): wind, water, forests, plains, birds, insects, animals. In many urban areas, traffic has become the keynote sound.
• Sound signals: These are foreground sounds, which are listened to consciously; examples would be warning devices, bells, whistles, horns, sirens, etc.
• Soundmark: This is derived from the term landmark. A soundmark is a sound that is unique to an area.

The elements have been further defined according to their essential sources [11]. A soundscape is a sound, or a combination of sounds, that forms in or arises from an immersive environment, and is therefore based on three elements: geophony, biophony and anthrophony [12] [13] [14]. To better understand these concepts, it is possible to refer to the studies of Krause, an American musician and soundscape ecologist who, during a TED Global talk, explained the three sound sources that contribute to the composition of a natural soundscape: "The soundscape is made up of three basic sources. The first is the geophony, or the nonbiological sounds that occur in any given habitat, like wind in the trees, water in a stream, waves at the ocean shore, movement of the Earth. The second of these is the biophony. The biophony is all the sound that is generated by organisms in a given habitat at one time and in one place. And the third is all of the sound that we humans generate that is called anthrophony. Some of it is controlled, like music or theater, but most of it is chaotic and incoherent, which some of us refer to as noise." [15, 16]. Moreover, in relation to these three elements of the soundscape, Krause states that there is an important interaction between biophony and the other sources of sound, geophony and anthrophony. For instance, several studies, particularly those in progress by Nadia Pieretti at Urbino University, have shown that birds alter their vocalizations to adapt to urban noise. Killer whales (Orcinus orca) do the same with boat noise in their marine environments.
According to Bernie Krause, "The soundscape concept consists of what I call signature sources, meaning that each type of sound, from whatever origin, contains its own unique signature, or quality, one that inherently contains vast stores of information. That individual signature is unlike any other. So, also, is the natural soundscape unique in its collective state, especially as it becomes the voice of an entire habitat. […] I use the resulting term, soundscape ecology, to describe new ways of evaluating the living landscapes and marine environments of the world, mostly through their collective voices" [17]. The premise behind this concept is that the sense of hearing is a means of experiencing and knowing the surrounding reality. So, the soundscape is part of the context that affects human existence. This relationship between sound and place is a way of thinking about everyone's responsibility in the "composition" of the shared sound landscape [18]. This "reading of the world" through a specific acoustic scenario turns out to be a fundamental element for the construction and success of the project. Through the activation of sound, the user is transported in time and space to another era, which allows a totally new use of that place. In this case the effect that we try to reproduce is narrative: evoking situations and contents extraneous to the user. The intent is to create an environment that is plausible from an acoustic point of view. This is possible thanks to the creation of a soundscape composition [19].

Artistic experimentation has explored the possibilities and influences of environmental sounds on musical compositions since the 1960s. On the one hand we have the recording practices and the innovative effects introduced by Lee "Scratch" Perry: "He buried microphones under trees to get a different sound, ran tapes backwards, used found sounds and techniques which 20 years later would be called sampling, and blew ganja smoke over tapes." [20].
On the other hand there are the sound installations and ambient music developed by Brian Eno. Although they start from opposite approaches, in both cases the room becomes the element of acoustic production capable of suggesting the sound mood and arousing emotions. More recently, soundscape composition as a form of electroacoustic music has been experimented with at Simon Fraser University during the World Soundscape Project (WSP). This kind of composition is characterised by the presence of recognisable sounds and environmental contexts, in order to evoke the listener's associations, memories and imagination related to the soundscape [21].

The word "holophony", from the ancient Greek holos ("everything") and phonia ("sound"), describes the particular recording technique created by Hugo Zuccarelli [22], an Argentinian scientist, and Umberto Maggi, an Italian musician. They wanted to carry the idea of holography over to sound. The resulting, almost hypnotic effect is that sound is perceived neither inside the headphones nor along the classical stereo arc, but "out of the mind", almost at the exact spatial coordinates of the recording. In this case we can talk about binaural recording. Binaural recording (literally, with two ears) is a three-dimensional sound recording method intended to optimise the recording for headphone listening, reproducing as faithfully as possible the acoustic perceptions of a listener located in the original environment of the sound event and maintaining its 360° spherical directional characteristics. The holophonic principle reproduces the process of sound perception as it is performed by the human auditory apparatus.
At a technical level, this technique modifies some classic parameters of the audio recording system: instead of using two microphones, one for the right channel and one for the left channel, as in stereo, a plastic head is used, into which microphone capsules called "holophones" are inserted in correspondence with the auricles. These capsules can capture sound coming from any direction, in an attempt to simulate the auditory capacities of the human head. In this way a delay is introduced in auditory reception: if the sound is perceived on the right, it will be perceived on the left only after a small but perceptually significant delay, causing the head to act as if it were a resonance box (Fig. 1).

The first to use this type of technology commercially were Pink Floyd, in the song Get Your Filthy Hands Off My Desert on The Final Cut, their twelfth studio album, released on 21 March 1983. The Final Cut was recorded using "holophonics", an audio processing technique used to enhance the aural three-dimensional 'feel' of the recording. Holophonics was used to make sound effects on the album appear more three-dimensional to the listener: sound effects, particularly when heard on headphones, appear to pass not just from left to right in the stereo spectrum but also from in front to behind of the listener and around them. Perhaps the most notable use of holophonics on the album is on the song Get Your Filthy Hands Off My Desert: during the intro, an airplane is heard to fly swiftly overhead, passing from in front of the listener to behind them, before a huge explosion from the bomb it has dropped is heard surrounding the listener both in front and behind them and to either side of the stereo picture. The use of this technique was in keeping with Pink Floyd's long-standing interest in using atmospheric sound effects combined with advances and innovations in audio technology to enhance the listener's experience of their music.
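The inter-ear delay described above can be made concrete with a rough calculation. The following Python sketch uses Woodworth's spherical-head approximation, a common textbook model that the paper itself does not mention; the head radius and speed of sound are assumed typical values, and the function name is hypothetical. Under these assumptions the delay between the two ears stays below one millisecond, yet it is one of the cues the auditory system uses to locate a source.

```python
import math

HEAD_RADIUS_M = 0.0875      # assumed average adult head radius
SPEED_OF_SOUND_M_S = 343.0  # assumed speed of sound in air at about 20 °C


def interaural_time_difference(azimuth_deg: float) -> float:
    """Approximate delay (seconds) between the two ears for a distant source.

    Woodworth's spherical-head model: ITD = (r / c) * (theta + sin(theta)),
    with theta the source azimuth in radians (0 = straight ahead,
    90 = directly to one side). Valid for 0 <= azimuth <= 90 degrees.
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS_M / SPEED_OF_SOUND_M_S) * (theta + math.sin(theta))


if __name__ == "__main__":
    for azimuth in (0, 30, 60, 90):
        delay_ms = interaural_time_difference(azimuth) * 1000
        print(f"azimuth {azimuth:>2} deg -> delay {delay_ms:.2f} ms")
```

A dummy-head recording captures this delay physically, together with the filtering produced by the head and the auricles, which is why playback over headphones is needed to preserve it.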
It was also claimed that the holophonic process could not be duplicated if one made a copy of the recording, i.e. by copying from the vinyl record to tape cassette [23]. Despite the great innovation, this type of technology is known by very few and has never achieved widespread success, probably due to practical difficulties: the inconvenient use of the plastic head and the need to use headphones to perceive the effect.

The Betrothed Next is the evolution of a previous case study developed between 2013 and 2016 [24]. The project was born from research and analysis work related to the Lombard cultural heritage and, more specifically, to Manzoni's novel "I Promessi Sposi". The two previous projects, The Betrothed 2.0 [25] and 3.0, explored augmented reality applications and storytelling using geo-referenced information on historical maps [26]. This last phase explores the synesthetic and sound aspects of the multisensory interweaving with urban space and historical and cultural heritage [27]. The project is based on the discovery of the urban evolution of Milan, thanks to the historical stratification shown by a visual time machine. The project, in particular, explores new approaches to the reading of cultural heritage and its "exhibition" [28]. The interpretative key, in fact, aims to reintroduce the emotional and sensory aspects, since the subject is an urban environment that can be actively explored. The key is threefold: the visual [29], spatial and acoustic dimensions, experienced both directly and synesthetically [30] through the mediation of digital technology. The reading levels, instead, are two: the narrative of the writing, which transposes the contemporary in a metaphorical key, and the plot of the novel, which in its historical transposition reconstructs the urban and social antecedent.
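One way to picture this geo-referenced, time-layered design is as a lookup from the user's position and a chosen historical period to a soundscape. The Python sketch below is purely illustrative and is not the authors' implementation: the class and function names, the audio file names, the 150 m trigger radius and the rough coordinates are all assumptions. Only the places (Piazza Duomo, the Lazzaretto) and the three periods (17th, 19th and 21st century) are taken from the paper.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Place:
    name: str
    lat: float
    lon: float
    # Maps a century (17, 19, 21) to a hypothetical audio file name.
    soundscapes: dict[int, str] = field(default_factory=dict)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two latitude/longitude points."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def soundscape_for(places, lat, lon, century, radius_m=150.0):
    """Return (place name, audio file) for the nearest place within radius_m, else None."""
    nearest = min(places, key=lambda p: haversine_m(lat, lon, p.lat, p.lon), default=None)
    if nearest is None or haversine_m(lat, lon, nearest.lat, nearest.lon) > radius_m:
        return None
    audio = nearest.soundscapes.get(century)
    return (nearest.name, audio) if audio else None


# Rough, illustrative coordinates and made-up file names (assumptions).
PLACES = [
    Place("Piazza Duomo", 45.4642, 9.1900,
          {17: "duomo_17.wav", 19: "duomo_19.wav", 21: "duomo_21.wav"}),
    Place("Lazzaretto", 45.4765, 9.2045,
          {17: "lazzaretto_17.wav", 19: "lazzaretto_19.wav", 21: "lazzaretto_21.wav"}),
]

if __name__ == "__main__":
    # A user standing a few metres from the (assumed) Lazzaretto point.
    print(soundscape_for(PLACES, 45.4766, 9.2046, century=17))
```

Keeping the period as an explicit parameter mirrors the idea of moving through historical layers of the same place, rather than tying each location to a single recording.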
Following in the footsteps of one of the main characters, the user can in fact walk along the streets of Milan and discover, through the use of mobile devices, the cultural and architectural change that occurred across three reference periods: the historical setting of the novel (17th century), the historical period of the author (19th century) and the contemporary city. In the development of the project, two of the five senses have been taken as reference: sight and hearing. In both cases, through the three historical periods, the user has the opportunity to discover and learn about the change in the city: in the first case through Augmented Reality, in the second thanks to the use of special sound effects, created by emulating holophonic recording. The main objective, particularly linked to this latter approach, is to implement the "sound vision" of each century. Thanks to this expedient, the subject will learn to let him/herself be transported into the various eras not only visually (a concept already successfully conceived and developed several times), but also from an acoustic point of view. These two elements, joined together through the use of digital storytelling, allow a double immersion (Table 1).

The first step in understanding the project's viability was to assess the developed prototype. The subjects involved in the research were asked to perform a Task Analysis, and more specifically a summative test, in order to verify the effectiveness of the interaction between the user and the interface. To better understand the user's point of view, the Thinking Aloud (TA) method was used: each user was invited to talk about his/her actions and thoughts while using the app. Before the test, each user's level of knowledge of and familiarity with digital technology was evaluated through an interview. The technologies referred to are augmented reality, 3D maps or itineraries, and the use of a virtual assistant (e.g.
Siri by Apple, Cortana by Microsoft or Amazon Alexa). In the latter case, the user did not use this type of intelligent technology as a search aid, but interfaced acoustically with a sound element. For the recruitment of users, and in particular for the number of subjects chosen, we referred to Nielsen's research: "Some people think that usability is very costly and complex and that user tests should be reserved for the rare web design project with a huge budget and a lavish time schedule. Not true. Elaborate usability tests are a waste of resources. The best results come from testing no more than 5 users and running as many small tests as you can afford" [31]. Nielsen explains that an effective test needs neither a large budget nor a large number of users, provided that several different user groups do not have to be analysed. This is because, beyond a certain number of subjects, the observed actions tend to repeat in more or less the same way, without adding any truly significant insight. When the test users belong to different personas, as in the present case, this small number applies to each group. For the test phase, the selected targets belong to three different archetypes; for this reason it is advisable to recruit 15 users, 5 for each group:
• Milanese citizens: subjects 1 to 5
• Tourists: subjects 6 to 10
• Literature teachers: subjects 11 to 15

Each of them was subjected to a persona/scenario-based user test with 5 specific tasks to complete:
1. Listen carefully to the audio and explain what you think the purpose of the app is.
2. Find out how Manzoni describes the Bastioni of Porta Venezia.
3. Observe the architectural change of Piazza Duomo and discover the etymology of the word "Rebecchino".
4. Listen to the sounds and noises belonging to the Lazzaretto, starting from the most recent century and going backwards. After each sound, try to explain what feelings you felt.
5.
Where do you think the Lazzaretto is located? Try to give at least a couple of geographic reference points (e.g. Piazza Duomo, Galleria). By viewing the maps, starting from the farthest century, try to read today's map and understand where the hospital is located geographically.

Objective measures were collected for each activity: first of all the success rate, i.e. the percentage of activities that each user was able to complete. In addition, for tasks no. 2, 3, 4 and 5 the time each tester took to perform the task was recorded. Table 3 shows all the data relating to the achievement of the tasks and the related timing. As Professor Roberto Polillo states in "Facile da usare. Una moderna introduzione all'ingegneria dell'usabilità", it is possible to calculate the overall success rate through the formula [31]:

\[ \text{success rate} = \frac{S + 0.5\,P}{n} \]

where n is the total number of tasks, S is the number of successfully executed tasks, P is the number of partially executed tasks, which conventionally count as half of a full success, and F is the number of tasks never completed. The overall success rate of the system presented is 97%. These data provide a significant indication of the usability of the app. For a better interpretation, however, it is necessary to use some information relating to the users. As already explained above, each tester was asked what their relationship with technology was and, specifically, which apps they use. Most users reported being comfortable with each proposed type of app, stating that they use each of these functions daily (Fig. 2 and Table 2).
• Have you ever used voice messaging (e.g. WhatsApp)?
• Have you ever used a 3D map/route system to reach a place?
• What type of device do you use? (smartphone/tablet/Kindle/PC/smartwatch)
• What type of operating system do you use? (iOS/Android/OS/Windows)
• Have you ever used some kind of Voice User Interface?
(Siri/Cortana/Alexa/Google Assistant)

From the overall data on response times, it is possible to state that the application, as a whole, has a good degree of usability and UX. However, analysing the success rate in more detail, we can see that 4 out of 5 tasks were performed by all users without difficulty, whereas task number 3 appears to have been more difficult to complete. The collected data show that 3 out of 15 users only partially achieved the task, while one user failed it. Applying Professor Polillo's formula, this task achieved a success rate of 83%. This result does not seem to depend on a lack of confidence with the smartphone or with this type of application because, as already discussed, 100% of users use functions similar to those present in the task analysis (Fig. 3). The subjects whose result counts as a partial success apparently did not consider the "other" element to be important: when they reached this screen they kept moving between the various centuries without completing the second part of the task, "[...] discover the etymology of the word 'Rebecchino'".

Let us now examine the average time taken to execute each task. Task no. 2, despite having been completed by all users, is the most difficult based on the average execution time. This "slowness" was caused by the type of iconography used. The book icon, in fact, intended as an element attributable to Manzoni's novel, does not seem to have evoked any association. As verified in a more detailed conversation afterwards, the users had never seen this type of image and had therefore never developed a precise cognitive association with it.
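Returning to the success rates reported above, the two figures can be checked against Polillo's formula with a few lines of Python. This is only a sketch: the function name is hypothetical, and the counts are derived from the text (15 users and 5 tasks, with task 3 showing 3 partial successes and 1 failure, and every other task completed by all users).

```python
def success_rate(successes: int, partials: int, failures: int) -> float:
    """Success rate with each partial success counted as half a success."""
    total = successes + partials + failures
    if total == 0:
        raise ValueError("at least one task attempt is required")
    return (successes + 0.5 * partials) / total


if __name__ == "__main__":
    # Task 3: 15 users, of whom 3 partially succeeded and 1 failed.
    task3 = success_rate(successes=11, partials=3, failures=1)
    # Whole test: 15 users x 5 tasks = 75 attempts; only task 3
    # contributes partial successes and failures.
    overall = success_rate(successes=71, partials=3, failures=1)
    print(f"task 3:  {task3:.0%}")    # task 3:  83%
    print(f"overall: {overall:.0%}")  # overall: 97%
```

Both results agree with the values given in the text: 12.5 out of 15 for task 3 and 72.5 out of 75 overall.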
On the other hand, Tasks no. 4 and no. 5 were much simpler and more immediate, precisely because they relied on two widely recognised icons: headphones for task no. 4 and the geolocation symbol for task no. 5.

In addition to the analysis of the timing, it is interesting to look at the responses to the second part of task no. 4: "[…] After each sound, try to explain what feelings you felt". Analysing these answers, it is possible to observe that what emerges here is a subjective and emotional aspect. Although the request is very clear, users are able to express the sensations they experienced only when listening to the last audio, the one relating to the seventeenth century. Each subject unconsciously moves from describing a clear, defined and concrete situation to explaining totally abstract concepts, such as fear. Listening to the sound belonging to the 21st century (road noise, multi-ethnic music), users do not express their feelings, but clearly describe what they heard: multi-ethnic music, sounds of road works, sounds of machines, and so on (Figs. 4 and 5). In the case of 19th-century sounds (shop noises and a train whistle in the distance), users do not describe the sounds, but give physical references for each of them: animals, steam train, bells, church, market, etc. In both situations, users are unable to associate a specific feeling, probably because none of these sounds conveys a situation extraneous to their daily lives. The two soundscapes, although two centuries apart, do not emerge as distinct; rather, they are perceived in the same way, as contexts of daily life. The sounds of the seventeenth century are the only ones that manage to convey a strong and predominant emotion: in this case, in fact, physical references no longer appear, and the subjects claim to feel emotionally linked to fear, anxiety, loneliness and suffering.
An analysis of the contexts suggests that this growing estrangement, and the shift from the concrete to the abstract, depends on the situations the user has experienced. In the course of life, in fact, it can be assumed that each subject has experienced situations similar to those presented both in the 21st century (a busy street, characterised by the noise of road works and ethnic music from some bars) and in the 19th century (a fruit and vegetable market located near a train station). Precisely for this reason, each user is able to visualise the immersive context presented by the two soundscapes and to define a precise setting. It is unlikely, however, that users will have experienced a situation like that of the Lazzaretto in the seventeenth century (a field of dying people affected by the plague). Never having experienced such a situation, and therefore finding it difficult to visualise and to connote physically, the user conveys what he or she feels through emotions. In parallel to this first qualitative evaluation, it would be interesting to expand the test by involving other parameters. Here, the correlation between the impact of the sound experience and the single place was investigated, using the emotions aroused by the different historical thresholds. A wider investigation could explore, vice versa, the relationship between places and the frequency of the words expressed about them.

The transition from systems of interaction based mainly on the fixed visual component to a wider and richer world allows a communicative range closer to our sensorial richness and perception of the world around us. On the one hand, the forms of reality mediated by technology (augmented, virtual, mixed or blended) allow an exploration that tries to reproduce the three-dimensionality of the world or to enter into a direct relationship with it.
On the other hand, the perspectives opened up by Voice User Interfaces and, in general, by the use of the acoustic channel offer designers a range and a language to explore. The combination of these two factors in mobility creates the scenario for transforming space-related data into a narrative concept to be explored using one's own senses: an opportunity to discover a spatial identity played on new parameters (the soundscape) and not only on the "image of the city". Besides, the three-dimensionality of the holophonic technique also draws on the role that spatiality and haptic interactions play in the complexity of our knowledge process. The situational context and the plasticity of the experience reconstruct the emotional connection with the territory and its genius loci, in a process of transformation of data into experience, of space into place, of history into narration.

Emotional Design: Why We Love (or Hate) Everyday Things. Basic Books
Beautiful interfaces. From user experience to user interface design
Apparent usability vs. inherent usability: experimental analysis on the determinants of the apparent usability
Usability inspection methods
Aesthetics and apparent usability: empirically assessing cultural and methodological issues
What is beautiful is usable
The Logic of Information: A Theory of Philosophy as Conceptual Design
Experiencing the cultural heritage of a place
Designing Voice User Interfaces
The sonic environment of cities
The soundscape: our sonic environment and the tuning of the world
The great animal orchestra: finding the origins of music in the world's wild places
The anatomy of a soundscape
Soundscape ecology: the science of sound in the landscape
The Next Tech: Bernie Krause, l'uomo che da 45 anni registra i paesaggi sonori del mondo
Biophony Soundscape ecology plunges us into a wilder world beyond the mundane and merely visual
ABC: Qué es el audio 8D, la última moda sonora del "todo está inventado" que produce "orgasmos sonoros"
Walking into the past: design mobile app for the georeferred and the multimodal user experience in the context of cultural heritage
User experience & usability for mobile georeferenced apps. A case study applied to cultural heritage field
The time machine. Cultural heritage and the geo-referenced storytelling of urban historical metamorphose
Sensitive environments. Spatial interactive technologies for preserving cultural heritage
Reshaping exhibition & museum design through digital technologies: a multimodal approach
Knowledge sharing and management for local community: logical and visual georeferenced information access
Sinestesie nel progetto di comunicazione
Why You Only Need to Test with 5 Users
Facile da usare

Acknowledgments. Although the paper is a result of the joint work of all the authors, Letizia Bollini is in particular the author of Sects. 1, 2 and 5; and Irene Della Fazia is the author of Sects. 3 and 4.