Tiani: Interaction between People and Environment through Music

Musification of Human-Environment Interaction with Smartphones

Yi WU, Yongliang He, Tejas Rode

Fall 2017

Introduction

Tiani is an iOS-based generative music application that I built in Fall 2017 with Yongliang He and Tejas Rode. The core idea of Tiani is that every environment has its own identity, and that every person in the environment contributes to building it. Conversely, the people in an environment are also influenced by their surroundings. We tried to musify the process of individuals shaping, and being shaped by, their environment. The result is an iPhone app that generates music in real time, taking cues from the user’s voice and their environment. At the core of the app, all the processing is done in Pure Data. There are three synth engines in the application: a subtractive synth in charge of environment-related music generation; a sinusoidal re-synthesis synthesizer in charge of the music layer derived from the user’s voice; and, last but not least, a granular-synthesis-based “rain synth” in charge of rain sound generation.

Video Demo

When the app loads, the user is first asked to speak for 3 to 5 seconds. The app extracts tonal information from the user’s voice, such as timbre (via sinusoidal re-synthesis) and pitch variation. Using the user’s vocal features and their location while they use the application, the app generates music that the user can listen to while exploring various environments. There are also three sliders that let the user control how much the voice signature, the environment components and the rain effect each contribute to the overall musical output.
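One way to picture the three sliders is as gains on three audio layers that are summed into one output. The sketch below is only an illustration in Python, not the Pure Data patch used in the app: the function name `mix_layers`, the slider range of 0 to 1, and the normalization by the total weight are all assumptions.

```python
def mix_layers(voice, environment, rain, w_voice, w_env, w_rain):
    """Weighted sum of three equally long lists of samples.

    Slider values are clamped to [0, 1]. Dividing by the total weight keeps
    the output within the range of the inputs (an assumed design choice).
    """
    weights = [min(1.0, max(0.0, w)) for w in (w_voice, w_env, w_rain)]
    total = sum(weights)
    if total == 0.0:
        # All sliders at zero: output silence.
        return [0.0] * len(voice)
    return [
        (weights[0] * v + weights[1] * e + weights[2] * r) / total
        for v, e, r in zip(voice, environment, rain)
    ]
```

With the voice and environment sliders at the same position and the rain slider at zero, the output is the plain average of the voice and environment layers.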

System Design

The system consists of three parts: an environment sound section, a user sound section and a weather sound section. We collect the current GPS location and the user’s walking speed from the iPhone’s sensors, and the current place type from the Google Maps API. The place type defines the overall “color” of the generated music: each place type has its own presets of MIDI scales and synthesizer settings, so that all restaurants, for example, sound similar. Because the location (X, Y coordinates) differs, a restaurant in New York will not sound exactly the same as a restaurant in Atlanta, but the two should still sound similar. We also extract a signature from the user’s voice (the user is asked to record several seconds of their voice when the application starts). The user’s signature defines the overall texture of the user sound layer. The environment sound and the user sound are generated independently, but environment information and user information each influence the other’s layer, so that the layers, though separate, sound interdependent: the user’s walking speed affects the playback speed of the environment sound, and the place type changes the timbre of the user sound layer to some degree. Finally, we get weather information from a weather application API, and the weather condition determines whether the rain sound synthesis is triggered.
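The mapping described above (place type selects a preset, coordinates add variation within it, walking speed sets the pace) could be sketched roughly as follows. This is a hypothetical Python illustration, not the app’s actual logic: the `PRESETS` table, the scales in it, the hashing of rounded coordinates, and the speed formula are all assumptions made for the example.

```python
import hashlib

# Hypothetical presets: a root MIDI note, scale degrees in semitones, and a waveform.
PRESETS = {
    "restaurant": {"root": 60, "scale": [0, 2, 4, 7, 9], "waveform": "sine"},
    "park": {"root": 62, "scale": [0, 2, 3, 7, 8], "waveform": "sawtooth"},
}

def notes_for_place(place_type, lat, lon, count=8):
    """Pick `count` MIDI notes from the preset scale, seeded by the location.

    The same place type always uses the same scale (similar "color"), while
    different coordinates select different notes from that scale.
    """
    preset = PRESETS[place_type]
    # Round the coordinates so small GPS jitter maps to the same sequence.
    key = f"{lat:.3f},{lon:.3f}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    scale = preset["scale"]
    notes = []
    for i in range(count):
        byte = digest[i % len(digest)]
        degree = scale[byte % len(scale)]
        octave = 12 * ((byte // len(scale)) % 2)  # stay within two octaves
        notes.append(preset["root"] + degree + octave)
    return notes

def note_interval_seconds(walking_speed_mps, base_interval=2.0):
    """Faster walking gives a shorter gap between notes, clamped at 0.25 s."""
    speed = max(0.0, walking_speed_mps)
    return max(0.25, base_interval / (1.0 + speed))
```

For example, `notes_for_place("restaurant", 40.7128, -74.0060)` and `notes_for_place("restaurant", 33.7490, -84.3880)` both draw from the same restaurant scale but are seeded by different coordinates.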

Sound Engine Design Challenges

The biggest challenge in designing the sound engine was to meaningfully represent the interaction between people and their environment through sound. It could have been done easily by having two sounds, one for the environment and another for the user, and continuously looping them over time, while using information from the environment to control the effects added to the user’s sound, and vice versa for the environment’s sound. However, we were aiming for more interactivity and engagement, and for more generative elements in the sound, so that the user does not get bored over time. Therefore, we designed two separate sound engines, one for the user and one for the environment, each generating sound in real time, independently. Through this dual sound-engine design, we extended generative sound control to produce interesting results that give both sub-engines freedom in their own generative music, while at the same time bonding them as a whole to emphasize the interaction between the environment and the user.

The second challenge was the ’musification’ of the environment and the user. This challenge can be viewed from two aspects. For the environmental aspect, we faced the following challenges. First, the user has to be able to identify, to some extent, the relationship between the music and the environment that it represents, while the music must also sound pleasing, or at least not bother the user over prolonged usage. Our first user test, described in the Evaluation section, was intended to find out whether the first version represented the environment well, but it also revealed an interesting psychological phenomenon: people tend to pick sounds that they may have heard before in an environment rather than sounds that represent the feeling of the environment. From this understanding, we thought of a new way to ’musify’ the environment: representing only the feeling of the environment rather than mimicking the sounds that could occur in it. Though the user might initially be unable to fully grasp the musification of the environment as we have done it in the application, we still consider it valid because, over time, the user learns the connection between an environment’s classification and its sound, and can then easily identify the type of environment simply by listening. Moreover, users should not only be able to identify different place types, but also be able to differentiate between two places of the same place type. For instance, a park in Atlanta should sound similar to a park in New York, yet not exactly the same. Similarly, for the user, we wanted to extract information that is unique to the user and musify it. The user should be able to identify their signature in the music they hear, but at the same time, the music should not scare the user or make them feel uncomfortable by having them constantly listen to their own voice.

The third challenge was that the user should not be annoyed or bored by constantly listening to our application. Although this problem could easily be solved by using short notification sounds that are triggered when the environment changes, we would then not have utilized the richness of the inputs, such as the XY coordinates of the user, which are continuous signals to begin with. In order to make minor changes noticeable to the user over time, we thought it was better to have continuous music generation. The trick was to trigger sparse notes, but with long release durations, so that the music sounds sparse yet not empty.
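The effect of long release durations on sparse notes can be illustrated with a small amplitude model. The following Python sketch is an assumption-laden illustration rather than the envelope used in the app: it models each note as a short linear attack followed by an exponential decay, where `release` is taken to be the time needed to fall by 60 dB, and the onset times are made up.

```python
import math

def release_level(t_since_onset, attack=0.05, release=6.0):
    """Amplitude of one note: linear attack, then exponential decay.

    `release` is the assumed time (in seconds) to decay by 60 dB.
    """
    if t_since_onset < 0.0:
        return 0.0  # the note has not started yet
    if t_since_onset < attack:
        return t_since_onset / attack
    # A factor of 1000 in amplitude corresponds to 60 dB.
    return math.exp(-math.log(1000.0) * (t_since_onset - attack) / release)

def texture_level(t, onsets, release):
    """Summed amplitude at time t of all notes triggered at `onsets`."""
    return sum(release_level(t - onset, release=release) for onset in onsets)

# Notes triggered only every four seconds (sparse).
onsets = [0.0, 4.0, 8.0]
long_tail = texture_level(3.9, onsets, release=6.0)   # still audible before the next note
short_tail = texture_level(3.9, onsets, release=0.5)  # effectively silent
```

With the long release, the previous note is still sounding just before the next one is triggered, so the texture never falls completely silent even though onsets are rare.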

In the Tiani system flowchart, note that we have a reverb module at the end of each stage. We also add a rain synth that simulates the sound of rain to create a low-fidelity feeling. The user is allowed to change the volume of the rain sound. In the future, we intend to add musification of the weather as well.

Sound Engine Design and Theory

The environment synthesis engine is built on a subtractive model. Subtractive synthesis is a method of synthesizing audio by attenuating a signal. The attenuation is done by applying various stages of filters, envelopes and low frequency oscillators to the original signal. Thus, at the micro level of a generative music system, we have full control over the timbre. The following diagram shows a simplified version of our environment sound engine signal flow chart to demonstrate the idea of subtractive synthesis.

The input is a MIDI number, which we convert to a frequency and feed to the oscillator generator. At this stage we can use four kinds of waveforms, including sine, square and sawtooth waves, all oscillating at the frequency of the MIDI input pitch. The signals generated by the oscillators are mixed together and passed on to the filter stage. The signal is then fed to the envelope stage to give it a more organic feel. At this stage we set the attack, decay, sustain and release values for the signal. Finally, we use the pan module, which tunes the volume of the low frequency oscillators for the left and right channels. This model produces sounds like those of a glass harp.
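The signal chain described above (MIDI number to frequency, mixed oscillators, filter, envelope) can be sketched in a few lines of Python. This is a simplified illustration and not the Pure Data patch itself: the sample rate, the naive waveform formulas, the one-pole low-pass filter, the linear ADSR shape and all parameter values are assumptions, and the pan stage is left out.

```python
import math

SAMPLE_RATE = 44100  # assumed sample rate in Hz

def midi_to_hz(note):
    """Equal-tempered conversion with A4 (MIDI 69) at 440 Hz."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def oscillator(shape, phase):
    """One sample of a naive (not band-limited) waveform, phase in [0, 1)."""
    if shape == "sine":
        return math.sin(2.0 * math.pi * phase)
    if shape == "square":
        return 1.0 if phase < 0.5 else -1.0
    if shape == "sawtooth":
        return 2.0 * phase - 1.0
    raise ValueError(f"unknown waveform: {shape}")

def adsr(t, duration, attack=0.01, decay=0.1, sustain=0.6, release=0.5):
    """Linear ADSR envelope; assumes duration >= attack + decay."""
    if t < attack:
        return t / attack
    if t < attack + decay:
        return 1.0 - (1.0 - sustain) * (t - attack) / decay
    if t < duration:
        return sustain
    if t < duration + release:
        return sustain * (1.0 - (t - duration) / release)
    return 0.0

def render_note(note, duration=1.0, shapes=("sine", "sawtooth"),
                cutoff_hz=1200.0, release=0.5):
    """Oscillator mix -> one-pole low-pass filter -> ADSR envelope."""
    freq = midi_to_hz(note)
    # Smoothing coefficient of a one-pole low-pass filter.
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / SAMPLE_RATE)
    total = int((duration + release) * SAMPLE_RATE)
    out = []
    lowpassed = 0.0
    for n in range(total):
        t = n / SAMPLE_RATE
        phase = (freq * t) % 1.0
        mixed = sum(oscillator(s, phase) for s in shapes) / len(shapes)
        lowpassed += alpha * (mixed - lowpassed)  # attenuate high frequencies
        out.append(lowpassed * adsr(t, duration, release=release))
    return out
```

Lowering `cutoff_hz` removes more of the upper harmonics of the sawtooth and square waves, which is the "subtractive" part of the model; the envelope then shapes how the filtered tone starts and fades.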

For the user’s music, we extract a signature from the user’s voice. The user is asked to record several seconds of their voice, and we extract certain information from the recording. We simulate their voice, but using a different timbre. The user still recognizes what they said, and can identify the processed sound as their own voice with a different timbre or feeling. To achieve this, we use sinusoidal re-synthesis. Any periodic signal can be expressed as a sum of sinusoidal functions with frequencies equal to integer multiples of a fundamental frequency. Thus, a sound can be approximated as a combination of sinusoidal waves, and we can similarly decompose a signal into a series of sinusoidal waves. For each component, we use a synthesizer that has the same parameter settings that we used for the environment. When we sum the components together, we obtain a re-synthesized version of the original voice, but with the timbre of the environment.
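The decomposition and re-synthesis idea can be illustrated on a single short frame of audio. The Python sketch below is a minimal example under several assumptions: it uses a naive discrete Fourier transform, keeps only the strongest bins, and rebuilds the frame from plain cosines; the function names are hypothetical, and the app’s Pure Data implementation may work quite differently.

```python
import cmath
import math

def strongest_partials(frame, sample_rate, count):
    """Return (frequency_hz, amplitude, phase) for the `count` strongest bins.

    Uses a naive O(n^2) DFT and skips the DC and Nyquist bins.
    """
    n = len(frame)
    spectrum = {}
    for b in range(1, (n + 1) // 2):
        spectrum[b] = sum(
            frame[k] * cmath.exp(-2j * math.pi * b * k / n) for k in range(n)
        )
    top = sorted(spectrum, key=lambda b: abs(spectrum[b]), reverse=True)[:count]
    return [
        (b * sample_rate / n, 2.0 * abs(spectrum[b]) / n, cmath.phase(spectrum[b]))
        for b in sorted(top)
    ]

def resynthesize(partials, sample_rate, length):
    """Sum one cosine per partial to rebuild the frame."""
    return [
        sum(
            amp * math.cos(2.0 * math.pi * freq * k / sample_rate + phase)
            for freq, amp, phase in partials
        )
        for k in range(length)
    ]

# Test signal: two partials that fall exactly on DFT bins (100 Hz spacing).
RATE, N = 6400, 64
frame = [
    0.8 * math.cos(2.0 * math.pi * 300 * k / RATE)
    + 0.3 * math.cos(2.0 * math.pi * 700 * k / RATE + 0.5)
    for k in range(N)
]
partials = strongest_partials(frame, RATE, count=2)
rebuilt = resynthesize(partials, RATE, N)
```

Replacing the cosine in `resynthesize` with a differently voiced oscillator at the same frequency and amplitude is what would change the timbre while keeping the contour of the original voice.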

Environment Subtractive Synthesis Signal Flow

User Voice Sinusoidal Re-synthesis Signal Flow

Evaluation

Our overall parameters for evaluation include usage time and the intuitive association of sounds with the environment. We estimate usage time indirectly, by observing users’ attention span as they use the app. That gives us an idea of how subtle or jarring the generative music is. Intuitive association with the environment is important because it reduces the memory load on the user.

In our first user test, we had our participants listen to a set of sounds and asked them to identify, from a list, the most suitable environment for each sound. We intentionally mixed two approaches to identify how users unknowingly relate an environment to a sound: one approach involved a set of MIDI sequences, while the other involved a set of timbres. What we learnt is that people immediately relate to the timbres of the natural sounds that they hear in an environment. If we were to design sounds that users correctly associate with an environment, we would have to ensure that timbres are well taken into consideration. Most users were able to associate most of the sounds with their respective environments correctly. Such an association is also learnt by reinforcement.

Our second user test involved passively observing users as they used the app, and trying to estimate their involvement in terms of how long they kept a set of parameters and how often they changed those parameters. We noticed that, after fiddling with the parameters initially, users tended to keep listening to the music for around twenty to thirty minutes while walking or riding a bicycle. This is quite promising, and feedback from more extensive user testing could be incorporated in the future to further improve the user experience.