Title: Estimating Angle of Arrival (AoA) of Multiple Echoes in a Steering Vector Space
Authors: Yu-Lin Wei, Romit Roy Choudhury
Date: 2021-09-27

Abstract: Consider a microphone array, such as those present in Amazon Echos, conference phones, or self-driving cars. One of the goals of these arrays is to decode the angles at which acoustic signals arrive at them. This paper considers the problem of estimating $K$ angles of arrival (AoA), i.e., the direct path's AoA and the AoAs of subsequent echoes. Significant progress has been made on this problem; however, solutions remain elusive when the source signal is unknown (such as human voice) and the channel is strongly correlated (such as in multipath settings). Today's algorithms reliably estimate the direct-path AoA, but the subsequent AoAs diverge in noisy real-world conditions. We design SubAoA, an algorithm that improves on the current body of work. Our core idea models signals in a new AoA sub-space, and employs a cancellation approach that successively cancels each AoA to decode the next. We explain the behavior and complexity of the algorithm from first principles, simulate the performance across a range of parameters, and present results from real-world experiments. Comparison against multiple existing algorithms like GCC-PHAT, MUSIC, and VoLoc shows increasing gains for the latter AoAs, while our computation complexity allows real-time operation. We believe progress in multi-AoA estimation is a fundamental building block for various acoustic and RF applications, including human or vehicle localization, multi-user separation, and even (blind) channel estimation.

Angle of arrival (AoA) refers to the angle at which a signal arrives at a receiver. In reality, a transmitted signal bounces off multiple surfaces and arrives at the receiver over multiple AoA angles $\{\theta_1, \theta_2, \ldots\}$. This paper aims to infer the first $K$ AoA angles, $\theta_{1:K}$, $K < M$. Such capabilities can be helpful to various applications. For instance, a smart speaker like Amazon Echo may be able to infer the exact location of a user by reverse triangulating (or ray tracing) the AoAs of the voice [1] [2] [3]. Self-driving cars may be able to detect sounds of oncoming vehicles even if those vehicles are around the corner, hence not visible to the camera or LIDAR [4] [5] [6]. Multiple AoAs can also serve as useful information for multi-user detection, where a recording device is attempting to separate out voices in a meeting room [7], or an RF spectrum-sensing radio is tracking multiple devices around it [8, 9]. This paper is aimed at solving an algorithmic question that could boost these and other applications. We call our algorithm SubAoA. While a rich body of past work (e.g., GCC-PHAT, MUSIC, JADE, ESPRIT, MVDR, and others [10-16]) has explored this algorithmic problem space, most have either focused on optimizing the line-of-sight (LoS) AoA $\theta_1$, or have made implicit assumptions such as impulse-like signals [17, 18], multiple uncorrelated sources [11], co-prime channels [12, 19], etc. This paper is an attempt to estimate $\theta_{1:K}$ in uncontrolled conditions, such as real-world multipath channels and with unknown acoustic source signals (such as human voice). We are able to attain up to $K = 4$ with an array of $M = 6$ microphones.
The key challenge in estimating $\theta_{1:K}$ can be viewed as a problem of estimating a specific type of delay from a mixture of delays. To elaborate, consider a voice signal that arrives over a multipath channel at an array of microphones. Three types of delays are in play: (1) Voice signals exhibit strong auto-correlation, meaning that the signal changes slowly with time. Thus, a delayed version of a signal $s(t)$ looks similar to $s(t)$. (2) Multipath causes copies of the same signal to add up with different delays. (3) Finally, the signals arrive with relative delays across microphones, depending on the angle of arrival (AoA) of the signal. The net received signal is thus a mixture of all these delays, and an AoA estimation algorithm must isolate the third type of relative delay from this mixture. Observe that if the source signal is known [20], or even if the source signal exhibits low auto-correlation (e.g., white noise) [21], the problem is solvable. With signals like human voice, neither of these is true.

Estimating $\theta_{1:K}$ has gained particular relevance in modern times, primarily due to advances in speech recognition and voice-based interfaces. In past decades, significant work was accomplished around conference phones, where the phone needed to recognize different voices and estimate their AoAs. In such applications, since the voice signals across users were uncorrelated, the problem was easier. On the other hand, acoustic communication modems, SONARs, and ultrasound imaging all developed sophisticated geometric signal processing techniques (including AoA), but had the flexibility to design the source signals. Known source signals (such as predesigned preambles) extended clear algorithmic advantages. Our problem of estimating $\theta_{1:K}$ for an unknown voice signal inherits the worst of both worlds. Moreover, the solutions are expected to run in real time (i.e., at the granularity of seconds), in noisy environments, and on small voice-enabled devices and cheap IoT radios. SubAoA needs to tackle these practical challenges.

The key idea we bring to the problem is that signals can be processed in a new AoA space, a space spanned by all 360 possible AoA vectors (which are known from basic geometry). This is in contrast to the conventional signal space, which is sensitive to noise. We will explain this from the ground up, but our high-level intuition is as follows. We observe that received signals (the direct path and the echoes) become correlated when either their source data is correlated, or when their AoAs are from similar directions. Our central contribution is that representing signals in an AoA sub-space can give "immunity" to correlations of the source data. Said differently, while past techniques are crippled by two problems, we are crippled by one, at the cost of a slight increase in computation. The net result is that, even though the multipath echoes are delayed copies of each other, it is still possible to decode their AoAs one by one. Our algorithm is iterative in nature: each AoA is decoded and then cancelled, so that the next AoA can be decoded from the residual (orthogonal) sub-space. This iterative process continues until all residual AoAs are comparable to noise, at which point no further AoA can be decoded. This determines the value of $K$.
In sum, the advantages of SubAoA arise along two dimensions: (1) Compared to signal sub-space approaches like MUSIC (and its many variants) [11, 13, 22, 23], the geometric AoA sub-space reduces the AoA's angular error for multiple echoes, particularly in noisy environments. (2) In comparison to optimization-based algorithms [2, 15, 24] that must solve non-convex minimization problems to search for multiple AoAs, SubAoA benefits from combining the null space and the AoA space to drastically reduce computational complexity. This delivers the real-time requirement.

Finally, what have we lost in exchange for the gains? When AoA vectors are close to each other, i.e., $\theta_i - \theta_j$ is small, then only one AoA may get decoded (since the other AoA's projection onto the null space is small). However, at the expense of some computational complexity, it is possible to recover some of these AoAs (i.e., by explicitly searching for the last few AoAs). SubAoA exports this as a knob that system designers can configure based on their application's requirements.

We evaluate SubAoA through simulations and real-world experiments. The real experiments are performed in various multipath settings (small apartments and large labs) using a 6-microphone array from SEEED [25], mounted on top of a Raspberry Pi. We placed the device at different locations and gave voice commands, such as "Hey Siri", "Ok, Google", etc. To derive ground truth (which is non-trivial for AoAs), we place a high-quality "reference" microphone right next to the human's mouth, and use this recording as the source signal. The known source signal permits channel estimation at each microphone, which yields the true AoAs. We compare this ground truth against SubAoA as well as a number of existing algorithms, including GCC-PHAT, MUSIC, and VoLoc. For tests of robustness and sensitivity to various parameters, we run MATLAB simulations that mimic our real experiment settings. The main results from our experiments are as follows: (1) While all algorithms estimate the direct-path AoA accurately, SubAoA shows a marked improvement over others in estimating multipath AoAs. For instance, in a quiet lab, the 75th percentile of $\Delta\theta_2$ with SubAoA is 11°, which is 13% smaller than MUSIC and 56% smaller than VoLoc. The gain is greater in a low-SNR regime: $\Delta\theta_2$ is reduced by 54% and 76% compared to MUSIC and VoLoc, respectively, yielding only 12° error at the 75th percentile in the presence of noise. (2) The algorithm completion time is 16x faster with SubAoA compared to VoLoc, and slightly slower than MUSIC. In absolute numbers, SubAoA completes in 0.54 seconds on a quad-core Intel laptop, compared to VoLoc's 6.67 seconds and MUSIC's 0.21 seconds. (3) SubAoA's performance stays robust across different voice commands, users, SNRs, and background noises.

In sum, the contributions of this paper can be summarized as follows: (1) We identify room for improvement in a classical area by treating signals in a new AoA sub-space, then iteratively canceling each AoA to decode the next. We show that up to $K = 4$ AoAs can be reliably decoded, even in noisy, multipath environments (where echoes are strongly correlated). (2) We show that the proposed ideas translate from algorithm to practice. Importantly, the accuracy and computational complexity inherit the good qualities of existing algorithms, although our SubAoA algorithm is unlike either of them. This paper is focused mostly on the algorithm, its properties, and its performance in practical environments.
We will build up the foundations of AoA and beamforming from absolute first principles, with an aim to make the paper self-sufficient for researchers and practitioners, so they can use it in application domains like IoT, acoustics, wireless localization, and mmWave beamforming. We are preparing to open-source our code and data for public use.

This section explains the mathematical constructs (and intuitions) for SubAoA and closely related papers (such as MUSIC, GCC-PHAT, MVDR, blind channel estimation, etc.). Familiar readers can skip to Section 2.1, or even to Section 3. Also, from now on, we will denote the LoS AoA as $\theta_0$, the first echo as $\theta_1$, and so on, for symbol consistency.

Phase lag at nearby microphones: Figure 1 considers the toy case in which a signal from a transmitter arrives at a receiver composed of two adjacent microphones (separated by a distance $d$). If the transmitter is far away, i.e., the transmitter-receiver distance $\gg d$, then the signal paths are almost parallel. At any instant, the signals received by the two microphones are at a relative lag. In this particular case, microphone $M_2$ receives the signal later than $M_1$, because the signal travels an additional distance to arrive at $M_2$. This additional distance is $d \sin\theta$, where $\theta$ is the angle of arrival (AoA). Translating this additional distance to phase, the phase difference $\Delta\phi$ between $M_1$ and $M_2$ is $2\pi d \sin\theta / \lambda$.

Figure 1: Angle of arrival causes signals to travel unequal distances to two separated microphones, causing a phase shift between the recorded signals as a function of $\theta$.

Let us now generalize to $M$ microphones and assume there is no multipath (or echoes) from the environment. If $M_1$ receives a signal $s_1$, we can say from the above that the $i$th microphone receives the signal at a phase lag of $\Delta\phi_i = 2\pi (i-1) d \sin\theta / \lambda$ from $M_1$. This means $\Delta\phi_i = (i-1)\Delta\phi$. In the frequency domain, this can be written as $s_i = s_1 e^{-j(i-1)\Delta\phi}$, where $s_i$ is a complex number. Writing this out as a vector for all the microphones, we have:

$$\mathbf{s} = s_1 \left[ 1, \; e^{-j\Delta\phi}, \; e^{-j2\Delta\phi}, \; \ldots, \; e^{-j(M-1)\Delta\phi} \right]^T$$

Given this vector $[s_1, s_2, \ldots, s_M]^T$, decoding the AoA is easy. The basic idea is to pretend the signal is arriving from some unknown AoA $\theta = \psi$, subtract the corresponding delays from each microphone, and then add up the delayed signals as:

$$y(\psi) = \sum_{i=1}^{M} s_i \, e^{+j(i-1)\Delta\phi_\psi}$$

From all values of $\Delta\phi_\psi \in [0, 2\pi]$, the one that results in the maximum $|y(\psi)|$ gives us the correct $\Delta\phi$, which is then mapped to $\theta$. The intuition is that when $\psi$ matches $\theta$, the relative delays between the microphone signals get compensated (or reversed); hence the signals become identical. When these signals add up, they are called coherent, or constructive, or in phase, and their sum $= M s_1$. For incorrect $\psi$, the signals are not in phase and their sum is $< M s_1$.

AoA under Multipath: Figure 2 shows the multipath case where echoes of the source signal arrive at the receiver from different AoA angles. Considering the first microphone $M_1$, the received signal is now a sum of multiple echoes, denoted as $s^{(\theta_0)}_1$ for the direct path, $s^{(\theta_1)}_1$ for the first echo, $s^{(\theta_2)}_1$ for the second echo, and so on. The next microphone also receives each of these echoes but at corresponding delays, depending on that echo's AoA. Thus, the received signal vector (for $K$ echoes) becomes:

$$\mathbf{y} = \sum_{k=0}^{K} s^{(\theta_k)}_1 \left[ 1, \; e^{-j\Delta\phi_k}, \; \ldots, \; e^{-j(M-1)\Delta\phi_k} \right]^T$$

Finally, the microphone hardware and the environment add noise, modeled as an additive noise vector on the right-hand side of the above equation. Algebraically, the equation can now be written as:

$$\mathbf{y} = A\mathbf{s} + \mathbf{n} \qquad (1)$$

where $\mathbf{y}$ is the "received signal" vector, $A$ is the "steering matrix" whose columns are the AoA vectors, $\mathbf{s}$ is the "source signal" as measured at the reference microphone $M_1$, and $\mathbf{n}$ is the "noise" vector.
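To make the geometry above concrete, here is a minimal NumPy sketch of the steering vector and the delay-compensate-and-sum scan. It is illustrative rather than the paper's implementation: we assume a uniform linear array (the paper's hardware is circular), a single narrowband frequency bin, and arbitrary values for the spacing, frequency, and the 40° test angle.

```python
import numpy as np

C = 343.0           # speed of sound (m/s)
M, d = 6, 0.043     # number of microphones, spacing (m); illustrative values
f = 4000.0          # narrowband frequency under test (Hz)

def steering_vector(theta_deg):
    """Phase lags across a uniform linear array for a far-field AoA theta."""
    theta = np.deg2rad(theta_deg)
    extra_time = np.arange(M) * d * np.sin(theta) / C   # (i-1) d sin(theta) / c
    return np.exp(-1j * 2 * np.pi * f * extra_time)     # the AoA vector a_theta

# Simulate one frequency-domain snapshot arriving from 40 degrees, plus noise.
rng = np.random.default_rng(0)
s1 = 1.0 + 0.5j
y = (steering_vector(40.0) * s1
     + 0.05 * (rng.standard_normal(M) + 1j * rng.standard_normal(M)))

# Delay-and-Sum scan: compensate the assumed delays, pick the coherent peak.
psi = np.arange(-90, 91)
power = np.array([np.abs(np.vdot(steering_vector(p), y)) for p in psi])
print("Delay-and-Sum AoA estimate:", psi[power.argmax()], "deg")
```

The `np.vdot` call conjugates its first argument, which is exactly the "subtract the delays, then add" operation described above.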
The AoA question (in this multipath case) is aimed at estimating $\{\Delta\phi_0, \Delta\phi_1, \ldots, \Delta\phi_K\}$ first; knowing these, it is easy to compute $\{\theta_0, \theta_1, \ldots, \theta_K\}$. We sample 4 key ideas from the literature and the popular algorithms that emerged from them: 1. Cross-correlation and GCC-PHAT [10, 14, 26]; 2. Sub-space methods and MUSIC [11, 13, 22]; 3. Sparse recovery and BCI [12, 15]; 4. Successive cancellation and VoLoc [2, 27].

■ Cross-correlation and GCC-PHAT: Observe that Equation 1 can be written as a dot product as follows:

$$r(\psi) = \mathbf{a}_\psi^H \, \mathbf{y}$$

This is a correlation between an AoA vector (of phase $\Delta\phi_\psi$) and the measured signal at the receiver. A simple cross-correlation method (called Delay-and-Sum [28]) attempts to perform such correlations for all values of $\Delta\phi_\psi \in [0, 2\pi]$, and the values of $\Delta\phi_\psi$ that show high correlation give all the AoAs. The intuition is that for a specific $\Delta\phi_\psi$, one of the echoes will add up coherently, and the result will be high even though the other echoes are adding incoherently. As an analogy, if one is listening to an orchestra of many instruments, it might be possible to listen to the guitar by searching for the guitar's pattern. Correlating with $\Delta\phi_\psi$ is like listening to the guitar amid the other instruments.

GCC-PHAT refines this intuition but generalizes it by removing the effect of magnitude. In other words, GCC-PHAT does not want a louder violin to drown out the guitar, so it normalizes the correlation function by the amplitudes of each instrument. The hope is that each instrument can now be identified by the patterns it creates in time (and not in loudness). Mathematically, this normalization is performed in the denominator (the numerator is the basic cross-correlation between $A$'s $\psi$th column and $\mathbf{y}$, all in the frequency domain):

$$R(\psi) = \sum_f \frac{\mathbf{a}_\psi^H(f)\, \mathbf{y}(f)}{\left| \mathbf{a}_\psi^H(f)\, \mathbf{y}(f) \right|}$$

The AoAs correspond to those values of $\Delta\phi_\psi$ that produce peaks in $R(\psi)$. The issue is that violins, cellos, and several other instruments can often add up to sound like guitars, so this technique is often unreliable in real scenarios.

■ Sub-space methods and MUSIC: These methods look at the AoA problem through the lens of linear algebra and vector spaces. (Figure 3: The subspace method computes the noise space by eigen-decomposition [11].) Their core intuition is that the signals received on $M$ microphones form an $M$-dimensional space; however, assuming there are only $K < M$ echoes, the signals from all these echoes should span an $\mathbb{R}^K$ sub-space inside $\mathbb{R}^M$. Said differently, the remaining $\mathbb{R}^{M-K}$ space should be spanned only by noise and, importantly, should be orthogonal to the signal sub-space. Thus, if signals are represented in the appropriate sub-spaces, then orthogonality between signals and noise can reveal the AoA vectors. Mathematically, MUSIC realizes this intuition by computing the auto-correlation of $\mathbf{y} = A\mathbf{s} + \mathbf{n}$, which leads to:

$$R_{yy} = A R_{ss} A^H + \sigma^2 I \qquad (5)$$

Eigen-decomposition of the $R_{yy}$ matrix reveals the noise space, and it can be shown that the columns of $A$, i.e., the correct AoA vectors, are orthogonal to the eigenvectors of noise, as shown in Figure 3. Returning to our orchestra analogy, MUSIC shows that the cacophony (or noise) created by an amateur orchestra is a function of the instruments being played. So, whether a guitar (or a cello) is part of that orchestra can be determined by analyzing the noise generated by the band. The issue is that MUSIC requires the musical instruments to all be different from each other. Mathematically, this means that the source signals $[s^{(\theta_0)}, s^{(\theta_1)}, \ldots, s^{(\theta_K)}]$ must all be uncorrelated.
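The MUSIC pipeline (covariance, eigen-decomposition, noise-space orthogonality test) fits in a few lines. The following is our illustrative sketch of the standard technique, for one frequency bin; the function name, argument shapes, and the reciprocal "pseudo-spectrum" convention are our choices, not the paper's.

```python
import numpy as np

def music_spectrum(X, A, K):
    """MUSIC pseudo-spectrum for one frequency bin.
    X: M x T matrix of received snapshots; A: M x P candidate steering
    vectors; K: assumed number of uncorrelated sources."""
    M_mics, T = X.shape
    Ryy = X @ X.conj().T / T              # sample covariance, as in Eq. (5)
    _, V = np.linalg.eigh(Ryy)            # eigenvectors, ascending eigenvalues
    N = V[:, :M_mics - K]                 # the M-K noise eigenvectors
    # ||N^H a_theta||^2 ~ 0 when a_theta is a true AoA vector, so invert it:
    return 1.0 / np.sum(np.abs(N.conj().T @ A) ** 2, axis=0)
```

Peaks of the returned spectrum mark candidate AoAs; the uncorrelated-sources assumption enters through the rank of `Ryy`, which is exactly where multipath breaks the method, as discussed next.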
Unfortunately, in a multipath channel, each of these signal components is an echo, or delayed version, of the others; hence they are highly correlated. This derails MUSIC. While "spatial smoothing" optimizations have attempted to alleviate the issue, they are effective only in limited scenarios.

■ Blind Inference and Sparse Recovery: Blind channel inference (BCI) is a relatively modern idea that attempts to estimate the channel blindly, i.e., without knowing the source signal $s$. The core idea essentially models the received signals at 2 microphones as:

$$y_1 = h_1 * s, \qquad y_2 = h_2 * s$$

In other words, since the source signal $s$ is the same for both microphone recordings, it is possible to solve for both channels from the relation $y_1 * h_2 - y_2 * h_1 = 0$. If $h_1$ and $h_2$ are decoded, then the AoAs can be estimated by computing the relative delays between the same channel taps. Solving this is difficult in general, but assuming sparse channels (i.e., an $\ell_0$-norm optimization), it might be possible to recover $h_1$ and $h_2$. However, $\ell_0$ norms are intractable, so $\ell_1$ norms have been used. Still, the computation cost is extremely high, and with hardware noise, the algorithms remain far from practice. Table 1 presents a brief comparison between these techniques.

■ Successive Cancellation and VoLoc: In 2019, VoLoc identified the opportunity that the LoS path is initially clean (i.e., unpolluted by echoes), so it might be possible to cancel the LoS after the first echo arrives. This should give the first echo, which can then be cancelled out when the second echo arrives. This iterative cancellation can reveal the echoes one by one, alongside their AoAs. The issue is that perfect cancellation is a heavy (non-convex) optimization, since the decision variables are every possible delay, as well as the amplitudes of the signal. Moreover, with noise, the cancellation will never be perfect, meaning that the errors of cancellation accumulate through each step. Finally, the initial LoS path, although unpolluted, might be very weak for human voices (since humans cannot produce loud sounds suddenly). This derails cancellation.

Our central idea draws inspiration from MUSIC and VoLoc. We begin with an intuitive explanation of SubAoA, followed by a formal description. Observe that $\mathbf{y}$ is a vector formed by a weighted combination of AoA vectors, plus noise. Now, using a sub-space approach like MUSIC, the LoS-path AoA$_0$ ($= \theta_0$) can be decoded reliably from the received signal vector $\mathbf{y}$. Knowing AoA$_0$, SubAoA computes its orthogonal sub-space of dimension $(M-1)$. Let us denote this orthogonal sub-space as $B_0$ (Figure 4(b)). Observe that the product $B_0 \mathbf{y}$ is a signal vector that has essentially cancelled out the LoS path (since they are orthogonal). Viewed another way, $B_0 \mathbf{y}$ is the residue signal after all AoA vectors (weighted by the echoes) have been projected into the $B_0$ space. Similarly, $B_0 A$ is a sub-space in which the projected AoA vectors lie. Thus, we have reduced the original problem to a smaller problem in an $(M-1)$-dimensional space, and importantly, this sub-problem does not contain the LoS signal component. Our goal is now to decode the first echo, i.e., AoA$_1$ ($= \theta_1$), by applying the same method used for AoA$_0$. We then derive the orthogonal space $B_1$ for this echo, and then compute $B_1 B_0 \mathbf{y}$, which eliminates both AoA$_0$ and AoA$_1$. Iteratively, SubAoA yields the AoAs of echoes until the AoA-vector projections are comparable to noise, at which point no more AoAs can be detected.
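The projection step just described (build $B_0$, then multiply both the signal and the steering matrix by it) is mechanical. Below is a minimal sketch under our own naming; the SVD-based null-space construction is one standard way to obtain $B_0$, not necessarily the authors' exact numerical routine.

```python
import numpy as np

def null_space_of(a):
    """Matrix B whose rows span the (M-1)-dim space orthogonal to vector a,
    i.e., B @ a = 0: multiplying by B cancels any signal arriving along a."""
    _, _, Vh = np.linalg.svd(a.conj().reshape(1, -1))
    return Vh[1:]                                      # (M-1) x M

def cancel_decoded_aoa(a0, Y, A):
    """Project both the received signal and the steering matrix, yielding
    the smaller residual problem with the theta_0 component removed."""
    B0 = null_space_of(a0)
    return B0 @ Y, B0 @ A                              # B0 y  and  B0 A
```

Applying `cancel_decoded_aoa` repeatedly with each newly decoded steering vector realizes the chain $B_1 B_0 \mathbf{y}$ described above.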
Two behaviors of the algorithm are worth discussing:

(1) Transforming both the Signal and the AoA Space: From Figure 4(c), the projected AoAs are in the $B_0$ sub-space, so when one of those AoAs is decoded, it should not be the same as the original AoA (before projection). However, recall that SubAoA also projects the AoA space with $B_0$ (in addition to $\mathbf{y}$). This transformation preserves the correct AoA vectors, which can thus be decoded to $\Delta\phi$ and ultimately $\theta$.

(2) Order of AoA Estimation: If AoA$_1$ is angularly close to AoA$_0$, then AoA$_1$'s projection onto sub-space $B_0$ will be small. However, the AoA$_2$ vector might exhibit a stronger projection on $B_0$. Thus, in SubAoA's second step, AoA$_2$ will be decoded before AoA$_1$. In the subsequent iteration (i.e., upon projection to AoA$_2$'s orthogonal space), AoA$_1$'s projection may prove stronger than AoA$_3$'s projection; AoA$_1$ will get decoded then. In general, the order of decoding AoAs in SubAoA is a function of the power of each echo, as well as its angular separation from the other echoes. This is the reason why the acoustic channel cannot be immediately estimated from the AoAs alone. As an analogy, channel estimation can be viewed as a global ordering of tap delays across all microphones, while AoA only gives a partial ordering between microphone pairs. However, knowing AoAs can help in blind channel estimation, a topic that we leave to future work.

Algorithm 1 presents the pseudocode for SubAoA. We denote the received signal matrix as $X = [\mathbf{x}_1, \ldots, \mathbf{x}_M]$, where $\mathbf{x}_i$ is the signal recorded by the $i$th microphone. The steering matrix is denoted as $A = [\mathbf{a}_1, \ldots, \mathbf{a}_{360}] \in \mathbb{C}^{M \times 360}$, where $\mathbf{a}_\theta$ is the steering vector of angle $\theta$. Note that the steering matrix $A$ varies across different frequencies, but without loss of generality, we describe the algorithm for a single frequency.

Step 1: Compute the noise space for the maximum-likelihood AoA. The $M$-dimensional received signal is composed of the $K$-dimensional signal sub-space and the $(M-K)$-dimensional noise sub-space. We apply principal component analysis (PCA) on the received signal matrix, and extract the $M - K$ least significant eigenvectors to obtain the noise space $N$, where $K$ is the expected number of signal paths. Our next step is to compute the likelihood of each AoA angle by testing its orthogonality with $N$. A signal along a correct AoA angle will be strongly orthogonal to the noise space (projection close to zero). The negative log-likelihood is defined as $L_\theta = -\log(\mathbf{a}_\theta^H N N^H \mathbf{a}_\theta)$. For steering vectors orthogonal to $N$, this negative log-likelihood will be maximized. We compute the noise space and AoA likelihood across all frequencies, sum up the likelihoods, and output the $\theta_0$ with the maximum likelihood. This $\theta_0$ is the LoS AoA estimate.

Step 2: Compute the residual signals by removing signals arriving from the decoded AoA direction(s). Once $\theta_0$ is known, we intend to cancel out from $X$ all the signal power arriving from this AoA. To this end, we project $X$ to $B_0 \in \mathbb{C}^{(M-1) \times M}$, the null space of $\mathbf{a}_{\theta_0}$.

Algorithm 1: SubAoA
  Initialize the residual $R_0$ with the received signal $X$
  for the $k$th path, $k = 0 : K-1$ do
    Compute the orthogonal sub-space $N_k$ by PCA on the residual $R_k$
    for all possible AoAs $\theta$ do
      Test the steering vector $\mathbf{a}_\theta$'s orthogonality to $N_k$
    end for
    $\theta_k = \arg\max_\theta \; -\log(\mathbf{a}_\theta^H N_k N_k^H \mathbf{a}_\theta)$
    Define the orthogonal residual sub-space $B_k$, where $B_k \perp \mathbf{a}_{\theta_k}$
    Project both the residual and the steering matrix to $B_k$: $R_{k+1} = B_k R_k$, $A \leftarrow B_k A$
  end for

Naturally, the residual matrix $R_1$ does not contain any power from the LoS path after the projection. Then we also project the steering matrix: $A \leftarrow B_0 A$.
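Algorithm 1 condenses into a short NumPy sketch. This is our illustrative single-frequency rendering, not the authors' code: we take the noise space as all but the largest eigenvector in each iteration (decoding only the strongest path), and we normalize the projected steering vectors before the likelihood search, an implementation detail the paper does not spell out.

```python
import numpy as np

def subaoa(X, A, K):
    """Illustrative single-frequency sketch of Algorithm 1.
    X: M x T snapshots; A: M x P candidate steering vectors; K: number of
    paths to decode. Returns column indices of A in decoding order."""
    R, A_res = X.astype(complex), A.astype(complex)
    decoded = []
    for _ in range(K):
        m = R.shape[0]
        Cov = R @ R.conj().T / R.shape[1]      # covariance of the residual
        _, V = np.linalg.eigh(Cov)             # ascending eigenvalues
        N = V[:, :m - 1]                       # noise space: strongest path only
        norms = np.linalg.norm(A_res, axis=0)
        proj = np.sum(np.abs(N.conj().T @ A_res) ** 2, axis=0)
        # Negative log-likelihood: large when a_theta is orthogonal to N.
        # Normalizing by the column norm compensates projection attenuation.
        L = -np.log(proj / np.maximum(norms, 1e-9) ** 2 + 1e-12)
        L[norms < 1e-6] = -np.inf              # skip directions already cancelled
        k = int(np.argmax(L))
        decoded.append(k)
        _, _, Vh = np.linalg.svd(A_res[:, k].conj().reshape(1, -1))
        B = Vh[1:]                             # null space of the decoded AoA vector
        R, A_res = B @ R, B @ A_res            # project signal space AND AoA space
    return decoded
```

The full algorithm would repeat this per frequency bin and sum the likelihoods before each arg-max, as described in Step 1.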
This gives both the received-signal residual and the steering-matrix residual, and importantly, both are free of $\theta_0$ components. We now plug these matrices back into Step 1, replace $X$ with $R_1$, and estimate the next AoA $\theta_1$. This iteration continues $K$ times, until $\theta_{K-1}$.

Each SubAoA iteration involves (a) an eigen-decomposition of the residual and (b) a likelihood search over all candidate angles; across $K$ iterations, the total complexity is $O(K M^2)$. For MUSIC, the complexity is only a single round of (a)+(b), i.e., $O(M^2)$. For GCC-PHAT, the complexity of a microphone pair is $O(1)$ (in units of one correlation), and with the GCC component repeated for all $M^2$ microphone pairs, the complexity is also $O(M^2)$. The complexity of VoLoc depends on the resolution of the parameter search. For each candidate location, the likelihood computation is $O(M^3)$, since a matrix inversion is needed to cancel the paths. The total complexity is $O(G^2 M^3)$, where $G$ is the location resolution. In sum, SubAoA features the same order of complexity as MUSIC and GCC-PHAT ($K$ times higher, but the path count $K$ is small in the real world). All of these algorithms are much more efficient than successive cancellation methods like VoLoc.

We now discuss various properties, tradeoffs, and differences of SubAoA, especially in contrast to sub-space algorithms like MUSIC and their variants. In multipath environments, the signals of all paths are correlated (since they are delayed copies of the same source). Therefore, for conventional sub-space methods like MUSIC, the signal sub-space will have rank less than $K$, and the noise space will be larger than the ideal rank, $M - K$. When we choose the $M - K$ least significant eigenvectors as the noise sub-space, part of the signal sub-space gets removed (note that even with unsupervised clustering of eigenvalues, the problem is not resolved, since weak signals also cause small eigenvalues). Worse, a strong noise component may be selected into the signal space. In sum, decoding AoAs of correlated signals with MUSIC is similar to estimating AoAs in a noisy environment. In such cases, weak echoes may not be orthogonal to the noise space, so MUSIC (and all its variants) will suffer for AoAs with weak echoes.

To mitigate this, SubAoA estimates only the strongest path in each iteration. Now, consider an echo that is correlated with the LoS signal from the first iteration. Since SubAoA projects the AoA space, a correlated echo will not be affected so long as it is angularly separated from the LoS AoA. In other words, the residual $R_1$ does not contain any signal from the $\theta_0$ angle, i.e., the $\mathbf{a}_{\theta_0}$ axis is removed. When we compute the new noise space using this residual, the correlated LoS signal is absent, so the reflected path dominates the residual, resulting in a large likelihood for $\theta_1$.

Figure 5 shows MATLAB simulations where each line in the graph corresponds to one iteration of SubAoA. The solid triangle shows the AoA detected in that iteration, since that is the maximum-likelihood peak. Since SubAoA's first iteration is the same as MUSIC, the line for iteration 0 is actually MUSIC's performance. Observe that in noise-free environments (Figure 5(a)), MUSIC and SubAoA are comparable. The difference becomes pronounced with noise; at SNR = 10 dB, Figure 5(b) shows how SubAoA preserves sharp AoA peaks while MUSIC falters. This is evidence that AoA vectors are preserved in SubAoA's residual space (after projection), allowing them to be decoded one by one. We will show similar results from real-world testbeds as well.

From signal sub-space to AoA sub-space: A large class of sub-space algorithms (inspired by MUSIC) [11, 13, 22] computes the noise space from the signal covariance matrix. This requires the source signals to be uncorrelated, to guarantee a rank-$K$ signal sub-space.
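To see the coherent-multipath failure mode concretely, the following toy comparison reuses `steering_vector()`, `music_spectrum()`, and `subaoa()` from the sketches above. All numbers (angles, amplitudes, noise level) are arbitrary, and the printout is illustrative: with fully coherent paths the covariance is nearly rank-1, so MUSIC's peaks typically shift or merge, while the iterative projection recovers the paths one at a time.

```python
import numpy as np

rng = np.random.default_rng(1)
angles = np.linspace(-90, 90, 181)
A = np.stack([steering_vector(a) for a in angles], axis=1)    # M x 181

T = 400
s = rng.standard_normal(T) + 1j * rng.standard_normal(T)      # unknown source
X = (np.outer(steering_vector(-20.0), s)                      # LoS at -20 deg
     + 0.7 * np.outer(steering_vector(35.0), s)               # coherent echo, 35 deg
     + 0.05 * (rng.standard_normal((6, T)) + 1j * rng.standard_normal((6, T))))

spec = music_spectrum(X, A, K=2)              # rank-deficient Ryy biases MUSIC
print("MUSIC peak:", angles[spec.argmax()])
print("SubAoA decoding order:", [float(angles[i]) for i in subaoa(X, A, K=2)])
```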
In contrast, SubAoA performs the eigen-decomposition on the steering-matrix space $A$, rather than the signal space $X$. This relaxes the assumption of uncorrelated signals, and highlights the core contribution of operating in the AoA sub-space. Figure 6 visualizes the gain of the AoA sub-space across more than 100 simulated multipath settings. We use real recorded speech as the source signal and set SNR = 10 dB; a room impulse response (RIR) generator outputs the indoor channel. SubAoA consistently outperforms MUSIC: SubAoA's median $\theta_1$ error is 58% lower, because the echoes are strongly correlated. Real testbed results will further corroborate these outcomes.

In each iteration of SubAoA, the rank of the residual reduces by 1 due to the projection. In noise-free conditions, SubAoA can estimate up to $M - 1$ AoAs. However, the estimation error grows high after the 4th AoA, both in simulation and in real-world scenarios (discussed later). The key reason is "error accumulation", i.e., AoA estimation error from previous iterations affects the residual in later rounds. Specifically, if the LoS AoA $\theta_0$ is estimated with a 5° error, the LoS component will not be eliminated completely after the projection, and this will pollute the detection of subsequent AoAs. This limits SubAoA to $K = 4$ in practice.

While the AoA sub-space offers immunity from correlated signals, it begins to falter when AoAs are angularly close to each other. For instance, if $\theta_1$ is very close to $\theta_0$, the residual space $B_0$ will be nearly perpendicular to $\mathbf{a}_{\theta_1}$. The echo along $\theta_1$ will suffer a strong attenuation after the projection. Meanwhile, if the noise vector is perpendicular to $\mathbf{a}_{\theta_0}$, its power will be preserved, and the SNR of the echo will drop proportionally. Figure 7 plots the signal attenuation factor $F$, assuming $\theta_0 = 0°$. If $\theta_1$ is 10°, the echo suffers an 8.54 dB attenuation compared to the worst-case noise vector (at 180°). Of course, the average case is much better. (A short code sketch at the end of this section illustrates this attenuation computation.)

Some ideas: One natural question is: since $\theta_0$ is known, why not utilize the attenuation factor as a weighting function to amplify AoAs angularly close to $\theta_0$? We explored this idea; however, the problem is that the noise vector is completely unknown. Thus, it is possible that we would end up suppressing a genuine echo angularly far from $\theta_0$ with a small weight, while a noise vector close to $\theta_0$ would get amplified. In other words, solving one problem would create another. Perhaps one fallback possibility is to invoke VoLoc-style optimization once the AoA sub-space has been reduced to a few dimensions. Said differently, when $\theta_0$, $\theta_1$, and $\theta_2$ have already been detected, and the residual space (after projecting with $B_2$) is of low dimension, we can employ optimization to minimize residuals over candidate steering vectors. This is algorithmically not superior, since the accuracy would improve at the cost of computational complexity. However, if the application permits some time cushion, such hybrid approaches may be practical.

SubAoA has been implemented on a 6-microphone array from SEEED [25], mounted on a Raspberry Pi 4 [31]. The microphone arrangement is circular (shown in Figure 9(a)), similar to the Amazon Echo [32]. We do not use off-the-shelf Amazon/Google smart speakers, since they do not expose the raw acoustic samples. The SEEED platform offers a sampling rate of 16 kHz, covering most of the audible energy spectrum for speech, music, and ambient noise. We use an iPhone 11 Pro smartphone speaker as the sound source; we play various types of sounds at 80% of the maximum volume.
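As referenced above, here is a minimal sketch of the attenuation factor $F$: it measures how much of a steering vector's norm survives the projection that cancels $\mathbf{a}_{\theta_0}$. It reuses `steering_vector()` from the first sketch; the dB conversion and function name are our choices.

```python
import numpy as np

def attenuation_factor_db(theta0_deg, theta_deg):
    """Fraction of a_theta's norm surviving the projection that cancels
    a_theta0, expressed in dB (0 dB = fully preserved)."""
    a0, a1 = steering_vector(theta0_deg), steering_vector(theta_deg)
    _, _, Vh = np.linalg.svd(a0.conj().reshape(1, -1))
    B0 = Vh[1:]                                       # null space of a_theta0
    return 20 * np.log10(np.linalg.norm(B0 @ a1) / np.linalg.norm(a1))

# An echo close to theta0 = 0 is attenuated far more than a well-separated one.
print(attenuation_factor_db(0, 10), attenuation_factor_db(0, 60))
```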
Experiments are performed in 2 multipath settings: (a) a relatively small studio apartment, and (b) a larger engineering lab (Figure 9(b)). In each setting, we test SubAoA in 3 types of AoA regimes: (a) near a corner, (b) near a side wall, and (c) near the center of the room. The rooms are fully populated with everyday objects, including furniture, computers, cabinets, TVs, refrigerators, etc. The iPhone sound source is placed at discrete positions on a circle around the receiver; several concentric circles are formed. For corners and sides, the circles are limited to quarter-circles and half-circles, respectively. The voices of 3 volunteers (one female and two males) are recorded and played from the iPhone. The voice signals are composed of various wake-words, including Siri, Google, Bixby, Alexa, and a sentence: "How is the weather today?". In total, we tested at more than 100 transmitter-receiver location pairs.

To obtain ground-truth AoA, we play a known chirp signal spanning [0, 8] kHz to estimate the multipath channel at each microphone. Since each echo creates a peak in the channel, and since the peak from a given echo is time-shifted across microphones, we can carefully derive the AoAs by aligning the peaks between microphone pairs. Due to ambient and hardware noise, the alignments are not perfect; however, we leverage multiple pairs of microphones to solve a regression on the AoA. Finally, we actively test our ground-truth technique by placing artificial reflectors at known angles and checking whether we can detect the expected AoA reliably.

Experiments are designed to answer the following questions: (a) What is SubAoA's AoA estimation error (compared to ground truth) for the LoS path and subsequent echoes? How does the error grow with successive echoes? (b) How does SubAoA compare against existing algorithms, namely MUSIC [11], GCC-PHAT [10], and VoLoc [2]? (c) How robust is AoA estimation across different distances, noise levels, reflection configurations, and users? (d) What is SubAoA's completion time in comparison to the existing AoA algorithms mentioned above?

Overall AoA Estimation Accuracy: The results from the studio are comparable to those from the lab, since the size of the room does not matter much. This is because SubAoA is not sensitive to the actual propagation delays of the echoes; only the relative delay at the microphones is necessary for AoA. Figure 11 shows the complete AoA-error CDFs corresponding to the above results. Beyond $K = 4$ AoAs, the median AoA error grows to more than 25°, either because the signals are considerably weaker, or, in some cases, co-aligned with the earlier AoAs.

Comparison with Existing Algorithms: Figure 12 compares SubAoA's performance with 3 existing algorithms: (a) MUSIC [11], (b) GCC-PHAT [10], and (c) VoLoc [2], in both quiet and noisy environments. We run MUSIC with $K = 4$ uncorrelated source signals, and pick the $\theta_{0:3}$ AoAs with the highest likelihood. For GCC-PHAT, we pick the $\theta_{0:3}$ with the highest normalized correlation coefficients. Finally, since VoLoc needs a nearby wall reflector, we place the receiver near the wall and assume the distance to the wall is perfectly known (this is clearly favorable to VoLoc). Also, we note that VoLoc only outputs $\theta_{0:1}$, because the complexity of computing $\theta_2$ is prohibitively high. For the quiet setting, compared to GCC-PHAT's median errors of 14° and 27° (for $\theta_1$ and $\theta_2$), SubAoA reduces the errors by 71% and 33%, respectively.
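A minimal sketch of the pairwise step of this ground-truth procedure, under our own simplifying assumptions (strongest peak only, i.e., the direct path; far-field geometry; the regression across multiple microphone pairs mentioned above is omitted):

```python
import numpy as np

def strongest_path_delay(mic_i, mic_j, chirp, fs):
    """Relative delay (seconds) of the strongest channel peak between two
    microphone recordings, via matched filtering with the known chirp.
    Later (echo) peaks would be aligned the same way for the echo AoAs."""
    hi = np.correlate(mic_i, chirp, mode="full")   # ~ channel impulse response
    hj = np.correlate(mic_j, chirp, mode="full")
    return (np.argmax(np.abs(hi)) - np.argmax(np.abs(hj))) / fs

def delay_to_aoa_deg(tau, d, c=343.0):
    """Map a pairwise delay to a far-field AoA for mics separated by d (m)."""
    return np.degrees(np.arcsin(np.clip(tau * c / d, -1.0, 1.0)))
```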
Beyond $\theta_2$, GCC-PHAT's correlation approach begins to derail due to indoor multipath; as a result, the error bars grow large. VoLoc performs well for $\theta_0$ and $\theta_1$, but SubAoA is still better; for later AoAs, and in terms of running time, VoLoc is certainly inferior to SubAoA. MUSIC's sub-space approach balances multiple objectives well: it detects $K = 4$ AoAs and runs in real time. However, SubAoA still outperforms MUSIC, since the strong correlation in multipath signals affects the latter. In noisy environments, VoLoc fails to estimate even the LoS AoA, because it cannot correctly find a clean start of the speech (a necessary assumption), and Gaussian noise heavily affects the cancellation process. Figure 13 shows the performance breakdown in two SNR regimes, 20 dB and 0 dB (i.e., the white Gaussian background noise is increased, for both MUSIC and SubAoA). In low-SNR regimes, MUSIC suffers more because the signal sub-space is directly affected, while the AoA sub-space is less sensitive. Compared to MUSIC, SubAoA reduces $\theta_1$'s median error by 47% in the 20 dB regime, and by 50% when the SNR is 0 dB. Figure 14 zooms into the AoA spectra of SubAoA and MUSIC in a multi-echo environment. In MUSIC, we can clearly see that the weak AoA components get buried in the noise when the SNR decreases, while SubAoA maintains consistently strong peak heights for $\theta_{0:3}$.

Figure 15 shows SubAoA's performance with increasing distance between the speaker and the receiver. Evidently, there is no obvious impact of distance. This is because (1) distance impacts SNR less than the energy absorption due to reflections, and (2) SubAoA is not heavily sensitive to SNR, since the decoding is in the AoA sub-space. Thus, while the error grows with distance for $\theta_1$, it somewhat reduces for $\theta_3$.

With COVID, the background noises were particularly low in the lab settings. To evaluate how background noise would affect SubAoA's performance (i.e., at lower SNR), we add white Gaussian noise to the received signal. Figure 16 shows the error across different SNR levels. Evidently, performance degrades only gradually as the SNR drops.

We test SubAoA's performance in 3 AoA environments, i.e., placed near a corner (2 nearby walls), near a side (1 nearby wall), and near the center of a room (no nearby walls). These configurations influence the number of AoAs, their angular separations, and their strengths. Figure 17 plots the median AoA errors. For the corner setting, the first and second echoes from the walls are strong (and angularly well separated), so the errors are lowest. For the single-wall setting, because echoes from the environment are weaker than the wall reflection, the $\theta_2$ error jumps up compared to $\theta_1$. When the device is placed at the center of the room, all the echoes come from far-away walls, resulting in higher $\theta_{1:3}$ errors. Figure 18 visualizes the AoA errors for $\theta_1$ and $\theta_2$ in the lab settings. This heatmap confirms that the errors are highest near the center of the room and lowest near corners, or wherever various reflectors are nearby.

Robustness to Source Signal: Figure 19 verifies that the performance of SubAoA is not sensitive to the source signal, such as different users or the speech signals they produce. We play different speech waveforms: S = "Siri", G = "Google", B = "Bixby", A = "Alexa", and a sentence Stc = "How is the weather today". We also repeat the words to aid AoA detection with a longer waveform; hence, S5 indicates "Siri" repeated 5 times.
Evidently, the source (speech) signal does not affect the AoA accuracy much. This is not surprising, because the techniques underlying SubAoA do not utilize any structure of the signal, such as base frequency, harmonics, pauses, correlation, etc. Longer speech (i.e., sentences or repeats) helps slightly, because it suppresses the error from random noise. To test SubAoA's sensitivity to different users (speaking the same words), we record the speech of 3 volunteers and play them from identical locations. Figure 20 shows the results. The median errors of $\theta_1$, for example, are 9.5°, 11°, and 9.75° across the users. Again, these results confirm that SubAoA is robust, and hence generalizable to any voice signal.

We evaluate SubAoA on a laptop equipped with an AMD Ryzen 7 4800H (2.9 GHz) and 32 GB of memory. Table 2 shows the median computation times of the different algorithms. The overhead of SubAoA is similar to MUSIC and GCC-PHAT, but much lower than VoLoc. This is because VoLoc needs to solve a minimization problem over path delays and amplitudes (in order to perform accurate cancellation); SubAoA side-steps that by estimating null spaces for the decoded AoAs. Observe that even though SubAoA takes more compute time than MUSIC, both are sub-second, and hence can support real-time applications on the acoustic time-scale.

Multi-AoA for RF Signals: Although this paper has been presented and evaluated with acoustic signals, the core SubAoA algorithm generalizes to any signals, including RF. In no part of the algorithm are there assumptions about acoustic/speech source signals; neither are there implicit assumptions about the slow nature of sound. We have chosen acoustics here since human voice/speech is an important application where the source signal is unknown. We leave the treatment and evaluation of RF signals to future work.

From AoA to (Blind) Channel Estimation: Another opportunity lies in approaching blind channel inference (BCI) through AoA. In other words, once SubAoA has extracted the relative delays between microphones, the only unknowns that remain are the absolute path delays and amplitude attenuations. Previous BCI algorithms like JADE [15] estimate the path delays and AoAs jointly, resulting in very high computational complexity. With SubAoA estimating the AoAs accurately, the problem of path delays/amplitudes seems reachable. We believe SubAoA ushers in such ideas.

■ Acoustic AoA Estimation: AoA estimation with microphone arrays is clearly a rich area. Many papers extend the foundational ideas from Section 2 to different settings [13-15, 22, 33-35]. ESPRIT [13] clusters microphones into different groups to reduce the complexity of MUSIC. JADE [15] targets multipath scenarios and jointly optimizes the path AoAs and delays from the received signals (with similarities to blind estimation [12, 19, 36-39]). Newer ideas introduce deep learning or a training phase to infer rich channel information [26, 40-42]. Nakadai et al. [40] use a large microphone array to capture and match the speaker's directionality pattern. Brutti et al. [26] model low-order reflections and account for the directional radiation pattern with a modified GCC-PHAT algorithm. Compared to these algorithms, SubAoA shows a more foundational opportunity in the AoA sub-space representation. Perhaps many other works can exploit the AoA sub-space.

■ RF AoA Estimation: Radio frequency (RF) AoA estimation is another important and thriving topic [23, 43-50].
Knowing path AoAs enables beamforming techniques in RF communication (especially mmWave). Karanam et al. [45] utilize only the received amplitude at each antenna to estimate the AoA. In [44], a deep-learning technique is introduced to estimate a large number of AoAs accurately. Ghasempour et al. [47] creatively utilize the radiation characteristics of leaky-wave devices to detect both the AoA and the Angle of Departure (AoD) in a single-shot measurement. SubAoA applies to RF signals as well, and could complement new types of antennas such as those proposed in [47]. Finally, SubAoA could find critical applications in mmWave beamforming, mobility tracking, and link re-establishment.

■ Space of Applications: AoA estimation is also gaining prominence in personal computing and IoT. A vast literature [2, 51-55] explores a number of acoustic sensing capabilities. Yun et al. [51] use the signals from a smartphone to track human fingers, providing a device-free HCI interface for VR/AR experiences. In [54], the authors transmit a sweeping sine wave to estimate the multipath and compute the room geometry from the reverberations. [55] develops an indoor positioning system based on acoustic active fingerprinting using a phone. Finally, the UK and EU have passed legislation [5, 6] requiring that electric cars broadcast artificial sounds for the safety of pedestrians and other vehicles; this implies that a car might be able to hear other cars around the corner (even if it cannot see them through cameras and LIDARs). SubAoA can serve as a foundational building block for these and many other applications.

We developed SubAoA, an algorithm that factors out multiple AoAs $\theta_{0:K-1}$ from a microphone array. The technique shows the advantages of operating in the AoA sub-space, instead of the conventional signal sub-space. We observe that the algorithm's behavior is amenable to practical settings, including unknown source signals, correlated multipath, ambient noise, etc. We believe SubAoA could usher in additional ideas, both in the algorithm and in the space of applications.
References
[1] A high-accuracy, low-latency technique for talker localization in reverberant environments using microphone arrays.
[2] Voice localization using nearby wall reflections.
[3] MAVL: Multi-resolution analysis of voice localization.
[4] Audio-visual based non-line-of-sight sound source localization: A feasibility study.
[5] Automated and Electric Vehicles Bill.
[6] Electric cars: New vehicles to emit noise to aid safety.
[7] Blind source separation exploiting higher-order frequency dependencies.
[8] Bayesian compressed sensing based dynamic joint spectrum sensing and primary user localization for dynamic spectrum access.
[9] Pinpoint: Localizing interfering radios.
[10] The generalized correlation method for estimation of time delay.
[11] Multiple emitter location and signal parameter estimation.
[12] Joint order detection and blind channel estimation by least squares smoothing.
[13] Estimation of signal parameters via rotational invariance techniques (ESPRIT).
[14] Adaptive eigenvalue decomposition algorithm for passive acoustic source localization.
[15] Joint angle and delay estimation (JADE) for multipath signals arriving at an antenna array.
[16] High-resolution frequency-wavenumber spectrum analysis.
[17] Reflection-aware sound source localization.
[18] Acoustic location of gunshots using combined angle of arrival and time of arrival measurements.
[19] Multichannel blind identification: From subspace to maximum likelihood methods.
[20] Norm-adaption penalized least mean square/fourth algorithm for sparse channel estimation.
[21] On channel estimation using superimposed training and first-order statistics.
[22] Weighted spatial covariance matrix estimation for MUSIC-based TDOA estimation of speech source.
[23] ArrayTrack: A fine-grained indoor location system.
[24] Adaptive signal processing in wireless communications.
[25] ReSpeaker 6-Mic Circular Array Kit for Raspberry Pi. Seeed Wiki.
[26] Inference of acoustic source directivity using environment awareness.
[27] ZigZag decoding: Combating hidden terminals in wireless networks.
[28] Microphone array systems for hands-free telecommunication.
[29] Delay-and-sum beamforming for direction of arrival estimation applied to gunshot acoustics.
[30] Time delay estimate based direction of arrival estimation for speech in reverberant environments.
[31] Raspberry Pi 4 product brief.
[32] Amazon.com: All-new Echo.
[33] DOA of gunshot signals in a spatial microphone array: Performance of the interpolated generalized cross-correlation method.
[34] Time-delay estimation via linear interpolation and cross correlation.
[35] Environment aware estimation of the orientation of acoustic sources using a line array.
[36] Subspace methods for the blind identification of multichannel FIR filters.
[37] Fast maximum likelihood for blind identification of multiple FIR channels.
[38] Blind channel identification for speech dereverberation using l1-norm sparse learning.
[39] Blind sparse-nonnegative (BSN) channel identification for acoustic time-difference-of-arrival estimation.
[40] Sound source tracking with directivity pattern estimation using a 64-ch microphone array.
[41] Real-time sound source orientation estimation using a 96-channel microphone array.
[42] Estimation of sound source orientation using eigenspace of spatial correlation matrix.
[43] Direction of arrival estimation using the MUSIC algorithm for a MIMO OFDM radar.
[44] Performance advantages of deep neural networks for angle of arrival estimation.
[45] Magnitude-based angle-of-arrival estimation, localization, and target tracking.
[46] LiSteer: mmWave beam acquisition and steering by tracking indicator LEDs on wireless APs.
[47] Single-shot link discovery for terahertz wireless networks.
[48] Comparison between MUSIC and ESPRIT direction of arrival estimation algorithms for wireless communication systems.
[49] mD-Track: Leveraging multi-dimensionality for passive indoor Wi-Fi tracking.
[50] SpotFi: Decimeter level localization using WiFi.
[51] Strata: Fine-grained acoustic-based device-free tracking.
[52] Weapon classification and shooter localization using distributed multichannel acoustic sensors.
[53] Demo: ALPS, the Acoustic Location Processing System.
[54] Acoustic echoes reveal room shape.
[55] RoomSense: An indoor positioning system for smartphones using active sound probing.