key: cord-0446551-bwl8ms7v
authors: Pavia, Joao Pedro; Velez, Vasco; Ferreira, Renato; Souto, Nuno; Ribeiro, Marco; Silva, Joao; Dinis, Rui
title: Low Complexity Hybrid Precoding Designs for Multiuser mmWave/THz Ultra Massive MIMO Systems
date: 2021-07-24
journal: nan
DOI: nan
sha: 8629fa81e3f984ee87b1c599a99686edf79f310d
doc_id: 446551
cord_uid: bwl8ms7v

Millimeter-wave and terahertz technologies have been attracting attention from the wireless research community since they can offer large underutilized bandwidths which can enable the support of ultra-high-speed connections in future wireless communication systems. While the high signal attenuation occurring at these frequencies requires the adoption of very large (or the so-called ultra-massive) antenna arrays, in order to accomplish low complexity and low power consumption, hybrid analog/digital designs must be adopted. In this paper we present a hybrid design algorithm suitable for both mmWave and THz multiuser multiple-input multiple-output (MIMO) systems, which comprises separate computation steps for the digital precoder, analog precoder and multiuser interference mitigation. The design can also incorporate different analog architectures such as phase shifters, switches and inverters, antenna selection and so on. Furthermore, it is also applicable to different structures, namely fully connected, arrays of subarrays (AoSA) and dynamic arrays of subarrays (DAoSA), making it suitable for the support of ultra-massive MIMO (UM-MIMO) in severely hardware constrained THz systems. We will show that, by using the proposed approach, it is possible to achieve good trade-offs between spectral efficiency and simplified implementation, even as the number of users and data streams increases.

Over the last few years, significant advances have been made to provide higher-speed connections to users in wireless networks with several novel technologies being proposed to achieve this objective.
However, future generations of communication systems will have to fulfil more demanding requirements that cannot be met by the methods adopted in today's communications systems. This motivates the exploration of other candidate technologies like the millimeter wave (mmWave) and Terahertz (THz) bands. These bands offer great underutilized bandwidths and also allow a simplified implementation of large antenna arrays, which are crucial to combat the severe signal attenuation and path losses that occur at these frequencies [1]-[4]. While these technologies (THz systems in particular) are expected to ease the spectrum limitations of today's systems, they face several issues, such as the reflection and scattering losses through the transmission path, the high dependency between distance and frequency of channels at the THz band and the need for controllable time-delay phase shifters, since the phase shift will vary with frequency based on the signal traveling time, which will also affect the system performance. These limitations require not only a proper system design, but also the definition of a set of strategies to enable communications [5], [6]. The exploration of the potentialities of millimeter and submillimeter wavelengths is closely related to the paradigm of using very large arrays of antennas in beamforming architectures. This gives rise to the so-called ultra-massive multiple-input multiple-output (UM-MIMO) systems. Still, to achieve the maximum potential of these systems it is necessary to consider the requirements and the challenges related not only to the channel characteristics but also to the hardware components, especially regarding THz circuits [5], [7], [8]. Considering that high complexity and power usage are pointed out as the major constraints of large-antenna systems, the adoption of hybrid digital-analog architectures becomes crucial to overcome these issues.
By adopting this type of design, it is possible to split the signal processing into two separate parts, digital and analog, and obtain a reduction of the overall circuit complexity and power consumption [9]. Adopting a proper problem formulation, the analog design part can then be reduced to a simple projection operation in a flexible precoding or combining algorithm that can cope with different architectures, as we proposed in [10], [11]. Despite the ultra-wide bandwidths available at mmWave and THz bands, and besides considering the problem of distance limitation, MIMO systems should take into account the operation in frequency selective channels [12]. To make the development of hybrid schemes for these systems a reality, it is necessary to handle the fading caused by the multiple propagation paths typical of this type of channel [13]. Therefore, solutions inspired by multi-carrier schemes, such as orthogonal frequency division multiplexing (OFDM), are often adopted to address such problems [14]. Spectral efficiency (SE) of point-to-point transmissions is a major concern in single-user (SU) and multiuser (MU) systems. To achieve good performance, it is necessary to develop algorithms that are especially tailored to the architecture of these systems. Several hybrid precoding schemes have been proposed in the literature [16]-[18]. The authors of [15] proposed two algorithms for low complexity hybrid precoding and beamforming for MU mmWave systems. Even though they assume only one stream per user, i.e., the number of data streams ($N_s$) is equal to the number of users ($N_u$), it is shown that the algorithms achieve interesting results when compared to the fully-digital solution. The concept of precoding based on adaptive RF-chain-to-antenna connection was introduced in [16] for SU scenarios only but with promising results. In [17], a nonlinear hybrid transceiver design relying on Tomlinson-Harashima precoding was proposed.
Their approach considers fully-connected architectures only but can achieve a performance close to the fully-digital transceiver. A Kalman based Hybrid Precoding method was proposed for MU scenarios in [18] . While designed for systems with only one stream per user and based on fully connected structures, the performance of the algorithm is competitive with other existing solutions. A hybrid MMSE-based precoder and combiner design with low complexity was proposed in [19] . The algorithm is designed for MU-MIMO systems in narrowband channels, and it presents lower complexity and better results when compared to Kalman's precoding. Most of the hybrid solutions for mmWave systems aim to achieve near-optimal performance using Fully-Connected (FC) structures, resorting to phase shifters or switches. However, the difficulty of handling the hardware constraint imposed by the analog phase shifters or by switches in the THz band is an issue that limits the expected performance in terms of SE. Array-of-SubArrays (AoSAs) structures have gained particular attention over the last few years as a more practical alternative to FC structures, especially for the THz band. In contrast to FC structures, in which every RF chain is connected to all antennas via an individual group of phase shifters (prohibitive for higher frequencies), the AoSA approach allows us to have each RF chain connected to only a reduced subset of antennas. The adoption of a disjoint structure with fewer phase shifters reduces the system complexity, the power consumption and the signal power loss. Moreover, all the signal processing can be easily carried out at the subarray level by using an adequate number of antennas [6] . Following the AoSA approach, it was shown in [20] that, to balance SE and power consumption in THz communications, adaption and dynamic control capabilities should be included in the hybrid precoding design. Therefore, Dynamic Arrays-of-SubArrays (DAoSA) architectures could be adopted. 
The same authors proposed a DAoSA hybrid precoding architecture which can intelligently adjust the connections between RF chains and subarrays through a network of switches. Their results showed that it is possible to achieve a good trade-off between SE and power consumption. Within the context of multiuser downlink scenarios, the authors of [21] studied some precoding schemes considering THz massive MIMO systems for Beyond 5th Generation (B5G) networks. Besides showing the impact of carrier frequency, bandwidth and antenna gains on the energy efficiency (EE) and SE performance, three different precoding schemes were evaluated and compared. It was observed that the hybrid precoding approach with baseband Zero Forcing for multiuser interference mitigation (HYB-ZF) achieved much better results than an ANalog-only BeamSTeering (AN-BST) scheme with no baseband precoder. In fact, this approach came closer to the upper bound defined by the singular value decomposition precoder (SVD-UB). Another relevant conclusion is that the design of precoding algorithms should be adapted to the communication schemes. While considering all the specific constraints may allow the maximization of the system performance, formulating and solving the corresponding optimization problem may not be so simple.

Motivated by the work above, in this paper we develop an algorithm for hybrid precoding design which can accommodate different low-complexity architectures suitable for both mmWave and THz MU MIMO systems. It is based on the idea of accomplishing a near-optimal approximation of the fully digital precoder for any configuration of antennas, RF chains and data streams through the application of the alternating direction method of multipliers (ADMM) [22]. ADMM is a well-known and effective method for solving convex optimization problems but can also be a powerful heuristic for several non-convex problems [22], [23].
To use it effectively within the context of MU MIMO, a proper formulation of the hybrid design problem as a multiple constrained matrix factorization problem is first presented. Using the proposed formulation, an iterative algorithm comprising several reduced complexity steps is obtained. The main contributions of this paper can be summarized as follows:

• We propose a hybrid design algorithm with near fully digital performance, where the digital precoder, analog precoder and multiuser interference mitigation are computed separately through simple closed-form solutions.
• The hybrid design algorithm is developed independently of a specific channel or antenna configuration, which allows its application in mmWave and THz systems.

Whereas our previous work [10] also proposed a hybrid design algorithm for mmWave, it did not address multiuser systems, and in particular the MIMO broadcast channel. Therefore, it does not include any step for inter-user interference mitigation within its design. As we show here, for this multiuser channel the hybrid design method must also deal with the residual inter-user interference, as it can degrade system performance, particularly at high SNRs.

The paper is organized as follows: section II presents the adopted system model. The adopted formulation of the hybrid design problem for the MU MIMO scenario and the proposed algorithm are described in detail in section III, which includes the implementation of the algorithm for different analog architectures. Performance results are then presented in section IV. Finally, the conclusions are outlined in section V.

Notation: Matrices and vectors are denoted by uppercase and lowercase boldface letters, respectively. The superscript $(\cdot)^H$ denotes the conjugate transpose.

In this section, we present the system and channel models adopted for the design of the hybrid precoding algorithm. Let us consider the OFDM base system illustrated in Fig. 1.
In this case we have a mmWave/THz hybrid multiuser MIMO system, where a base station (BS) is equipped with $N_{tx}$ antennas and transmits to $N_u$ users equipped with $N_{rx}$ antennas over $F$ carriers, as can be seen in Fig. 1. Since the analog precoder (combiner) is located after (before) the IFFT (FFT) blocks, it is shared between the different subcarriers, as in [25], [26].

Regarding the channel model, it is important to note that even though the mmWave and THz bands share a few commonalities, the THz channel has several peculiarities that distinguish it from the mmWave channel. For example, the very high scattering and diffraction losses in the THz band will typically result in a much sparser channel in the angular domain with fewer multipath components (typically less than 10) [21]. Furthermore, the gap between the line of sight (LOS) and non-line of sight (NLOS) components tends to be very large, making the channel often LOS-dominant and NLOS-assisted [26]. An additional aspect is the much larger bandwidth of THz signals, which can suffer performance degradation due to the so-called beam split effect, where the transmission paths squint into different spatial directions depending on the subcarrier frequency [21]. In light of this, in this paper we consider a clustered wideband geometric channel, which is commonly adopted both in the mmWave [15] and THz literature [20], [26], [27], [29]. However, it should be noted that the hybrid precoding/combining approach proposed in this paper is independent of a specific MIMO channel. In this case the frequency domain channel matrices are characterized by the following quantities: $N_{cl}$ denotes the number of scattering clusters, with each cluster $i$ having a time delay $\tau_i$, $f_k$ is the $k$th subcarrier frequency, $B$ is the bandwidth, $f_c$ is the central frequency and $\gamma$ is a normalizing factor. By carefully selecting the parameters of the channel model we can make it depict a mmWave or a THz channel.
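To make the role of these parameters concrete, the following is a minimal sketch of one common way to generate a clustered wideband geometric channel. It is not the exact model of the paper: it assumes uniform linear arrays with half-wavelength spacing at $f_c$ (the paper uses planar arrays), a single ray per cluster, unit-variance complex Gaussian path gains, uniformly distributed delays and angles, and a normalization $\gamma = \sqrt{N_{tx} N_{rx} / N_{cl}}$. The function names `ula_response` and `wideband_channel` and the `max_delay` parameter are hypothetical.

```python
import numpy as np

def ula_response(n_ant, angle, f, fc):
    """Unit-norm ULA response; the f / fc factor models the beam split effect."""
    n = np.arange(n_ant)
    return np.exp(1j * np.pi * (f / fc) * n * np.sin(angle)) / np.sqrt(n_ant)

def wideband_channel(n_tx, n_rx, n_cl, n_sub, fc, bw, max_delay, rng):
    """Return an array of shape (n_sub, n_rx, n_tx), one channel matrix per subcarrier."""
    # Assumed statistics: CN(0, 1) gains, uniform delays and angles.
    gains = (rng.standard_normal(n_cl) + 1j * rng.standard_normal(n_cl)) / np.sqrt(2)
    delays = rng.uniform(0.0, max_delay, n_cl)
    aod = rng.uniform(-np.pi / 2, np.pi / 2, n_cl)
    aoa = rng.uniform(-np.pi / 2, np.pi / 2, n_cl)
    gamma = np.sqrt(n_tx * n_rx / n_cl)  # assumed normalizing factor
    # Subcarrier frequencies spread symmetrically around the central frequency.
    freqs = fc + (bw / n_sub) * (np.arange(n_sub) - (n_sub - 1) / 2)
    h = np.zeros((n_sub, n_rx, n_tx), dtype=complex)
    for k, f in enumerate(freqs):
        for i in range(n_cl):
            a_r = ula_response(n_rx, aoa[i], f, fc)
            a_t = ula_response(n_tx, aod[i], f, fc)
            phase = np.exp(-2j * np.pi * delays[i] * f)  # delay-induced phase
            h[k] += gains[i] * phase * np.outer(a_r, a_t.conj())
        h[k] *= gamma
    return h

rng = np.random.default_rng(0)
H = wideband_channel(n_tx=8, n_rx=4, n_cl=3, n_sub=4,
                     fc=300e9, bw=15e9, max_delay=20e-9, rng=rng)
print(H.shape)
```

Choosing few clusters and a large bandwidth-to-carrier ratio gives a sparse, beam-split-affected channel closer to the THz case, while more clusters and a smaller ratio move it toward a typical mmWave setting.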
Considering Gaussian signaling, the spectral efficiency achieved by the system for the transmission to MS-u in subcarrier $k$ follows the expression given in [29], which depends on the covariance matrix of the total inter-user interference plus noise at MS-u.

In this section, we will introduce the algorithm for the hybrid precoding problem and show how it can be adapted to different architectures. Although we will focus on the precoder design, a similar approach can be adopted for the combiner. However, since our design assumes that inter-user interference suppression is applied at the transmitter, only single-user detection is required at the receiver and therefore the algorithm reduces to the one described in [10]. Although there are several problem formulations for the hybrid design proposed in the literature, one of the most effective relies on the minimization of the Frobenius norm of the difference between the fully digital precoder and the hybrid precoder [22], [30], [31], [32]. In this paper we follow this matrix approximation-based approach, which can be formulated as the constrained problem (5)-(7). In this formulation, the analog precoder is restricted to the set of feasible analog precoding matrices, which is defined according to the adopted RF architecture (it will be formally defined for several different architectures in the next subsection). Matrix $\mathbf{F}^{opt}_k$ denotes the fully digital precoder, which can be designed so as to enforce zero inter-user interference using, for example, the block-diagonalization approach described in [33]. Even if $\mathbf{F}^{opt}_k$ is selected in order to cancel all interference between users, the hybrid design resulting as a solution of (5)-(7) will correspond to an approximation and, as such, residual inter-user interference will remain.
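For orientation, a matrix approximation problem of the kind described above is typically written in the following form. This is a hedged sketch assembled from the surrounding description, not necessarily the exact expressions (5)-(7): the set symbol $\mathcal{C}$, the summation over the $F$ subcarriers and the dimension labels are assumptions, and the power constraint is taken from the later statement that the squared Frobenius norm must equal $N_u N_s$.

```latex
\begin{align}
\min_{\mathbf{F}_{RF},\,\{\mathbf{F}_{BB,k}\}} \quad
  & \sum_{k=1}^{F} \left\| \mathbf{F}^{opt}_{k} - \mathbf{F}_{RF}\mathbf{F}_{BB,k} \right\|_F^2 \\
\text{subject to} \quad
  & \mathbf{F}_{RF} \in \mathcal{C}^{N_{tx} \times N^{tx}_{RF}} \\
  & \left\| \mathbf{F}_{RF}\mathbf{F}_{BB,k} \right\|_F^2 = N_u N_s, \quad k = 1, \dots, F
\end{align}
```

The single analog matrix $\mathbf{F}_{RF}$ shared by all subcarriers is what couples the per-subcarrier approximations and makes the wideband case harder than the single-carrier one.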
To avoid the performance degradation that will result from this, an additional constraint enforcing zero inter-user interference can be added to the problem formulation.

To derive a hybrid precoder design algorithm that can cope with the different RF architectures, we can integrate the RF constraint directly into the objective function of the optimization problem. This can be accomplished through the addition of an auxiliary variable, $\mathbf{R}$, combined with the use of the indicator function. The indicator function for a generic set $\mathcal{A}$ is defined as

$$I_{\mathcal{A}}(\mathbf{x}) = \begin{cases} 0, & \mathbf{x} \in \mathcal{A} \\ +\infty, & \text{otherwise.} \end{cases}$$

A similar approach can be adopted for integrating the other constraints, (11) and (12), also into the objective function. The augmented Lagrangian function (ALF) for the resulting problem (13)-(16) is the expression referred to below as (18). Based on the ADMM [22], we can apply gradient ascent to the dual problem involving the ALF, which allows us to obtain an iterative precoding algorithm comprising the following sequence of steps. We start with the minimization of the ALF over $\mathbf{F}_{RF}$ at each iteration. The solution can be obtained by setting the corresponding gradient to zero, as in (20), leading to a closed-form expression. After obtaining the expression for $\mathbf{F}_{RF}$, $\mathbf{F}_{BB,k}$ can be found by following the same methodology, which again leads to a closed-form expression. The next steps consist of the minimization over $\mathbf{R}$ and $\mathbf{B}_k$. The minimization of (18) with respect to $\mathbf{R}$ and $\mathbf{B}_k$ reduces to two projections: onto the set of feasible analog precoding matrices and onto the set of matrices whose squared Frobenius norm is $N_u N_s$, respectively.
While the former projection depends on the adopted analog architecture and will be explained in the next subsection, the second projection simply rescales its argument so that its squared Frobenius norm equals $N_u N_s$. The remaining step is the minimization of (18) with respect to $\mathbf{X}$, which also involves a projection, in this case onto the set of matrices satisfying the zero inter-user interference constraint. The general solution for this problem is presented in [30]. In that solution, $\mathbf{V}^{(1)}_{k,u}$ denotes the matrix containing the right singular vectors corresponding to the nonzero singular values of the associated singular value decomposition (SVD). Therefore, to compute matrix $\mathbf{X}$ one can perform a singular value decomposition of the corresponding matrix. Appropriate values for the penalty parameters can be obtained in a heuristic manner by performing numerical simulations. Regarding the initialization and termination of the algorithm, the same approach described in [10] can be adopted. The whole algorithm is summarized in Table I. In this table, Q denotes the maximum number of iterations. The projection operation is the only step specific to the implemented architecture, as will be explained in the next subsection.

The projection required for obtaining matrix $\mathbf{R}$ in step 5 of the precoding algorithm has to be implemented according to the specific analog beamformer [6], [20], [34]-[38]. This makes the proposed scheme very generic, allowing it to be easily adapted to different RF architectures. In the following we will consider a broad range of architectures that can be adopted at the RF precoder for achieving reduced complexity and power consumption implementations. We will consider FC, AoSA and DAoSA structures as illustrated in Fig. 2. Besides phase shifters, we will also consider several alternative implementations for these structures, as shown in Fig. 3.
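The overall structure of such an iteration can be illustrated with a deliberately simplified sketch. It is not the algorithm of Table I: it assumes a single subcarrier, unquantized phase shifters, a least-squares digital step, and it omits the inter-user interference projection and the auxiliary variable for the power constraint (the power is only normalized at the end). The function name `admm_hybrid_sketch` and the parameters `rho`, `n_iter` and `seed` are hypothetical.

```python
import numpy as np

def admm_hybrid_sketch(f_opt, n_rf, rho=1.0, n_iter=50, seed=0):
    """Approximate f_opt (n_tx x n_streams) by r @ f_bb with unit-modulus r.

    Scaled-form ADMM on the splitting F_RF = R, where R carries the
    constant-modulus constraint and w is the scaled dual variable.
    """
    rng = np.random.default_rng(seed)
    n_tx, n_streams = f_opt.shape
    r = np.exp(1j * rng.uniform(0.0, 2 * np.pi, (n_tx, n_rf)))  # feasible start
    w = np.zeros_like(r)
    f_rf = r.copy()
    for _ in range(n_iter):
        # Digital precoder: least-squares fit for the current analog matrix.
        f_bb = np.linalg.lstsq(f_rf, f_opt, rcond=None)[0]
        # Analog (unconstrained) update in closed form:
        # F_RF (F_BB F_BB^H + rho I) = F_opt F_BB^H + rho (R - W).
        a = f_bb @ f_bb.conj().T + rho * np.eye(n_rf)
        b = f_opt @ f_bb.conj().T + rho * (r - w)
        f_rf = np.linalg.solve(a.T, b.T).T
        # Projection onto the constant-modulus set (architecture-specific step).
        r = np.exp(1j * np.angle(f_rf + w))
        # Dual update.
        w = w + f_rf - r
    # Final digital precoder for the feasible analog matrix, with power scaling.
    f_bb = np.linalg.lstsq(r, f_opt, rcond=None)[0]
    f_bb *= np.sqrt(n_streams) / np.linalg.norm(r @ f_bb)
    return r, f_bb

rng = np.random.default_rng(1)
f_opt, _ = np.linalg.qr(rng.standard_normal((16, 2)) + 1j * rng.standard_normal((16, 2)))
r, f_bb = admm_hybrid_sketch(f_opt, n_rf=4)
print(np.linalg.norm(f_opt * np.sqrt(2) / np.linalg.norm(f_opt) - r @ f_bb))
```

Only the line computing `r` depends on the analog architecture, which mirrors the point made in the text that the projection is the single architecture-specific step.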
In the first case we consider the use of infinite resolution (unquantized) phase shifters (UPS). For this architecture the RF constraint set contains all matrices whose elements have unit modulus, $|[\mathbf{F}_{RF}]_{i,j}| = 1$, and the corresponding projection can be performed simply by keeping the phase of each element of the matrix being projected and setting its modulus to one.

The second case considers a more realistic scenario, in which phase shifters can be digitally controlled with $N_b$ bits. These devices allow the selection of $2^{N_b}$ different quantized phases and the RF constraint set becomes the set of matrices with elements in $\{ e^{j 2\pi n / 2^{N_b}},\ n = 0, \dots, 2^{N_b} - 1 \}$. The implementation of the projection in line 5 of the algorithm then amounts to selecting, for each element, the closest quantized phase.

Assuming that $N_b = 1$, each variable phase shifter of the previous architecture can be replaced by a pair of switched lines, including also an inverter. The corresponding constraint set can be reduced to elements in $\{-1, +1\}$ and the implementation of the projection simplifies to taking the sign of the real part of each element.

Alternatively, each of the variable phase shifters can be replaced by a switch. This simplification results in a network of switches connecting each RF chain to the antennas. The RF constraint set can be represented as the set of matrices with elements in $\{0, 1\}$ and the projection can be implemented elementwise by setting an element to 1 when its real part exceeds 1/2 and to 0 otherwise.

The simplest scenario that we can consider corresponds to an antenna selection (AS) architecture, where each RF chain can only be connected to a single antenna (and vice-versa). The RF constraint set will comprise matrices with only one nonzero element per column and per row, a condition expressed through $\|\cdot\|_0$, which represents the cardinality of a vector. The projection selects, for each column $j$, the row index $t_j$ of its single nonzero element. The computation of $t_j$ is performed for all columns $j = 1, \dots, N^{tx}_{RF}$, sorted by descending order in terms of highest real components. It should be noted that during this operation, the same row cannot be repeated.

Within the context of UM-MIMO, one of the most appealing architectures for keeping the complexity acceptable relies on the use of AoSA, where each RF chain is only connected to one or more subsets of antennas (subarrays). The constraint set and the corresponding projection are then defined only over the connected antenna and RF chain pairs, assuming UPS in these connections. Clearly, the phase shifters can be replaced by any of the other alternatives presented previously.
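The elementwise projections described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's exact expressions: each function returns the nearest point (in Euclidean distance) of the stated element set, the inverter-based set is taken to be $\{-1, +1\}$, and the antenna selection routine is a greedy reading of the column ordering rule in the text. All function names are hypothetical.

```python
import numpy as np

def proj_ups(a):
    """Unquantized phase shifters: keep the phase, force unit modulus."""
    return np.exp(1j * np.angle(a))

def proj_qps(a, n_bits):
    """Quantized phase shifters: round each phase to the nearest of 2**n_bits levels."""
    step = 2 * np.pi / 2 ** n_bits
    return np.exp(1j * step * np.round(np.angle(a) / step))

def proj_switch_inverter(a):
    """Switches and inverters: nearest element of {-1, +1}."""
    return np.where(a.real >= 0, 1.0, -1.0)

def proj_switch(a):
    """Switches only: nearest element of {0, 1}, i.e. 1 when Re(a) > 1/2."""
    return (a.real > 0.5).astype(float)

def proj_antenna_selection(a):
    """Greedy antenna selection: one nonzero entry per column, no row reused."""
    out = np.zeros(a.shape, dtype=float)
    used_rows = set()
    # Visit columns by descending order of their largest real component.
    for j in np.argsort(-a.real.max(axis=0)):
        for i in np.argsort(-a.real[:, j]):
            if int(i) not in used_rows:
                out[i, j] = 1.0
                used_rows.add(int(i))
                break
    return out

a = np.array([[0.9 + 0.2j, 0.8 - 0.1j],
              [0.1 + 0.5j, 0.7 + 0.3j],
              [0.2 - 0.4j, 0.3 + 0.9j]])
print(proj_qps(a, n_bits=2))
print(proj_antenna_selection(a))
```

Any of these functions can stand in for the architecture-specific projection step, which is what allows the same iterative design to serve all of the listed analog implementations.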
As a variation of the previous AoSA architecture, we also consider an implementation where each subarray can be connected to a maximum of $L_{max}$ RF chains (which can be non-adjacent), assuming the use of UPS. Care must be taken to guarantee that at least one subblock will be active in every column of $\mathbf{R}$. Similarly to the AoSA, the phase shifters can be replaced by any of the other presented alternatives.

Another appealing architecture relies on the use of double phase shifters (DPS), since these remove the constant modulus restriction on the elements of $\mathbf{F}_{RF}$, following the idea in [38]. In this case the projection can also be implemented elementwise in a simple manner. Similarly to other architectures, DPS can be used not only in the fully connected approach but also in the AoSA and DAoSA cases, replacing the constant modulus setting operation.

Table II presents the total complexity order of the proposed method and compares it against other existing low complexity alternatives, namely the AM-Based [15], LASSO-Based Alt-Min (SPS and DPS) [14] and element-by-element (EBE) [20] algorithms. Taking into account that in UM-MIMO $N_{tx}$ will tend to be very large, the algorithms with higher complexity will typically be EBE and the one proposed in this paper, due to their complexity terms that scale with $N_{tx}$. It is important to note, however, that while the computational complexity of these two design methods may be higher, both algorithms can be applied to simple AoSA/DAoSA architectures and, in particular, the proposed approach directly supports structures with lower practical implementation complexity (and higher energy efficiency), such as those based on switches.
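The AoSA and DPS projections discussed earlier can be sketched in the same elementwise style. The following is a minimal illustration under stated assumptions rather than the paper's expressions: the AoSA mask assumes equally sized, disjoint subarrays with one subarray per RF chain, and the DPS projection assumes that each entry is the sum of two unit-modulus terms, so that its magnitude is only limited to at most 2. The function names are hypothetical.

```python
import numpy as np

def aosa_mask(n_tx, n_rf):
    """Boolean connection mask: RF chain j drives only its own block of antennas."""
    n_sub = n_tx // n_rf  # antennas per subarray (assumes n_tx divisible by n_rf)
    mask = np.zeros((n_tx, n_rf), dtype=bool)
    for j in range(n_rf):
        mask[j * n_sub:(j + 1) * n_sub, j] = True
    return mask

def proj_aosa_ups(a, mask):
    """Unit-modulus entries on connected pairs, zeros everywhere else."""
    return np.where(mask, np.exp(1j * np.angle(a)), 0.0)

def proj_dps(a, max_mag=2.0):
    """Double phase shifters: keep the phase, clip the magnitude to max_mag."""
    mag = np.abs(a)
    # Guard against division by zero for null entries.
    scale = np.where(mag > max_mag, max_mag / np.maximum(mag, 1e-300), 1.0)
    return a * scale

rng = np.random.default_rng(0)
a = rng.standard_normal((8, 2)) + 1j * rng.standard_normal((8, 2))
r_aosa = proj_aosa_ups(a, aosa_mask(8, 2))
print(np.count_nonzero(r_aosa))
print(np.abs(proj_dps(3 * a)).max())
```

A DAoSA variant would replace the fixed mask by one selected at each iteration, with up to $L_{max}$ active subblocks and at least one active subblock per column; the selection rule itself is not specified here.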
Furthermore, in a single-user scenario, the interference cancellation step of the proposed algorithm is unnecessary, and the complexity is reduced accordingly.

In this section, the performance of the proposed algorithm will be evaluated and compared against other existing alternatives from the literature, considering multiuser MIMO systems. We consider that both the transmitter and receivers are equipped with uniform planar arrays (UPAs) with $\sqrt{N_{tx}} \times \sqrt{N_{tx}}$ antenna elements at the transmitter and $\sqrt{N_{rx}} \times \sqrt{N_{rx}}$ at the receiver, with the respective array response vectors following the usual UPA expressions. Where a LOS component is present, it is assumed to dominate (in this case we are admitting very weak NLOS paths compared to LOS, which is typical in the THz band [28]). A fully digital combiner was considered at each receiver and all simulation results were computed with 5000 independent Monte Carlo runs.

First, we evaluate the performance assuming a fully connected structure, with results shown in Fig. 4 for F=1 and Fig. 5 for F=64. The number of RF chains at the transmitter ($N^{tx}_{RF}$) is equal to $N_u N_s$. Besides our proposed precoder, several alternative precoding schemes are compared against the fully digital solution, namely the LASSO-Based Alt-Min, the AM-Based and ADMM-Based precoding [14], [15], [10]. It can be observed that when F=1, only the LASSO-Based Alt-Min with single phase shifters (SPS) and the ADMM-Based precoder from [10] (which does not remove the inter-user interference) lie far from the fully digital precoder. All the others achieve near optimum results and, in fact, can even match them when adopting DPS (proposed approach and LASSO-Based Alt-Min). As explained in Section II, whereas for F=1 we have $\mathbf{F}_{BB}$ and $\mathbf{F}_{RF}$ designed for that specific carrier, when F=64, $\mathbf{F}_{RF}$ has to be common to all subcarriers. While this reduces the implementation complexity, it also results in a more demanding restriction that makes the approximation of $\mathbf{F}^{opt}_k$ (problem (5)-(7)) worse. Additionally, when this approximation worsens, there can also be increased interference between users.
Therefore, it can be observed in the results of Fig. 5 that the gap between the fully digital precoder and all the different hybrid algorithms is substantially wider. Still, the proposed precoder manages to achieve the best results. Given the performances of the different approaches, it is worth recalling that the AM-based precoding algorithm has the lowest performance in wideband but also one of the lowest computational complexities (see Table II of section III.C). In general, the proposed precoding algorithm is the one that can achieve better results at the cost of some additional computational complexity. Later on, we will address strategies based on lower complexity architectures that will allow reducing the associated power consumption.

In Fig. 6 we consider a scenario where the BS employs a larger array with $N_{tx} = 256$ antennas to transmit $N_s = 2$ simultaneous streams to each user, where $N_u = 2$. To better fit this scenario to a typical communication in the THz band we consider the existence of a LOS component, a center frequency of fc=300 GHz and a bandwidth of B=15 GHz (it is important to note that the beam split effect is also considered in the channel model). The AM precoder from [15] requires a single stream per user and thus was not included in the figure. In this scenario, the LASSO-Based Alt-Min precoding schemes present a substantially lower performance than the proposed approaches. Furthermore, the best performance is achieved with the use of double phase shifters, as expected. Once again, comparing the curves of the proposed precoder against the ADMM-based precoder from [10], the advantage of adopting an interference cancellation-based design over a simple matrix approximation one is clear.

Next, we will focus on the adoption of different reduced complexity architectures according to the typologies presented in section III.B.
The objective is to evaluate the performance degradation when simpler architectures are adopted. Fig. 7 is placed in a perspective of simplifying the implementation of the analog precoder but keeping a fully connected structure. We can observe that the versions based on DPS and single UPS achieve the best results, as expected. Considering the more realistic quantized phase shifter (QPS) versions, the results can worsen, but it is visible that it is not necessary to use high resolution phase shifters since with only 3 bits of resolution the results are already very close to the UPS curve. It can also be observed that the simplest of the architectures, antenna selection (AS), results in the worst performance, but the spectral efficiency improves when the antenna selectors are replaced by a network of switches, or even more if branches with inverters are also included.

In Fig. 8, we intend to simplify the implementation even further with the adoption of AoSAs. In this case we considered that the maximum number of subarrays that can be connected to an RF chain ($L_{max}$) is only one. The scenario is the same as before.

To reduce the large performance loss due to the adoption of a simple AoSA architecture, we can allow the dynamic connection of more subarrays to each RF chain by adopting a DAoSA structure, as introduced in section III.B. In Fig. 9 we study the effect of increasing the maximum number of subarrays that can be connected to an RF chain ($L_{max}$) on the performance of these schemes. Each subarray has a size of 32 antennas (nt). Curves assuming the use of SPS as well as of DPS are included. It can be observed that the increase in the number of connections to subarrays, $L_{max}$, has a dramatic effect on the performance, resulting in a huge improvement by simply going from Lmax=1 to Lmax=2. Increasing further to Lmax=4, the results become close to the fully connected case, showing that the DAoSA can be a very appealing approach for balancing the spectral efficiency with hardware complexity and power consumption.
Combining the increase of Lmax with the adoption of DPS can also improve the results, but the gains become less pronounced for Lmax>1. It is important to note that the penalty parameters can be fine-tuned for different system configurations.

One of the objectives of adopting these low complexity solutions is to reduce the overall power consumption. Based on [20], we can calculate the total power consumption of each precoding scheme using

$$P_{total} = N_{BB} P_{BB} + N_{DAC} P_{DAC} + N_{OS} P_{OS} + N_{M} P_{M} + N_{PA} P_{PA} + N_{PC} P_{PC} + N_{PS} P_{PS} + N_{SWI} P_{SWI} + P_{T},$$

where $P_{BB}$ is the power of the baseband block (with $N_{BB}=1$), $P_{DAC}$ is the power of a DAC, $P_{OS}$ is the power of an oscillator, $P_{M}$ is the power of a mixer, $P_{PA}$ is the power of a power amplifier, $P_{PC}$ is the power of a power combiner, $P_{PS}$ is the power of a phase shifter, $P_{SWI}$ is the power of a switch and $P_{T}$ denotes the transmit power. Each $N_x$ variable represents the number of elements of the corresponding device used in the precoder configuration. Based on the values provided in [20] and [39] for the power consumption of individual devices in the 300 GHz band we adopt the following values: PBB=200 mW, PDAC=110 mW, POS=4 mW, PM=22 mW, PPA=60 mW, PPC=6.6 mW, PSWI=24 mW and PT=100 mW. Regarding the phase shifters, we assume values of PPS=10, 20, 40 and 100 mW for 1, 2, 3 and 4 quantization bits, respectively. Considering the same configuration scenario, the power consumption of the different precoder structures is presented in Table III. For the fully-connected structure with UPS, we assumed that PPS=100 mW, which corresponds to a quantized phase shifter with Nb=4 bits [39]. For the remaining phase-shifter based precoder structures we assumed that PPS=40 mW, which corresponds to quantized phase shifters with Nb=3 bits, since with only 3 bits of resolution the results are already very close to the UPS curve (see Fig. 7). As can be seen from this table, the use of architectures based on DAoSAs allows us to reduce considerably the amount of power that is consumed at the precoder.
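The power model above is a weighted sum of device counts, which can be evaluated with a few lines of code. The sketch below uses the per-device power values quoted in the text; the device counts in the usage example are purely illustrative placeholders (they are not the configurations of Table III), and the names `POWER_MW`, `PS_POWER_MW` and `total_power_mw` are hypothetical.

```python
# Per-device power values quoted in the text for the 300 GHz band, in mW.
POWER_MW = {"BB": 200.0, "DAC": 110.0, "OS": 4.0, "M": 22.0,
            "PA": 60.0, "PC": 6.6, "SWI": 24.0}
# Phase shifter power as a function of the number of quantization bits, in mW.
PS_POWER_MW = {1: 10.0, 2: 20.0, 3: 40.0, 4: 100.0}
TRANSMIT_POWER_MW = 100.0

def total_power_mw(counts, n_ps=0, ps_bits=3):
    """Sum N_x * P_x over all devices, plus phase shifters and transmit power.

    counts maps a device key of POWER_MW to the number of such devices.
    """
    total = TRANSMIT_POWER_MW + n_ps * PS_POWER_MW[ps_bits]
    for device, number in counts.items():
        total += number * POWER_MW[device]
    return total

# Illustrative counts only: 1 baseband block, 4 RF chains, 64 phase shifters.
example_counts = {"BB": 1, "DAC": 4, "OS": 1, "M": 4, "PA": 64, "PC": 64}
print(total_power_mw(example_counts, n_ps=64, ps_bits=3))
print(total_power_mw(example_counts, n_ps=64, ps_bits=4))
```

Because the phase shifter and switch terms scale with the number of connections, reducing the connectivity (AoSA/DAoSA) or the phase shifter resolution acts directly on the dominant terms of the sum.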
In fact, we can reduce the consumed power by up to 55% if we consider a precoder scheme based on DAoSA with DPS and Lmax=4 versus an FC structure precoder based on UPS, with only a small performance penalty (Fig. 9). This saving increases to 73% if the DPS structure is replaced by an SPS one. In the particular case of architectures based on quantized phase shifters, we observed that by decreasing the number of quantization bits, it is possible to substantially reduce the power consumption without excessively compromising the performance (as seen in Fig. 7). The conclusion is corroborated by [20] and [39], since the architectures based on low resolution QPS, AoSAs and DAoSAs present a superior energy efficiency when compared to the fully-connected structure with UPS.

In Fig. 10 and Fig. 11, we provide a comparison between our proposed precoder and the EBE precoder from [20], considering an architecture based on DAoSAs (with SPS) and a scenario configuration similar to Fig. 9, i.e., with $N_s = 2$. These figures present various curves where the maximum number of subarrays that can be connected to an RF chain, Lmax, is changed. Fig. 10 refers to an SU scenario (Nu=1) whereas Fig. 11 corresponds to an MU scenario with Nu=4. In the SU case, the proposed precoder achieves results very close to the fully digital precoder, even with only Lmax=2. Compared to the proposed algorithm, EBE shows a wider gap, even though it has smaller complexity (as presented in Table II of section III.C). When we increase the number of users from $N_u = 1$ to $N_u = 4$, we can clearly observe that the EBE algorithm suffers a substantial degradation compared to the proposed solution, which can be explained by the lack of inter-user interference cancellation (it was not specifically designed for MU scenarios).
Even though a sub-6 GHz system often adopts fully digital processing [40], where each antenna element has a dedicated RF chain, it is possible to apply the proposed hybrid design algorithm to a sub-6 GHz channel since it is independent of a specific MIMO channel (as are the other alternative algorithms that we used as benchmarks and which are targeted at solving the matrix approximation problem). To exemplify, Fig. 12 presents the simulated results obtained for the same scenario of Fig. 4 but considering an ideal uncorrelated channel which approximates a rich scattering environment that is typical in sub-6 GHz bands. It can be observed that the proposed approach displays similar behavior to that observed in the higher-band channels, showing that it can also be used for this particular type of channel (even though it may require a higher number of RF chains to achieve a good approximation to the fully digital solution in some scenarios, due to the channel not being sparse, as noted in [41]).

While we have shown how the proposed approach can deal with several relevant types of analog precoders/combiners, it is important to note that there are other alternative structures that have been recently proposed in the literature. For example, some authors have considered precoding paradigms based on time-delay structures for THz systems [28], [42]. One of the most notable is Delay Phase Precoding (DPP), which consists in the use of a Time Delay (TD) network between the RF chains and the traditional phase shifter network in order to convert phase-controlled analog precoding into delay-phase controlled analog precoding. The main advantage of this type of precoding is that the time delays in the TD network are carefully designed to generate frequency-dependent beams which are aligned with the spatial directions over the whole bandwidth [42].
While we do not address the adoption of time-delay structures in this paper, it should be possible to derive a projection algorithm that simultaneously takes into account the constraints imposed in both analog-precoding steps: the time-delay network and the frequency-independent phase shifters.

In this paper, we proposed an iterative algorithm for hybrid precoding design which is suitable for multiuser MIMO systems operating in mmWave and THz bands. The adopted approach splits the formulated design problem into a sequence of smaller subproblems with closed-form solutions and can work with a broad range of configurations of antennas, RF chains and data streams. The separability of the design process allows the algorithm to be adapted to different architectures, making it suitable for implementation with low-complexity AoSA and DAoSA structures, which are particularly relevant for the deployment of ultra-massive MIMO in hardware-constrained THz systems. It was shown that good trade-offs between spectral efficiency and hardware implementation complexity can in fact be achieved by the proposed algorithm for several different architectures.

[1] Millimeter wave mobile communications for 5G cellular: It will work!
[2] A Comprehensive Survey on Millimeter Wave Communications for Fifth-Generation Wireless Networks: Feasibility and Challenges
[3] 6G and Beyond: The Future of Wireless Communications Systems
[4] THz Precoding for 6G: Applications, Challenges, Solutions, and Opportunities
[5] An Overview of Signal Processing Techniques for Terahertz Communications
[6] Terahertz Communications: An Array-of-Subarrays Solution
[7] A survey on hybrid beamforming techniques in 5G: Architecture and system model perspectives
[8] Massive MIMO Systems for 5G and beyond Networks - Overview, Recent Trends, Challenges, and Future Research Direction
[9] Hybrid digital and analog beamforming design for large-scale antenna arrays
[10] An alternating direction algorithm for hybrid precoding and combining in millimeter wave MIMO systems
[11] Hybrid Precoding and Combining Algorithm for Reduced Complexity and Power Consumption Architectures in mmWave Communications
[12] On millimeter wave and THz mobile radio channel for smart rail mobility
[13] MIMO Precoding and combining solutions for millimeter-wave systems
[14] Alternating minimization for hybrid precoding in multiuser OFDM mmWave systems, Asilomar Conference on Signals, Systems and Computers
[15] Low Complexity Hybrid Precoding for Multiuser Millimeter Wave Systems Over Frequency Selective Channels
[16] Hybrid precoding based on adaptive RF-chain-to-antenna connection for millimeter wave MIMO systems
[17] MIMO-Aided Nonlinear Hybrid Transceiver Design for Multiuser mmWave Systems Relying on Tomlinson-Harashima Precoding
[18] A Kalman Based Hybrid Precoding for Multi-User Millimeter Wave MIMO Systems
[19] On the MMSE-based multiuser millimeter wave MIMO hybrid precoding design
[20] A Dynamic Array-of-Subarrays Architecture and Hybrid Precoding Algorithms for Terahertz Wireless Communications
[21] Terahertz Massive MIMO for Beyond-5G Wireless Communication
[22] Distributed optimization and statistical learning via the alternating direction method of multipliers
[23] Global convergence of ADMM in nonconvex nonsmooth optimization
[24] Alternating Minimization Algorithms for Hybrid Precoding in Millimeter Wave MIMO Systems
[25] Frequency Selective Hybrid Precoding for Limited Feedback Millimeter Wave Systems
[26] Hybrid Beamforming for Terahertz Multi-Carrier Systems Over Frequency Selective Fading
[27] TeraMIMO: A Channel Simulator for Wideband Ultra-Massive MIMO Terahertz Communications
[28] Subarray-Based Coordinated Beamforming Training for mmWave and Sub-THz Communications
[29] Hybrid MMSE Precoding and Combining Designs for mmWave Multiuser Systems
[30] Nonlinear programming
[31] Spatially sparse precoding in millimeter wave MIMO systems
[32] Alternating Minimization Algorithms for Hybrid Precoding in Millimeter Wave MIMO Systems
[33] Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels
[34] Hybrid MIMO Architectures for Millimeter Wave Communications: Phase Shifters or Switches?
[35] AF relaying for millimeter wave communication systems with hybrid RF/baseband MIMO processing
[36] Hybrid Beamforming with a Reduced Number of Phase Shifters for Massive MIMO Systems
[37] Switch and Inverter Based Hybrid Precoding Algorithm for mmWave Massive MIMO System: Analysis on Sum-Rate and Energy-Efficiency
[38] Doubling Phase Shifters for Efficient Hybrid Precoder Design in Millimeter-Wave Communication Systems
[39] Dynamic-subarray with Quantized- and Fixed-phase Shifters for Terahertz Hybrid Beamforming
[40] Hybrid Precoding Using Out-of-Band Spatial Information for Multi-User Multi-RF-Chain Millimeter Wave Systems
[41] Dynamic Subarrays for Hybrid Precoding in Wideband mmWave MIMO Systems
[42] Delay-phase precoding for THz massive MIMO with beam split

ACKNOWLEDGMENT
This work was supported by the FCT - Fundação para a Ciência e Tecnologia under the grant 2020.05621.BD. The authors also acknowledge the funding provided by FCT/MCTES through national funds and, when applicable, co-funded by EU funds under the project UIDB/50008/2020.