key: cord-0273981-6f2o761q
authors: Kranzler, Matthias; Herglotz, Christian; Kaup, André
title: Energy Efficient Video Decoding for VVC Using a Greedy Strategy Based Design Space Exploration
date: 2021-11-23
journal: nan
DOI: 10.1109/tcsvt.2021.3130739
sha: c3ab6a4f127cd2497d2eb2756a66d0f50ad37f06
doc_id: 273981
cord_uid: 6f2o761q

IP traffic has increased significantly in recent years, and this trend is expected to continue. Recent studies report that the viewing of online video content accounts for 1% of global greenhouse gas emissions. To reduce the data traffic of video streaming, the new standard Versatile Video Coding (VVC) was finalized in 2020. In this paper, the energy efficiency of two different VVC decoders is analyzed in detail. Furthermore, we propose a design space exploration that uses an algorithm based on a greedy strategy to derive coding tool profiles that optimize the energy demand of the decoder. We show that the algorithm derives optimal coding tool profiles for a subset of coding tools. Additionally, we propose profiles that reduce the energy demand of VVC decoders and provide energy savings of more than 50% for sequences with 4K resolution. We also show that the proposed profiles can have a lower decoding energy demand than comparable HEVC-encoded bit streams while also having a significantly lower bit rate.


Index Terms: Complexity, Optimization, Energy Efficiency, Video, Coding Tools, VVC, HEVC.

In recent years, IP traffic has increased significantly, and by 2022, it is expected to rise by another 60 % relative to 2020 [1].

A key factor in this development is the increasing use of internet video streaming, which is expected to account for over 80 % of IP traffic in 2022. The growing popularity of video-on-demand (VoD) services and the increased use of video conferencing due to the COVID-19 pandemic further amplify IP traffic. In addition, the growth in video streaming demand is driven by a larger proportion of video devices with higher resolution, such as Ultra High Definition (UHD) TVs, and by the rising speed of broadband and mobile connections [1].

Simultaneously, it is reported that the viewing of online video content accounted for 1 % of global greenhouse gas (GHG) emissions in 2018 [2], which is comparable to the total emissions of a country such as Spain. Given the increase in video streaming described above, this share can be expected to grow as well. Consequently, it is essential to improve the energy efficiency of video coding systems in order to reduce GHG emissions. Energy efficiency is not only necessary from an environmental perspective, but also for battery-powered mobile devices, such as smartphones, which have a limited battery lifetime [3].

To reduce the global IP traffic caused by video communication and to manage the increased demand for immersive video technology such as Virtual Reality (VR), the Versatile Video Coding (VVC) standard was finalized in 2020. The goal of VVC is to reduce the bit rate by approximately 50 % at equal subjective visual quality compared to its predecessor High Efficiency Video Coding (HEVC) [4]. This improvement in rate-distortion (RD) efficiency comes at the cost of a significant rise in the computational complexity and energy demand of both the encoder and the decoder. Previous work reports that the energy demand of the VVC reference decoder is increased by over 80 % relative to HEVC [5]. Therefore, we propose a method to improve the energy efficiency of VVC decoding that can even achieve a higher energy efficiency than HEVC.

The energy demand of mobile devices for video streaming applications is modeled in [6]. It is shown that the display brightness, the frame rate, and the streaming bit rate are well suited to model the energy demand accurately, and the authors conclude that these components have a major influence on the energy efficiency of video streaming. In [7], the authors analyze the energy demand of HEVC software and hardware decoders on different devices. Hardware decoders consume up to 30 % of the total device energy, and software decoders up to 50 %. However, the authors found that the energy demand highly depends on the resolution and frame rate of the video sequence, similar to the findings in [6]. Consequently, the ratio between hardware and software decoding power increases with a higher resolution or frame rate. For a resolution such as 1080p, which is common for mobile devices, the ratio remains below four. According to the authors, this ratio can be further reduced as software decoders become more efficient, which is one aspect that we address in this paper.

The literature contains several approaches that reduce the energy demand of a video decoder. In [8], the authors propose an algorithmic method for HEVC that applies two modifications to the regular HEVC decoder to reduce the energy demand by up to 40 %. First, the motion compensation filters are simplified by reducing their length. Second, in-loop and motion compensation filters are skipped at a specific ratio. Another approach that addresses motion compensation and deblocking filter operations is evaluated in [9]. The authors propose a method that reduces the decoder complexity at a specified rate. To preserve the subjective quality of the decoded images, these simplifications are mainly applied to nonsalient areas, which are derived with the approach presented in [10]. Thereby, the authors achieve complexity reductions from 15 % up to 40 %.

Alternatively, the decoder's energy demand can be reduced by lowering the processor's clock frequency while keeping the frame rate of the output video. This method is called Dynamic Voltage and Frequency Scaling (DVFS) [11] . With a lower processing frequency, the energy demand is reduced without a loss in visual quality.
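To illustrate the DVFS idea, the following Python sketch picks the lowest operating point whose clock frequency still decodes each frame within the frame period 1/fps. The operating points, the cycle count per frame, and the effective capacitance are hypothetical, and the energy model is the textbook approximation that dynamic energy per cycle scales with C·V² (leakage ignored); it is an illustration, not the method of [11].

```python
from typing import List, Optional, Tuple

# Hypothetical operating points: (clock frequency in Hz, supply voltage in V).
OPERATING_POINTS: List[Tuple[float, float]] = [
    (1.0e9, 0.80),
    (1.6e9, 0.90),
    (2.4e9, 1.05),
    (3.2e9, 1.20),
]


def pick_frequency(cycles_per_frame: float, fps: float,
                   opps: List[Tuple[float, float]] = OPERATING_POINTS
                   ) -> Optional[Tuple[float, float]]:
    """Lowest operating point that still decodes one frame within 1/fps seconds."""
    required_hz = cycles_per_frame * fps
    feasible = [op for op in sorted(opps) if op[0] >= required_hz]
    return feasible[0] if feasible else None   # None: real-time decoding impossible


def dynamic_energy_per_frame(cycles_per_frame: float, voltage: float,
                             c_eff: float = 1e-9) -> float:
    """Simplified CMOS model: energy per cycle ~ C_eff * V^2 (leakage ignored)."""
    return c_eff * voltage ** 2 * cycles_per_frame
```

Under this model, the cycle count per frame stays the same; the saving comes from the lower supply voltage that a lower frequency permits, while the output frame rate is unchanged.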

Besides optimizing the decoder itself, the encoder can also be used to reduce the energy demand of the subsequent decoding process. This is achieved by modifying the rate-distortion optimization (RDO) in the encoder, which has to be adapted to the energy domain [12]-[14].

In [12] , the complexity, which is closely related to the processing energy, is modeled in terms of CPU cycles that are needed for decoding. With this optimization, it is possible to reduce the energy demand of the decoder by 41 %, while the peak signal-to-noise ratio (PSNR) decreases by approximately 2.5 dB.

In [13] , the energy is modeled in terms of the time demand to execute the decoder. Thereby, the authors report an average energy reduction of approximately 10 %, a bit rate increase of 10 %, and a PSNR reduction of 0.26 dB.

Alternatively, in Decoding-Energy-Rate-Distortion-Optimization (DERDO) [14], the energy demand of the decoder is estimated in advance in the encoder with a bit stream feature-based energy model [15]. The authors achieve an energy reduction of up to 30 % for a maximum bit rate increase of 50 %. Consequently, this approach requires a bit stream model for each video codec. In [16] and [17], corresponding bit stream models are proposed for HEVC and VVC, respectively.
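The idea of an energy-aware encoder decision can be sketched as follows. This is a minimal illustration under two assumptions that are not taken from [14] or [15]: the decoding energy is estimated with a linear feature model Ê = Σ nᵢ·eᵢ (feature counts times feature-specific energies), and the mode decision minimizes an additive Lagrangian cost J = D + λ_R·R + λ_E·Ê. All feature names, energies, and multipliers are hypothetical.

```python
from typing import Dict, List, Tuple

# Each candidate: (mode name, distortion D, rate R in bits, bit stream feature counts)
Candidate = Tuple[str, float, float, Dict[str, int]]


def estimate_energy(feature_counts: Dict[str, int],
                    specific_energy: Dict[str, float]) -> float:
    """Linear feature model: E_hat = sum_i n_i * e_i."""
    return sum(n * specific_energy[f] for f, n in feature_counts.items())


def energy_aware_cost(d: float, r: float, e: float,
                      lambda_r: float, lambda_e: float) -> float:
    """Assumed Lagrangian cost J = D + lambda_R * R + lambda_E * E_hat."""
    return d + lambda_r * r + lambda_e * e


def choose_mode(candidates: List[Candidate], lambda_r: float, lambda_e: float,
                specific_energy: Dict[str, float]) -> str:
    """Return the name of the candidate with the lowest energy-aware cost."""
    best = min(
        candidates,
        key=lambda c: energy_aware_cost(
            c[1], c[2], estimate_energy(c[3], specific_energy), lambda_r, lambda_e),
    )
    return best[0]
```

With λ_E = 0, the decision reduces to classical RDO; increasing λ_E shifts the choice toward modes whose bit stream features are cheaper to decode, at the price of a higher rate or distortion.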

In the literature, several energy models describe the energy demand of a smartphone while decoding video content [18] or virtual reality content [19]. These models can also be used to optimize energy consumption.

In addition to the decoder, the encoder complexity is optimized in [20]-[22]. Li et al. [20] extend the classical RDO with a metric that describes the complexity of the coding mode testing in Advanced Video Coding (AVC), resulting in rate-complexity-distortion optimization (RCDO). With RCDO, it is possible to adaptively control the encoder complexity. For HEVC, Huang et al. [21] propose an improved RCDO algorithm that uses a constrained mathematical formulation of the performance to trade off complexity and compression efficiency. Furthermore, several approaches are applied to improve the coding mode decision process.

In [22], an algorithm is proposed that reduces the complexity of intra coding in VVC by early skipping of intra prediction and partition modes. Furthermore, the authors propose a fast intra mode decision based on a gradient descent search, which is a greedy algorithm.

The block partitioning used by VVC is presented in [23]. The authors show that the binary tree and ternary tree partitioning has a large influence on the compression efficiency and the encoding time (more than six times that of HEVC), but little influence on the decoding time, which increases by only 2-4 %. Hence, we do not consider the partitioning in this work.

For this paper, we develop an approach based on the findings in [5], where it was determined that several coding tools increase the compression efficiency at the cost of additional computational complexity and thus also increase the energy demand of the decoder. However, it was also discovered that some coding tools even reduce the energy demand. In this paper, we evaluate the coding tools in terms of their energy efficiency with a tool sensitivity analysis that shows their impact on both rate-distortion efficiency and energy efficiency.

Furthermore, we propose to use a novel design space exploration (DSE) to optimize the energy demand. The DSE will derive coding tool profiles, which indicate the usage of each considered coding tool.

In summary, this work will provide the following contributions:

• General energy efficiency analysis of several decoder implementations of VVC and HEVC.
• Methodology to derive optimum energy efficient coding tool profiles.
• Analysis of the greedy-strategy-based algorithm on a subset of coding tools.
• Tool sensitivity analysis of the VTM decoder.
• Energy efficient profiles for VVC with superior energy efficiency to HEVC.

In Section II, we give a brief overview of VVC and show which coding tools are enabled for each coding configuration (RA, LB, AI). Afterward, Section III presents the proposed design space exploration for the determination of energy efficient coding tool profiles. Then, in Section IV, we explain the energy measurement setup, the evaluation metrics, the test sequences, and the software setup with the encoder and decoder implementations. Thereafter, we analyze the energy efficiency of two VVC decoders, namely VTM and VVdeC [24], in detail and compare the results to comparable HEVC decoders. In Section V, we compare the results of the proposed algorithm to a full search for a subset of coding tools. Then, we evaluate the DSE in detail and propose two coding tool profiles for each coding configuration. Finally, Section VI concludes this paper.

In this section, we will give a short overview of all coding tools that we used for our energy analysis. As mentioned before, the goal of VVC is to reduce the bit rate in relation to its predecessor HEVC by 50 %. This reduction is mainly achieved by various coding tools and a changed partitioning scheme. The partitioning of coding tree units (CTUs) in VVC is realized by a quadtree structure, which was previously used in HEVC [25] , and an additional nested multi-type tree (MTT) [26] splitting, which allows splitting into nonsquare blocks.

Besides the enhanced partitioning scheme, VVC introduces a significant number of coding tools. All coding tools considered for the evaluation in this paper are listed in Table I, which indicates whether each coding tool is enabled in the CTC configuration of the VTM encoder [27].

Table I: Coding tools considered in this paper and their usage in the CTC configurations (AI, LB, RA). The listed tools include cross-component linear model (CCLM), intra sub-partition (ISP), matrix-based intra-picture prediction (MIP), multiple reference line (MRL), affine motion (AFFINE), adaptive MV resolution (AMVR), biprediction with CU-level weights (BCW), bidirectional optical flow (BDOF), and combined inter-/intra-picture prediction (CIIP); AFFINE, AMVR, and BCW are marked (-) for AI, and BDOF is marked (-) for both AI and LB.

Throughout the paper, the usage settings of Table I will be referred to as the CTC profile. Tools that are initially enabled are denoted by (✓) and disabled tools by (✗) in Table I. Coding tools that are not relevant for a specific coding configuration are denoted by (-) (e.g., inter prediction tools for AI). A detailed description of the corresponding coding tools can be found in [28].

In the following, we describe the basic concept for achieving increased energy efficiency with a design space exploration (DSE) [29]. In [29], the complexity of the HEVC encoder's mode decision process is optimized by parallelization and skip decisions. In contrast to [29], we focus on coding tools and on reducing the energy demand of the decoder. Based on the findings in [5], we disable and enable coding tools of VVC in the encoder and thereby reduce the decoding energy demand.

We define the design space by the tradeoff between energy and compression efficiency. To achieve a higher energy efficiency for VVC, we introduce the coding tool profile u, which is defined by

$$\mathbf{u} = \left[u(1), u(2), \ldots, u(N)\right]^{\mathsf{T}}, \qquad (1)$$

where u(η) indicates the usage of a coding tool, η is the index of a specific coding tool, and N is the number of coding tools from Table I. Each entry is a binary value u(η) ∈ {0, 1} indicating whether the tool is disabled or enabled. For the initialization of u, we consider 28 coding tools according to Table I. For each coding tool η marked with (✓), u(η) is 1; for each tool marked with (✗), u(η) is 0; and the remaining tools marked with (-) are not considered.
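The initialization of u from one configuration column of Table I can be sketched as follows. The status strings and the tool names in the usage are hypothetical placeholders, not the actual Table I entries; the sketch only shows how enabled tools map to 1, disabled tools to 0, and not-applicable tools are dropped so that N shrinks accordingly.

```python
from typing import Dict, List, Tuple

# Hypothetical encoding of the Table I markers (enabled, disabled, not applicable).
ENABLED, DISABLED, NOT_APPLICABLE = "on", "off", "-"


def init_profile(table_column: Dict[str, str]) -> Tuple[List[str], Tuple[int, ...]]:
    """Turn one configuration column of a Table-I-like mapping into (tools, u)."""
    # Tools marked as not applicable are excluded from the design space.
    tools = [name for name, status in table_column.items() if status != NOT_APPLICABLE]
    # u(eta) = 1 for enabled tools and 0 for disabled tools, in table order.
    u = tuple(1 if table_column[name] == ENABLED else 0 for name in tools)
    return tools, u
```

For example, a column {"tool_a": "on", "tool_b": "off", "tool_c": "-", "tool_d": "on"} yields the tools tool_a, tool_b, tool_d and the profile u = (1, 0, 1).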

To evaluate the influence of a changed coding tool profile on energy and compression efficiency, we use the Bjøntegaard-Delta (BD) metric. With a BD-metric, we compare the efficiency of an arbitrary video codec to another codec. Each BD metric that we use in this paper is based on the Bjøntegaard-Delta bit rate (BDR-PSNR) [30] , which describes the bit rate savings in % for the same objective quality measured in PSNR. To evaluate the energy efficiency of a decoder, we substitute the bit rate by the decoder's energy demand. We call the resulting BD metric Bjøntegaard-Delta decoding energy (BDDE-PSNR), which describes the energy savings in % for the same PSNR. The measurement of the decoding energy and of PSNR will be explained in detail in Section IV.
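The BD metrics above can be sketched in Python, assuming the widely used cubic-fit variant of the Bjøntegaard metric: the logarithm of the cost is interpolated as a cubic polynomial of PSNR through the four RD points (one per QP), both curves are integrated over their common PSNR range, and the average log difference is converted into a percentage. Passing bit rates as the cost gives BDR-PSNR, passing decoding energies gives BDDE-PSNR, and passing VMAF scores instead of PSNR gives the VMAF variants. The function names are hypothetical, and the sketch is pure Python so that it needs no numerical library.

```python
import math
from typing import List, Sequence, Tuple


def _solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def _cubic_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[List[float], float]:
    """Cubic through exactly four points, in powers of (x - center) for stability."""
    assert len(xs) == 4 and len(ys) == 4
    center = sum(xs) / len(xs)
    a = [[(x - center) ** k for k in range(4)] for x in xs]
    return _solve(a, list(ys)), center


def _integrate(coeffs: List[float], center: float, lo: float, hi: float) -> float:
    def antiderivative(x: float) -> float:
        t = x - center
        return sum(c * t ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))
    return antiderivative(hi) - antiderivative(lo)


def bd_delta(ref_cost: Sequence[float], ref_quality: Sequence[float],
             test_cost: Sequence[float], test_quality: Sequence[float]) -> float:
    """Average change in % of a cost (bit rate or decoding energy) at equal quality."""
    c_ref, m_ref = _cubic_fit(ref_quality, [math.log(v) for v in ref_cost])
    c_test, m_test = _cubic_fit(test_quality, [math.log(v) for v in test_cost])
    lo = max(min(ref_quality), min(test_quality))   # common quality interval
    hi = min(max(ref_quality), max(test_quality))
    avg_log_diff = (_integrate(c_test, m_test, lo, hi)
                    - _integrate(c_ref, m_ref, lo, hi)) / (hi - lo)
    return (math.exp(avg_log_diff) - 1.0) * 100.0
```

As a sanity check, a test codec that needs exactly half the bit rate of the reference at every PSNR yields -50 %, and one that needs twice the decoding energy yields +100 %.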

For the optimization of the decoding energy demand, we iteratively change the coding tool profile u, encode the sequences with four different QPs, and subsequently measure the decoding energy of the resulting bit streams. Mathematically, we model this process with a function f(u) of the profile u and solve

$$\min_{\mathbf{u} \in \{0,1\}^{N}} f(\mathbf{u}), \qquad f(\mathbf{u}) = \text{BDDE-PSNR}(\mathbf{u}), \qquad (2)$$

where BDDE-PSNR is calculated using the CTC coding tool profile from Table I as a reference. A full search would solve (2); however, it would require encoding and measuring the decoding energy demand of all combinations, i.e., $2^N$ coding tool profiles. For RA coding with $N = 28$, these are $2^{28} = 268{,}435{,}456$ coding tool profiles, which is not feasible in practice.

Therefore, we propose a novel approach, a greedy-strategy-based iterative design space exploration (DSE), to determine coding tool profiles that increase the energy efficiency. Thereby, we reduce the complexity of determining coding tool profiles to $O(i \cdot N)$, where $i$ corresponds to the number of iterations, which is significantly lower than that of a full search. However, as the algorithm does not guarantee optimality, we evaluate its optimality in detail in Subsection V-A using a subset of tools. The bit streams encoded with the proposed coding tool profiles can be decoded with any decoder that conforms to the VVC specification.

Algorithm 1 describes the proposed method. We define the vector $\mathbf{u}_{i,\nu}$, where $i$ corresponds to an iteration of the while loop (cf. line 3 in Algorithm 1) and $\nu$ to a coding tool within the for loop (cf. line 5 in Algorithm 1). For $\nu = 0$, $\mathbf{u}_{i,0}$ is the reference profile of iteration $i$. For the calculation of BDDE in Algorithm 1, we use the CTC profile ($\mathbf{u}_{1,0}$) as a reference. Furthermore, the algorithm obtains the PSNR values and the decoding energy demand from the function Analyze($\mathbf{u}_{i,0}$) (cf. lines 4 and 12 in Algorithm 1).

First, we initialize $\mathbf{u}_{1,0}$ with the values from Table I for the corresponding coding configuration. After the bit streams are generated, we measure the decoding energy demand and analyze the bit streams in terms of PSNR. Then, for each coding tool $\nu$ in the for loop, we toggle the usage $u_{i,\nu}(\nu)$, i.e., we disable the tool if it is enabled and vice versa. We then encode the sequences and evaluate the quality and the decoding energy of the resulting bit streams. Afterward, we calculate the BDDE value, which quantifies the change in energy demand caused by disabling or enabling the coding tool. If the energy demand decreases relative to the reference coding tool profile of the iteration, i.e., BDDE($\mathbf{u}_{1,0}$, $\mathbf{u}_{i,\nu}$) < BDDE($\mathbf{u}_{1,0}$, $\mathbf{u}_{i,0}$) (cf. line 14 in Algorithm 1), we conclude that the reference profile $\mathbf{u}_{i,0}$ of the iteration has a lower energy efficiency, and we keep the change of the $\nu$-th coding tool for the next iteration (cf. line 15 in Algorithm 1).

Finally, after all coding tools are evaluated for an iteration $i$, we compare $\mathbf{u}_{i,\nu}$ with $\mathbf{u}_{i+1,\nu}$. If both are equal (cf. line 3 in Algorithm 1), the algorithm has converged and we stop. Another stopping criterion is the condition in line 18, which checks whether the reference profile of the previous iteration has a higher energy efficiency than all tested profiles of the current iteration. In Section V, we evaluate the DSE algorithm and show that the method achieves significant energy savings. The software decoder used here serves as an example to verify the methodology, which is not limited to a specific decoder implementation.
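The greedy loop described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the callable bdde(u) is a hypothetical stand-in that hides encoding, decoding, energy measurement, and the BDDE computation relative to the CTC profile. Each toggle is compared against the reference of the current iteration, all beneficial toggles are merged into the next reference, and the loop stops once no toggle improves on the reference. The sketch returns the best profile among all evaluated ones, which is an assumption about how the final profile is picked.

```python
from typing import Callable, List, Sequence, Tuple

Profile = Tuple[int, ...]


def greedy_dse(u_init: Sequence[int], bdde: Callable[[Profile], float],
               max_iter: int = 50) -> Tuple[Profile, float, List[Tuple[Profile, float]]]:
    """Greedy coding tool search; lower bdde(u) means lower decoding energy."""
    ref: Profile = tuple(u_init)              # u_{1,0}: CTC profile
    ref_score = bdde(ref)
    history = [(ref, ref_score)]              # every evaluated profile
    for _ in range(max_iter):
        nxt = list(ref)                       # becomes u_{i+1,0}
        for nu in range(len(ref)):
            trial = list(ref)
            trial[nu] = 1 - trial[nu]         # toggle coding tool nu
            trial_t = tuple(trial)
            score = bdde(trial_t)
            history.append((trial_t, score))
            if score < ref_score:             # toggle lowers the energy demand
                nxt[nu] = trial[nu]           # keep it for the next iteration
        new_ref = tuple(nxt)
        if new_ref == ref:                    # no toggle helped: stop
            break
        ref, ref_score = new_ref, bdde(new_ref)
        history.append((ref, ref_score))
    best = min(history, key=lambda item: item[1])
    return best[0], best[1], history
```

Each iteration costs N + 1 evaluations, which is where the $O(i \cdot N)$ complexity comes from; for a cost that is additive over the tools, a single iteration already reaches the optimum, whereas interactions between tools are what make further iterations necessary.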

As this paper evaluates the energy performance of VVC decoding, we explain the energy measurement setup in Subsection IV-A. Subsequently, in Subsection IV-B, we describe our software setup. Then, in Subsection IV-C, we present the sets of video sequences used to evaluate the energy efficiency of the decoders. Afterward, in Subsection IV-D, we focus on the quality metrics VMAF and PSNR. In Subsection IV-E, we describe several Bjøntegaard-Delta metrics that are based on the commonly used Bjøntegaard-Delta bit rate (BDR) [30]. Finally, in Subsection IV-F, we compare the RD and energy efficiency of VVC with the VTM and VVdeC decoders to that of HEVC with the HM and openHEVC decoders, respectively.

For our measurements, we use a desktop PC with an Intel i7-8700 CPU, an x86 processor with six cores and a base frequency of 3.20 GHz, running CentOS 7 as the operating system (OS). For the energy measurement, we use the integrated power meter Running Average Power Limit (RAPL) [31], which is built into the CPU.

As a second measurement setup, we use a Raspberry Pi with a Cortex-A74 quad-core CPU based on the ARM architecture, running Raspbian as the OS. The energy demand of the device is measured with the external power meter LMG95 by ZES Zimmer. The results of the evaluation with this device are shown in Section V-C4.

Similar to [15], we perform multiple measurements and verify their statistical validity with a confidence interval test. Thereby, we ensure that our measurements are not significantly affected by noise, which can be caused, e.g., by background processes of the OS. The statistical test is defined by

$$\frac{2\sigma}{\sqrt{m}} \cdot t_{\alpha}(m-1) < \beta \cdot \bar{E}_{\mathrm{dec}}, \qquad (3)$$

where $m$ is the number of measurements, $\sigma$ is the standard deviation of the measurement series, $\beta$ is the maximum deviation of the energy, $\alpha$ is the probability that the condition on $\beta$ is fulfilled, $t_{\alpha}$ is the critical t-value of Student's t-distribution, and $\bar{E}_{\mathrm{dec}}$ is the average decoding energy demand of the measurement series. We set $\beta = 0.02$ and $\alpha = 0.99$, which means that with a probability of 99 %, the deviation of $\bar{E}_{\mathrm{dec}}$ from the actual energy demand is at most 2 %. Each measurement of the active decoding energy demand $E_{\mathrm{dec}}$ is based on two separate measurements. First, the power demand is measured during the decoding process. Afterward, the power demand is measured in idle mode for the same duration as the decoding process.

Finally, $E_{\mathrm{dec}}$ is derived by subtracting the idle energy from the energy measured during decoding.
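The measurement procedure above can be sketched in Python. This is a minimal sketch under several assumptions that the text does not state: RAPL is read through the Linux powercap sysfs files energy_uj and max_energy_range_uj of the package domain (the domain path and the required permissions vary between systems), the decoder is started as an external command, and the confidence interval test takes the form 2σ·t_α(m−1)/√m < β·Ē_dec with hard-coded two-sided 99 % t-values for up to ten degrees of freedom.

```python
import math
import statistics
import subprocess
import time
from pathlib import Path
from typing import List, Sequence

# Linux powercap directory of the RAPL package domain (assumption; reading it
# may require root privileges, and the domain index may differ per machine).
RAPL_DOMAIN = Path("/sys/class/powercap/intel-rapl:0")

# Two-sided 99 % critical values t_{0.995}(df) of Student's t-distribution.
T_CRIT_99 = {1: 63.657, 2: 9.925, 3: 5.841, 4: 4.604, 5: 4.032,
             6: 3.707, 7: 3.499, 8: 3.355, 9: 3.250, 10: 3.169}


def counter_delta_j(start_uj: int, end_uj: int, max_range_uj: int) -> float:
    """Joules between two RAPL counter readings, allowing one wrap-around."""
    delta = end_uj - start_uj
    if delta < 0:
        delta += max_range_uj
    return delta / 1e6


def active_decoding_energy(cmd: Sequence[str], domain: Path = RAPL_DOMAIN) -> float:
    """E_dec: energy while decoding minus idle energy over the same duration."""
    counter = domain / "energy_uj"
    max_range = int((domain / "max_energy_range_uj").read_text())

    e0, t0 = int(counter.read_text()), time.monotonic()
    subprocess.run(list(cmd), check=True)     # decoder call (hypothetical command)
    e1, duration = int(counter.read_text()), time.monotonic() - t0

    e2 = int(counter.read_text())
    time.sleep(duration)                      # idle phase of equal length
    e3 = int(counter.read_text())
    return counter_delta_j(e0, e1, max_range) - counter_delta_j(e2, e3, max_range)


def ci_test_passed(energies: Sequence[float], beta: float = 0.02) -> bool:
    """Assumed form of the test: 2 * sigma / sqrt(m) * t < beta * mean."""
    m = len(energies)
    if m < 2 or (m - 1) not in T_CRIT_99:
        return False
    sigma = statistics.stdev(energies)        # sample standard deviation
    mean = statistics.mean(energies)
    return 2.0 * sigma / math.sqrt(m) * T_CRIT_99[m - 1] < beta * mean


def measure_until_confident(cmd: Sequence[str], min_runs: int = 3,
                            max_runs: int = 11) -> List[float]:
    """Repeat the measurement until the confidence interval test passes."""
    runs: List[float] = []
    while len(runs) < max_runs:
        runs.append(active_decoding_energy(cmd))
        if len(runs) >= min_runs and ci_test_passed(runs):
            break
    return runs
```

Subtracting an idle phase of equal length removes the baseline power of the platform, so that only the additional energy caused by decoding remains in $E_{\mathrm{dec}}$.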

For HEVC, we use the reference software implementation HM-16.20 [32] for encoding and decoding the video sequences. Additionally, we use openHEVC version 2.0 [33] as an optimized decoder implementation.

For VVC, we use the reference software implementations VTM-8.0 and VTM-10.0 [27] for encoding and decoding the bit streams. Furthermore, we use an optimized software decoder implementation of VVC called Versatile Video Decoder (VVdeC) [24], which is proposed in [34] and targets real-time decoding. For the energy measurements, we use VVdeC version 1.1.2. The reference decoders HM and VTM are executed on a single core, whereas the optimized decoders openHEVC and VVdeC use the maximum number of available threads and cores.

We use two sets of test sequences from the literature: first, the set from the common test conditions (CTC) of JVET [35], and second, the test set of the Ultra Video Group (UVG) [36]. The JVET set includes 26 sequences with resolutions from 416 × 240 to 4K and frame rates from 20 to 60 frames per second (fps). We divide the sequences of the UVG set into three classes. The sequences of class UVG1 have 4K resolution and a frame rate of 120 fps, those of class UVG2 have 4K resolution and a frame rate of 50 fps, and those of class UVG3 have full HD resolution and a frame rate of 120 fps.

The encoding configurations from the JVET CTC are used to encode the bit streams. For each sequence set, we code three coding configurations: All Intra (AI), Low-delay B (LB), and Random Access (RA). All sequences are coded with quantization parameters (QPs) of 22, 27, 32, and 37. As recommended by [35], LB is not coded for sequences with 4K resolution (A1, A2, UVG1, and UVG2), and RA is not coded for class E. Furthermore, we use an internal coding bit depth of 10 bit for both HEVC and VVC, and the temporal subsampling of AI is set to one, which allows us to measure the decoding energy demand of all frames.
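The resulting encoding grid (three configurations and four QPs, with LB skipped for the 4K classes and RA skipped for class E) can be written as a small job generator. This is a sketch; the (sequence name, class) input format and the job tuple layout are assumptions for illustration.

```python
from typing import Iterable, List, Tuple

QPS = (22, 27, 32, 37)
CONFIGS = ("AI", "LB", "RA")
UHD_CLASSES = {"A1", "A2", "UVG1", "UVG2"}   # 4K classes: LB is not coded


def encoding_jobs(sequences: Iterable[Tuple[str, str]]) -> List[Tuple[str, str, int]]:
    """Expand (sequence name, class) pairs into (name, configuration, QP) jobs."""
    jobs: List[Tuple[str, str, int]] = []
    for name, seq_class in sequences:
        for cfg in CONFIGS:
            if cfg == "LB" and seq_class in UHD_CLASSES:
                continue
            if cfg == "RA" and seq_class == "E":   # class E: RA is not coded
                continue
            jobs.extend((name, cfg, qp) for qp in QPS)
    return jobs
```

A 4K sequence thus yields 8 jobs (AI and RA), a class E sequence 8 jobs (AI and LB), and any other sequence 12 jobs.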

To evaluate the visual quality, we use the peak signal-to-noise ratio (PSNR), which is commonly used in video coding to measure the objective visual quality [37]. Since we do not measure $E_{\mathrm{dec}}$ for a single color component, we use PSNR$_{\mathrm{YUV}}$ [37], the weighted average over all color components:

$$\mathrm{PSNR}_{\mathrm{YUV}} = \frac{6 \cdot \mathrm{PSNR}_{\mathrm{Y}} + \mathrm{PSNR}_{\mathrm{U}} + \mathrm{PSNR}_{\mathrm{V}}}{8}. \qquad (4)$$

The luma (Y) component is weighted by 6, which reflects the higher sensitivity of human perception to luminance than to chrominance. An alternative for evaluating the visual quality of video sequences is Video Multimethod Assessment Fusion (VMAF) [38], a full-reference metric. According to [38], VMAF predicts the subjective quality more accurately than PSNR. This is achieved by a support vector machine that combines several quality metrics from the literature.
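The PSNR$_{\mathrm{YUV}}$ weighting can be sketched as follows. This is a minimal illustration operating on flat lists of sample values, assuming the 10-bit internal bit depth used here (peak value 1023); whether PSNR is averaged per frame or computed per sequence is not specified in the text and is left to the caller.

```python
import math
from typing import Sequence


def plane_psnr(ref: Sequence[int], dec: Sequence[int], bit_depth: int = 10) -> float:
    """PSNR of one color plane; peak value 2^bit_depth - 1 (1023 for 10 bit)."""
    mse = sum((a - b) ** 2 for a, b in zip(ref, dec)) / len(ref)
    peak = (1 << bit_depth) - 1
    return math.inf if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)


def psnr_yuv(psnr_y: float, psnr_u: float, psnr_v: float) -> float:
    """Weighted average with luma weight 6 and chroma weights 1."""
    return (6.0 * psnr_y + psnr_u + psnr_v) / 8.0
```

For example, plane PSNRs of 40 dB (Y) and 44 dB (U and V) give PSNR$_{\mathrm{YUV}}$ = (240 + 44 + 44) / 8 = 41 dB.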

To derive the VMAF score, we use the implementation of [39] with model version 0.6.1. The default VMAF model is trained on subjective viewing tests with full-HD displays. As we also consider sequences with 4K resolution, we use the VMAF model trained for 4K displays for the corresponding video sequences.

In Figure 1, several of the previously described metrics are visualized for the Tango2 input sequence. The diagram on the left shows PSNR$_{\mathrm{YUV}}$ over the bit rate for the HM-16.20 encoder (blue) and the VTM-10.0 encoder (red). The diagram in the middle shows the decoding energy demand on the horizontal axis. Finally, the diagram on the right depicts the VMAF score.

In Section III, we described the metrics BDR-PSNR and BDDE-PSNR. In the following, we also use VMAF, which correlates better with the subjective impression of video sequences than PSNR. To this end, we substitute PSNR by the VMAF score in each previously mentioned BD metric to obtain BDR-VMAF and BDDE-VMAF, which give the bit rate savings and the decoder's energy savings, respectively, for an equal VMAF score. In Figure 1, the diagram on the left shows that VVC has a higher compression efficiency because it requires a lower bit rate for equal PSNR$_{\mathrm{YUV}}$ (BDR-PSNR compared to HM: -39.2 %). In the diagram in the middle, the VTM decoder has a higher energy demand than the HM decoder, which results in a BDDE-PSNR of 81.2 % compared to HM decoding. Finally, in the diagram on the right, the VTM decoder also has a higher energy demand than the HM decoder, with a BDDE-VMAF of 79.9 %.

In the following, we analyze the energy and compression efficiency of VVC compared to HEVC, which is shown in Figure 2. In the figure, the horizontal axis depicts the average compression efficiency in terms of BDR-VMAF and the vertical axis the energy efficiency in terms of BDDE-VMAF for both test sets. Low values are desirable for both metrics, i.e., markers in the bottom left corner indicate a small increase in decoding energy demand relative to the corresponding HEVC decoder and a large decrease in bit rate. The figure shows that the energy demand of the VTM decoder is increased by between 60 % and 90 % relative to HM. For the optimized implementations, the energy demand of VVdeC is increased by between 30 % and 190 % relative to openHEVC. Table II lists the results of each class and the average of each video sequence set for each coding configuration, which are discussed in the following. For AI coding in Figure 2 (blue markers), the VTM decoder has an average energy demand increase in terms of BDDE-VMAF of 73.55 % for the JVET set and 75.51 % for the UVG set. For RA coding (black markers), the corresponding results are 86.17 % and 69.19 %, respectively. For LB coding (red markers), the average BDDE value is 75.65 % for the JVET set and 61.48 % for the UVG set.

For VVdeC decoding, we observe that the energy demand increase for AI coding is higher than for the VTM decoder: BDDE-VMAF is 174.57 % for the JVET set and 171.32 % for the UVG set. Furthermore, for the JVET set, the energy demand increase is also higher with the VVdeC decoder, with a BDDE-VMAF of 96.52 % for LB coding and 98.40 % for RA coding. However, for the UVG set, the increase is lower, with 33.43 % for LB coding and 51.00 % for RA coding. Comparing the results of RA and LB for VVdeC across the classes in Table II, we observe that the sequences with UHD (classes A1, A2, UVG1, and UVG2) and HD resolution (B and UVG3) have lower BDDE values than the other classes (C, D, E, and F). Therefore, we conclude that video sequences with a higher resolution have a lower relative increase in energy demand than sequences with a lower resolution.

In summary, both VVC decoders have a significant increase in energy demand. Consequently, it is desirable to find coding tool profiles that have a lower energy demand than the reference profiles of VVC for all coding configurations.

In the following, we first evaluate the proposed algorithm on a subset of coding tools in Subsection V-A. Then, the results of the DSE algorithm and our tool sensitivity analysis are presented in Subsection V-B. Finally, in Subsection V-C, we select two coding tool profiles $\mathbf{u}_{i,\nu}$ each for AI, LB, and RA coding and validate them on both test sets with VTM and VVdeC.

As shown in Section III, a full search is not feasible for all coding tools. However, to evaluate the optimality of the algorithm, we select a subset of coding tools for each coding configuration. To this end, we randomly selected the following coding tools from the groups in Table I: ISP, CCLM, DQ, MTS, ALF, SAO, AFFINE, and GPM.

In the following, we encode all combinations of the corresponding coding tools for the class C sequences. For RA and LB, we encode 256 ($2^8$) coding tool profiles, and for AI, 64 ($2^6$) profiles, since the coding tools AFFINE and GPM are not applicable. Figure 3 shows the results of the evaluation with the greedy algorithm for RA coding, with BDDE-PSNR on the vertical axis and BDR-PSNR on the horizontal axis. To visualize Algorithm 1, Figure 3 uses a different color for each iteration. Figure 3 shows that the algorithm successfully obtained the coding tool profile with the highest energy demand reduction in terms of BDDE-PSNR, which is -13.86 %. This coding tool profile is determined both in the second iteration (orange marker) and in the fourth iteration (green marker).
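The exhaustive reference used for this comparison can be sketched as a loop over all binary profiles of the selected tools. The callable bdde(u) is again a hypothetical stand-in for encoding, decoding, measuring, and computing BDDE-PSNR; with eight tools it is called 256 times, and with six tools 64 times.

```python
from itertools import product
from typing import Callable, Optional, Tuple

Profile = Tuple[int, ...]


def full_search(n_tools: int, bdde: Callable[[Profile], float]
                ) -> Tuple[Optional[Profile], float, int]:
    """Evaluate all 2**n_tools profiles and return the one with the lowest BDDE."""
    best_u: Optional[Profile] = None
    best_score = float("inf")
    evaluated = 0
    for u in product((0, 1), repeat=n_tools):   # every enable/disable combination
        evaluated += 1
        score = bdde(u)
        if score < best_score:
            best_u, best_score = u, score
    return best_u, best_score, evaluated
```

Comparing the profile returned here with the one found by the greedy search is how optimality can be checked for a subset; the exponential number of calls is why this check is only practical for a few tools.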

For LB and AI, the algorithm also successfully determined the coding tool profile with the lowest BDDE-PSNR value. Thus, although we cannot prove optimality for all coding tools, we show that the algorithm determined the optimal solution for each evaluated subset of coding tools.

For the training of the DSE, we only use the class C sequences from the JVET set, which provide sufficient coverage of Spatial Information (SI) and Temporal Information (TI) according to [40]. Furthermore, we reduce the number of encoded frames to 128, which reduces the time demand and computational complexity of encoding. The coding tools in Table I are selected based on the recommendations in [41]. If the coding tool dependent quantization (DQ) is disabled, we enable sign data hiding, which is an alternative to DQ. Furthermore, we use VTM-8.0 as the encoder, which has the same coding tool set as VTM-10.0. Figure 4 shows the training results for all coding tool profiles. The red markers correspond to the performance of the VTM CTC profile, which serves as the reference for the calculation of the BDR and BDDE metrics in the following. The blue plus-shaped markers correspond to the results of the HEVC CTC profile. Additionally, for each CTC coding configuration, two profiles are selected that will be evaluated in detail in Section V-C. The green diamond-shaped marker corresponds to the profile with the maximum energy reduction, which we call the energy efficient (EE) profile. The green asterisk marker corresponds to a joint tradeoff between energy demand reduction and bit rate increase, which we call the energy and bit rate efficient (EBE) profile. We select the EBE profile for each coding configuration with a refinement selection. First, among the coding tool profiles with a BDR value of less than 10 %, we select the three with the lowest sum of BDDE and BDR. Then, we encode and measure five sequences of the JVET set with these profiles. Finally, the profile with the lowest sum of BDDE and BDR after this refinement is used as the EBE profile.
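The two-stage EBE selection can be sketched as follows. The profile identifiers and the data layout are hypothetical; the re-measurement on the five JVET sequences is represented only by the refined values passed to the second function.

```python
from typing import Dict, List, Tuple

# (profile id, BDDE in %, BDR in %) measured on the training sequences
Result = Tuple[str, float, float]


def ebe_candidates(results: List[Result], max_bdr: float = 10.0, k: int = 3) -> List[str]:
    """The k profiles with the lowest BDDE + BDR among those with BDR < max_bdr."""
    eligible = [r for r in results if r[2] < max_bdr]
    eligible.sort(key=lambda r: r[1] + r[2])
    return [r[0] for r in eligible[:k]]


def select_ebe(refined: Dict[str, Tuple[float, float]]) -> str:
    """Candidate with the lowest BDDE + BDR after the refinement measurement."""
    return min(refined, key=lambda pid: refined[pid][0] + refined[pid][1])
```

The BDR limit keeps profiles with a large bit rate penalty out of the candidate set, even if their summed score is low.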

Additionally, we analyze the energy sensitivity of all tested coding tools. To this end, we assign each coding tool to one of the following four categories according to its influence on the energy efficiency in the first iteration of the DSE: major increase, minor increase, minor decrease, and major decrease. Coding tools whose usage changes the energy demand within ±1 % are assigned to the categories with minor influence. Note that most of the coding tools in Table I are enabled in the reference profile, so the first iteration of the DSE mostly evaluates disabling a tool; a negative BDDE value then indicates that the energy demand is lower without the corresponding coding tool. The results are summarized in Table III.

1) All Intra: For AI coding (cf. Fig. 4 (a)), disabling the deblocking filter (DBF) yields the highest energy demand reduction within the first iteration, with a BDDE-PSNR value of -17.29 % and a BDR-PSNR value of -0.40 %, which corresponds to the blue marker. Therefore, DBF has a large potential for optimizing the energy demand of the decoder, which is in line with the findings of [8], where the deblocking filter was skipped.
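The category assignment can be written as a small classification function. This is a minimal sketch, assuming that the input is the BDDE value (in %) measured in the first DSE iteration when the usage of a single tool is toggled relative to the reference profile, and that the ±1 % bound separates minor from major influence; the function name `energy_category` and the `enabled_in_reference` flag are hypothetical.

```python
def energy_category(bdde_toggle: float, enabled_in_reference: bool = True,
                    threshold: float = 1.0) -> str:
    """Assign a coding tool to one of the four energy-sensitivity categories.

    bdde_toggle: BDDE (in %) of the profile in which only this tool's usage is toggled.
    The category describes how *using* the tool affects the energy efficiency.
    """
    # Express the result as the energy change caused by disabling the tool:
    # if the tool is disabled in the reference, the toggle enables it, so flip the sign.
    effect = bdde_toggle if enabled_in_reference else -bdde_toggle
    if effect > threshold:   # disabling costs clearly more energy, so the tool saves energy
        return "major increase"
    if effect > 0.0:
        return "minor increase"
    if effect >= -threshold:
        return "minor decrease"
    return "major decrease"  # disabling saves more energy than the threshold
```

For example, the AI result for DBF (BDDE-PSNR of -17.29 % when disabled) falls into the "major decrease" category under these assumptions.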

For the second iteration, Figure 4 (a) shows that the updated coding tool profile $u_{2,0}$ has a BDDE-PSNR value of -26.52 % and a BDR-PSNR value of 5.14 %. In this profile, all coding tools whose change in usage yielded a negative BDDE-PSNR value in the first iteration are disabled, which applies to ALF, CCALF, DBF, ISP, LFNST, LMCS, and SAO.

Finally, in the third iteration, the lowest BDDE-PSNR value of -30.93 % is measured for the coding tool profile $u_{3,0}$, with a BDR-PSNR value of 11.24 %. This coding tool profile is used as the EE profile, which is visualized by the diamond-shaped marker in Figure 4. The usage of each coding tool in this profile is shown in Table IV. For all other profiles of the third iteration, changing the usage of a coding tool decreases the energy efficiency, for every coding tool. Therefore, the termination condition of Algorithm 1 is met.
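The iteration scheme described above can be sketched as a greedy search. This is a minimal sketch and not Algorithm 1 itself: it assumes that a profile is the set of enabled tools, that `cost` returns a scalar such as the BDDE-PSNR of a profile relative to the reference (lower is better), and that all single-tool changes that improve the cost are merged into the next profile, mirroring how $u_{2,0}$ was formed. The function name `greedy_dse`, the fallback to the best single change, and the `max_iter` guard are assumptions added for this sketch.

```python
from typing import Callable, FrozenSet, Iterable, Tuple

Profile = FrozenSet[str]  # set of enabled coding tools


def greedy_dse(tools: Iterable[str], start: Iterable[str],
               cost: Callable[[Profile], float],
               max_iter: int = 20) -> Tuple[Profile, float]:
    tools = list(tools)
    current: Profile = frozenset(start)
    current_cost = cost(current)
    for _ in range(max_iter):
        # Evaluate every profile that differs from the current one in a single tool.
        single = {t: cost(current ^ {t}) for t in tools}
        improving = [t for t, c in single.items() if c < current_cost]
        if not improving:
            break  # no single change helps: termination condition
        # Merge all improving changes into the updated profile u_{k+1,0}.
        updated = current ^ frozenset(improving)
        updated_cost = cost(updated)
        if updated_cost >= current_cost:
            # Assumption: if the merged profile is not better, keep the best single change.
            best = min(improving, key=single.__getitem__)
            updated, updated_cost = current ^ {best}, single[best]
        current, current_cost = updated, updated_cost
    return current, current_cost
```

In a real DSE, each call of `cost` corresponds to encoding and measuring the training sequences, so the number of evaluations per iteration grows linearly with the number of tools rather than exponentially as in an exhaustive search.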

The green asterisk marker in Figure 4 (a) corresponds to the EBE profile, which is a profile that we selected from iteration 2 based on the refinement that we explained above. Again, the usage of the corresponding coding tools is shown in Table IV . The BDDE-PSNR value for this profile is -25.96 % and the BDR-PSNR value is 4.72 %.

Table III shows the category assigned to each coding tool. A major energy efficiency increase is measured for the coding tools CST, DQ, JCCR, and MRL. For the coding tools ALF, DBF, IBC, ISP, LFNST, and SAO, the energy efficiency is decreased significantly. Therefore, the decoder energy demand of VVC can be reduced if the encoder does not use in-loop filters such as ALF, DBF, and SAO.

2) Low Delay B: For the coding tools DQ, GPM, and MMVD, we observe a major energy efficiency increase. In Figure 4 (b), these tools have a positive BDDE-PSNR value of more than 1 %. For GPM, the decoder's energy demand increases by 4.27 % and the bit rate by 2.24 % if the coding tool is disabled.

For the coding tools AFFINE, ALF, DBF, LMCS, SAO, and SBTMVP, the energy efficiency is decreased significantly. In particular, DBF has a significant impact on the energy efficiency with a BDDE-PSNR value of -20.04 %, while the bit rate increases only slightly with a BDR-PSNR value of 0.86 % (cf. Figure 4 (b)). For AFFINE, we determine a BDDE-PSNR value of -5.11 % and a BDR-PSNR value of 1.54 %. For ALF, the BDDE-PSNR value is -6.13 % and the BDR-PSNR value is 3.99 %. Hence, ALF significantly improves the compression efficiency, but at the cost of a considerable amount of computational complexity.

For the EE profile, we use a profile from the second iteration of the DSE training that has a BDDE-PSNR value of -39.88 % and a BDR-PSNR value of 10.98 %. For the EBE profile, we use another profile from the second iteration that has a BDDE-PSNR value of -32.70 % and a BDR-PSNR value of 4.69 %. This profile was selected with the same refinement as for AI.

3) Random Access: For RA coding, we determine that three coding tools (BCW, DQ, and GPM) lead to a major increase of the energy efficiency. The highest increase in decoding energy is observed for the coding tool GPM if the tool is disabled (BDDE: 2.91 % and BDR: 1.84 %). This can be explained by the functionality of the coding tool, which predicts motion with blocks of triangular or trapezoidal shape [42]. Thereby, further splits of a CU are not necessary to represent complex motion, which results in larger block sizes if GPM is used. For such blocks, GPM avoids subpartitioning into multiple small rectangular blocks, such that the overall share of large block sizes increases. The lower share of small blocks is confirmed by the bit stream feature analyzer proposed in [17]: for the CTC profile, 36.70 % of the pixels are predicted with a block size of less than 512 pixels, whereas for the profile with disabled GPM, the share is 37.80 %. In total, 5.66 % of the pixels are predicted with GPM in the CTC profile. As a consequence, the complexity decrease due to the lower number of small blocks overcompensates the complexity increase caused by GPM.
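The pixel-share statistic used in this argument can be computed from the sizes of the prediction blocks. The following minimal sketch assumes that a list of (width, height) pairs of all prediction blocks has already been extracted from the decoded bit stream; it does not reproduce the bit stream feature analyzer of [17], and the function name `small_block_pixel_share` is hypothetical.

```python
from typing import Iterable, Tuple


def small_block_pixel_share(blocks: Iterable[Tuple[int, int]],
                            max_area: int = 512) -> float:
    """Percentage of pixels predicted in blocks whose area is below max_area.

    blocks: (width, height) of each prediction block in the decoded frames.
    """
    total = 0
    small = 0
    for width, height in blocks:
        area = width * height
        total += area
        if area < max_area:  # strictly smaller than 512 pixels, as in the text
            small += area
    return 100.0 * small / total if total else 0.0
```

Weighting each block by its area, rather than counting blocks, yields a share of pixels, which is the quantity compared between the CTC profile (36.70 %) and the profile with disabled GPM (37.80 %).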

Finally, for the coding tools AFFINE, ALF, BDOF, DBF, DMVR, and SAO, the energy efficiency is decreased significantly by their usage. Again, disabling DBF yields the highest energy demand reduction (BDDE-PSNR: -12.84 %). If ALF is disabled, the energy demand is reduced by 5.57 % and the bit rate is increased by 3.63 %. Similarly, for the coding tool DMVR, the bit rate is increased slightly (BDR-PSNR: 0.47 %) and the energy demand is reduced by 4.61 %.

TABLE IV. Proposed EE and EBE profiles for all CTC configurations (AI, LB, RA) as derived by the DSE; the tools listed include CCLM, ISP, MIP, MRL, AFFINE, AMVR, BCW, BDOF, and CIIP.

For the EE profile, we choose the profile with the minimal BDDE-PSNR value of -44.13 % and a BDR-PSNR value of 17.08 %. The usage of the corresponding coding tools is shown in Table IV. For the EBE profile, we choose another profile with a BDDE-PSNR value of -36.33 % and a BDR-PSNR value of 9.99 % based on the refinement described earlier.

In the following, we evaluate the proposed coding tool profiles of Table IV on both video sets introduced in Section IV-C. First, we examine the impact of the EE and EBE profiles on the bit rate and the decoding energy demand compared to the CTC profile with the VTM decoder. The results of this evaluation are shown in Figure 5 and in Table V. For all BD metrics, we use the CTC profile with the VTM decoder as the reference. Finally, we compare both profiles for the VVdeC decoder in Table VI.

1) All Intra: For the EE profile, we obtain a BDR-VMAF value of 10.65 % for the JVET set and of 5.07 % for the UVG set, which is significantly lower than for HEVC with over 30 % (cf. Table V). The energy demand is reduced by 34.25 % (BDDE-VMAF: -34.25 %) on average for the JVET set and by 42.49 % for the UVG set. Furthermore, we observe from the results of Table V that the energy demand of the EE profile is comparable to that of the HM decoder. However, the bit rate increase is significantly lower on average than for HEVC.

TABLE V. Evaluation of the two proposed coding tool profiles in comparison to the CTC coding tool profiles, which are encoded with VTM. Furthermore, the VTM-coded bit streams are compared to the CTC HM-coded bit streams. For all, the BDR and BDDE are calculated with both PSNR and VMAF as the visual quality metric and the VTM CTC profile as the reference.

With the EBE profile, the additional bit rate for the JVET set is reduced to 1.41 % (BDR-VMAF). Therefore, the compression performance is similar to that of the CTC profile of VVC. For the UVG set, the BDR-VMAF value is -1.13 %, i.e., the bit rate is even lower than for the CTC profile if the same VMAF quality score is targeted. In terms of energy efficiency, the EBE profile has a BDDE-VMAF value of -32.11 % for the JVET set and of -36.59 % for the UVG set.

In Figure 5, the compression and energy efficiency of both proposed coding tool profiles are illustrated for both sequence sets. The horizontal axis in each plot shows the BDR-VMAF value and the vertical axis the BDDE-VMAF value. A lower BDDE-VMAF value corresponds to a higher energy efficiency and a lower BDR-VMAF value corresponds to a higher compression efficiency, such that profiles at the bottom left corner are desirable. The diamond-shaped markers correspond to the EE profile, the asterisk-shaped markers to the EBE profile, the plus-shaped markers to the HM CTC profile, and the circle-shaped markers to the VTM CTC profile, which is the reference for the calculation of the BD metrics. The blue markers correspond to the JVET set and the green markers to the UVG set.

For both Pareto curves, we determine that the EE and the EBE profile in Figure 5 (a) reduce the bit rate significantly in relation to the HM CTC profile. For the UVG set, the energy efficiency of the EE profile is slightly higher than the HM CTC profile and the compression efficiency is higher than for the VTM CTC profile.
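Assuming that the Pareto curves in Figure 5 connect the profiles for which no other profile is better in both BDR and BDDE, such a curve can be extracted with a simple non-dominated filter. The following is a minimal sketch under that assumption, with lower values being better for both coordinates; the function name `pareto_front` is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (BDR in %, BDDE in %), lower is better for both


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, sorted by ascending BDR."""
    front: List[Point] = []
    # After sorting by BDR (ties broken by BDDE), a point is non-dominated if its
    # BDDE is lower than the lowest BDDE among all points already on the front.
    for bdr, bdde in sorted(set(points)):
        if not front or bdde < front[-1][1]:
            front.append((bdr, bdde))
    return front
```

Since the front is built with strictly decreasing BDDE values, the last element always holds the lowest BDDE seen so far, so each point is checked in constant time after sorting.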

2) Low Delay B: Similar to AI coding, the energy demand of LB can be reduced by about 42 % (cf. [...]). [...] in terms of BDR-VMAF, and the energy demand is reduced by 35.61 % (BDDE-VMAF: -35.61 %). In Figure 5 (b), we observe that the energy efficiency of the EE profile is significantly higher than that of the HM CTC profile for the UVG set. For the EBE profile, the energy efficiency is slightly lower than that of the HM CTC profile for both video sequence sets.

3) Random Access: Strikingly, the RA EE profile outperforms HM in terms of both BDDE-PSNR and BDDE-VMAF for both sequence sets. This is shown in Figure 5 (c) by both Pareto curves, which lie below the markers of the HM CTC profile. Furthermore, the energy efficiency of the EBE profile for the UVG set is similar to that of the HM CTC profile.

For the EE profile, we measured a BDDE-VMAF value of -48.79 % for the JVET set and of -50.03 % for the UVG set, which means that the decoding energy demand of these sets is approximately halved in relation to the CTC-coded bit streams. In particular, the bit streams of classes A1, A2, B, UVG1, and UVG3 have BDDE-VMAF values below -50 % (cf. Table V). Therefore, we observe the highest energy demand reduction for sequences with a very high resolution. From Figure 5, we observe that the EE profile improves both the energy and the compression efficiency in relation to HEVC. For the EBE profile in RA coding, the energy demand can be reduced by approximately 40 %. In terms of bit rate, the BDR-VMAF value is 14.04 % for the JVET set and 11.08 % for the UVG set.

For class F, which comprises screen-captured or computer-generated content, both proposed coding tool profiles yield the lowest energy savings of all classes for RA and LB. This can be explained by the different characteristics of these sequences. For such content, VVC includes several dedicated coding tools, such as palette mode, adaptive color transform, and transform skip residual coding, which improve the compression efficiency of screen content in addition to IBC and BDPCM [4]. In the future, the influence of this type of content can be studied in more detail.

In general, we observe from our evaluation that both BD metrics improve to a higher degree, i.e., reach lower BD values, when VMAF is used instead of PSNR. Since VMAF correlates more strongly with the subjective visual quality than PSNR, this is desirable.
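All BD values in this evaluation are Bjøntegaard-delta values relative to the VTM CTC profile. A minimal sketch of such a calculation is shown below, assuming that BDDE (and BDET) follow the classic BD-rate procedure with the decoding energy (or encoding time) in place of the bit rate; the function name `bd_delta` is hypothetical, and details such as the interpolation method may differ from the tooling used in this work.

```python
import numpy as np


def bd_delta(ref_x, ref_q, test_x, test_q):
    """Bjontegaard delta (in %) of a positive quantity x, e.g., bit rate or decoding
    energy, at equal quality q, e.g., PSNR in dB or a VMAF score.

    Assumption: classic BD-rate procedure with a cubic fit of log(x) over q for each
    curve (needs at least four points, e.g., the four CTC QPs), integrated over the
    overlapping quality range.
    """
    q_ref, q_test = np.asarray(ref_q, float), np.asarray(test_q, float)
    p_ref = np.polyfit(q_ref, np.log(np.asarray(ref_x, float)), 3)
    p_test = np.polyfit(q_test, np.log(np.asarray(test_x, float)), 3)
    lo = max(q_ref.min(), q_test.min())
    hi = min(q_ref.max(), q_test.max())
    # Average difference of the two log-domain curves over [lo, hi].
    int_ref, int_test = np.polyint(p_ref), np.polyint(p_test)
    area_ref = np.polyval(int_ref, hi) - np.polyval(int_ref, lo)
    area_test = np.polyval(int_test, hi) - np.polyval(int_test, lo)
    avg_log_diff = (area_test - area_ref) / (hi - lo)
    return float((np.exp(avg_log_diff) - 1.0) * 100.0)
```

Passing bit rates yields a BDR value, passing decoding energies a BDDE value, and passing encoding times a BDET value; a negative result means that the test profile needs less of the respective resource at equal quality, so a test profile that needs exactly half the energy at every quality point gives -50 %.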

Besides the decoder, we also evaluated the influence of both proposed profiles on the encoding time. For the sequence MarketPlace, we encoded the first 65 frames with RA coding. To evaluate the encoding time demand, we use the Bjøntegaard-Delta encoding time (BDET) metric with PSNR as the objective quality metric. As the reference for the calculation of BDET-PSNR and BDR-PSNR, we use the VTM CTC profile. For the EBE profile, we measure a BDET-PSNR of -42.43 %.

For AI coding, the energy demand of openHEVC is lower than for both proposed profiles. For the EE profile, we observe a BDDE-VMAF value of -38.37 % for the JVET set and of -47.55 % for the UVG set. For LB coding, the EE profile achieves a higher decoding energy demand reduction than openHEVC for HD sequences (class B and UVG3). For RA coding, the energy demand of the UVG set can be reduced in relation to openHEVC. Therefore, the compression and energy efficiency for sequences with a high resolution are improved for RA and LB coding compared to HEVC. For the EBE profile, the energy demand is similar to that of openHEVC for RA and LB coding, but the compression efficiency is higher. Therefore, when using our proposed EE profile, VVdeC outperforms openHEVC in terms of both energy and compression efficiency for the UVG set.

For the ARM-CPU, we show the results of measurements on the Raspberry Pi in Table VII. The evaluation covers the coding configurations LB and RA, which are commonly used in streaming applications. Similar to Table VI, the CTC-coded bit streams decoded by VVdeC are the reference for the BDDE-PSNR and BDDE-VMAF calculations. Comparing openHEVC with VVdeC, the energy savings are significantly higher on the ARM architecture for both coding configurations and video sequence sets, e.g., the JVET set has a BDDE-VMAF value of -75.60 % for RA (x86: -45.97 %). This can be explained by the fact that VVdeC is mainly optimized for the x86 architecture [24]. For the EBE profile, the energy savings are similar to those on the x86-CPU. For RA coding, the BDDE-VMAF value is -23.80 % for the JVET set and -20.43 % for the UVG set (x86: -25.53 % and -23.63 %, respectively). For the EE profile, the energy savings on the ARM-CPU are significantly higher for both coding configurations. For RA, the JVET set has a BDDE value of -59.66 % and the UVG set of -49.85 %. Based on the findings in Section V-B, where we concluded that the usage of the coding tool ALF leads to a major decrease in energy efficiency, we identify this coding tool as a reason for the higher energy savings on the ARM-CPU: ALF is enabled in both EBE profiles and disabled in both EE profiles. This suggests that ALF is not yet fully optimized in VVdeC for the ARM architecture.

In this work, we propose a novel approach to optimize the energy demand of the VVC decoder with a greedy strategy based DSE. We show that the approach determines the optimal coding tool profile in terms of energy efficiency for a subset of coding tools. Based on the algorithm, we derived multiple coding tool profiles that improve the energy efficiency of VVC significantly. We showed that the EE profiles can reduce the decoding energy demand by over 50 % compared to the CTC-coded VTM bit streams. By evaluating two decoder implementations of VVC and HEVC, we found that with our proposed coding tool profiles for LB and RA encoding, VVC decoding can have a lower energy demand than HEVC decoding. Furthermore, we show that the complexity of determining such coding tool profiles can be reduced significantly.

Additionally, we analyzed the energy efficiency of two decoder implementations of VVC. For VTM, we determined that the energy demand increases by 60 % to 80 % compared to the HM decoder. For VVdeC, the energy demand is increased by 30 % to 180 %, depending on the coding configuration and the resolution of the sequences.

From a tool sensitivity analysis, we determined that coding tools can be assigned to different categories according to their impact on the decoder's energy efficiency. For in-loop filter coding tools, we saw that the energy efficiency is decreased. However, several coding tools (e.g., GPM) improve the energy efficiency of the decoder.

In future work, we will study whether the energy efficiency of the proposed profiles can be enhanced by DERDO [14]. Also, we will study block partitioning, which might have further potential to reduce the decoder complexity. Furthermore, the influence of the proposed profiles on the encoding process will be analyzed in detail.

The DSE algorithm can also be incorporated into the convex hull video encoding framework proposed in [43]. For video streaming applications, adaptive bit rate (ABR) streaming is commonly used, where a video is encoded several times at different quality levels and resolutions such that the video quality can be adapted to the network connection speed. In [43], the authors propose a method that incorporates encodings with multiple QPs and resolutions to determine optimal encoding parameters for a given target quality or bit rate. With our proposed DSE, additional coding tool profiles can be derived for such a framework that optimize for the encoding complexity.

Finally, the complexity of other state-of-the-art video codecs such as AV1 can be studied.

Cisco visual networking index: Forecast and trends

Climate crisis: The unsustainable use of online video. The practical case for digital sobriety

Power modeling for virtual reality video playback applications

Overview of the versatile video coding (VVC) standard and its applications

A comparative analysis of the time and energy demand of versatile video coding and high efficiency video coding reference decoders

Power modeling for video streaming applications on mobile devices

HEVC hardware vs software decoding: An objective energy consumption analysis and comparison

Algorithmic-level approximate computing applied to energy efficient HEVC decoding

Saliency-guided complexity control for HEVC decoding

Subjective-quality-optimized complexity control for HEVC decoding

A modified HEVC decoder for low power decoding

A feature based complexity model for decoder complexity optimized HEVC video encoding

OTED: Encoding optimization technique targeting energy-efficient HEVC decoding

Decoding-energy-rate-distortion optimization for video coding

Modeling the energy consumption of the HEVC decoding process

Extending video decoding energy models for 360° and HDR video formats in HEVC

Decoding energy modeling for versatile video coding

Modeling power consumption for video decoding on mobile platform and its application to power-rate constrained streaming

Power-efficient video streaming on mobile devices using optimal spatial scaling

Rate-complexity-distortion optimization for hybrid video coding

Rate-distortion-complexity optimized coding mode decision for HEVC

Low-complexity CTU partition structure decision and fast intra mode decision for Versatile Video Coding

Block partitioning structure in the VVC standard

Fraunhofer versatile video decoder (VVdeC)

Overview of the high efficiency video coding (HEVC) standard

Algorithm description for versatile video coding and test model 8 (VTM 8)

Joint Video Exploration Team (JVET)

Developments in international video coding standardization after AVC, with an overview of Versatile Video Coding (VVC)

Analysis and exploitation of CTU-level parallelism in the HEVC mode decision process using actor-based modeling

Calculation of average PSNR differences between RD curves

RAPL: Memory power estimation and capping

Joint Collaborative Team on Video Coding (JCT-VC). HEVC test model reference software

Towards a live software decoder implementation for the upcoming versatile video coding (VVC) codec

JVET common test conditions and software reference configurations for SDR video

UVG dataset: 50/120fps 4K sequences for video codec analysis and development

Working practices using objective metrics for evaluation of video coding efficiency experiments

Toward a practical perceptual video quality metric

VMAF: Video multi-method assessment fusion

Tunable VVC frame partitioning based on lightweight machine learning

20th Meeting of ITU-T/ISO/IEC Joint Video Experts Team (JVET), Teleconference, document, JVET-T0013-v2

Low-complexity geometric inter-prediction for versatile video coding

Encoding parameters prediction for convex hull video encoding

During his studies, he worked as a student assistant on modeling and optimizing the energy demand of video decoders from

Member, IEEE) received the Dipl.-Ing. in electrical engineering and information technology in 2011 and the Dipl

he worked as a PostDoc-Fellow at École de technologie supérieure in collaboration with Summit Tech Multimedia, Montréal, Canada, on energy efficient VR technologies. Since 2019, he has been with Friedrich-Alexander University Erlangen-Nürnberg as a senior scientist

Since 2001, he has been a Full Professor and the Head of the Chair of Multimedia Communications and Signal Processing at Friedrich-Alexander University Erlangen-Nuremberg (FAU), Germany. From 2005 to 2007 he was Vice Speaker of the DFG Collaborative Research Center 603. From 2015 to 2017, he served as the Head of the Department of Electrical Engineering and the Vice Dean of the Faculty of Engineering at FAU. He has authored around 400 journal and conference papers and has over 120 patents granted or pending

He was a Siemens Inventor of the Year 1998 and received the 1999 ITG Award and several IEEE Best Paper Awards. His group won the Grand Video Compression Challenge from the Picture Coding Symposium 2013. The Faculty of Engineering with FAU and the State of Bavaria honored him with Teaching Awards