Design of a High Speed Architecture of MQ-Coder for JPEG2000 on FPGA

(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 8, No. 6, 2017, p. 165, www.ijacsa.thesai.org

Taoufik Salem Saidani
Department of Computer Sciences, Faculty of Computing & Information Technology, Northern Border University, Rafha, Saudi Arabia

Hafedh Mahmoud Zayani
Department of Information System, Faculty of Computing & Information Technology, Northern Border University, Rafha, Saudi Arabia

Abstract: Digital imaging is omnipresent today. In many areas, digitized images replace their analog ancestors such as photographs or X-rays. The world of multimedia makes extensive use of image transfer and storage. These files are very large, which has created the need for compression algorithms that reduce their size. The JPEG committee has developed a new image compression standard that now also has the status of International Standard: JPEG 2000. The main advantage of this new standard is its adaptability. Whatever the target application, resources or available bandwidth, JPEG 2000 will adapt optimally. However, this flexibility has a price: the complexity of JPEG2000 is far higher than that of JPEG. This increased complexity can cause problems in applications with real-time constraints. In such cases, a hardware implementation is necessary. In this context, the objective of this paper is the realization of a JPEG2000 encoder architecture satisfying real-time constraints. The proposed architecture is implemented on programmable chips (FPGA) to ensure its effectiveness in real time. The optimization of the renormalization module and of the byte-out module is described in this paper. In addition, the reduction in computational steps minimizes the time delay and hence raises the operating frequency.
The design was implemented targeting Xilinx Virtex 6 and Altera Stratix FPGAs. Experimental results show that the proposed hardware architecture achieves real-time compression of video sequences at 35 fps at HDTV resolution.

Keywords: MQ-Coder; High speed architecture; FPGA; JPEG2000; VHDL

I. INTRODUCTION

The current development of computer networks and the dramatic increase in processor speed open up many new possibilities for digital imaging. Whether in the medical, commercial or military field, new applications are emerging, each with its own specific requirements. The JPEG Group has developed a new, more flexible and better image encoding standard: JPEG2000 [1]. It is built around a wide range of image compression and display tools. This makes the algorithm appealing to many applications, whether for Internet broadcasting, medical imaging or digital photography [2]. The main JPEG2000 coding steps are shown in Fig. 1. Several features are available for encoding, such as progressive reconstruction by quality and/or resolution, fast random access to compressed image data, and the ability to encode selected regions of the image, called regions of interest (ROI), differently.

Fig. 1. Overview of JPEG2000 coding process.

The JPEG2000 standard can be broken down into several successive blocks. The original image is cut into tiles after the component transformation. All the tiles are then wavelet-transformed (with or without loss), independently of each other [3]. The wavelets used in the JPEG2000 standard are bi-orthogonal, that is to say, different wavelets are used for decomposition and reconstruction. Two types of bi-orthogonal wavelets are used: the Daubechies 9/7 and the Le Gall 5/3 wavelets [4], [5]. The choice between them depends on the type of compression desired, lossless or lossy. The Le Gall 5/3 wavelet performs a reversible transform and is used for lossless compression.
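The reversibility of the 5/3 transform can be illustrated with a short sketch. The Python code below is a minimal one-dimensional, single-level integer lifting model, assuming an even-length signal and symmetric extension at the borders; the function names lift53_forward and lift53_inverse are hypothetical and the code is not part of the proposed architecture.

```python
def lift53_forward(x):
    """One level of the integer 5/3 lifting transform on an even-length list."""
    assert len(x) >= 2 and len(x) % 2 == 0
    even, odd = x[0::2], x[1::2]
    half = len(even)
    # Predict step: high-pass coefficients (right border mirrored).
    d = [odd[i] - ((even[i] + even[min(i + 1, half - 1)]) >> 1)
         for i in range(half)]
    # Update step: low-pass coefficients (left border mirrored).
    s = [even[i] + ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
         for i in range(half)]
    return s, d


def lift53_inverse(s, d):
    """Undo lift53_forward exactly by running the lifting steps backwards."""
    half = len(s)
    even = [s[i] - ((d[max(i - 1, 0)] + d[i] + 2) >> 2)
            for i in range(half)]
    odd = [d[i] + ((even[i] + even[min(i + 1, half - 1)]) >> 1)
           for i in range(half)]
    x = [0] * (2 * half)
    x[0::2], x[1::2] = even, odd
    return x


if __name__ == "__main__":
    samples = [3, 7, 1, 8, 2, 9]
    low, high = lift53_forward(samples)
    print(low, high, lift53_inverse(low, high) == samples)
```

Because every lifting step only adds or subtracts an integer computed from the other half of the samples, the inverse recovers the input bit for bit, which is what makes this filter suitable for lossless coding.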
The Daubechies 9/7 wavelet, which realizes an irreversible transform, is used only for lossy compression. The coefficients of each code-block undergo quantization, and the quantized coefficients are decomposed into bit planes. Quantization minimizes the number of bits necessary for coding the coefficients supplied by the preceding block, by retaining only the minimum number of bits needed to obtain a given quality level [6], [7].

Based on the wavelet decomposition technique, JPEG2000 is very different from previous standards and has many advantages that allow it to be adopted in a wide range of applications, or even to be extended to video encoding. In contrast, this type of compression requires much more computational power than the original JPEG process, which makes software implementations unsuitable when very fast processing is required. Fig. 2 shows the comparison between JPEG and JPEG2000 in terms of performance. We note that the performance of JPEG2000 is better than that of the JPEG standard.

Fig. 2. Comparison between JPEG and JPEG2000 in terms of performance.

Because of its many successive processing stages, JPEG2000 requires more computing power to achieve encoding and decoding speeds similar to those of JPEG. A hardware solution is therefore indispensable for fast applications.

The JPEG2000 image compression standard was created to meet the new requirements arising from the diversification of applications in the multimedia field. The many features that it offers bring new momentum to this sector. However, they have led to an increase in the complexity of the algorithm compared with existing standards. Faced with this complexity, a hardware encoder is the solution that makes it possible to satisfy the real-time constraints of certain applications. This paper presents an FPGA-based accelerator core for JPEG2000 encoding.
A comparison with various FPGA implementations is provided. The contributions of this work are as follows:
1) The proposed high-speed, efficient MQ-coder architecture modifies the representation of the probability estimate (Qe) to minimize memory consumption. The modification reduces the bitwise representation to 13 bits.
2) Because less memory is occupied, the time and power required for hardware-based JPEG2000 compression are reduced. The operating speed (operating frequency) is thereby improved by the proposed MQ encoder for real-time image processing.
3) The reduced bitwise representation in the proposed MQ-coder architecture lowers the count of memory elements to 32 × 13 = 416, which saves silicon (Si) area and supports compact chip development.
4) The optimization of the Renormalization and Byteout modules helps speed up the proposed architecture.

The remainder of this paper is organized as follows. After the introduction, Section 2 details the JPEG2000 MQ encoder. Previously proposed hardware architectures for the MQ-coder are described in Section 3. Section 4 describes the proposed hardware architecture of the MQ coder. In Section 5, experiments and results are detailed. Finally, the paper is concluded in Section 6.

II. JPEG2000 MQ-CODER

The arithmetic coder used in JPEG2000, called the MQ encoder, takes as inputs the binary decisions D and the associated contexts CX produced by the preceding step, the binary modeling of the coefficients, in the order of the coding passes. Fig. 3 shows the arithmetic encoder inputs and outputs.

Fig. 3. The inputs and outputs of MQ-Coder.

Rather than representing the intervals associated with the probabilities of "0" and "1", the coder represents the data using the LPS (Less Probable Symbol) and MPS (More Probable Symbol), which stand for the less frequent and the more frequent symbol value, respectively.
It is therefore necessary to keep track of which of the values "0" or "1" is currently the less probable one. The current interval is represented as a unit interval, which is then divided into two sub-intervals corresponding to the LPS and the MPS. From a representation point of view, the LPS is always assigned the lower sub-interval. The interval is subdivided recursively for each binary decision, represented by a bit, following the principle of Elias coding and according to the estimated probabilities of the MPS and LPS.

The binary sequence produced by the MQ coder is divided into a number of packets. Each of them contains the bit-stream corresponding to the same component, the same resolution level, the same quality layer and the same spatial zone of the resolution level. The spatial areas of each resolution level are called precincts. Each packet is preceded by a header containing information that identifies very precisely the data conveyed by the packet. Four different progression orders are defined in JPEG 2000. During decoding, they make it possible to obtain first either the data of the same component, or those of the same resolution level, or those of the same quality layer, or those of the same spatial zone of the image.

In JPEG2000, the arithmetic coder is realized by means of an index table. The table holds the LPS probability estimates (Qe). For each input pair (decision, context), the most probable symbol is looked up in a variable containing the different states. As each state is represented in the index table, each context can be associated with an index of the table. The decoder holds a replica of the table indexes, which makes decoding possible.

The finite state machine (FSM) with 47 states clearly defines the structure of the Probability Estimation Table.
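The table-driven coding of one (CX, D) pair can be sketched in software. The Python model below is not the hardware architecture of this paper; it is a minimal reference sketch written from the CODEMPS, CODELPS, RENORME and BYTEOUT procedures of the JPEG2000 standard [1]. The names MQEncoder, PET, encode and output are hypothetical, only the first six of the 47 table rows are included (values assumed from Annex C of the standard and to be checked against it), and the final flush is omitted.

```python
# Rows 0-5 of the probability estimation table: index -> (Qe, NMPS, NLPS, SWITCH).
# Assumed values; verify against the standard before reuse. An index outside
# this partial table raises KeyError.
PET = {
    0: (0x5601, 1, 1, 1),
    1: (0x3401, 2, 6, 0),
    2: (0x1801, 3, 9, 0),
    3: (0x0AC1, 4, 12, 0),
    4: (0x0521, 5, 29, 0),
    5: (0x0221, 38, 33, 0),
}


class MQEncoder:
    def __init__(self, num_contexts=19):
        self.A = 0x8000            # interval register
        self.C = 0                 # code register
        self.CT = 12               # shifts left before the next byte-out
        self.buf = bytearray([0])  # buf[0] is a dummy byte before the stream
        self.I = [0] * num_contexts    # table index per context
        self.MPS = [0] * num_contexts  # current MPS value per context

    def encode(self, cx, d):
        qe, nmps, nlps, switch = PET[self.I[cx]]
        self.A -= qe
        if d == self.MPS[cx]:          # CODEMPS
            if self.A & 0x8000:        # no renormalization needed
                self.C += qe
                return
            if self.A < qe:
                self.A = qe            # conditional exchange
            else:
                self.C += qe
            self.I[cx] = nmps
        else:                          # CODELPS
            if self.A < qe:
                self.C += qe           # conditional exchange
            else:
                self.A = qe
            if switch:
                self.MPS[cx] ^= 1      # the sense of the MPS is inverted
            self.I[cx] = nlps
        self._renorm()

    def _renorm(self):
        # Shift A and C left until A is at least 0x8000 again.
        while True:
            self.A <<= 1
            self.C <<= 1
            self.CT -= 1
            if self.CT == 0:
                self._byteout()
            if self.A & 0x8000:
                break

    def _emit(self, shift, mask, ct):
        self.buf.append((self.C >> shift) & 0xFF)
        self.C &= mask
        self.CT = ct

    def _byteout(self):
        if self.buf[-1] == 0xFF:            # bit stuffing after a 0xFF byte
            self._emit(20, 0xFFFFF, 7)
        elif self.C < 0x8000000:            # no carry pending
            self._emit(19, 0x7FFFF, 8)
        else:
            self.buf[-1] += 1               # propagate the carry
            if self.buf[-1] == 0xFF:
                self.C &= 0x7FFFFFF
                self._emit(20, 0xFFFFF, 7)
            else:
                self._emit(19, 0x7FFFF, 8)

    def output(self):
        return bytes(self.buf[1:])          # drop the dummy byte


if __name__ == "__main__":
    enc = MQEncoder()
    for cx in range(10):
        enc.encode(cx, 1)   # one LPS decision in each of ten contexts
    print(enc.output().hex(), hex(enc.A), hex(enc.C), enc.CT)
```

The attributes A, C and CT play the roles of the Reg_A, Reg_C and CT registers, and the two lists I and MPS stand in for per-context index and MPS storage. A complete model would carry all 47 rows of the table and add the flush procedure.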
The number of calculations needed to obtain the coding information and the utilization of resources are high, which degrades the hardware performance. The hardware modeling of the MQ-coder has the following limitations:
1) Low clock frequency.
2) High consumption of LUTs and registers.
3) High hardware resource requirements.

Because a large number of context and decision pairs must be processed, the MQ encoder turns an otherwise parallel operation into a serial one; an architecture designed to overcome this is called a high-speed MQ encoder architecture. Storing the transformed coefficients of a code block consumes many registers, which leads to a large flip-flop (FF) requirement. Hence, reducing the code-block storage based on the context-pair probability estimation reduces the number of lookup tables (LUTs) and slice registers, which leads to lower memory consumption. The motivation behind the work proposed in this paper is the reduction in memory, time and power consumption obtained by reducing the size of the bitwise representation.

III. RELATED WORKS

The main advantage of JPEG 2000 is that it combines most of the qualities described above, which allows it to be used in a very wide range of applications. This flexibility, coupled with a very high compression efficiency, unfortunately has a price. Moreover, some applications have real-time aspects that impose very high throughput constraints.

An architecture composed of three stages is proposed by Mei et al. in [5]. When implemented on an APEX20K FPGA board, it operates at 37.27 MHz. In this architecture, if an MPS occurs then two symbols are coded simultaneously; otherwise a single symbol is coded.

Shi et al. [8] proposed an MQ-coder hardware core that can process two symbols. This architecture is based on the following hypothesis: a maximum of two shifts occurs during a renormalization operation.

The architecture proposed in [9] is composed of three blocks.
The first block is responsible for initializing register A to 0x8000, register C to 0, the table MPS(CX) to 0 and the index table in the ILT RAM, and for performing all arithmetic operations. The second block shifts register A and register C and decrements the counter CT by 1. If CT = 0, the third block is activated and the register B is emitted as compressed data. The proposed architecture was implemented on a Stratix FPGA and works at a frequency of 83.271 MHz.

The complexity of the JPEG 2000 algorithm is a problem for these real-time applications. In [10], the author indicates that, given current technology, purely software implementations cannot respect the constraints imposed by these real-time applications. This is the reason why a growing number of companies and researchers are interested in (partially) hardware implementations of the standard, in which the computing resources have been optimized and the memory requirements reduced. Below we give an overview of the hardware implementations to date and the results obtained.

As part of the PRIAM project, Thales Communications has developed an implementation of a JPEG 2000 encoder on an MPC74XX processor. This is studied in [11]. The MPC74XX processor is based on a PowerPC architecture (RISC type processor) to which a vector calculation unit called AltiVec is added. This allows multiple data items to be processed in parallel in a single instruction. Unlike the other blocks in the coding chain, the entropy coder, because of its irregular behavior, is difficult to optimize by means of vector instructions. This implementation gives very good results overall, but the entropy coder, requiring 400 cycles per 8-bit pixel, is truly the "bottleneck" of the system.

Bonaldi [12] has worked on the creation of a mixed software-hardware encoder. The platform used is the ARM-VIRTEX card of the DICE unit. In particular, an input rate of 6.6 Mbps is supported for the entropy encoder.
Moreover, everything concerning the formation of the bit-stream is carried out in software, on the ARM. This co-design approach is very judicious and is widely supported by the literature.

The Amphion company has offered an ASIC encoder-decoder since 2003 [13]. Amphion announces speeds of 480 Mbps for encoding and 160 Mbps for decoding. This design has interesting characteristics, such as few constraints on the format of the input images, a division of tasks between hardware and software, and an architecture compatible with the AMBA bus, which allows easy integration into other systems.

Analog Devices [14] offers the ADV-JP2000. This circuit operates at a maximum frequency of 20 MHz and includes a 5/3 wavelet transform (no 9/7) and an entropy encoder. The circuit is not fully compliant with the standard. The ADV-JP2000 offers two modes of operation: encode and decode. In encode mode it accepts a single tile and generates the stream of code-blocks conforming to the standard. The ADV-JP2000 communicates via an asynchronous protocol but also allows an interrupt mode. Finally, the circuit supports DMA mode.

Zhang et al. [15] proposed an architecture composed of four stages and three parts (P1, P2, P3). P1 is implemented in Stage 1 to determine the new value of Qe when A < 0x8000. P2 is used in Stage 2 and Stage 3, where it updates Reg A and Reg C and performs the arithmetic and shift operations. Finally, P3 is used in Stage 4 to perform bit stuffing when the counter CT is equal to 0. The processing frequency of this architecture was 110 MHz on an Altera FPGA card.

IV. PROPOSED ARCHITECTURE

The proposed architecture of the encoder is shown in the block diagram of Fig. 4. The pairs (CX, D) are received by the MQ coder as input, and a sequence of bytes called ByteOutReg is provided as output.
This architecture consists of two parts: the part that predicts the probability of the symbol to be coded, composed of 2 RAMs (ICX, MPS) and 4 ROMs (NMPS, NLPS, Switch, Qe), and the coding part, which is composed of a state machine. The four ROMs are not updated during the coding operation. The pairs (CX, D) are first read sequentially. The context CX is then placed on the address bus of the ICX RAM and the MPS RAM, and the values of I(CX) and MPS(CX) are read. The index I(CX) is then delivered to the four ROMs. MPS(CX) is compared with the signal D, which generates the LPS_en signal. If this signal is equal to one, the CODELPS state is carried out; otherwise the CODEMPS state takes place. The updating of the ICX RAM depends essentially on the signal Ren_out, which is set to one when a renormalization is carried out. The MPS RAM, however, is updated if the LPS_SW signal is equal to one. The probability estimation architecture is shown in Fig. 5.

Fig. 4. MQ-Coder architecture.

Fig. 5. Probability estimation architecture.

We then turn to the coding part, which manages the coding process with a finite state machine in which the various sub-algorithms are replaced by states. The outputs depend on the current state and on the inputs, and react directly to changes in the inputs. Fourteen states have been set up to describe the MQ encoder process. Fig. 6 shows the MQ-Coder state machine. The states used in this state machine are as follows:

Fig. 6. MQ-Coder State machine.

1) Repos (rest): This state depends essentially on the input "go": if go = 1 the machine switches to the state INITENC; otherwise it remains in the same state.
2) Initenc: In this state, register Reg_A is initialized to 0x8000, register Reg_C to 0 and counter CT to 12. The MPS RAM is then filled with 0s and the index RAM with the initial indexes of the 19 possible values of the context CX. The machine then moves automatically to the Read state. The index/probability tables must be present in memory before coding begins.
3) Read: In this state, the context is read to deduce the value of the corresponding MPS(CX) according to the table initialized in INITENC. The decision D is also read.
4) CODAGE (coding): If the decision D = MPS(CX), the machine switches to the CODEMPS state; otherwise it goes to the CODELPS state.
5) CODEMPS: Register Reg_A is reduced by Qe_reg. Reg_A is then compared with 0x8000. If Reg_A is less than 0x8000, the index is updated according to the NMPS table entry of the context index and the machine moves to the Renorme state. If Reg_A is not less than 0x8000, the probability Qe_reg is added to register Reg_C and the machine moves to the Finis state.
6) Finis: In this state, if the counter nbpair (the number of pairs (CX, D) read) is equal to 256, the machine moves to the flush state; otherwise the next state is Read.
7) CODELPS: In this state, register Reg_A is set to Reg_A - Qe_reg. Then, if Reg_A is greater than Qe_reg, register Reg_A takes the value of Qe_reg; otherwise the probability Qe_reg is added to register Reg_C. The condition for inverting the intervals is always checked. If a SWITCH is required, the sense of the MPS is inverted. The index takes a new value according to the NLPS table and the machine moves to the Renorme state.
8) Renorme: The contents of Reg_A and Reg_C are shifted left by one bit. This shift is repeated until the value of Reg_A is at least 0x8000. The counter CT, which tracks the number of shifts of Reg_A and Reg_C, is decremented at each shift. When the counter CT reaches 0 (CT was initially at 12, i.e., 12 left shifts have been applied to Reg_A and Reg_C), the machine moves to byteout1; if Reg_A is still less than 0x8000, it returns to the Renorme state as soon as byteout is finished. The optimization of the Renormalization procedure is presented in Fig. 7.
9) Byteout1: This state can be entered from two places: from the Renorme state when the shift counter CT becomes equal to zero, or at the end of coding when the registers are flushed. The optimization of the Byteout procedure is presented in Fig. 8.
10) FLUSH: This is the state reached at the end of the encoding (in our case when nbpair = 256). The FLUSH procedure contains two calls to Byteout1 and two calls to Setbit; hence the idea of subdividing it into three states: the first is flush_1, which ends with a call to byteout1, the second is flush_2 (same principle as flush_1) and the third is flush_3.
a) Flush_1: This state contains two sub-states, Setbit and byteout1. First the Setbit state is called, then a shift is applied to register Reg_C, then byteout1 is called, and finally the machine moves to flush_2.
i) Setbit: In this state, after register Reg_C is shifted, the machine moves automatically to byteout1, regardless of the value of Reg_C (i.e., whether Reg_C is lower or higher than TEMPC).
b) Flush_2: This state follows the same principle as flush_1, but this time the machine moves to state flush_3.
c) Flush_3: This state completes the flush, when the first end marker 0xFF has to be inserted. The next state is Repos.

Fig. 7. (a) Original RENORME architecture (b) Optimized RENORME architecture.

Fig. 8. (a) Original BYTEOUT architecture (b) Optimized BYTEOUT architecture.

V. EXPERIMENTAL RESULTS

A. Simulation

The proposed design, described in VHDL, was simulated with Mentor Graphics tools. The ByteOutReg bytes coincide with those in column B, and the parameters IDCX, MPS, Qe_reg, Reg_A, Reg_C and Reg_CT evolve as expected. This result has been verified, and we have chosen one sequence to visualize in simulation and explain in parallel. For the sake of clarity, Fig. 9 displays only some signals of the simulation flow: the compressed data Byteout_Reg, the index IDCX, the counter Reg_CT, the probability Qe_reg and the states. Table 1 summarizes the simulation results of the MQ encoder, from decision no. 28 to decision no. 34.

B. Synthesis Results

The proposed design was implemented on Xilinx Virtex family platforms: XC6SLX75T, XC5LX30T and XC4VLX80 devices. We used the Xilinx ISE tools, version 14.1. The synthesis results of the architecture are shown in Table 2. The proposed MQ encoder design gives the best result, in terms of hardware resources (number of LUTs, slices and flip-flops consumed) and operating frequency, when implemented on a Virtex 6 platform. Concerning the frequencies obtained, we note that our architecture meets the real-time criteria. The design has a maximum frequency of 423.2 MHz on the Virtex 6 (XC6SLX75T) device.

C. Comparison

A comparative study with other existing designs in the literature has been made. The Virtex 4 XC4VFX140 platform is used for this comparison. The performance comparison of our design with the architecture proposed in [16] is shown in Table 3. Our proposed design codes frames in real time at a frequency of 244.475 MHz and requires only 455 slices. The throughput of several MQ-coder architectures compared with our proposed architecture is presented in Table 4. It is calculated from the reported symbol consumption rate and operating frequency.
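The throughput calculation can be written out explicitly. The short Python sketch below multiplies the number of (CX, D) pairs consumed per clock by the clock frequency in MHz, which gives mega-symbols per second (MS/s); the function name throughput_ms and the dictionary layout are hypothetical, and the input figures are those listed in Table 4.

```python
def throughput_ms(pairs_per_clock, freq_mhz):
    """Throughput in MS/s = (CX, D) pairs per clock x clock frequency in MHz."""
    return pairs_per_clock * freq_mhz


# (pairs per clock, frequency in MHz) as reported in Table 4.
designs = {
    "Design [7]": (2, 50.1),
    "Design [21]": (1, 185.43),
    "Design [18]": (1.23, 53.92),
    "Design [19]": (2, 48.3),
    "Proposed": (1, 244.475),
}

if __name__ == "__main__":
    proposed = throughput_ms(*designs["Proposed"])
    for name, (pairs, freq) in designs.items():
        rate = throughput_ms(pairs, freq)
        print(f"{name}: {rate:.2f} MS/s, proposed/this = {proposed / rate:.2f}")
```

A design that consumes two pairs per clock needs only half the clock frequency of a single-symbol design to reach the same throughput, which is why both columns have to be considered together.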
It is found that our architecture encodes frames at a frequency 3.31 and 2.29 times higher than those of the architectures in [16] and [17], respectively. Table 5 compares the logic area, memory requirement, and estimated memory area of several previous works [5], [7], [15]-[20]. The total area of the proposed architecture is less than that of each previous work. However, the hardware cost of the word-based architecture is larger than that of the proposed architecture. The proposed design can code 40 frames per second for high-definition TV of 1920p at 254.84 MHz on Stratix II.

Fig. 9. Wave simulation for MQ coder architecture.

TABLE I. EXAMPLE EXTRACTED FROM THE SIMULATION OF THE MQ ENCODER

Symbol  D  IDCX  Reg_A   Reg_C       CT  Byteout_Reg
10      1  28    0xE80E  0x000C6D52  7   0x84
11      0  26    0x9008  0x00636A90  4   0x84
12      0  27    0xF40E  0x00C70122  3   0x84
13      0  27    0xE00D  0x00C71523  3   0x84
14      1  27    0xCC0C  0x00C72924  3   0xC7
25      0  25    0xA008  0x00014920  8   0xC7
26      0  25    0x8807  0x00016121  8   0xC7

TABLE II. RESULTS OF SYNTHESIS

Used Platform            XC6SLX75T   XC5VLX50T  XC4VLX80
Maximum Frequency (MHz)  423.2       336.304    264.2
No. of 4 input LUTs      540/343680  523/28800  766/71680
Total used slices        251/687360  247/28800  396/35840
Total FF slices          177/693     176/659    247/71680

TABLE III. PERFORMANCE COMPARISON (USED FPGA: XC4VFX140)

Architecture          Proposed  Architecture [15]
Max. Frequency (MHz)  244.475   185.43
Used slices           455       495
Used FF slices        292       392
Used 4 input LUTs     865       893
Used BRAMs            1         2

TABLE IV. THE THROUGHPUT OF SOME DESIGNS TESTED ON VIRTEX 4 XC4VFX140

Design       Number of pairs  Frequency (MHz)  Throughput (MS/s)
Design [7]   2                50.1             100.2
Design [21]  1                185.43           185.43
Design [18]  1.23             53.92            66.38
Design [19]  2                48.3             96.6
Proposed     1                244.475          244.475

VI. CONCLUSION

This paper discussed the problems in the real-time implementation of an FPGA-based MQ coder architecture.
The MQ-coder, used in both the encoding and decoding stages, performs the probability estimation of the coefficients, and its Renorme and Byteout modules were optimized. An increase in computational overhead requires more power and energy. This paper reduces the bitwise computation in order to reduce the number of computational steps. Minimizing the computational steps decreases the power and the time delay. The proposed PET architecture reduced the bitwise representation from 13 bits to 12 bits, which reduced the memory elements from 416 to 348 compared with the existing MQ-coder architecture. Therefore, the size of the PET ROM is 1376 bits. An embedded architecture of the MQ coder for JPEG2000 was designed and implemented in this paper. The implementations carried out during this work showed that the proposed architecture of the MQ encoder operates at a frequency of 423.2 MHz on a Virtex6 XC6SLX4 device and that it can code 40 frames per second for the high-definition TV application. The proposed architecture is easily expandable to 2048×1080 resolution video at 45 fps. It can be used in several applications such as Internet broadcasting, medical imaging and digital photography. Moreover, the processing time was improved by about 13.6% in comparison with well-known architectures from the literature.

ACKNOWLEDGMENTS

The authors wish to acknowledge the approval and the support of this research study by the grant N° CIT-2016-1-6-F-5718 from the Deanship of the Scientific Research in Northern Border University, Arar, KSA.

TABLE V. COMPARISON WITH OTHER MQ CODER ARCHITECTURES

Architecture  FPGA family  Device used         Clk (MHz)  No. of LEs  Symbol/Clk  Throughput (MS/s)
[5]           APEX20K      EP20K600EFC672-3    37.27      1256        2           74.54
[7]           Stratix      N/A                 50.10      1596        2           100.2
[15]          Stratix      N/A                 40.53      12649       2           81.6
[20]          Stratix II   EP2S15F484C3        106.2      1321        2           212.4
[22]          APEX20K      EP20K1000EFC672-1X  9.25       14711       1           9.25
[23]          Stratix      N/A                 27.05      761         1           57.05
[24]          Stratix      N/A                 106.02     1267        2           210
[16]          Stratix      EP2S90F1020I4       58.56      1488        2           117
[17]          Stratix      EP1S10B672C6        145.9      824         1           145.9
Proposed      Stratix II   EP2S15F484C3        254.84     603         1           254.84

REFERENCES

[1] JPEG 2000 image coding system, ISO/IEC International Standard 15444-1, ITU Recommendation T.800 (2000).
[2] D. S. Taubman and M. W. Marcellin, JPEG2000 Image Compression Fundamentals, Standards, and Practice (2002).
[3] T. Acharya and P. Tsai, JPEG2000 Standard for Image Compression: Concepts, Algorithms and VLSI Architectures, J. Wiley & Sons (2005).
[4] JASPER Software Reference Manual, ISO/IEC/JTC1/SC29/WG1N2415.
[5] K. Mei, N. Zheng, C. Huang, Y. Liu, Q. Zeng, VLSI design of a high-speed and area-efficient JPEG 2000 encoder, IEEE Transactions on Circuits and Systems for Video Technology 17 (8) (2007) 1065–1078.
[6] L. Horrigue, T. Saidani, R. Ghodhbani, J. Dubois, J. Miteran, M. Atri, An efficient hardware implementation of MQ decoder of the JPEG2000, Microprocess. Microsyst. 38 (2014) 659–668.
[7] L. Liu, N. Chen, H. Meng, L. Zhang, Z. Wang, H. Chen, A VLSI architecture of JPEG 2000 encoder, IEEE Journal of Solid-State Circuits 39 (11) (2004) 2032–2040.
[8] Jiangyi Shi, Jie Pang, Zhixiong D, Yunsong Li, A Novel Implementation of JPEG2000 MQ-Coder Based on Prediction, International Symposium on Distributed Computing and Applications to Business, Engineering and Science, 2011, pp. 179–182.
[9] Kishor Sarawadekar and Swapna Banerjee, VLSI design of memory-efficient, high-speed baseline MQ coder for JPEG 2000, Integration, the VLSI Journal, Elsevier, Vol. 45, January 2012, pp. 1–8. DOI: 10.1016/j.vlsi.2011.07.004.
[10] J. Hunter, Digital cinema reels from motion JPEG 2000 advances, January 2003. http://www.eetimes.com/story/OEG20030106S0034.
[11] C. Le Barz and D. Nicholson, Real time implementation of JPEG 2000, June 2002.
[12] C. Bonaldi and Y. Renard, Conception et réalisation d’un codeur JPEG 2000 sur une carte Virtex-ARM, Laboratoire de Microélectronique (DICE), UCL, June 2001.
[13] Amphion, CS6590 JPEG 2000 codec preliminary product brief, October 2002. http://www.amphion.com.
[14] D. Taubman and M. W. Marcellin, JPEG2000 – Image Compression Fundamentals, Standards and Practice, Kluwer Academic Publishers, Nov. 2001.
[15] K. Liu, Y. Zhou, Y. Song Li, J.F. Ma, A high performance MQ encoder architecture in JPEG2000, Integration, the VLSI Journal 43 (3) (2010) 305–317.
[16] P. Zhou, Z. Bao-jun, High-throughput hardware architecture of MQ arithmetic coder, in: 10th IEEE International Conference on Signal Processing (ICSP), 2010.
[17] K. Sarawadekar, S. Banerjee, An Efficient Pass-Parallel Architecture for Embedded Block Coder in JPEG 2000, IEEE Trans. Circuits Systems Video Technology 22 (6) (2011) 825–836.
[18] Michael Dyer, David Taubman and Saeid Nooshabadi, Concurrency Techniques for Arithmetic Coding in JPEG2000, IEEE Transactions on Circuits and Systems for Video Technology, 2006, vol. 53, pp. 1203–1213.
[19] Kishor Sarawadekar and Swapna Banerjee, "Low-cost, high-performance VLSI design of an MQ coder for JPEG 2000," ICSP2010, 2010, pp. 397–400.
[20] Nandini Ramesh Kumar, Wei Xiang, Yafeng Wang, Two-Symbol FPGA Architecture for Fast Arithmetic Encoding in JPEG 2000, Journal of Signal Processing Systems 69 (2) (2012) 213–224.
[21] T. Saidani, M. Atri, L. Khriji, R. Tourki, An efficient hardware implementation of parallel EBCOT algorithm for JPEG2000, J. Real-Time Image Process. 11 (2013) 1–12.
[22] Varma, H. Damecharla, A. Bell, J. Carletta, G. Back, A fast JPEG2000 encoder that preserves coding efficiency: the split arithmetic encoder, IEEE Transactions on Circuits and Systems, Part I: Regular Papers 55 (11) (2008) 3711–3722.
[23] M. Dyer, S. Nooshabadi, D. Taubman, Design and analysis of system on a chip encoder for JPEG 2000, IEEE Transactions on Circuits and Systems for Video Technology 19 (2) (2009) 215–225.
[24] N.R. Kumar, W. Xiang, Y. Wang, An FPGA-based fast two-symbol processing architecture for JPEG 2000 arithmetic coding, in: IEEE International Conference on Acoustics Speech and Signal Processing (ICASSP) 2010, 2010, pp. 1282–1285.