key: cord-1043437-ja0xjjur authors: Garz'on, Esteban; Golman, Roman; Jahshan, Zuher; Hanhan, Robert; Vinshtok-Melnik, Natan; Lanuzza, Marco; Teman, Adam; Yavits, Leonid title: Hamming Distance Tolerant Content-Addressable Memory (HD-CAM) for Approximate Matching Applications date: 2021-11-18 journal: IEEE Access DOI: 10.1109/access.2022.3158305 sha: bbb34e63378bf5bb8db19400bf0c1606ab6e73c2 doc_id: 1043437 cord_uid: ja0xjjur We propose a novel Hamming distance tolerant content-addressable memory (HD-CAM) for energy-efficient in memory approximate matching applications. HD-CAM implements approximate search using matchline charge redistribution rather than its rise or fall time, frequently employed in state of-the-art solutions. HD-CAM was designed in a 65 nm 1.2 V CMOS technology and evaluated through extensive Monte Carlo simulations. Our analysis shows that HD-CAM supports robust operation under significant process variations and changes in the design parameters, enabling a wide range of mismatch threshold (tolerable Hamming distance) levels and pattern lengths. HD-CAM was functionally evaluated for virus DNA classification, which makes HD-CAM suitable for hardware acceleration of genomic surveillance of viral outbreaks such as Covid-19 pandemics. C ONTENT-ADDRESSABLE memories (CAMs) offer outstanding performance in applications where highspeed searching is critical [1] , [2] . In addition to well-studied applications, such as network routers, digital signal processing, analytics, and reconfigurable computing [1] , [3] , CAMs can be used in variety of emerging compare-intensive big data workloads [4] , machine learning applications [5] , [6] , as well as genomic analysis [7] - [9] . In particular, genomic analysis, which has experienced exponential growth of data in recent years [10] , is an active research field and the basis for different kinds of applications, such as monitoring environmental ecosystems, sustainable agriculture, Earth's environment monitoring, and personalized healthcare [11] - [14] . Many of the those applications benefit from approximate rather than exact search, where a certain Hamming distance, i.e., several mismatching characters between a query pattern and the dataset stored in CAM, is tolerated. This work proposes a novel Hamming distance tolerant CAM (HD-CAM), designed to perform exact and approximate matching, capable of tolerating very large Hamming distances (e.g., 60% of the pattern length). Our design is based on the observation that if every mismatching bit results in a certain constant electrical charge reduction on a precharged matchline, then the total matchline voltage drop is proportional to the Hamming distance between the query pattern and a data word. HD-CAM exploits the charge redistribution of the matchline as a measure of Hamming distance. This is the main contribution of our proposal compared to state-of-theart approximate CAM designs, that use the matchline rise (charge) or fall (discharge) time as an equivalence of Hamming distance [15] - [17] . Another major contribution of HD-CAM is the ability to tolerate, and differentiate between patterns with, very large Hamming distances, as detailed below. HD-CAM applications include text processing [18] , DNA classification, DNA read mapping and several other genomic analysis workloads [14] , [19] , ECC-enabled fault-tolerant CAM, as well as any other workload that requires approximate rather than exact search. We performed a comprehensive design space exploration, evaluating our design in different process corners through extensive Monte-Carlo simulations. Our study was carried out at the circuit-level using a commercial 65 nm 1.2 V CMOS technology with Cadence Spectre. Circuit simulation results were applied to testing HD-CAM as a real-time DNA classifier, using Severe Acute Respiratory Syndrome coronavirus-2 (SARS-CoV-2) and several other virus DNA from the National Center for Biotechnology Information (NCBI) online datasets [20] . To evaluate HD-CAM performance, we use sensitivity and specificity (defined in Section IV.A and Section V) as figures of merit. To summarize, our work provides the following contribu- tions: • HD-CAM, an approximate search CAM, that uses matchline charge redistribution as a measure of Hamming distance; • HD-CAM tolerates, and differentiates between patterns with, very large Hamming distances with very high sensitivity; • HD-CAM is relatively insensitive to sampling time variation; • We comprehensively evaluate our design using commercial 65 nm process, covering all local variations around TT-, SS-, and FF-corners, as well as susceptibility to variations in design parameters. To the best of our knowledge, the HD-CAM represents the first design that can carry out approximate search, while tolerating very large Hamming distances. Moreover, it does not require data transformation such as error correction codes [21] , [22] or local sensitivity hashing [23] , [24] . The rest of the paper is organized as follows: Section II presents the background of our work; Section III discusses the proposed HD-CAM design and operation; Section IV details evaluation and design space exploration, while Section V shows how HD-CAM can be used for virus DNA classification, and discusses the results; Finally, Section VI concludes our work and presents ideas for future research. A. Conventional Content-Addressable Memory (CAM) Fig. 1(a) shows the architecture of a conventional n × m CAM (n being the number of rows and m the number of columns). It allows comparing a query pattern to the data stored in the bitcells. Each word stored in the CAM row has its own matchline (ML), which is connected to a sense amplifier (SA). A pair of searchlines (SLs), i.e., SL and SL, are connected to all the bitcells belonging to a column. An n-bit CAM word is shown in Fig. 1(b) , where the pre-charge (PC) transistors (i.e., ML pre-charge (PC-ML) and SL pre-charge (PC-SL)) are used to pre-discharge/charge the SLs/ML. The matchline sense-amplifier (MLSA) is used to sense the state of the ML. A typical NOR-type CAM bitcell is illustrated in Fig. 1 (c) 1 . It is based on a pair of cross-coupled inverters for storing the data. The bitcell is accessed for write and read similarly to a standard six-transistor static random access memory (6T-SRAM) cell, by using the word line (WL) to enable the row access, and driving SL and SL to opposite values for write, or pre-charging them for read. The associative search operation is implemented using the M C1 -M C3 transistors. At first, the SLs should be pre-discharged (i.e., pulled to ground), thus avoiding any possible ML discharge. While keeping the SLs discharged, the ML is pre-charged to the V DD voltage level. Then, the search word is loaded onto the SLs, and the PC-ML transistor is turned off (i.e., PC-ML = V DD ). If the value stored in the cell matches the value on the SLs (i.e., if the SL matches D), M C1 and M C2 keep the gate of M C3 low, cutting off the ML discharge path. In consequence, the ML remains high, which represents a match. On the contrary, when the SL value differs from the value in the storage cell, M C3 turns on and discharges the ML, which yields a mismatch. When the entire n-bit word is considered (see Fig. 1 (b)), the ML will remain high only in the case that all the storage cells match the search pattern, resulting in a word match. Conversely, a single bit mismatch is enough to discharge the ML, resulting in a word mismatch. In this work, we modify the NOR-type CAM bitcell to support approximate matching, as presented hereafter. Many ternary and binary NOR-and NAND-based CAM cell designs have been proposed in recent years, including CMOSbased [25] - [39] , as well as emerging memory based [40]- [44] solutions. Several CAM designs offer soft-error tolerance using error correcting coding (which requires memory redundancy) and replacing the matchline sense amplifier with an analog comparator [21] , [22] . Those designs typically tolerate only a limited Hamming distance (1-4 bits). Another class of approximate search CAMs uses local sensitivity hashing of stored data and query patterns [23] , [24] . While such schemes potentially tolerate large Hamming distances, they require hashing of data prior to storage and search. Additionally, large Hamming distance does not always result in low similarity of hashed data sketches [45] , which leads to false negative results and hence lower sensitivity. A CAM for minimum Hamming distance search that uses digital circuitry for bit comparison, as well as winner-take-all functionality is proposed in [46] . Several emerging memory (memristor crossbar) based designs for Hamming distance approximation have also been proposed [8] , [47] . NCAM [48] uses near-memory logic to calculate the sum of squares of data word differences (which measures the similarity between data vectors). PPAC [49] calculates Hamming similarity by performing a population count, by tallying the number of ones over all XNOR outputs of the CAM bitcells of a word. A variety of approximate search CAM designs use timing (i.e., score signal delay, or the speed of the matchline discharge) as a measure of Hamming distance. A Hamming distance search CAM, where the score signal is delayed every time a bit mismatch occurs, is proposed in [15] . In this design, the delay of the score signal is proportional to the Hamming distance between the search and stored patterns. In the approximate search enabled CAM for energy efficient GPUs, proposed in [16] , a small Hamming distance (≤ 2 bits) is tolerated through meticulous timing of the matchline discharge. In [17] , Hamming distance of (≤ 4 bits) is tolerated by using delay lines at the clock inputs of four separate sense amplifiers on each matchline. These tunable sampling time techniques require very precise device and circuit sizing, and suffer from false negatives (false mismatches) as well as false positives (multiple false matches) [16] , leading to limited efficiency of the approximate search technique. Tuning the sampling time is a complex task, which would require almost perfect skew balancing between all ML timing circuits and would be very sensitive to jitter. These issues are exacerbated by process variations. DNA sequencing is used for genomic surveillance and variant classification during the ongoing Covid-19 pandemic. DNA sequencing is a process of determining the bases of a DNA chain, which are referred to as Adenine (A), Guanine (G), Cytosine (C), and Thymine (T). Contemporary highthroughput DNA sequencers can sequence multiple DNA samples in parallel [50] . DNA sequencing process, along with the genomic analysis, is carried out in several steps [51] : (1) sample preparation; (2) DNA sequencing that generates multiple DNA fragments called DNA reads; and (3) DNA classification, DNA read alignment, genome assembly, variant analysis, etc. Typically, tools like Kraken and Kraken2 [52] , [53] are used to detect and classify unknown DNA. However, Kraken operation is based on exact matching of sequenced DNA patterns in DNA database. Therefore, it requires relatively high coverage (high percentage of the target DNA in a sample) to perform with sufficient sensitivity. In this work, we propose a fast and highly sensitive approximate matching-based DNA classification scheme, implemented by HD-CAM. Our design allows tolerating very large Hamming distances (for example, up to 60% of the pattern length), while providing very high sensitivity and specificity, as detailed in Section V. The goal of our design is providing highly confident match and mismatch for dynamically configurable (by user) mismatch threshold (i.e., the Hamming distance, or the number of mismatching bits that can be tolerated), while making the proposed HD-CAM resilient to the process and design parameters variation. In other words, we want to ensure a definite match if the number of mismatching bits in an HD-CAM row is ≤ k, and a certain mismatch if the number of mismatching bits in an HD-CAM row is ≥ l, where k and l are integer numbers and k