key: cord-1043437-ja0xjjur
authors: Garzón, Esteban; Golman, Roman; Jahshan, Zuher; Hanhan, Robert; Vinshtok-Melnik, Natan; Lanuzza, Marco; Teman, Adam; Yavits, Leonid
title: Hamming Distance Tolerant Content-Addressable Memory (HD-CAM) for Approximate Matching Applications
date: 2021-11-18
journal: IEEE Access
DOI: 10.1109/access.2022.3158305
sha: bbb34e63378bf5bb8db19400bf0c1606ab6e73c2
doc_id: 1043437
cord_uid: ja0xjjur

We propose a novel Hamming distance tolerant content-addressable memory (HD-CAM) for energy-efficient in-memory approximate matching applications. HD-CAM implements approximate search using matchline charge redistribution rather than the matchline rise or fall time frequently employed in state-of-the-art solutions. HD-CAM was designed in a 65 nm 1.2 V CMOS technology and evaluated through extensive Monte Carlo simulations. Our analysis shows that HD-CAM supports robust operation under significant process variations and changes in the design parameters, enabling a wide range of mismatch threshold (tolerable Hamming distance) levels and pattern lengths. HD-CAM was functionally evaluated for virus DNA classification, which makes it suitable for hardware acceleration of genomic surveillance of viral outbreaks such as the Covid-19 pandemic.

CONTENT-ADDRESSABLE memories (CAMs) offer outstanding performance in applications where high-speed searching is critical [1], [2]. In addition to well-studied applications, such as network routers, digital signal processing, analytics, and reconfigurable computing [1], [3], CAMs can be used in a variety of emerging compare-intensive big data workloads [4], machine learning applications [5], [6], as well as genomic analysis [7]-[9]. In particular, genomic analysis, which has experienced exponential growth of data in recent years [10], is an active research field and the basis for different kinds of applications, such as monitoring environmental ecosystems, sustainable agriculture, Earth's environment monitoring, and personalized healthcare [11]-[14]. Many of those applications benefit from approximate rather than exact search, where a certain Hamming distance, i.e., several mismatching characters between a query pattern and the dataset stored in the CAM, is tolerated.

This work proposes a novel Hamming distance tolerant CAM (HD-CAM), designed to perform exact and approximate matching, capable of tolerating very large Hamming distances (e.g., 60% of the pattern length). Our design is based on the observation that if every mismatching bit results in a certain constant electrical charge reduction on a precharged matchline, then the total matchline voltage drop is proportional to the Hamming distance between the query pattern and a data word. HD-CAM exploits the charge redistribution of the matchline as a measure of Hamming distance. This is the main contribution of our proposal compared to state-of-the-art approximate CAM designs that use the matchline rise (charge) or fall (discharge) time as a proxy for Hamming distance [15]-[17]. Another major contribution of HD-CAM is the ability to tolerate, and differentiate between patterns with, very large Hamming distances, as detailed below.
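The proportionality stated above can be written out as a short worked relation. This is a first-order sketch rather than the authors' circuit model: the per-bit voltage step $\Delta V$ and the symbol $d$ for the Hamming distance are assumptions introduced here, and $V_{evalth}$ denotes the reference level against which the sampled matchline voltage is compared.

```latex
V_{ML}(d) \approx V_{DD} - d\,\Delta V
\qquad\Longrightarrow\qquad
\text{match} \iff V_{ML}(d) \ge V_{evalth}
\iff d \le \frac{V_{DD} - V_{evalth}}{\Delta V}
```

Under this idealization, raising the reference level or increasing the per-bit step lowers the largest Hamming distance that is still reported as a match.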

HD-CAM applications include text processing [18], DNA classification, DNA read mapping and several other genomic analysis workloads [14], [19], ECC-enabled fault-tolerant CAM, as well as any other workload that requires approximate rather than exact search.

We performed a comprehensive design space exploration, evaluating our design in different process corners through extensive Monte Carlo simulations. Our study was carried out at the circuit level using a commercial 65 nm 1.2 V CMOS technology with Cadence Spectre. Circuit simulation results were then used to test HD-CAM as a real-time DNA classifier, using DNA of Severe Acute Respiratory Syndrome coronavirus-2 (SARS-CoV-2) and several other viruses from the National Center for Biotechnology Information (NCBI) online datasets [20]. To evaluate HD-CAM performance, we use sensitivity and specificity (defined in Section IV.A and Section V) as figures of merit.

To summarize, our work provides the following contributions:

• HD-CAM, an approximate search CAM that uses matchline charge redistribution as a measure of Hamming distance;
• HD-CAM tolerates, and differentiates between patterns with, very large Hamming distances with very high sensitivity;
• HD-CAM is relatively insensitive to sampling time variation;
• We comprehensively evaluate our design using a commercial 65 nm process, covering all local variations around the TT-, SS-, and FF-corners, as well as susceptibility to variations in design parameters.

To the best of our knowledge, HD-CAM is the first design that can carry out approximate search while tolerating very large Hamming distances. Moreover, it does not require data transformation such as error correction codes [21], [22] or locality-sensitive hashing [23], [24].

The rest of the paper is organized as follows: Section II presents the background of our work; Section III discusses the proposed HD-CAM design and operation; Section IV details the evaluation and design space exploration; Section V shows how HD-CAM can be used for virus DNA classification and discusses the results; finally, Section VI concludes our work and presents ideas for future research.

A. Conventional Content-Addressable Memory (CAM)

Fig. 1(a) shows the architecture of a conventional n × m CAM (n being the number of rows and m the number of columns). It allows comparing a query pattern to the data stored in the bitcells. Each word stored in a CAM row has its own matchline (ML), which is connected to a sense amplifier (SA). A pair of searchlines (SLs), i.e., SL and its complement $\overline{\text{SL}}$, is connected to all the bitcells belonging to a column. An n-bit CAM word is shown in Fig. 1(b), where the pre-charge (PC) transistors (i.e., ML pre-charge (PC-ML) and SL pre-charge (PC-SL)) are used to pre-discharge the SLs and pre-charge the ML. The matchline sense amplifier (MLSA) is used to sense the state of the ML.

A typical NOR-type CAM bitcell is illustrated in Fig. 1(c). It is based on a pair of cross-coupled inverters for storing the data. The bitcell is accessed for write and read similarly to a standard six-transistor static random access memory (6T-SRAM) cell, by using the word line (WL) to enable the row access, and driving SL and $\overline{\text{SL}}$ to opposite values for write, or pre-charging them for read. The associative search operation is implemented using the M C1 -M C3 transistors. At first, the SLs are pre-discharged (i.e., pulled to ground), which prevents any unintended ML discharge. While keeping the SLs discharged, the ML is pre-charged to the V DD voltage level. Then, the search word is loaded onto the SLs, and the PC-ML transistor is turned off (i.e., PC-ML = V DD ). If the value stored in the cell matches the value on the SLs (i.e., if the SL matches D), M C1 and M C2 keep the gate of M C3 low, cutting off the ML discharge path. As a consequence, the ML remains high, which represents a match. Conversely, when the SL value differs from the value in the storage cell, M C3 turns on and discharges the ML, which yields a mismatch. When the entire n-bit word is considered (see Fig. 1(b)), the ML remains high only if all the storage cells match the search pattern, resulting in a word match. A single bit mismatch is enough to discharge the ML, resulting in a word mismatch.
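The word-level behavior described above (a single mismatching bit discharges the matchline) can be sketched as a purely behavioral software model. The function name `nor_cam_search` and the use of bit strings for stored words are illustrative assumptions; the sketch ignores all electrical and timing effects.

```python
def nor_cam_search(rows, query):
    """Return one flag per stored word: True means the matchline stays high."""
    flags = []
    for word in rows:
        if len(word) != len(query):
            raise ValueError("query and stored word must have equal length")
        # Any mismatching bit enables a pull-down path and discharges the ML.
        discharged = any(d != s for d, s in zip(word, query))
        flags.append(not discharged)
    return flags


# Hypothetical 8-bit contents, compared against all rows in one search.
stored = ["10110010", "10110011", "00000000"]
print(nor_cam_search(stored, "10110011"))  # [False, True, False]
```

Only the row whose every bit equals the query reports a match, which mirrors the exact-match behavior of the NOR-type word.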

In this work, we modify the NOR-type CAM bitcell to support approximate matching, as presented hereafter. 

Many ternary and binary NOR- and NAND-based CAM cell designs have been proposed in recent years, including CMOS-based [25]-[39], as well as emerging memory based [40]-[44] solutions. Several CAM designs offer soft-error tolerance using error correcting coding (which requires memory redundancy) and replacing the matchline sense amplifier with an analog comparator [21], [22]. Those designs typically tolerate only a limited Hamming distance (1-4 bits). Another class of approximate search CAMs uses locality-sensitive hashing of stored data and query patterns [23], [24]. While such schemes potentially tolerate large Hamming distances, they require hashing of data prior to storage and search. Additionally, a large Hamming distance does not always result in low similarity of hashed data sketches [45], which leads to false negative results and hence lower sensitivity. A CAM for minimum Hamming distance search that uses digital circuitry for bit comparison, as well as winner-take-all functionality, is proposed in [46]. Several emerging memory (memristor crossbar) based designs for Hamming distance approximation have also been proposed [8], [47]. NCAM [48] uses near-memory logic to calculate the sum of squares of data word differences (which measures the similarity between data vectors). PPAC [49] calculates Hamming similarity by performing a population count, i.e., by tallying the number of ones over all XNOR outputs of the CAM bitcells of a word.

A variety of approximate search CAM designs use timing (i.e., score signal delay, or the speed of the matchline discharge) as a measure of Hamming distance. A Hamming distance search CAM, where the score signal is delayed every time a bit mismatch occurs, is proposed in [15]. In this design, the delay of the score signal is proportional to the Hamming distance between the search and stored patterns. In the approximate search enabled CAM for energy efficient GPUs, proposed in [16], a small Hamming distance (≤ 2 bits) is tolerated through meticulous timing of the matchline discharge. In [17], a Hamming distance of ≤ 4 bits is tolerated by using delay lines at the clock inputs of four separate sense amplifiers on each matchline. These tunable sampling time techniques require very precise device and circuit sizing, and suffer from false negatives (false mismatches) as well as false positives (multiple false matches) [16], leading to limited efficiency of the approximate search technique. Tuning the sampling time is a complex task, which would require almost perfect skew balancing between all ML timing circuits and would be very sensitive to jitter. These issues are exacerbated by process variations.

DNA sequencing is used for genomic surveillance and variant classification during the ongoing Covid-19 pandemic. DNA sequencing is the process of determining the bases of a DNA chain, which are referred to as Adenine (A), Guanine (G), Cytosine (C), and Thymine (T). Contemporary high-throughput DNA sequencers can sequence multiple DNA samples in parallel [50]. The DNA sequencing process, along with the genomic analysis, is carried out in several steps [51]: (1) sample preparation; (2) DNA sequencing that generates multiple DNA fragments called DNA reads; and (3) DNA classification, DNA read alignment, genome assembly, variant analysis, etc.

Typically, tools like Kraken and Kraken2 [52], [53] are used to detect and classify unknown DNA. However, Kraken operation is based on exact matching of sequenced DNA patterns in a DNA database. Therefore, it requires relatively high coverage (a high percentage of the target DNA in a sample) to perform with sufficient sensitivity.

In this work, we propose a fast and highly sensitive approximate matching-based DNA classification scheme, implemented by HD-CAM. Our design allows tolerating very large Hamming distances (for example, up to 60% of the pattern length), while providing very high sensitivity and specificity, as detailed in Section V.

The goal of our design is to provide highly confident match and mismatch decisions for a user-configurable mismatch threshold (i.e., the Hamming distance, or the number of mismatching bits that can be tolerated), while making the proposed HD-CAM resilient to variations in process and design parameters. In other words, we want to ensure a definite match if the number of mismatching bits in an HD-CAM row is ≤ k, and a definite mismatch if the number of mismatching bits in an HD-CAM row is ≥ l, where k and l are integers and k < l. Throughout the paper, we refer to the region between k and l as the uncertainty region, where a false match or a false mismatch may occur.

Fig. 2(a) shows the schematic of the HD-CAM bitcell, which is capable of performing exact as well as approximate search operations. The HD-CAM cell is based on the NOR-type CAM bitcell of Fig. 1(c) with the addition of an evaluation transistor (M eval ) that is used to control the ML discharge rate, based on the level of the evaluation voltage (V eval ). When the M eval transistor is driven by the full voltage level (i.e., V eval = V DD ), a conventional exact match CAM operation is performed. Approximate matching is enabled when V eval < V DD . The layout of the HD-CAM bitcell is shown in Fig. 2(b). Note that the HD-CAM cell is laid out using Metal-1 to Metal-3 only, and can be further improved by allowing higher metal layers.

Fig. 3 shows the timing diagram of the HD-CAM, organized as a single 8-bit word, for the cases where V eval = V DD (exact match) and V eval < V DD (approximate match). In the simulation, each cycle is 2 ns long, made up of 1 ns of precharge and 1 ns of evaluation time (t eval ). For both cases, the operating principle is the same: at first, SL and $\overline{\text{SL}}$ are driven low. Then, the ML is pre-charged to V DD and subsequently, the search pattern is loaded onto the SLs for evaluating the ML. If the ML stays above a certain evaluation threshold level (defined in Section III-B), we achieve a match; otherwise there is a mismatch. For circuit simulation purposes, we pre-loaded the 8-bit HD-CAM word with D = '1' (see the D signal in Fig. 3); then, in each subsequent cycle we applied a new query pattern to the SLs. In the case of the exact search operation (see Fig. 3, 0-5 ns time-frame), a single bit mismatch between D and SL results in a full discharge of the ML, while a match keeps the ML high. In the case of approximate search (see Fig. 3, 5-10 ns time-frame), the ML remains high even with a single-bit mismatch.

Unlike the exact matching operation, the approximate matching task depends strongly on the conductivity of the M eval transistor. When V eval is low enough, a single mismatching cell is not able to discharge the ML capacitance during the evaluation time (see Fig. 3, 6-7 ns time-frame). This leads to a longer evaluation time compared to the exact match operation (see the 2-3 ns and 8-9 ns time-frames in Fig. 3), due to the inherently lower conductivity of the M eval transistor when the asserted V eval is lower than V DD , which causes the ML to discharge slowly. For HD-CAM arrays with longer words (patterns), e.g., 128-bit or 256-bit, the inherently larger ML capacitance affects the approximate match operation, and several mismatching bitcells may be required to discharge the ML.

HD-CAM exhibits a certain delay overhead compared to conventional NOR CAM schemes with no evaluation transistor. As we show in Section IV, this delay overhead is the direct result of HD-CAM's ability to enable very large mismatch thresholds, and to maintain very high efficiency across the entire range of mismatch threshold levels.

The matchline sense amplifier (MLSA) is based on a classical StrongARM design [54]. The ML sensing circuitry is shown in Fig. 4 and comprises a Sample and Hold circuit (S&H) (Fig. 4(a)), a timing unit (Fig. 4(b)), a cross-coupled amplifier (Fig. 4(c)) which, in an open loop configuration, works as a comparator, and an RS latch (Fig. 4(d)). While the ML is being evaluated, the cross-coupled amplifier is in a PC phase dictated by the M3A and M3B transistors controlled by a delayed PC-ML (i.e., PC-ML-D), where the outputs are charged to V DD . Upon completion of the evaluation phase, a pulse (Sample) triggers the charging of a small capacitor to the ML voltage, labeled ML-Sampled (ML-S). The comparator evaluates ML-S against an evaluation threshold (V evalth ), which is the tolerance reference voltage, yielding a match or mismatch when the ML voltage is above or below V evalth , respectively. The result is fed into the RS latch, which compensates for metastable behavior in the MLSA PC phase. Sampling the value of the ML onto a capacitor removes the mutual dependence between the evaluation time of the MLSA response and the PC of the ML for the next evaluation. This allows the creation of two interlocked cycles: while the ML is precharged, the S&H result is being evaluated, and while the SA is being precharged, the ML is being evaluated. This results in zero downtime of the system, and every PC cycle has a corresponding evaluation cycle performed in parallel, as shown in Fig. 4(e). In addition to the sampling device, an inverse device exists in order to discharge the ML-S node for the next sample.

HD-CAM evaluation and design space exploration are carried out using extensive Monte Carlo (MC) simulations. We employ Cadence Spectre, using transistor models provided by a commercial 65 nm CMOS technology featuring a nominal supply voltage V DD of 1.2 V.

The HD-CAM design space is defined by the tuple [t eval , V eval , V evalth ], which we interchangeably refer to as design space variables or parameters. In the following subsections, we show how adjusting those parameters allows us to tune the desired mismatch threshold as well as HD-CAM efficiency levels for different process corners.

Table I provides a circuit-level comparison between a conventional CAM and the proposed HD-CAM in terms of energy consumption per bit and bitcell area. We assume a memory word of 256 bits with a single operation cycle of 2 ns (PC + evaluation time, as shown previously in Fig. 3). Additionally, for HD-CAM we have examined three V eval values: 0.4 V, 0.5 V, and 0.6 V. For exact match operations, both the HD-CAM (with V eval = 1.2 V) and the conventional CAM have the same energy consumption. However, in comparison to the conventional CAM, HD-CAM consumes less energy per bit when performing an approximate search operation, due to the partial (rather than complete) discharge of the matchline. The results show that for the various Hamming distance levels, energy consumption per bit grows as V eval increases. The bitcell area comparison is also shown in Table I. HD-CAM employs an additional transistor (see Fig. 2(b)), leading to a bitcell area overhead of about 10%.

The first figure of merit is sensitivity, which measures the probability of correctly detecting the similarity that exists between the query pattern and any of the data words stored in HD-CAM. It is defined as:

$$\text{Sensitivity} = \frac{TP}{TP + FN}$$

where TP and FN are the numbers of true positive and false negative results, respectively. A true positive (TP) result is obtained when a compare of two patterns with Hamming distance below the mismatch threshold results in a match; a false negative (FN) result is obtained when a compare of two patterns with Hamming distance below the mismatch threshold results in a mismatch.

The second figure of merit is specificity, which measures the probability of correctly rejecting a query pattern that is not sufficiently similar to any of the data words stored in HD-CAM. It is defined as:

$$\text{Specificity} = \frac{TN}{TN + FP}$$

where TN and FP are the numbers of true negative and false positive results, respectively. A true negative (TN) result is obtained when a compare of two patterns with Hamming distance above the mismatch threshold results in a mismatch; a false positive (FP) result is obtained when a compare of two patterns with Hamming distance above the mismatch threshold results in a match.
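The two figures of merit can be computed from a list of compare outcomes, as sketched below. The function name, the `(hamming_distance, matched)` input format, and the choice to count a distance equal to the threshold as "should match" are assumptions made for illustration; the sample outcomes are invented and are not simulation results.

```python
def sensitivity_specificity(trials, threshold):
    """trials: iterable of (hamming_distance, matched) pairs."""
    tp = fn = tn = fp = 0
    for distance, matched in trials:
        if distance <= threshold:   # similar enough: a match is expected
            if matched:
                tp += 1
            else:
                fn += 1
        else:                       # too different: a mismatch is expected
            if matched:
                fp += 1
            else:
                tn += 1
    # Return None for a metric whose denominator is zero.
    sensitivity = tp / (tp + fn) if (tp + fn) else None
    specificity = tn / (tn + fp) if (tn + fp) else None
    return sensitivity, specificity


# Invented outcomes for a mismatch threshold of 4 bits.
outcomes = [(1, True), (2, True), (3, False), (5, False), (6, True), (8, False)]
sens, spec = sensitivity_specificity(outcomes, threshold=4)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```

Here two of the three below-threshold compares match and two of the three above-threshold compares mismatch, so both metrics evaluate to 2/3.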

We apply extensive MC simulations to examine local variations (i.e., mismatches) around the Fast-Fast (FF), Typical-Typical (TT), and Slow-Slow (SS) corners. In order to assess the entire design space of HD-CAM, three main design parameters that affect the ML signal are examined: the V eval voltage applied at the gate of the M eval transistor (see Fig. 2), the evaluation time t eval , and the evaluation threshold V evalth . Each combination of [t eval , V eval , V evalth ] corresponds to an individual mismatch threshold level. In other words, the mismatch threshold can be adjusted by changing the values of [t eval , V eval , V evalth ].

We simulate the HD-CAM approximate search sensitivity and specificity for different process corners and different values of V evalth and V eval , as presented in Fig. 5 . Here, t eval is set at 1 ns (following the pre-charge stage). Lastly, we vary the Hamming distance, between the query pattern and the stored data word, from 1 bit to 9 bits. Due to process variations, different HD-CAM memory rows may require different numbers of mismatching bits to keep the ML above the V evalth level (producing a match), as well as to discharge the ML below the V evalth level (yielding a mismatch). In other words, HD-CAM sensitivity and specificity are affected by process variations. We differentiate between three sensitivity/specificity regions (see color-map of Fig. 5 ):

(1) 100% specificity, i.e., mismatches are always true; (2) an uncertainty region with sensitivity and specificity both below 100% (meaning matches and mismatches could be true and false); and (3) 100% sensitivity, i.e., matches are always true. Overall, Fig. 5(a-c) show that uncertainty regions are relatively small, whereas 100% sensitivity and 100% specificity regions (plateaus) are accordingly very large.
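The three regions can be located from per-distance Monte Carlo match rates, as in the sketch below. The function name `find_regions` and the match-rate values are hypothetical and chosen only to illustrate the bookkeeping; they are not taken from Fig. 5.

```python
def find_regions(match_rate):
    """match_rate[d]: fraction of runs that report a match at Hamming distance d.

    Returns (k, l): every run matches for distances <= k (100% sensitivity),
    every run mismatches for distances >= l (100% specificity), and the
    distances strictly between k and l form the uncertainty region.
    """
    distances = sorted(match_rate)
    k = None
    for d in distances:              # scan upward while all runs match
        if match_rate[d] == 1.0:
            k = d
        else:
            break
    l = None
    for d in reversed(distances):    # scan downward while all runs mismatch
        if match_rate[d] == 0.0:
            l = d
        else:
            break
    return k, l


# Hypothetical match rates for Hamming distances of 1 to 7 bits.
rates = {1: 1.0, 2: 1.0, 3: 0.97, 4: 0.41, 5: 0.02, 6: 0.0, 7: 0.0}
print(find_regions(rates))  # (2, 6)
```

In this invented example the uncertainty region covers distances 3 to 5, with definite matches up to 2 bits and definite mismatches from 6 bits upward.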

Some applications, such as the DNA classification presented in Section V, tolerate uncertainty (i.e., sensitivity and specificity under 100%) due to their intrinsically inexact nature. Other applications that require 100% sensitivity and 100% specificity may rely on advanced coding techniques which ensure that any two valid codewords have a Hamming distance larger than the worst-case uncertainty region (2). In such a case, a valid codeword is always differentiated from any other valid codeword with a probability of 100% (due to 100% specificity), while small changes (caused, for example, by soft errors) that fall into the blue region (3) are tolerated with 100% sensitivity.

For example, suppose a target application requires a mismatch threshold of 2 bits with 100% sensitivity. For the TT corner, this requirement is satisfied by setting V eval at 0.55 V and V evalth at 55% V DD . This scenario is marked in fluorescent yellow in Fig. 5(b). However, if the HD-CAM chip targeted for this application is manufactured in the FF corner, the required mismatch threshold level can still be maintained by reducing V eval to 0.45 V and keeping V evalth under 45% V DD . Similarly, if the HD-CAM chip is manufactured in the SS corner, the required mismatch threshold can be sustained by increasing V eval to 0.65 V and keeping V evalth under 70% V DD . This example shows that HD-CAM can successfully endure significant process variation and keep the target mismatch threshold by dynamically adjusting the [t eval , V eval , V evalth ] tuple.

State-of-the-art approximate search techniques that use tunable sampling time to measure similarity support a limited pattern length, typically up to 64 or 96 bits [16]. While sufficient in several applications, this might be quite prohibitive in genomic analysis applications.

We extend our analysis of HD-CAM to evaluate its behavior for different pattern lengths (memory word sizes): 128-bit, 256-bit, and 512-bit. We assume a mismatch threshold of zero and conduct 1000 MC simulations for Hamming distances ranging from 1 to 9 bits. Fig. 6 presents the sensitivity and specificity vs. Hamming distance, for different word lengths, for the TT-corner with V eval = 0.60 V, V evalth = 0.6·V DD , and t eval = 1 ns.

We observe that for the fixed values of t eval , V eval and V evalth , larger memory words allow higher mismatch threshold levels at the cost of slightly wider uncertainty regions (regions where the sensitivity and specificity are lower than 100%).

The mismatch threshold defines the tolerable Hamming distance. State-of-the-art approximate search techniques that use tunable sampling time to measure similarity are limited to tolerating very small Hamming distances (typically, 1-4 bits), and demonstrate limited sensitivity and precision [16]. While suitable for some applications, this limitation might be quite prohibitive in applications such as text processing and genomic analysis, which require similarity search with insertions and deletions.

Insertions are edits in which extra character(s) or DNA basepair(s) are inserted into existing text string or DNA sequence. Deletions are edits in which character(s) or DNA basepair(s) are deleted from existing text string or DNA sequence. A single insertion or deletion may result in a very large Hamming distance, so that a query pattern with a single insertion or deletion will not match, thus reducing the sensitivity of the approximate search technique.
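The effect of a single insertion on the Hamming distance can be checked with a few lines of code. The 16-base sequence below is an arbitrary, hypothetical example, and the helper name `hamming` is introduced here for illustration; the sketch counts mismatching characters rather than encoded bits.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


reference = "ACGTTGCAAGCTTAGC"  # hypothetical 16-base fragment

# One substitution: replace the base at position 8.
substituted = reference[:8] + "T" + reference[9:]

# One insertion: add a base after position 2, then keep the first 16 bases,
# so every later base is shifted by one position.
inserted = (reference[:3] + "A" + reference[3:])[:len(reference)]

print(hamming(reference, substituted))  # 1
print(hamming(reference, inserted))     # 10
```

The substitution changes one position, while the insertion shifts the rest of the sequence and leaves most of the following positions mismatched.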

The proposed HD-CAM is capable of tolerating, and differentiating between patterns with, very large Hamming distances. Fig. 7 shows the sensitivity and specificity of HD-CAM for several mismatch threshold levels, spanning almost the entire pattern length.

Based on sensitivity and specificity results of Fig. 7 , we can make the following observations:

• HD-CAM supports very large mismatch thresholds, up to 155 bits out of 256 bits, or above 60% of the pattern length in this study (well above the typical limit of 4 bits in state-of-the-art tunable sampling time schemes).
• HD-CAM exhibits sensitivity (specificity) typically around 55%-60% when the Hamming distance between the query pattern and a stored word is mismatch threshold −1 (+1) bits.
• HD-CAM sensitivity (specificity) reaches 100% relatively quickly as the distance between the query pattern and a stored word drops (rises).
• Sensitivity and specificity rise more slowly as the mismatch threshold increases. However, even for a mismatch threshold of 155 bits (60% of the pattern length), the sensitivity (specificity) reaches 100% when the Hamming distance between the query pattern and a stored data word drops (rises) 24 bits below (above) the mismatch threshold.

In summary, HD-CAM is not limited to tolerating, and differentiating between patterns with, small Hamming distances, a limitation that is prohibitive in several similarity search applications. HD-CAM can extend the mismatch threshold to at least 60% of the pattern length. Note that in this study we only change V evalth to tune the mismatch threshold. Higher mismatch thresholds and narrower uncertainty regions can potentially be obtained if we additionally adjust other design parameters, such as V eval .

State-of-the-art approximate search techniques that use tunable sampling time to measure similarity are strongly susceptible to time variation and jitter. In contrast, HD-CAM is specifically designed to withstand time variation. In order to quantitatively evaluate the sensitivity of HD-CAM to variation of the sampling time t eval , we perform 1000 MC simulations with t eval as a random variable, normally distributed with a mean of 1 ns and standard deviations of 30 ps, 50 ps, 75 ps and 100 ps. Sensitivity and specificity of the HD-CAM approximate search under t eval variation are presented in Fig. 8, for a large mismatch threshold of 70 bits. The MC results demonstrate that the impact of t eval variation is limited to a certain expansion of the uncertainty region.
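The structure of this experiment can be illustrated with a toy Monte Carlo sketch. It is not the circuit simulation used in the paper: the linear discharge model, the per-bit slope constant, the 0.6 V threshold, and all function names are assumptions chosen so that the nominal decision boundary lies between 70 and 71 mismatching bits. Only the mean, the standard deviations, the run count, and the 256-bit word length follow the text.

```python
import random

VDD = 1.2              # supply voltage in volts (from the text)
V_EVALTH = 0.6         # evaluation threshold in volts (assumed value)
THRESHOLD_BITS = 70    # mismatch threshold in bits (from the text)
T_EVAL_MEAN_NS = 1.0   # mean sampling time in ns (from the text)
# Assumed linear discharge slope in volts per mismatching bit per ns.
SLOPE = (VDD - V_EVALTH) / (THRESHOLD_BITS + 0.5)


def sample_times(sigma_ns, runs=1000, seed=0):
    """Draw normally distributed sampling times for one jitter level."""
    rng = random.Random(seed)
    return [rng.gauss(T_EVAL_MEAN_NS, sigma_ns) for _ in range(runs)]


def match_rate(distance, times):
    """Fraction of runs in which the toy matchline stays above the threshold."""
    matches = sum(1 for t in times if VDD - distance * SLOPE * t >= V_EVALTH)
    return matches / len(times)


def uncertainty_width(sigma_ns, word_bits=256):
    """Count Hamming distances whose outcome is neither always match nor always mismatch."""
    times = sample_times(sigma_ns)
    return sum(1 for d in range(1, word_bits + 1)
               if 0.0 < match_rate(d, times) < 1.0)


for sigma in (0.030, 0.050, 0.075, 0.100):
    print(f"sigma = {sigma * 1000:.0f} ps -> uncertain distances: {uncertainty_width(sigma)}")
```

In this toy model, larger jitter only widens the band of distances with mixed outcomes around the threshold, which is the qualitative effect the MC results describe.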

We showed that by adjusting the design space parameters t eval , V eval and V evalth , we are able to:
1) Successfully and efficiently calibrate the mismatch threshold, above 60% of the pattern length and potentially higher;
2) Ensure sensitivity and specificity of 100%;
3) Successfully tolerate different process corners. For example, if a target application requires a certain mismatch threshold assuming the TT-corner, but the specific HD-CAM chip is manufactured at the FF-corner or SS-corner, we would be able to maintain the required mismatch threshold by adjusting V eval and V evalth .

Genomic analysis can be used for diagnosis of viral diseases alongside traditional testing methods such as polymerase chain reaction (PCR) [55] - [57] . It provides much higher sensitivity (i.e., much lower false negative rate) and precision (i.e., much lower false positive rate) than PCR for virus detection in living organisms [55] , [58] .

Covid-19 has emerged as a worldwide pandemic, causing the loss of millions of lives and tens of trillions of dollars. For a rapid response to a pandemic outbreak, fast, cost-effective, and reliable pathogen classification is strongly needed [55], [57]. In particular, accurate diagnosis is extremely important for governments to establish appropriate control measures and guidelines [59]. If genomic surveillance of the entire population combined with efficient and affordable analysis of sequenced DNA were available, this goal would be more achievable.

Virus DNA classification, whose aim is to overcome the shortcomings of existing testing techniques [56] , [57] , can strongly benefit from the approximate matching capabilities of the proposed HD-CAM. Fig. 9(a) illustrates the HD-CAM as a component in a virus DNA classification-by-sequencing pipeline, where for the sake of simplicity, intermediate steps for the DNA sample preparation and sequencing were omitted.

The DNA of the target virus (designated as the reference DNA) is known in advance, and represented by a set of short DNA fragments (k basepairs long) named k-mers [60] . All possible k-mers in the reference DNA are extracted as follows (refer to Fig. 9(b) ): the first k-mer is the reference DNA fragment from position 0 to position k − 1, the second k-mer is the reference DNA fragment from position 1 to position k, and so on. The reference DNA database is generated by storing k-mers in the HD-CAM prior to the classification operation, where a unique k-mer is stored in a separate HD-CAM row, as shown in Fig. 9(b) . The number of k-mers in the HD-CAM is bounded by N − k + 1, where N is the length of the virus DNA. In the case of SARS-CoV-2 virus, N is equal to 29,903, where some k-mers may appear more than once, therefore the actual number of k-mers in the HD-CAM could be lower.
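The sliding-window extraction described above can be sketched as follows. The function names and the short reference string are illustrative assumptions; the sketch shows the N − k + 1 bound and how repeated k-mers reduce the number of stored rows.

```python
def extract_kmers(reference, k):
    """All k-mers of the reference, one per start position 0 .. N - k."""
    if k <= 0 or k > len(reference):
        raise ValueError("k must be between 1 and the reference length")
    return [reference[i:i + k] for i in range(len(reference) - k + 1)]


def unique_kmers(reference, k):
    """k-mers in order of first appearance, each kept once (one per row)."""
    seen = set()
    rows = []
    for kmer in extract_kmers(reference, k):
        if kmer not in seen:
            seen.add(kmer)
            rows.append(kmer)
    return rows


reference = "ACGTACGTAC"                 # hypothetical reference, N = 10
print(len(extract_kmers(reference, 4)))  # 7, i.e. N - k + 1
print(unique_kmers(reference, 4))        # ['ACGT', 'CGTA', 'GTAC', 'TACG']
```

Because this short reference is periodic, only four of its seven k-mers are distinct, so four rows would be needed.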

A sequenced sample (the output of the sequencer in Fig. 9(a)) typically consists of a large set of DNA fragments, called DNA reads, sourced from the DNA of different organisms (e.g., bacteria and viruses) present in the sample. To classify the virus DNA in the sequenced sample, each DNA read is searched in the HD-CAM (refer to the query pattern in Fig. 9(b)). In other words, a DNA read is a query pattern and is compared against the entire reference DNA database simultaneously. Thus, ideally, reads that belong to the target virus DNA should match exactly in the HD-CAM. However, DNA reads contain sequencing errors [61] of the following three types: replacement or substitution (where a certain basepair is called incorrectly, i.e., replaced by a wrong one), insertion (where a basepair is inserted into an existing DNA sequence), and deletion (where a basepair is deleted from an existing DNA sequence). Another source of difference between the target DNA reads and the reference DNA is genetic variation, which arises from mutations, as in the UK, South African or Delta variants of SARS-CoV-2. Such variations result in a nonzero Hamming distance between a read and a reference fragment (k-mer) that would otherwise match exactly.

The ability of HD-CAM to tolerate large numbers of mismatching bits enables DNA classification, even when the target DNA reads have multiple sequencing errors or genetic variations. Moreover, by allowing the programmable mismatch threshold, HD-CAM supports a wide variety of sequencing error profiles.

The reads sourced from the DNA of other organisms present in the sample are expected to differ significantly from the target virus DNA. The Hamming distance between such reads and the reference k-mers should therefore typically exceed the mismatch threshold, which allows those reads to be classified as not SARS-CoV-2.
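The classification rule can be modeled in software as a threshold test against every stored k-mer. The sketch below is a sequential stand-in for the parallel HD-CAM search, under two assumptions that are not stated explicitly in the text: a row matches when its Hamming distance is less than or equal to the threshold, and every read has the same length k as the stored k-mers. The names `classify_read` and `hamming_bp` and all sequences are hypothetical.

```python
def hamming_bp(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def classify_read(read: str, reference_kmers: list[str], threshold: int) -> bool:
    """Return True if any stored k-mer is within `threshold` basepairs of `read`.

    Assumption: a row matches when distance <= threshold. HD-CAM evaluates
    all rows in a single search; this loop only mimics that behavior.
    """
    return any(hamming_bp(read, kmer) <= threshold for kmer in reference_kmers)


reference_kmers = ["ACGTACGT", "CGTACGTT"]  # toy database, k = 8
target_read = "ACGTACGA"    # one substitution away from the first row
foreign_read = "TTGCATGC"   # at least five basepairs away from every row

print(classify_read(target_read, reference_kmers, threshold=2))
print(classify_read(foreign_read, reference_kmers, threshold=2))
print(classify_read(target_read, reference_kmers, threshold=0))
```

With a threshold of two basepairs the erroneous target read is accepted and the unrelated read is rejected, while an exact search (threshold of zero) rejects the target read as well.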

The reference DNA of the target virus, SARS-CoV-2, was downloaded from the NCBI online data sets [20]. We encode the basepairs using one-hot encoding (A=0001, C=0010, G=0100, T=1000), as presented in Fig. 9(b). This coding scheme ensures that any basepair difference results in a Hamming distance of two bits, regardless of the basepair value (A, T, G or C). An alternative is to apply Gray coding. By selecting 3-bit Gray code values that are two positions apart (e.g., A=000, C=011, G=110 and T=101), we ensure that the Hamming distance between any two basepair values is a constant two bits. For a 256-bit wide HD-CAM with one-hot encoding, the k-mer length is set at k = 64. Accordingly, the reference DNA database in the HD-CAM holds at most N − k + 1 = 29,840 rows of 256 bits each.

We test three types of DNA samples. The first type is created by extracting reads from random positions in the reference SARS-CoV-2 DNA and injecting random errors into those reads as follows: replacement rate = 3.6%, insertion rate = 0.2%, deletion rate = 0.2% [62]. It should be noted that while the effect of the replacement rate on the Hamming distance is straightforward, even a single insertion or deletion may create a very large Hamming distance between the query k-mer and the reference k-mers, because it shifts all subsequent basepairs by one position.
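The constant two-bit distance of both codes can be checked exhaustively, since there are only six basepair pairs. The Python sketch below uses the code tables quoted in the text; the helper names `bit_distance` and `encode` and the test sequences are assumptions introduced for illustration.

```python
from itertools import combinations

# Code tables as given in the text.
ONE_HOT = {"A": "0001", "C": "0010", "G": "0100", "T": "1000"}
GRAY_3BIT = {"A": "000", "C": "011", "G": "110", "T": "101"}


def bit_distance(a: str, b: str) -> int:
    """Hamming distance between two equal-length bit strings."""
    return sum(x != y for x, y in zip(a, b))


def encode(sequence: str, code: dict[str, str]) -> str:
    """Concatenate the codeword of every basepair in `sequence`."""
    return "".join(code[bp] for bp in sequence)


for name, code in (("one-hot", ONE_HOT), ("3-bit Gray", GRAY_3BIT)):
    # Collect the distances over all six unordered pairs of distinct basepairs.
    distances = {bit_distance(code[a], code[b]) for a, b in combinations("ACGT", 2)}
    print(name, "pairwise bit distances:", distances)

# A 64-basepair k-mer fills one 256-bit row under one-hot encoding.
row = encode("ACGT" * 16, ONE_HOT)
print(len(row))

# One basepair mismatch always costs two bits.
print(bit_distance(encode("ACGT", ONE_HOT), encode("ACGA", ONE_HOT)))
```

Because every basepair mismatch contributes exactly two bits under either code, a mismatch threshold expressed in basepairs corresponds to twice that number of mismatching bits.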

The second type of sample is formed by extracting reads from the DNA of SARS-CoV-2 UK variant.

Finally, the samples of the third type are created using DNA reads from other human coronaviruses (SARS, MERS, alpha, and beta), and Papilloma, which is not a coronavirus.

We expect the DNA reads of the first two types to be classified as SARS-CoV-2, while the reads of the third type should be classified as not SARS-CoV-2.

Similarly to the evaluation methodology introduced in Section IV, we use sensitivity and specificity to evaluate the efficiency of virus classification by HD-CAM. Note that from this point and throughout the rest of the paper, sensitivity and specificity measure the probability of correct classification of a DNA read (as SARS-CoV-2 or not SARS-CoV-2) rather than correct match or mismatch in general.

Here, sensitivity = TP/(TP + FN), where TP (true positive) means that a target virus DNA read is correctly classified as SARS-CoV-2, and FN (false negative) means that a target virus DNA read is wrongly rejected. We use sensitivity to evaluate the efficiency of HD-CAM in analyzing the DNA samples of the first and second types because, in these tests, no negative results are expected; hence all positive results are true and all negative results are false.

Specificity = TN/(TN + FP), where TN (true negative) means that a not SARS-CoV-2 read is correctly rejected, and FP (false positive) means that an unrelated DNA read is wrongly classified as SARS-CoV-2. We use specificity to evaluate the efficiency of HD-CAM in processing the DNA samples of the third type because, in such a test, no positive results are expected; hence all negative results are true and all positive results are false.
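The two metrics can be computed directly from per-read classification outcomes. In the Python sketch below the outcome lists are invented toy values, not results from the paper, and the function names are hypothetical. Each list holds one boolean per read, which is True when the read was classified as SARS-CoV-2.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Fraction of target-virus reads that were correctly accepted."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of unrelated reads that were correctly rejected."""
    return tn / (tn + fp)


# Toy outcomes (assumed values, for illustration only).
target_outcomes = [True] * 9 + [False] * 1    # sample types 1 and 2
foreign_outcomes = [False] * 19 + [True] * 1  # sample type 3

# In a target-only sample every acceptance is a TP and every rejection an FN.
tp = sum(target_outcomes)
fn = len(target_outcomes) - tp
# In a foreign-only sample every rejection is a TN and every acceptance an FP.
fp = sum(foreign_outcomes)
tn = len(foreign_outcomes) - fp

print("sensitivity:", sensitivity(tp, fn))
print("specificity:", specificity(tn, fp))
```

Splitting the evaluation this way mirrors the text: sensitivity is measured only on samples of target reads, and specificity only on samples of unrelated reads.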

We tested several DNA samples using different HD-CAM mismatch threshold values, i.e., the Hamming distance (in basepairs) that the HD-CAM is configured to tolerate.

The resulting HD-CAM sensitivity (for classification of SARS-CoV-2 and its UK variant) and specificity (while processing the DNA of other organisms) are presented in Fig. 10. For comparison, we also show the sensitivity and specificity of the state-of-the-art DNA classification tool Kraken2 [53].

We make the following observations:

• HD-CAM DNA classification sensitivity grows with the mismatch threshold, reaching 98% when the mismatch threshold is set at 16 basepairs, a 2.2× improvement vs. Kraken2;

• Kraken2 provides the lowest DNA classification sensitivity, because it employs exact rather than approximate search;

• The 4-bit limitation on the mismatch threshold in [17] results in limited DNA classification sensitivity (green bar in Fig. 10);

• Insertions and deletions significantly reduce the sensitivity of Kraken2 as well as of state-of-the-art tunable sampling time schemes [16], [17], and prevent HD-CAM from reaching 100% sensitivity;

• HD-CAM and Kraken2 exhibit similar specificity of nearly 100%, except for the case of SARS-CoV-1 (a virus very similar to SARS-CoV-2) with a mismatch threshold of 16 basepairs, where the specificity drops due to the increased sensitivity.

In this paper we present a novel content-addressable memory, HD-CAM, which enables a single-cycle approximate search with a programmable mismatch threshold (the tolerable Hamming distance between the query and the stored pattern). HD-CAM makes use of NOR-type bitcells that have been modified to allow approximate search by matchline charge redistribution. Our design was implemented and evaluated in a 65 nm commercial CMOS technology at different process corners and under process variation conditions through extensive Monte Carlo simulations. Results demonstrate that HD-CAM exhibits very high sensitivity and specificity over a very wide range of dynamically configurable mismatch threshold levels. We perform a detailed design space exploration and show that HD-CAM has very low susceptibility to process and sampling time variation. We also show that HD-CAM enables a wide range of pattern lengths and mismatch threshold levels that are unsupported by state-of-the-art approximate search CAM designs.

HD-CAM is applied to DNA classification, which is one of the critical steps in genomic surveillance, employed throughout the world to combat Covid-19. Specifically, we use HD-CAM to classify SARS-CoV-2 in large genomic samples. HD-CAM is shown to outperform a state-of-the-art DNA classification tool, Kraken2, in terms of sensitivity while exhibiting similar specificity.

Finally, we would like to point out another potential application for HD-CAM: an ECC-enabled CAM. Typically, using ECC in a CAM is not trivial; if even a single bit of a memory block that stores an ECC-encoded search pattern fails, the search pattern would not match and a query would incorrectly result in a mismatch. HD-CAM potentially makes ECC protection easy. If the memory content is encoded such that the Hamming distance between any two valid codewords is larger than the worst-case uncertainty region (where false matches and false mismatches are possible), then any error in which the number of failed bits is below that uncertainty region is guaranteed to be tolerated. Hence, the number of tolerable errors (failed bits) in a single memory block can be varied by programming the mismatch threshold. This application of our proposal will be evaluated in the future.

Content-addressable memory (CAM) circuits and architectures: a tutorial and survey

A Low Power Content Addressable Memory Using Low Swing Search Lines

Emerging Trends in Design and Applications of Memory-Based Computing and Content-Addressable Memories

PIM-WEAVER: A High Energy-efficient, General-purpose Acceleration Architecture for String Operations in Big Data Processing

Proceedings of the ACM/IEEE International Symposium on Low Power Electronics and Design, ser. ISLPED '20

ACAM: Approximate Computing Based on Adaptive Associative Memory with Online Learning

A Resistive CAM Processing-in-Storage Architecture for DNA Sequence Alignment

Approximate memristive in-memory Hamming distance circuit

BioSEAL: In-Memory Biological Sequence Alignment Accelerator for Large-Scale Genomic Data

Big data: astronomical or genomical?

Microbial indicators of environmental perturbations in coral reef ecosystems

Crop microbiome and sustainable agriculture

Bayesian modeling reveals host genetics associated with rumen microbiota jointly influence methane emission in dairy cows

Accelerating genome analysis: a primer on an ongoing journey

A Low-Power Associative Processor with the R-th Nearest-Match Hamming-Distance Search Engine Employing Time-Domain Techniques

Approximate associative memristive memory for energy-efficient GPUs

Exploring hyperdimensional associative memory

Evaluating MapReduce for Multi-core and Multiprocessor Systems

RASSA: Resistive Prealignment Accelerator for Approximate DNA Long Read Mapping

Bethesda (MD): National Library of Medicine (US), National Center for Biotechnology Information

A Soft-Error Tolerant Content-Addressable Memory (CAM) Using An Error-Correcting-Match Scheme

Error-Correcting Codes for Ternary Content Addressable Memories

Ferroelectric ternary content-addressable memory for one-shot learning

Camsure: Secure content-addressable memory for approximate search

Hybrid-Type CAM Design for Both Power and Performance Efficiency

Design of low-power content-addressable memory cell

A High Speed Low Power CAM With a Parity Bit and Power-Gated ML Sensing

Design of a power-efficient CAM using automated background checking scheme for small match line swing

Design and performance analysis of a CNFET-based TCAM cell with dual-chirality selection

A 128×128b high-speed wide-and match-line content addressable memory in 32nm CMOS

Design and analysis of power efficient binary content addressable memory (PEBCAM) core cells

Match-Line Division and Control to Reduce Power Dissipation in Content Addressable Memory

High speed, low matchline voltage swing and search line activity TCAM cell array design in 14 nm FinFET technology

A 9-T 833-MHz 1.72-fJ/Bit/Search Quasi-Static Ternary Fully Associative Cache Tag With Selective Matchline Evaluation for Wire Speed Applications

1.4Gsearch/s 2-Mb/mm2 TCAM Using Two-Phase-Pre-Charge ML Sensing and Power-Grid Pre-Conditioning to Reduce Ldi/dt Power-Supply Noise by 50%

0.4V Reconfigurable Near-Threshold TCAM in 28nm High-k Metal-Gate CMOS Process

Computer Architecture with Associative Processor Replacing Last-Level Cache and SIMD Accelerator

A 4 + 2T SRAM for Searching and In-Memory Computing With 0.3-V VDDmin

Precharge-Free, Low-Power Content-Addressable Memory

Monolithic 3D+-IC Based Massively Parallel Compute-in-Memory Macro for Accelerating Database and Machine Learning Primitives

GIRAF: General Purpose In-Storage Resistive Associative Framework

Resistive Associative Processor

PRINS: Processing-in-Storage Acceleration of Machine Learning

A Resistive CAM Processing-in-Storage Architecture for DNA Sequence Alignment

Locality-sensitive hashing for the edit distance

Compact associative-memory architecture with fully parallel search capability for the minimum Hamming distance

Hamming network circuits based on CMOS/memristor hybrid design

PPAC: A versatile in-memory accelerator for matrix-vector-product-like operations

Ncam: Near-data processing for nearest neighbor search

Illumina -DNA Sequencing

GRIM-Filter: Fast seed location filtering in DNA read mapping using processing-in-memory technologies

Kraken: ultrafast metagenomic sequence classification using exact alignments

Improved metagenomic analysis with Kraken 2

A current-mode latch sense amplifier and a static power saving input buffer for low-power architecture

Swab-Seq: A high-throughput platform for massively scaled up SARS-CoV-2 testing

Research techniques made simple: polymerase chain reaction (PCR)

Pathogenic viruses: Molecular detection and characterization

An Uncertainty-Aware Transfer Learning-Based Framework for COVID-19 Diagnosis

Covid-19: Government buried negative data on its favoured antibody test

dna2vec: Consistent vector representations of variable-length k-mers

Read Mapping Near Non-Volatile Memory

PacBio sequencing and its applications