id author title date pages extension mime words sentences flesch summary cache txt cord-011565-8ncgldaq Elworth, R A Leo To Petabytes and beyond: recent advances in probabilistic and signal processing algorithms and their application to metagenomics 2020-06-04 .txt text/plain 12960 717 53 For instance, in (1) a comprehensive review was performed covering probabilistic algorithms and data structures such as MinHash (6) and Locality Sensitive Hashing (LSH) (7), Count-Min Sketch (CMS) (8), HyperLogLog (9) and Bloom filters (10). A more in-depth discussion of many of these topics can also be found in (3, 4) includes a thorough review of compressed string indexes, LSH via sketches, CMS, Bloom filters, and minimizers (13), with accompanying applications in genomics for each. With this approach, RAMBO can determine which datasets contain a given k-mer or sequence using far fewer Bloom filter queries, yielding a very fast sublinear-time sequence search algorithm (68). One of the recent breakthroughs in the area of large-scale biological sequence comparison is the use of locality-sensitive hashing, specifically MinHash and minimizers, for efficient average nucleotide identity estimation, clustering, genome assembly, and metagenomic similarity analyses. ./cache/cord-011565-8ncgldaq.txt ./txt/cord-011565-8ncgldaq.txt