Th Topic: search algorithms

Quote: fast constant-space substring search via Boyer-Moore and probabilistic q-gram matching; highly parametric algorithms over reduced alphabets; q-slicing [hakoH_1999]
Quote: survey of online exact string matching; 50 new algorithms categorized by alphabet size and pattern length [faroS2_2013]
Quote: string matching algorithms -- Boyer-Moore, Horspool, Quick-Search, Backward-Oracle-Matching, Backward Nondeterministic Dawg Match [faroS2_2013]
Quote: algorithms for string search and sort; radix quicksort and ternary search trees [bentJL1_1997]
Quote: back-tracking is rarely a problem so KMP search is similar to naive search for most files [fenwP7_2001]
Quote: average case for 2 string matching algorithms, Boyer-Moore-Horspool faster than hardware character match for most pattern lengths [baezRA8_1989]
Quote: Boyer and Moore's string matching algorithm is more efficient than Knuth et al and Karp and Rabin [daviG6_1986]
Quote: speed up the Boyer-Moore and reverse factor algorithms by remembering the last matched segment; for binary alphabets Turbo_RF is fastest [crocM10_1994]
Quote: using a pattern shift table, can search for substrings faster than the Boyer-Moore algorithm [sundDM8_1990]
Quote: fast string searching by variants of Boyer-Moore; toolkit for skip loop, match and shift components [humeA11_1991]
Quote: Boyer-Moore string matching requires roughly 3n comparisons and less than 4n comparisons [coleR10_1994]
Quote: simple code for an efficient version of Boyer-Moore-Horspool string matching algorithm [baezRA8_1989]
Quote: string-matching variant combines Knuth-Morris-Pratt (fewer comparisons) and Boyer-Moore-Horspool (better in worst case and preprocessing) [baezRA8_1989]

Subtopic: regular expression search

Quote: DotStar for regular expressions is 10x to 100x faster than Boost and Regexlib; linear scalability across multiple cores [paseD3_2010]
Quote: the DotStar NFA is as fast as a DFA for regular expressions; avoids state explosion with subexpression counting and status bits; compiles DotStar regex to Aho-Corasick automatons; rewrites other regex into DotStar [paseD3_2010]
Quote: sublinear search algorithm for regular expressions on preprocessed text; uses a Patricia tree for the index [baezRA11_1996]
Quote: identify search records by shortest-substrings of regular expressions; e.g., all blocks [clarCL5_1997]
Quote: the leftmost longest match rule does not work for regular expressions in structured text [clarCL5_1997]
Quote: use the leftmost shortest match rule for regular expressions in structured text without predefined retrieval units [clarCL5_1997]
Quote: skip text runs during regular expression searching by recognizing shorter, reverse prefixes; performance similar to DFA [navaG7_1999]
Quote: first solution for regular expression searching in compressed text; faster than decompression plus search [navaG7_2001]

Subtopic: goal-directed search

Quote: create a general-purpose goal-directed search engine from a refutation-based automatic theorem prover; search anything that can be specified in its declarative input language [joshR6_2002]

Subtopic: overlapping n-gram

Quote: plagiarism detection with overlapping n-gram tokens, BM25 similarity measure, and post-processing with multiple local alignment; BM25 was better than cosine, identity, and PageRank [burrS2_2007]

Subtopic: printable strings from binary data

Quote: efficient extraction of printable English strings from binary files; a stringlish segment is 4 or more characters with a vowel and an English trigram without repeating patterns [aycoJ11_2015]

Subtopic: longest-match search

Quote: compared longest match search for Ziv-Lempel compression; lists for each character, best overall [bellT7_1993]
Quote: a maximal repeating pattern is as long as possible or occurs independently; use position trees for quadratic algorithm [siocAC10_1991]

Subtopic: Hamming distance

Quote: fast and compact index for Hamming distance and approximate dictionary queries; fixed-size keys [gogS7_2016]
Quote: for patterns that fit in a computer word, fast, bit-parallel string matching based on the Shift-Or algorithm; optimal average and worst case running times; variation for Shift-Add algorithm and Hamming distance [fredK11_2005]

Subtopic: bit-parallel search

Quote: for patterns that fit in a computer word, fast, bit-parallel string matching based on the Shift-Or algorithm; optimal average and worst case running times; variation for Shift-Add algorithm and Hamming distance [fredK11_2005]

Subtopic: repeated search

Quote: use diagrams, filters, and preprocessing for fast, frequent searches; much faster with initialization overhead of 5-10 conventional searches [fenwP7_2001]

Subtopic: similarity search

Quote: identify similar procedures in stripped binaries from different compilers; similarity by composition for fragments; Esh has high accuracy, e.g., for vulnerabilities [daviY6_2016]

Subtopic: concurrent search

Quote: asynchronized concurrency (ASCY) for concurrent search that resembles sequential search; fast, portable, and scalable across different hardware platforms; exhaustive evaluation [daviT3_2015]
Quote: 10x to 100x faster string search on multicore Cell; branchless data flow of the Aho-Corasick fgrep algorithm [scarDP4_2008]

Subtopic: fuzzy, interactive search

Quote: interactive, fuzzy full-text search that handles query errors and document errors with word- and prefix-based similarity [bastH5_2013]

Subtopic: search UI

Quote: the right mouse button indicates arguments to a generalized search facility that locates files, file locations, and text [pikeR1_1994]
Quote: leap search is three times faster than mouse; leap to matching text, creep forward if tapped [alzoD_1985]

Subtopic: priority queue

Quote: constant-time insertion and find-min for priority queues; black box trades delete time for insert time [alstS7_2005]
Quote: simple, optimal, external sort using sampling and a priority queue; exactly two reads and writes per record [leuFC2_2000]

Subtopic: multi-dimensional search

Quote: skip quadtrees and octrees for multi-dimensional data; efficient insert, delete, location, approximate range, approximate nearest neighbor; logarithmic-height search and update [eppsD6_2005]

Subtopic: self-adjusting trees

Quote: large splay trees have good cache performance; large trees faster than binary search; top-down splaying is better [leeEK12_2007]
Quote: a retrieval in a splay tree moves the retrieved node to the root, halves the length of the access path, and slightly increases other access paths [tarjRE3_1987]
Quote: conjecture that splay trees are optimal for binary search under amortized costs [tarjRE3_1987]
Quote: a median split tree is a balanced tree with the most frequent key at the root [shelBA11_1978]
Quote: self-adjusting algorithm for searching a linear dictionary list; online, fast successful and unsuccessful search, two extra bits per key [huiLC11_1993]

Subtopic: binary vs. self-adjusting

Quote: self-adjusting binary search trees are slower than AVL or binary trees; by CPU time, binary trees are often best [bellJ4_1993]

Subtopic: xml search

Quote: survey of efficient querying of structural XML data; the twig query join, labeling schemes, storage, indices, cost-based optimization, and selectivity estimates [bacaR9_2017]
Quote: a twig pattern query (TPQ) is a tree of query nodes and edges; query nodes represent XML nodes; e.g., XPath query, twig query joins, and structural joins [bacaR9_2017]

Subtopic: ternary search

Quote: ternary search trees better than binary trees, tries, and hashing; dynamic, fast operations including unsuccessful search and sort [bentJ4_1998]

Subtopic: randomized trees

Quote: randomized binary search trees based on size of subtrees; O(log n); access by rank [martC3_1998]

Subtopic: suffix search

Quote: with suffix trie, search time independent of text length [nelsMR8_1996]
Quote: use compressed suffix arrays and trees to search for short strings in O(1) time with O(n) space [grosR5_2000]
Quote: simple, space efficient suffix trees using balanced parentheses; O(n lg n) space and O(m) search [munrJI5_2001]

Subtopic: skip graph

Quote: skip graph for peer-to-peer networks; logarithmic time and space, no hashing of resource keys, range queries, node failures, permanent crashes [aspnJ11_2007]
Quote: skip graphs store nlogn links to each level in the entire network; reduce cost with a family tree and periodic rebuilds; like SkipNet [aspnJ11_2007]

Subtopic: skip list

Quote: skip list of nodes with k forwarding pointers; levels chosen randomly then never changes; simple updates [pughW6_1990]
Quote: skip list is a randomized data structure for balanced trees; simple and space efficient (1.5 pointers per node) [pughW6_1990]
Quote: skip quadtrees and octrees for multi-dimensional data; efficient insert, delete, location, approximate range, approximate nearest neighbor; logarithmic-height search and update [eppsD6_2005]

Subtopic: Burrows Wheeler Transform

Quote: Burrows Wheeler Transform reversibly reorders data; models data by the string following a character [nelsMR9_1996]
Quote: use Burrows-Wheeler block-sorted data for dynamic searching; compressed, rapid searches; groups similar substrings together [willK12_2003]

Subtopic: benchmarks

Quote: for search, speed and character comparisons may not be related; e.g., diagram search is 386x better by comparisons and 240x to 13x better by speed [fenwP7_2001a]

Group: algorithms
Group: information retrieval
-
Topic: adaptive hash table
Topic: approximate string matching and pattern matching with errors
Topic: b-trees
Topic: collection class
Topic: external search and sort
Topic: graphs
Topic: hash tables and applications
Topic: pattern matching
Topic: pattern specification
Topic: perfect hash table
Topic: searching compressed data
Topic: sort algorithms
Topic: string operations
Topic: suffix trie and suffix array

Subtopics

benchmarks
binary vs. self-adjusting
bit-parallel search
Burrows Wheeler Transform
concurrent search
fuzzy, interactive search
goal-directed search
Hamming distance
longest-match search
multi-dimensional search
overlapping n-gram
printable strings from binary data
priority queue
randomized trees
regular expression search
repeated search
search UI
self-adjusting trees
similarity search
skip graph
skip list
substring search
suffix search
ternary search
xml search

Topic: search algorithms

Related

Subtopics