Topic: search algorithms

programming > Group: patterns

information retrieval
adaptive hash table
approximate string matching and pattern matching with errors
collection class
external search and sort
hash tables and applications
pattern matching
pattern specification
perfect hash table
searching compressed data
sort algorithms
string operations
suffix trie and suffix array

topics s-z
topics to process
Subtopic: substring search up

Quote: fast constant-space substring search via Boyer-Moore and probabilistic q-gram matching; highly parametric algorithms over reduced alphabets; q-slicing [hakoH_1999]
Quote: survey of online exact string matching; 50 new algorithms categorized by alphabet size and pattern length [faroS2_2013]
Quote: string matching algorithms -- Boyer-Moore, Horspool, Quick-Search, Backward-Oracle-Matching, Backward Nondeterministic Dawg Match [faroS2_2013]
Quote: algorithms for string search and sort; radix quicksort and ternary search trees [bentJL1_1997]
Quote: back-tracking is rarely a problem so KMP search is similar to naive search for most files [fenwP7_2001]
Quote: average case for 2 string matching algorithms, Boyer-Moore-Horspool faster than hardware character match for most pattern lengths [baezRA8_1989]
Quote: Boyer and Moore's string matching algorithm is more efficient than Knuth et al and Karp and Rabin [daviG6_1986]
Quote: speed up the Boyer-Moore and reverse factor algorithms by remembering the last matched segment; for binary alphabets Turbo_RF is fastest [crocM10_1994]
Quote: using a pattern shift table, can search for substrings faster than the Boyer-Moore algorithm [sundDM8_1990]
Quote: fast string searching by variants of Boyer-Moore; toolkit for skip loop, match and shift components [humeA11_1991]
Quote: Boyer-Moore string matching requires roughly 3n comparisons and less than 4n comparisons [coleR10_1994]
Quote: simple code for an efficient version of Boyer-Moore-Horspool string matching algorithm [baezRA8_1989]
Quote: string-matching variant combines Knuth-Morris-Pratt (fewer comparisons) and Boyer-Moore-Horspool (better in worst case and preprocessing) [baezRA8_1989]

Subtopic: regular expression search up

Quote: DotStar for regular expressions is 10x to 100x faster than Boost and Regexlib; linear scalability across multiple cores [paseD3_2010]
Quote: the DotStar NFA is as fast as a DFA for regular expressions; avoids state explosion with subexpression counting and status bits; compiles DotStar regex to Aho-Corasick automatons; rewrites other regex into DotStar [paseD3_2010]
Quote: sublinear search algorithm for regular expressions on preprocessed text; uses a Patricia tree for the index [baezRA11_1996]
Quote: identify search records by shortest-substrings of regular expressions; e.g., all blocks [clarCL5_1997]
Quote: the leftmost longest match rule does not work for regular expressions in structured text [clarCL5_1997]
Quote: use the leftmost shortest match rule for regular expressions in structured text without predefined retrieval units [clarCL5_1997]
Quote: skip text runs during regular expression searching by recognizing shorter, reverse prefixes; performance similar to DFA [navaG7_1999]
Quote: first solution for regular expression searching in compressed text; faster than decompression plus search [navaG7_2001]

Subtopic: goal-directed search up

Quote: create a general-purpose goal-directed search engine from a refutation-based automatic theorem prover; search anything that can be specified in its declarative input language [joshR6_2002]

Subtopic: overlapping n-gram up

Quote: plagiarism detection with overlapping n-gram tokens, BM25 similarity measure, and post-processing with multiple local alignment; BM25 was better than cosine, identity, and PageRank [burrS2_2007]

Subtopic: printable strings from binary data up

Quote: efficient extraction of printable English strings from binary files; a stringlish segment is 4 or more characters with a vowel and an English trigram without repeating patterns [aycoJ11_2015]

Subtopic: longest-match search up

Quote: compared longest match search for Ziv-Lempel compression; lists for each character, best overall [bellT7_1993]
Quote: a maximal repeating pattern is as long as possible or occurs independently; use position trees for quadratic algorithm [siocAC10_1991]

Subtopic: Hamming distance up

Quote: fast and compact index for Hamming distance and approximate dictionary queries; fixed-size keys [gogS7_2016]
Quote: for patterns that fit in a computer word, fast, bit-parallel string matching based on the Shift-Or algorithm; optimal average and worst case running times; variation for Shift-Add algorithm and Hamming distance [fredK11_2005]

Subtopic: bit-parallel search up

Quote: for patterns that fit in a computer word, fast, bit-parallel string matching based on the Shift-Or algorithm; optimal average and worst case running times; variation for Shift-Add algorithm and Hamming distance [fredK11_2005]

Subtopic: repeated search up

Quote: use diagrams, filters, and preprocessing for fast, frequent searches; much faster with initialization overhead of 5-10 conventional searches [fenwP7_2001]

Subtopic: similarity search up

Quote: identify similar procedures in stripped binaries from different compilers; similarity by composition for fragments; Esh has high accuracy, e.g., for vulnerabilities [daviY6_2016]

Subtopic: concurrent search up

Quote: asynchronized concurrency (ASCY) for concurrent search that resembles sequential search; fast, portable, and scalable across different hardware platforms; exhaustive evaluation [daviT3_2015]
Quote: 10x to 100x faster string search on multicore Cell; branchless data flow of the Aho-Corasick fgrep algorithm [scarDP4_2008]

Subtopic: fuzzy, interactive search up

Quote: interactive, fuzzy full-text search that handles query errors and document errors with word- and prefix-based similarity [bastH5_2013]

Subtopic: search UI up

Quote: the right mouse button indicates arguments to a generalized search facility that locates files, file locations, and text [pikeR1_1994]
Quote: leap search is three times faster than mouse; leap to matching text, creep forward if tapped [alzoD_1985]

Subtopic: priority queue up

Quote: constant-time insertion and find-min for priority queues; black box trades delete time for insert time [alstS7_2005]
Quote: simple, optimal, external sort using sampling and a priority queue; exactly two reads and writes per record [leuFC2_2000]

Subtopic: multi-dimensional search up

Quote: skip quadtrees and octrees for multi-dimensional data; efficient insert, delete, location, approximate range, approximate nearest neighbor; logarithmic-height search and update [eppsD6_2005]

Subtopic: self-adjusting trees up

Quote: large splay trees have good cache performance; large trees faster than binary search; top-down splaying is better [leeEK12_2007]
Quote: a retrieval in a splay tree moves the retrieved node to the root, halves the length of the access path, and slightly increases other access paths [tarjRE3_1987]
Quote: conjecture that splay trees are optimal for binary search under amortized costs [tarjRE3_1987]
Quote: a median split tree is a balanced tree with the most frequent key at the root [shelBA11_1978]
Quote: self-adjusting algorithm for searching a linear dictionary list; online, fast successful and unsuccessful search, two extra bits per key [huiLC11_1993]

Subtopic: binary vs. self-adjusting up

Quote: self-adjusting binary search trees are slower than AVL or binary trees; by CPU time, binary trees are often best [bellJ4_1993]

Subtopic: xml search up

Quote: survey of efficient querying of structural XML data; the twig query join, labeling schemes, storage, indices, cost-based optimization, and selectivity estimates [bacaR9_2017]
Quote: a twig pattern query (TPQ) is a tree of query nodes and edges; query nodes represent XML nodes; e.g., XPath query, twig query joins, and structural joins [bacaR9_2017]

Subtopic: ternary search up

Quote: ternary search trees better than binary trees, tries, and hashing; dynamic, fast operations including unsuccessful search and sort [bentJ4_1998]

Subtopic: randomized trees up

Quote: randomized binary search trees based on size of subtrees; O(log n); access by rank [martC3_1998]

Subtopic: suffix search up

Quote: with suffix trie, search time independent of text length [nelsMR8_1996]
Quote: use compressed suffix arrays and trees to search for short strings in O(1) time with O(n) space [grosR5_2000]
Quote: simple, space efficient suffix trees using balanced parentheses; O(n lg n) space and O(m) search [munrJI5_2001]

Subtopic: skip graph up

Quote: skip graph for peer-to-peer networks; logarithmic time and space, no hashing of resource keys, range queries, node failures, permanent crashes [aspnJ11_2007]
Quote: skip graphs store nlogn links to each level in the entire network; reduce cost with a family tree and periodic rebuilds; like SkipNet [aspnJ11_2007]

Subtopic: skip list up

Quote: skip list of nodes with k forwarding pointers; levels chosen randomly then never changes; simple updates [pughW6_1990]
Quote: skip list is a randomized data structure for balanced trees; simple and space efficient (1.5 pointers per node) [pughW6_1990]
Quote: skip quadtrees and octrees for multi-dimensional data; efficient insert, delete, location, approximate range, approximate nearest neighbor; logarithmic-height search and update [eppsD6_2005]

Subtopic: Burrows Wheeler Transform up

Quote: Burrows Wheeler Transform reversibly reorders data; models data by the string following a character [nelsMR9_1996]
Quote: use Burrows-Wheeler block-sorted data for dynamic searching; compressed, rapid searches; groups similar substrings together [willK12_2003]

Subtopic: benchmarks up

Quote: for search, speed and character comparisons may not be related; e.g., diagram search is 386x better by comparisons and 240x to 13x better by speed [fenwP7_2001a]

Related up

Group: algorithms
Group: information retrieval
Topic: adaptive hash table
Topic: approximate string matching and pattern matching with errors
Topic: b-trees
Topic: collection class
Topic: external search and sort
Topic: graphs
Topic: hash tables and applications
Topic: pattern matching
Topic: pattern specification
Topic: perfect hash table
Topic: searching compressed data
Topic: sort algorithms
Topic: string operations
Topic: suffix trie and suffix array

Subtopics up

binary vs. self-adjusting
bit-parallel search
Burrows Wheeler Transform
concurrent search
fuzzy, interactive search
goal-directed search
Hamming distance
longest-match search
multi-dimensional search
overlapping n-gram
printable strings from binary data
priority queue
randomized trees
regular expression search
repeated search
search UI
self-adjusting trees
similarity search
skip graph
skip list
substring search
suffix search
ternary search
xml search

Updated barberCB 10/05
Copyright © 2002-2023 by C.B. Barber
Thesa, Avev, and thid-... are trademarks of C.B. Barber