Computer Science vs Reality
- Posted by Michał ‘mina86’ Nazarewicz on 23rd of May 2021
- Share on Bluesky
- Cite

Some years ago, during a friendly discussion about C++, a colleague challenged me with a question: what’s the best way to represent a sequence of numbers if delete operation is one that needs to be supported. I argued in favour of a linked list suggesting that with sufficiently large number of elements, it would be much preferred.
In a twist of fate, I’ve been recently discussing an algorithm which reminded my of that conversation. Except this time I was the one arguing against a node-based data structure. Rather than ending things at a conversation, I’ve decided to benchmark a few solutions to make sure which approach is the best.
The problem
The task at hand is simple. Design a data structure which stores a set of words, all of the same length, and offers lookup operation which returns all words matching globs in the form ‘prefix*suffix’. That is, words which start with a given prefix and end with a given suffix. Either part of the pattern may be empty and their concatenation is never longer than length of the words in the collection. Initialisation time and memory footprint are not a concern. Complexity of returning a result can be assumed to be constant.
For example, supposing we were given words ‘foo’, ‘bar’ and ‘baz’, the following table lists results for look-ups of various patterns:
Pattern | Result | Notes |
---|---|---|
‘*’ | ‘foo’, ‘bar’, ‘baz’ | Prefix and suffix can be empty. Sole asterisk matches everything. |
‘b*’ | ‘bar’, ‘baz’ | |
‘b*r’ | ‘bar’ | |
‘f*r’ | ∅ | |
‘bar*’ | ‘bar’ | An asterisk may also match an empty substring if concatenation of prefix and suffix has length equal that of the words in the set. |
‘ba*r’ | ‘bar’ |
Vector solution
The simplest approach is to store all the words in a list and compare each with the pattern. With
An obvious improvement is sorting the list. With the words ordered, a binary search makes it possible to narrow down in logarithmic time the search space to a subset of words which match pattern’s prefix. This changes the complexity to
Another approach is to have two sorted lists, one called forward
containing the given words and another called backward
containing the words in their reversed forms, and a fwd_to_bck
array which would specify position in backward
list corresponding to word in forward
. In other words, for any valid index i, forward[i] = reverse_word(backward[fwd_to_bck[i]])
. Following up with the example of a set with foo
, bar
and baz
words, the three lists would have the following contents:
Index | forward | fwd_to_bck | backward |
---|---|---|---|
0 | bar | 1 | oof |
1 | baz | 2 | rab |
2 | foo | 0 | zab |
With such data structure prepared, lookup algorithm becomes two binary searches followed by a scan through fwd_to_bck
to check if word in forward
list falls within a range of indexes in backward
list identified through the binary search. Complexity of this solution is
Trie
There is another way though. One involving a rarely used data structure, the humble prefix tree. In a trie, which is another name for the data type, edges are labelled by individual characters. In order to check presence of a key, the tree is traversed following connections corresponding to characters of the word. Since there’s a constant upper-limit to number of children a node can have, lookup operation takes time linear in the length of the key,
If all the words are put into a trie, finding all matching a prefix*suffix
pattern becomes a three step process. Firstly, follow edges in the trie matching characters in the prefix. Secondly, fan out the tree traversing
Complexity of the first and last step are simple to derive; they are
Overall, the lookup ends up having
Benchmarks
To check the theoretical analysis I’ve prepared a benchmark to compare aforementioned approaches. Three different vector-based implementations and four trie-based ones were measured.
Putting all of that data on a graph isn’t feasible. The run time depends on four parameters — number of words, length of each word, length of the prefix in the pattern and length of the suffix in the pattern — which doesn’t map well to a two-dimensional canvas. To wrangle the measurements, I’ve first reduced the data set. For each quadruple of parameters I’ve taken the best result of a trie-based implementation and the worst of a vector-based one and divided them. This gives a ratio
That still leaves quite a few dimensions so I’ve further grouped results by word length and prefix length (discarding number of words and suffix length). Figure below shows all the results collected into separate bands, one for each word length the benchmark was run. Within each band separate columns indicate runs with a different prefix length starting from and empty prefix and ending at a pattern which consists of the full word. Vertical axis is logarithmic and is the aforementioned execution time ratio
A few things immediately jump out. First of all, as predicted, the longer the prefix the better tries fare. This is shown by the ratio falling down between columns within a single vertical band. For example, for word length of ten characters, if pattern has one-character prefix trie’s execution time (shown in the middle column of the first band) varies from the same as vector-based algorithm to over ten thousand times slower; but a simple word lookup (situation where prefix length is equal word length which is shown in the last column), prefix tree is over ten times faster at times.
Second of all,, tries’ performance is regrettable compared to much simpler vector-based structures. Starting at word length of a thousand, tries are ten times slower than using a sorted array of the keys. (I haven’t made any measurements for length between those two sizes). Furthermore, the longer the words in the set, the worst prefix tree becomes. But even with short words, depending on type of queries vector-based algorithm can be thousands times faster. There were only a handful of tests in which prefix tree bode better than an array:
n | k | p | s | Worst Vector [µs] | Best Trie [µs] | r |
---|---|---|---|---|---|---|
1 | 10 | 10 | 0 | 0.452 | 0.145 | 0.32 |
10 | 10 | 0 | 0.548 | 0.135 | 0.25 | |
100 | 1 | 1 | 1.779 | 1.747 | 0.98 | |
100 | 10 | 0 | 0.896 | 0.134 | 0.15 | |
1000 | 10 | 0 | 1.194 | 0.134 | 0.11 | |
10000 | 10 | 0 | 1.585 | 0.135 | 0.09 | |
100000 | 10 | 0 | 1.874 | 0.134 | 0.07 | |
1000000 | 10 | 0 | 2.337 | 0.159 | 0.07 | |
10000 | 100 | 10 | 1 | 8.537 | 4.687 | 0.55 |
100000 | 10 | 1 | 132.68 | 4.373 | 0.03 | |
100000 | 10 | 10 | 4.116 | 3.919 | 0.95 |
And let’s not forget that the cards were stacked in favour of the trees. Not only the best trie was pitted against the worst vector solution, but the alphabet consisted of only 26 symbols (specifically lower case letters) which also benefited trie.
Conclusion
The big O notation ‘hides constants’. Any programmer recognises that distinct algorithm with the same complexity can perform vastly differently. It’s also not news that due to asymptotic nature of the big O, algorithms with seemingly better complexity may run slower than ones with worse ones. This is why you’d use insertion sort to sort ten elements rather than using a heap sort or why Coppersmith–Winograd algorithm is never used in practice.
It’s good to sometimes remind ourselves in practice of those facts. Sometimes a naïve, almost brute-force, approach may be faster because it uses tighter loops or has better cache-locality. And cache-locality is something node-based data structures (such as linked lists or trees) don’t have.
In other words, it’s nearly never a linked-list and rarely a tree.