Introduction

There are times when an application may require a very large index of strings for some specific purpose, and you desire the ability to search for strings that begin with given characters.  A simple spell check, or syntax matching routine might want to check to see if each word entered by a user exists in a collection of known words that contains thousands of entries, and support auto-complete functionality.  A research assistant application might want to store every word in a series of books or articles and allow the user to find any given word or series of starting characters in any paragraph (or otherwise defined section) of the content.

While there are certainly existing ways to store a collection of strings and quickly find a given entry in the collection, the difficulty lies in quickly finding a string that begins with a given set of characters.

Collection Classes in .Net

The generic collection classes provided by the .Net Framework are well thought-out and perform quite well for the great majority of a developer’s collection handling needs. The only real issues arise when the contents of a collection get too large and you want to use a method like Contains to see if an element exists in the collection.  The Contains method is an O(n) operation, meaning that it takes proportionally longer to execute the larger the collection becomes.

Many times the solution to the problem is to use a collection backed by a HashSet. So long as all of the items in the collection are unique, an index can be built and organized using the hash value for each element in the collection making the Contains method of a HashSet an O(1) operation (it takes the same amount of time to execute regardless of the number of elements in the collection).

However, the HashSet can have its limitations as well.  Depending on the objects being used it may be possible to generate a hash collision (two objects in the collection produce the same hash value), though this generally won’t happen unless you are using custom types.   There’s also the fact that there is only one hash value per element to search on.

For a collection of Strings, in particular, this could be a problem.  In the case of strings, you may want the ability to search for a full string in the collection, or just the starting characters to find all strings that begin with those characters.  A HashSet would not be suitable for this because there is no way to search for the substrings without enumerating the entire collection.

A More Capable Solution for Strings

There is another solution to fast searches over a large collection of strings, but it comes with a cost.  It is possible to build a collection that offers performance approaching O(1) for any number of strings in a collection; however, to do so we must be willing to sacrifice memory consumption for performance.  If we are willing to consume more memory than the raw contents of the collection requires, we can create a collection of strings backed by a character dictionary of character dictionaries.

A String is, by definition, an Array of Characters.  So the string “Hello World” is also an array containing the characters:

(0) = ‘H’
(1) = ‘e’
(2) = ‘l’
(3) = ‘l’
(4) = ‘o’
(5) = ‘ ‘
(6) = ‘W’
(7) = ‘o’
(8) = ‘r’
(9) = ‘l’
(10) = ‘d’

Instead of using one array to hold the characters of the string, we can reorganize a string such that each letter is contained in a Dictionary(Of Char, Dictionary(Of Char)) creating a tree-like structure of characters.  For example, if we added the words “Hello” and “Help” to the collection, we would have a tree of dictionaries like:

(0) = ‘H’
   (0) = ‘e’
      (0) = ‘l’
         (0) = ‘l’
            (0) = ‘o’
         (1) = ‘p’

So the letter “H” is the first key in the root dictionary.  The value for the key “H” is another Dictionary(Of Char) and its first key is the letter “e”, and so on to the “o” in “Hello”.  For the word “Help”, the “H”, “e”, and “l” are already in the tree, so the first “l” now gets a second child of the letter “p”.

By this logic, we also have the word “Hell” in our collection, but we didn’t add that word… so how do we tell the difference between words explicitly added to the collection and those added accidentally?  We can use Chr(0) as a terminator character to indicate the end of an intended string.  In this way, a letter with a child of Chr(0) will indicate the end of a complete word.  So our previous example then becomes:

(0) = ‘H’
   (0) = ‘e’
      (0) = ‘l’
         (0) = ‘l’
            (0) = ‘o’
               (0) = Chr(0)
         (1) = ‘p’
            (0) = Chr(0)

As you can begin to see, with an extra character on each string and a new dictionary for each new letter in a word, the amount of memory consumed is much greater than is required to merely represent the characters in the string.  But at the same time, because we have organized the characters into a tree structure, the greatest number of iterations required to find any word in the collection is equal to the longest word in the collection; so searches become very fast.  And if we choose to execute a search that does not look for the termination character, we can quickly find all words in the collection that start with a given set of letters.

Implementing the Collection

Although the basic premise of the collection is fairly straightforward, the code implementation can be a bit tricky in that our tree of characters must still be able to expose itself as a collection of complete strings.  To achieve this, we need to create a custom enumerator for our collection that knows how to find each complete string within the tree of characters.

We can begin by creating a “SearchableStringCollection” class which implements ICollection(Of String) and declares a backing store of Dictionary(Of Char, SearchableStringCollection):

Public Class SearchableStringCollection
    Implements ICollection(Of String)
 
    Private _ListCore As New Dictionary(Of Char, SearchableStringCollection)
 
End Class

There are a few simple class members that we can implement without first defining our actual character storage and retrieval mechanisms.  First we can implement the Count property by simply defining a private _Count field to hold the number of full strings in the collection.  We’ll want to track this total manually as items are added to and removed from the collection because our backing store will not have this information readily at hand (we would have to enumerate the entire collection to find and count all of the complete strings, and that would have undesirable performance).

Private _Count As Integer
Public ReadOnly Property Count As Integer Implements System.Collections.Generic.ICollection(Of String).Count
    Get
        Return _Count
    End Get
End Property

The IsReadOnly property is easily implemented as the collection is not Read-Only, so the property can always return False.

Public ReadOnly Property IsReadOnly As Boolean Implements System.Collections.Generic.ICollection(Of String).IsReadOnly
    Get
        Return False
    End Get
End Property

The Clear method is also straightforward and can simply clear the entire backing store, resetting the _Count field to zero.

Public Sub Clear() Implements System.Collections.Generic.ICollection(Of String).Clear
    _ListCore.Clear()
    _Count = 0
End Sub

The CopyTo method is also simple in that it only needs to enumerate each string in the collection and copy it to the specified array.  This method will be an O(n) operation and will take proportionally longer to execute based on the size of the provided array.  However, this is also the only method in the collection which is not high-performance.

Public Sub CopyTo(array() As String, arrayIndex As Integer) Implements System.Collections.Generic.ICollection(Of String).CopyTo
    For Each entry As String In Me
        array(arrayIndex) = entry
        arrayIndex += 1
        If arrayIndex = array.Length Then Exit For
    Next
End Sub

The primary functionality methods Add, Contains, and Remove will require helper methods to assist in their execution, and we’ll look at each more closely later in this article.  We’ll also want to add a “ContainsStartingWith” method to see if a partial match exists in the collection, as well as a “Find” method which can return all strings beginning with a given set of characters.

The only remaining class members then are the implementations of GetEnumerator.  The implementation coming from IEnumerable can be easily coded to just return the strongly-typed implementation coming from IEnumerable (Of String).  We can also change the access level of this member to Protected so that the class only exposes a single GetEnumerator method.

However, the actual implementation of GetEnumerator from IEnumerable(Of String) will need to return an instance of the custom enumerator class which is capable of finding all complete strings within the tree of characters.

Public Function GetEnumerator() As System.Collections.Generic.IEnumerator(Of String) Implements System.Collections.Generic.IEnumerable(Of String).GetEnumerator
 
End Function
 
Protected Function GetEnumeratorCore() As System.Collections.IEnumerator Implements System.Collections.IEnumerable.GetEnumerator
    Return GetEnumerator()
End Function

Recursive Helper Methods

Let’s take a look at the definitions of the four primary helper methods used to add, remove, and find strings in the collection:

Protected Overridable Function OnAdd(characters() As Char, index As Integer) As Boolean
 
End Function
 
Protected Overridable Function OnContains(characters() As Char, index As Integer) As Boolean
 
End Function
 
Protected Overridable Function OnFind(characters() As Char, index As Integer) As String()
 
End Function
 
Protected Overridable Function OnRemove(characters() As Char, index As Integer) As Boolean
 
End Function

As you can see, each method takes a character array and an index as parameters.  The reason for this design is to facilitate calling each of these methods in a recursive fashion.   A recursive method is simply a function or subroutine which continues to call itself while some condition is met.  Recursion works well with tree structures of data because you need to “walk the branches” of the tree whenever you work with the data.  This means that the primary worker methods only need to know a starting position within the tree at which they should begin their work.  Each method can then call itself, passing the original character array it received along with an updated index indicating the next position at which processing needs to continue.

Adding Strings to the Collection

The OnAdd method will provide the functionality of walking the character tree, looking for existing characters and adding new branches when needed.  The code for this method consists of four conditional checks with one opportunity for recursion:

Protected Overridable Function OnAdd(characters() As Char, index As Integer) As Boolean
    Dim result As Boolean = False
    If characters.Length > 0 Then
        If Not _ListCore.ContainsKey(characters(index)) Then
            _ListCore.Add(characters(index), New SearchableStringCollection)
            If index = characters.Length - 1 Then result = True
        End If
        If index < characters.Length - 1 Then
            result = _ListCore(characters(index)).OnAdd(characters, index + 1)
        End If
    End If
    Return result
End Function

The method begins by declaring a result variable of the returning type (Boolean).  This value will be returned at the end of the method to indicate whether or not a new item was added to the collection (if the provided complete string already exists in the collection, no new string will be added).  Next it checks the trivial case that the provided character array is empty.  Providing that some characters were supplied, the method then begins to look for existing characters or add new branches to the tree.

The method checks to see if the specified character exists in the local character dictionary.  If that character does not yet exist, a new branch is created and the character is added to the collection.  It is important to note that when this method is first called from an instance of SearchableStringCollection, the _ListCore field is pointing to the “root” dictionary of the tree, however, since each branch of the tree contains another SearchableStringCollection, in subsequent recursive calls the _ListCore field will point to the dictionary for the particular branch of the tree that execution has walked to.

Once the current character has been processed, the method repeats itself for each remaining character in the originally provided string.  However, it is again important to note that the target of this second call to OnAdd is the branch stemming from the root collection; it is not a recursive call into the same object instance.  This is how each helper method recursively walks the character tree.

Removing Strings from the Collection

A string can be removed from the collection by simply finding and removing the termination character (Chr0), however, in order to properly clean up the collection we need to recursively move backward through the characters in the string and remove any which do not have another child other than the character we are analyzing.

Protected Overridable Function OnRemove(characters() As Char, index As Integer) As Boolean
    If _ListCore.ContainsKey(characters(index)) Then
        If index < characters.Length - 1 Then
            If _ListCore(characters(index)).OnRemove(characters, index + 1) Then
                If _ListCore(characters(index))._ListCore.Count = 0 Then
                    _ListCore.Remove(characters(index))
                End If
                Return True
            End If
        Else
            _ListCore.Remove(characters(index))
            Return True
        End If
    End If
    Return False
End Function

To achieve this, the method first checks to see if the current character is in the current dictionary.  If not, there is nothing to remove so the method returns false.  If the character is found, then the method checks to see if it is on the last character of the string.  If so, the character can be removed and the removal of the specified word is complete.  If the current character is not the last character, then the method tries to remove the next character in the string.  If this fails, it means there is no match for the specified word so the method returns false.  If the remove succeeds, then the method checks to see if there are any other existing branches for this character, and if not, it then removes the current character since there are no more words using the character.  If there are other characters in the list, then the method returns true indicating that the specified word was removed.

This can be a little hard to follow, but the important thing to keep in mind is that each time the method calls itself the instance of _ListCore becomes the SearchableStringCollection on the next branch of the dictionary.  If you are having trouble following the logic, try using the previous example of the words "Hello World" and "Help" and write down the steps that the method would take to remove the word "Help".  In the first step, the OnRemove method would be passed the characters() "H, e, l, p" and an index of 0.  So the first line of code would return true because the root _ListCore contains the letter "H".  The index would not yet be on the last character so the method would call OnRemove on the SearchableStringCollection for the key "H" in the current _ListCore.  So step 2 would then begin with the same characters but an index of 1, and _ListCore would now be the SearchableStringCollection for the "H" branch of the tree.  If you continue writing down the logic in this manner, the execution of the method may become clearer.

Finding Strings in the Collection

Searching for a single matching string within the collection is a fairly simple process.  We only need to check to see if each character in the string exists at the corresponding branch of the character tree.  This makes the OnContains method short and simple:

Protected Overridable Function OnContains(characters() As Char, index As Integer) As Boolean
    If _ListCore.ContainsKey(characters(index)) Then
        If index < characters.Length - 1 Then
            Return _ListCore(characters(index)).OnContains(characters, index + 1)
        End If
        Return True
    End If
    Return False
End Function

The method only needs to check the current instance of _ListCore for the current character specified by index.  If the current index is not the last character, then the process repeats for the next branch of the tree.  Matching an exact word, or any starting characters, is simply a matter of including a terminating Chr(0) in the characters() array to match an exact word or excluding to match starting characters.

But if we want to find and return all strings that start with a given set of characters, we are going to need the custom enumerator mentioned earlier in this article.  We'll take a look at the OnFind method and then get to the implementation of the custom enumerator.

Protected Overridable Function OnFind(characters() As Char, index As Integer) As String()
    Dim result As New List(Of String)
    If _ListCore.ContainsKey(characters(index)) Then
        If index < characters.Length - 1 Then
            result.AddRange(_ListCore(characters(index)).OnFind(characters, index + 1))
        Else
            Using enm As New SearchableStringCollectionEnumerator(_ListCore(characters(index))._ListCore.GetEnumerator)
                While enm.MoveNext
                    result.Add(String.Concat(New String(characters), enm.Current))
                End While
            End Using
        End If
    End If
    Return result.ToArray
End Function

The method begins by creating a temporary List(Of String) to store each of the matches found while searching the collection.  The method then checks to see if the current character is in the current dictionary and if so the determines whether to continue looking for starting characters (index is not yet at the last character), or proceeds to enumerate all of the existing strings from the current branch of the character tree.  Once the last of the specified characters is found in the tree, the enumerator takes over doing the work of finding all of the complete strings that begin with the specified characters.

Implementing the SearchableStringCollectionEnumerator

To create a custom string enumerator we only need to define a class which inherits from IEnumerator(Of String).  Our implementation will require a special constructor so that we can create an enumerator at any point in the character tree as well as a list of sub-enumerators used for walking the branches of the tree and a StringBuilder for constructing the current value of an instance of the enumerator.

Public Class SearchableStringCollectionEnumerator
    Implements IEnumerator(Of String)
 
    Private _BaseEnumerator As IEnumerator(Of KeyValuePair(Of Char, SearchableStringCollection))
    Private _Enumerators As New List(Of IEnumerator(Of KeyValuePair(Of Char, SearchableStringCollection)))
    Private _Current As New System.Text.StringBuilder
 
    Protected Friend Sub New(baseEnumerator As IEnumerator(Of KeyValuePair(Of Char, SearchableStringCollection)))
        _BaseEnumerator = baseEnumerator
        _Enumerators.Add(_BaseEnumerator)
    End Sub
End Class

Whenever an instance of SearchableStringCollectionEnumerator is created it must be supplied with an instance of IEnumerator(Of KeyValuePair(Of Char, SearchableStringCollection)).  This is the type returned by a call to GetEnumerator on an instance of our _ListCore dictionary.  When an enumerator is created for the entire SearchableStringCollection, the constructor is passed the enumerator provided by the root dictionary.  When using a method like OnFind, the enumerator is created using the enumerator provided by the SearchableStringCollection belonging to the last character in the specified set of characters to find.

The baseEnumerator variable always points to the first enumerator used to create the class instance.  This enumerator is added to the list of working enumerators being used to walk the character tree.

Implementing IEnumerator means that we must provide three class members: a Current property, and the methods MoveNext and Reset.  The Current property will simply return the characters contained in the StringBuilder.  Since we are implementing a generic interface we must also have an implementation for the base IEnumerator interface, though we can hide this implementation and return the generic one's value just as we did for GetEnumerator in the SearchableStringCollection class.

Public ReadOnly Property Current As String Implements System.Collections.Generic.IEnumerator(Of String).Current
    Get
        Return _Current.ToString
    End Get
End Property
 
Protected ReadOnly Property CurrentCore As Object Implements System.Collections.IEnumerator.Current
    Get
        Return Current
    End Get
End Property

The Reset method is fairly simple as well.  Its only requirement is to return the enumerator to the state it was in when it was created.  This can be accomplished by clearing the StringBuilder and working enumerator list and then adding the base enumerator back into the working list.  We'll clean up any of the working enumerators that were created by calls to MoveNext and reset the base enumerator back to its initial state.

Public Sub Reset() Implements System.Collections.IEnumerator.Reset
    _Current.Clear()
    _Enumerators.Remove(_BaseEnumerator)
    _BaseEnumerator.Reset()
    For Each enm As IEnumerator(Of KeyValuePair(Of Char, SearchableStringCollection)) In _Enumerators
        enm.Dispose()
    Next
    _Enumerators.Clear()
    _Enumerators.Add(_BaseEnumerator)
End Sub

The MoveNext method is quite short, but relies on a "BuildCurrent" helper method to do all of the real work.

Public Function MoveNext() As Boolean Implements System.Collections.IEnumerator.MoveNext
    Dim result As Boolean = BuildCurrent()
    If _Current.Length > 0 Then _Current.Length -= 1
    Return result
End Function

BuildCurrent will return a value indicating whether or not there are any more strings in the enumerator.  Since all strings explicitly added to the collection end with a Chr(0), the MoveNext method strips the last character from the current string before returning its result.  The BuildCurrent helper method cannot strip this character because it is recursive and relies on the contents of the StringBuilder during its recursive execution.

Private Function BuildCurrent() As Boolean
    Dim result As Boolean = False
    If _Enumerators.Count > 0 Then
        If _Enumerators(_Enumerators.Count - 1).MoveNext() Then
            _Current.Append(_Enumerators(_Enumerators.Count - 1).Current.Key)
            If Not _Current(_Current.Length - 1) = Chr(0) Then
                _Enumerators.Add(_Enumerators(_Enumerators.Count - 1).Current.Value._ListCore.GetEnumerator)
                result = BuildCurrent()
            Else
                result = True
            End If
        Else
            _Enumerators.RemoveAt(_Enumerators.Count - 1)
            If _Current.Length > 0 Then _Current.Length -= 1
            result = BuildCurrent()
        End If
    End If
    Return result
End Function

The method begins by creating a result variable initialized to false (meaning there are no more strings in the enumerator).  If there are still enumerators in the working collection, then the method calls MoveNext on the last one.  Keep in mind that these are enumerators from the various instance of _ListCore from each branch of the character tree.  If MoveNext returns true, then that character is appended to the current string.  If that character was not the termination character (Chr0) then the method gets an enumerator from the SearchableStringCollection associated with that character and continues building the string.  If MoveNext returns false, that enumerator is removed from the working collection, the last character (if any) is trimmed from the current string, and BuildCurrent is called again with the previous working enumerator.  The previous enumerator will be looking over words that begin with one less character than the current enumerator was looking at, and this is why we must remove the last character from the current string.

Finally we must implement IDisposable so that our enumerator can be cleaned up when it is used implicitly such as by a For Each statement.

Private disposedValue As Boolean
Protected Overridable Sub Dispose(disposing As Boolean)
    If Not Me.disposedValue Then
        If disposing Then
            For Each enm As IEnumerator(Of KeyValuePair(Of Char, SearchableStringCollection)) In _Enumerators
                enm.Dispose()
            Next
            _Enumerators.Clear()
            _Current.Clear()
            _BaseEnumerator.Dispose()
            _BaseEnumerator = Nothing
        End If
    End If
    Me.disposedValue = True
End Sub
Public Sub Dispose() Implements IDisposable.Dispose
    Dispose(True)
    GC.SuppressFinalize(Me)
End Sub

This method simply disposes any remaining enumerators and clears the working collection and current string.

Tying it All Together

With the custom enumerator in place, we can now implement the primary public worker methods of the SearchableStringCollection class.  And since the custom enumerator does all of the heavy lifting, these method implementations are short and easy.

The Add method will only need to call the OnAdd helper method can then increment the _Count field if the method succeeds:

Public Sub Add(item As String) Implements System.Collections.Generic.ICollection(Of String).Add
    If OnAdd((item & Chr(0)).ToArray, 0) Then
        _Count += 1
    End If
End Sub

Similarly, the Remove method needs only call the OnRemove helper method and decrement the _Count field if the call succeeds:

Public Function Remove(item As String) As Boolean Implements System.Collections.Generic.ICollection(Of String).Remove
    Dim result As Boolean = OnRemove((item & Chr(0)).ToCharArray, 0)
    If result Then _Count -= 1
    Return result
End Function

For the GetEnumerator implementation we can return a new instance of SearchableStringCollectionEnumerator using the root dictionary's enumerator:

Public Function GetEnumerator() As System.Collections.Generic.IEnumerator(Of String) Implements System.Collections.Generic.IEnumerable(Of String).GetEnumerator
    Return New SearchableStringCollectionEnumerator(Me._ListCore.GetEnumerator)
End Function

Our Contains and ContainsStartingWith methods will pass the characters of the specified word to the OnContains method, with Contains adding a termination character (Chr0) to the specified string:

Public Function Contains(item As String) As Boolean Implements System.Collections.Generic.ICollection(Of String).Contains
    Return OnContains((item & Chr(0)).ToCharArray, 0)
End Function
 
Public Function ContainsStartingWith(startsWith As String) As Boolean
    Return OnContains(startsWith.ToCharArray, 0)
End Function

And finally the Find method need only return the result of a call to OnFind:

Public Function Find(startsWith As String) As String()
    Return OnFind(startsWith.ToCharArray, 0)
End Function

Summary

It is possible to build a high-performance string collection capable of storing a very large number of strings while still providing fast search capability for whole and partial string matches.  But doing so involves a trade-off of performance for memory consumption.  Most any PC-based application will be able to handle this trade-off; however, mobile applications may not be able to do something like this.

It is also possible to expand this concept into a generic dictionary of string keys.  Instead of Implementing ICollection(Of String), you could implement IDictionary(Of String, T) to create a generic SearchableStringDictionary(Of T) class.  You would add a field to the class to hold the associated value and then implement all of the dictionary functionality similar to the collection functionality outlined above.

Although it doesn't take a lot of code to implement a SearchableStringCollection class, the logic of the code can be a bit daunting to follow.  I highly recommend that you copy the sample from Appendix A into Visual Studio and then write some code to fill the collection with strings and exercise its basic methods.  Step through the code and follow the execution of calls to OnAdd, OnRemove, OnContains, and OnFind.  That may be the best way to come to understand what the code is actually doing.

Appendix A: Complete Code Sample

Option Strict On
 
''' <summary>
''' Represents a collection of strings similar in performance to a HashSet(Of String) that offers partial string searches.
''' </summary>
''' <remarks></remarks>
Public Class SearchableStringCollection
    Implements ICollection(Of String)
 
    Private _ListCore As New Dictionary(Of Char, SearchableStringCollection)
 
    ''' <summary>
    ''' Adds the specified string to the collection if it does not already exist.
    ''' </summary>
    ''' <param name="item">The string to add to the collection.</param>
    ''' <remarks></remarks>
    Public Sub Add(item As String) Implements System.Collections.Generic.ICollection(Of String).Add
        If OnAdd((item & Chr(0)).ToArray, 0) Then
            _Count += 1
        End If
    End Sub
 
    ''' <summary>
    ''' Clears all strings from the collection.
    ''' </summary>
    ''' <remarks></remarks>
    Public Sub Clear() Implements System.Collections.Generic.ICollection(Of String).Clear
        _ListCore.Clear()
        _Count = 0
    End Sub
 
    ''' <summary>
    ''' Gets a value indicating whether the collection contains the specified string.
    ''' </summary>
    ''' <param name="item">The exact string to find.</param>
    ''' <returns>True if the collection contains the specified string, otherwise false.</returns>
    ''' <remarks></remarks>
    Public Function Contains(item As String) As Boolean Implements System.Collections.Generic.ICollection(Of String).Contains
        Return OnContains((item & Chr(0)).ToCharArray, 0)
    End Function
 
    ''' <summary>
    ''' Gets a value indicating whether the collection contains any strings that begin with the specified characters.
    ''' </summary>
    ''' <param name="startsWith">The starting characters to search for.</param>
    ''' <returns>True if the collection contains one or more strings beginning with the specified characters, otherwise false.</returns>
    ''' <remarks></remarks>
    Public Function ContainsStartingWith(startsWith As String) As Boolean
        Return OnContains(startsWith.ToCharArray, 0)
    End Function
 
    ''' <summary>
    ''' Copies the contents of the collection to the specified array.
    ''' </summary>
    ''' <param name="array">The array of strings to copy the collection into.</param>
    ''' <param name="arrayIndex">The location in the array at which copying begins.</param>
    ''' <remarks></remarks>
    Public Sub CopyTo(array() As String, arrayIndex As Integer) Implements System.Collections.Generic.ICollection(Of String).CopyTo
        For Each entry As String In Me
            array(arrayIndex) = entry
            arrayIndex += 1
            If arrayIndex = array.Length Then Exit For
        Next
    End Sub
 
    Private _Count As Integer
    ''' <summary>
    ''' Gets the number of complete strings contained in the collection.
    ''' </summary>
    ''' <value></value>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Public ReadOnly Property Count As Integer Implements System.Collections.Generic.ICollection(Of String).Count
        Get
            Return _Count
        End Get
    End Property
 
    ''' <summary>
    ''' Gets an array of strings that begin with the specified characters.
    ''' </summary>
    ''' <param name="startsWith">The starting characters to find.</param>
    ''' <returns>An array of strings beginning with the specified characters.</returns>
    ''' <remarks></remarks>
    Public Function Find(startsWith As String) As String()
        Return OnFind(startsWith.ToCharArray, 0)
    End Function
 
    ''' <summary>
    ''' Gets an enumerator that iterates over the strings in the collection.
    ''' </summary>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Public Function GetEnumerator() As System.Collections.Generic.IEnumerator(Of String) Implements System.Collections.Generic.IEnumerable(Of String).GetEnumerator
        Return New SearchableStringCollectionEnumerator(Me._ListCore.GetEnumerator)
    End Function
 
    Protected Function GetEnumeratorCore() As System.Collections.IEnumerator Implements System.Collections.IEnumerable.GetEnumerator
        Return GetEnumerator()
    End Function
 
    ''' <summary>
    ''' Returns false indicating that the collection is not read-only.
    ''' </summary>
    ''' <value></value>
    ''' <returns></returns>
    ''' <remarks></remarks>
    Public ReadOnly Property IsReadOnly As Boolean Implements System.Collections.Generic.ICollection(Of String).IsReadOnly
        Get
            Return False
        End Get
    End Property
 
    ''' <summary>
    ''' Removes the specified string from the collection.
    ''' </summary>
    ''' <param name="item">The string to remove from the collection.</param>
    ''' <returns>True if the string is found and removed, otherwise false.</returns>
    ''' <remarks></remarks>
    Public Function Remove(item As String) As Boolean Implements System.Collections.Generic.ICollection(Of String).Remove
        Dim result As Boolean = OnRemove((item & Chr(0)).ToCharArray, 0)
        If result Then _Count -= 1
        Return result
    End Function
 
    Protected Overridable Function OnAdd(characters() As Char, index As Integer) As Boolean
        Dim result As Boolean = False
        If characters.Length > 0 Then
            If Not _ListCore.ContainsKey(characters(index)) Then
                _ListCore.Add(characters(index), New SearchableStringCollection)
                If index = characters.Length - 1 Then result = True
            End If
            If index < characters.Length - 1 Then
                result = _ListCore(characters(index)).OnAdd(characters, index + 1)
            End If
        End If
        Return result
    End Function
 
    Protected Overridable Function OnContains(characters() As Char, index As Integer) As Boolean
        If _ListCore.ContainsKey(characters(index)) Then
            If index < characters.Length - 1 Then
                Return _ListCore(characters(index)).OnContains(characters, index + 1)
            End If
            Return True
        End If
        Return False
    End Function
 
    Protected Overridable Function OnFind(characters() As Char, index As Integer) As String()
        Dim result As New List(Of String)
        If _ListCore.ContainsKey(characters(index)) Then
            If index < characters.Length - 1 Then
                result.AddRange(_ListCore(characters(index)).OnFind(characters, index + 1))
            Else
                Using enm As New SearchableStringCollectionEnumerator(_ListCore(characters(index))._ListCore.GetEnumerator)
                    While enm.MoveNext
                        result.Add(String.Concat(New String(characters), enm.Current))
                    End While
                End Using
            End If
        End If
        Return result.ToArray
    End Function
 
    Protected Overridable Function OnRemove(characters() As Char, index As Integer) As Boolean
        If _ListCore.ContainsKey(characters(index)) Then
            If index < characters.Length - 1 Then
                If _ListCore(characters(index)).OnRemove(characters, index + 1) Then
                    If _ListCore(characters(index))._ListCore.Count = 0 Then
                        _ListCore.Remove(characters(index))
                    End If
                    Return True
                End If
            Else
                _ListCore.Remove(characters(index))
                Return True
            End If
        End If
        Return False
    End Function
 
    ''' <summary>
    ''' Enumerates the elements of a SearchableStringCollection.
    ''' </summary>
    ''' <remarks></remarks>
    Public Class SearchableStringCollectionEnumerator
        Implements IEnumerator(Of String)
 
        Private _Enumerators As New List(Of IEnumerator(Of KeyValuePair(Of Char, SearchableStringCollection)))
        Private _Current As New System.Text.StringBuilder
        Private _BaseEnumerator As IEnumerator(Of KeyValuePair(Of Char, SearchableStringCollection))
 
        Protected Friend Sub New(baseEnumerator As IEnumerator(Of KeyValuePair(Of Char, SearchableStringCollection)))
            _BaseEnumerator = baseEnumerator
            _Enumerators.Add(_BaseEnumerator)
        End Sub
 
        ''' <summary>
        ''' Gets the element in the collection at the current position of the enumerator.
        ''' </summary>
        ''' <value></value>
        ''' <returns></returns>
        ''' <remarks></remarks>
        Public ReadOnly Property Current As String Implements System.Collections.Generic.IEnumerator(Of String).Current
            Get
                Return _Current.ToString
            End Get
        End Property
 
        Protected ReadOnly Property CurrentCore As Object Implements System.Collections.IEnumerator.Current
            Get
                Return Current
            End Get
        End Property
 
        ''' <summary>
        ''' Advances the enumerator to the next element in the collection.
        ''' </summary>
        ''' <returns></returns>
        ''' <remarks></remarks>
        Public Function MoveNext() As Boolean Implements System.Collections.IEnumerator.MoveNext
            Dim result As Boolean = BuildCurrent()
            If _Current.Length > 0 Then _Current.Length -= 1
            Return result
        End Function
 
        ''' <summary>
        ''' Sets the enumerator to its initial position, which is before the first element in the collection.
        ''' </summary>
        ''' <remarks></remarks>
        Public Sub Reset() Implements System.Collections.IEnumerator.Reset
            _Current.Clear()
            _Enumerators.Remove(_BaseEnumerator)
            _BaseEnumerator.Reset()
            For Each enm As IEnumerator(Of KeyValuePair(Of Char, SearchableStringCollection)) In _Enumerators
                enm.Dispose()
            Next
            _Enumerators.Clear()
            _Enumerators.Add(_BaseEnumerator)
        End Sub
 
        Private Function BuildCurrent() As Boolean
            Dim result As Boolean = False
            If _Enumerators.Count > 0 Then
                If _Enumerators(_Enumerators.Count - 1).MoveNext() Then
                    _Current.Append(_Enumerators(_Enumerators.Count - 1).Current.Key)
                    If Not _Current(_Current.Length - 1) = Chr(0) Then
                        _Enumerators.Add(_Enumerators(_Enumerators.Count - 1).Current.Value._ListCore.GetEnumerator)
                        result = BuildCurrent()
                    Else
                        result = True
                    End If
                Else
                    _Enumerators.RemoveAt(_Enumerators.Count - 1)
                    If _Current.Length > 0 Then _Current.Length -= 1
                    result = BuildCurrent()
                End If
            End If
            Return result
        End Function
 
#Region "IDisposable Support"
        Private disposedValue As Boolean
        Protected Overridable Sub Dispose(disposing As Boolean)
            If Not Me.disposedValue Then
                If disposing Then
                    For Each enm As IEnumerator(Of KeyValuePair(Of Char, SearchableStringCollection)) In _Enumerators
                        enm.Dispose()
                    Next
                    _Enumerators.Clear()
                    _Current.Clear()
                    _BaseEnumerator.Dispose()
                    _BaseEnumerator = Nothing
                End If
            End If
            Me.disposedValue = True
        End Sub
        Public Sub Dispose() Implements IDisposable.Dispose
            Dispose(True)
            GC.SuppressFinalize(Me)
        End Sub
#End Region
 
    End Class
End Class

Appendix B: Performance Results from a Practical Application

As a research utility for my own personal use, I wrote a program that I call "Cross Bible Viewer" which uses the string-keyed dictionary version of the class described in this article.  The program parses seven translations of the Christian Bible, indexing every word by verse.  The program also offers language dictionary look-ups for any word using a number of English dictionaries.  But the relevant part for this article is the performance of the SearchableStringDictionary.

All of the words from seven translations of The Bible (24,946 words), indexing a collection of verses (the value is an instance of VerseLocation containing Translation, Book, Chapter, and Verse information), consume approximately 1.2GB of Windows memory (compared to 34.6MB of source XML data for all seven translations).  While this sounds like a lot (and it is), it is still well within the working limits of modern computers.  If I search for the word "love", it takes longer to display the results in a ListBox then it does to gather them from the collection.  Gathering 2,119 results and loading them into a ListBox (after BeginUpdate and before EndUpdate) takes just 100 milliseconds.  Checking to see if the dictionary contains the word "love" takes 0.087 ms of that total 100.  If I search for all words beginning with the letters "the", it takes 6.97 ms to determine that there are 316,599 verses (across all seven translations) with matching words (it takes a minute and a half to actually display all of those results in the program).

So in the right setting, this kind of performance is well worth the cost of additional RAM.

Acknowledgments


 I've experimented with many kinds of tree collections in the past; sometimes the experiments work and sometimes they don't.  I had thought of trying to create something like this several years ago, but wasn't sure it would turn out to be one of the successful experiments and didn't try to build it.  It wasn't until I had a conversation with another community contributor, CrazyPennie, in which he showed a simple example of what he called a "binary dictionary", that it was confirmed that splitting a string into a tree of characters was a viable thing to do.

So special thanks go to CrazyPennie for showing me that it would be worth the effort of developing a full .Net implementation of a character-tree backed string collection.