---
title: Data Structure Trie
---
## Introduction to Trie

The word trie is an infix of the word "re**trie**val", because a trie can find a word in a dictionary given only a prefix of that word.  
A trie is an efficient data structure for retrieval: searching for a string takes time proportional to the length of that string, regardless of how many strings are stored.  
It is a multi-way tree structure useful for storing strings over an alphabet.  
It has been used to store large dictionaries of English words, for example in spell-checking programs.  
However, the penalty of using a trie is its storage requirement.

## What is a trie?

A trie is a tree-like data structure that stores strings and helps you find the data associated with a string using a prefix of that string.  
For example, say you plan to build a dictionary that stores strings along with their meanings. You may be wondering why you can't simply use a hash table to get the information.  
You can certainly look up information with a hash table, but a hash table can only find data when the string exactly matches a key we've added. A trie also lets us find strings that share a common prefix, or that are missing a character, in less time than a hash table would need.  
A trie typically looks something like this:

This is an image of a trie, which stores the words {assoc, algo, all, also, tree, trie}.

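As a rough illustration of the structure described above, the following sketch builds a trie for the same six words and prints it as an indented tree. The nested `dict` representation, the `"$"` end-of-word key, and the helper names `build_trie` and `render` are assumptions made for this illustration only; they are not the implementation used in the rest of this article.

```python
def build_trie(words):
    """Build a trie as nested dicts; the key "$" marks the end of a word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            # Reuse the child for this character if it exists, else create it.
            node = node.setdefault(ch, {})
        node["$"] = True
    return root


def render(node, depth=0):
    """Return the trie as a list of indented lines, one per character edge."""
    lines = []
    for ch in sorted(key for key in node if key != "$"):
        child = node[ch]
        marker = " (word ends)" if "$" in child else ""
        lines.append("  " * depth + ch + marker)
        lines.extend(render(child, depth + 1))
    return lines


words = ["assoc", "algo", "all", "also", "tree", "trie"]
trie = build_trie(words)
print("\n".join(render(trie)))
```

Words that share a prefix, such as "algo", "all" and "also", share the nodes for "a" and "l", which is what makes prefix queries cheap.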
## How to implement a trie?

Let's implement a trie in Python for storing words, along with their meanings, from an English dictionary.

    ALPHABET_SIZE = 26  # For English

    class TrieNode:
        def __init__(self):
            self.edges = [None] * ALPHABET_SIZE  # One slot per character of the alphabet.
            self.meaning = None  # Meaning of the word.
            self.ends_here = False  # Tells us if a word ends here.

As you can see, `edges` has a length of 26, with each index referring to one character of the alphabet: 'a' corresponds to index 0, 'b' to 1, 'c' to 2 ... 'z' to 25. If the edge for the character you are looking for is `None`, the word is not in the trie.

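The index calculation described above can be tried in isolation. This is a small sketch that assumes lowercase ASCII letters only; `char_to_index` is a hypothetical helper name and is not part of the class in this article.

```python
ALPHABET_SIZE = 26  # lowercase English letters only (assumption)


def char_to_index(ch):
    """Map 'a'..'z' to 0..25 and reject anything outside that range."""
    index = ord(ch) - ord('a')
    if not 0 <= index < ALPHABET_SIZE:
        raise ValueError(f"unsupported character: {ch!r}")
    return index


print([char_to_index(ch) for ch in "trie"])  # [19, 17, 8, 4]
```

Without such a check, an uppercase letter like 'A' would produce a negative index (65 - 97 = -32), so it is probably safest to lowercase words before adding or searching for them.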
A typical trie should implement at least these three functions:

*   `add_word(word, meaning)`
*   `search_word(word)`
*   `delete_word(word)`

Additionally, one can also add something like

*   `get_all_words()`
*   `get_all_words_with_prefix(prefix)`

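The two optional functions are not implemented in this article. One possible sketch is shown below, assuming the same `TrieNode` layout and lowercase words; the optional `prefix` parameter on `get_all_words` and the iterative `add_word` are choices made for this example, not the article's code.

```python
ALPHABET_SIZE = 26  # lowercase English letters only


class TrieNode:
    def __init__(self):
        self.edges = [None] * ALPHABET_SIZE
        self.meaning = None
        self.ends_here = False

    def add_word(self, word, meaning):
        # Iterative insert, used here only to populate the trie.
        node = self
        for ch in word:
            index = ord(ch) - ord('a')
            if node.edges[index] is None:
                node.edges[index] = TrieNode()
            node = node.edges[index]
        node.ends_here = True
        node.meaning = meaning

    def get_all_words(self, prefix=""):
        """Return every word stored below this node, each prepended with `prefix`."""
        words = [prefix] if self.ends_here else []
        for index, child in enumerate(self.edges):
            if child is not None:
                words.extend(child.get_all_words(prefix + chr(ord('a') + index)))
        return words

    def get_all_words_with_prefix(self, prefix):
        """Walk down to the node for `prefix`, then collect everything below it."""
        node = self
        for ch in prefix:
            node = node.edges[ord(ch) - ord('a')]
            if node is None:
                return []
        return node.get_all_words(prefix)


root = TrieNode()
for w in ["assoc", "algo", "all", "also", "tree", "trie"]:
    root.add_word(w, meaning=None)

print(root.get_all_words())                  # ['algo', 'all', 'also', 'assoc', 'tree', 'trie']
print(root.get_all_words_with_prefix("al"))  # ['algo', 'all', 'also']
```

Because the edges are visited in index order, the words come back in alphabetical order without any extra sorting.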
#### Adding a word to the trie

        def add_word(self, word, meaning):
            if len(word) == 0:
                self.ends_here = True  # Because we have reached the end of the word
                self.meaning = meaning  # Adding the meaning to that node
                return
            ch = word[0]  # First character
            # The ASCII value of the character minus the ASCII value of 'a'
            # (the first character of our alphabet) gives the index of the edge to look up.
            index = ord(ch) - ord('a')
            if self.edges[index] is None:
                # This implies that there's no prefix with this character yet.
                self.edges[index] = TrieNode()

            self.edges[index].add_word(word[1:], meaning)  # Adding the remaining word

#### Retrieving data

        def search_word(self, word):
            if len(word) == 0:
                # True only if a word that was added ends at this node.
                return self.ends_here
            ch = word[0]
            index = ord(ch) - ord('a')
            if self.edges[index] is None:
                return False
            else:
                return self.edges[index].search_word(word[1:])

The `search_word` function tells us whether the word exists in the trie. Since ours is a dictionary, we need to fetch the meaning as well, so let's declare a function to do that.

        def get_meaning(self, word):
            if len(word) == 0:
                if self.ends_here:
                    return self.meaning
                else:
                    return "Word doesn't exist in the Trie"
            ch = word[0]
            index = ord(ch) - ord('a')
            if self.edges[index] is None:
                return "Word doesn't exist in the Trie"
            else:
                return self.edges[index].get_meaning(word[1:])

#### Deleting data

To delete a word, you just need to set the variable `ends_here` to `False`. Doing that doesn't alter the prefixes, but still removes the meaning and the existence of the word from the trie.

        def delete_word(self, word):
            if len(word) == 0:
                if self.ends_here:
                    self.ends_here = False
                    self.meaning = None
                    return "Deleted"
                else:
                    return "Word doesn't exist in the Trie"
            ch = word[0]
            index = ord(ch) - ord('a')
            if self.edges[index] is None:
                return "Word doesn't exist in the Trie"
            else:
                return self.edges[index].delete_word(word[1:])

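To see the four operations working together, here is a condensed, self-contained sketch. It follows the same idea as the methods in this article but is not a copy of them: the shared helper `_find`, the iterative traversal, and the `True`/`False`/`None` return values are assumptions made to keep the example short.

```python
ALPHABET_SIZE = 26  # lowercase English letters only


class TrieNode:
    def __init__(self):
        self.edges = [None] * ALPHABET_SIZE
        self.meaning = None
        self.ends_here = False

    def _find(self, word, create=False):
        """Walk down the trie along `word`; return the final node or None."""
        node = self
        for ch in word:
            index = ord(ch) - ord('a')
            if node.edges[index] is None:
                if not create:
                    return None
                node.edges[index] = TrieNode()
            node = node.edges[index]
        return node

    def add_word(self, word, meaning):
        node = self._find(word, create=True)
        node.ends_here = True
        node.meaning = meaning

    def search_word(self, word):
        node = self._find(word)
        return node is not None and node.ends_here

    def get_meaning(self, word):
        node = self._find(word)
        return node.meaning if node is not None and node.ends_here else None

    def delete_word(self, word):
        """Unmark the word; nodes stay in place so shared prefixes survive."""
        node = self._find(word)
        if node is None or not node.ends_here:
            return False
        node.ends_here = False
        node.meaning = None
        return True


root = TrieNode()
root.add_word("all", "the whole amount")
root.add_word("also", "in addition")

print(root.search_word("all"))   # True
print(root.search_word("al"))    # False: a prefix, not a stored word
print(root.delete_word("all"))   # True
print(root.search_word("all"))   # False
print(root.get_meaning("also"))  # in addition
```

Note that deleting "all" leaves the nodes for "a", "l" and "l" in place, which is why "also" is still found afterwards.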
<a href='https://repl.it/CWbr' target='_blank' rel='nofollow'>Run Code</a>

## Resources

*   For further reading, you can try this <a href='https://www.topcoder.com/community/data-science/data-science-tutorials/using-tries/' target='_blank' rel='nofollow'>topcoder</a> tutorial.
*   Also, a tutorial from <a href='http://www.geeksforgeeks.org/trie-insert-and-search/' target='_blank' rel='nofollow'>geeksforgeeks</a>.