---
title: Data Structure Trie
---

## Introduction to Trie

The word trie is an infix of the word "re**trie**val", because a trie can find a single word in a dictionary using only a prefix of that word.
A trie is an efficient data structure for retrieval: the cost of a search depends only on the length of the string being searched for.
It is a multi-way tree structure that is useful for storing strings over an alphabet.
Tries have been used to store large dictionaries, for example the English word lists in spell-checking programs.
However, tries pay for this speed with a large storage requirement.
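
To make the storage penalty concrete, the sketch below estimates how many child slots a trie reserves when every node holds a fixed array of 26 entries. The word list and the helper name `count_nodes` are assumptions chosen for illustration only.

```python
ALPHABET_SIZE = 26  # assumption: lowercase English letters only


def count_nodes(words):
    """Count the trie nodes (including the root) needed to store `words`."""
    prefixes = {""}  # the empty prefix stands for the root node
    for word in words:
        for end in range(1, len(word) + 1):
            prefixes.add(word[:end])  # every distinct prefix is one node
    return len(prefixes)


words = ["car", "card", "care", "cat"]  # hypothetical sample words
nodes = count_nodes(words)
slots = nodes * ALPHABET_SIZE
used = nodes - 1  # every node except the root is reached through one slot

print(f"{nodes} nodes, {slots} slots reserved, {used} slots in use")
# 7 nodes, 182 slots reserved, 6 slots in use
```

Most of the reserved slots stay empty, which is the storage cost mentioned above.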

## What is a trie?

A trie is a tree-like data structure that stores strings and helps you find the data associated with a string using a prefix of that string.
For example, say you plan on building a dictionary that stores strings along with their meanings. You may be wondering why you can't simply use a hash table to get the information.
You can, but a hash table can only find data whose key exactly matches a string that was added. A trie also lets us find strings that share a common prefix, or that are missing a character, in less time than a hash table, which would have to examine every stored key.

A trie typically looks something like this:

This is an image of a trie that stores the words {assoc, algo, all, also, tree, trie}.
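
As a rough textual stand-in for the picture, the same six words can be arranged as nested dictionaries, one level per character. This is only a sketch: the `build` helper and the `"$"` end-of-word marker are assumptions made for illustration and are not part of the implementation described later.

```python
import json

END = "$"  # assumption: marker key meaning "a word ends here"


def build(words):
    """Arrange words as nested dicts, one dict level per character."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})  # reuse the shared prefix if present
        node[END] = True
    return root


trie = build(["assoc", "algo", "all", "also", "tree", "trie"])
print(json.dumps(trie, indent=1))

# Only two branches leave the root, because every word starts with 'a' or 't'.
print(sorted(trie))  # ['a', 't']
```

Words with a common prefix, such as "algo", "all" and "also", share the nodes for that prefix and only branch where they differ.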

## How to implement a trie?

Let's implement a trie in Python that stores words from an English dictionary along with their meanings.

    ALPHABET_SIZE = 26  # Lowercase English letters 'a' to 'z'

    class TrieNode:
        def __init__(self):
            self.edges = [None] * ALPHABET_SIZE  # One slot per letter of the alphabet.
            self.meaning = None  # Meaning of the word that ends at this node, if any.
            self.ends_here = False  # Tells us if a word ends at this node.

As you can see, `edges` has 26 slots, one for each letter of the alphabet: 'a' corresponds to index 0, 'b' to 1, 'c' to 2, and so on up to 'z' at index 25 (the code assumes lowercase words). If the slot for the character you are looking for holds `None`, the word is not in the trie.
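
The character-to-index mapping can be sketched on its own as follows. The helper name `char_to_index` and the explicit range check are assumptions added for illustration; the tutorial's own code computes the index inline without validating it.

```python
ALPHABET_SIZE = 26


def char_to_index(ch):
    """Map 'a'..'z' to 0..25 and reject any other character."""
    index = ord(ch) - ord('a')
    if not 0 <= index < ALPHABET_SIZE:
        raise ValueError(f"unsupported character: {ch!r}")
    return index


print(char_to_index('a'))  # 0
print(char_to_index('z'))  # 25
print([char_to_index(ch) for ch in "trie"])  # [19, 17, 8, 4]
```

The check matters because Python lists accept negative indices: without it, an uppercase 'Z' would yield -7 and silently land in the slot meant for 't'.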

A typical trie should implement at least these three functions:

*   `add_word(word,meaning)`
*   `search_word(word)`
*   `delete_word(word)`

Additionally, one can also add something like

*   `get_all_words()`
*   `get_all_words_with_prefix(prefix)`
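
The two optional functions are not implemented in this article, so here is one possible sketch of them. It reuses the `TrieNode` layout from above with a compact iterative `add_word`; the method bodies are an assumption about how such functions could work, not a reference implementation.

```python
ALPHABET_SIZE = 26


class TrieNode:
    def __init__(self):
        self.edges = [None] * ALPHABET_SIZE
        self.meaning = None
        self.ends_here = False

    def add_word(self, word, meaning):
        # Compact iterative insert, used here only to populate the trie.
        node = self
        for ch in word:
            index = ord(ch) - ord('a')
            if node.edges[index] is None:
                node.edges[index] = TrieNode()
            node = node.edges[index]
        node.ends_here = True
        node.meaning = meaning

    def get_all_words(self, prefix=""):
        """Return every word stored below this node, each starting with `prefix`."""
        words = [prefix] if self.ends_here else []
        for index, child in enumerate(self.edges):
            if child is not None:
                words.extend(child.get_all_words(prefix + chr(ord('a') + index)))
        return words

    def get_all_words_with_prefix(self, prefix):
        """Walk down to the node for `prefix`, then collect the words below it."""
        node = self
        for ch in prefix:
            node = node.edges[ord(ch) - ord('a')]
            if node is None:
                return []  # no stored word starts with this prefix
        return node.get_all_words(prefix)


root = TrieNode()
for word in ["assoc", "algo", "all", "also", "tree", "trie"]:
    root.add_word(word, meaning=None)  # meanings omitted in this sketch

print(root.get_all_words_with_prefix("al"))  # ['algo', 'all', 'also']
print(root.get_all_words_with_prefix("tr"))  # ['tree', 'trie']
print(root.get_all_words_with_prefix("x"))   # []
```

Because the edges are visited in index order, the words come back in alphabetical order without any sorting step.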

#### Adding a word to the trie

        def add_word(self, word, meaning):
            if len(word) == 0:
                self.ends_here = True  # We have reached the end of the word.
                self.meaning = meaning  # Store the meaning on this node.
                return
            ch = word[0]  # First character
            # The ASCII value of the first character minus the ASCII value of 'a'
            # (the first character of our alphabet) gives the index of the edge to follow.
            index = ord(ch) - ord('a')
            if self.edges[index] is None:
                # No stored word has a prefix with this character yet.
                new_node = TrieNode()
                self.edges[index] = new_node

            self.edges[index].add_word(word[1:], meaning)  # Add the rest of the word.

#### Retrieving data

        def search_word(self, word):
            if len(word) == 0:
                # The whole word has been consumed; it exists only if a word ends here.
                return self.ends_here
            ch = word[0]
            index = ord(ch) - ord('a')
            if self.edges[index] is None:
                return False
            else:
                return self.edges[index].search_word(word[1:])

The `search_word` function tells us whether the word exists in the trie. Since ours is a dictionary, we also need to fetch the meaning, so let's define a function to do that.

        def get_meaning(self, word):
            if len(word) == 0:
                if self.ends_here:
                    return self.meaning
                else:
                    return "Word doesn't exist in the Trie"
            ch = word[0]
            index = ord(ch) - ord('a')
            if self.edges[index] is None:
                return "Word doesn't exist in the Trie"
            else:
                return self.edges[index].get_meaning(word[1:])
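
`search_word` and `get_meaning` walk the trie in exactly the same way, so one possible refactoring is to share that walk. The sketch below is an assumption about how that could look: `find_node` and the `default` parameter are hypothetical names that do not appear in the original code, and `get_meaning` here returns `default` instead of a message string.

```python
ALPHABET_SIZE = 26


class TrieNode:
    def __init__(self):
        self.edges = [None] * ALPHABET_SIZE
        self.meaning = None
        self.ends_here = False

    def add_word(self, word, meaning):
        # Compact iterative insert, used here only to populate the trie.
        node = self
        for ch in word:
            index = ord(ch) - ord('a')
            if node.edges[index] is None:
                node.edges[index] = TrieNode()
            node = node.edges[index]
        node.ends_here = True
        node.meaning = meaning

    def find_node(self, word):
        """Hypothetical helper: return the node reached by `word`, or None."""
        node = self
        for ch in word:
            node = node.edges[ord(ch) - ord('a')]
            if node is None:
                return None
        return node

    def search_word(self, word):
        node = self.find_node(word)
        return node is not None and node.ends_here

    def get_meaning(self, word, default=None):
        node = self.find_node(word)
        if node is None or not node.ends_here:
            return default  # the caller decides what "missing" looks like
        return node.meaning


root = TrieNode()
root.add_word("trie", "a tree keyed by string prefixes")
print(root.search_word("trie"))  # True
print(root.search_word("tri"))   # False: a prefix, not a stored word
print(root.get_meaning("trie"))  # a tree keyed by string prefixes
print(root.get_meaning("tree"))  # None
```

Returning a default value rather than a message string keeps a missing word distinguishable from a word whose stored meaning happens to be that same text.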

#### Deleting data

To delete a word, you just need to set the variable `ends_here` to `False`. Doing that doesn't alter the prefixes, but still removes the meaning and the existence of the word from the trie.

        def delete_word(self, word):
            if len(word) == 0:
                if self.ends_here:
                    self.ends_here = False
                    self.meaning = None
                    return "Deleted"
                else:
                    return "Word doesn't exist in the Trie"
            ch = word[0]
            index = ord(ch) - ord('a')
            if self.edges[index] is None:
                return "Word doesn't exist in the Trie"
            else:
                return self.edges[index].delete_word(word[1:])
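
The deletion above leaves every node in place, even nodes that no longer lead to any word. If memory matters, one alternative is to prune such nodes on the way back up the recursion. The sketch below is an assumption about how that could be done; `delete_and_prune`, `is_unused` and `count_nodes` are hypothetical names, and this variant returns a boolean instead of a message string.

```python
ALPHABET_SIZE = 26


class TrieNode:
    def __init__(self):
        self.edges = [None] * ALPHABET_SIZE
        self.meaning = None
        self.ends_here = False

    def add_word(self, word, meaning):
        # Compact iterative insert, used here only to populate the trie.
        node = self
        for ch in word:
            index = ord(ch) - ord('a')
            if node.edges[index] is None:
                node.edges[index] = TrieNode()
            node = node.edges[index]
        node.ends_here = True
        node.meaning = meaning

    def is_unused(self):
        """True if no word ends here and no child hangs below this node."""
        return not self.ends_here and all(edge is None for edge in self.edges)

    def delete_and_prune(self, word):
        """Remove `word` and free nodes that no longer lead to any word.

        Returns True if the word was found and removed, False otherwise.
        """
        if len(word) == 0:
            if not self.ends_here:
                return False
            self.ends_here = False
            self.meaning = None
            return True
        index = ord(word[0]) - ord('a')
        child = self.edges[index]
        if child is None:
            return False
        deleted = child.delete_and_prune(word[1:])
        if deleted and child.is_unused():
            self.edges[index] = None  # drop the now-empty branch
        return deleted

    def count_nodes(self):
        """Count this node plus every node below it."""
        return 1 + sum(e.count_nodes() for e in self.edges if e is not None)


root = TrieNode()
root.add_word("all", "every one")
root.add_word("also", "in addition")
print(root.count_nodes())             # 6: root, a, l, l, s, o
print(root.delete_and_prune("also"))  # True
print(root.count_nodes())             # 4: the 's' and 'o' nodes were freed
print(root.delete_and_prune("also"))  # False
```

The shared prefix "al" survives because the word "all" still depends on it; only the branch used by "also" alone is released.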

 <a href='https://repl.it/CWbr' target='_blank' rel='nofollow'>Run Code</a>

## Resources

*   For further reading, you can try this <a href='https://www.topcoder.com/community/data-science/data-science-tutorials/using-tries/' target='_blank' rel='nofollow'>topcoder</a> tutorial.
*   Also, a tutorial from <a href='http://www.geeksforgeeks.org/trie-insert-and-search/' target='_blank' rel='nofollow'>geeksforgeeks</a>.