| ---
 | |
| title: Hash join
 | |
| id: 5956795bc9e2c415eb244de1
 | |
| challengeType: 5
 | |
| ---
 | |
| 
 | |
| ## Description
 | |
| <section id='description'>
 | |
| <p>An <a href="https://en.wikipedia.org/wiki/Join_(SQL)#Inner_join" title="wp: Join_(SQL)#Inner_join">inner join</a> is an operation that combines two data tables into one table, based on matching column values. The simplest way of implementing this operation is the <a href="https://en.wikipedia.org/wiki/Nested_loop_join" title="wp: Nested loop join">nested loop join</a> algorithm, but a more scalable alternative is the <a href="https://en.wikipedia.org/wiki/Hash_join" title="wp: hash join">hash join</a> algorithm.</p>
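
For comparison, the nested loop join mentioned above can be sketched as follows. This is only an illustration: `nestedLoopJoin` and its parameters are hypothetical names, and rows are assumed to be plain arrays addressed by column index. Every row of one table is compared against every row of the other, so the work grows with the product of the two table sizes.

```js
// Hypothetical helper (not part of the challenge): compares every pair of rows.
// tableA / tableB are arrays of array-style rows; colA / colB are column indices.
function nestedLoopJoin(tableA, colA, tableB, colB) {
  const out = [];
  for (const a of tableA) {
    for (const b of tableB) {
      // Keep the pair only when the join columns hold the same value.
      if (a[colA] === b[colB]) {
        out.push([a, b]);
      }
    }
  }
  return out;
}

const people = [[27, 'Jonah'], [18, 'Alan']];
const foes = [['Jonah', 'Whales'], ['Alan', 'Ghosts'], ['Jonah', 'Spiders']];
const pairs = nestedLoopJoin(people, 1, foes, 0);
// pairs holds three matches: Jonah/Whales, Jonah/Spiders, Alan/Ghosts
```
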
 | |
| <p>Implement the "hash join" algorithm, and demonstrate that it passes the test-case listed below.</p><p>You should represent the tables as data structures that feel natural in your programming language.</p>
 | |
| <p>The "hash join" algorithm consists of two steps:</p>
 | |
| <ol>
 | |
| <li>Hash phase: Create a <a href="https://en.wikipedia.org/wiki/Multimap" title="wp: Multimap">multimap</a> from one of the two tables, mapping from each join column value to all the rows that contain it.
 | |
| <ul>
 | |
| <li>The multimap must support hash-based lookup, which scales better than a simple linear search; that is the whole point of this algorithm.</li>
 | |
| <li>Ideally, create the multimap for the smaller table, which minimizes its creation time and memory size.</li>
 | |
| </ul></li>
 | |
| <li>Join phase: Scan the other table, and find matching rows by looking them up in the multimap created in the hash phase.</li>
 | |
| </ol>
 | |
| <p>In pseudo-code, the algorithm could be expressed as follows:</p>
 | |
| <pre>
 | |
| let A = the first input table (or ideally, the larger one)
 | |
| let B = the second input table (or ideally, the smaller one)
 | |
| let j<sub>A</sub> = the join column ID of table A
 | |
| let j<sub>B</sub> = the join column ID of table B
 | |
| let M<sub>B</sub> = a multimap for mapping from single values to multiple rows of table B (starts out empty)
 | |
| let C = the output table (starts out empty)
 | |
| for each row b in table B:
 | |
|   place b in multimap M<sub>B</sub> under key b(j<sub>B</sub>)
 | |
| for each row a in table A:
 | |
|   for each row b in multimap M<sub>B</sub> under key a(j<sub>A</sub>):
 | |
|     let c = the concatenation of row a and row b
 | |
|     place row c in table C
 | |
| </pre>
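
The pseudo-code above could be translated to JavaScript roughly as follows. This is a minimal sketch under a few assumptions: rows are plain arrays addressed by column index, the multimap is a built-in `Map` from key to an array of rows, and each output row pairs row a with row b instead of flattening them. The name `hashJoinRows` is hypothetical and is not the `hashJoin` function the tests call.

```js
// Hypothetical sketch of the two phases; not the challenge's hashJoin.
function hashJoinRows(tableA, colA, tableB, colB) {
  // Hash phase: group the rows of B by their join column value.
  const multimap = new Map();
  for (const b of tableB) {
    const key = b[colB];
    if (!multimap.has(key)) {
      multimap.set(key, []);
    }
    multimap.get(key).push(b);
  }

  // Join phase: scan A and look up the matching rows of B by key.
  const out = [];
  for (const a of tableA) {
    for (const b of multimap.get(a[colA]) || []) {
      out.push([a, b]);
    }
  }
  return out;
}

const tableA = [[27, 'Jonah'], [18, 'Alan'], [28, 'Glory'], [18, 'Popeye'], [28, 'Alan']];
const tableB = [['Jonah', 'Whales'], ['Jonah', 'Spiders'], ['Alan', 'Ghosts'], ['Alan', 'Zombies'], ['Glory', 'Buffy']];
const joined = hashJoinRows(tableA, 1, tableB, 0);
// joined has 7 rows; Popeye has no match in tableB and is dropped.
```

Each row of B is inserted once and each row of A triggers one lookup, so the cost is roughly linear in the two table sizes plus the number of output rows, assuming constant-time `Map` operations.
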
 | |
| <p>Test-case</p>
 | |
| <p>Input</p>
 | |
| <table>
 | |
| <tr>
 | |
| <td style="padding: 4px; margin: 5px;">
 | |
| <table style="border:none; border-collapse:collapse;">
 | |
| <tr>
 | |
| <td style="border:none"> <i>A =</i>
 | |
| </td>
 | |
| <td style="border:none">
 | |
| <table>
 | |
| <tr>
 | |
| <th style="padding: 4px; margin: 5px;"> Age </th>
 | |
| <th style="padding: 4px; margin: 5px;"> Name
 | |
| </th></tr>
 | |
| <tr>
 | |
| <td style="padding: 4px; margin: 5px;"> 27 </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Jonah
 | |
| </td></tr>
 | |
| <tr>
 | |
| <td style="padding: 4px; margin: 5px;"> 18 </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Alan
 | |
| </td></tr>
 | |
| <tr>
 | |
| <td style="padding: 4px; margin: 5px;"> 28 </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Glory
 | |
| </td></tr>
 | |
| <tr>
 | |
| <td style="padding: 4px; margin: 5px;"> 18 </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Popeye
 | |
| </td></tr>
 | |
| <tr>
 | |
| <td style="padding: 4px; margin: 5px;"> 28 </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Alan
 | |
| </td></tr></table>
 | |
| </td>
 | |
| <td style="border:none; padding-left:1.5em;" rowspan="2">
 | |
| </td>
 | |
| <td style="border:none"> <i>B =</i>
 | |
| </td>
 | |
| <td style="border:none">
 | |
| <table>
 | |
| <tr>
 | |
| <th style="padding: 4px; margin: 5px;"> Character </th>
 | |
| <th style="padding: 4px; margin: 5px;"> Nemesis
 | |
| </th></tr>
 | |
| <tr>
 | |
| <td style="padding: 4px; margin: 5px;"> Jonah </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Whales
 | |
| </td></tr>
 | |
| <tr>
 | |
| <td style="padding: 4px; margin: 5px;"> Jonah </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Spiders
 | |
| </td></tr>
 | |
| <tr>
 | |
| <td style="padding: 4px; margin: 5px;"> Alan </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Ghosts
 | |
| </td></tr>
 | |
| <tr>
 | |
| <td style="padding: 4px; margin: 5px;"> Alan </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Zombies
 | |
| </td></tr>
 | |
| <tr>
 | |
| <td style="padding: 4px; margin: 5px;"> Glory </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Buffy
 | |
| </td></tr></table>
 | |
| </td></tr>
 | |
| <tr>
 | |
| <td style="border:none"> <i>j<sub>A</sub> =</i>
 | |
| </td>
 | |
| <td style="border:none"> <i><code>Name</code> (i.e. column 1)</i>
 | |
| </td>
 | |
| <td style="border:none"> <i>j<sub>B</sub> =</i>
 | |
| </td>
 | |
| <td style="border:none"> <i><code>Character</code> (i.e. column 0)</i>
 | |
| </td></tr></table>
 | |
| </td>
 | |
| <td style="padding: 4px; margin: 5px;">
 | |
| </td></tr></table>
 | |
| <p>Output</p>
 | |
| <table>
 | |
| <tr>
 | |
| <th style="padding: 4px; margin: 5px;"> A.Age </th>
 | |
| <th style="padding: 4px; margin: 5px;"> A.Name </th>
 | |
| <th style="padding: 4px; margin: 5px;"> B.Character </th>
 | |
| <th style="padding: 4px; margin: 5px;"> B.Nemesis
 | |
| </th></tr>
 | |
| <tr>
 | |
| <td style="padding: 4px; margin: 5px;"> 27 </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Jonah </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Jonah </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Whales
 | |
| </td></tr>
 | |
| <tr>
 | |
| <td style="padding: 4px; margin: 5px;"> 27 </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Jonah </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Jonah </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Spiders
 | |
| </td></tr>
 | |
| <tr>
 | |
| <td style="padding: 4px; margin: 5px;"> 18 </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Alan </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Alan </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Ghosts
 | |
| </td></tr>
 | |
| <tr>
 | |
| <td style="padding: 4px; margin: 5px;"> 18 </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Alan </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Alan </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Zombies
 | |
| </td></tr>
 | |
| <tr>
 | |
| <td style="padding: 4px; margin: 5px;"> 28 </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Glory </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Glory </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Buffy
 | |
| </td></tr>
 | |
| <tr>
 | |
| <td style="padding: 4px; margin: 5px;"> 28 </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Alan </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Alan </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Ghosts
 | |
| </td></tr>
 | |
| <tr>
 | |
| <td style="padding: 4px; margin: 5px;"> 28 </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Alan </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Alan </td>
 | |
| <td style="padding: 4px; margin: 5px;"> Zombies
 | |
| </td></tr></table>
 | |
| <p>The order of the rows in the output table is not significant.</p>
 | |
| <p>If you're using numerically indexed arrays to represent table rows (rather than referring to columns by name), you could represent the output rows in the form <code style="white-space:nowrap">[[27, "Jonah"], ["Jonah", "Whales"]]</code>.</p><hr>
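
Because row order is not significant, two outputs can be compared after putting their rows into a canonical order. The sketch below is one possible way to do that. `sameRows` is a hypothetical helper, and it assumes rows are array-style values that `JSON.stringify` serializes deterministically; object rows whose keys were inserted in a different order would need extra normalization.

```js
// Hypothetical helper: true when both tables contain the same rows,
// regardless of order (duplicates are counted).
function sameRows(actual, expected) {
  // Serialize each row, then sort the strings to get a canonical order.
  const canon = rows => rows.map(row => JSON.stringify(row)).sort();
  const a = canon(actual);
  const e = canon(expected);
  return a.length === e.length && a.every((s, i) => s === e[i]);
}

const x = [[[27, 'Jonah'], ['Jonah', 'Whales']], [[18, 'Alan'], ['Alan', 'Ghosts']]];
const y = [[[18, 'Alan'], ['Alan', 'Ghosts']], [[27, 'Jonah'], ['Jonah', 'Whales']]];
const equalIgnoringOrder = sameRows(x, y); // true: same rows, different order
```
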
 | |
| </section>
 | |
| 
 | |
| ## Instructions
 | |
| <section id='instructions'>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| ## Tests
 | |
| <section id='tests'>
 | |
| 
 | |
| ```yml
 | |
| tests:
 | |
|   - text: <code>hashJoin</code> is a function.
 | |
|     testString: assert(typeof hashJoin === 'function', '<code>hashJoin</code> is a function.');
 | |
|   - text: '<code>hashJoin([{ age: 27, name: "Jonah" }, { age: 18, name: "Alan" }, { age: 28, name: "Glory" }, { age: 18, name: "Popeye" }, { age: 28, name: "Alan" }], [{ character: "Jonah", nemesis: "Whales" }, { character: "Jonah", nemesis: "Spiders" }, { character: "Alan", nemesis: "Ghosts" }, { character:"Alan", nemesis: "Zombies" }, { character: "Glory", nemesis: "Buffy" }, { character: "Bob", nemesis: "foo" }])</code> should return <code>[{"A_age": 27,"A_name": "Jonah", "B_character": "Jonah", "B_nemesis": "Whales"}, {"A_age": 27,"A_name": "Jonah", "B_character": "Jonah", "B_nemesis": "Spiders"}, {"A_age": 18,"A_name": "Alan", "B_character": "Alan", "B_nemesis": "Ghosts"}, {"A_age": 18,"A_name": "Alan", "B_character": "Alan", "B_nemesis": "Zombies"}, {"A_age": 28,"A_name": "Glory", "B_character": "Glory", "B_nemesis": "Buffy"}, {"A_age": 28,"A_name": "Alan", "B_character": "Alan", "B_nemesis": "Ghosts"}, {"A_age": 28,"A_name": "Alan", "B_character": "Alan", "B_nemesis": "Zombies"}]</code>'
 | |
|     testString: 'assert.deepEqual(hashJoin(hash1, hash2), res, ''<code>hashJoin([{ age: 27, name: "Jonah" }, { age: 18, name: "Alan" }, { age: 28, name: "Glory" }, { age: 18, name: "Popeye" }, { age: 28, name: "Alan" }], [{ character: "Jonah", nemesis: "Whales" }, { character: "Jonah", nemesis: "Spiders" }, { character: "Alan", nemesis: "Ghosts" }, { character:"Alan", nemesis: "Zombies" }, { character: "Glory", nemesis: "Buffy" }, { character: "Bob", nemesis: "foo" }])</code> should return <code>[{"A_age": 27,"A_name": "Jonah", "B_character": "Jonah", "B_nemesis": "Whales"}, {"A_age": 27,"A_name": "Jonah", "B_character": "Jonah", "B_nemesis": "Spiders"}, {"A_age": 18,"A_name": "Alan", "B_character": "Alan", "B_nemesis": "Ghosts"}, {"A_age": 18,"A_name": "Alan", "B_character": "Alan", "B_nemesis": "Zombies"}, {"A_age": 28,"A_name": "Glory", "B_character": "Glory", "B_nemesis": "Buffy"}, {"A_age": 28,"A_name": "Alan", "B_character": "Alan", "B_nemesis": "Ghosts"}, {"A_age": 28,"A_name": "Alan", "B_character": "Alan", "B_nemesis": "Zombies"}]</code>'');'
 | |
| 
 | |
| ```
 | |
| 
 | |
| </section>
 | |
| 
 | |
| ## Challenge Seed
 | |
| <section id='challengeSeed'>
 | |
| 
 | |
| <div id='js-seed'>
 | |
| 
 | |
| ```js
 | |
| function hashJoin (hash1, hash2) {
 | |
|   // Good luck!
 | |
|   return [];
 | |
| }
 | |
| ```
 | |
| 
 | |
| </div>
 | |
| 
 | |
| 
 | |
| ### After Test
 | |
| <div id='js-teardown'>
 | |
| 
 | |
| ```js
 | |
| const hash1 = [
 | |
|     { age: 27, name: 'Jonah' },
 | |
|     { age: 18, name: 'Alan' },
 | |
|     { age: 28, name: 'Glory' },
 | |
|     { age: 18, name: 'Popeye' },
 | |
|     { age: 28, name: 'Alan' }
 | |
| ];
 | |
| 
 | |
| const hash2 = [
 | |
|     { character: 'Jonah', nemesis: 'Whales' },
 | |
|     { character: 'Jonah', nemesis: 'Spiders' },
 | |
|     { character: 'Alan', nemesis: 'Ghosts' },
 | |
|     { character: 'Alan', nemesis: 'Zombies' },
 | |
|     { character: 'Glory', nemesis: 'Buffy' },
 | |
|     { character: 'Bob', nemesis: 'foo' }
 | |
| ];
 | |
| 
 | |
| const res = [
 | |
|     { A_age: 27, A_name: 'Jonah', B_character: 'Jonah', B_nemesis: 'Whales' },
 | |
|     { A_age: 27, A_name: 'Jonah', B_character: 'Jonah', B_nemesis: 'Spiders' },
 | |
|     { A_age: 18, A_name: 'Alan', B_character: 'Alan', B_nemesis: 'Ghosts' },
 | |
|     { A_age: 18, A_name: 'Alan', B_character: 'Alan', B_nemesis: 'Zombies' },
 | |
|     { A_age: 28, A_name: 'Glory', B_character: 'Glory', B_nemesis: 'Buffy' },
 | |
|     { A_age: 28, A_name: 'Alan', B_character: 'Alan', B_nemesis: 'Ghosts' },
 | |
|     { A_age: 28, A_name: 'Alan', B_character: 'Alan', B_nemesis: 'Zombies' }
 | |
| ];
 | |
| 
 | |
| const bench1 = [{ name: 'u2v7v', num: 1 }, { name: 'n53c8', num: 10 }, { name: 'oysce', num: 9 }, { name: '0mto2s', num: 1 }, { name: 'vkh5id', num: 4 }, { name: '5od0cf', num: 8 }, { name: 'uuulue', num: 10 }, { name: '3rgsbi', num: 9 }, { name: 'kccv35r', num: 4 }, { name: '80un74', num: 9 }, { name: 'h4pp3', num: 6 }, { name: '51bit', num: 7 }, { name: 'j9ndf', num: 8 }, { name: 'vf3u1', num: 10 }, { name: 'g0bw0om', num: 10 }, { name: 'j031x', num: 7 }, { name: 'ij3asc', num: 9 }, { name: 'byv83y', num: 8 }, { name: 'bjzp4k', num: 4 }, { name: 'f3kbnm', num: 10 }];
 | |
| const bench2 = [{ friend: 'o8b', num: 8 }, { friend: 'ye', num: 2 }, { friend: '32i', num: 5 }, { friend: 'uz', num: 3 }, { friend: 'a5k', num: 4 }, { friend: 'uad', num: 7 }, { friend: '3w5', num: 10 }, { friend: 'vw', num: 10 }, { friend: 'ah', num: 4 }, { friend: 'qv', num: 7 }, { friend: 'ozv', num: 2 }, { friend: '9ri', num: 10 }, { friend: '7nu', num: 4 }, { friend: 'w3', num: 9 }, { friend: 'tgp', num: 8 }, { friend: 'ibs', num: 1 }, { friend: 'ss7', num: 6 }, { friend: 'g44', num: 9 }, { friend: 'tab', num: 9 }, { friend: 'zem', num: 10 }];
 | |
| ```
 | |
| 
 | |
| </div>
 | |
| 
 | |
| </section>
 | |
| 
 | |
| ## Solution
 | |
| <section id='solution'>
 | |
| 
 | |
| 
 | |
| ```js
 | |
| function hashJoin (hash1, hash2) {
 | |
|   const hJoin = (tblA, tblB, strJoin) => {
 | |
|     const [jA, jB] = strJoin.split('=');
 | |
|     const M = tblB.reduce((a, x) => {
 | |
|       const id = x[jB];
 | |
|       return (
 | |
|         a[id] ? a[id].push(x) : (a[id] = [x]),
 | |
|         a
 | |
|       );
 | |
|     }, Object.create(null)); // null prototype: keys such as "constructor" cannot hit inherited properties
 | |
| 
 | |
|     return tblA.reduce((a, x) => {
 | |
|       const match = M[x[jA]];
 | |
|       return match ? (
 | |
|                 a.concat(match.map(row => dictConcat(x, row)))
 | |
|             ) : a;
 | |
|     }, []);
 | |
|   };
 | |
| 
 | |
|   const dictConcat = (dctA, dctB) => {
 | |
|     const ok = Object.keys;
 | |
|     return ok(dctB).reduce(
 | |
|             // The comma operator returns the accumulator even when the copied value is falsy (0, '', null)
 | |
|             (a, k) => (a[`B_${k}`] = dctB[k], a),
 | |
|             ok(dctA).reduce(
 | |
|                 (a, k) => (a[`A_${k}`] = dctA[k], a), {}
 | |
|             )
 | |
|         );
 | |
|   };
 | |
| 
 | |
|   return hJoin(hash1, hash2, 'name=character');
 | |
| }
 | |
| 
 | |
| 
 | |
| ```
 | |
| 
 | |
| </section>
 |