---
id: 594faaab4e2a8626833e9c3d
title: Tokenize a string with escaping
challengeType: 5
forumTopicId: 302338
dashedName: tokenize-a-string-with-escaping
---

# --description--

Write a function or program that can split a string at each non-escaped occurrence of a separator character.

It should accept three input parameters:

<ul>
  <li>The <strong>string</strong></li>
  <li>The <strong>separator character</strong></li>
  <li>The <strong>escape character</strong></li>
</ul>

It should output a list of strings.

Rules for splitting:

<ul>
  <li>The fields that were separated by the separators become the elements of the output list.</li>
  <li>Empty fields should be preserved, even at the start and end.</li>
</ul>
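
As a small illustration of the splitting rules alone (ignoring escaping for the moment), JavaScript's built-in `String.prototype.split` already preserves empty fields, including those at the start and end. The input string here is an assumption chosen for illustration and is not one of the challenge's test cases.

```js
// No escaping involved: every '|' acts as a separator.
const fields = '|a||b|'.split('|');

console.log(fields); // [ '', 'a', '', 'b', '' ]
```

The built-in method cannot be used on its own for this challenge, since it has no notion of an escape character.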

Rules for escaping:

<ul>
  <li>"Escaped" means preceded by an occurrence of the escape character that is not already escaped itself.</li>
  <li>When the escape character precedes a character that has no special meaning, it still counts as an escape (but does not do anything special).</li>
  <li>Each occurrence of the escape character that was used to escape something should not become part of the output.</li>
</ul>
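
One way to read the escaping rules above is as a left-to-right scan with a single "escaped" flag. The following is only a sketch of that reading, not part of the challenge's tests: `classifyChars` is a hypothetical helper name, and the sample input is an assumption chosen for illustration. It labels each character as an active escape, a separator, or a literal.

```js
// Hypothetical helper (not part of the challenge): labels every character
// of `str` according to the escaping rules.
function classifyChars(str, sep, esc) {
  let escaped = false; // true when the previous character was an active escape

  return Array.from(str, (ch) => {
    if (escaped) {
      escaped = false;
      return [ch, 'literal']; // an escaped character is always taken literally
    }
    if (ch === esc) {
      escaped = true;
      return [ch, 'escape']; // active escape: not part of the output
    }
    return [ch, ch === sep ? 'separator' : 'literal'];
  });
}

console.log(classifyChars('a^|b|^^', '|', '^'));
// [ [ 'a', 'literal' ], [ '^', 'escape' ], [ '|', 'literal' ],
//   [ 'b', 'literal' ], [ '|', 'separator' ], [ '^', 'escape' ],
//   [ '^', 'literal' ] ]
```

Note that the first `|` is a literal because it is escaped, while the second `^` of the trailing `^^` is a literal because the first one escapes it.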

Demonstrate that your function satisfies the following test case:

Given the string

<pre>one^|uno||three^^^^|four^^^|^cuatro|</pre>

and using `|` as the separator and `^` as the escape character, your function should output the following array:

<pre>  ['one|uno', '', 'three^^', 'four^|cuatro', '']
</pre>

# --hints--

`tokenize` should be a function.

```js
assert(typeof tokenize === 'function');
```

`tokenize` should return an array.

```js
assert(Array.isArray(tokenize('a', 'b', 'c')));
```

`tokenize('one^|uno||three^^^^|four^^^|^cuatro|', '|', '^')` should return `['one|uno', '', 'three^^', 'four^|cuatro', '']`

```js
assert.deepEqual(tokenize(testStr1, '|', '^'), res1);
```

`tokenize('a@&bcd&ef&&@@hi', '&', '@')` should return `['a&bcd', 'ef', '', '@hi']`

```js
assert.deepEqual(tokenize(testStr2, '&', '@'), res2);
```

# --seed--

## --after-user-code--

```js
const testStr1 = 'one^|uno||three^^^^|four^^^|^cuatro|';
const res1 = ['one|uno', '', 'three^^', 'four^|cuatro', ''];

// TODO add more tests
const testStr2 = 'a@&bcd&ef&&@@hi';
const res2 = ['a&bcd', 'ef', '', '@hi'];
```

## --seed-contents--

```js
function tokenize(str, sep, esc) {
  return true;
}
```

# --solutions--

```js
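// tokenize :: String -> Character -> Character -> [String]
function tokenize(str, charDelim, charEsc) {
  // Fold over the characters, tracking the escape state, the token being
  // built, and the list of completed tokens.
  const dctParse = str.split('')
    .reduce((a, x) => {
      // Was the previous character an active (unescaped) escape character?
      const blnEsc = a.esc;
      // An unescaped delimiter ends the current token.
      const blnBreak = !blnEsc && x === charDelim;
      // An unescaped escape character is dropped and escapes the next one.
      const blnEscChar = !blnEsc && x === charEsc;

      return {
        esc: blnEscChar,
        token: blnBreak ? '' : (
          a.token + (blnEscChar ? '' : x)
        ),
        list: a.list.concat(blnBreak ? a.token : [])
      };
    }, {
      esc: false,
      token: '',
      list: []
    });

  // The final token is never followed by a delimiter, so append it here.
  return dctParse.list.concat(
    dctParse.token
  );
}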
```
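
The same state machine can also be written as a plain loop. This is an alternative sketch rather than the reference solution: `tokenizeLoop` is a hypothetical name chosen to avoid clashing with `tokenize`, and it is assumed to behave like the `reduce` version, including silently dropping a trailing unpaired escape character (a case the challenge does not specify).

```js
// Alternative sketch: a single pass with an explicit "escaped" flag.
function tokenizeLoop(str, sep, esc) {
  const tokens = [];
  let current = '';    // the token being built
  let escaped = false; // true when the previous character was an active escape

  for (const ch of str) {
    if (escaped) {
      current += ch;   // escaped character: always taken literally
      escaped = false;
    } else if (ch === esc) {
      escaped = true;  // active escape: dropped from the output
    } else if (ch === sep) {
      tokens.push(current); // unescaped separator: close the current token
      current = '';
    } else {
      current += ch;
    }
  }

  tokens.push(current); // the last field, which may be empty
  return tokens;
}

console.log(tokenizeLoop('one^|uno||three^^^^|four^^^|^cuatro|', '|', '^'));
// [ 'one|uno', '', 'three^^', 'four^|cuatro', '' ]
```

The loop mutates one array instead of concatenating a new list on every character, which may matter for long inputs.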