summaryrefslogtreecommitdiff
path: root/node_modules/js-tokens/README.md
diff options
context:
space:
mode:
authorAlexander Neonxp Kiryukhin <i@neonxp.ru>2024-08-18 13:29:54 +0300
committerAlexander Neonxp Kiryukhin <i@neonxp.ru>2024-08-18 13:29:54 +0300
commitfd70f95224374d23157ee7c0357733102cd0df53 (patch)
treee490c12e021cedaf211b292d5d623baa32a673fc /node_modules/js-tokens/README.md
initialHEADmaster
Diffstat (limited to 'node_modules/js-tokens/README.md')
-rw-r--r--node_modules/js-tokens/README.md240
1 files changed, 240 insertions, 0 deletions
diff --git a/node_modules/js-tokens/README.md b/node_modules/js-tokens/README.md
new file mode 100644
index 0000000..00cdf16
--- /dev/null
+++ b/node_modules/js-tokens/README.md
@@ -0,0 +1,240 @@
+Overview [![Build Status](https://travis-ci.org/lydell/js-tokens.svg?branch=master)](https://travis-ci.org/lydell/js-tokens)
+========
+
+A regex that tokenizes JavaScript.
+
+```js
+var jsTokens = require("js-tokens").default
+
+var jsString = "var foo=opts.foo;\n..."
+
+jsString.match(jsTokens)
+// ["var", " ", "foo", "=", "opts", ".", "foo", ";", "\n", ...]
+```
+
+
+Installation
+============
+
+`npm install js-tokens`
+
+```js
+import jsTokens from "js-tokens"
+// or:
+var jsTokens = require("js-tokens").default
+```
+
+
+Usage
+=====
+
+### `jsTokens` ###
+
+A regex with the `g` flag that matches JavaScript tokens.
+
+The regex _always_ matches, even invalid JavaScript and the empty string.
+
+The next match is always directly after the previous.
+
+### `var token = matchToToken(match)` ###
+
+```js
+import {matchToToken} from "js-tokens"
+// or:
+var matchToToken = require("js-tokens").matchToToken
+```
+
+Takes a `match` returned by `jsTokens.exec(string)`, and returns a `{type:
+String, value: String}` object. The following types are available:
+
+- string
+- comment
+- regex
+- number
+- name
+- punctuator
+- whitespace
+- invalid
+
+Multi-line comments and strings also have a `closed` property indicating if the
+token was closed or not (see below).
+
+Comments and strings both come in several flavors. To distinguish them, check if
+the token starts with `//`, `/*`, `'`, `"` or `` ` ``.
+
+Names are ECMAScript IdentifierNames, that is, including both identifiers and
+keywords. You may use [is-keyword-js] to tell them apart.
+
+Whitespace includes both line terminators and other whitespace.
+
+[is-keyword-js]: https://github.com/crissdev/is-keyword-js
+
+
+ECMAScript support
+==================
+
+The intention is to always support the latest ECMAScript version whose feature
+set has been finalized.
+
+If adding support for a newer version requires changes, a new version with a
+major verion bump will be released.
+
+Currently, ECMAScript 2018 is supported.
+
+
+Invalid code handling
+=====================
+
+Unterminated strings are still matched as strings. JavaScript strings cannot
+contain (unescaped) newlines, so unterminated strings simply end at the end of
+the line. Unterminated template strings can contain unescaped newlines, though,
+so they go on to the end of input.
+
+Unterminated multi-line comments are also still matched as comments. They
+simply go on to the end of the input.
+
+Unterminated regex literals are likely matched as division and whatever is
+inside the regex.
+
+Invalid ASCII characters have their own capturing group.
+
+Invalid non-ASCII characters are treated as names, to simplify the matching of
+names (except unicode spaces which are treated as whitespace). Note: See also
+the [ES2018](#es2018) section.
+
+Regex literals may contain invalid regex syntax. They are still matched as
+regex literals. They may also contain repeated regex flags, to keep the regex
+simple.
+
+Strings may contain invalid escape sequences.
+
+
+Limitations
+===========
+
+Tokenizing JavaScript using regexes—in fact, _one single regex_—won’t be
+perfect. But that’s not the point either.
+
+You may compare jsTokens with [esprima] by using `esprima-compare.js`.
+See `npm run esprima-compare`!
+
+[esprima]: http://esprima.org/
+
+### Template string interpolation ###
+
+Template strings are matched as single tokens, from the starting `` ` `` to the
+ending `` ` ``, including interpolations (whose tokens are not matched
+individually).
+
+Matching template string interpolations requires recursive balancing of `{` and
+`}`—something that JavaScript regexes cannot do. Only one level of nesting is
+supported.
+
+### Division and regex literals collision ###
+
+Consider this example:
+
+```js
+var g = 9.82
+var number = bar / 2/g
+
+var regex = / 2/g
+```
+
+A human can easily understand that in the `number` line we’re dealing with
+division, and in the `regex` line we’re dealing with a regex literal. How come?
+Because humans can look at the whole code to put the `/` characters in context.
+A JavaScript regex cannot. It only sees forwards. (Well, ES2018 regexes can also
+look backwards. See the [ES2018](#es2018) section).
+
+When the `jsTokens` regex scans throught the above, it will see the following
+at the end of both the `number` and `regex` rows:
+
+```js
+/ 2/g
+```
+
+It is then impossible to know if that is a regex literal, or part of an
+expression dealing with division.
+
+Here is a similar case:
+
+```js
+foo /= 2/g
+foo(/= 2/g)
+```
+
+The first line divides the `foo` variable with `2/g`. The second line calls the
+`foo` function with the regex literal `/= 2/g`. Again, since `jsTokens` only
+sees forwards, it cannot tell the two cases apart.
+
+There are some cases where we _can_ tell division and regex literals apart,
+though.
+
+First off, we have the simple cases where there’s only one slash in the line:
+
+```js
+var foo = 2/g
+foo /= 2
+```
+
+Regex literals cannot contain newlines, so the above cases are correctly
+identified as division. Things are only problematic when there are more than
+one non-comment slash in a single line.
+
+Secondly, not every character is a valid regex flag.
+
+```js
+var number = bar / 2/e
+```
+
+The above example is also correctly identified as division, because `e` is not a
+valid regex flag. I initially wanted to future-proof by allowing `[a-zA-Z]*`
+(any letter) as flags, but it is not worth it since it increases the amount of
+ambigous cases. So only the standard `g`, `m`, `i`, `y` and `u` flags are
+allowed. This means that the above example will be identified as division as
+long as you don’t rename the `e` variable to some permutation of `gmiyus` 1 to 6
+characters long.
+
+Lastly, we can look _forward_ for information.
+
+- If the token following what looks like a regex literal is not valid after a
+ regex literal, but is valid in a division expression, then the regex literal
+ is treated as division instead. For example, a flagless regex cannot be
+ followed by a string, number or name, but all of those three can be the
+ denominator of a division.
+- Generally, if what looks like a regex literal is followed by an operator, the
+ regex literal is treated as division instead. This is because regexes are
+ seldomly used with operators (such as `+`, `*`, `&&` and `==`), but division
+ could likely be part of such an expression.
+
+Please consult the regex source and the test cases for precise information on
+when regex or division is matched (should you need to know). In short, you
+could sum it up as:
+
+If the end of a statement looks like a regex literal (even if it isn’t), it
+will be treated as one. Otherwise it should work as expected (if you write sane
+code).
+
+### ES2018 ###
+
+ES2018 added some nice regex improvements to the language.
+
+- [Unicode property escapes] should allow telling names and invalid non-ASCII
+ characters apart without blowing up the regex size.
+- [Lookbehind assertions] should allow matching telling division and regex
+ literals apart in more cases.
+- [Named capture groups] might simplify some things.
+
+These things would be nice to do, but are not critical. They probably have to
+wait until the oldest maintained Node.js LTS release supports those features.
+
+[Unicode property escapes]: http://2ality.com/2017/07/regexp-unicode-property-escapes.html
+[Lookbehind assertions]: http://2ality.com/2017/05/regexp-lookbehind-assertions.html
+[Named capture groups]: http://2ality.com/2017/05/regexp-named-capture-groups.html
+
+
+License
+=======
+
+[MIT](LICENSE).