Parser: jashkenas CoffeeScript
===
A breakdown of the original CoffeeScript parser, to help understand it and eventually provide a guide to contributing. Based on the thread here:
The original CoffeeScript parser. This parser is where our current updates for CS6 are going. While it's not the easiest codebase to work with, it is the current target for our short-term goal of getting ES6 classes and modules into CoffeeScript.
Pros:
- It gets the job done.
- There's coffeelint for linting.
- It is the "standard" to ensure compatibility with.
Cons:
- The codebase is crufty and convoluted.
- Many pull requests are mired in discussion and may never be integrated.
- Adding new features is complicated.
- It is too permissive: it accepts more input code as legal than was intended, while still outputting valid JavaScript. This has led to disagreements over what is legal CoffeeScript and what is not.
For a great explanation of the parser concepts, you should watch the video on CS; it's very informative and gives a great background on the jashkenas parser.
There is also a practical example from the conversation here by lydell:
jashkenas/coffeescript: No preprocessor. This is what it does:
1. Tokenize a string of CoffeeScript into an array of tokens.
2. Feed that array of tokens to what is called the Rewriter, which inserts fake tokens for implicit parentheses and braces here and there.
3. Feed the "rewritten" array of tokens into a parser generated by Jison from grammar.coffee.
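The three-stage shape of that pipeline can be sketched in plain JavaScript. This is a toy illustration of the tokenize → rewrite → parse flow, not the real CoffeeScript internals; all function names and token tags here are hypothetical.

```javascript
// Stage 1: split a source string into crude [tag, value] token pairs.
function tokenize(source) {
  return source.split(/\s+/).filter(Boolean).map(word =>
    /^\d+$/.test(word) ? ['NUMBER', word] : ['IDENTIFIER', word]
  );
}

// Stage 2: stand-in for the Rewriter — insert a fake token (here, a
// statement TERMINATOR) into the stream.
function rewrite(tokens) {
  return [...tokens, ['TERMINATOR', '\n']];
}

// Stage 3: stand-in for the Jison-generated parser — build a trivial AST.
function parse(tokens) {
  return { type: 'Block', body: tokens.filter(([tag]) => tag !== 'TERMINATOR') };
}

const ast = parse(rewrite(tokenize('alert 42')));
console.log(ast.body.length); // 2
```

The real lexer, Rewriter, and grammar are of course far richer, but every CoffeeScript compile passes through this same three-stage shape.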
The reason the Rewriter exists is that nobody has been able to write a Jison grammar that handles the implicit parentheses and braces directly.
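To make the "fake tokens" idea concrete, here is a hypothetical mini-rewriter in JavaScript that wraps an implicit call (`alert 42` style) in explicit `CALL_START` / `CALL_END` tokens, roughly the kind of transformation rewriter.coffee performs. The logic is deliberately simplified.

```javascript
// Insert fake CALL_START / CALL_END tokens around an implicit call:
// an IDENTIFIER followed directly by a NUMBER is treated as a call.
function insertImplicitParens(tokens) {
  const out = [];
  for (let i = 0; i < tokens.length; i++) {
    const [tag] = tokens[i];
    out.push(tokens[i]);
    if (tag === 'IDENTIFIER' && tokens[i + 1] && tokens[i + 1][0] === 'NUMBER') {
      out.push(['CALL_START', '(']);
      out.push(tokens[i + 1]);
      out.push(['CALL_END', ')']);
      i++; // skip the argument we just wrapped
    }
  }
  return out;
}

const rewritten = insertImplicitParens([['IDENTIFIER', 'alert'], ['NUMBER', '42']]);
console.log(rewritten.map(([tag]) => tag).join(' '));
// IDENTIFIER CALL_START NUMBER CALL_END
```

Doing this as a token-stream pass keeps the grammar itself free of implicit-parenthesis special cases, which is exactly the trade-off described above.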
This section will eventually have a better breakdown of all of the pieces; in the short term, you can get a lot of insight from this interaction:
Excerpt from a CS pull request by JimPanic regarding GeoffreyBooth's integration of modules into the CS codebase.
identifierToken basically takes one word or symbol (read: @chunk) at a time, assigns it a name or type, and creates a token in the form of a token tuple [tag, value, offsetInChunk, length, origin]. This is what the functions token and subsequently makeToken create.
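The tuple shape described above can be sketched as follows. This `makeToken` is a stand-in to show the five-slot structure, not the real lexer function.

```javascript
// Build a token tuple of the shape [tag, value, offsetInChunk, length, origin].
function makeToken(tag, value, offsetInChunk, length) {
  const origin = null; // the real lexer may attach an origin token here
  return [tag, value, offsetInChunk, length, origin];
}

const token = makeToken('IMPORT', 'import', 0, 6);
console.log(token[0], token[1]); // IMPORT import
```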
In identifierToken there are a few key variables and functions that are needed:
- @chunk: the current string to handle; this is split up into [input, id, colon] with the IDENTIFIER regular expression at the bottom.
- id: in case of import, this is literally 'import'.
- @tag(): gets the tag (first value of the token tuple) of the last processed token. When processing foo (as in the second chunk of import 'foo'), @tag() will return 'IMPORT'.
- @value(): gets the value (second value of the token tuple) of the last processed token. When processing foo (as in the second chunk of import 'foo'), @value() will return import, the very string that was held in id in the last chunk's handling.

So basically what I added to identifierToken was the tags IMPORT, IMPORT_AS, IMPORT_FROM, as well as the variable @seenImport, to know that when I encounter an as or a from, this will be from an import and not a yield or similar. This also means in theory that from can still be used as an identifier as well. We have to test that though. :)
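The @seenImport idea above can be sketched in plain JavaScript. The names and dispatch logic here are illustrative stand-ins for the real lexer's, but they show why the flag lets `from` remain a plain identifier outside an import statement.

```javascript
// Toy lexer-like object: tags `as` / `from` as IMPORT_AS / IMPORT_FROM
// only when a preceding `import` set the seenImport flag.
const lexer = {
  seenImport: false,
  tokens: [],
  identifierToken(id) {
    let tag = 'IDENTIFIER';
    if (id === 'import') {
      this.seenImport = true;
      tag = 'IMPORT';
    } else if (this.seenImport && id === 'as') {
      tag = 'IMPORT_AS';
    } else if (this.seenImport && id === 'from') {
      tag = 'IMPORT_FROM';
    }
    this.tokens.push([tag, id]);
  },
};

for (const word of ['import', 'foo', 'from']) lexer.identifierToken(word);
console.log(lexer.tokens.map(([tag]) => tag).join(' '));
// IMPORT IDENTIFIER IMPORT_FROM
```

With `seenImport` false, the same `from` would be tagged IDENTIFIER, which is the "from can still be used as an identifier" point in the excerpt.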
These three tags are then used in grammar.coffee.
There's also code that resets @seenImport when the statement is terminated (inlineToken iirc).
For this part I took a look at the spec for imports and basically copied the structure from there.
The DSL used here basically mixes and matches tags and named grammar forms. In this case the tags are 'IMPORT', 'IMPORT_AS', 'IMPORT_FROM', as replaced in lexer.coffee's identifierToken. The other parts of those strings are just other named grammar forms (ImportsList, OptComma, Identifier, etc.).
The structure builds up through references to other grammar forms and functions that create and return data structures, like -> new Import $2. $n variables are just references to the nth word in the string.
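A minimal JavaScript sketch of the `o` / `$n` idea, under the assumption that `o` simply pairs a pattern string with an action whose positional arguments play the role of $1..$n. This `rule.action` call simulates what the generated parser does on a match; the real Jison machinery is far richer.

```javascript
// `o` pairs a grammar pattern with an action, as in grammar.coffee.
function o(pattern, action) {
  return { pattern, action };
}

// Hypothetical node class standing in for the one in nodes.coffee.
class Import {
  constructor(source, clause) {
    this.source = source;
    this.clause = clause;
  }
}

// o 'IMPORT String', -> new Import $2  becomes, roughly:
const rule = o('IMPORT String', ($1, $2) => new Import($2));

// Simulate the parser matching the two symbols of `import 'foo'`:
const node = rule.action('import', 'foo');
console.log(node instanceof Import, node.source); // true foo
```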
This process leads to an AST that is passed to the Import class defined in nodes.coffee.
Off the top of my head this should look as follows:
```coffee
# import 'foo' will yield something like:
new Import(Value { value: 'foo' })

# import { foo } from 'foo' will yield something like:
new Import(Value { value: 'foo' }, ImportsList { .... })
```
You can look at this AST quite easily by just prepending a console.log before calling new Import:
```coffee
Import: [
  o 'IMPORT String',                          -> console.log($2); new Import $2
  o 'IMPORT ImportClause IMPORT_FROM String', -> console.log($4, $2); new Import $4, $2
]
```
Taking the AST from grammar.coffee, the classes in nodes.coffee are supposed to create tuples of "code" through @makeCode and the compileNode functions. I'm not entirely clear on this part yet, but each node is compiled to a string by calling compileNode or compileToFragments. What Import.compileNode basically does is just look at the AST and either return an array of strings passed through @makeCode directly OR call the token's compileNode function.
This part is still a bit of magic for me, as the function names and processes don't seem to line up with my way of thinking.
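The fragment idea from the excerpt can be sketched as follows. The `makeCode` and `compileNode` here only mimic the shape described above (a node returning an array of code fragments that get joined into output JavaScript); they are not the real nodes.coffee API, and the emitted string is illustrative.

```javascript
// A fragment wraps a piece of output code; real fragments also carry
// source-location data for source maps.
function makeCode(code) {
  return { code };
}

// Hypothetical node whose compileNode returns an array of fragments.
class ImportNode {
  constructor(source) {
    this.source = source;
  }
  compileNode() {
    return [makeCode('import '), makeCode(this.source), makeCode(';')];
  }
}

// The compiler then joins the fragments into the output string:
const fragments = new ImportNode("'foo'").compileNode();
const js = fragments.map(f => f.code).join('');
console.log(js); // import 'foo';
```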