Using ParseKit to parse nested rules
I'm trying to create a strict CSS parser with ParseKit that supports
nested rules like those found in SASS and LESS. I'm adapting the sample
CSS and JSON grammars that ship with ParseKit to build my own grammar.
The grammar so far:
@symbols = '//';
@singleLineComments = '//';
@multiLineComments = '/*' '*/';
@wordState = '-' '@';
@start
@before {
    PKTokenizer *t = self.tokenizer;

    // symbols
    [t.symbolState add:@"/*"];
    [t.symbolState add:@"*/"];
    [t.symbolState add:@"//"];
    [t.symbolState add:@"url("];
    [t.symbolState add:@"URL("];
    [t.symbolState add:@"Url("];

    // word chars -moz, -webkit, @media, #id, .class, :hover
    [t setTokenizerState:t.wordState from:'-' to:'-'];
    [t setTokenizerState:t.wordState from:'@' to:'@'];
    [t setTokenizerState:t.wordState from:'.' to:'.'];
    [t setTokenizerState:t.wordState from:'#' to:'#'];
    [t.wordState setWordChars:YES from:'-' to:'-'];
    [t.wordState setWordChars:YES from:'@' to:'@'];
    [t.wordState setWordChars:YES from:'.' to:'.'];
    [t.wordState setWordChars:YES from:'#' to:'#'];

    // comments
    [t setTokenizerState:t.commentState from:'/' to:'/'];
    [t.commentState setFallbackState:t.symbolState from:'/' to:'/'];
    [t.commentState addSingleLineStartMarker:@"//"];
    [t.commentState addMultiLineStartMarker:@"/*" endMarker:@"*/"];
    t.commentState.reportsCommentTokens = YES;

    // urls
    [t setTokenizerState:t.delimitState from:'u' to:'u'];
    [t setTokenizerState:t.delimitState from:'U' to:'U'];
    [t.delimitState addStartMarker:@"url(" endMarker:@")"
                allowedCharacterSet:nil];
    [t.delimitState addStartMarker:@"URL(" endMarker:@")"
                allowedCharacterSet:nil];
    [t.delimitState addStartMarker:@"Url(" endMarker:@")"
                allowedCharacterSet:nil];
}
= ruleset*;
ruleset = selectors openCurly ( decls | selector )
closeCurly;
selectors = selector commaSelector*;
selector = (selectorWord | hashSym | dot | colon | gt |
openBracket | closeBracket | eq | selectorQuotedString | tilde | pipe)+;
selectorWord = Word;
selectorQuotedString = QuotedString;
commaSelector = comma selector;
decls = Empty | actualDecls;
actualDecls = decl decl*;
decl = property colon expr important? semi;
property = Word;
expr = (string | constant | num | url | openParen |
closeParen | comma | nonTerminatingSymbol)+;
url = urlLower | urlUpper;
urlLower = %{'url(', ')'};
urlUpper = %{'URL(', ')'};
nonTerminatingSymbol = {return NE(LS(1), @";") && NE(LS(1), @"!");}?
fwdSlash | Symbol;
important = bang Word;
string = QuotedString;
constant = Word;
openCurly = '{';
closeCurly = '}';
openBracket = '[';
closeBracket = ']';
eq = '=';
comma = ',';
colon = ':';
semi = ';';
openParen = '(';
closeParen = ')';
gt = '>';
tilde = '~';
pipe = '|';
fwdSlash = '/';
hashSym = '#';
dot = '.';
at = '@';
bang = '!';
num = Number;
I thought the key to enabling nested rules was the
ruleset = selectors openCurly ( decls | selector )
closeCurly;
line, allowing a nested selector like the JSON grammar does. But when I
feed in a string like
.myClass1 {
.content {}
}
.myClass2 {}
the assembly stack only shows ['.', 'myClass1', '.', 'content']. It seems
to skip .myClass2 entirely.
Why does this grammar stop parsing when it finds a nested selector? How
can I make it correctly parse the entire stylesheet? How can I keep track
of the ancestry of each selector and rule?
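One direction that might address the first two questions, offered as an
untested sketch rather than a confirmed fix: with ( decls | selector ) the
parser can match the nested selector .content, but it then expects
closeCurly and finds '{' instead, which would explain why matching stops
there. Letting ruleset refer to itself allows a block to hold declarations
and further rulesets in any order. The rule name blockItem is hypothetical,
and the sketch assumes the parser can tell a decl from a nested ruleset even
though both may begin with a Word; that has not been verified against
ParseKit.

ruleset = selectors openCurly blockItem* closeCurly;
blockItem = decl | ruleset;

With this shape an empty block is covered by blockItem*, so decls and
actualDecls would no longer be referenced. For ancestry, one option is to
keep a stack of selector lists outside the grammar, pushing when an
openCurly is matched and popping on closeCurly, so that the stack contents
at any decl give its ancestors.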
info: here is my class which sets up the PKParser and delegate selectors.