I am parsing an old language (PL/I) that uses single quotes for strings and allows an embedded single quote to be written as two single quotes, for instance:
'This is a normal string'
'This contains '' embedded quotes ''' // interpreted as "This contains ' embedded quotes '"
This ANTLR4 lexer rule works fine for "normal" quoted strings:
STR_CONSTANT : '\'' (~['\\])* '\'';
...but this one, which looks like it should handle the embedded doubled quotes, chokes and ends up causing a stack overflow when I analyze a sample piece of code:
STR_CONSTANT : '\'' ( '\'\'' | ~['\\] )* '\'' ;
Does anybody have a different idea for how I might build that lexer rule, or know whether this is a limitation?
I did think about a recursive parser rule like this (using the original single-quote lexer rule), but that doesn't seem to work either:
strconstant: STR_CONSTANT
| strconstant STR_CONSTANT
;
asked Mar 3 at 14:06 by Andrew Clark
1 Answer
I took a PL/I program from CodeProject, Basic Encryption, as the example file.
I added a little code and text to demonstrate the recognition of strings containing doubled single quotes.
The grammar is a dirty solution that will not scale to a complete PL/I grammar.
expr is also used for the simple parameter lists of procedures.
The key to making the string recognition work is lexing and parsing the code line by line.
The STRING rule is greedy, so it has to stop at some point, and here that point is simply the end of the line. This should be a suitable approach for PL/I in general.
The solution works only as long as there is a single string per line. As soon as the syntax allows two strings on one line, some delimiter has to be treated as a stop item, just as the newline is in the single-string case.
In the output, please check the string recognition for the line
PUT SKIP LIST ('--- PROGRAM TERMINATED ''now''---');
Then inspect the representation of the line
PUT SKIP LIST ('--- PROGRAM STARTS ''now''---' || '''or'' never!');
You will notice that the string concatenation operator || is recognized as part of a single string.
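The greedy match described here can be reproduced outside ANTLR. This is a minimal sketch that uses a Python regular expression as a stand-in for the STRING lexer rule; the names GREEDY_STRING and greedy_strings are made up for illustration, and a regex only approximates what the generated lexer does.

```python
import re

# Regex analogue (an approximation, not ANTLR itself) of the lexer rule
#   STRING : '\'' ~('\r'|'\n')* '\'' ;
# The greedy star runs on to the LAST quote on the line.
GREEDY_STRING = re.compile(r"'[^\r\n]*'")


def greedy_strings(line):
    """Return every match of the greedy rule within one source line."""
    return GREEDY_STRING.findall(line)


one = "PUT SKIP LIST ('--- PROGRAM TERMINATED ''now''---');"
two = "PUT SKIP LIST ('--- PROGRAM STARTS ''now''---' || '''or'' never!');"

print(greedy_strings(one))  # a single literal, doubled quotes included
print(greedy_strings(two))  # also a single match, with the || swallowed
```

For the first line the single match is exactly the intended literal. For the second line the single match spans both literals and the || between them, which is the behaviour to look for in the parse output.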
Now, in the grammar, change the STRING definition from
STRING : '\'' ~('\r'|'\n')* '\'' ;
to
STRING : '\'' ~('\r'|'\n'|'|')* '\'' ;
and check the statement recognition again.
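To see what the changed rule does to that line, here is a small sketch that again uses a Python regular expression as a stand-in for the lexer rule. NO_PIPE_STRING and no_pipe_strings are hypothetical names, and the 'A|B' statement is invented for this sketch to show the cost of using | as a stop item.

```python
import re

# Regex analogue (an approximation) of the changed lexer rule
#   STRING : '\'' ~('\r'|'\n'|'|')* '\'' ;
# A string may no longer contain '|', so matching stops before "||".
NO_PIPE_STRING = re.compile(r"'[^\r\n|]*'")


def no_pipe_strings(line):
    """Return every match of the '|'-excluding rule within one line."""
    return NO_PIPE_STRING.findall(line)


line = "PUT SKIP LIST ('--- PROGRAM STARTS ''now''---' || '''or'' never!');"
print(no_pipe_strings(line))  # two separate literals

# Invented input: a literal that itself contains '|' cannot match at all.
print(no_pipe_strings("PUT SKIP LIST ('A|B');"))
```

The two operands of || now come out as separate strings. The second call returns an empty list, so this workaround only holds as long as no string literal in the program contains a | character.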
Finally, there should be a simple way to get rid of the message
line 200:59 missing LINEEND at '<EOF>'
which appears if you copied the example PL/I program exactly as is. It most likely means that the last line of the file does not end with a newline.
Grammar file PL1.g4
grammar PL1;

allCode : pgmLine+ ;

pgmLine : COMMENT LINEEND
        | stat LINEEND
        | expr LINEEND
        | LINEEND
        ;

stat : ID ':' 'PROC' 'OPTIONS' '(' ID ')' 'REORDER'
     | ID ':' 'PROC' '(' ID ')' 'RETURNS' '(' TYPE expr ')'
     | 'DCL' ID 'FIXED'? TYPE expr ('INIT' expr )?
     | 'DO' ID '=' NUM 'TO' expr
     | 'PUT SKIP LIST' expr
     | 'END' ID?
     | 'GET' ('SKIP(0) LIST'|'EDIT') expr ('(' TYPE expr ')')?
     | 'IF' expr 'THEN' 'DO'?
     | 'DO'
     | 'ELSE' 'IF' expr 'THEN' 'DO'?
     | 'ELSE' 'DO'?
     | ID '=' 'SUBSTR' '(' ID ',' ID ',' NUM ')'
     | 'SUBSTR' '(' ID ',' ID ',' NUM ')' '=' ID
     | 'WHEN' '(' expr ')' ID '=' expr
     ;

expr : '(' expr ')'
     | expr '||' expr
     | expr '+' expr
     | expr '-' expr
     | expr '<' expr
     | ID expr
     | ID '=' expr expr?
     | ID
     | STRING
     | NUM
     ;

TYPE : 'DEC'
     | 'CHAR'
     | 'A'
     ;

ID      : [a-zA-Z_][a-zA-Z_0-9]* ;
NUM     : [0-9]+ ;
STRING  : '\'' ~('\r'|'\n')* '\'' ;
COMMENT : '/*' ~('\r'|'\n')* '*/' ;
WS      : [ \t]+ -> skip ;
LINEEND : ';'? '\r'? '\n' ;
PL/I example file pgm.pl1
/* PROCEDURE ENCRYPTOR */
SKAKOS1: PROC OPTIONS(MAIN) REORDER;
DCL ENCRYPT_KEY DEC(2) INIT (1);
DCL DECRYPT_KEY DEC(2) INIT (1);
DCL USER_TEXT CHAR(50) INIT (' ');
DCL RESULT CHAR(100) INIT (' ');
DCL HUO0 CHAR(1) INIT (' ');
DCL HUO1 CHAR(1) INIT (' ');
DCL EXECMODE FIXED DEC (1);
DCL STOPPROG FIXED DEC (1);
DCL I FIXED DEC (2);
ENCRYPT_KEY = 1;
DECRYPT_KEY = 1;
/*MAIN*/
DO I = 1 TO 25;
PUT SKIP LIST('');
END;
/* Next line inserted by MR */
PUT SKIP LIST ('--- PROGRAM STARTS ''now''---' || '''or'' never!');
PUT SKIP LIST ('HUO ENCRYPTOR - CIA Version (128 bit)');
PUT SKIP LIST(' ');
PUT SKIP LIST('SELECT PROGRAM MODE (1 or 2)');
PUT SKIP LIST('----------------------------');
PUT SKIP LIST('OPTION 1: ENCRYPT');
PUT SKIP LIST('OPTION 2: DECRYPT');
PUT SKIP LIST(' ');
GET SKIP(0) LIST(EXECMODE);
PUT SKIP LIST(' ');
PUT SKIP LIST('ENTER TEXT:');
GET EDIT (USER_TEXT)(A(50));
IF (EXECMODE = 1) THEN
DO;
RESULT = ENCRYPT(USER_TEXT);
END;
ELSE IF (EXECMODE = 2) THEN
DO;
RESULT = DECRYPT(USER_TEXT);
END;
PUT SKIP LIST(' ');
PUT SKIP LIST ('RESULT: ' || RESULT);
/* Next line changed by MR */
PUT SKIP LIST ('--- PROGRAM TERMINATED ''now''---');
/*-------------------------------------------------------*/
ENCRYPT:PROC(INPUT_TEXT) RETURNS(CHAR(50));
DCL INPUT_TEXT CHAR(50);
DCL OUTPUT_TEXT CHAR(50);
DCL I DEC(2);
DCL HUO0 CHAR(1);
DCL HUO1 CHAR(1);
OUTPUT_TEXT = INPUT_TEXT;
/*PUT SKIP LIST('INPUT LENGTH: ',LENGTH(INPUT_TEXT));*/
DO I = 1 TO LENGTH(INPUT_TEXT);
HUO0 = SUBSTR(INPUT_TEXT,I,1);
IF HUO0 = ' ' THEN DO;
HUO1 = ' ';
END;
ELSE DO;
HUO1 = ASCII_TO_CHAR((CHAR_TO_ASCII(HUO0) + ENCRYPT_KEY));
END;
SUBSTR(OUTPUT_TEXT,I,1) = HUO1;
/*PUT SKIP LIST('I = ' || I);*/
END;
RETURN(OUTPUT_TEXT);
END ENCRYPT;
/*-------------------------------------------------------*/
/*-------------------------------------------------------*/
DECRYPT:PROC(INPUT_TEXT) RETURNS(CHAR(50));
DCL INPUT_TEXT CHAR(50);
DCL OUTPUT_TEXT CHAR(50);
DCL I DEC(2);
DCL HUO0 CHAR(1);
DCL HUO1 CHAR(1);
OUTPUT_TEXT = INPUT_TEXT;
DO I = 1 TO LENGTH(INPUT_TEXT);
HUO0 = SUBSTR(INPUT_TEXT,I,1);
IF HUO0 = ' ' THEN DO;
HUO1 = ' ';
END;
ELSE DO;
HUO1 = ASCII_TO_CHAR((CHAR_TO_ASCII(HUO0) - DECRYPT_KEY));
END;
SUBSTR(OUTPUT_TEXT,I,1) = HUO1;
END ;
RETURN(OUTPUT_TEXT);
END DECRYPT;
/*-------------------------------------------------------*/
/*-------------------------------------------------------*/
CHAR_TO_ASCII:PROC(INPUT_CHAR) RETURNS(DEC(2));
DCL INPUT_CHAR CHAR(1);
DCL OUTPUT_NUM DEC(2);
DCL DEBUG_NUM DEC(1);
SELECT (INPUT_CHAR);
WHEN ('A') OUTPUT_NUM = 1;
WHEN ('B') OUTPUT_NUM = 2;
WHEN ('C') OUTPUT_NUM = 3;
WHEN ('D') OUTPUT_NUM = 4;
WHEN ('E') OUTPUT_NUM = 5;
WHEN ('F') OUTPUT_NUM = 6;
WHEN ('G') OUTPUT_NUM = 7;
WHEN ('H') OUTPUT_NUM = 8;
WHEN ('I') OUTPUT_NUM = 9;
WHEN ('J') OUTPUT_NUM = 10;
WHEN ('K') OUTPUT_NUM = 11;
WHEN ('L') OUTPUT_NUM = 12;
WHEN ('M') OUTPUT_NUM = 13;
WHEN ('N') OUTPUT_NUM = 14;
WHEN ('O') OUTPUT_NUM = 15;
WHEN ('P') OUTPUT_NUM = 16;
WHEN ('Q') OUTPUT_NUM = 17;
WHEN ('R') OUTPUT_NUM = 18;
WHEN ('S') OUTPUT_NUM = 19;
WHEN ('T') OUTPUT_NUM = 20;
WHEN ('U') OUTPUT_NUM = 21;
WHEN ('V') OUTPUT_NUM = 22;
WHEN ('W') OUTPUT_NUM = 23;
WHEN ('X') OUTPUT_NUM = 24;
WHEN ('Y') OUTPUT_NUM = 25;
WHEN ('Z') OUTPUT_NUM = 26;
OTHERWISE OUTPUT_NUM = 99;
END;
/*PUT SKIP LIST('DEBUG CHECKPOINT 4');*/
RETURN(OUTPUT_NUM);
END CHAR_TO_ASCII;
/*-------------------------------------------------------*/
/*-------------------------------------------------------*/
ASCII_TO_CHAR:PROC(INPUT_ASCII) RETURNS(CHAR(1));
DCL INPUT_ASCII DEC(2);
DCL OUTPUT_CHAR CHAR(1);
IF INPUT_ASCII = 27 THEN DO;
INPUT_ASCII = 1;
END;
IF INPUT_ASCII < 0 THEN DO;
INPUT_ASCII = 26;
END;
SELECT (INPUT_ASCII);
WHEN (99) OUTPUT_CHAR = ' ';
WHEN (1) OUTPUT_CHAR = 'A';
WHEN (2) OUTPUT_CHAR = 'B';
WHEN (3) OUTPUT_CHAR = 'C';
WHEN (4) OUTPUT_CHAR = 'D';
WHEN (5) OUTPUT_CHAR = 'E';
WHEN (6) OUTPUT_CHAR = 'F';
WHEN (7) OUTPUT_CHAR = 'G';
WHEN (8) OUTPUT_CHAR = 'H';
WHEN (9) OUTPUT_CHAR = 'I';
WHEN (10) OUTPUT_CHAR = 'J';
WHEN (11) OUTPUT_CHAR = 'K';
WHEN (12) OUTPUT_CHAR = 'L';
WHEN (13) OUTPUT_CHAR = 'M';
WHEN (14) OUTPUT_CHAR = 'N';
WHEN (15) OUTPUT_CHAR = 'O';
WHEN (16) OUTPUT_CHAR = 'P';
WHEN (17) OUTPUT_CHAR = 'Q';
WHEN (18) OUTPUT_CHAR = 'R';
WHEN (19) OUTPUT_CHAR = 'S';
WHEN (20) OUTPUT_CHAR = 'T';
WHEN (21) OUTPUT_CHAR = 'U';
WHEN (22) OUTPUT_CHAR = 'V';
WHEN (23) OUTPUT_CHAR = 'W';
WHEN (24) OUTPUT_CHAR = 'X';
WHEN (25) OUTPUT_CHAR = 'Y';
WHEN (26) OUTPUT_CHAR = 'Z';
OTHERWISE OUTPUT_CHAR = ' ';
END;
RETURN(OUTPUT_CHAR);
END ASCII_TO_CHAR;
/*-------------------------------------------------------*/
STR_CONSTANT : '\'' ( '\'\'' | ~['\\] )* '\'' ;
is correct. If it doesn't work, please add a small example class/script including example input that shows the problem. – Bart Kiers Commented Mar 3 at 19:23
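As a quick check of the rule quoted in this comment, here is a sketch in Python that mirrors it with a regular expression. It is only an analogue of the lexer rule, not ANTLR itself; STR_CONSTANT as a Python name and string_values are made up for illustration.

```python
import re

# Regex analogue (an approximation) of the lexer rule
#   STR_CONSTANT : '\'' ( '\'\'' | ~['\\] )* '\'' ;
# Inside the quotes: either a doubled quote, or any character that is
# neither a quote nor a backslash.
STR_CONSTANT = re.compile(r"'(?:''|[^'\\])*'")


def string_values(source):
    """Find each literal, strip the outer quotes, decode '' to '."""
    return [
        m.group(0)[1:-1].replace("''", "'")
        for m in STR_CONSTANT.finditer(source)
    ]


print(string_values("'This is a normal string'"))
print(string_values("'This contains '' embedded quotes '''"))
print(string_values(
    "PUT SKIP LIST ('--- PROGRAM STARTS ''now''---' || '''or'' never!');"
))
```

With this pattern the two literals around || are found separately and the doubled quotes decode to single ones, without any line-based or |-based stop item. Note that the character class also excludes the backslash, as in the quoted rule, so a literal containing a backslash would not match.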