Title
Lake symbols for island parsing
Authors
Abstract
Context: An island parser reads an input text and builds the parse (or abstract syntax) tree of only the programming constructs of interest in the text. These constructs are called islands, and the rest of the text is called water, which the parser ignores and skips over. Since an island parser does not have to parse all the details of the input, it is often easy to develop, yet still useful enough for a number of software engineering tools. When a parser generator is used, the developer can implement an island parser by describing only a small number of grammar rules, for example, in Parsing Expression Grammar (PEG).

Inquiry: In practice, however, the grammar rules are often complicated, since the developer must define the water inside the island; otherwise, island parsing will not reduce the total number of grammar rules. When describing the grammar rules for such water, the developer must consider other rules and enumerate a set of symbols, which we call alternative symbols. Due to this difficulty, island parsing does not seem to be widely used today despite its usefulness in many applications.

Approach: This paper proposes lake symbols to address this difficulty in developing an island parser. It also presents an extension to PEG that supports lake symbols. Lake symbols automate the enumeration of the alternative symbols for the water inside an island. The paper proposes an algorithm for translating the extended PEG into standard PEG, which can then be given to an existing PEG-based parser generator.

Knowledge: The user can use lake symbols to define water without specifying each alternative symbol. Our algorithms can calculate all alternative symbols for a lake symbol, based on where the lake symbol is used in the grammar.

Grounding: We implemented a parser generator that accepts our extended PEG, and implemented 36 island parsers for Java and 20 island parsers for Python. Our experiments show that lake symbols reduce the number of grammar rules by 42% for Java and by 89% for Python on average, excluding the case where islands are expressions.

Importance: This work eases the use of island parsing. Lake symbols enable the user to define the water inside the island more simply than before. Defining water inside the island is essential for applying island parsing to practical programming languages.
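The following is a minimal, hypothetical sketch in Python that makes the terms "island", "water" and "alternative symbols" concrete. It is not the paper's implementation, its extended-PEG syntax, or its translation algorithm. It is a token-level, PEG-style island parser that extracts the name following each `def` token and skips everything else as water. The `def NAME` island, the rule names and the `BRACKETS` table are all assumptions made for illustration.

The water rule shows why alternative symbols matter inside an island. Water must match a balanced nested group for every bracket pair, or else any single token that is not a delimiter. Here those alternatives are derived from the small `BRACKETS` table. That is a much-simplified stand-in for what the abstract says lake symbols compute from the grammar.

```python
from typing import Callable, Optional, Sequence

# A parser maps (tokens, position) to (new position, captured island names) or None.
Result = Optional[tuple[int, tuple[str, ...]]]
Parser = Callable[[Sequence[str], int], Result]


def lit(t: str) -> Parser:
    """Match exactly the token `t`."""
    def p(toks: Sequence[str], i: int) -> Result:
        return (i + 1, ()) if i < len(toks) and toks[i] == t else None
    return p


def any_except(excluded: frozenset[str], capture: bool = False) -> Parser:
    """PEG `!(e1 / e2 / ...) .`: one token not in `excluded`, optionally captured."""
    def p(toks: Sequence[str], i: int) -> Result:
        if i < len(toks) and toks[i] not in excluded:
            return (i + 1, (toks[i],) if capture else ())
        return None
    return p


def seq(*ps: Parser) -> Parser:
    """PEG sequence: all parts must match in order; captures are concatenated."""
    def p(toks: Sequence[str], i: int) -> Result:
        caps: tuple[str, ...] = ()
        for q in ps:
            r = q(toks, i)
            if r is None:
                return None
            i, caps = r[0], caps + r[1]
        return (i, caps)
    return p


def choice(*ps: Parser) -> Parser:
    """PEG ordered choice: the first alternative that succeeds wins."""
    def p(toks: Sequence[str], i: int) -> Result:
        for q in ps:
            r = q(toks, i)
            if r is not None:
                return r
        return None
    return p


def star(q: Parser) -> Parser:
    """PEG greedy repetition `q*`: never fails and never backtracks into itself."""
    def p(toks: Sequence[str], i: int) -> Result:
        caps: tuple[str, ...] = ()
        while (r := q(toks, i)) is not None and r[0] > i:
            i, caps = r[0], caps + r[1]
        return (i, caps)
    return p


# Hypothetical bracket pairs of the host language.
BRACKETS = {"{": "}", "(": ")"}
DELIMS = frozenset(BRACKETS) | frozenset(BRACKETS.values())

# Hypothetical island of interest: `def NAME`, capturing NAME.
island = seq(lit("def"), any_except(DELIMS, capture=True))


def group(open_t: str) -> Parser:
    """group <- open (island / water)* close"""
    def p(toks: Sequence[str], i: int) -> Result:
        body = star(choice(island, water))
        return seq(lit(open_t), body, lit(BRACKETS[open_t]))(toks, i)
    return p


def water(toks: Sequence[str], i: int) -> Result:
    """water <- group('{') / group('(') / !DELIMS .

    Omitting '}' from the excluded set lets the greedy repetition swallow the
    closing brace, so the enclosing group can never close. Omitting the nested
    group alternatives lets an inner '}' close the outer group too early.
    """
    return choice(*(group(o) for o in BRACKETS), any_except(DELIMS))(toks, i)


def extract(toks: Sequence[str]) -> tuple[str, ...]:
    """Return the island names if the whole input is one `{ ... }` group."""
    r = group("{")(toks, 0)
    return r[1] if r is not None and r[0] == len(toks) else ()
```

For example, `extract("{ x = 1 ; def f ( a ) { def g } y }".split())` returns `("f", "g")`. The tokens `x = 1 ;`, the parenthesized `( a )` and the trailing `y` are all skipped as water, while the island nested inside the inner braces is still found.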