# A dead simple parser package for Go
<a id="markdown-a-dead-simple-parser-package-for-go" name="a-dead-simple-parser-package-for-go"></a>

[![PkgGoDev](https://pkg.go.dev/badge/github.com/alecthomas/participle/v2)](https://pkg.go.dev/github.com/alecthomas/participle/v2) [![GHA Build](https://github.com/alecthomas/participle/actions/workflows/ci.yml/badge.svg)](https://github.com/alecthomas/participle/actions)
[![Go Report Card](https://goreportcard.com/badge/github.com/alecthomas/participle/v2)](https://goreportcard.com/report/github.com/alecthomas/participle/v2) [![Slack chat](https://img.shields.io/static/v1?logo=slack&style=flat&label=slack&color=green&message=gophers)](https://gophers.slack.com/messages/CN9DS8YF3)

<!-- TOC depthfrom:2 insertanchor:true updateonsave:true -->

- [V2](#v2)
- [Introduction](#introduction)
- [Tutorial](#tutorial)
- [Tag syntax](#tag-syntax)
- [Overview](#overview)
- [Grammar syntax](#grammar-syntax)
- [Capturing](#capturing)
  - [Capturing boolean value](#capturing-boolean-value)
- [Streaming](#streaming)
- [Lexing](#lexing)
  - [Stateful lexer](#stateful-lexer)
  - [Example stateful lexer](#example-stateful-lexer)
  - [Example simple/non-stateful lexer](#example-simplenon-stateful-lexer)
  - [Experimental - code generation](#experimental---code-generation)
- [Options](#options)
- [Examples](#examples)
- [Performance](#performance)
- [Concurrency](#concurrency)
- [Error reporting](#error-reporting)
- [Limitations](#limitations)
- [EBNF](#ebnf)
- [Syntax/Railroad Diagrams](#syntaxrailroad-diagrams)

<!-- /TOC -->

## V2
<a id="markdown-v2" name="v2"></a>

This is an alpha of version 2 of Participle. It is still subject to change but should be mostly stable at this point.

See the [Change Log](CHANGES.md) for details.

> **Note:** semantic versioning API guarantees do not apply to the [experimental](https://pkg.go.dev/github.com/alecthomas/participle/v2/experimental) packages - the API may break between minor point releases.

It can be installed with:

```shell
$ go get github.com/alecthomas/participle/v2@latest
```

The latest version from v0 can be installed via:

```shell
$ go get github.com/alecthomas/participle@latest
```

## Introduction
<a id="markdown-introduction" name="introduction"></a>

The goal of this package is to provide a simple, idiomatic and elegant way of
defining parsers in Go.

Participle's method of defining grammars should be familiar to any Go
programmer who has used the `encoding/json` package: struct field tags define
what and how input is mapped to those same fields. This is not unusual for Go
encoders, but is unusual for a parser.

## Tutorial
<a id="markdown-tutorial" name="tutorial"></a>

A [tutorial](TUTORIAL.md) is available, walking through the creation of an .ini parser.

## Tag syntax
<a id="markdown-tag-syntax" name="tag-syntax"></a>

Participle supports two forms of struct tag grammar syntax.

The easiest to read is when the grammar uses the entire struct tag content, eg.

```go
Field string `@Ident @("," Ident)*`
```

However, this does not coexist well with other tags such as JSON, and may
cause issues with linters. If this is an issue then you can use the
`parser:""` tag format. In this case single quotes can be used to quote
literals, making the tags somewhat easier to write, eg.

```go
Field string `parser:"@Ident (',' Ident)*" json:"field"`
```

## Overview
<a id="markdown-overview" name="overview"></a>

A grammar is an annotated Go structure used to both define the parser grammar,
and be the AST output by the parser. As an example, following is the final INI
parser from the tutorial.

```go
type INI struct {
  Properties []*Property `@@*`
  Sections   []*Section  `@@*`
}

type Section struct {
  Identifier string      `"[" @Ident "]"`
  Properties []*Property `@@*`
}

type Property struct {
  Key   string `@Ident "="`
  Value *Value `@@`
}

type Value struct {
  String *string  `  @String`
  Number *float64 `| @Float`
}
```

> **Note:** Participle also supports named struct tags (eg. <code>Hello string `parser:"@Ident"`</code>).

A parser is constructed from a grammar and a lexer:

```go
parser, err := participle.Build(&INI{})
```

Once constructed, the parser is applied to input to produce an AST:

```go
ast := &INI{}
err := parser.ParseString("", "size = 10", ast)
// ast == &INI{
//   Properties: []*Property{
//     {Key: "size", Value: &Value{Number: &10}},
//   },
// }
```

## Grammar syntax
<a id="markdown-grammar-syntax" name="grammar-syntax"></a>

Participle grammars are defined as tagged Go structures. Participle will
first look for tags in the form `parser:"..."`. It will then fall back to
using the entire tag body.

The grammar format is:

- `@<expr>` Capture expression into the field.
- `@@` Recursively capture using the field's own type.
- `<identifier>` Match named lexer token.
- `( ... )` Group.
- `"..."` or `'...'` Match the literal (note that the lexer must emit tokens matching this literal exactly).
- `"...":<identifier>` Match the literal, specifying the exact lexer token type to match.
- `<expr> <expr> ...` Match expressions.
- `<expr> | <expr> | ...` Match one of the alternatives. Each alternative is tried in order, with backtracking.
- `~<expr>` Match any token that is _not_ the start of the expression (eg: `@~";"` matches anything but the `;` character into the field).
- `(?= ... )` Positive lookahead group - requires the contents to match further input, without consuming it.
- `(?! ... )` Negative lookahead group - requires the contents not to match further input, without consuming it.

The following modifiers can be used after any expression:

- `*` Expression can match zero or more times.
- `+` Expression must match one or more times.
- `?` Expression can match zero or once.
- `!` Require a non-empty match (this is useful with a sequence of optional matches eg. `("a"? "b"? "c"?)!`).

Notes:

- Each struct is a single production, with each field applied in sequence.
- `@<expr>` is the mechanism for capturing matches into the field.
- If a struct field is not keyed with "parser", the entire struct tag
  will be used as the grammar fragment. This allows the grammar syntax to remain
  clear and simple to maintain.
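> **Example (illustrative, not from the tutorial):** the following grammar combines several of these constructs. Assuming the default lexer's `Ident`, `String` and `Int` tokens, it matches assignments such as `count = 10 [a, b]`.

```go
type Assignment struct {
  Name  string   `@Ident "="`                         // capture a single Ident token
  Value string   `( @String | @Int )`                 // alternatives, tried in order
  Tags  []string `( "[" @Ident ( "," @Ident )* "]" )?` // optional group with repetition
}
```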
## Capturing
<a id="markdown-capturing" name="capturing"></a>

Prefixing any expression in the grammar with `@` will capture matching values
for that expression into the corresponding field.

For example:

```go
// The grammar definition.
type Grammar struct {
  Hello string `@Ident`
}

// The source text to parse.
source := "world"

// After parsing, the resulting AST.
result == &Grammar{
  Hello: "world",
}
```

For slice and string fields, each instance of `@` will accumulate into the
field (including repeated patterns). Accumulation into other types is not
supported.

For integer and floating point types, a successful capture will be parsed
with `strconv.ParseInt()` and `strconv.ParseFloat()` respectively.

A successful capture match into a `bool` field will set the field to true.

Tokens can also be captured directly into fields of type `lexer.Token` and
`[]lexer.Token`.

Custom control of how values are captured into fields can be achieved by a
field type implementing the `Capture` interface (`Capture(values []string)
error`).

Additionally, any field implementing the `encoding.TextUnmarshaler` interface
will be capturable too. One caveat is that `UnmarshalText()` will be called once
for each captured token, so eg. `@(Ident Ident Ident)` will be called three times.
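To illustrate the capture targets described above, here is a sketch (not from the repository) assuming the default lexer, with `lexer` imported from `github.com/alecthomas/participle/v2/lexer`:

```go
type Sample struct {
  Count   int         `@Int`        // parsed with strconv.ParseInt
  Ratio   float64     `@Float`      // parsed with strconv.ParseFloat
  Enabled bool        `@"enabled"?` // set to true if the literal matched
  Words   []string    `@Ident*`     // each capture accumulates into the slice
  Raw     lexer.Token `@String`     // the raw token itself
}
```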
### Capturing boolean value
<a id="markdown-capturing-boolean-value" name="capturing-boolean-value"></a>

By default a boolean field is used to indicate that a match occurred, which
turns out to be much more useful and common in Participle than parsing true
or false literals. For example, parsing a variable declaration with a
trailing optional syntax:

```go
type Var struct {
  Name     string `"var" @Ident`
  Type     string `":" @Ident`
  Optional bool   `@"?"?`
}
```

In practice this gives more useful ASTs. If bool were to be parsed literally
then you'd need to have some alternate type for Optional such as string or a
custom type.

To capture literal boolean values such as `true` or `false`, implement the
Capture interface like so:

```go
type Boolean bool

func (b *Boolean) Capture(values []string) error {
  *b = values[0] == "true"
  return nil
}

type Value struct {
  Float  *float64 `  @Float`
  Int    *int     `| @Int`
  String *string  `| @String`
  Bool   *Boolean `| @("true" | "false")`
}
```

## Streaming
<a id="markdown-streaming" name="streaming"></a>

Participle supports streaming parsing. Simply pass a channel of your grammar into
`Parse*()`. The grammar will be repeatedly parsed and sent to the channel. Note that
the `Parse*()` call will not return until parsing completes, so it should generally be
started in a goroutine.

```go
type token struct {
  Str string `  @Ident`
  Num int    `| @Int`
}

parser, err := participle.Build(&token{})

tokens := make(chan *token, 128)
err = parser.ParseString("", `hello 10 11 12 world`, tokens)
for token := range tokens {
  fmt.Printf("%#v\n", token)
}
```

## Lexing
<a id="markdown-lexing" name="lexing"></a>

Participle relies on distinct lexing and parsing phases. The lexer takes raw
bytes and produces tokens which the parser consumes. The parser transforms
these tokens into Go values.

The default lexer, if one is not explicitly configured, is based on the Go
`text/scanner` package and thus produces tokens for C/Go-like source code. This
is surprisingly useful, but if you do require more control over lexing the
builtin [`participle/lexer/stateful`](#markdown-stateful-lexer) lexer should
cover most other cases. If that in turn is not flexible enough, you can
implement your own lexer.

Configure your parser with a lexer using the `participle.Lexer()` option.
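For example, a sketch of wiring the `INI` grammar from the Overview to a hand-written lexer (the rule set and the `iniLexer`/`iniParser` names are illustrative, not from the tutorial):

```go
var iniLexer = stateful.MustSimple([]stateful.Rule{
  {"Ident", `[a-zA-Z_][a-zA-Z_\d]*`, nil},
  {"String", `"[^"]*"`, nil},
  {"Float", `\d+(?:\.\d+)?`, nil},
  {"Punct", `[=\[\]]`, nil},
  {"Whitespace", `\s+`, nil},
})

var iniParser = participle.MustBuild(&INI{},
  participle.Lexer(iniLexer),
  participle.Elide("Whitespace"),
)
```

`stateful.MustSimple()` is described in the lexer sections below.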
To use your own Lexer you will need to implement two interfaces:
[Definition](https://pkg.go.dev/github.com/alecthomas/participle/v2/lexer#Definition)
(and optionally [StringsDefinition](https://pkg.go.dev/github.com/alecthomas/participle/v2/lexer#StringDefinition) and [BytesDefinition](https://pkg.go.dev/github.com/alecthomas/participle/v2/lexer#BytesDefinition)) and [Lexer](https://pkg.go.dev/github.com/alecthomas/participle/v2/lexer#Lexer).

### Stateful lexer
<a id="markdown-stateful-lexer" name="stateful-lexer"></a>

Participle's included stateful/modal lexer provides powerful yet convenient
construction of most lexers (notably, indentation based lexers cannot be
expressed).

It is sometimes the case that a simple lexer cannot fully express the tokens
required by a parser. The canonical example of this is interpolated strings
within a larger language. eg.

```go
let a = "hello ${name + ", ${last + "!"}"}"
```

This is impossible to tokenise with a normal lexer due to the arbitrarily
deep nesting of expressions.

To support this case Participle's lexer is now stateful by default.

The lexer is a state machine defined by a map of rules keyed by the state
name. Each rule within the state includes the name of the produced token, the
regex to match, and an optional operation to apply when the rule matches.

As a convenience, any `Rule` starting with a lowercase letter will be elided from output.

Lexing starts in the `Root` group. Each rule is matched in order, with the first
successful match producing a lexeme. If the matching rule has an associated Action
it will be executed. The name of each non-root rule is prefixed with the name
of its group to yield the token identifier used during matching.

A state change can be introduced with the Action `Push(state)`. `Pop()` will
return to the previous state.

To reuse rules from another state, use `Include(state)`.

A special named rule `Return()` can also be used as the final rule in a state
to always return to the previous state.

As a special case, regexes containing backrefs in the form `\N` (where `N` is
a digit) will match the corresponding capture group from the immediate parent
group. This can be used to parse, among other things, heredocs. See the
[tests](https://github.com/alecthomas/participle/blob/master/lexer/stateful/stateful_test.go#L59)
for an example of this, among others.
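For instance, a heredoc-style lexer might use a `\1` backref roughly like this (a sketch only, loosely modelled on those tests; the exact rules in the repository differ):

```go
var heredocLexer = stateful.Must(stateful.Rules{
  "Root": {
    {"Heredoc", `<<(\w+)`, stateful.Push("Heredoc")},
    {"Whitespace", `\s+`, nil},
    {"Ident", `\w+`, nil},
  },
  "Heredoc": {
    // \1 matches the marker captured by the parent "Heredoc" rule.
    {"End", `\b\1\b`, stateful.Pop()},
    {"EOL", `\n`, nil},
    {"Body", `[^\n]+`, nil},
  },
})
```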
### Example stateful lexer
<a id="markdown-example-stateful-lexer" name="example-stateful-lexer"></a>

Here's a cut down example of the string interpolation described above. Refer to
the [stateful example](https://github.com/alecthomas/participle/tree/master/_examples/stateful)
for the corresponding parser.

```go
var lexer = stateful.Must(Rules{
  "Root": {
    {`String`, `"`, Push("String")},
  },
  "String": {
    {"Escaped", `\\.`, nil},
    {"StringEnd", `"`, Pop()},
    {"Expr", `\${`, Push("Expr")},
    {"Char", `[^$"\\]+`, nil},
  },
  "Expr": {
    Include("Root"),
    {`whitespace`, `\s+`, nil},
    {`Oper`, `[-+/*%]`, nil},
    {"Ident", `\w+`, nil},
    {"ExprEnd", `}`, Pop()},
  },
})
```

### Example simple/non-stateful lexer
<a id="markdown-example-simple%2Fnon-stateful-lexer" name="example-simple%2Fnon-stateful-lexer"></a>

The Stateful lexer is now the only custom lexer supported by Participle, but
most parsers won't need this level of flexibility. To support this common
case, which replaces the old `Regex` and `EBNF` lexers, you can use
`stateful.MustSimple()` and `stateful.NewSimple()`.

eg. The lexer for a form of BASIC:

```go
var basicLexer = stateful.MustSimple([]stateful.Rule{
  {"Comment", `(?i)rem[^\n]*`, nil},
  {"String", `"(\\"|[^"])*"`, nil},
  {"Number", `[-+]?(\d*\.)?\d+`, nil},
  {"Ident", `[a-zA-Z_]\w*`, nil},
  {"Punct", `[-[!@#$%^&*()+_={}\|:;"'<,>.?/]|]`, nil},
  {"EOL", `[\n\r]+`, nil},
  {"whitespace", `[ \t]+`, nil},
})
```

### Experimental - code generation
<a id="markdown-experimental---code-generation" name="experimental---code-generation"></a>

Participle v2 now has experimental support for generating code to perform
lexing. Use `participle/experimental/codegen.GenerateLexer()` to compile a
`stateful` lexer to Go code.

This will generally provide around a 10x improvement in lexing performance
while producing O(1) garbage.

## Options
<a id="markdown-options" name="options"></a>

The Parser's behaviour can be configured via [Options](https://pkg.go.dev/github.com/alecthomas/participle/v2#Option).
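For example, a sketch combining a few of the options that appear elsewhere in this README (the `iniLexer` definition is the hypothetical one from the Lexing section):

```go
parser := participle.MustBuild(&INI{},
  participle.Lexer(iniLexer),     // use a custom lexer definition
  participle.Elide("Whitespace"), // drop these tokens before parsing
  participle.UseLookahead(2),     // fixed lookahead depth
)
```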
## Examples
<a id="markdown-examples" name="examples"></a>

There are several [examples](https://github.com/alecthomas/participle/tree/master/_examples) included:

Example | Description
--------|---------------
[BASIC](https://github.com/alecthomas/participle/tree/master/_examples/basic) | A lexer, parser and interpreter for a [rudimentary dialect](https://caml.inria.fr/pub/docs/oreilly-book/html/book-ora058.html) of BASIC.
[EBNF](https://github.com/alecthomas/participle/tree/master/_examples/ebnf) | Parser for the form of EBNF used by Go.
[Expr](https://github.com/alecthomas/participle/tree/master/_examples/expr) | A basic mathematical expression parser and evaluator.
[GraphQL](https://github.com/alecthomas/participle/tree/master/_examples/graphql) | Lexer+parser for GraphQL schemas.
[HCL](https://github.com/alecthomas/participle/tree/master/_examples/hcl) | A parser for the [HashiCorp Configuration Language](https://github.com/hashicorp/hcl).
[INI](https://github.com/alecthomas/participle/tree/master/_examples/ini) | An INI file parser.
[Protobuf](https://github.com/alecthomas/participle/tree/master/_examples/protobuf) | A full [Protobuf](https://developers.google.com/protocol-buffers/) version 2 and 3 parser.
[SQL](https://github.com/alecthomas/participle/tree/master/_examples/sql) | A *very* rudimentary SQL SELECT parser.
[Stateful](https://github.com/alecthomas/participle/tree/master/_examples/stateful) | A basic example of a stateful lexer and corresponding parser.
[Thrift](https://github.com/alecthomas/participle/tree/master/_examples/thrift) | A full [Thrift](https://thrift.apache.org/docs/idl) parser.
[TOML](https://github.com/alecthomas/participle/tree/master/_examples/toml) | A [TOML](https://github.com/toml-lang/toml) parser.

Included below is a full GraphQL lexer and parser:

```go
package main

import (
  "fmt"
  "os"

  "github.com/alecthomas/kong"
  "github.com/alecthomas/repr"

  "github.com/alecthomas/participle/v2"
  "github.com/alecthomas/participle/v2/lexer"
  "github.com/alecthomas/participle/v2/lexer/stateful"
)

type File struct {
  Entries []*Entry `@@*`
}

type Entry struct {
  Type   *Type   `  @@`
  Schema *Schema `| @@`
  Enum   *Enum   `| @@`
  Scalar string  `| "scalar" @Ident`
}

type Enum struct {
  Name  string   `"enum" @Ident`
  Cases []string `"{" @Ident* "}"`
}

type Schema struct {
  Fields []*Field `"schema" "{" @@* "}"`
}

type Type struct {
  Name       string   `"type" @Ident`
  Implements string   `( "implements" @Ident )?`
  Fields     []*Field `"{" @@* "}"`
}

type Field struct {
  Name       string      `@Ident`
  Arguments  []*Argument `( "(" ( @@ ( "," @@ )* )? ")" )?`
  Type       *TypeRef    `":" @@`
  Annotation string      `( "@" @Ident )?`
}

type Argument struct {
  Name    string   `@Ident`
  Type    *TypeRef `":" @@`
  Default *Value   `( "=" @@ )?`
}

type TypeRef struct {
  Array       *TypeRef `( "[" @@ "]"`
  Type        string   `  | @Ident )`
  NonNullable bool     `( @"!" )?`
}

type Value struct {
  Symbol string `@Ident`
}

var (
  graphQLLexer = stateful.MustSimple([]stateful.Rule{
    {"Comment", `(?:#|//)[^\n]*\n?`, nil},
    {"Ident", `[a-zA-Z]\w*`, nil},
    {"Number", `(?:\d*\.)?\d+`, nil},
    {"Punct", `[-[!@#$%^&*()+_={}\|:;"'<,>.?/]|]`, nil},
    {"Whitespace", `[ \t\n\r]+`, nil},
  })
  parser = participle.MustBuild(&File{},
    participle.Lexer(graphQLLexer),
    participle.Elide("Comment", "Whitespace"),
    participle.UseLookahead(2),
  )
)

var cli struct {
  EBNF  bool     `help:"Dump EBNF."`
  Files []string `arg:"" optional:"" type:"existingfile" help:"GraphQL schema files to parse."`
}

func main() {
  ctx := kong.Parse(&cli)
  if cli.EBNF {
    fmt.Println(parser.String())
    ctx.Exit(0)
  }
  for _, file := range cli.Files {
    ast := &File{}
    r, err := os.Open(file)
    ctx.FatalIfErrorf(err)
    err = parser.Parse(file, r, ast)
    r.Close()
    repr.Println(ast)
    ctx.FatalIfErrorf(err)
  }
}
```

## Performance
<a id="markdown-performance" name="performance"></a>

One of the included examples is a complete Thrift parser
(shell-style comments are not supported). This gives
a convenient baseline for comparing to the PEG based
[pigeon](https://github.com/PuerkitoBio/pigeon), which is the parser used by
[go-thrift](https://github.com/samuel/go-thrift). Additionally, the pigeon
parser is utilising a generated parser, while the participle parser is built at
run time.

You can run the benchmarks yourself, but here's the output on my machine:

    BenchmarkParticipleThrift-12    5941    201242 ns/op    178088 B/op    2390 allocs/op
    BenchmarkGoThriftParser-12      3196    379226 ns/op    157560 B/op    2644 allocs/op

On a real life codebase of 47K lines of Thrift, Participle takes 200ms and
go-thrift takes 630ms, which aligns quite closely with the benchmarks.

## Concurrency
<a id="markdown-concurrency" name="concurrency"></a>

A compiled `Parser` instance can be used concurrently. A `LexerDefinition` can be used concurrently. A `Lexer` instance cannot be used concurrently.
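As a sketch, a single compiled parser can be shared by several goroutines, each parsing into its own AST value (`parser`, `sources` and the `INI` grammar are assumed from earlier examples):

```go
var wg sync.WaitGroup
for _, src := range sources {
  src := src
  wg.Add(1)
  go func() {
    defer wg.Done()
    ast := &INI{}
    if err := parser.ParseString("", src, ast); err != nil {
      log.Println(err)
    }
  }()
}
wg.Wait()
```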
## Error reporting
<a id="markdown-error-reporting" name="error-reporting"></a>

There are a few areas where Participle can provide useful feedback to users of your parser.

1. Errors returned by [Parser.Parse*()](https://pkg.go.dev/github.com/alecthomas/participle/v2#Parser.Parse) will be of type [Error](https://pkg.go.dev/github.com/alecthomas/participle/v2#Error). This will contain positional information where available.
2. Participle will make a best effort to return as much of the AST up to the error location as possible.
3. Any node in the AST containing a field `Pos lexer.Position` will be automatically
   populated from the nearest matching token.
4. Any node in the AST containing a field `EndPos lexer.Position` will be
   automatically populated from the token at the end of the node.
5. Any node in the AST containing a field `Tokens []lexer.Token` will be automatically
   populated with _all_ tokens captured by the node, _including_ elided tokens.

These related pieces of information can be combined to provide fairly comprehensive error reporting.
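As an illustrative sketch, the following node opts in to the positional and token bookkeeping described above; the `Pos`, `EndPos` and `Tokens` field names are the ones Participle looks for, while the grammar and input are hypothetical:

```go
type Property struct {
  Pos    lexer.Position // filled from the first token matched by this node
  EndPos lexer.Position // filled from the token at the end of this node
  Tokens []lexer.Token  // all tokens captured by this node, including elided ones

  Key   string `@Ident "="`
  Value string `@String`
}

// Assuming parser was built from &Property{}: "10" is not a String token,
// so parsing fails and the returned error carries the offending position.
ast := &Property{}
if err := parser.ParseString("config.ini", `size = 10`, ast); err != nil {
  fmt.Println(err)
}
```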
## Limitations
<a id="markdown-limitations" name="limitations"></a>

Internally, Participle is a recursive descent parser with backtracking (see
`UseLookahead(K)`).

Among other things, this means that it does not support left recursion. Left
recursion must be eliminated by restructuring your grammar.
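For example, a left-recursive rule such as `Expr = Expr "+" Term | Term` can be rewritten as repetition, `Expr = Term ("+" Term)*`. A sketch of the restructured grammar:

```go
type Expr struct {
  Head *Term     `@@`
  Tail []*OpTerm `@@*`
}

type OpTerm struct {
  Op   string `@"+"`
  Term *Term  `@@`
}

type Term struct {
  Value int `@Int`
}
```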
## EBNF
<a id="markdown-ebnf" name="ebnf"></a>

Participle supports outputting an EBNF grammar from a Participle parser. Once
the parser is constructed, simply call `String()`.
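For example, to dump the EBNF for the INI parser from the Overview (sketch):

```go
parser := participle.MustBuild(&INI{})
fmt.Println(parser.String())
```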
Participle also [includes a parser](https://pkg.go.dev/github.com/alecthomas/participle/v2/ebnf) for this form of EBNF (naturally).

eg. The [GraphQL example](https://github.com/alecthomas/participle/blob/master/_examples/graphql/main.go#L15-L62)
produces the following EBNF:

```ebnf
File = Entry* .
Entry = Type | Schema | Enum | "scalar" ident .
Type = "type" ident ("implements" ident)? "{" Field* "}" .
Field = ident ("(" (Argument ("," Argument)*)? ")")? ":" TypeRef ("@" ident)? .
Argument = ident ":" TypeRef ("=" Value)? .
TypeRef = "[" TypeRef "]" | ident "!"? .
Value = ident .
Schema = "schema" "{" Field* "}" .
Enum = "enum" ident "{" ident* "}" .
```

## Syntax/Railroad Diagrams
<a id="markdown-syntax%2Frailroad-diagrams" name="syntax%2Frailroad-diagrams"></a>

Participle includes a [command-line utility]() to take an EBNF representation of a Participle grammar
(as returned by `Parser.String()`) and produce a Railroad Diagram using
[tabatkins/railroad-diagrams](https://github.com/tabatkins/railroad-diagrams).

Here's what the GraphQL grammar looks like:

![EBNF Railroad Diagram](railroad.png)