A library for text parsing.
To use this package, put the following dependency into your project's dependencies section:
'parsed' is a text parsing library for D programming language. It allows to combine basic parsers in various ways, creating thus more complex and elaborate ones.
'parsed' consists of two modules:
parsed.extras. First one
contains interfaces, structures and basic parsers used throughout 'parsed',
while the second one contains parsers that are more likely to crop up in real
core publicly, so you don't have to import
core if you use
extras is probably the module you want to use,
unless you enjoy writing most parsers from scratch.
All parsers inherit from Parser class, which is parametric in two things: the
type of a value the parser builds up as it works, and type of the strings the
parser can work with. String type defaults to
string, value type has to be
declared explicitly. Most parser-producing functions are parametric in the same
way, except for those that take character type instead of string type.
To perform a parse, you need a parser and an initial state to feed to that
parser. Parser state is represented by a ParserState struct, which is
parametric in the same way as parsers themselves. Constructing a ParserState
looks like this:
auto state = ParserState!SomeType(someString). To perform a
parse now, you need to
auto res = someParser.run(state). Then you'll want to
test if the parser succeeded (which is done via
.success property of
ParserState) and to get the value (of type
SomeType) built by the parser,
which is done via
.value property of ParserState. To build a value, you'll
have to use
% operator described below. Parsed string can be accesses via
.parsed property of ParserState.
Four operators provide a way to build complex parsers from simpler ones.
*is applied to two parsers. It runs both of them and concatenates together the strings they parsed (if they've succeeded, of course). So if you do
auto p = literal!SomeType("foo") * literal!SomeType("bar");, then
pwill succeed on any string starting with "foobar", and
.parsedwill also be "foobar".
/is also applied to two parsers. It runs both of them and discards first one's parsed string, resulting in only second parser's parsed string being passed down the parser chain. So if you do
auto p = literal!SomeType("foo") / literal!SomeType("bar");, then
pwill succeed on any string starting with "foobar", but
.parsedwill be just "bar".
%is the most important operator. It is applied to a parser on the left side and a function that takes chain's built-up value and this parser's parsed string. The function should return new build-up value, which will replace that of the whole parser chain. So if you do
auto p = literal!int("12") % (res, i) => to!int(i);then on success
.valueof resulting ParserState will be 12.
|is applied to two parsers. It runs the first one, and if it fails, it runs the other one. So
auto p = literal!int("foo") | literal!int("bar");will match both strings starting with "foo" and strings starting with "bar".
For more complex examples, see 'examples' folder. Unittests in the source files also provide several, but they are not very elaborate.
By default all parsers do not perform any lookahead, consuming their input
regardless of whether or not the parsers that follow would match the remaining
input. Sometimes, a different behaviour is desirable, when parsers in the
middle of a chain yield some of the input to the parsers after them. Two
methods of the Parser class accomplish this,
The former creates a copy of the caller parser that performs lookahead and is
reluctant - it'll consume as little input as it can, but may consume more than
possible if it makes the following parsers succeed. The latter works in the
same vein, but the output parser is greedy - it'll consume as much input as it
can, but may consume less than possible if it makes the following parsers
succeed. For examples of these, see unittests below ParserGroup class.
Sometimes a parser needs to be able to operate even if the chain before it has
failed. Normal parsers cannot do that, chaining rules (and, for built-in
parsers, coded-in short-circuiting) prevent such behaviour. To override this
makeOblivious() method of Parser class is used. It returns a new parser that
is oblivious, that is, will work even if the chain before has failed. It may
fail or succeed as usual. The input it'll receive will be the result of the
last successful parse.
These are the parsers that are defined in
auto literal(B, S = string)(S str, bool consumeInput = true, bool caseSensitive = true).
Matches a literal string. Optionally may not consume input and disregard case.
.parsed is the matched string.
auto fail(B, S = string)().
auto succeed(B, S = string)().
auto test(B, S = string)(bool delegate (B, S) test).
Runs built-up value and previous parser's
.parsed through the given
function. Succeeds with
.parsed inherited from the previous parsed if the
true, fails otherwise.
auto build(B, S = string)(B delegate (B, S) dg).
Builds a value just like
% operator does. However,
build can be the first
element in a parser chain, unlike
%, which requires a parser to be in the
chain before it.
auto force(B, S = string)().
Makes a copy of
.left. It can be useful if original text takes
a lot of memory, while only a portion of it is left/parsed. Using 'force' can
hint garbage collector that it's time to collect original text.
auto many(B, S = string)(int min, int max, Parser!(B, S) p).
Uses the given parser between
max times (if
min is negative,
there's no lower limit, if
max is negative, there's no upper limit). Succeeds
if required minimum of successful applications of
p is achieved.
concatenated from each run's
auto absorb(B, B2, S = string)(B delegate (B, B2, S) dg, Parser!(B2, S) subparser).
This parser's job is to make parsing nested structures easier. It allows you to
run a subparser on the current input and then appropriate its resulting
.value into the main parser chain's
.value. The delegate this function
expects as the first argument should perform said appropriation, returning new
.value for the main chain. "struct" example makes use of this function.
auto morph(B, S = string)(S delegate (S) dg).
Takes previous parser's
.parsed and transforms it using the given delegate.
.parsed is the delegate's return value.
auto singleChar(B, C = char)(bool delegate (C) test).
Parses a single character. Succeeds if the character passes the given test, fail on empty string or if the test is not passed.
auto throwOnSuccess(B, S = string)(Exception exc).
Throws a given exception if the chain before has succeeded.
auto throwOnFailure(B, S = string)(Exception exc).
Throws a given exception if the chain before has failed.
auto throwAnyway(B, S = string)(Exception exc).
Throws a given exception.
auto everything(B, S = string)().
Matches until the end of input. Always succeeds.
except (1st overload)
auto except(E, B, S = string)(Parser!(B, S) main, Parser!(B, S) onException).
Catches an exception of type E that might occur in
main. If such an exception
is thrown, runs
onException on the original input.
except (2nd overload)
auto except(E, B, S = string)(Parser!(B, S) main).
Catches an exception of type E that might occur in
main. If such an exception
is thrown, this overload fails.
auto charWhile(B, C = char)(bool delegate (C) test, bool keepTerminator = true).
Parses input one character at a time while a given condition is met.
is concatenation of all parsed characters, optionally without the character
that failed the test. Note that
keepTerminator also affects whether or not
the terminating character remains in the input after the parser is done.
auto charUntil(B, C = char)(bool delegate (C) test, bool keepTerminator = true).
charWhile, but parses until a condition is met. Always succeeds.
auto repeatWhile(B, S = string)(bool delegate (B, S, int) test, Parser!(B, S) p).
Runs the same parser for as long as a condition is met. The condition takes
three arguments: built-up value,
.parsed of the current run, and iteration's
.parsed is concatenation of individual runs'
auto repeatUntil(B, S = string)(bool delegate(B, S, int) test, Parser!(B, S) p).
repeatWhile, but parses until a condition is met. Always succeeds.
These parsers are defined in
auto whitespace(B, C = char)(bool acceptNewline = false).
Parses a single character of whitespace (as defined by
default fails on newline characters.
auto nonwhite(B, C = char)().
Parses a single character of anything but whitespace.
auto alnum(B, C = char)().
Parses a single alphanumeric character (as defined by
auto alpha(B, C = char)().
Parses a single alphabetic character (as defined by
auto digit(B, C = char)().
Parses a single decimal digit.
auto hexdigit(B, C = char)().
Parses a single hexadecimal digit (both lower- and upper-case).
auto newline(B, C = char)().
Parses a single '\n' or '\r'.
auto line(B, C = char)(bool keepTerminator).
Parses a whole line. Note that 'keepTerminator' parameter only affects whether
or not newline is included in the
.parsed, it's always removed from the
input. Always succeeds.
auto someWhite(B, C = char)(bool acceptNewlines).
Parses as many whitespace characters as it can, but no less than one. Optionally parses newlines as well.
auto maybeWhite(B, C = char)(bool acceptNewlines).
Parses as many whitespace characters as it can, maybe even 0. Optionally parses newlines as well.
auto someNewlines(B, C = char)().
Parses as many newline characters as it can, but no less than one.
auto maybeNewlines(B, C = char)().
Parses as many newline characters as it can, maybe even 0.
auto word(B, C = char)(Word type, int minLength = 1, int maxLength = -1).
Parses a word. The definition of a word is determined by the
Word.any then a word is a sequence of non-whitespace characters.
Word.alnum then a word is a sequence of alphanumeric characters.
Word.alpha then a word is a sequence of alphabetic characters.
The sense of
maxLength is the same as for
many. Note that
maxLength cuts a word in two, this parser will still succeed with the
first bit as
auto number(B, C = char)().
Parses a decimal integral number (no floats).
auto hexnum(B, C = char)().
Parser a hexadecimal number (both upper- and lower-case). A number may begin
with "0x", but it's not included in the
auto maybe(B, S = string)(Parser!(B, S) p).
Uses the given parser zero or one time. This is convenience wrapper over
many(0, 1, parser).
balanced (1st overload)
auto balanced(B, C = char)(C left, C right, bool keepPair = false).
Parses text between a balanced pair of symbols. May either include or not the
starting and terminating
right respectively. Note that
has effect only on
.parsed, not on the state of the input, the pair is
always removed from it.
balanced (2nd overload)
auto balanced(B, S = string)(Parser!(B, S) left, Parser!(B, S) right, bool keepPair = false).
Acts just like the first overload, except it uses parsers to determine which bits of text will serve as left and right components of a pair. Note that it will call both parsers on every character in between left and right, so it's going to be extremely slow for longer strings.
auto upTo(B, S = string)(Parser!(B, S) parser, book keepTerminator = false).
Parses text until a given parser matches. Fails if there's no match. Note that it will call the given parser on every character of the string until a portion of it matches, so it can be very slow for longer strings and more complex parsers.
auto multiliteral(B, S = string, R)(R range, bool consumeInput = true, bool caseSensitive = true).
A convenience parser to search for any of the parsers given as a range. Just
literal may optionally not consume input and be case insensitive.
- Registered by Michail Pevnev
- 0.4.0 released 2 years ago
- Copyright © 2017, michail