Each PEG notation describes a set of strings with conditions on the context in which they occur.
Concrete strings: 'string' or "string" literally denotes the 6 character string given.
Classes of characters: [aeiou] describes the set of one character strings which are either
a, e, i, o, or u. Ranges can appear: [a-zA-Z] describes the union of the sets of lower case letters and upper case letters, considered as one character strings.
If A and B are PEG notations, (A B) denotes a string of class A followed by a string of class B (in which the string of class A is the preferred string of this class read from the beginning of the source string).
If A and B are PEG notations, (A / B) denotes a string of either class A or a string of class B, with a string of class A being read by preference if possible. The fact that a preference is indicated in alternative lists makes PEG reading deterministic (in a sense, there are no ambiguities for a PEG grammar). The problem corresponding to ambiguity in a BNF grammar is incorrectly ordered lists of alternatives.
If A is a PEG notation, (A)? represents a string of class A (preferred) or an empty string if there is no string of class A: this represents optional appearance of A. (A)* represents zero or more consecutive strings of class A (as many as possible) and (A)+ represents one or more consecutive such strings.
If A is a PEG notation, &(A) represents a length 0 string which is followed by a string of class A, and !(A) represents a length 0 string which is not followed by a string of class A. This gives us powerful lookahead features: for example, (!(A) B) represents a string of class B whose beginning is not also the beginning of a string of class
A: it is tempting but not accurate to say that it does not have an initial segment of class A, because detection of a string of class A longer than the string of class B read would cause reading of this class to fail.
The period . represents the class of single characters (so !. is end of text).
New notations are introduced by lines
class_name <- PEG notation:
this is not just an abbreviation facility because such definitions may be mutually recursive.
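The notations above can be sketched as a handful of Python functions. This is only an illustrative sketch, not the PEG engine used with this grammar: every name below (lit, char_class, any_char, seq, choice, opt, star, plus, ahead, not_ahead, parens) is a hypothetical helper, and a "parser" is assumed to be a function taking (text, pos) and returning the position after the string read, or None on failure.

```python
# Hypothetical PEG combinators: parser(text, pos) -> end position or None.

def lit(s):                       # 'string'
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def char_class(chars):            # [aeiou]
    return lambda text, pos: (
        pos + 1 if pos < len(text) and text[pos] in chars else None)

def any_char():                   # .
    return lambda text, pos: pos + 1 if pos < len(text) else None

def seq(*parsers):                # (A B)
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):             # (A / B): first success wins, no retry
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse

def opt(p):                       # (A)?
    return choice(p, lit(""))

def star(p):                      # (A)*: as many as possible
    def parse(text, pos):
        while True:
            end = p(text, pos)
            # Stop on failure; also stop on an empty match, which would
            # otherwise loop forever (a guard added for this sketch).
            if end is None or end == pos:
                return pos
            pos = end
    return parse

def plus(p):                      # (A)+
    return seq(p, star(p))

def ahead(p):                     # &(A): succeeds without consuming
    return lambda text, pos: pos if p(text, pos) is not None else None

def not_ahead(p):                 # !(A): succeeds only if A fails here
    return lambda text, pos: pos if p(text, pos) is None else None

# word <- [aeiou]+ !.
vowel = char_class("aeiou")
word = seq(plus(vowel), not_ahead(any_char()))

# parens <- '(' parens ')' / ''
# A recursive definition: the function refers to itself by name.
def parens(text, pos):
    return choice(seq(lit("("), parens, lit(")")), lit(""))(text, pos)

print(word("aei", 0))      # 3
print(word("aeix", 0))     # None
print(parens("(())", 0))   # 4
```

The recursive parens definition is the point of the remark about abbreviation: it cannot be expanded away into a finite expression built from the other notations.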
A PEG notation applied to a source string will give either failure or a uniquely determined initial string of the source (parsed suitably); in a sense PEG is unambiguous. What corresponds as an issue to ambiguity for a BNF grammar is
inappropriate choice of order of alternatives in PEG disjunctions (A / B): what often represents a problem with a grammar is what I call "preemption", where an earlier alternative reads an initial segment of a string where a later alternative could have read more of it.
It's possible to have a PEG go into an infinite loop and fail to produce a parse. My PEG generator has a termination checker, so the Loglan grammar does not have these problems. I have contemplated writing a preemption checker, but this is a rather difficult problem.
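Preemption as described above can be illustrated with a toy case in Python. This is a sketch under strong assumptions: the alternatives are plain literal strings, the grammar is just word <- (alt1 / alt2 / ...) !., and the function names are hypothetical. The prefix test at the end only covers this literal case; it is not the general preemption checker contemplated above.

```python
def first_match(alternatives, text):
    """Ordered choice over literals: the first alternative that matches wins."""
    for alt in alternatives:
        if text.startswith(alt):
            return alt
    return None

def parse_whole(alternatives, text):
    """word <- (alt1 / alt2 / ...) !.   Returns the word read, or None.

    Once an alternative has succeeded, a later failure (here, of !.)
    does not send the choice back to try the remaining alternatives.
    """
    m = first_match(alternatives, text)
    if m is not None and len(m) == len(text):
        return m
    return None

def preempted_pairs(alternatives):
    """List (earlier, later) pairs where the earlier literal is a prefix of the later."""
    return [(a, b)
            for i, a in enumerate(alternatives)
            for b in alternatives[i + 1:]
            if b.startswith(a)]

print(parse_whole(["a", "ab"], "ab"))   # None: "a" preempts "ab"
print(parse_whole(["ab", "a"], "ab"))   # ab
print(preempted_pairs(["a", "ab", "b"]))
```

With the alternatives in the order a / ab, the string "ab" cannot be read at all, although a grammar with the alternatives reordered reads it without trouble.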
Dated updates now to appear here
a note, not reflecting a modification. I'm wondering whether the pause required in [fo tonira] ([fotonira] means something quite
different) should be a mandatory comma pause. It looks as if it might not be hard to implement.
a note: there is a problem with interaction of quoted forms with alien text operators.
1/21/2022 Starting a literate programming exercise: turn this document into HTML while preserving its performance as a PEG grammar.
#Also note that the alternative version is now turned off. The only component present is [gaa] and I do not see a reason for anyone to use it.
#The alternative parser is readily turned back on by changing the line statement1x. This version labels the default stressed syllable in a predicate in the PhoneticComplex parse.
a serious problem with ICA, an actual ambiguity which has existed since the beginning of the language,
hopefully fixed: the fix is that an apparent ICA initial utterance which could without the period be
a continuation of a sentence is read as such. The important point is that there is no audible difference
between comma followed by ICA and period followed by ICA: we solve the problem by reading the latter
as the former where possible.
11/24/2021 KIA, the one "word" deletion operator, is installed. What it actually does is a bit subtle.
2/4/2021 Imposed the rule that two final consonants cannot be consonants from voiced/unvoiced pairs
with different voice. Also forbid second final consonant to be h.
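One possible reading of this rule, sketched in Python. The assumptions are mine and not taken from the grammar itself: the voiced/unvoiced pairs are taken to be b/p, d/t, g/k, v/f, z/s, j/c, the rule is read as forbidding one voiced and one unvoiced member of these sets as the two final consonants, and the function name is hypothetical.

```python
# Assumed pairing of voiced and unvoiced consonants (not from the grammar text).
VOICED = set("bdgvzj")
UNVOICED = set("ptkfsc")

def bad_final_pair(c1, c2):
    """True if c1 c2 would be rejected as a pair of final consonants."""
    c1, c2 = c1.lower(), c2.lower()
    if c2 == "h":
        # The second final consonant may not be h.
        return True
    # Both consonants belong to voiced/unvoiced pairs but differ in voice.
    return ((c1 in VOICED and c2 in UNVOICED) or
            (c1 in UNVOICED and c2 in VOICED))

print(bad_final_pair("b", "t"))   # True
print(bad_final_pair("s", "t"))   # False
print(bad_final_pair("r", "h"))   # True
```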
I have further fine-tuning of djifoa gluing in mind.
Allow the -r glue to be expressed as
-rr after all mandatory monosyllables, removing the annoying pronunciation problem?
I was thinking of allowing -hy gluing in other contexts, but it is actually a bad idea.
9/15/2019 installed semantic case tags with order distinctions for use with predicates with more than one argument of the same case.
one solution is beucine, beucito... another is beuzi, beuza, beuzu.
4/28/2019 Various debugging of the new predicate algorithm. Added CVVhy as a glued form for CVV djifoa.
added capitalization of djifoa glue! Confirming my apparent earlier decision that a CVV(h)y djifoa must be followed
by a full predicate complex.
4/26/2019: this incorporates various revisions to the phonetics, correcting errors or clarifying rules,
motivated by my development of the phonetics section of a new grammar document. The one notable
change is that [ci] is now only a name marker if followed by an explicit pause. This only requires
changes in writing in serial names. In speech, it is recommended that one not pause after [ci]
except before a name word. The benefit is that non-serial-name related uses of [ci] no longer
threaten mysterious needs to add explicit pauses before following name words.
I want to add the [zao] proposal of John Cowan. Done, 4/15/2019. the imperative pronoun [koo] has been added though not officially. I should also add [dao] for the dummy argument, but not today (it is in as of 4/18)
#4/25 Making note of the idea that [ci] should not be a name marker unless followed
by a pause. This would require that one pause before ci-marked names and it would
remove some very confusing corrections for the false name marker problem. If we
required the pause to be explicit we would be imposing the expectation that whitespace
after [ci] is not a pause. Otherwise we could encourage writing a juncture after [ci]
to deny presence of a pause, which is reasonable considering the meanings of [ci].
I am implementing the version with explicit pauses between [ci] and names
and the directive not to pause after [ci] without explicit indication. This solution
involves rewriting existing text only in the rare instances where [ci] precedes a name.
4/25/2019 Corrected some instances of (expanded) badstress. Now forbidding (C)VVVV initial predicates. Probably I should use class badstress systematically in defining cmapua.
4/24/2019 Final consonants in syllables cannot be followed by syllabic continuants.
this rationalizes the definition of SyllableA.
4/22 I am thinking of explicitly flagging imperative sentences; not changing
the grammar but making this visible in the parse. This might also have some
effects on logical connections. 4/23 created an imperative class for atomic
imperative sentences; this has no actual effect on parses, just
organizes them in a more enlightening way.
4/17-18 2019: updates commented out which make sentpred linkable with forethought
and afterthought connectives (making some uses of [guu] to share arguments
unnecessary). There are subtleties. Basically, untensed predicates without
argument lists will be linked by A and KA series connectives. Such a linked
set can be tensed as a whole. Such a linked set will share a following termset.
This will probably change many parses in the Visit and other legacy sources.
This required some really subtle adjustments to work right, divinable from
the actual rules given. Definitely experimental.
3/9/2019 further, extended LIU1 to handle [ainoi] and its kin
(actual mod is to class Cmapua) Further, fixing mismatch
between connective and A classes. One does now have to pause
before [ha] and its compounds.
3/9/2019 repaired bugs in negative attitudinals. A pause
in a negative attitudinal of the [no, ui] form will not break
it. [ainoi] didn't work for two reasons: the clauses
in the definition of NOUI were in the wrong order, and
the connective class mistakenly included [noi] so the
phonetics checker was crashing! I had to move N and NOI
earlier to make this work. Not yet installed in the other
version.
1/26/2019 added [vie], JCB's "objective subjunctive" as a PA
class word. I should add this to the other file as well.
12/22/18: just a comment: one does not have to pause before [ha] and its compounds.
I do not know whether to fix this. One did not have to in LIP either. For the moment I will
leave it as it is. As a matter of style, one probably should pause.
10/6/18 minor adjustments, made only in this file. Allow [sujo] (a wicked thing to say). Do not
allow [futo]: suffixed conversion operators must be nu + suffix.
6/2 fixed LIO + alien text. I also fixed some other glitches described in the reference grammar.
5/11 making version without "alternative parser" features. This version allows GAA but it doesn't
do anything: the definitions of argumentA and kin are the only point of difference. Master version:
becomes "alternative" by reinstating alternative definitions of argumentA and kin. Further, made changes
recommended in the reference grammar. ALTERNATIVE -- this is actually my master version. Edit
this and revise the argumentA and kin entries to make the original version.
4/24 discovered and repaired a bug re ci-marked names suffixed to descriptions. Discovered a bug in numerical
descriptions yet to be fixed: [lio] needs to be an alien text marker, maybe taking double quotes. The description-
with-suffixed-name bug was actually quite gruesome. I think it is repaired.
4/23 streamlined definition of descriptn. Shouldn't change anything. It was remarkably tricky though; preserving the old form
in case of further trouble.
4/22 I think this will be the master grammar file, with alternative lines to turn off the
GAA-related features. (1/21/2022, they are now turned off)
4/22 allowing general predicates in gasent1. This removes an extreme oddity in parsing of imperatives.
I do not see any new dangers from this.
4/22 I changed the final element of a keksent to be a sentence (new class uttA0), not a general sentence fragment.
several parse errors in the Visit were uncovered by this.
4/22: note that I still have the obligation to restore the [zao] construction.
4/9/2018 the large subject marker GAA can also be used to defend the beginnings of gasents and imperatives
from absorbing trailing arguments into an unintended statement. In this context [gaa] may be followed by [ga] ;-)
4/8/2018 this is an alternative version in which an argument which starts an SVO sentence will not be accepted
as a trailing argument of a previous sentence. This allows neat termination of [lepo] clauses preceding
a subject, for example. Unlike the previous alternative approach, this seems to involve a single fairly
tidy change: it is all an issue of avoiding needs for explicit closure. Further refinement: SVO sentences
can be marked with GAA (which is not a tense: it appears optionally just before the predicate, or just
before sutori arguments marked with GIO if there are any), the "large subject marker": an argument which
starts an SVO sentence *not marked with GAA* will not be accepted as a trailing argument of a previous
sentence. This is a sufficiently complex grammar change that it requires thought: it is not conservative
in my usual sense. The fact that GAA carries a mandatory stress is virtuous. Its resemblance to the
particle GA when used as a tense is not a bad thing: it would often be used instead of GA to close
a [lepo] clause appearing as a subject, and it is perhaps better for that purpose. Note that GAA can
and often will be followed by a tense. This grammar change depends strongly on the previous ruling that the O in
SOV(O) sentences must be marked with [gio]: S gio O^n V (O^m).
nuu is an atomic A core and there is no nu-affix to A connectives and their kin
1/20/2018 redefined CA cores to include a possible NU prefix. This allows more logically connected tenses, for example.
1/13/2018 reorganized the internals of class PA in a way which should allow more things and not forbid anything legal now.
this is pursuant to an analysis of the classes NI and PA as phrases, rather than words, as I start writing a global lexicography
proposal document. Enforced explicit pauses after PA phrases appearing as arguments with a following modifier with an argument.
12/30/2017 fixed a problem with name markers in the class NameWord and made a slight change to the new option in NI (names
as dimensions).
12/27/2017 installing an alternative treatment of acronyms under which they are simply names (suffix -n to acronyms in all uses).
supporting this requires no change at all to acronymic name usage (just use the -n versions with the usual rules for names),
and for dimension usage requires [mue] to be a name marker and support for [mue] PreName as an alternative suffix to NI.
12/27/2017 Frivolously fooling with the capitalization conventions. They ought to work better now...but I could have broken something.
the main new idea was to require that a capitalized embedded letteral actually be followed by lowercase if it was preceded by lowercase
(with the obvious exception for a letteral followed by a letteral). Also changed the rules for diphthongs in cmapua to make all-caps
legal for cmapua. The general idea is that one can start with a capital letter and stay capitalized until one hits a lower case letter,
at which point one can jump back up to caps only at a juncture (after which you can remain capitalized) or temporarily for a vowel
after z- (after which lower case resumes) or an embedded letteral (after which lowercase resumes). The total effect is that this allows
attested capitalization patterns in Loglan (including capitalization of embedded letterals as in possessive articles and acronyms)
and also allows all-caps for individual words (attested in Leith but suppressed in my version) and supports capitalization of components
of names as in [la Beibi-Djein] (by artful use of syllable breaks: Leith just has BeibiDjein, which does not work for me).
12/26/2017 Installed [niu] (quotation of phonetically legal but so far non-Loglan words). I did not make [niu] a name marker, so if one were to
use it with names (where it isn't really appropriate), one would have to pause initially: [niu, Djan].
I note in this connection that quotation of names with li...lu remains limited, since names by themselves are not
utterances: one needs the [la]. I fixed this as an exception in the previous parser; I may do it here or I may
not, haven't decided. Single name words can be quoted with [liu], of course, but not serial names.
12/24/2017 Refined treatment of vowel pairs for Cvv-V cmapua units. First 12/24 version rather disastrously
broken: this should be fixed!
12/23/2017 This is now completely commented, with minor local exceptions to which I will return later.
This document is the basis on which I will build all subsequent parsers, with due modifications to the comments.
The Python PEG engine and preamble files contain commands for constructing a Python parser from it directly.
12/22/2017 major progress on commenting the grammar
yet later 12/20: no change in performance of the grammar, extensive commenting in the
grammar section. Considerable changes in arrangement: for example, vocatives, inverse vocatives,
and free modifiers are moved to a much earlier point. I'm hoping to get a genuinely almost readable
commented grammar...
later 12/20 starting the process of commenting and editing the grammar, starting
at basic sentence structures. Notably rewrote the class [keksent] more compactly,
one hopes with no actual effect on parses.
12/20/2017 Do not require expression of pause after finally stressed cmapua before
vowel initial predicate as a comma, since the initial vowel signals the pause anyway.
Allow final stress in names. Fixed bug in CVVHiddenStress. Prevented
broken monosyllables in finally stressed CVV djifoa. refinement of caprule
12/19/2017 seem to have had a versioning failure and lost the fix which requires
CVVy djifoa to be followed by complete complexes. Restored.
12/18/2017 fixed a bug in treatment of stressed syllables in recognizing predicate starts. Also
narrowed the generalized VCCV rule to allow more of the quite unlikely space of predicates with lots
of vowels before the CC pair. Probably they should be banned (and none have ever been proposed with
more than three) but that rule is not the context in which to arbitrarily ban half of them. Some cleanup
of the display of parses, for which updated version of logicpreamble.py should also be uploaded. A refinement
to class "connective" checking that apparent logical connectives are not initial segments of predicates.
This has the effect of delaying the declaration of "connective" until after the declaration of
"predstart".
12/17/2017 further refinement of the 12/16 version: a couple of bugs spotted.
12/16/2017 There should be no change in parsing behavior, but the predstart ruleset is shorter
and more intelligible, and I realized that Complex doesn't need a check for the anti-slinkui test
(the requirement that certain initial CVC cmapua be y hyphenated which replaces the slinkui test)
at all: the way predstart works already ensures that initial CV cmapua fall off in the excluded
cases, the idea being that we test the front of a predicate without lookahead in all cases. Also
addressed the subtle point that one wasn't forced to pause after a predicate before following y
(not likely to arise as a problem).
12/14/2017 Corrected vowel grouping to avoid paradoxical vowel triples which are default
grouped in a way which becomes illegal if made explicit. SyllableA really should contain a final
consonant: the previous form was messing up vowel grouping. Serious bug where end of djifoa
and syllable resolution of a predicate may fail to agree. I think I blocked this by ensuring that
final djifoa are not followed by vowels. Other fine tuning of the complex algorithm. Also had
to repair the check for CVCCCV and CVCCVV predicates.
12/13/2017: added kie ( utterance ) kiu to class LiQuote. Did fine tuning to ensure