The Loglan PEG Grammar

In this file I will develop the entire Loglan grammar on top of the phonetic proposal.

PEG notation

A PEG (Parsing Expression Grammar) is made up of lines of the form class_name <- PEG notation. Each PEG notation describes a set of strings with conditions on the context in which they occur.

Concrete strings: 'string' or "string" literally denotes the 6 character string given.

Classes of characters: [aeiou] describes the set of one character strings which are either a, e, i, o, or u. Ranges can appear: [a-zA-Z] describes the union of the sets of lower case letters and upper case letters, considered as one character strings.

If A and B are PEG notations, (A B) denotes a string of class A followed by a string of class B (in which the string of class A is the preferred string of this class read from the beginning of the source string).

If A and B are PEG notations, (A / B) denotes a string of either class A or a string of class B, with a string of class A being read by preference if possible. The fact that a preference is indicated in alternative lists makes PEG reading deterministic (in a sense, there are no ambiguities for a PEG grammar). The problem corresponding to ambiguity in a BNF grammar is incorrectly ordered lists of alternatives.

If A is a PEG notation, (A)? represents a string of class A (preferred) or an empty string if there is no string of class A: this represents optional appearance of A. (A)* represents zero or more consecutive strings of class A (as many as possible) and (A)+ represents one or more consecutive such strings.

If A is a PEG notation, &(A) represents a length 0 string which is followed by a string of class A, and !(A) represents a length 0 string which is not followed by a string of class A. This gives us powerful lookahead features: for example, (!(A) B) represents a string of class B whose beginning is not also the beginning of a string of class A. It is tempting but not accurate to say that it does not have an initial segment of class A, because detection of a string of class A longer than the string of class B read would cause reading of this class to fail.

The period . represents the class of single characters (so !. is end of text).
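To make the operators above concrete, here is a minimal sketch in Python of how they might be modelled as functions. It is not the Python PEG engine that accompanies this grammar: the names lit, cls, any_char, seq, choice, star, opt, and_ and not_ are hypothetical, chosen only for this illustration. Each parser takes the source text and a position, and returns the position after the string it read, or None on failure.

```python
# Illustrative PEG combinators (hypothetical names, not the actual engine).
# A parser is a function (text, pos) -> new position, or None on failure.

def lit(s):
    """'string': the literal string s."""
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def cls(chars):
    """[chars]: any single character from chars."""
    return lambda text, pos: (
        pos + 1 if pos < len(text) and text[pos] in chars else None)

def any_char(text, pos):
    """The period: any single character."""
    return pos + 1 if pos < len(text) else None

def seq(*parsers):
    """(A B): A followed by B."""
    def parse(text, pos):
        for p in parsers:
            pos = p(text, pos)
            if pos is None:
                return None
        return pos
    return parse

def choice(*parsers):
    """(A / B): the first alternative that succeeds wins."""
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse

def star(p):
    """(A)*: as many consecutive A as possible, possibly none."""
    def parse(text, pos):
        while True:
            end = p(text, pos)
            # Stop on failure; the no-progress check is a safety guard
            # added for this sketch so that empty matches cannot loop.
            if end is None or end == pos:
                return pos
            pos = end
    return parse

def opt(p):
    """(A)?: A if possible, otherwise the empty string."""
    def parse(text, pos):
        end = p(text, pos)
        return pos if end is None else end
    return parse

def and_(p):
    """&(A): succeed without consuming if A can be read here."""
    return lambda text, pos: pos if p(text, pos) is not None else None

def not_(p):
    """!(A): succeed without consuming if A cannot be read here."""
    return lambda text, pos: pos if p(text, pos) is None else None

# (!'abc' 'ab') reads "ab" only where "abc" does not start.
guarded = seq(not_(lit("abc")), lit("ab"))
end_of_text = not_(any_char)       # !.
vowels = star(cls("aeiou"))        # ([aeiou])*
```

With these definitions, guarded("abd", 0) reads two characters, while guarded("abc", 0) fails even though the "ab" that would have been read is not itself a string of class 'abc': the lookahead sees the longer string. (A)+ can be expressed as seq(p, star(p)).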

New notations are introduced by lines

class_name <- PEG notation:

this is not just an abbreviation facility because such definitions may be mutually recursive.

A PEG notation applied to a source string will give either failure or a uniquely determined initial string of the source (parsed suitably); in a sense PEG is unambiguous. What corresponds as an issue to ambiguity for a BNF grammar is inappropriate choice of order of alternatives in PEG disjunctions (A / B): what often represents a problem with a grammar is what I call "preemption", where an earlier alternative reads an initial segment of a string where a later alternative could have read more of it.
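A small sketch of preemption in Python may help. The helper names lit, choice and reads_whole are hypothetical and are not part of the actual PEG engine. The two notations differ only in the order of their alternatives.

```python
# Hypothetical helpers for illustration only.

def lit(s):
    # 'string': read the literal s at pos, or fail with None.
    return lambda text, pos: pos + len(s) if text.startswith(s, pos) else None

def choice(*parsers):
    # Ordered choice (A / B): commit to the first alternative that succeeds.
    def parse(text, pos):
        for p in parsers:
            end = p(text, pos)
            if end is not None:
                return end
        return None
    return parse

def reads_whole(parser, text):
    # True if the parser reads the entire text (the parser followed by !.).
    return parser(text, 0) == len(text)

# ('a' / 'ab'): the first alternative preempts the second,
# so 'ab' can never be read by this notation.
preempted = choice(lit("a"), lit("ab"))

# ('ab' / 'a'): the longer alternative is tried first.
well_ordered = choice(lit("ab"), lit("a"))
```

Here reads_whole(preempted, "ab") is False, because 'a' is read and the parser is committed to it, while reads_whole(well_ordered, "ab") is True. Both notations read the source "a" in full.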

It's possible to have a PEG go into an infinite loop and fail to produce a parse. My PEG generator has a termination checker, so the Loglan grammar does not have these problems. I have contemplated writing a preemption checker, but this is a rather difficult problem.

Dated updates now to appear here

a note, not reflecting a modification. I'm wondering whether the pause required in [fo tonira] ([fotonira] means something quite different) should be a mandatory comma pause. It looks as if it might not be hard to implement.

a note: there is a problem with interaction of quoted forms with alien text operators.

1/21/2022 Starting a literate programming exercise: turn this document into HTML while preserving its performance as a PEG grammar. #Also note that the alternative version is now turned off. The only component present is [gaa] and I do not see a reason for anyone to use it. #The alternative parser is readily turned back on by changing the line statement1x. This version labels the default stressed syllable in a predicate in the PhoneticComplex parse.

a serious problem with ICA, an actual ambiguity which has existed since the beginning of the language, hopefully fixed: the fix is that an apparent ICA initial utterance which could without the period be a continuation of a sentence is read as such. The important point is that there is no audible difference between comma followed by ICA and period followed by ICA: we solve the problem by reading the latter as the former where possible.

11/24/2021 KIA, the one "word" deletion operator, is installed. What it actually does is a bit subtle.

2/4/2021 Imposed the rule that two final consonants cannot be consonants from voiced/unvoiced pairs with different voice. Also forbid second final consonant to be h.

I have further fine-tuning of djifoa gluing in mind. Allow the -r glue to be expressed as -rr after all mandatory monosyllables, removing the annoying pronunciation problem? I was thinking of allowing -hy gluing in other contexts, but it is actually a bad idea.

9/15/2019 installed semantic case tags with order distinctions for use with predicates with more than one argument of the same case. one solution is beucine, beucito... another is beuzi, beuza, beuzu.

4/28/2019 Various debugging of the new predicate algorithm. Added CVVhy as a glued form for CVV djifoa. added capitalization of djifoa glue! Confirming my apparent earlier decision that a CVV(h)y djifoa must be followed by a full predicate complex.

4/26/2019: this incorporates various revisions to the phonetics, correcting errors or clarifying rules, motivated by my development of the phonetics section of a new grammar document. The one notable change is that [ci] is now only a name marker if followed by an explicit pause. This only requires changes in writing in serial names. In speech, it is recommended that one not pause after [ci] except before a name word. The benefit is that non-serial-name related uses of [ci] no longer threaten mysterious needs to add explicit pauses before following name words.

I want to add the [zao] proposal of John Cowan. Done, 4/15/2019. the imperative pronoun [koo] has been added though not officially. I should also add [dao] for the dummy argument, but not today (it is in as of 4/18)

#4/25 Making note of the idea that [ci] should not be a name marker unless followed by a pause. This would require that one pause before ci-marked names and it would remove some very confusing corrections for the false name marker problem. If we required the pause to be explicit we would be imposing the expectation that whitespace after [ci] is not a pause. Otherwise we could encourage writing a juncture after [ci] to deny presence of a pause, which is reasonable considering the meanings of [ci]. I am implementing the version with explicit pauses between [ci] and names and the directive not to pause after [ci] without explicit indication. This solution involves rewriting existing text only in the rare instances where [ci] precedes a name.

4/25/2019 Corrected some instances of (expanded) badstress. Now forbidding (C)VVVV initial predicates. Probably I should use class badstress systematically in defining cmapua.

4/24/2019 Final consonants in syllables cannot be followed by syllabic continuants. this rationalizes the definition of SyllableA.

4/22 I am thinking of explicitly flagging imperative sentences; not changing the grammar but making this visible in the parse. This might also have some effects on logical connections. 4/23 created an imperative class for atomic imperative sentences; this has no actual effect on parses, just organizes them in a more enlightening way.

4/17-18 2019: updates commented out which make sentpred linkable with forethought and afterthought connectives (making some uses of [guu] to share arguments unnecessary). There are subtleties. Basically, untensed predicates without argument lists will be linked by A and KA series connectives. Such a linked set can be tensed as a whole. Such a linked set will share a following termset. This will probably change many parses in the Visit and other legacy sources. This required some really subtle adjustments to work right, divinable from the actual rules given. Definitely experimental.

3/9/2019 further, extended LIU1 to handle [ainoi] and its kin (actual mod is to class Cmapua) Further, fixing mismatch between connective and A classes. One does now have to pause before [ha] and its compounds.

3/9/2019 repaired bugs in negative attitudinals. A pause in a negative attitudinal of the [no, ui] form will not break it. [ainoi] didn't work for two reasons: the clauses in the definition of NOUI were in the wrong order, and the connective class mistakenly included [noi] so the phonetics checker was crashing! I had to move N and NOI earlier to make this work. Not yet installed in the other version.

1/26/2019 added [vie], JCB's "objective subjunctive" as a PA class word. I should add this to the other file as well.

12/22/18: just a comment: one does not have to pause before [ha] and its compounds. I do not know whether to fix this. One did not have to in LIP either. For the moment I will leave it as it is. As a matter of style, one probably should pause.

10/6/18 minor adjustments, made only in this file. Allow [sujo] (a wicked thing to say). Do not allow [futo]: suffixed conversion operators must be nu + suffix.

6/2 fixed LIO + alien text. I also fixed some other glitches described in the reference grammar.

5/11 making version without "alternative parser" features. This version allows GAA but it doesn't do anything: the definitions of argumentA and kin are the only point of difference. Master version: becomes "alternative" by reinstating alternative definitions of argumentA and kin. Further, made changes recommended in the reference grammar. ALTERNATIVE -- this is actually my master version. Edit this and revise the argumentA and kin entries to make the original version.

4/24 discovered and repaired a bug re ci-marked names suffixed to descriptions. Discovered a bug in numerical descriptions yet to be fixed: [lio] needs to be an alien text marker, maybe taking double quotes. The description-with-suffixed-name bug was actually quite gruesome. I think it is repaired.

4/23 streamlined definition of descriptn. Shouldn't change anything. It was remarkably tricky though; preserving the old form in case of further trouble.

4/22 I think this will be the master grammar file, with alternative lines to turn off the GAA-related features. (1/21/2022, they are now turned off)

4/22 allowing general predicates in gasent1. This removes an extreme oddity in parsing of imperatives. I do not see any new dangers from this.

4/22 I changed the final element of a keksent to be a sentence (new class uttA0), not a general sentence fragment. several parse errors in the Visit were uncovered by this.

4/22: note that I still have the obligation to restore the [zao] construction.

4/9/2018 the large subject marker GAA can also be used to defend the beginnings of gasents and imperatives from absorbing trailing arguments into an unintended statement. In this context [gaa] may be followed by [ga] ;-)

4/8/2018 this is an alternative version in which an argument which starts an SVO sentence will not be accepted as a trailing argument of a previous sentence. This allows neat termination of [lepo] clauses preceding a subject, for example. Unlike the previous alternative approach, this seems to involve a single fairly tidy change: it is all an issue of avoiding needs for explicit closure. Further refinement: SVO sentences can be marked with GAA (which is not a tense: it appears optionally just before the predicate, or just before sutori arguments marked with GIO if there are any), the "large subject marker": an argument which starts an SVO sentence *not marked with GAA* will not be accepted as a trailing argument of a previous sentence. This is a sufficiently complex grammar change that it requires thought: it is not conservative in my usual sense. The fact that GAA carries a mandatory stress is virtuous. Its resemblance to the particle GA when used as a tense is not a bad thing: it would often be used instead of GA to close a [lepo] clause appearing as a subject, and it is perhaps better for that purpose. Note that GAA can and often will be followed by a tense. This grammar change depends strongly on the previous ruling that the O in SOV(O) sentences must be marked with [gio]: S gio O^n V (O^m).

nuu is an atomic A core and there is no nu-affix to A connectives and their kin.

1/20/2018 redefined CA cores to include a possible NU prefix. This allows more logically connected tenses, for example.

1/13/2018 reorganized the internals of class PA in a way which should allow more things and not forbid anything legal now. this is pursuant on an analysis of the classes NI and PA as phrases, rather than words, as I start writing a global lexicography proposal document. Enforced explicit pauses after PA phrases appearing as arguments with a following modifier with an argument.

12/30/2017 fixed a problem with name markers in the class NameWord and made a slight change to the new option in NI (names as dimensions).

12/27/2017 installing an alternative treatment of acronyms under which they are simply names (suffix -n to acronyms in all uses). supporting this requires no change at all to acronymic name usage (just use the -n versions with the usual rules for names), and for dimension usage requires [mue] to be a name marker and support for [mue] PreName as an alternative suffix to NI.

12/27/2017 Frivolously fooling with the capitalization conventions. They ought to work better now...but I could have broken something. the main new idea was to require that a capitalized embedded letteral actually be followed by lowercase if it was preceded by lowercase (with the obvious exception for a letteral followed by a letteral). Also changed the rules for diphthongs in cmapua to make all-caps legal for cmapua. The general idea is that one can start with a capital letter and stay capitalized until one hits a lower case letter, at which point one can jump back up to caps only at a juncture (after which you can remain capitalized) or temporarily for a vowel after z- (after which lower case resumes) or an embedded literal (after which lowercase resumes). The total effect is that this allows attested capitalization patterns in Loglan (including capitalization of embedded literals as in possessive articles and acronyms) and also allows all-caps for individual words (attested in Leith but suppressed in my version) and supports capitalization of components of names as in [la Beibi-Djein] (by artful use of syllable breaks: Leith just has BeibiDjein, which does not work for me).

12/26/2017 Installed [niu] (quotation of phonetically legal but so far non-Loglan words). I did not make [niu] a name marker, so if one were to use it with names (where it isn't really appropriate), one would have to pause initially: [niu, Djan].

I note in this connection that quotation of names with li...lu remains limited, since names by themselves are not utterances: one needs the [la]. I fixed this as an exception in the previous parser; I may do it here or I may not, haven't decided. Single name words can be quoted with [liu], of course, but not serial names.

12/24/2017 Refined treatment of vowel pairs for Cvv-V cmapua units. First 12/24 version rather disastrously broken: this should be fixed!

12/23/2017 This is now completely commented, with minor local exceptions to which I will return later. This document is the basis on which I will build all subsequent parsers, with due modifications to the comments. The Python PEG engine and preamble files contain commands for constructing a Python parser from it directly.

12/22/2017 major progress on commenting the grammar

yet later 12/20: no change in performance of the grammar, extensive commenting in the grammar section. Considerable changes in arrangement: for example, vocatives, inverse vocatives, and free modifiers are moved to a much earlier point. I'm hoping to get a genuinely almost readable commented grammar...

later 12/20 starting the process of commenting and editing the grammar, starting at basic sentence structures. Notably rewrote the class [keksent] more compactly, one hopes with no actual effect on parses.

12/20/2017 Do not require expression of pause after finally stressed cmapua before vowel initial predicate as a comma, since the initial vowel signals the pause anyway. Allow final stress in names. Fixed bug in CVVHiddenStress. Prevented broken monosyllables in finally stressed CVV djifoa. Refinement of caprule.

12/19/2017 seem to have had a versioning failure and lost the fix which requires CVVy djifoa to be followed by complete complexes. Restored.

12/18/2017 fixed a bug in treatment of stressed syllables in recognizing predicate starts. Also narrowed the generalized VCCV rule to allow more of the quite unlikely space of predicates with lots of vowels before the CC pair. Probably they should be banned (and none have ever been proposed with more than three) but that rule is not the context in which to arbitrarily ban half of them. Some cleanup of the display of parses, for which updated version of logicpreamble.py should also be uploaded. A refinement to class "connective" checking that apparent logical connectives are not initial segments of predicates. This has the effect of delaying the declaration of "connective" until after the declaration of "predstart".

12/17/2017 further refinement of the 12/16 version: a couple of bugs spotted.

12/16/2017 There should be no change in parsing behavior, but the predstart ruleset is shorter and more intelligible, and I realized that Complex doesn't need a check for the anti-slinkui test (the requirement that certain initial CVC cmapua be y-hyphenated, which replaces the slinkui test) at all: the way predstart works already ensures that initial CV cmapua fall off in the excluded cases, the idea being that we test the front of a predicate without lookahead in all cases. Also addressed the subtle point that one wasn't forced to pause after a predicate before following y (not likely to arise as a problem).

12/14/2017 Corrected vowel grouping to avoid paradoxical vowel triples which are default grouped in a way which becomes illegal if made explicit. SyllableA really should contain a final consonant: the previous form was messing up vowel grouping. Serious bug where end of djifoa and syllable resolution of a predicate may fail to agree. I think I blocked this by ensuring that final djifoa are not followed by vowels. Other fine tuning of the complex algorithm. Also had to repair the check for CVCCCV and CVCCVV predicates.

12/13/2017: added kie ( utterance ) kiu to class LiQuote. Did fine tuning to ensure that cmapua streams stop before [li] or [kie], that names can stop at double quotes or close parentheses, and that the capitalization rule ignores opening parentheses as well as double quotes. One can now adorn li lu with quotes (on the inside) in a reasonable way and adorn kie kiu with parentheses (on the inside) in a reasonable way. One cannot *replace* these words (or any words) with punctuation in my model of Loglan. Also, updates to comments, and (end of utterance) added as a marker of terminal punctuation.

Comments on the initial release of this grammar

This is now done, in a first pass. That is, the grammar is adapted and appears to work, more or less. What is needed is comments on the lexicography and the grammar...Phonetics has now pretty clearly been sorted from the grammar (there are some places where the phonetics accept grammar information with regard to punctuation).

Alien text is now handled somewhat differently. Some issues to do with quoting names are not finalized and have not been tested.

I added -iy and -uy as VV forms allowed in general in cmapua but not in other words; they are always monosyllabic. What this immediately allows me to do is to give Y a name which is not phonetically irregular! [ziy] is supported: [yfi] is too, now.

capitalization is roughly back to where it was in the original, but all-caps are allowed.

acronyms are liable to be horrible.

Fixed the recursion problem in a way which will not be visible in ordinary parses. Streams of cmapua will always be broken at name or alien text markers (instead of using lookahead to check that we do not stand at the beginning of a name word or alien text word). The next cycle will then check for a name or alien text, and also check for badnamemarkers; no lookahead is happening while a stream of cmapua is being read except checking for the markers of names and alien text. This will change the way phonetic parses look (streams of cmapua will break, and sometimes resume, at name markers or alien text markers), but it will not change any grammatical parses.

Part I Phonetics

Mod bugs, I have implemented all of Loglan phonetics as described in my proposal. Borrowing djifoa are pretty tricky.

I have now parsed all the words in the dictionary, and all single words of appropriate classes parse successfully. I have added alien text and quotation constructions which do not conform to these rules; so actually all Loglan text should parse, mod some punctuation and capitalization issues. The conventions for alien text here are not the same as those in the current provisional parser.

I believe the conventions for forcing comma pauses before vowel initial cmapua and after names except in special contexts have been enforced. In a full grammar, one probably would want to disable pauses before vowel initial letterals (done). This grammar also does not support the lingering irregularities in acronyms (and won't).

This grammar (in Part I) is entirely about phonetics: all it does is parse text into names (with associated initial pauses or name markers), cmapua (qua unanalyzed streams of cmapua units), borrowings and complexes, along with interspersed comma pauses and marks of terminal punctuation. It does support conventions about where commas are required and a simple capitalization rule. Streams of cmapua break when markers initial in other forms are encountered (and may in some cases resume when the markers are a deception).

a likely locus for odd bugs is the group of predstartX rules which detect apparent cmapua which are actually preambles to predicates. These are tricky! (and I did indeed find some lingering problems when I parsed the dictionary). Another reason to watch this rule predstart is that it carries a lot of weight: !predstart is used as a lightweight test that what follows is a cmapua (a point discussed in more detail later).

In reviewing this, I think that very little is different from 1990's Loglan (the borrowing djifoa are post-1989 L1, but not my creation). Some things add precision without making anything in 1990's Loglan incorrect.

The requirement that syllabic consonants be doubled is new, and makes some 1990's Loglan names incorrect. The requirement that names resolve into syllables is new, and makes some 1990's Loglan names incorrect, usually because they end in three consonants. The rule restricting final consonant pairs from being noncontinuant/continuant is new, but does not affect any actual predicate ever proposed. Enhancing the VccV rule to also forbid CVVV...ccV caused one predicate to be changed ([haiukre] became [haiukrre], and [haiukre] was a novelty anyway, using a new name for X in X-ray).

The exact definition of syllables and use of syllable breaks and stress marks is new (the close comma was replaced with the hyphen, so Lo,is becomes Lo-is); but this does not make anything in 1990's Loglan incorrect, it merely increases precision and makes phonetic transcription possible. Forbidding doubled vowels in borrowings was new, was already approved, and caused us to change [alkooli] to [alkoholi]. Formally allowing the CVccVV and CVcccV predicates without y-hyphens took a proposal in 2013 because Appendix H was careless in describing their abandonment of the slinkui test, but the dictionary makes it evident that this was their intent all along. The slinkui test had already been abandoned in the 1990s. Formally abandoning qwx was already something that the dictionary workers in the 1990's were working on; we completed it. Allowing glottal stop in vowel pairs and forbidding it as an allophone of pause is a new phonetic feature in the proposal but not reflected in the parser, of course. Alternative pronunciations of y and h and allowing h in final position are invisible or do not make any 1990's Loglan incorrect.

Permitting false name markers in names was already afoot in the 1990's and the basic outlines of our approach were already in place. The rule requiring explicit pauses between a name marker not starting a name word and the beginning of the next name word is new, but reflects something which was already a fact about 1990's Loglan pronunciation: those pauses had to be made in speech (and in the 1990's they had no tools to do relevant computer tests)! The requirement that names resolve into syllables restricts which literal occurrences of name markers are actually false name markers (the tail they induce in the name must itself resolve into syllables).

Working out the full details of borrowing djifoa was interesting: I'm not sure that I've done anything *new* there; explicitly noting the stress shift in borrowing djifoa might be viewed as something new but it is a logical consequence of JCB's permission to pause after a borrowing djifoa, which contains explicit language about how it is to be stressed, and the final definition of a borrowing djifoa as simply a borrowing followed by -y. The shift strikes me as a really good idea anyway, because it marks djifoa with a pause after it as phonetically different in an additional way other than ending with the very indistinct vowel y. My rules as given here do not directly enforce the rule that a borrowing djifoa must be preceded by y but I think they indirectly enforce it in all or almost all cases: the parser tries to read a borrowing djifoa before reading any other kind of djifoa, so it is hard to see how to deploy a short djifoa in such a way that it would fall off the head of a borrowing without using y.

These phonetics do not support certain irregularities in acronyms. We note that it is now allowed to insert [, mue] into an acronym, which would be necessary for example between a Ceo letteral and a following VCV letteral.

Sounds

Vowels

#all vowels

V1 <- [aeiouyAEIOUY]

#regular vowels

V2 <- [aeiouAEIOU]

Consonants

#consonants

C1 <- [bcdfghjklmnprstvzBCDFGHJKLMNPRSTVZ]

#consonants in voiced/unvoiced pairs

Cvoiced <- [bdgjvzBDGJVZ]

Cunvoiced <- [ptkcfsPTKCFS]

bad voice pairs (or pairs whose second term is h), forbidden as pairs of final consonants

Badvoice <- (Cvoiced (Cunvoiced/[Hh])/Cunvoiced (Cvoiced/[Hh]))
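As a cross-check on the three rules above, here is a sketch in Python that transcribes them into a regular expression. The names CVOICED, CUNVOICED, BADVOICE and is_bad_final_pair are hypothetical and not part of the grammar or its engine. The sketch tests a two-letter string on its own, whereas the PEG rule reads a pair from the front of a longer source.

```python
import re

# Transcription of Cvoiced, Cunvoiced and Badvoice (illustrative only).
CVOICED = "bdgjvz"
CUNVOICED = "ptkcfs"

# A voiced consonant followed by an unvoiced one or h,
# or an unvoiced consonant followed by a voiced one or h.
BADVOICE = re.compile(
    rf"[{CVOICED}](?:[{CUNVOICED}]|h)|[{CUNVOICED}](?:[{CVOICED}]|h)",
    re.IGNORECASE,
)

def is_bad_final_pair(pair):
    """True if the two-consonant string pair is of class Badvoice."""
    return BADVOICE.fullmatch(pair) is not None
```

For instance, is_bad_final_pair("bt") and is_bad_final_pair("sh") hold, while "st" and "bd" are not bad voice pairs, and a pair such as "rs" with a consonant from neither class is not affected.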

Letters and capitalization

letters

letter <- (![qwxQWX] [a-zA-Z])

a capitalization convention which allows what our current one allows and also allows all-caps. if case goes down from upper case to lower case, it can only go back up in certain cases. This does allow capitalization of initial segments of words. There is a forward reference to the grammar in that free capitalization of embedded literals is permitted, and capitalization of vowels guarded with z in literals as in DaiNaizA.

lowercase <- (![qwx] [a-z])

uppercase <- (![QWX] [A-Z])

caprule <- [\"(]? &([z] V1 (!uppercase/&TAI0)/lowercase TAI0 (!uppercase/&TAI0)/!(lowercase uppercase).) letter (&([z] V1 (!uppercase/&TAI0)/lowercase TAI0 (!uppercase/&TAI0)/!(lowercase uppercase).) (letter/juncture))* !(letter/juncture)

Junctures: syllable breaks and stresses

syllable markers: the hyphen is always medial so must be followed by a letter. the stress marks can be syllable final and word final. A juncture is never followed by another juncture.

juncture <- (([-] &letter)/[\'*]) !juncture

stress <- ['*] !juncture
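The lookahead conditions in juncture and stress can be read off more easily from a direct transcription. The following Python sketch is an illustration only: the function names is_letter, juncture and stress mirror the class names but are hypothetical helpers, not the generated parser. Each function returns the position after what it read, or None.

```python
def is_letter(text, pos):
    """Class letter: an ASCII letter other than q, w, x."""
    if pos >= len(text):
        return False
    ch = text[pos]
    return ch.isascii() and ch.isalpha() and ch not in "qwxQWX"

def juncture(text, pos):
    """Return the position after a juncture at pos, or None if there is none."""
    if pos >= len(text):
        return None
    ch = text[pos]
    if ch == "-":
        # The hyphen is always medial: it must be followed by a letter.
        if not is_letter(text, pos + 1):
            return None
    elif ch not in "'*":
        return None
    # !juncture: a juncture is never followed by another juncture.
    if juncture(text, pos + 1) is not None:
        return None
    return pos + 1

def stress(text, pos):
    """Return the position after a stress mark at pos, or None."""
    if pos < len(text) and text[pos] in "'*" and juncture(text, pos + 1) is None:
        return pos + 1
    return None
```

So the hyphen in Lo-is is a juncture, a word-final hyphen is not, and a stress mark immediately followed by a valid juncture is rejected.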

Terminal punctuations and general characters

terminal punctuation

terminal <- ([.:?!;#])

characters which can occur in words

character <- (letter/juncture)

Alien text

to really get all Loglan text, we should add the alien text constructions and the markers of alien text, [lie], [lao], [sao], [sue] and certain quotations which violate the phonetic rules.

we adopt the convention that all alien text may be but does not have to be enclosed in quotes. it needs to be understood that in quoted alien text, whitespace is understood as [, y,]; in the unquoted version this is shown explicitly. This handling of alien text is taken from the final 1990's treatment of Linnaeans = foreign names, and extended by us to replace the impossible treatment of strong quotation in 1989 Loglan.

this is a little different from what is allowed in the previous provisional parser, but similar. A difference is that all the alien text markers are allowed to be followed by the same sorts of alien text.

the forms with [hoi] and [hue] are required to have following quotes in written form to avoid unintended parses, which otherwise become likely in case of typos in non-alien text cases.

AlienText <- ([,]? [ ]+ [\"] (![\"].)+ [\"]/ [,]? [ ]+ (![, ]!terminal .)+ ([,]? [ ]+ [y] [,]? [ ]+ (![, ]!terminal .)+)*)

AlienWord <- &caprule ([Hh] [Oo] [Ii] juncture? &([,]? [ ]+ [\"])/[Hh][Uu] juncture? [Ee] juncture? &([,]? [ ]+ [\"]) / [Ll] [Ii] juncture? [Ee]juncture? /[Ll] [Aa] [Oo]juncture? /[Ll] [Ii] juncture? [Oo] juncture? /[Ss] [Aa] [Oo]juncture?/[Ss] [Uu] juncture? [Ee]juncture?) AlienText

while reading streams of cmapua, the parser will watch for the markers of alien text.

alienmarker <- ([Hh] [Oo] [Ii] juncture? &([,]? [ ]+ [\"])/[Hh][Uu] juncture? [Ee] juncture? &([,]? [ ]+ [\"]) / [Ll] [Ii] juncture? [Ee] juncture? /[Ll] [Aa] [Oo] juncture? /[Ll] [Ii] juncture? [Oo] juncture? /[Ss] [Aa] [Oo] juncture?/[Ss] [Uu] juncture? [Ee] juncture?) !V1

5/11/18 added [lio] as an alien text marker, to support numerals.

Complex Vowel Forms

the continuant consonants and the syllabic pairs they can form

continuant <- [mnlrMNLR]

syllabic <- (([mM] [mM] !(juncture? [mM]))/([nN] [nN] !(juncture? [nN]))/([rR] [rR] !(juncture? [rR]))/([lL] [lL] !(juncture? [lL])))
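A sketch of the syllabic rule as a Python regular expression follows. It makes one simplifying assumption, stated loudly here and in the code: a juncture is approximated as a single one of the characters - ' * without its own lookahead conditions. The names SYLLABIC and reads_syllabic are hypothetical.

```python
import re

# ASSUMPTION: a juncture is approximated by one of the characters - ' *
# (the real juncture class carries further lookahead conditions).
# A syllabic pair is a doubled continuant that is not followed, across an
# optional juncture, by a third copy of the same letter.
SYLLABIC = re.compile(
    "|".join(rf"{c}{c}(?![-'*]?{c})" for c in "mnrl"),
    re.IGNORECASE,
)

def reads_syllabic(text, pos=0):
    """Return the position after a syllabic pair at pos, or None."""
    m = SYLLABIC.match(text, pos)
    return m.end() if m else None
```

Thus "rr" and "Mm" are read as syllabic pairs, while the first two letters of "rrr" or "rr-r" are not, since a third r follows.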

the obligatory monosyllables, and these syllables when broken by a usually bad syllable juncture. The i-final forms are not obligatory mono when followed by another i.

MustMono <- (([aeoAEO] [iI] ![iI]) /([aA] [oO]))

BrokenMono <- (([aeoAEO] juncture [iI] ![iI])/([aA] juncture [oO]))

the obligatory and optional monosyllables. Sequences of three of the same letter are averted. Avoid formation of a doubled u after iu or a doubled i after ui.

Mono <- (MustMono/([iI] !([uU] [uU]) V2)/([uU] !([iI] [iI]) V2))
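The effect of the lookaheads in MustMono and Mono can be checked with a small Python sketch. The names MUST_MONO, MONO and reads_mono are hypothetical. The sketch reads a monosyllabic vowel pair from the front of a string of vowels and ignores junctures, which these two rules do not mention.

```python
import re

# MustMono: ai, ei, oi (unless another i follows), and ao.
MUST_MONO = r"(?:[aeo]i(?!i)|ao)"

# Mono: MustMono, or i + regular vowel (but not iuu),
# or u + regular vowel (but not uii).
MONO = re.compile(
    rf"{MUST_MONO}|i(?!uu)[aeiou]|u(?!ii)[aeiou]",
    re.IGNORECASE,
)

def reads_mono(text, pos=0):
    """Return the vowel pair of class Mono read at pos, or None."""
    m = MONO.match(text, pos)
    return m.group(0) if m else None
```

For example, reads_mono("ai") gives "ai" but reads_mono("aii") fails, and reads_mono("iua") gives "iu" while reads_mono("iuu") fails.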

vowel pairs of the form found in cmapua and djifoa. (other than the special IY, UY covered in the cmapua rules)

The mysterious prohibition controls a permitted phonetic exception in djifoa gluing. cmapua are never followed directly by vocalic continuants in any case.

VV <- !(!MustMono V2 juncture? V2 juncture? [Rr] [Rr]) (!BrokenMono V2 juncture? V2)

the next vocalic unit to be chosen from a stream of vowels in a predicate or name. This is different than in our Sources and formally described in the proposal.

NextVowels <- (MustMono/(V2 &MustMono)/Mono/!([Ii] juncture [Ii] V1) !([Uu] juncture [Uu] V1) V2)

5/11/18 forbidding consonantal vowels to follow the same vowel.

the doubled vowels that trigger the rule that one of them must be stressed

DoubleVowel <- (([aA] juncture? [aA])/([eE] juncture? [eE])/([oO] juncture? [oO])/([iI] juncture [iI])/([uU] juncture [uU])/[iI] [Ii] &[iI]/[Uu] [uU] &[uU])

the mandatory "vowel" component of a syllable

Vocalic <- (NextVowels/syllabic/[Yy])

Complex Consonant Forms

the permissible initial pairs of consonants, and the same pairs possibly broken by syllable junctures.

Initial <- (([Bb] [Ll])/([Bb] [Rr])/([Cc] [Kk])/([Cc] [Ll])/([Cc] [Mm])/([Cc] [Nn])/([Cc] [Pp])/([Cc] [Rr])/([Cc] [Tt])/([Dd] [Jj])/([Dd] [Rr])/([Dd] [Zz])/([Ff] [Ll])/([Ff] [Rr])/([Gg] [Ll])/([Gg] [Rr])/([Jj] [Mm])/([Kk] [Ll])/([Kk] [Rr])/([Mm] [Rr])/([Pp] [Ll])/([Pp] [Rr])/([Ss] [Kk])/([Ss] [Ll])/([Ss] [Mm]) /[Ss] [Nn]/([Ss] [Pp])/([Ss] [Rr])/([Ss] [Tt])/([Ss] [Vv])/([Tt] [Cc])/([Tt] [Rr])/([Tt] [Ss])/([Vv] [Ll])/([Vv] [Rr])/([Zz] [Bb])/([Zz] [Ll])/([Zz] [Vv]))

MaybeInitial <- (([Bb] juncture? [Ll])/([Bb]juncture? [Rr])/([Cc]juncture? [Kk])/([Cc] juncture? [Ll])/([Cc]juncture? [Mm])/([Cc]juncture? [Nn])/([Cc]juncture? [Pp])/([Cc]juncture? [Rr])/([Cc]juncture? [Tt])/([Dd]juncture? [Jj])/([Dd]juncture? [Rr])/([Dd]juncture? [Zz])/([Ff]juncture? [Ll])/([Ff]juncture? [Rr])/([Gg]juncture? [Ll])/([Gg]juncture? [Rr])/([Jj]juncture? [Mm])/([Kk]juncture? [Ll])/([Kk] juncture? [Rr])/([Mm]juncture? [Rr])/([Pp]juncture? [Ll])/([Pp]juncture? [Rr])/([Ss]juncture? [Kk])/([Ss]juncture? [Ll])/([Ss] juncture? [Mm]) /[Ss] juncture? [Nn]/([Ss]juncture? [Pp])/([Ss]juncture? [Rr])/([Ss]juncture? [Tt])/([Ss]juncture? [Vv])/([Tt]juncture? [Cc])/([Tt]juncture? [Rr])/([Tt] juncture? [Ss])/([Vv]juncture? [Ll])/([Vv]juncture? [Rr])/([Zz]juncture? [Bb])/([Zz] juncture? [Ll])/([Zz] juncture? [Vv]))
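Since MaybeInitial is just Initial with an optional juncture between the two consonants, both can be derived from a single table. The following Python sketch shows one way to do that; it assumes a juncture is a single hyphen, apostrophe or asterisk (the actual juncture class is defined elsewhere), and the function names are illustrative.

```python
# The 38 pairs listed in the Initial rule, in lower case.
INITIAL_PAIRS = frozenset(
    "bl br ck cl cm cn cp cr ct dj dr dz fl fr gl gr jm kl kr mr "
    "pl pr sk sl sm sn sp sr st sv tc tr ts vl vr zb zl zv".split()
)

# Assumption: a juncture is one of these single characters.
JUNCTURES = "-'*"


def is_initial(pair: str) -> bool:
    """Return True if pair is exactly a permissible initial pair."""
    return pair.lower() in INITIAL_PAIRS


def is_maybe_initial(pair: str) -> bool:
    """Like is_initial, but allow one juncture between the consonants."""
    pair = pair.lower()
    if len(pair) == 3 and pair[1] in JUNCTURES:
        pair = pair[0] + pair[2]
    return pair in INITIAL_PAIRS
```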

the permissible initial consonant groups in a syllable. Adjacent consonants should be initial pairs. The group should not overlap a syllabic pair. Such a group is of course followed by a vocalic unit.

this rule for initial consonant groups is stated in NB3.

I forbid a three-consonant initial group to be followed by a syllabic pair. This seems obvious.

InitialConsonants <- ((!syllabic C1 &Vocalic)/(!(C1 syllabic) Initial &Vocalic)/(&Initial C1 !(C1 syllabic) Initial !syllabic &Vocalic))

the forbidden medial pairs and triples. These are forbidden regardless of placement of syllable breaks.

each of these is actually a single consonant followed by an initial, and the idea was to identify CVC-CCV junctions which would be hard to pronounce. But the placement of the syllable break is not relevant to the exclusion of the sequence. Notice that the continuant syllabic pairs are excluded: this prevents final consonants from being included in such pairs.

NoMedial2 <- (([Bb] juncture? [Bb])/([Cc] juncture? [Cc])/([Dd] juncture? [Dd])/([Ff] juncture? [Ff])/([Gg] juncture? [Gg])/([Hh] juncture? C1)/([Jj] juncture? [Jj])/([Kk] juncture? [Kk])/([Ll] juncture? [Ll])/([Mm] juncture? [Mm])/([Nn] juncture? [Nn])/([Pp] juncture? [Pp])/([Rr] juncture? [Rr])/([Ss] juncture? [Ss])/([Tt] juncture? [Tt])/([Vv] juncture? [Vv])/([Zz] juncture? [Zz])/([CJSZcjsz] juncture? [CJSZcjsz])/([Ff] juncture? [Vv])/([Kk] juncture? [Gg])/([Pp] juncture? [Bb])/([Tt] juncture? [Dd])/([FKPTfkpt] juncture? [JZjz])/([Bb] juncture? [Jj])/([Ss] juncture? [Bb]))
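The long alternative list in NoMedial2 reduces to a handful of conditions, which the following Python sketch spells out. It ignores junctures and assumes that C1 (defined elsewhere in the grammar) is the set of Loglan consonants; the names are illustrative.

```python
# Assumption: C1 (defined elsewhere) is this set of consonants.
CONSONANTS = set("bcdfghjklmnprstvz")
SIBILANTS = set("cjsz")
EXTRA_PAIRS = {"fv", "kg", "pb", "td", "bj", "sb"}


def is_no_medial2(first: str, second: str) -> bool:
    """Return True if the two consonants form a forbidden medial pair."""
    a, b = first.lower(), second.lower()
    if a not in CONSONANTS or b not in CONSONANTS:
        return False
    return (
        a == b                                   # any doubled consonant
        or a == "h"                              # h before any consonant
        or (a in SIBILANTS and b in SIBILANTS)   # two of c, j, s, z
        or a + b in EXTRA_PAIRS                  # fv, kg, pb, td, bj, sb
        or (a in "fkpt" and b in "jz")           # f/k/p/t before j or z
    )
```

Note that the test is order-sensitive: under this reading [kg] is forbidden but [gk] is not.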

NoMedial3 <- (([Cc] juncture? [Dd] juncture? [Zz])/([Cc] juncture? [Vv] juncture? [Ll])/([Nn] juncture? [Dd] juncture? [Jj])/([Nn] juncture? [Dd] juncture? [Zz])/([Dd] juncture? [Cc] juncture? [Mm])/([Dd] juncture? [Cc] juncture? [Tt])/([Dd] juncture? [Tt] juncture? [Ss])/([Pp] juncture? [Dd] juncture? [Zz])/([Gg] juncture? [Tt] juncture? [Ss])/([Gg] juncture? [Zz] juncture? [Bb])/([Ss] juncture? [Vv] juncture? [Ll])/([Jj] juncture? [Dd] juncture? [Jj])/([Jj] juncture? [Tt] juncture? [Cc])/([Jj] juncture? [Tt] juncture? [Ss])/([Jj] juncture? [Vv] juncture? [Rr])/([Tt] juncture? [Vv] juncture? [Ll])/([Kk] juncture? [Dd] juncture? [Zz])/([Vv] juncture? [Tt] juncture? [Ss])/([Mm] juncture? [Zz] juncture? [Bb]))

The Syllable

there are no formal rules about syllables as such in our Sources, which is odd since the definition of predicates depends on the placement of stresses on syllables.

The first rule enforces the special point needed in complexes that a CVC syllable is preferred to a CV syllable where possible; we economically apply the same rule for default placement of syllable breaks everywhere, which is, with that exception, that the break comes as soon as possible.

the SyllableB approach is taken if the following syllable would otherwise start with a syllabic pair.

the reason for this approach is that if one syllabizes a well formed complex in this way... the syllable breaks magically fall on the djifoa boundaries. This does mean that the default break in [cabro] is [cab-ro], which feels funny but is harmless. Explicitly breaking it [ca-bro] will also parse correctly.

SyllableA <- (C1 V2 FinalConsonant (!Syllable FinalConsonant)?)

SyllableB <- (InitialConsonants? Vocalic (!Syllable FinalConsonant)? (!Syllable FinalConsonant)?)

Syllable <- ((SyllableA/SyllableB) juncture?)

The final consonant in a syllable. There may be one or two final consonants. A pair of final consonants may not be a non-continuant followed by a continuant. A final consonant may not start a forbidden medial pair or triple.

The rule that a final consonant pair may not be a non-continuant followed by a continuant is natural and obvious but not in our Sources. Such a pair of consonants would seem to naturally form another syllable.

a pair of final consonants cannot be differently voiced

FinalConsonant <- !syllabic !(&Badvoice C1 !Syllable) (!(!continuant C1 !Syllable continuant) !NoMedial2 !NoMedial3 C1 !(juncture? (V2/syllabic)))

#!((!MaybeInitial)C1 juncture? !syllabic C1 juncture? !syllabic C1) !(&MaybeInitial C1 juncture C1 !(juncture? C1))

Varieties of Syllable

Here are various flavors of syllable we may need.

this is a portmanteau definition of a bad syllable (the sort not allowed in a borrowing).

SyllableD <- &(InitialConsonants? ([Yy]/DoubleVowel/BrokenMono/&Mono V2 DoubleVowel/!MustMono &Mono V2 BrokenMono)) Syllable

this (below) is the kind of syllable which can exist in a borrowed predicate: it cannot start with a continuant pair, it cannot have a y as vocalic unit, and its vocalic unit (whether it has one or two regular vowels) cannot be involved in a double vowel or an explicitly broken mandatory monosyllable.

BorrowingSyllable <- !syllabic (!SyllableD) Syllable

this is the final syllable of a predicate. It cannot be followed without pause by a regular vowel.

VowelFinal <- InitialConsonants? Vocalic juncture? !V2

syllables with syllabic consonant vocalic units. This class is only used in borrowings, and we *could* reasonably require it to be followed by a vowel. But I won't for now. For gluing this restriction would work, but we might literally borrow predicates with syllabic continuant pronunciations.

SyllableC <- (&(InitialConsonants? syllabic) Syllable)

syllables with y

SyllableY <- (&(InitialConsonants? [Yy]) Syllable)

an explicitly stressed syllable.

StressedSyllable <- ((SyllableA/SyllableB) [\'*])

Name Words

a final syllable in a word, ending in a consonant.

NameEndSyllable <- (InitialConsonants? (syllabic/Vocalic &FinalConsonant) FinalConsonant? FinalConsonant? stress? !letter)

The Pause

the pause classes actually hang on the letter before the pause.

whitespace which might or might not be a pause.

maybepause <- (V1 [\'*]? [ ]+ C1)

explicit pauses: these are whitespace before a vowel or after a consonant, or comma marked pauses.

pause <- ((C1 [\'*]? [ ]+ &letter)/(letter [\'*]? [ ]+ &V1)/(letter [\'*]? [,] [ ]+ &letter))
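The difference between maybepause and pause can be illustrated with a Python regular-expression sketch. The classes C1, V1 and letter are defined elsewhere in the grammar; the definitions below (consonants, vowels including y, and their union) are assumptions made for illustration, as are the names MAYBE_PAUSE and PAUSE.

```python
import re

# Assumptions: stand-ins for classes defined elsewhere in the grammar.
C1 = "[bcdfghjklmnprstvzBCDFGHJKLMNPRSTVZ]"
V1 = "[aeiouyAEIOUY]"
LETTER = f"(?:{C1}|{V1})"

# maybepause <- (V1 [\'*]? [ ]+ C1)
MAYBE_PAUSE = re.compile(rf"{V1}['*]? +{C1}")

# pause <- ((C1 [\'*]? [ ]+ &letter)/(letter [\'*]? [ ]+ &V1)
#           /(letter [\'*]? [,] [ ]+ &letter))
PAUSE = re.compile(
    rf"{C1}['*]? +(?={LETTER})"       # whitespace after a consonant
    rf"|{LETTER}['*]? +(?={V1})"      # whitespace before a vowel
    rf"|{LETTER}['*]?, +(?={LETTER})" # comma-marked pause
)
```

With these stand-ins, "a ga" matches MAYBE_PAUSE but not PAUSE, while "a, ga", "a e" and "n ga" all match PAUSE.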

The full analysis of names

these are final syllables in words followed by whitespace which might not be a pause. The definition actually doesn't mention the maybepause class.

MaybePauseSyllable <- InitialConsonants? Vocalic ['*]? &([ ]+ &C1)

a name word (without initial marking) is resolvable into syllables and ends with a consonant.

PreName <- ((Syllable &Syllable)* NameEndSyllable)

this is a busted name word with whitespace in it -- but not whitespace at which one has to pause.

BadPreName <- (MaybePauseSyllable [ ]+/Syllable &Syllable)* NameEndSyllable

This is a name marker followed by a consonant initial name word without pause.

I deployed a minimal set of name marker words; I can add the others whenever. I have decided (see below) to retain the social lubrication words as vocative markers *without* making them name markers, so one must pause [Loi, Djan]. By not allowing freemods right after vocative markers in the vocative rule, I make [Loi hoi Djan] work as well, without pause.

MarkedName <- &caprule ((([Ll] !pause [Aa] juncture?)/ ([Hh] [Oo] !pause [Ii] juncture?) / ([Hh] [Uu] juncture? !pause [Ee] juncture?) / ([Cc] !pause [Ii] juncture?)/([Ll] [Ii] juncture? !pause [Uu] juncture?)/[Gg][Aa] !pause [Oo] juncture?/[Mm][Uu] juncture? !pause [Ee] juncture?) [ ]* &C1 &caprule PreName)

MarkedName <- &caprule ((([Ll] !pause [Aa] juncture?)/ ([Hh] [Oo] !pause [Ii] juncture?) / ([Hh] [Uu] juncture? !pause [Ee] juncture?) /([Ll] [Ii] juncture? !pause [Uu] juncture?)/[Gg][Aa] !pause [Oo] juncture?/[Mm][Uu] juncture? !pause [Ee] juncture?) [ ]* &C1 &caprule PreName)

This is an unmarked name word with a false name marker in it.

FalseMarked <- (&PreName (!MarkedName character)* MarkedName)

This is the full definition of name words. These are marked consonant initial names without pause (defined above), names without false name markers beginning with explicit pauses (either comma marked or vowel-initial), or name markers followed, with or without pause, by name words. In the latter case there must be at least whitespace before a vowel initial name.

a series of names without false name markers and names marked with ci, separated by spaces, may be appended.

there is a look ahead at the grammar: a NameWord can be followed without explicit pause (there is whitespace and a pause in speech!) by another kind of utterance only in a serial name when what follows is of the form [ci] predunit, to be included in the name.

NameWord <- (&caprule MarkedName/([,] [ ]+ !FalseMarked &caprule PreName)/(&V1 !FalseMarked &caprule PreName)/&caprule ((([Ll] [Aa] juncture?)/([Hh] [Oo] [Ii] juncture?)/([Cc] &pause [Ii] juncture?)/([Ll] [Ii] juncture? [Uu] juncture?)/[Mm] [Uu] juncture? [Ee] juncture?/[Gg] [Aa] [Oo] juncture?) !V1 [,]? [ ]* &caprule PreName))([,]?[ ]+ !FalseMarked &caprule PreName/[,]?[ ]+ &([Cc] &pause [Ii]) NameWord)* &([ ]* [Cc] [Ii] predunit/&([,] [ ]+/terminal/[\")]/!.)./!.)

this is the minimal set of name marker words we are using. We may add more.

I am contemplating adding the words of social lubrication as name markers, but in a more restricted way than in the last provisional parser, in which I made them full-fledged vocative markers. [Actually, I preserved their status as vocative markers without restoring their status as name markers, in the latest version].

adding [mue] as a name marker

namemarker <- ([Ll] [Aa] juncture?/[Hh][Oo][Ii] juncture?/([Hh] [Uu] juncture? [Ee] juncture?)/[Cc] &pause [Ii] juncture?/[Ll][Ii] juncture? [Uu] juncture?/[Gg][Aa][Oo] juncture?/[Mm] [Uu] juncture? [Ee] juncture?) !V1

this is the bad name marker phenomenon that needs to be excluded. This captures the idea that what follows the name could be pronounced without pause as a name word according to the orthography, but the fact that whitespace is present shows that this is not the intention.

it is worth noting that name markers at heads of name words pass this test (because I omitted the test that what follows is not a PreName in the interests of minimizing lookahead); but this test is only applied to strings that have already been determined not to be of class NameWord.

badnamemarker <- namemarker !V1 [, ]? [ ]* BadPreName

we test for the bad name marker condition at the beginning of each stream of cmapua, and streams of cmapua stop before name markers (and may resume at a name marker if neither a NameWord nor the bad marker condition is found).

We have at any rate completely solved the phonetic problem of names and their markers.

Predicate Start Test

predicate start tests: the idea is the same as class "connective" below, to recognize the start of a predicate without recursive appeals to the whole nasty definition of predicate. The reason to do it is to recognize when CV^n followed by CC cannot be a cmapua unit.

New implementation 4/28/2019. This allows only (C)V(V)(V) before the pair of consonants, for much less potential lookahead.

Vthree <- (V2 juncture?) (V2 juncture?) (V2 juncture?)

Vfour <- (V2 juncture?) (V2 juncture?) (V2 juncture?) (V2 juncture?)

predicate starting with two or three consonants: rules out CC(C)V(V) forms. Junctures in the initial consonant group ignored.

predstartA1 <- (&MaybeInitial C1 juncture? MaybeInitial/MaybeInitial) &V2 !(V2 stress !Mono V2) !(V2 juncture? V2 !character) !(V2 juncture? !character)

an apparent cmapua unit followed by a consonant group which cannot start a predicate -- CV(V) case

predstartA2 <- C1 V2 juncture? (V2 juncture?)? !predstartA1 C1 juncture? C1

a stressed CV^n before a consonant group (CV(V) case)

predstartA3 <- C1 !Vthree (!StressedSyllable V2 juncture?)? &StressedSyllable V2 V2? juncture? C1 juncture? C1

other (C)V^n followed by nonpredicate

predstartA4 <- C1? V2 juncture? (V2 juncture?)? (V2 juncture?)? !predstartA1 !(MaybeInitial V2) C1 juncture? C1

other stressed (C)V^n followed by consonant group

predstartA5 <- C1? !Vfour (!StressedSyllable V2 juncture?)? (!StressedSyllable V2 juncture?)? &StressedSyllable V2 V2? juncture? !(MaybeInitial V2) C1 juncture? C1

forms with y; implemented CVVhy alternative for CVV cmapua

predstartA6 <- C1 (V2 juncture?) (V2 juncture? [Hh]?/C1 juncture? (C1 juncture?)?) [Yy]

predstart <- predstartA1/predstartA2/predstartA3/predstartA4/predstartA5/predstartA6

it is worth noting that in the sequel we have systematically replaced tests &Cmapua with !predstart. The former involves lots of lookahead and was causing recursion crashes in Python. The phonetics and the grammar are both structured so that any string starting with a name marker is tested for NameWord-hood before it is tested for cmapua-hood; the only thing it is tested for later is predicate-hood, and predstart is a rough and ready test that something might be a predicate (and at any rate cannot be a cmapua).
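The idea of guarding a class with a cheap negative lookahead instead of a full positive one can be illustrated with a toy parser-combinator sketch in Python. This is not the parser used for this grammar: toy_predstart is a hypothetical stand-in (consonant, consonant, vowel) that is far simpler than the real predstart, and all function names are made up for the example.

```python
from typing import Callable, Optional

# A parser takes (text, position) and returns the new position, or None.
Parser = Callable[[str, int], Optional[int]]


def char_in(chars: str) -> Parser:
    """Match one character drawn from chars."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos + 1 if pos < len(text) and text[pos] in chars else None
    return parse


def seq(*parsers: Parser) -> Parser:
    """PEG sequence: every parser must succeed in turn."""
    def parse(text: str, pos: int) -> Optional[int]:
        current: Optional[int] = pos
        for parser in parsers:
            current = parser(text, current)
            if current is None:
                return None
        return current
    return parse


def not_(parser: Parser) -> Parser:
    """PEG !A: succeed, consuming nothing, exactly when A fails."""
    def parse(text: str, pos: int) -> Optional[int]:
        return pos if parser(text, pos) is None else None
    return parse


CONSONANT = char_in("bcdfghjklmnprstvzBCDFGHJKLMNPRSTVZ")
VOWEL = char_in("aeiouAEIOU")

# Hypothetical stand-in for predstart: consonant, consonant, vowel.
toy_predstart = seq(CONSONANT, CONSONANT, VOWEL)

# Analogue of a guarded class such as K <- (!predstart [Kk]).
K = seq(not_(toy_predstart), char_in("Kk"))
```

In this toy, K accepts the k of "ka" but rejects the k of "kra", and the guard inspects at most three characters regardless of how long the input is.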

Structure Word Phonetics

this class requires pauses before it, after all the phonetic word classes. what is being recognized is the beginning of a logical connective.

To avoid horrible recursion problems, I am giving this a concrete phonetic definition without much lookahead. This can go right up in the phonetics section if it works (and here it is!).

single vowel cmapua syllables early for connectives

a <- ([Aa] !badstress juncture? !V1)

e <- ([Ee] !badstress juncture? !V1)

i <- ([Ii] !badstress juncture? !V1)

o <- ([Oo] !badstress juncture? !V1)

u <- ([Uu] !badstress juncture? !V1)

Hearly <- (!predstart [Hh])

Nearly <- (!predstart [Nn])

these appear here for historical reasons and could be moved later

connective <- [ ]* !predstart ([Nn] [Oo] juncture? !i)? (a/e/i/o/u/Hearly a/Nearly UU) juncture? !V2 !(!predstart [Ff] [Ii]) !(!predstart [Mm] [Aa]) !(!predstart [Zz] [Ii])

cmapua units starting with consonants. This is the exact description from NB3. The fancy tail in each of the three cases is enforcing the rule about pausing before a following predicate if stressed.

consonant initial cmapua units may not be followed by vowels without pause.

I am adding [iy] and [uy] (always monosyllable, yuh and wuh) as vowel pairs permitted in VV and CVV cmapua units. it is worth noting that the "yuh" and "wuh" pronunciations of these diphthongs are surprising to the English-reading eye. The use for this envisaged is that the name [ziy] of Y becomes easy to introduce. Adding word space is always nice, and these words seem pronounceable. I also made [yfi] possible: Y now has phonetically regular names.

CmapuaUnit <- (C1 Mono juncture? V2 !(['*] [ ]* &C1 predstart) juncture? !V1/C1 (VV/[Ii][Yy]/[Uu][Yy]) !(['*] [ ]* &C1 predstart) juncture? !V1/C1 V2 !(['*] [ ]* &C1 predstart) juncture? !V1)

A stream of cmapua is read until the start of a predicate or a name marker word or an alien text marker word or a quote or parenthesis marker word is encountered. the stream might resume with a name marker word if it does not in fact start a name word and does not potentially start a name word due to inexplicit whitespace (doesn't satisfy the bad name marker condition).

we force explicit comma pauses before logical connectives, but not before vowel initial cmapua in general; other conditions force at least whitespace, which does stand for a pause, before such words.

detect starts of quotes or parentheses with li or [kie]

likie <- ([Ll] [Ii] juncture? !V1/[Kk] [Ii] juncture? [Ee] juncture? !V1)

a special provision is made for NO UI forms as single words. [yfi] is supported.

Cmapua <- &caprule !badnamemarker (!predstart (VV/[Ii][Yy]/[Uu][Yy]) !(['*] [ ]* &C1 predstart) juncture? NOI/!predstart [Nn] [Oo] juncture? !predstart (VV/[Ii][Yy]/[Uu][Yy]) !(['*] [ ]* &C1 predstart) juncture?/((!predstart (VV/[Ii][Yy]/[Uu][Yy]) !(['*] [ ]* &C1 predstart) juncture?)+ / ((!predstart V1 !(['*] [ ]* &C1 predstart) juncture?)/ !predstart CmapuaUnit) (!namemarker !alienmarker !likie !predstart CmapuaUnit)*)/!predstart V2 !(['*] [ ]* &C1 predstart) juncture?) !V1 !(C1+ juncture) !([ ]* connective)

I have apparently now completely solved the problem of parsing cmapua as well as name words.

Predicate Phonetics

Now for predicates.

Djifoa ("affixes")

the elementary djifoa (not borrowings)

various special flavors of these djifoa will be needed. These are the general definitions.

The NOY and Bad forms are for use for testing candidate borrowings for resolution with bad syllable break placements. Borrowings do not contain Y...

CVV djifoa with phonetic hyphens.

added checks to all cmapua classes: the vowel final ones, when not phonetically hyphenated, cannot be followed by a regular vowel. This is crucial for getting the syllable analysis and the djifoa analysis to end at the same point.

allowing h to be inserted before y in CVVy djifoa for a CVVhy form.

allowing -r glue to be expressed as -rr

CVV <- C1 VV (juncture? [Hh]? [Yy] [-]? &(Complex) /juncture? [Rr] [Rr]? juncture? &C1/[Nn] juncture? &[Rr]/juncture? !V2)

CVVNoHyphen <- C1 VV juncture? !V2

CVVHiddenStress <- C1 &DoubleVowel V1 [-]? V1 ([-]? [Hh]? [Yy] [-]? &Complex /[Rr] [-]? &C1/[Nn] [-]? &[Rr]/[-]? !V2)

CVVFinalStress <- C1 VV (['*] [Hh]? [Yy] [-]? &Complex /[Rr] ['*] &C1/['*] [Rr] [Rr] juncture? &C1/[Nn] ['*] &[Rr]/['*] !V2)

CVVNOY <- C1 VV (juncture? [Rr] [Rr]? juncture? &C1/[Nn] juncture? &[Rr]/juncture? !V2)

CVVNOYFinalStress <- C1 VV ([Rr] ['*] &C1/['*] [Rr] [Rr] juncture? &C1/[Nn] ['*] &[Rr]/['*] !V2)

CVVNOYMedialStress <- C1 !BrokenMono V2 ['*] V2 [-]? !V2

CCV djifoa with phonetic hyphens.

CCV <- Initial V2 (juncture? [Yy] [-]? &letter/juncture? !V2)

CCVStressed <- Initial V2 (['*] [Yy] [-]? &letter/['*] !V2)

CCVNOY <- Initial V2 juncture? !V2

CCVBad <- MaybeInitial V2 juncture? !V2

CCVBadStressed <- MaybeInitial V2 ['*] !V2

CVC djifoa with phonetic hyphens. These cannot be final and are always followed by a consonant (well, the -y form may be followed by a vowel). An eccentric syllable break is supported if the CVC is y-hyphenated: [me-ky-kiu] and [mek-y-kiu] are both legal. The default is the latter.

CVC <- (C1 V2 !NoMedial2 !NoMedial3 C1 (juncture? [Yy] [-]? &letter/juncture? &C1)/C1 V2 juncture C1 [Yy] [-]? &letter)

CVCStressed <- (C1 V2 !NoMedial2 !NoMedial3 C1 (['*] [Yy] [-]? &letter/['*] &letter)/C1 V2 ['*] C1 [Yy] [-]? &letter)

CVCNOY <- C1 V2 !NoMedial2 !NoMedial3 C1 juncture? &C1

CVCBad <- C1 V2 !NoMedial2 !NoMedial3 juncture? C1 &C1

CVCNOYStressed <- C1 V2 !NoMedial2 !NoMedial3 C1 ['*] &C1

CVCBadStressed <- C1 V2 !NoMedial2 !NoMedial3 ['*] C1 &C1

the five letter forms (always final in complexes)

CCVCV <- Initial V2 juncture? C1 V2 [-]? !V2

CCVCVStressed <- Initial V2 ['*] C1 V2 [-]? !V2

CCVCVBad <- MaybeInitial V2 juncture? C1 V2 [-]? !V2

CCVCVBadStressed <- MaybeInitial V2 ['*] C1 V2 [-]? !V2

CVCCV <- (C1 V2 juncture? Initial V2 [-]? !V2/C1 V2 !NoMedial2 C1 juncture? C1 V2 [-]? !V2)

CVCCVStressed <- (C1 V2 ['*] Initial V2 [-]? !V2/C1 V2 !NoMedial2 C1 ['*] C1 V2 [-]? !V2)

the medial five letter djifoa

CCVCY <- Initial V2 juncture? C1 [Yy] [-]?

CVCCY <- (C1 V2 juncture? Initial [Yy] [-]?/C1 V2 !NoMedial2 C1 juncture? C1 [Yy] [-]?)

CCVCYStressed <- Initial V2 ['*] C1 [Yy] [-]?

CVCCYStressed <- (C1 V2 ['*] Initial [Yy] [-]?/C1 V2 !NoMedial2 C1 ['*] C1 [Yy] [-]?)

Borrowed Predicates

to reason about resolution of borrowings into both syllables and djifoa (we want to exclude the latter but we need to define it adequately) we need to recognize where to stop. A predicate word ends either at a non-character (not a letter or syllable mark: whitespace, comma or terminal punctuation) or it has an explicit or deducible penultimate stress. Borrowings do not contain doubled vowels, so they have to have explicit stress in the latter case.

analysis: the stressed tail consists of a stressed syllable followed by an unstressed syllable. identifying an unstressed final syllable is complicated by recognizing which CVV combinations can be one syllable. This will either be an explicitly stressed syllable followed by a single syllable or a syllable suitable to be stressed followed by an explicitly final syllable. CVV djifoa can contain both syllables in a tail and of course the five letter djifoa have to be tails. A never stressed SyllableC (with a continuant) may intervene.

tail of a borrowing with an explicit stress

BorrowingTail1 <- !SyllableC &StressedSyllable BorrowingSyllable (!StressedSyllable &SyllableC BorrowingSyllable)? !StressedSyllable &BorrowingSyllable VowelFinal

tail of a borrowing or borrowing djifoa with no explicit stress

BorrowingTail2 <- !SyllableC BorrowingSyllable (!StressedSyllable &SyllableC BorrowingSyllable)? !StressedSyllable &BorrowingSyllable VowelFinal (&[Yy]/!character)

tail of a stressed borrowing djifoa, different because stress is shifted to the end

BorrowingTail3 <- !SyllableC !StressedSyllable BorrowingSyllable (!StressedSyllable &SyllableC BorrowingSyllable)? &BorrowingSyllable InitialConsonants? Vocalic ['*] &[Yy]

BorrowingTail <- BorrowingTail1 / BorrowingTail2

short forms that are ruled out: CCVV and CCCVV forms.

CCVV <- (InitialConsonants V2 juncture? V2 juncture? !character / InitialConsonants V2 ['*] !Mono V2 juncture?)

VCCV and some related forms are ruled out (rule predstartF above is about this)

a continuant syllable cannot be initial in a borrowing and there cannot be successive continuant syllables. There really ought to be no more than one!

borrowing, before checking that it doesn't resolve into djifoa

PreBorrowing <- &predstart!CCVV!Cmapua!SyllableC(!BorrowingTail!(StressedSyllable)!(SyllableC SyllableC)BorrowingSyllable)* BorrowingTail

ditto for an explicitly stressed borrowing

StressedPreBorrowing <- &predstart!CCVV!Cmapua!SyllableC(!BorrowingTail!(StressedSyllable)!(SyllableC SyllableC)BorrowingSyllable)* BorrowingTail1

borrowing djifoa without explicit stress (before resolution check)

PreBorrowing2 <- &predstart!CCVV!Cmapua!SyllableC(!BorrowingTail!(StressedSyllable)!(SyllableC SyllableC)BorrowingSyllable)* BorrowingTail2

stressed borrowing djifoa (before resolution check).

PreBorrowing3 <- &predstart!CCVV!Cmapua!SyllableC(!BorrowingTail3!(StressedSyllable)!(SyllableC SyllableC)BorrowingSyllable)* BorrowingTail3

Now comes the problem of trying to say that a preborrowing cannot resolve into cmapua. The difficulty is with recognizing the tail, so making sure that the two resolutions stop in the same place.

we know because it is a borrowing that there is at most one explicit stress, and it has to fall in one of the cmapua! This should make it doable.

borrowing djifoa are terminated with y, so the final djifoa needs to take this into account

the idea behind both djifoa analyses is the same. If we end with a final djifoa followed by a non-character, we improve our chances of ending the syllable analysis at the same point. We control this by identifying djifoa with stresses in them: a medially stressed djifoa must be the last one (and the syllable analysis will find its stressed syllable and end at its final syllable, the fact that djifoa cannot be followed by vowels ensuring that the syllable analysis cannot overrun its end). When the djifoa is finally stressed, the complex analysis ends with a further djifoa guaranteed to have just one syllable, and the syllable analysis again will stop in the same place. The medial five letter forms and borrowing djifoa of course are finally stressed mod an additional unstressed syllable which is skipped by the syllable analysis, because it allows one to ignore an actually penultimate syllable with y or a syllabic consonant. In the case where we never find a stress and end up at a final djifoa, the syllable analysis will carry right through to the same final point.

in the attempted resolution of borrowings, our life is easier because we do not have borrowing djifoa or medial five letter forms to consider, or any forms with y-hyphens.

RFinalDjifoa <- (CCVCVBad/CVCCV/CVVNoHyphen/CCVBad/CVCBad) (&[Yy]/!character)

RMediallyStressed <- (CCVCVBadStressed/CVCCVStressed/CVVNOYMedialStress)

RFinallyStressed <- (CVVNOYFinalStress/CCVBadStressed/CVCBadStressed/CVCNOYStressed)

BorrowingComplexTail <- (RMediallyStressed/RFinallyStressed (&(C1 Mono) CVVNoHyphen/CCVBad)/RFinalDjifoa)

ResolvedBorrowing <- (!BorrowingComplexTail(CVVNOY/CCVBad/CVCBad))* BorrowingComplexTail

borrowed predicates

Borrowing <- !ResolvedBorrowing &caprule PreBorrowing !([ ]* (connective))

explicitly stressed borrowed predicates

StressedBorrowing <- !ResolvedBorrowing &caprule StressedPreBorrowing !([ ]* &V1 Cmapua)

#This is the shape of non-final borrowing djifoa. Notice that a final stress is allowed. #The curious provision for explicitly stressing a borrowing djifoa and pausing is supported.

borrowing djifoa without explicit stress (stressed ones are not of this class!) Note that one can pause after these (explicitly, with a comma, in which case the stress must be explicit too)

BorrowingDjifoa <- !ResolvedBorrowing &caprule PreBorrowing2 (['*] [y] [,] [ ]+/juncture? [y] [-]?)

stressed borrowing djifoa finally implemented!

StressedBorrowingDjifoa <- !ResolvedBorrowing &caprule PreBorrowing3 [y] [-]? ([,] [ ]+)?

Complex Predicates

We resolve complexes twice, once into syllables and once into djifoa. We again have to ensure that we end up in the same place! The syllable resolution is very similar to that of borrowings; the unstressed middle syllable of the tail can be a SyllableY, and can also be a SyllableC if the final djifoa is a borrowing.

A stressed borrowing djifoa with the property that the tail is still a phonetic complex is a unit for this analysis.

note here that I specifically rule out a complex being followed without pause by y. I do not rule this out for the vowel final djifoa because they can be followed by y at the end of a borrowing djifoa.

DefaultStressedSyllable <- Syllable

PhoneticComplexTail1 <- !SyllableC !SyllableY &StressedSyllable DefaultStressedSyllable (!StressedSyllable &(SyllableC/SyllableY) Syllable)? !StressedSyllable !SyllableY VowelFinal !V1

PhoneticComplexTail2 <- !SyllableC !SyllableY DefaultStressedSyllable (!StressedSyllable &(SyllableC/SyllableY) Syllable)? !StressedSyllable !SyllableY VowelFinal !character

PhoneticComplexTail <- PhoneticComplexTail1 / PhoneticComplexTail2

note the explicit predstart test here.

PhoneticComplex <- &predstart!CCVV!Cmapua!SyllableC(StressedBorrowingDjifoa &PhoneticComplex/!PhoneticComplexTail!(StressedSyllable)!(SyllableC SyllableC) Syllable)* PhoneticComplexTail

the analysis of final djifoa and stressed djifoa differs only in details from what is above for resolution of borrowings. The issues about CVV djifoa with doubled vowels are rather exciting.

a stressed borrowing djifoa with the tail still a phonetic complex is a black box unit for this construction.

My approach imposes the restriction on JCB's "pause after a borrowing djifoa" idea that what follows the pause must itself contain a penultimate stress: [igllu'ymao] is a predicate but [igllu'y, mao] is not, while [iglluy', gudmao] is a predicate.

the analysis of the djifoa resolution process is the same as above, with additional remarks about doubled vowel syllables: notice that where the complex tail involved a doubled vowel syllable without explicit stress, we insist on that djifoa or the single-syllable next djifoa ending in a non-character: in the absence of explicit stress, we always rely on whitespace or punctuation to indicate the end of the predicate.

all sorts of subtleties about borrowings and borrowing djifoa are finessed by always looking for them first. There are no restrictions re fronts of borrowings or borrowing djifoa looking like regular djifoa; the fact that borrowing djifoa end in y and borrowings do not contain y makes it always possible to tell when one is looking at the head of a borrowing djifoa. Regular djifoa just before a borrowing djifoa need to be y-hyphenated so as not to be absorbed into the front of the borrowing (I don't believe that I actually need to impose a formal rule to this effect, though I am not absolutely certain; it would be difficult to formulate [and does appear in the previous version, where it is a truly unintelligible piece of PEG code]).

FinalDjifoa <- (Borrowing/CCVCV/CVCCV/CVVNoHyphen/CCVNOY) !character

MediallyStressed <- (StressedBorrowing/CCVCVStressed/CVCCVStressed/CVVNOYMedialStress)

FinallyStressed <-(StressedBorrowingDjifoa/CCVCYStressed/CVCCYStressed/CVVFinalStress/CCVStressed/CVCStressed)

ComplexTail <- (CVVHiddenStress (&(C1 Mono) CVVNoHyphen/CCVNOY) !character/FinallyStressed (&(C1 Mono) CVVNoHyphen/CCVNOY)/MediallyStressed/FinalDjifoa)

PreComplex <- (!CVVHiddenStress (!ComplexTail)(StressedBorrowingDjifoa &PhoneticComplex/BorrowingDjifoa/CVCCY/CCVCY/CVV/CCV/CVC))* ComplexTail

originally I had complicated tests here for the conditions under which an initial CVC cmapua has to be y-hyphenated: I was being wrong headed, the predstart rules already enforce this (in the bad cases, the initial CV- falls off). The user will simply find that they cannot put the word together otherwise. The previous version did need this test because it actually used full lookahead to check for the start of a predicate.

Complex <- &caprule &PreComplex PhoneticComplex !([ ]* (connective))

Quotation and Parenthesis of well-formed Loglan utterances; word classes

format for the LI quote and KIE parenthesis

LiQuote <- (&caprule [Ll][Ii]juncture? comma2? [\"] phoneticutterance [\"] comma2? &caprule [Ll][Uu]juncture? !([ ]* connective)/(&caprule [Kk][Ii]juncture?[Ee]juncture? comma2? [(] phoneticutterance [)] comma2? &caprule [Kk][Ii]juncture?[Uu]juncture? !([ ]* connective)))

the condition on Word that a Cmapua is not followed by another Cmapua with mere whitespace between was used by [liu] quotation, but is now redundant, because I have required that [liu] quotations be closed with explicit pauses in all cases.

Word <- (NameWord / Cmapua / Complex/CCVNOY)

it is an odd point that all borrowings parse as complexes -- so when I parsed all the words the first time they all parsed as complexes. A borrowing is a complex consisting of a single final borrowing djifoa! I did redesign this so that borrowings are parsed as borrowings. (This is the class I used to parse the dictionary).

Yes, CVC djifoa do get parsed as names in the dictionary, so the CVC case here is redundant. I actually think that only the CCV djifoa actually get parsed as such.

SingleWord <- (Borrowing !./Complex !./ Word !./PreName !. /CCVNOY) !.

name word appearing initially without leading spaces is important, because one type of NameWord includes a leading comma.

The full phonetic utterance classes

phoneticutterance1 <- (NameWord /[ ]* LiQuote/[ ]* NameWord/[ ]* AlienWord/[ ]*Cmapua/[ ]* '--'/[ ]* '...'/[ ]* Borrowing![y]/[ ]* Complex/[ ]* (CCVNOY))+

phoneticutterance <- (phoneticutterance1/[,][ ]+/terminal)+

Interlude: Phonemes and Pauses

Consonants and vowel groups in cmapua

as noted above, !predstart stands in for the computationally disastrous &Cmapua

badstress <- ['*] [ ]* &C1 predstart

B <- (!predstart [Bb])

C <- (!predstart [Cc])

D <- (!predstart [Dd])

F <- (!predstart [Ff])

G <- (!predstart [Gg])

H <- (!predstart [Hh])

J <- (!predstart [Jj])

K <- (!predstart [Kk])

L <- (!predstart [Ll])

M <- (!predstart [Mm])

N <- (!predstart [Nn])

P <- (!predstart [Pp])

R <- (!predstart [Rr])

S <- (!predstart [Ss])

T <- (!predstart [Tt])

V <- (!predstart [Vv])

Z <- (!predstart [Zz])

the monosyllabic classes may be followed by one vowel if they start a Cvv-V cmapua unit; the others may never be followed by vowels. Classes ending in -b are used in Cvv-V cmapua units.

the single vowel classes were moved before the class connective in the phonetics section.

V3 <- juncture? V2 !badstress

AA <- ([Aa] juncture? [Aa] !badstress juncture? !V1)

AE <- ([Aa] juncture? [Ee] !badstress juncture? !V1)

AI <- ([Aa] [Ii] !badstress juncture? !(V1))

AO <- ([Aa] [Oo] !badstress juncture? !(V1))

AIb <- ([Aa] [Ii] !badstress juncture? &(V2 juncture? !V1))

AOb <- ([Aa] [Oo] !badstress juncture? &(V2 juncture? !V1))

AU <- ([Aa] juncture? [Uu] !badstress juncture? !V1)

EA <- ([Ee] juncture? [Aa] !badstress juncture? !V1)

EE <- ([Ee] juncture? [Ee] !badstress juncture? !V1)

EI <- ([Ee] [Ii] !badstress juncture? !(V1))

EIb <- ([Ee] [Ii] !badstress juncture? &(V2 juncture? !V1))

EO <- ([Ee] juncture? [Oo] !badstress juncture? !V1)

EU <- ([Ee] juncture? [Uu] !badstress juncture? !V1)

IA <- ([Ii] juncture? [Aa] !badstress juncture? !(V1))

IE <- ([Ii] juncture? [Ee] !badstress juncture? !(V1))

II <- ([Ii] juncture? [Ii] !badstress juncture? !(V1))

IO <- ([Ii] juncture? [Oo] !badstress juncture? !(V1))

IU <- ([Ii] juncture? [Uu] !badstress juncture? !(V1))

IAb <- ([Ii] juncture? [Aa] !badstress juncture? &(V2 juncture? !V1))

IEb <- ([Ii] juncture? [Ee] !badstress juncture? &(V2 juncture? !V1))

IIb <- ([Ii] juncture? [Ii] !badstress juncture? &(V2 juncture? !V1))

IOb <- ([Ii] juncture? [Oo] !badstress juncture? &(V2 juncture? !V1))

IUb <- ([Ii] juncture? [Uu] !badstress juncture? &(V2 juncture? !V1))

OA <- ([Oo] juncture? [Aa] !badstress juncture? !V1)

OE <- ([Oo] juncture? [Ee] !badstress juncture? !V1)

OI <- ([Oo] [Ii] !badstress juncture? !(V1))

OIb <- ([Oo] [Ii] !badstress juncture? &(V2 juncture? !V1))

OO <- ([Oo] juncture? [Oo] !badstress juncture? !V1)

OU <- ([Oo] juncture? [Uu] !badstress juncture? !V1)

UA <- ([Uu] juncture? [Aa] !badstress juncture? !(V1))

UE <- ([Uu] juncture? [Ee] !badstress juncture? !(V1))

UI <- ([Uu] juncture? [Ii] !badstress juncture? !(V1))

UO <- ([Uu] juncture? [Oo] !badstress juncture? !(V1))

UU <- ([Uu] juncture? [Uu] !badstress juncture? !(V1))

UAb <- ([Uu] juncture? [Aa] !badstress juncture? &(V2 juncture? !V1))

UEb <- ([Uu] juncture? [Ee] !badstress juncture? &(V2 juncture? !V1))

UIb <- ([Uu] juncture? [Ii] !badstress juncture? &(V2 juncture? !V1))

UOb <- ([Uu] juncture? [Oo] !badstress juncture? &(V2 juncture? !V1))

UUb <- ([Uu] juncture? [Uu] !badstress juncture? &(V2 juncture? !V1))

adding the new IY and UY, which might see use some time. they are mandatory monosyllables but do not take a possible additional following vowel as the regular ones do. So far only used in [ziy].

IY <- [Ii] [Yy] !badstress juncture? !V1

UY <- [Uu] [Yy] !badstress juncture? !V1

The optional pause and commas

this is a pause not required by the phonetics. This is the only sort of pause which could in principle carry semantic freight (the pause/GU equivalence beloved of our Founder) but we have abandoned this. There is one place, after initial no in an utterance, where a pause can have effect on the parse (but not on the meaning, I believe, unless a word break is involved).

this class should NEVER be used in a context which might follow a name word. In previous versions, pauses after name words were included in the name word; this is not the case here, so a PAUSE after a name word would not be recognized as a mandatory pause.

in any event, as long as we stay away from pause/GU equivalence, this is not a serious issue!

this class does do some work in the handling of issues surrounding the legacy shape of APA connectives, concerning which the less said, the better.

PAUSE <- [,] [ ]+ !(V1/connective) &caprule

more punctuation

comma <- [,] [ ]+ &caprule

comma2 <- [,]? [ ]+ &caprule
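
The difference between the two comma classes can be seen in a small sketch. This is an illustration in Python, not part of the grammar: it approximates the classes with regular expressions and assumes that the &caprule lookahead always succeeds. The helper name consumed is hypothetical.

```python
import re

# Simplified stand-ins for the PEG classes; &caprule is assumed to succeed.
COMMA = re.compile(r", +")    # comma  <- [,] [ ]+
COMMA2 = re.compile(r",? +")  # comma2 <- [,]? [ ]+


def consumed(pattern, text):
    """Return how many characters the class reads at the start of text, or None."""
    m = pattern.match(text)
    return m.end() if m else None


print(consumed(COMMA, ", mi"))   # an explicit comma pause is read
print(consumed(COMMA, " mi"))    # bare whitespace is not a comma
print(consumed(COMMA2, " mi"))   # but bare whitespace is a comma2
```

In this simplified reading, comma2 is the class to use wherever a written comma is optional but a word break is still required.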

Part II: Lexicography

In this section I develop the grammar of words in Loglan. I'll work by editing the original provisional PEG grammar.

I place the start of this section exactly here, just before two final items of punctuation, because these items of punctuation look forward not only to lexicography but to the full grammar!

Period and end of utterance

the end of utterance symbol [#] should be added in the phonetics section as a species of terminal marker. Done. We do *not* actually endorse use of this marker, but we can notionally support it and it is in our sources.

end <- (([ ]* '#' [ ]+ utterance)/([ ]+ !.)/!.)

this rule allows terminal punctuation to be followed by an inverse vocative, a frequent occurrence in Leith's novel, and something which makes sense.

period <- (([!.:;?] (&end/([ ]+ &caprule))) (invvoc period?)?)

Letters with y will be special cases. Idea: allow IY and UY (always monosyllables) as vowel combinations in cmapua only. Done: Y has a name now. [yfi] is also added.

The cmapua word classes

the classes in this section after this point are the cmapua word classes of Loglan (if they begin with [ ]* or a word class). I suppose the alien text classes are not really word classes, but they are lexicographic items, as it were. Paradoxically, the PA and NI classes admit internal explicit pauses. So of course do predicate words!

Loglan does admit true multisyllable cmapua: there are words made of cmapua units which have joints between units at which one cannot pause without breaking the word. Lojban, I am told, does not.

this version has the general feature that the quotation and alien text constructions are not hacked: they are supported by the phonetic rules (as dire exceptions, of course) and the grammatical constructions conform with the phonetic layer. Alien text and utterances quoted with [li]...[lu] can be enclosed in double quotes. LI only supports full utterances, for the moment. All alien text constructors take the same class as argument: the vocative and inverse vocative *require* quotes to avoid misreading ungrammatical expressions with typos as correct (inverse) vocatives.

Letterals (first approximation)

the names [yfi], [ziy] for Y are supported. The Ceo names are left as they are. I decided that a second short series of letteral pronouns is actually a reasonable use of short words, and the Ceio words are there for other uses.

TAI0 <- (V1 juncture? M a/V1 juncture? F i/V1 juncture? Z i/!predstart C1 AI/!predstart C1 EI/!predstart C1 AIb u/!predstart C1 EIb (u)/!predstart C1 EO/ Z [Ii] V1 !badstress juncture? !V1 (M a)?)

Logical and causal connectives

a negative suffix used in various contexts. Always a suffix: its use as a prefix in tenses was a mistake in NB3 and I think still supported in LIP. Ambiguities demonstrably followed from this usage (an example of how the demonstration of non-ambiguity of 1989 Loglan was compromised by the opaque lexicography).

NOI <- (N OI)

the logical connectives. [A0] is the class of core logical connectives. [A] is the fully decorated logical connective with possible nu- (always in nuno- or nuu) and no- prefixes, possible -noi suffix, and possible (problematic) PA suffix, closed with -fi (our new proposal) or an explicit pause.

A0 <- &Cmapua (a/e/o/u/H a/N UU)

A <- [ ]* !predstart !TAI0 (N [o])? A0 NOI? !([ ]+ PANOPAUSES PAUSE) !(PANOPAUSES !PAUSE [ ,]) (PANOPAUSES ((F i)/&PAUSE))?

4/18 in connected sentpreds, fi must be used to close, not a pause.

A2 <- [ ]* !predstart !TAI0 (N [o])? A0 NOI? !([ ]+ PANOPAUSES PAUSE) !(PANOPAUSES !PAUSE [ ,]) (PANOPAUSES (F i))?

A not closed with -fi or a pause

ANOFI <- [ ]* (!predstart !TAI0 ( (N [o])? A0 NOI? PANOPAUSES?))

A1 <- A

versions of A with different binding strength

ACI <- (ANOFI C i)

AGE <- (ANOFI G e)

a tightly binding series of logical connectives used to link predicates. This also includes the fusion connective [ze] when used between predicates.

CA0 <- (( (N o)? ((C a)/(C e)/(C o)/(C u)/(Z e)/(C i H a)/N u C u)) NOI?)

CA1 <- (CA0 !([ ]+ PANOPAUSES PAUSE) !(PANOPAUSES !PAUSE [ ,]) (PANOPAUSES ((F i)/&PAUSE))?)

CA1NOFI <- (CA0 PANOPAUSES?)

CA <- ([ ]* CA1)

the fusion connective when used in arguments

ZE2 <- ([ ]* (Z e))

sentence connectives. [I] is the class of utterance initiators (no logical definition). the subsequent classes are inhabited by sentence logical connectives with various binding strengths.

I <- ([ ]* !predstart !TAI0 i !([ ]+ PANOPAUSES PAUSE) !(PANOPAUSES !PAUSE [ ,]) (PANOPAUSES ((F i)/&PAUSE))?)

ICA <- ([ ]* i ((H a)/CA1))

ICI <- ([ ]* i CA1NOFI? C i)

IGE <- ([ ]* i CA1NOFI? G e)

forethought logical connectives

KA0 <- ((K a)/(K e)/(K o)/(K u)/(K i H a)/(N u K u))

causal and comparative modifiers

KOU <- ((K OU)/(M OI)/(R AU)/(S OA)/(M OU)/(C IU))

negative and converse forms

KOU1 <- (((N u N o)/(N u)/(N o)) KOU)

the full type of forethought connectives, adding the causal and comparative connectives

KA <- ([ ]* ((KA0)/((KOU1/KOU) K i)) NOI?)

the last component of the KA...KI... structure of forethought connections

KI <- ([ ]* (K i) NOI?)

causal and comparative modifiers which are *not* forethought connectives

KOU2 <- (KOU1 !KI)

Quantity words

a test used to at least partially enforce the penultimate stress rule on quantifier predicates

BadNIStress <- ((C1 V2 V2? stress (M a)? (M OA)? NI RA)/(C1 V2 stress V2 (M a)? (M OA)? NI RA))

root quantity words, including the numerals

NI0 <- (!BadNIStress ((K UA)/(G IE)/(G IU)/(H IE)/(H IU)/(K UE)/(N EA)/(N IO)/(P EA)/(P IO)/(S UU)/(S UA)/(T IA)/(Z OA)/(Z OO)/(H o)/(N i)/(N e)/(T o)/(T e)/(F o)/(F e)/(V o)/(V e)/(P i)/(R e)/(R u)/(S e)/(S o)/(H i)))

the class of SA roots, which modify quantifiers

SA <- (!BadNIStress ((S a)/(S i)/(S u)/(IE (comma2? !IE SA)?)) NOI?)

the family of quantifiers which double as suffixes for the quantifier predicates. This class perhaps should also include some other quantifier words. [re] for example ought to be handled in the same way as [ra,ri,ro]. No action here, just a remark.

RA <- (!BadNIStress ((R a)/(R i)/(R o)/R e/R u))

re and ru added to class RA 5/11/18

quantifier units consisting of a NI or RA root with [ma] 00 or [moa] 000 appended; to [moa] one can further append a digit to iterate [moa]: [fomoate] is four billion, for example. [rimoa], a few thousand.

a NI1 or RA1 may be followed by a pause before another NI word other than a numerical predicate; one is allowed to breathe in the middle of long numerals. I question whether the pause provision makes sense in RA1.

NI1 <- ((NI0 (!BadNIStress M a)? (!BadNIStress M OA NI0*)?) (comma2 !(NI RA) &NI)?)

RA1 <- ((RA (!BadNIStress M a)? (!BadNIStress M OA NI0*)?) (comma2 !(NI RA) &NI)?)
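
The arithmetic behind the [ma] and [moa] suffixes can be sketched as follows. This is an illustration in Python, not part of the grammar. It assumes the digit values ni, ne, to, te, fo, fe, so, se, vo, ve = 0 through 9, which are not stated in this file, and it handles only a single digit root with at most one iteration digit after [moa]. The names DIGITS, UNIT and unit_value are hypothetical. RA roots such as [rimoa] have no exact numeric value and are not covered.

```python
import re

# Assumed digit values (not given in this file): ni..ve stand for 0..9.
DIGITS = {"ni": 0, "ne": 1, "to": 2, "te": 3, "fo": 4,
          "fe": 5, "so": 6, "se": 7, "vo": 8, "ve": 9}

_D = "|".join(DIGITS)
# digit root, optional [ma], optional [moa] with an optional iteration digit
UNIT = re.compile(rf"({_D})(ma)?(?:(moa)({_D})?)?")


def unit_value(word):
    """Value of one simplified NI1 unit such as 'toma' or 'fomoate'."""
    m = UNIT.fullmatch(word)
    if m is None:
        raise ValueError(f"not a simple numeral unit: {word!r}")
    digit, ma, moa, repeat = m.groups()
    value = DIGITS[digit]
    if ma:
        value *= 100                      # [ma] appends 00
    if moa:
        times = DIGITS[repeat] if repeat else 1
        value *= 1000 ** times            # [moa] appends 000, iterated
    return value


print(unit_value("fomoate"))  # the four billion example from the text
```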

a composite NI word, optional SA prefix before a sequence of NI words or a RA word, or a single SA word [which will modify a default quantifier not expressed], possibly negated, connected with CA0 roots to other such constructs.

NI2 <- (( (SA? (NI1+/RA1))/SA) NOI? (CA0 ((SA? (NI1+/RA1))/SA) NOI?)*)

a full NI word with an acronymic dimension (starting with [mue], ending with a pause) or [cu] appended. I need to look up [cu] and figure out its semantics. An arbitrary name word may now be used as a dimension, as well.

NI <- ([ ]* NI2 (&(M UE) Acronym (comma/&end/&period) !(C u)/comma2? M UE comma2? PreName !(C u))? (C u)?)

mex is now identical with NI, but it's in use in later rules.

mex <- ([ ]* NI)

The overused CI

a word used for various tightly binding constructions: a sort of verbal hyphen. also a name marker, which means phonetic care is needed (pause after constructions with [ci]).

CI <- ([ ]* (C i))

Acronyms

Acronyms, which are names (not predicates as in 1989 Loglan) or dimensions (in NI above). units in acronym are TAI0 letterals, zV short forms for vowels, the dummy unit [mue], and NI1 quantity units. NI1 quantity units may not be initial. [mue] units may be preceded by pauses. An acronym has at least two units.

it is worth noting that acronyms, once viewed as names, could be entirely suppressed as a feature of the grammar by really making them names (terminate them with -n). I suppose a similar approach would work for dimensions, allowing any name word to serve as a dimension. [mue] would be a name marker for use with dimensions in this case. [temuedain], three dollars. Now supported.

Acronym <- ([ ]* &caprule ((M UE)/TAI0/(Z V2 !V2)) ((comma &Acronym M UE)/NI1/TAI0/(Z V2 (!V2/(Z &V2))))+)

Letterals and other pronouns

the full class of letterals, including the [gao] construction whose details I should look at.

TAI <- ([ ]* (TAI0/((G AO) !V2 [ ]* (PreName/Predicate/CmapuaUnit))))

atomic non-letteral pronouns.

4/15/2019: reserved [koo] for a Lojban style imperative pronoun, though not officially adopting it. Also adding [dao] for a default, don't care argument, another Lojban feature.

DA0 <- ((T AO)/(T IO)/(T UA)/(M IO)/(M IU)/(M UO)/(M UU)/(T OA)/(T OI)/(T OO)/(T OU)/(T UO)/(T UU)/(S UO)/(H u)/(B a)/(B e)/(B o)/(B u)/(D a)/(D e)/(D i)/(D o)/(D u)/(M i)/(T u)/(M u)/(T i)/(T a)/(M o)/(K OO)/(D AO))

letterals (not including [gao] constructions) and atomic pronouns, optionally suffixed with a digit. One should pause after the suffixed forms, because [ci] is a name marker.

DA1 <- ((TAI0/DA0) (C i ![ ] NI0)?)

general pronoun words.

DA <- ([ ]* DA1)

Tenses, locatives and modals

roots for PA words: tense and location words, prepositions building relative modifiers. All can optionally be negated with -noi. They may also be quantified. They may also be closed with ZI class affixes. PA cores.

PA0 <- (NI2? (N u !KOU)? ((G IA)/(G UA)/(P AU)/(P IA)/(P UA)/(N IA)/(N UA)/(B IU)/(F EA)/(F IA)/(F UA)/(V IA)/(V II)/(V IU)/(C OI)/(D AU)/(D II)/(D UO)/(F OI)/(F UI)/(G AU)/(H EA)/(K AU)/(K II)/(K UI)/(L IA)/(L UI)/(M IA)/(N UI)/(P EU)/(R OI)/(R UI)/(S EA)/(S IO)/(T IE)/ (V IE)/(V a)/(V i)/(V u)/(P a)/(N a)/(F a)/(V a)/(KOU !(N OI) !KI)) (N OI)? ZI?)

the form used for actual prepositions and suffixes to A words, with minimal pauses allowed. these are built by concatenating KOU2 and PA0 units, then linking these with CA0 roots (which can take no- prefixes and -noi suffixes, and next to which one *can* pause), optionally suffixed with a class ZI suffix.

PANOPAUSES <- ((KOU2/PA0)+ ((comma2? CA0 comma2?) (KOU2/PA0)+)*)

prepositional words

PA3 <- ([ ]* PANOPAUSES)

class PA can appear as tense markers or as relative modifiers without arguments; here pauses are allowed not only next to CA0 units but between KOU2/PA units. Like NI words, PA words are a class of arbitrary length constructions, and we think breaths within them (especially complex ones) are natural.

PA <- ((KOU2/PA0)+ (((comma2? CA0 comma2?)/(comma2 !mod1a)) (KOU2/PA0)+)*) !modifier

PA2 <- ([ ]* PA)

GA <- ([ ]* (G a))

the class of tense markers which can appear before predicates.

PA1 <- ((PA2/GA))

suffixes which indicate extent or remoteness/proximity of the action of prepositions.

ZI <- ((Z i)/(Z a)/(Z u))

Articles and other descriptors

the primitive description building "articles". These include [la] which requires special care in its use because it is a name marker.

LE <- ([ ]* ((L EA)/(L EU)/(L OE)/(L EE)/(L AA)/(L e)/(L o)/(L a)))

articles which can be used with abstract descriptions: these include some quantity words. this means that some abstract descriptions are semantically indefinites: I wonder if this could be improved by having a separate abstract indefinite construction.

LEFORPO <- ([ ]* ((L e)/(L o)/NI2))

the numerical/quantity article.

LIO <- ([ ]* (L IO))

structure words for the ordered and unordered list constructions.

LAU <- ([ ]* (L AU))

LOU <- ([ ]* (L OU))

LUA <- ([ ]* (L UA))

LUO <- ([ ]* (L UO))

ZEIA <- ([ ]* Z EIb a)

ZEIO <- ([ ]* Z EIb o)

initial and final words for quoting Loglan utterances.

LI1 <- (L i)

LU1 <- (L u)

Quotations and other alien text constructions

quoting Loglan utterances, with or without explicit double quotes (if they appear, they must appear on both sides). The previous version allowed quotation of names; likely this should be restored.

LI <- ([ ]* LI1 comma2? utterance0 comma2? LU1/[ ]* LI1 comma2? [\"] utterance0 [\"] comma2? LU1)

the foreign name construction. This is an alien text construction.

LAO <- ([ ]* &([Ll] [Aa] [Oo]juncture?) AlienWord)

the strong quotation construction. This is an alien text construction.

LIE <- ([ ]* &([Ll] [Ii] juncture? [Ee]juncture?) AlienWord)

LIO1 <- ([ ]* &([Ll] [Ii] juncture? [Oo]juncture?) AlienWord)

I am not sure this class is used at all.

LW <- Cmapua

articles for quotation of words

LIU0 <- ((L IU)/(N IU))

this now imposes the condition that an explicit comma pause (or terminal punctuation, or end) must appear at the end of the Word or PreName quoted with [liu]. This seems like a good idea, anyway.

this class appeals to the phonetics. Words and PreNames can be quoted. The ability to quote names here may remove the need to quote them with [li]...[lu]. Of course, some Words are in fact phrases rather than single words: we will see whether the privileges afforded are used. The final clause allows use of letterals as actual names of letters.

added [niu]: didn't make it a name marker.

LIU1 <- ([ ]* ([Ll]/[Nn])[iI] juncture? [Uu] juncture? !V1 comma2? (PreName/Word) &(comma/terminal/end) /[ ]*(L II TAI ))

the construction of foreign and onomatopoeic predicates. These are alien text constructions.

SUE <- ([ ]* &([Ss] [Uu] juncture? [Ee] juncture?/[Ss] [Aa] [Oo] juncture?) AlienWord)

Assorted left and right closers

left marker in a predicate metaphor construction

CUI <- ([ ]* (C UI) )

other uses of GA

GA2 <- ([ ]* (G a) )

ge/geu act as "parentheses" to make an atomic predicate from a complex of metaphorically and logically connected predicates; [ge] has other left marking uses.

GE <- ([ ]* (G e) )

GEU <- ([ ]* ((C UE)/(G EU)) )

final marker of a list of head terms

GI <- ([ ]* ((G i)/(G OI)) )

used to move a normally prefixed metaphorical modifier after what it modifies.

GO <- ([ ]* (G o) )

marker for second and subsequent arguments before the predicate; NEW

GIO <- ([ ]* (G IO) )

the generic right marker of many constructions.

GU <- ([ ]* (G u) )

various flavors of right markers. It should be noted that at one point I executed a program of simplifying these to reduce the likelihood that multiple [gu]'s would ever be needed to close an utterance. First of all, I made the closures leaner, moving them out of the classes closed to their clients so that they generally can be used only when needed. Notably, the grammar of [guu] is quite different. Second, I introduced some new flavors of right marker. All can be realized with [gu], but if one knows the right flavor one can close the right structure with a single right closure.

right markers of subordinate clauses (argument modifiers). [gui] closes a different class than in the trial.85 grammar, with similar but on the whole better results.

GUIZA <- ([ ]* (G UI) (Z a) )

GUIZI <- ([ ]* (G UI) (Z i) )

GUIZU <- ([ ]* (G UI) (Z u) )

GUI <- (!GUIZA !GUIZI !GUIZU ([ ]* (G UI) ))
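
The way the negative lookaheads in GUI keep the bare word apart from its suffixed forms can be sketched as follows. This is an illustration in Python, not part of the grammar. The helpers skip_spaces, word and gui are hypothetical, and the sketch compares plain spellings, ignoring the stress, juncture and !predstart conditions carried by the real G and UI classes.

```python
def skip_spaces(text, pos):
    """Consume [ ]* starting at pos and return the new position."""
    while pos < len(text) and text[pos] == " ":
        pos += 1
    return pos


def word(text, pos, spelling):
    """[ ]* followed by the given spelling (case-insensitive); end position or None."""
    pos = skip_spaces(text, pos)
    end = pos + len(spelling)
    return end if text[pos:end].lower() == spelling else None


def gui(text, pos=0):
    """GUI <- (!GUIZA !GUIZI !GUIZU ([ ]* (G UI)))"""
    # The three lookaheads are zero-width: they inspect the text but consume nothing.
    for suffixed in ("guiza", "guizi", "guizu"):
        if word(text, pos, suffixed) is not None:
            return None
    return word(text, pos, "gui")


print(gui(" gui "))   # bare [gui] is read
print(gui("guiza"))   # the suffixed form is refused by GUI
```

The same pattern is used below for JI and JIO, which refuse to match when one of their ZI-suffixed forms is present.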

right markers of abstract predicates and descriptions. probably the forms with z are to be preferred (and the other two are not needed) but I preserve all five classes for now.

GUO <- ([ ]* (G UO) )

GUOA <- ([ ]* (G UOb a/G UO Z a) )

GUOE <- ([ ]* (G UOb e) )

GUOI <- ([ ]* (G UOb i/G UO Z i) )

GUOO <- ([ ]* (G UOb o) )

GUOU <- ([ ]* (G UOb u/G UO Z u) )

right marker used to close term (argument/predicate modifier) lists. it is important to note that in our grammar GUU is not a component of the class termset, nor is it a null termset: it appears in other classes which include termsets as an option to close them. The effects are similar to those in the trial.85 grammar, but there is less of a danger that extra unexpected closures will be needed.

GUU <- ([ ]* (G UU) )

a new closure for arguments in various contexts

GUUA <- ([ ]* (G UUb a) )

a new closure for sentences. In particular, it may have real use in closing up the scope of a list of fronted terms before a series of logically connected sentences.

GIUO <- ([ ]* (G IUb o) )

right marker used to close arguments tightly linked with JE/JUE.

GUE <- ([ ]* (G UE) )

a new closure for descpreds

GUEA <- ([ ]* (G UEb a) )

Miscellaneous clause constructors

used to build tightly linked term lists.

JE <- ([ ]* (J e) )

JUE <- ([ ]* (J UE) )

used to build subordinate clauses (argument modifiers).

JIZA <- ([ ]* ((J IE)/(J AE)/(P e)/(J i)/(J a)/(N u J i)) (Z a) )

JIOZA <- ([ ]* ((J IO)/(J AO)) (Z a) )

JIZI <- ([ ]* ((J IE)/(J AE)/(P e)/(J i)/(J a)/(N u J i)) (Z i) )

JIOZI <- ([ ]* ((J IO)/(J AO)) (Z i) )

JIZU <- ([ ]* ((J IE)/(J AE)/(P e)/(J i)/(J a)/(N u J i)) (Z u) )

JIOZU <- ([ ]* ((J IO)/(J AO)) (Z u) )

JI <- (!JIZA !JIZI !JIZU ([ ]* ((J IE)/(J AE)/(P e)/(J i)/(J a)/(N u J i)) ))

JIO <- (!JIOZA !JIOZI !JIOZU ([ ]* ((J IO)/(J AO)) ))

Case tags, semantic and positional

case tags, both numerical position tags and the optional semantic case tags.

DIO <- ([ ]* ((B EU)/(C AU)/(D IO)/(F OA)/(K AO)/(J UI)/(N EU)/(P OU)/(G OA)/(S AU)/(V EU)/(Z UA)/(Z UE)/(Z UI)/(Z UO)/(Z UU)) ) (C i ![ ] NI0/ZI)?

markers of indirect reference. Originally these had the same grammar as case tags, but they are now different.

LAE <- ([ ]* ((L AE)/(L UE)) )

The predicate constructor me

[me] turns arguments into predicates, [meu] closes this construction.

ME <- ([ ]* ((M EA)/(M e)) )

MEU <- ([ ]* M EU )

Reflexive and conversion operators

reflexive and conversion operators: first the root forms, then those with optional numerical suffixes.

NU0 <- ((N UO)/(F UO)/(J UO)/(N u)/(F u)/(J u))

NU <- [ ]* (((N u/N UO) !([ ]+ (NI0/RA)) (NI0/RA)?)/NU0)+ freemod?

Abstract predicate constructors

I do *not* think that [poia] will really be confused with [po ia], particularly since we do require an explicit pause before [ia] in the latter case, but I record this concern: the forms with z might be preferable.

constructions from sentences

PO1 <- ([ ]* ((P o)/(P u)/(Z o)))

PO1A <- ([ ]* ((P OIb a)/(P UIb a)/(Z OIb a)/(P o Z a)/(P u Z a)/(Z o Z a)))

PO1E <- ([ ]* ((P OIb e)/(P UIb e)/(Z OIb e)))

PO1I <- ([ ]* ((P OIb i)/(P UIb i)/(Z OIb i)/(P o Z i)/(P u Z i)/(Z o Z i)))

PO1O <- ([ ]* ((P OIb o)/(P UIb o)/(Z OIb o)))

PO1U <- ([ ]* ((P OIb u)/(P UIb u)/(Z OIb u)/(P o Z u)/(P u Z u)/(Z o Z u)))

abstract predicate constructor from simple predicates

POSHORT1 <- ([ ]* ((P OI)/(P UI)/(Z OI)))

word forms associated with the above abstract predicate root forms

PO <- ([ ]* PO1 )

POA <- ([ ]* PO1A )

POE <- ([ ]* PO1E )

POI <- ([ ]* PO1I )

POO <- ([ ]* PO1O )

POU <- ([ ]* PO1U )

POSHORT <- ([ ]* POSHORT1 )

register markers

DIE <- ([ ]* ((D IE)/(F IE)/(K AE)/(N UE)/(R IE)) )

freemods and freemod builders

vocative forms: I still have the words of social lubrication as vocative markers.

HOI <- ([ ]* ((H OI)/(L OI)/(L OA)/(S IA)/(S IE)/(S IU)) )

the verbal scare quote. The quantifier suffix indicates how many preceding words are affected; this is an odd mechanism.

JO <- ([ ]* (NI0/RA/SA)? (J o) )

markers for forming parenthetical utterances as free modifiers.

KIE <- ([ ]* (K IE) )

KIU <- ([ ]* (K IU) )

KIE2 <- [ ]* K IE comma2? [(]

KIU2 <- [ ]* [)] comma2? K IU

marker for forming smilies.

SOI <- ([ ]* (S OI) )

a grab bag of attitudinal words, including but not restricted to the VV forms.

UI0 <- (!predstart (!([Ii] juncture? [Ee]) VV juncture?/(B EA)/(B UO)/(C EA)/(C IA)/(C OA)/(D OU)/(F AE)/(F AO)/(F EU)/(G EA)/(K UO)/(K UU)/(R EA)/(N AO)/(N IE)/(P AE)/(P IU)/(S AA)/(S UI)/(T AA)/(T OE)/(V OI)/(Z OU)/((L OI))/((L OA))/((S IA))/(S II)/(T OE)/((S IU))/(C AO)/(C EU)/((S IE))/(S EU)/(S IEb i)))

negative forms of the attitudinals. The ones with [no] before the two vowel forms are a phonetic exception. The others should also be (though they present no pronunciation problem) so that they are resolved as single words.

NOUI <- (([ ]* UI0 NOI)/([ ]* N [o] juncture? comma? [ ]* UI0 ))

all attitudinals (adding the discursives nefi, tofi, etc.). There is a technical problem with mixing UI0 roots of VV and CVV shapes.

UI1 <- ([ ]* (UI0+/(NI F i)) )

the inverse vocative marker

HUE <- ([ ]* (H UE))

Negation

occurrences of [no] as a word rather than an affix.

NO1 <- ([ ]* !KOU1 !NOUI (N o) !(comma2? Z AO comma2? Predicate) !([ ]* KOU) !([ ]* (JIO/JI/JIZA/JIOZA/JIZI/JIOZI/JIZU/JIOZU)) )

gaa, the large subject marker in the alternative parser

a technical closure for the alternative parser approach: the "large subject marker"

GAA <- (NO1 freemod?)* ([ ]* (G AA))

The large word classes (names and predicates)

Names, acronyms and PreNames from above.

AcronymicName <- Acronym &(comma/period/end)

DJAN <- (PreName/AcronymicName)

predicate words which are phonetically cmapua

"identity predicates". Converses are provided as a new proposal.

BI <- ([ ]* (N u)? ((B IA)/(B IE)/(C IE)/(C IO)/(B IA)/(B [i])) )

interrogative and pronoun predicates

LWPREDA <- ((H e)/(D UA)/(D UI)/(B UA)/(B UI))

here I should reinstall the [zao] proposal.

the predicate words defined above in the phonetics section

Predicate <- (CmapuaUnit comma2? Z AO comma2?)* Complex (comma2? Z AO comma2? Predicate)?

predicate words, other than the "identity predicates" of class [BI]. These include the numerical predicates (NI RA), also cmapua phonetically.

we are installing John Cowan's [zao] proposal here, experimentally, 4/15/2019

PREDA <- ([ ]* &caprule (Predicate/LWPREDA/(![ ] NI RA)) )

Part 3: The Grammar Proper

Right markers turned into classes

guoa <- (PAUSE? (GUOA/GU) freemod?)

guoe <- (PAUSE? (GUOE/GU) freemod?)

guoi <- (PAUSE? (GUOI/GU) freemod?)

guoo <- (PAUSE? (GUOO/GU) freemod?)

guou <- (PAUSE? (GUOU/GU) freemod?)

guo <- (!guoa !guoe !guoi !guoo !guou (PAUSE? (GUO/GU) freemod?))

guiza <- (PAUSE? (GUIZA/GU) freemod?)

guizi <- (PAUSE? (GUIZI/GU) freemod?)

guizu <- (PAUSE? (GUIZU/GU) freemod?)

gui <- (PAUSE? (GUI/GU) freemod?)

gue <- (PAUSE? (GUE/GU) freemod?)

guea <- (PAUSE? (GUEA/GU) freemod?)

guu <- (PAUSE? (GUU/GU) freemod?)

guua <- (PAUSE? (GUUA/GU) freemod?)

giuo <- (PAUSE? (GIUO/GU) freemod?)

meu <- (PAUSE? (MEU/GU) freemod?)

geu <- GEU

Here note the absence of pause/GU equivalence.

gap <- (PAUSE? GU freemod?)

The vocative and inverse vocative

this is the vocative construction. It can appear early because all of its components are marked.

the intention is to indicate who is being addressed. This can be handled via a name, a descriptive argument, a predicate or an alien text name (the last must be quoted). The complexities of these grammatical constructions can be deferred until they are introduced.

HOI0 <- [ ]* [Hh] [Oo] [Ii] juncture?

restore words of social lubrication as vocative markers but not as name markers: [loi, Djan]

I do not allow a freemod to intervene between a vocative marker and the associated utterance, to avoid unintended grabbing of subjects by the words of social lubrication when they are used as vocative markers. This lets [Loi, Djan] and [Loi hoi Djan] be equivalent. The comma is needed in the first because the social lubrication words are in this version not name markers.

HOI0 <- ([ ]* ((([Hh] OI)/([Ll] OI)/([Ll] OA)/([Ss] IA)/([Ss] IE)/([Ss] IU)))) juncture? !V1

voc <- (HOI0 comma2? name /(HOI comma2? descpred guea? namesuffix?)/(HOI comma2? argument1 guua?)/[ ]* &([Hh] [Oo] [Ii] juncture?) AlienWord)

this is the inverse vocative. It can appear early because all of its components are marked.

the intention is to indicate who is speaking. The range of ways this can be handled is similar to the range of ways it can be handled for the vocative; there is the further option of a sentence (the [statement] class) and there is a strong closure option for the case where an argument is used (to avoid it inadvertently expanding to a sentence).

HUE0 <- [ ]* &caprule [Hh] [Uu] juncture? [Ee] juncture? !V1

invvoc <- (HUE0 comma2? name/HUE freemod? descpred guea? namesuffix?/(HUE freemod? statement giuo?)/(HUE freemod? argument1 guu?)/[ ]* &([Hh] [Uu] juncture? [Ee] juncture?) AlienWord)

Free modifiers

this is the class of free modifiers. Most of its components are head marked (those that aren't appear just above), and it is useful for it to appear early because these things appear everywhere in subsequent constructions. A free modifier, of whatever sort, is a freely insertable gadget which modifies the immediately preceding construction, or the entire utterance if it is initial.

NOUI is a negated attitudinal word. UI1 is an attitudinal word: these express an emotional attitude toward the assertion (noting that EI marks questions (yes or no answer expected) and SEU marks utterances as answers).

SOI creates smilies in a general sense: [soi crano] indicates that the listener should imagine the speaker smiling; similarly for other predicates.

DIE and NO DIE are register markers, communicating the social attitude of the speaker toward the one addressed: [die], for example, is "dear".

KIE...KIU constructs a full parenthetical utterance as a comment, which can be enclosed in actual parentheses inside the marker words.

JO is a scare quote device.

deletion of a previous word or wordlike unit (or more than one) using K IA

kiamod <- comma2? !(!PreName !predstart K IA) ((PreName/LIU1/AlienWord/Cmapua ([ ]* (!(K IA)) !PreName !predstart Cmapua)*/Word) kiamod* comma2? !PreName !predstart K IA) comma2?
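
The nesting in kiamod (a unit, any number of inner kiamods, then [kia]) behaves like a stack of pending units. The following is a rough illustration in Python, not part of the grammar. It assumes that every space-separated token is one erasable unit, which is a simplification: the rule itself works on phonetic units, and a run of cmapua can count as a single unit. The function name apply_kia is hypothetical.

```python
def apply_kia(tokens):
    """Drop each 'kia' together with the unit it erases (token-level sketch)."""
    kept = []
    for token in tokens:
        if token.lower() == "kia":
            if not kept:
                # the rule requires some unit before [kia]
                raise ValueError("kia with nothing before it to erase")
            kept.pop()          # erase the most recent surviving unit
        else:
            kept.append(token)
    return kept


print(apply_kia("mi cluva kia fundi tu".split()))
print(apply_kia("mi cluva kia kia tu".split()))
```

In the second call the inner [cluva kia] is discarded first, so the outer [kia] then erases [mi], which mirrors the kiamod* inside the rule.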

the comma is a freemod with no semantic content: this is a device for discarding phonetically required pauses and the speaker's optional pauses alike. The pause before a non-pause marked prename is part of the NameWord and so is excluded. Ellipses and dashes are fancy pauses supported as freemods.

freemod <- ((kiamod/NOUI/(SOI freemod? descpred guea?)/DIE/(NO1 DIE)/(KIE comma? utterance0 comma? KIU)/(KIE2 comma? utterance0 comma? KIU2)/invvoc/voc/(comma !(!FalseMarked PreName))/JO/UI1/([ ]* '...' ([ ]* &letter)?)/([ ]* '--' ([ ]* &letter)?)) freemod?)

Tightly bound arguments and lists thereof

the classes juelink to linkargs describe very tightly bound arguments which can be firmly attached to predicates in the context of metaphorical modifications and the use of predicates in descriptive arguments.

note that we allow predicate modifiers (prepositional phrases) to be bound with [je/jue] which is not allowed in 1989 Loglan, but which we believe is supported in Lojban.

juelink <- (JUE freemod? (term/(PA2 freemod? gap?)))

links1 <- (juelink (freemod? juelink)* gue?)

links <- ((links1/(KA freemod? links freemod? KI freemod? links1)) (freemod? A1 freemod? links1)*)

jelink <- (JE freemod? (term/(PA2 freemod? gap?)))

linkargs1 <- (jelink freemod? (links/gue)?)

linkargs <- ((linkargs1/(KA freemod? linkargs freemod? KI freemod? linkargs1)) (freemod? A1 freemod? linkargs1)*)

Abstract argument constructions

class abstractpred supports the construction of event, property, and quantity predicates from sentences. These are closable with [guo] if introduced with [po,pu,zo] and closable with suffixed variants of [guo] if introduced with suffixed variants of [po,pu,zo] (a NEW idea, but it is clear that closure of these predicates, and of the more commonly used associated descriptions, is an important issue).

abstractpred <- ((POA freemod? uttAx guoa?)/(POA freemod? sentence guoa?)/(POE freemod? uttAx guoe?)/(POE freemod? sentence guoe?)/(POI freemod? uttAx guoi?)/(POI freemod? sentence guoi?)/(POO freemod? uttAx guoo?)/(POO freemod? sentence guoo?)/(POU freemod? uttAx guou?)/(POU freemod? sentence guou?)/(PO freemod? uttAx guo?)/(PO freemod? sentence guo?))

Atomic predicates (predunit)

predunit1 describes the truly atomic forms of predicate.

PREDA is the class of predicate words (the phonetic predicate words along with the special phonetic cmapua which are predicates, listed above under the PREDA rule). NU PREDA handles permutations and identifications of arguments of PREDAs.

SUE contains the alien text constructions with [sao] and [sue], semantically quite different but syntactically handled in the same way.

[ge]...[geu/cue] (the closing optional) can parenthesize a fairly complex predicate phrase and turn it into an atomic form. These forms can have conversion or reflexive operators (NU) applied. I should look into why the class handled in the conversion case is different. An important use of this is in metaphor constructions, but it has other potential uses.

abstractpred is the class of abstraction predicates just introduced above. These are treated as atomic in this grammar: it should be noted that their privileges in the trial.85 grammar are (absurdly) limited.

[me]...[meu] (the closing optional, but important to have available) forms predicates from arguments, the predicate being true of the objects to which the argument refers. [Ti me le mrenu] : this is one of the men we are talking about.

predunit1 <- ((SUE/(NU freemod? GE freemod? despredE (freemod? geu comma?)?)/(NU freemod? PREDA)/(comma? GE freemod? descpred (freemod? geu comma?)?)/abstractpred/(ME freemod? argument1 meu?)/PREDA) freemod?)

[no] binds very tightly to predunit1: a possibly multiply negated predunit1 (or an unadorned predunit1) is a predunit2.

predunit2 <- ((NO1 freemod?)* predunit1)

an instance of NO2 is one not absorbed by a predunit. Example: [Da no kukra prano] X is a slow (not-fast) runner vs [Da no ga kukra prano] (X is not a fast runner, and in fact may not run at all).

NO2 <- (!predunit2 NO1)
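The effect of the negative lookahead in NO2 can be seen in a toy model. This is a sketch under strong assumptions: the input is already split into words, the set PREDICATES stands in for PREDA, and freemods and the other predunit1 alternatives are ignored.

```python
# Toy token-level model of predunit2 and NO2.
# PREDICATES is a stand-in for PREDA; everything else is simplified away.

PREDICATES = {"kukra", "prano"}


def predunit2(tokens, pos):
    """(NO1)* predunit1: return the position after the match, or None."""
    while pos < len(tokens) and tokens[pos] == "no":
        pos += 1
    if pos < len(tokens) and tokens[pos] in PREDICATES:
        return pos + 1
    return None


def no2(tokens, pos):
    """!predunit2 NO1: a [no] that no predunit2 can absorb."""
    if predunit2(tokens, pos) is not None:   # the negative lookahead fails
        return None
    if pos < len(tokens) and tokens[pos] == "no":
        return pos + 1
    return None


tight = "da no kukra prano".split()
loose = "da no ga kukra prano".split()
print(no2(tight, 1))  # None: [no] is absorbed into the predunit [no kukra]
print(no2(loose, 1))  # 2: [ga] blocks predunit2, so this [no] is a NO2
```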

a predunit3 is a predunit2 with tightly attached arguments.

predunit3 <- ((predunit2 freemod? linkargs)/predunit2)

a predunit is a predunit3 or a predunit3 converted by the short-scope abstraction operators [poi/pui/zoi] to an abstraction predicate. This is the kind of predicate which can appear as a component in a serial name.

predunit <- ((POSHORT freemod?)? predunit3)

a further "atomic" (because tightly packaged) form is a forethought connected pair of predicates (this being the full predicate class defined at the end of the process) possibly closed with [guu], possibly multiply negated as well.

the closure with guu eliminated the historic rule against kekked heads of metaphors.

kekpredunit <- ((NO1 freemod?)* KA freemod? predicate freemod? KI freemod? predicate guu?)

The construction of metaphors

there follows the construction of metaphorically modified predicates, along with tightly logically linked predicates.

CI and simple juxtaposition of predicates both represent modification of the second predicate by the first. We impose no semantic conditions on this modification, except in the case of modification by predicates logically linked with CA, which do distribute logically in the expected way both as modifiers and as modified. We do not regard [preda1 preda2] as necessarily implying preda2: we do regard it as having the same place structure as preda2. It is very often but not always a qualification or kind of preda2; in any case it is a relation analogous to preda2.

modification with CI binds most tightly.

we eliminated the distinction between the series of sentence and description predicate preliminary classes: there seems to be no need for it even in the trial.85 grammar.

despredA <- ((predunit/kekpredunit) (freemod? CI freemod? (predunit/kekpredunit))*)

this is logical connection of predicates with the tightly binding CA series of logical connectives. CUI can be used to expand the scope of a CA connective over a metaphor on the left. [ge]...[geu] is used to expand scope on the right (and could also be used on the left, it should be noted). despredC is an internal of despredB assisting the function of CUI. the !PREDA in front of CUI is probably not needed.

despredB <- ((!PREDA CUI freemod? despredC freemod? CA freemod? despredB)/despredA)

despredC <- (despredB (freemod? despredB)*)

tight logical linkage of despredB's

despredD <- (despredB (freemod? CA freemod? despredB)*)

chain of modifications of despredD's (grouping to the left)

despredE <- (despredD (freemod? despredD)*)

the GO construction allows inverse modification: [preda1 GO preda2] is [preda2 preda1] as it were. there are profound effects on grouping.

descpred <- ((despredE freemod? GO freemod? descpred)/despredE)
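The grouping effects of GO can be illustrated with a small model. This is only a sketch: a phrase is a list of words, [go] is the only operator, a modification is written as the tuple (modifier, modified), and CI, CA, CUI and freemods are ignored. The function names mirror the classes but are otherwise hypothetical.

```python
# Toy model of left-grouping chains and of inversion with GO.
from functools import reduce


def chain(words):
    """despredE-like chain: [a b c] groups to the left as ((a b) c)."""
    return reduce(lambda left, word: (left, word), words)


def descpred(words):
    """despredE GO descpred: the part after [go] modifies the part before it."""
    if "go" in words:
        cut = words.index("go")              # first GO; the rule recurses on the right
        head = chain(words[:cut])
        modifier = descpred(words[cut + 1:])
        return (modifier, head)
    return chain(words)


print(descpred("preda1 go preda2".split()))  # same shape as [preda2 preda1]
print(descpred("a b go c d".split()))        # [c d] as a unit modifies [a b]
print(chain("c d a b".split()))              # plain juxtaposition groups differently
```

In the second call the phrase after [go] is grouped on its own before it modifies the head, which plain juxtaposition of the same words in the order [c d a b] does not do.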

this version which appears in sentence predicates as opposed to descriptions differs in allowing loosely linked arguments (termsets) instead of those linked with [je/jue] for the predicate moved to the end by GO.

sentpred <- ((despredE freemod? GO freemod? barepred)/despredE)

Construction of sentence modifiers

the construction of predicate modifiers (prepositional phrases usable as terms along with arguments).

mod1a <- (PA3 freemod? argument1 guua?)

note special treatment of predicate modifiers without actual arguments. the !barepred serves to distinguish these predicate modifiers from actual "tenses" (predicate markers).

mod1 <- ((PA3 freemod? argument1 guua?)/(PA2 freemod? !barepred gap?))

forethought connection of modifiers. There is some subtlety in how this is handled.

kekmod <- ((NO1 freemod?)* (KA freemod? modifier freemod? KI freemod? mod))

mod <- (mod1/((NO1 freemod?)* mod1)/kekmod)

afterthought connection of modifiers

modifier <- (mod (A1 freemod? mod)*)

Serial names (a flash point)

the serial name is a horrid heterogeneous construction! It can involve components of all three of the major phonetic classes essentially! However, I believe I have the definition right, with all the components correctly guarded :-)

name <- (PreName/AcronymicName) (comma2? !FalseMarked PreName/comma2? &([Cc] [Ii]) NameWord/comma2? CI predunit !(comma2? (!FalseMarked PreName))/comma2? CI AcronymicName)* freemod?

LA0 <- [ ]* [Ll] [Aa] juncture?

LANAME <- (LA0 comma2? name)

General construction of descriptive arguments

general constructions of arguments with "articles".

the rules here have the "possessive" construction as in [lemi hasfa; le la Djan, hasfa] embedded. These are not the same construction in 1989 Loglan, though speakers might think they are. Here they are indeed the same. The "possessor" cannot be "indefinite" (cannot start with a quantifier word); the possessor can be followed by a tense, as in [le la Djan, na hasfa], "John's present house", by analogy with [lemina hasfa], which is accepted by LIP (because LIP accepts [lemina] as a word).

there are other subtleties to be reviewed.

descriptn <- (!LANAME ((LAU wordset1)/(LOU wordset2)/(LE freemod? ((!mex arg1a freemod?)? (PA2 freemod?)?)? (mex freemod? arg1a/mex freemod? descpred/descpred))/(GE freemod? mex freemod? descpred)))

abstract descriptions. Note that abstract descriptions are closed with [guo] entirely independently of abstract predicates: [le po preda guo] does not have a grammatical component [po preda guo]. This avoids the double closure often apparently necessary in Lojban.

abstractn <- ((LEFORPO freemod? POA freemod? uttAx guoa?)/(LEFORPO freemod? POA freemod? sentence guoa?)/(LEFORPO freemod? POE freemod? uttAx guoe?)/(LEFORPO freemod? POE freemod? sentence guoe?)/(LEFORPO freemod? POI freemod? uttAx guoi?)/(LEFORPO freemod? POI freemod? sentence guoi?)/(LEFORPO freemod? POO freemod? uttAx guoo?)/(LEFORPO freemod? POO freemod? sentence guoo?)/(LEFORPO freemod? POU freemod? uttAx guou?)/(LEFORPO freemod? POU freemod? sentence guou?)/(LEFORPO freemod? PO freemod? uttAx guo?)/(LEFORPO freemod? PO freemod? sentence guo?))

a wider class of basic argument constructions. Notice that LANAME is always read by preference to descriptn.

namesuffix <- (&(comma2 !FalseMarked PreName/[ ]* [Cc][Ii] juncture? comma2? (PreName/AcronymicName)) ([ ]* [Cc][Ii] juncture? comma2?/comma2)? name)

arg1 <- (abstractn/(LIO freemod? descpred guea?)/(LIO freemod? argument1 guua?)/(LIO freemod? mex gap?)/LIO1/LAO/LANAME/(descriptn guua? namesuffix?)/LIU1/LIE/LI)

this adds pronouns (incl. the fancy [gao] letterals) and the option of left marking an argument with [ge]

arg1a <- ((DA/TAI/arg1/(GE freemod? arg1a)) freemod?)

Argument modifiers (subordinate clauses)

argmod1 <- ((([ ]* (N o) [ ]*)? ((JI freemod? predicate)/(JIO freemod? sentence)/(JIO freemod? uttAx)/(JI freemod? modifier)/(JI freemod? argument1)))/(([ ]* (N o) [ ]*)? (((JIZA freemod? predicate) guiza?)/((JIOZA freemod? sentence) guiza?)/((JIOZA freemod? uttAx) guiza?)/((JIZA freemod? modifier) guiza?)/(JIZA freemod? argument1 guiza?)))/(([ ]* (N o) [ ]*)? ((JIZI freemod? predicate guizi?)/(JIOZI freemod? sentence guizi?)/(JIOZI freemod? uttAx guizi?)/(JIZI freemod? modifier guizi?)/(JIZI freemod? argument1 guizi?)))/(([ ]* (N o) [ ]*)? ((JIZU freemod? predicate guizu?)/(JIOZU freemod? sentence guizu?)/(JIOZU freemod? uttAx guizu?)/(JIZU freemod? modifier guizu?)/(JIZU freemod? argument1 guizu?))))

we improved the trial.85 grammar by closing not argmod1 but argmod with [gui]. But the labelled argument modifier constructors when building an argmod1 have the argmod1 construction closed with the corresponding labelled right marker, of course. Thus gui and guiza actually have different grammar.

trial.85 did not provide forethought connected argument modifiers, and we also see no need for them, though they could readily be added.

argmod <- (argmod1 (A1 freemod? argmod1)* gui?)

Arguments resume

affix argument modifiers to a definite argument

arg2 <- (arg1a freemod? argmod*)

build a possibly indefinite argument from an argument: [to le mrenu]

arg3 <- (arg2/(mex freemod? arg2))

build an indefinite argument from a predicate

indef1 <- (mex freemod? descpred)

affix an argument modifier to an indefinite argument

indef2 <- (indef1 guua? argmod*)

indefinite <- indef2

link arguments with the fusion connective [ze]

arg4 <- ((arg3/indefinite) (ZE2 freemod? (arg3/indefinite))*)

forethought connection of arguments. Note use of argx

arg5 <- (arg4/(KA freemod? argument1 freemod? KI freemod? argx))

arguments with possible negations followed by possible indirect reference constructions.

argx <- ((NO1 freemod?)* (LAE freemod?)* arg5)

afterthought connection with the tightly binding ACI connectives

arg7 <- (argx freemod? (ACI freemod? argx)?)

afterthought connection with the usual A connectives. Can't start with GE to avoid an ambiguity (to which 1989 Loglan is vulnerable) involving AGE connectives.

arg8 <- (!GE (arg7 freemod? (A1 freemod? arg7)*))

afterthought connection (now right grouping, instead of the left grouping above) using the AGE connectives. GUU can be used to affix an argument modifier at this top level.

argument1 <- (((arg8 freemod? AGE freemod? argument1)/arg8) (GUU freemod? argmod)*)

possibly negated and case tagged arguments. We (unlike 1989 Loglan) are careful to use argument only where case tags are appropriate.

argument <- ((NO1 freemod?)* (DIO freemod?)* argument1)

an argument which is actually case tagged.

argxx <- (&((NO1 freemod?)* DIO) argument)

Term lists

arguments and predicate modifiers actually associated with predicates.

term <- (argument/modifier)

a term list consisting entirely of modifiers.

modifiers <- (modifier (freemod? modifier)*)

a term list consisting entirely of modifiers and tagged arguments.

modifiersx <- ((modifier/argxx) (freemod? (modifier/argxx))*)

the subject class is a list of terms (arguments and predicate modifiers) in which all but possibly one of the arguments are tagged, and there is at least one argument, tagged or otherwise.

subject <- ((modifiers freemod?)? ((argxx subject)/(argument (modifiersx freemod?)?)))
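The condition stated for the subject class can be checked directly on a simplified term list. The sketch below tests the prose condition on a whole list rather than simulating the PEG rule, and the item kinds "arg" (untagged argument), "tag" (case tagged argument) and "mod" (predicate modifier) are assumptions made for the illustration.

```python
# Check the stated subject condition on a list of item kinds.
# "arg", "tag" and "mod" are illustrative labels, not grammar classes.


def is_subject(kinds):
    """At least one argument, and at most one of the arguments untagged."""
    if any(k not in ("arg", "tag", "mod") for k in kinds):
        return False
    arguments = [k for k in kinds if k in ("arg", "tag")]
    untagged = kinds.count("arg")
    return len(arguments) >= 1 and untagged <= 1


print(is_subject(["mod", "tag", "arg", "mod"]))  # True
print(is_subject(["arg", "arg"]))                # False: two untagged arguments
print(is_subject(["mod"]))                       # False: no argument at all
```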

this case is identified as an aid to experimental termination of argument lists

statement1 <- (subject freemod? (GIO freemod? terms1)? predicate)

change this to something you won't encounter to turn off the alternative parser, or to statement1 to turn it on

statement1x <- 'xxx'

these classes are exactly argument, but are used to signal which argument position after the predicate an argument occupies. I think the grammar is set up so that these will actually never be case tagged, though the grammar does not expressly forbid it.

I am trying a simple version of the "alternative parser" approach: a term list will refuse to digest an argument which starts a new SVO sentence (statement1).

argumentA <- !statement1x argument

argumentA <- argument

argumentB <- !statement1x argument

argumentB <- argument

argumentC <- !statement1x argument

argumentC <- argument

argumentD <- !statement1x argument

argumentD <- argument
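How the guard on these classes behaves, and why setting statement1x to text that never occurs switches it off, can be sketched as follows. This is a toy model under stated assumptions: the input is a list of item kinds "arg" and "pred", a statement1 is reduced to an argument immediately followed by a predicate, and all function names are hypothetical.

```python
# Toy model of a term list guarded by !statement1x.


def statement1(kinds, pos):
    """Toy statement1: an argument immediately followed by a predicate."""
    return kinds[pos:pos + 2] == ["arg", "pred"]


def never(kinds, pos):
    """Toy statement1x <- 'xxx': text that never occurs, so the guard never fires."""
    return False


def term_list(kinds, pos, guard):
    """Digest up to four arguments, stopping where the guard sees a new statement."""
    taken = 0
    while (pos < len(kinds) and kinds[pos] == "arg"
           and not guard(kinds, pos) and taken < 4):
        pos += 1
        taken += 1
    return pos


kinds = ["arg", "pred", "arg", "arg", "pred"]
print(term_list(kinds, 2, statement1))  # 3: the second trailing argument starts a new sentence
print(term_list(kinds, 2, never))       # 4: with the guard off, both arguments are digested
```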

for argument lists not guarded against absorbing a following subject

argumentA1 <- argument

argumentB1 <- argument

argumentC1 <- argument

argumentD1 <- argument

a general term list. It cannot contain more than four untagged arguments (they will be labelled with the lettered subclasses given above).

terms <- ((modifiersx? argumentA (freemod? modifiersx)? argumentB? (freemod? modifiersx)? argumentC? (freemod? modifiersx)? argumentD?)/modifiersx)
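The labelling of untagged arguments by the terms rule can be sketched on a simplified input. The item kinds are assumptions for the illustration: "arg" is an untagged argument, "tag" a case tagged argument (argxx), and "mod" a predicate modifier; freemods are ignored. The function returns the labels assigned and the number of items read.

```python
# Toy model of the terms rule over a list of item kinds.


def terms(kinds):
    """Return (labels, consumed) for the prefix the rule reads, or None."""
    labels = []
    pos = 0

    def modifiersx():
        # absorb modifiers and tagged arguments greedily
        nonlocal pos
        while pos < len(kinds) and kinds[pos] in ("mod", "tag"):
            labels.append(kinds[pos])
            pos += 1

    modifiersx()
    for letter in "ABCD":
        if pos == len(kinds) or kinds[pos] != "arg":
            break
        labels.append("argument" + letter)
        pos += 1
        if letter != "D":          # nothing follows argumentD in the rule
            modifiersx()
    # With no argumentA only the bare modifiersx alternative is left,
    # and it needs at least one item.
    return (labels, pos) if pos > 0 else None


print(terms(["mod", "arg", "tag", "arg"]))
print(terms(["arg"] * 5))   # the fifth untagged argument is not read
```

Because the leading modifiersx absorbs every tagged argument first, the lettered classes only ever receive untagged arguments in this model, in line with the remark above that they should never actually be case tagged.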

terms list not guarded against absorbing a following subject

terms1 <- ((modifiersx? argumentA1 (freemod? modifiersx)? argumentB1? (freemod? modifiersx)? argumentC1? (freemod? modifiersx)? argumentD1?)/modifiersx)

innards of ordered and unordered list constructions. These are something I totally rebuilt, as they were in a totally unsatisfactory state in trial.85. Note the use of comma words to separate items in lists.

word <- (arg1a/indef2)

words1 <- (word (ZEIA? word)*)

words2 <- (word (ZEIO? word)*)

wordset1 <- (words1? LUA)

wordset2 <- (words2? LUO)

the full term set type to be affixed to predicates.

forethought connection of term lists

termset1 <- (terms/(KA freemod? termset2 freemod? guu? KI freemod? termset1))

afterthought connection of term lists. There are cunning things going on here getting [guu] to work correctly. Note that [guu] is NOT a null term list as it was in trial.85.

termset2 <- (termset1 (guu &A1)? (A1 freemod? termset1 (guu &A1)?)*)
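The (guu &A1) device reads [guu] inside termset2 only when an A connective follows, and otherwise leaves it for an enclosing rule. A minimal sketch, assuming a tokenized input in which "term" stands for a whole termset1, "a" for any A1 connective, and "guu" for itself (all stand-ins, with freemods ignored):

```python
# Toy model of termset2 with the (guu &A1)? lookahead.


def termset1(tokens, pos):
    """One or more "term" tokens; return the new position or None."""
    start = pos
    while pos < len(tokens) and tokens[pos] == "term":
        pos += 1
    return pos if pos > start else None


def guu_before_connective(tokens, pos):
    """(guu &A1)?: consume [guu] only if an A connective follows it."""
    if tokens[pos:pos + 2] == ["guu", "a"]:
        return pos + 1        # only guu is consumed; the lookahead consumes nothing
    return pos


def termset2(tokens, pos=0):
    pos = termset1(tokens, pos)
    if pos is None:
        return None
    pos = guu_before_connective(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "a":
        after = termset1(tokens, pos + 1)
        if after is None:
            break             # the repeated group fails as a whole; keep the old position
        pos = guu_before_connective(tokens, after)
    return pos


print(termset2(["term", "guu", "a", "term"]))  # 4: the guu is read inside termset2
print(termset2(["term", "guu"]))               # 1: the guu is left for the enclosing rule
```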

there is an interesting option here of a list of terms followed by [go] followed by a predicate intended to metaphorically modify the predicate to which the terms are affixed. Is there a reason why we cannot have a more complex construction in place of terms?

termset <- ((terms freemod? GO freemod? barepred)/termset2)

The general verb phrase construction

this is the untensed predicate with arguments attached. Here is the principal locus of closure with [guu], but it is deceptive to say that [guu] merely closes barepred, as we have seen above, for example in [termset2].

barepred <- (sentpred freemod? ((termset guu?)/(guu (&termset)))?)

tensed predicates

markpred <- (PA1 freemod? barepred)

there follows an area in which my grammar looks different from trial.85. Distinct parallel forms for marked and unmarked predicates are demonstrably not needed even in trial.85. The behavior of the ACI connectives is plain weird in trial.85; here we treat ACI connectives in the same way as A connectives, but binding more tightly.

units for the ACI construction following -- possibly multiply negated bare or marked predicates.

adding shared termsets to logically connected predicates is handled differently here than in trial.85, which uses a very elegant but dreadfully left-grouping rule which a PEG cannot handle. Any realistic situation should be manageable.

backpred1 <- ((NO2 freemod?)* (barepred/markpred))

ACI connected predicates. Shared termsets are added. Notice how we first group backpred1's then recursively group backpreds.

backpred <- (((backpred1 (ACI freemod? backpred1)+ freemod? ((termset guu?)/(guu &termset))?) ((ACI freemod? backpred)+ freemod? ((termset guu?)/(guu &termset))?)?)/backpred1)

A connected predicates; same comments as just above. Cannot start with GE to fix ambiguity with AGE connectives.

predicate2 <- (!GE (((backpred (A1 !GE freemod? backpred)+ freemod? ((termset guu?)/(guu &termset))?) ((A1 freemod? predicate2)+ freemod? ((termset guu?)/(guu &termset))?)?)/backpred))

predicate2's linked with right grouping AGE connectives (A and ACI are left grouping).

predicate1 <- ((predicate2 AGE freemod? predicate1)/predicate2)

identity predicates from above, possibly negated

identpred <- ((NO1 freemod?)* (BI freemod? argument1 guu?))

predicates in general. Note that identity predicates cannot be logically connected except by using forethought connection (see above).

predicate <- (predicate1/identpred)

The sentence

The gasent is a basic form of the Loglan sentence in which the predicate leads. The basic structure is (a PA word (usually a tense) or [ga]) followed optionally by terms followed optionally by [ga] followed by terms. The list of terms after [ga] (if present) will either contain at least one argument and no more than one untagged argument (a subject) [gasent1] or all the arguments of the predicate [gasent2]. We deprecate other arrangements possible in 1989 Loglan because they would cause unexpected reorientation of the arguments already given before [ga] as second and further arguments were read after [ga]. [barepred] is an untensed predicate possibly with arguments; [sentpred] is "simply a verb", i.e., a predicate without arguments.

there is a semantic change from 1989 Loglan reflected in a grammar change here: in [gasent1] the final (ga subject) is optional. When it does not appear, the resulting sentence is an observative (a sentence with subject omitted), not an imperative. Imperatives for us are unmarked.

In the alternative version, the use of the large subject marker GAA can prevent inadvertent absorption of a preceding trailing argument into a statement.

4/22 allowing general predicates in gasent. Otherwise the spaces of observatives and imperatives become quite confused.

gasent1 <- ((NO1 freemod?)* (GAA? freemod? &markpred predicate (GA2 freemod? subject)?))

gasent2 <- ((NO1 freemod?)* (GAA? PA1 freemod? sentpred modifiers? (GA2 freemod? subject freemod? GIO? freemod? terms?)))

gasent <- (gasent2/gasent1)

this is the simple Loglan sentence in various basic orders. The form "gasent" is discussed just above. Predicate modifiers can be prefixed to the gasent. The final form given here is the basic SVO sentence. The "subject" class is a list of terms (arguments and predicate modifiers) containing at most one un-case-tagged argument. The most general SVO form is subject, followed optionally by [gio] followed by a list of terms (1989 Loglan allowed more than one untagged argument before the predicate, but this leads to practical problems in which preceding constructions with errors in them may supply extra unintended arguments). It should be noted that in NB3 JCB envisioned a single argument before the predicate, followed by the predicate, which may itself contain further arguments. A gasent may optionally be negated (even multiple times).

re [gio] and some other changes, in his comments on the NB3 grammar JCB often notes restrictions on appearances of term lists which he intends but which he thought were hard to implement in the machine grammar. The appearance of just one argument before the "verb" in an SVO sentence was one of these (though later he takes it as a virtue that the actual machine grammar supports SOV: we did not consider it a virtue to have unmarked SOV after observing unintended parses appearing in the Visit text). Another example of this (which would not have been hard for JCB to implement, in fact) is our restriction of the form "terms gasent" to "modifiers gasent". His comments make it clear that he does not want arguments among those terms.

statement <- (gasent/(modifiers freemod? gasent)/(subject freemod? GAA? freemod? (GIO freemod? terms1)? predicate))

this is a forethought connected basic sentence. It is odd (and actual odd results can be exhibited) that the final segment in both of these rules is of the very general class uttA1, which includes some quite fragmentary utterances usually intended as answers.

12/20/2017 I rewrote the rule in a more compact form. This rule looks ahead to the class [sentence] which we now develop; for the moment notice that [sentence] will include [statement].

4/14 tentatively allowing initial modifiers here and leaving this out of uttA0 which replaces uttA1 below. The intention is to eliminate weird sentence fragments.

keksent <- modifiers? freemod? (NO1 freemod?)* (KA freemod? headterms? freemod? sentence freemod? KI freemod? uttA0)

sentence negation. We allow this to be set off from the main sentence with a mere pause, because generally it does not differ in meaning from the result of negating the first argument or predicate modifier.

neghead <- ((NO1 freemod? gap)/(NO2 PAUSE))

this class includes predicate modifiers preceding a predicate (which may contain arguments), a statement, a predicate, and a keksent. Of these, the first and third are imperatives.

in the alternative version, the large subject marker GAA can prevent inadvertent absorption of preceding trailing arguments into a statement

4/23/2019 added actual rule for imperative sentences. This should not affect the parse in any essential way.

imperative <- ((modifiers freemod?)? GAA? !gasent predicate)

sen1 <- (neghead freemod?)* (imperative/statement/keksent)

sen1 <- ((neghead freemod?)* ((modifiers freemod? GAA? !gasent predicate)/statement/GAA? predicate/keksent))

the class [sentence] consists of sen1's afterthought connected with A connectives

sentence <- (sen1 ([!.:;?]? ICA freemod? sen1)*)

[headterms] is a list of terms (arguments and predicate modifiers) ending in [gi]. Preceding a [sen1] with these causes all predicates in the [sen1] to share these arguments. We propose either that the headterms arguments be directly appended to the argument list of each component of the [sen1], or that there is an argument with a numbered case tag at the beginning of the headterms list, and the list is inserted at the appropriate position in each component sentence. Neither of these is the condition described in Loglan I, which presupposes that we always know what the last argument of each predicate used is.

headterms <- (terms GI)+

this is the previous class prefixed with a list of fronted terms. we think the [giuo] closure might prove useful.

uttAx <- (headterms freemod? sentence giuo?)

Utterances

weird answer fragments

uttA <- ((A1/mex) freemod?)

a broad class of utterances, including various things one would usually only say as answers. Notice that this utterance class can take terminal punctuation.

uttA0 <- sen1/uttAx

uttA1 <- ((sen1/uttAx/links/linkargs/argmod/(modifiers freemod? keksent)/terms/uttA/NO1) freemod? period?)

possibly negated utterances of the previous class.

uttC <- ((neghead freemod? uttC)/uttA1)

utterances linked with more tightly binding ICI sentence connectives. Single sentences are of this class if not linked with ICI or ICA.

uttD <- ((sentence period? !ICI !ICA)/(uttC (ICI freemod? uttD)*))

utterances of the previous class linked with ICA. I went to some trouble to ensure that a freestanding [sentence] is actually parsed as a sentence, not a composite uttD, which was a deficiency, if not an ambiguity, of LIP and of the trial.85 grammar.

uttE <- (uttD (ICA freemod? uttD)*)

utterances of the previous class linked with I sentence connectives.

uttF <- (uttE (I freemod? uttE)*)
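The layering of uttD, uttE and uttF by binding strength can be pictured with nested splitting. This is a deliberately crude sketch: the tokens "ici", "ica" and "i" stand for the connective classes rather than for actual words, each remaining run of tokens stands for a whole uttC, and the first alternative of uttD (a freestanding sentence read as a unit, including its own ICA connections) is ignored.

```python
# Toy model: loosest connective splits first, tightest splits last.


def split_on(tokens, connective):
    """Split a token list at every occurrence of the connective."""
    groups, current = [], []
    for tok in tokens:
        if tok == connective:
            groups.append(current)
            current = []
        else:
            current.append(tok)
    groups.append(current)
    return groups


def utt_f(tokens):
    """uttF as a list of uttE, each a list of uttD, each a list of units."""
    return [
        [[" ".join(unit) for unit in split_on(utt_d, "ici")]
         for utt_d in split_on(utt_e, "ica")]
        for utt_e in split_on(tokens, "i")
    ]


print(utt_f("s1 ici s2 ica s3 i s4".split()))
```

In the printed structure s1 and s2 sit together in the innermost list, because ICI binds them before ICA joins that pair to s3 and before I joins the result to s4.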

the utterance class for use in the context of parenthetical freemods or quotations, in which it does not go to end of text.

utterance0 <- (!GE ((ICA freemod? uttF)/(!PAUSE freemod period? utterance0)/(!PAUSE freemod period?)/(uttF IGE utterance0)/uttF/(I freemod? uttF?)/(I freemod? period?)) (&I utterance0)?)

Notice that there are two passes here: the parser first checks that the entire utterance is phonetically valid, then returns and checks for grammatical validity.

the full utterance class. This goes to end of text, and incorporates the phonetics check. This incorporates the only situations in which a freemod is initial. The IGE connectives bind even more loosely than the I connectives and right-group instead of left grouping.

utterance <- &(phoneticutterance !.) (!GE ((ICA freemod? uttF (&I utterance)? end)/(!PAUSE freemod period? utterance)/(!PAUSE freemod period? (&I utterance)? end)/(uttF IGE utterance)/(I freemod? period? (&I utterance)? end)/(uttF (&I utterance)? end)/(I freemod? uttF (&I utterance)? end)))
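The two passes come from the leading &(phoneticutterance !.): the lookahead must succeed on the whole text, consumes nothing, and only then is the grammatical parse attempted from the same starting position. A minimal sketch of that shape, where the functions phonetic and grammar are hypothetical stand-ins (simple regular expressions) and not the real classes:

```python
# Toy model of "check the whole text, then parse it again from the start".
import re


def and_predicate(parser):
    """&(A): succeed without consuming input if A matches at pos."""
    def parse(text, pos):
        return pos if parser(text, pos) is not None else None
    return parse


def to_end(parser):
    """A !. : A must reach the end of the text."""
    def parse(text, pos):
        end = parser(text, pos)
        return end if end == len(text) else None
    return parse


def phonetic(text, pos):
    # stand-in phonetics check: lower case letters, spaces, commas, periods
    return re.compile(r"[a-z ,.]*").match(text, pos).end()


def grammar(text, pos):
    # stand-in grammar: space-separated words with an optional final period
    m = re.compile(r"[a-z]+( [a-z]+)*\.?").match(text, pos)
    return m.end() if m else None


def utterance(text):
    pos = and_predicate(to_end(phonetic))(text, 0)   # first pass, consumes nothing
    if pos is None:
        return None
    return grammar(text, pos)                        # second pass, from the same position


print(utterance("da prano."))  # 9
print(utterance("da pr4no"))   # None: rejected by the first pass
```

The second example shows why the order matters: the stand-in grammar alone would happily read the prefix "da pr", but the first pass has already rejected the text.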