KIF: BNF
pat hayes
phayes at ai.uwf.edu
Thu Jun 28 12:54:49 CDT 2001
KIF Syntax
An ISO- KIF document is encoded as a sequence of Unicode characters (
UCS-2, from the Unicode 'base multilingual plane' according to
ISO/IEC 10646-1:1999). The standard encoding is UTF-8 (ISO
10646-1:2000 Annex D ) .
Only characters in the US-ASCII subset are reserved for special use
in ISO-KIF, so that SUO-KIF can be encoded as an ASCII text string if
required. UCS-2 characters outside that range are represented in
ASCII text by a character coding sequence of the form \unnnn or
\unnnnnn where n is a hexadecimal digit character. When transforming
an ASCII text string to UTF-8, such a sequence should be replaced by
the corresponding UTF-8 byte encoding.
The rest of this document uses the ASCII rendering.
Lexicalization
The syntax of KIF is designed to be easy to process mechanically. The
syntax is defined in terms of disjoint blocks of characters called
lexical items or lexemes . A character stream can be converted into a
stream of lexemes by a simple recursive process of lexicalisation
which checks for a small number of interlexical characters, which
indicate the end of a lexical item (and sometimes the start of the
next one.) The interlexical characters are:
" U+0022
( U+0028
) U+0029
<space> U+0020
<tab> U+0009
<line> U+000A
<page> U+000C
<return> U+000D
(Note that the backslash character \ (reverse solidus U+005C) also
plays a role in lexicalization, by cancelling the interlexicality of
the immediately following character.)
Certain characters are reserved for special use as the first
character in a lexical item. Most lexical items can be assigned a
unique syntactic role solely on the basis of their first character,
called the marking character; this role is then sufficient to enable
a parser to construct the syntactic form of the KIF expression. The
marker characters are:
< U+003C
" U+0022
( U+0028
) U+0029
@ U+0040
# U+0023
? U+0035
% U+0025
` U+0060
= U+003D
/ U+002F
[ U+005B
: U+003A
(Note. Not all of the marking characters are included in the lexical
class <marker>)
(Note. The < (U+003C) and \ (reverse solidus U+005C) characters are
reserved for special use.
< is reserved for use in an XML coding of ISO-KIF.
\ plays a special role in enabling "out of code" characters to be
included in KIF ASCII text. Followed by the lowercase letter u
(U+0075) and a four- or six-digit hexadecimal code, it is used to
transcribe non-ASCII UTF-8 characters, as explained above. Any string
of this form plays the same syntactic role in an ASCII string
rendering as a single ordinary character. Backslash is also used as a
cancelling prefix. The special roles of any of the interlexical or
marking characters, and of the backslash character itself, can be
cancelled by preceding the character with a backslash. This enables
marking and interlexical characters to be included in lexical items
without writing their Unicode equivalents.
In the following BNF, angle brackets <> enclose syntactic class
names, | indicates alternatives, * indicates any sequence of
expressions, + indicates any nonempty sequence, square brackets [ ]
indicate an optional sequence, and curly brackets { } are used as
grouping characters. To avoid ambiguity, <plus>, <asterisk>, and
<vertical> are used to indicate the characters + , * and | .
The BNF also applies to UTF-8 encodings, with the change noted to the
category <nonascii>.
1. Character classification
<white> ::= <space>|<tab>|<line>|<page>|<return>
<lexbreak> ::= <white>|(|)|"|\
<alpha> ::= A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z|
a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
<digit> ::= 0|1|2|3|4|5|6|7|8|9
<marker> ::= @|#|?|`|.|=|/|[|:|%
<other> ::= !|$|%|&|<asterisk>|<plus>|,|-|;|]|^|_|{|}|<vertical>|~|>|'
2. Lexical analysis
<hexa> ::= <digit>|A|B|C|D|E|F
<nonascii> ::=
\u<hexa><hexa><hexa><hexa>|\u<hexa><hexa><hexa><hexa><hexa><hexa>
Note. For UTF-8 strings, <nonascii> is the set of all non-control
characters of UTF-8 outside the ASCII range, ie all non-control
characters in the range U+007F-FFFFFD. The use of the \uXXXX sequence
in UTF-8 character strings, while not illegal, is DEPRECATED.
<insert> ::= \<lexbreak>|<nonascii>
<wordchar> ::= <alpha>|<digit>|<marker>|<other>|<insert>
<stringchar> ::= <wordchar>|<white>|(|)
<docstring> ::= "<stringchar>*"
Note. A docstring can contain any character, but quotemark " or
backslashh can occur only in the combination /" or //. Docstrings are
the only lexical items that can contain whitespace. They are used
solely for documentation purposes and are considered to be 'atomic'
lexical items with no logical interpretation or syntactic structure.
They can be used to attach any kind of 'external' information to a
KIF expression.
<lexitem> ::= <wordchar>*|(|)|<docstring>
Note. <lexitem> is an "umbrella" category which includes all lexical items.
The job of a lexical analyser is to divide the input character stream
into a stream of non-overlapping lexical items, separated by
<lexbreak> characters.
The rest of the syntactic specification is understood to refer only
to the stream of lexical items, so that any whitespace (not included
in an <insert>) is ignored. It defines a finer classification of
lexical items into distinct lexical subclasses.
3. Lexical syntax
<relname> ::= =|/=|
<connective> ::= and|or|iff|implies|not
<wordbody> ::= <wordchar>+
<indvar> ::= ?<wordbody>
<word> ::= {<alpha>|<other>|<insert>}<wordchar>*
<specialname> ::= %<wordbody>|documentation
<restriction> ::= :<word> //not a typo :-) //
<seqvar> ::= @<wordbody>
<importname> ::= #<wordbody>
<quote> ::= `<wordbody>'
<numeral> ::= <digit>*
Note. A word is any lexical item that does not start with a marking
character (though it may include them.)
Numerals should not be used as words, to avoid confusion with their
use in ontologies which refer to numbers.
<specialname> is a name form reserved for 'private use' by other
systems which may wish to attach information to KIF expressions in
other formats (such as timestamping and authorship information).
The KIF core uses only <relname>, <connective>, <indvar> , <word> ,
<reservename> and <formname>; the other lexical forms have special
meanings in various KIF extensions, so their use in the KIF core is
PROHIBITED in order to maintain compatibility with the extensions.
Seqvars are used in the sequence-quantifier extension; importnames
are used in the namespace extension; quotes are used in the meta
extension. Note that quotes are distinct from documentation strings.
4. Core expression syntax //read Notes and comments below before exploding.//
Note. All KIF expressions are written in LISP-style 'S-expression
syntax'; they consist entirely of nested pairs of parentheses
enclosing the logical subexpressions of the expression in question.
Hence, KIF expressions can be encoded directly as LISP data
structures. However, KIF does not require that expressions must be
encoded in this way.
Note. The expression syntax will be modified in some of the extensions.
<term> ::= <indvar> |<name>|(<term>+)|(term
<specialname>*)|(<term> <docstring>)
<eqsent> ::= (= <term> <term>)
<ineqsent> ::= (/= <term> <term>)
<atomsent> ::= (<term>+)|<eqsent>|<ineqsent>
<boolsent> ::= ({and|or} <sent>*) | ({implies|iff}
<sent><sent>)|(not <sent>)
<quantsent> ::= ({forall|exists} (<var>*) <sent>)
<sent> ::= <atomsent>|<boolsent>|<quantsent> |(<sent> <docstring>)
<docform> ::= (documentation [<word>] <docstring>)
-------
Notes and comments
This BNF is rather persnickety, but it has the merit of being
absolutely watertight in the way it handles whitespace and
lexicalization (unlike previous KIF BNFs, which all do some
handwaving.) Ive tried to organize it logically: all the
lexicalization and character-streaming issues, and the Unicode/ASCII
translations, are handled in parts 1 and 2; part 3 classifies lexical
items, and then all the actual syntax is kept in one place. Only part
4 should need to be altered in any extension.
I have deliberately kept the form of <words> as lexically
unrestricted as possible, to allow people to use things like URIs as
words. This requires being rather 'generous' with allowing characters
into words, but that is fine as long as we have some way to put them
into categories, and since we use ? and @ for variable markers, I
decided to try to keep this idea going by trying to do it all by the
choice of the first character.
I used a % prefix rather than the post/prefix dots to indicate
'external' words with a special non-KIF meaning, but if anyone
objects to this convention we could replace it with some other
convention, or omit it altogether from the core (though I think it
would be pity to omit it as it is quite harmless and may solve some
very big headaches for some users.)
Keeping < as a marking character is just a way to stop it being the
first item in any word, as that would break XML. (In XML one uses
< to escape the difficulty.)
The expression syntax is about as free as it can possibly be: there
is no distinction between function, constant or relation symbols, and
variables can appear anywhere. It would be easy to put those
classifications back in if people want them there.
The syntax includes a proposal which I think it would be potentially
useful and relatively harmless to include. This is what I was
talking about as 'wrappers' in the telecon.
The last clauses in <term> and <sent> allow comments to be 'wrapped'
around any subexpression of a KIF expression; the idea is that these
will be semantically invisible. So for example you could write
(forall (?x ?y)(iff ((R ?x ?y) "why is this called R? -Pat 3/18") (R ?y ?x)))
which would have the exact same meaning as what you would get by
stripping out the comment syntax:
(forall (?x ?y)(iff (R ?x ?y)(R ?y ?x)))
In the same spirit I've included documentation forms (not
'expressions' since they have no semantics) in the syntax, with or
without names. To 'attach' an documentary string to an axiom, just
write
( <the axiom>
"documentation: yadahyadah")
which is logically indistinguishable from writing the axiom by itself.
I have omitted restricted quantifier forms from this syntax, but it
would be easy to put them back in if people want them. I think they
are harmless and useful. Change <quantsent> to:
<quantsent> ::= ({forall|exists} ({<var>[<restriction>]}* <sent>*) <sent>)
We are going to need restrictions in the sorted extension in any case.
---------------------------------------------------------------------
IHMC (850)434 8903 home
40 South Alcaniz St. (850)202 4416 office
Pensacola, FL 32501 (850)202 4440 fax
phayes at ai.uwf.edu
http://www.coginst.uwf.edu/~phayes
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/enriched
Size: 11320 bytes
Desc: not available
Url : http://philebus.tamu.edu/pipermail/kif/attachments/20010628/efea82ff/attachment.bin
More information about the Kif
mailing list