PyPES™ is the Python Platform for Experimentation with Semantics. At present, the term "platform" is perhaps an aspiration rather than an accomplishment, so let's call it "Python Procedures for Experimentation with Semantics" for now.

license

PyPES™ is free software. In a nutshell: You're free to use this software and redistribute modifications, as long as you advertise the fact that you're doing this and don't show your gratitude by suing any PyPES™ contributors. But note that the licensing terms, in their legal small print, supersede and prevail over this paragraph.

subversion access

The preferred way of obtaining PyPES™ is via subversion. For anonymous access via HTTP, do something along the following lines.

$ svn co http://svn.semantilog.org/PyPES/trunk PyPES
[...]
 U   PyPES
Checked out revision 543.
$

This will get you the newest "trunk" development version. If you would like a tagged version instead, do this:

$ svn co http://svn.semantilog.org/PyPES/tags/0.99.2 PyPES-0.99.2

download

documentation

Some documentation is available as well. (It's not a lot yet, but it will hopefully expand gradually.)

mailing list


After hitting the "subscribe" button above, you will receive an e-mail containing instructions on how to proceed to membership validation.

about PyPES™

mrs input

Currently, the library is developed and routinely tested on only one source of semantic representations: the ERG (English Resource Grammar), a grammar of English based on the HPSG linguistic theory. It is a computational grammar within the DELPH-IN software toolkit for grammar engineering and parsing.

In principle, however, PyPES™ should straightforwardly apply to other DELPH-IN grammars, or any other toolkit producing compatible MRS structures. MRS is an algebra governing the grammatical composition of semantic representations, as well as a representation language for the resulting semantic structures.

That is a lot of acronyms, considering the length of the above paragraphs. The truth is: if these terms aren't familiar to you already, this page won't be of any use to you. But please do get in touch if you want to find out how PyPES™ can be useful to you.

protoforms™: the semantics of the semantics

PyPES™ can read an MRS structure and convert it into a protoform™, and then execute the following operations on it:

  • input and output in various plaintext and XML formats;
  • managing identities of variables and handles, automating (re)naming, preventing accidental capturing, etc.;
  • comparisons for symbolic and isomorphic subsumption;
  • efficient solving and selective exploration of scope underspecification ambiguities (basically reimplementing Utool functionality);
  • ambiguity-invariant scope resolution based on syllogistic forms and reification yielding collective readings (every man loves a woman = there is some loving going on, with all men collectively forming the subject and a woman forming the object of that sentence);
  • cross-referencing the grammar's word-prime semantics with information in the grammar's SEM-I;
  • translation of the grammar's word-prime semantics for logical connectives, quantifiers and other higher-order constructions, as well as closed-class predications like equality into first-order logic.
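
To make the notion of scope underspecification concrete, here is a minimal toy sketch in plain Python. It does not use the PyPES™ API, and it is not how Utool or PyPES™ solve scope constraints; the tuple representation and the function name `readings` are made up for this illustration. It simply enumerates the linear orderings of the quantifiers in "every man loves a woman", which is adequate only for simple sentences without further scope constraints.

```python
from itertools import permutations

# Hypothetical toy representation (not a PyPES data structure):
# each quantifier is (name, variable, restriction).
QUANTIFIERS = [("every", "x", "man(x)"), ("some", "y", "woman(y)")]
BODY = "love(x, y)"

def readings(quantifiers, body):
    """Return one formula string per linear ordering of the quantifiers."""
    results = []
    for order in permutations(quantifiers):
        formula = body
        # Wrap the body from the innermost quantifier outwards.
        for name, var, restriction in reversed(order):
            formula = f"{name} {var} [{restriction}] ({formula})"
        results.append(formula)
    return results

for reading in readings(QUANTIFIERS, BODY):
    print(reading)
```

The two printed formulae correspond to the two classical readings: one where each man may love a different woman, and one where a single woman is loved by every man.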

McPIET™

PyPES™ is primarily a toolkit for experimenting with textual inference or pseudo-inference based on deep semantics. Although the above operations may well be useful for a broad range of semantically-motivated NLP solutions, their primary function within PyPES™ is to support the application of logic to natural language semantics.

PyPES™ contains the Monte Carlo Pseudo Inference Engine for Text (McPIET™). Let's say you have the correct MRSes for the following sentences:

(p1)   Socrates is a man.
(p2)   Every man is mortal.
(q)   Socrates is mortal.
(q')   Socrates is not mortal.
(r)   Socrates will die one day.

PyPES™ transforms them into protoforms™, and can compose them into a single protoform™ representing a compound proposition, e.g. p1 ∧ p2 → q.

McPIET™ can work out that this propositional formula p1 ∧ p2 → q is a tautology (standard of proof 1.0), while the formulae p1 ∧ p2 → ¬ q and p1 ∧ p2 → q' are contradictions (standard of proof 0.0).

The formula p1 ∧ p2 → r is a logical contingency. McPIET™ will then return a standard of proof between 0.0 and 1.0, exclusive. You are now in a situation where you are missing knowledge (Every man who is mortal will die one day). But rather than insist on completeness and consistency, McPIET™ will allow you to use such a numeric standard of proof to draw comparisons.
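
This page does not spell out how McPIET™ arrives at a standard of proof. One plausible way to read the idea, sketched below purely as an assumption, is to sample random models over a small domain and report the fraction of them that satisfy the formula. The domain, the predicate names and the function `estimate` are all hypothetical and hand-coded here; nothing in this sketch is the McPIET™ API or its actual algorithm.

```python
import random

DOMAIN = ["socrates", "plato", "fido"]   # hypothetical toy domain
PREDICATES = ("man", "mortal", "will_die")

def random_model(rng):
    """Assign each unary predicate a random subset of the domain."""
    return {pred: {d for d in DOMAIN if rng.random() < 0.5}
            for pred in PREDICATES}

# Hand-coded truth conditions for the example sentences.
def p1(m): return "socrates" in m["man"]        # Socrates is a man.
def p2(m): return m["man"] <= m["mortal"]       # Every man is mortal.
def q(m):  return "socrates" in m["mortal"]     # Socrates is mortal.
def r(m):  return "socrates" in m["will_die"]   # Socrates will die one day.

def implies(a, b):
    return (not a) or b

def estimate(formula, samples=2000, seed=0):
    """Fraction of sampled models in which the formula is true."""
    rng = random.Random(seed)
    hits = sum(formula(random_model(rng)) for _ in range(samples))
    return hits / samples

valid = estimate(lambda m: implies(p1(m) and p2(m), q(m)))
contingent = estimate(lambda m: implies(p1(m) and p2(m), r(m)))
print(valid)       # 1.0, because no model can falsify a tautology
print(contingent)  # some value strictly between 0 and 1
```

Under this reading, adding the missing premise (every man who is mortal will die one day) to the antecedent would push the second estimate up to 1.0, which is the kind of comparison a numeric score makes possible.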

PyPES™ tools for inference data and evaluation

You may well find PyPES™ useful as an experimental framework for working with inference datasets, even if you will not be using McPIET™ or any of the PyPES™ functionality for working with semantic representations. In particular, the datasets supported are

Given these datasets, PyPES™ supports a number of useful operations on them:

  • "cosmetic" preprocessing such as fixing punctuation;
  • conversion of the various datasets into a common XML format for the logical and textual data as well as annotation;
  • production of lists of unique textual items, removing duplicates;
  • running preprocessing infrastructure (such as a parser) on the text items offline and storage of results for later use in inferencing;
  • running inference engines, and recording results in annotation files;
  • determining entailment decisions from system outputs by thresholding;
  • comparing annotations with entailment decisions (notably the gold standard vs. the output of various systems to be compared against each other);
  • producing output files from such statistics in CSV format at various levels of aggregation for visualization and further analysis in external tools.
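
The thresholding, comparison and CSV steps in the list above can be sketched as follows. This is a minimal illustration using only the Python standard library; the item identifiers, scores, gold labels and function names are invented for the example and do not reflect the PyPES™ file formats or API.

```python
import csv
import io

# Hypothetical data: item id -> system score in [0, 1], plus gold labels.
scores = {"pair-1": 0.92, "pair-2": 0.40, "pair-3": 0.75, "pair-4": 0.10}
gold = {"pair-1": True, "pair-2": False, "pair-3": False, "pair-4": False}

def decide(scores, threshold):
    """Turn numeric scores into yes/no entailment decisions."""
    return {item: score >= threshold for item, score in scores.items()}

def accuracy(decisions, gold):
    """Fraction of items on which the decisions agree with the gold labels."""
    correct = sum(decisions[item] == gold[item] for item in gold)
    return correct / len(gold)

def report(scores, gold, thresholds):
    """Build CSV text with one row per threshold."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["threshold", "accuracy"])
    for t in thresholds:
        writer.writerow([t, accuracy(decide(scores, t), gold)])
    return buffer.getvalue()

print(report(scores, gold, [0.5, 0.8]))
```

With this toy data, raising the threshold from 0.5 to 0.8 removes the one false positive (pair-3), so accuracy rises from 0.75 to 1.0.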

PyPES™/McPIET™ vs. Boxer vs. Glue Semantics

With the scoping machinery and the first-order approximation in place, PyPES™ makes it possible to translate text into formulae of FOPC. This is what Boxer does for CCG and what glue semantics does for LFG. (I'm not sure about the availability of a software implementation for the latter.)

However, the devil is in the detail as usual. Boxer uses DRT as an intermediate representation, and glue semantics uses linear logic as a metalogic to describe the relationship between f-structures and FOPC formulae. Thus, the three implementations take very different approaches to such a translation of text into FOPC.

IMHO, the main problem with Boxer and glue semantics is their strong commitment throughout to classical bivalent logic, which is limited in its ability to represent natural language semantics. FOPC lacks some kinds of expressive power that are important for natural language, such as quantifiers like "most" as well as weakening and strengthening modifiers like "very". The straightforward logical encoding used by Boxer and glue semantics leads to overcommitment in some places, for example forcing strictly recursive quantifier scopings when little or nothing is known about the scopings from the natural language input. PyPES™, on the other hand, is heavily inspired by slacker semantics in trying to avoid overcommitment.

McPIET™, instead of strictly assuming first-order theorem proving, allows any kind of logic with a language and model theory that vaguely resembles a predicate calculus. If you can write down a model-theoretic definition assigning truth values to quantifications involving "most" or modifications involving "very", these definitions can be straightforwardly incorporated into McPIET™. This is not possible for Boxer or glue semantics.
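
As an illustration of what such model-theoretic definitions might look like, here is a small sketch. The "more than half" reading of "most" and the squaring of a fuzzy truth degree for "very" are common textbook choices, assumed here for the example; they are not taken from McPIET™, and the function names and sets are hypothetical.

```python
def most(restriction, scope):
    """True iff more members of the restriction are in the scope than not."""
    restriction, scope = set(restriction), set(scope)
    return len(restriction & scope) > len(restriction - scope)

def very(degree):
    """Strengthen a fuzzy truth degree in [0, 1] by squaring it."""
    return degree ** 2

# Hypothetical toy model.
men = {"socrates", "plato", "aristotle"}
bearded = {"socrates", "plato"}

print(most(men, bearded))  # True: two of the three men are bearded
print(very(0.5))           # 0.25
```

Neither definition is expressible as a plain first-order formula over the same vocabulary, but both are trivial to evaluate against a given model.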

McPIET™ also brings a modern edge to inferencing in making heavy use of soft computing techniques. These allow robust approximate inferencing in the face of missing real-world and background knowledge where classical theorem proving fails due to the limitations of the classical notions of validity and unsatisfiability for complete and consistent logical theories.

(c) Copyright 2007 -- 2009