SaturationEnv¶

SaturationEnv is an abstract class for environments guiding the choice of a given clause in the saturation algorithm used to build automated theorem provers. It has two subclasses:

VampireEnv is an environment for guiding the choice of a given clause in the saturation loop of a Vampire prover. Since we focus on guiding the saturation loop here, we don’t use the Avatar [3]
IProverEnv is an environment for guiding the choice of a given clause in the saturation loop of the iProver [1]

Action Space	`Discrete(n)`
Observation Space	`Sequence(Clause(n), stack=False)`
import	`import gym_saturation; gymnasium.make("Vampire-v0")`
import	`import gym_saturation; gymnasium.make("iProver-v0")`

Here Clause(n) is an alias for

Dict(
    'birth_step': Discrete(n),
    'inference_parents': Sequence(
        Text(1, 256, characters=ALPHANUMERIC_WITH_UNDERSCORE),
        stack=False
    ),
    'inference_rule': Text(1, 256, characters=ALPHANUMERIC_WITH_UNDERSCORE),
    'label': Text(1, 256, characters=ALPHANUMERIC_WITH_UNDERSCORE),
    'literals': Text(1, 4000, characters=EXTENDED_ALPHANUMERIC),
    'role': Text(1, 256, characters=ALPHANUMERIC_WITH_UNDERSCORE)
)

and EXTENDED_ALPHANUMERIC is ALPHANUMERIC_WITH_UNDERSCORE extended by nine special characters (), |~=!$. Such a structure corresponds to clauses (logical statements) in the TPTP language.

Description¶

The given clause (or saturation) algorithm is the basis of many contemporary provers. See [2] for an excellent introduction and Python code snippets. In short, the algorithm is the following one:

unprocessed_clauses: list[Clause] = get_preprocessed_theorem_statement()
processed_clauses: list[Clause] = []
while EMPTY_CLAUSE not in unprocessed_clauses and unprocessed_clauses:
    given_clause: Clause = select_given_clause(unprocessed_clauses)
    processed_clauses.append(given_clause)
    new_clauses: list[Clause] = apply_inference_rules(
        given_clause, processed_clauses
    )
    unprocessed_clauses.extend(new_clauses)
    unprocessed_clauses.remove(given_clause)

get_preprocessed_theorem_statement corresponds to the environment reset, and typically includes parsing, Skolemization, transformation to conjunctive normal form among other things.

apply_inference_rules is a ‘logic engine’ of a prover corresponding to what happens during the environment’s step. It usually includes resolution, superposition and other nasty stuff. To guarantee the algorithm applicability, the inference rules system must be refutation complete.

select_given_clause is a trainable agent policy.

If the EMPTY_CLAUSE (aka falsehood or contradiction) appears among the unprocessed_clauses, it means we arrived at a refutation of the original set of clauses, which gives us a proof by contradiction for our theorem. If the unprocessed_clauses becomes empty, it means we’ve built a counter-example for our proposition.

Action Space¶

Action is an index of a given clause. It belongs to a discrete space of size n. n is the maximal number of clauses in a proof state (unprocessed_clauses and processed_clauses together).

Observation Space¶

An observation is a tuple of strings, containing a clause literals in the TPTP syntax, e.g. 'mult(X, mult(Y, Z)) = mult(mult(X, Y), Z)' for each clause belonging to unprocessed_clauses and processed_clauses

Starting State¶

A starting state of the environment depends on a task set (a theorem to prove). If there are N unprocessed clauses in the pre-processed theorem statement, the starting state contains N strings.

By default, the task is a simple theorem from group theory:

cnf(associativity, axiom, mult(X, mult(Y, Z)) = mult(mult(X, Y), Z)).
cnf(left_identity, axiom, mult(e, X) = X).
cnf(left_inverse, axiom, mult(inv(X), X) = e).
cnf(idempotent_element, hypothesis, mult(a, a) = a).
cnf(negated_conjecture, negated_conjecture, ~ a = e).

One can set another task by specifying a filename of a respective TPTP problem:

env.set_task(filename)

Rewards¶

Reward is 1.0 after a step iff the saturation algorithm terminated at this step, and 0.0 otherwise.

Episode End¶

Termination means the saturation algorithm ended with refutation found or satisfiability established.
Truncation happens if the number of clauses in the state exceeds action_space.n.

Information¶

The environment returns no additional information.

Arguments¶

import gymnasium

gymnasium.make(
    "Vampire-v0",  # or "iProver-v0"
    max_clauses=1000,
    render_mode="human",
    prover_binary_path="vampire",  # or "iproveropt"
)

max_clauses=1000: the size n of the action space.

render_mode="human": either ansi (return the clauses from the current proof state in the TPTP format) or human (print the ansi rendering to the standard output)

prover_binary_path="vampire" (or "iproveropt"): the path to a prover binary (supposed to be on the $PATH by default)

References¶

Version History¶

v0: Initial version release