Environments¶

Saturation Environment¶

class gym_saturation.envs.saturation_env.SaturationEnv(max_clauses: int = 1000, render_mode: str = 'human')¶

Saturation algorithm in a reinforcement learning friendly way.

It’s an abstract class, so here we have only trivial smoke tests. One should override _do_deductions method in children classes.

Refer to SaturationEnv for more documentation.

>>> class DummyProver(SaturationEnv):
...     def _do_deductions(action):
...         pass
>>> env = DummyProver(render_mode="rgb_array")
Traceback (most recent call last):
 ...
ValueError: Expected a render mode among ['ansi', 'human'] but got rgb_a...

>>> env = DummyProver()
>>> env.render_mode = "rgb_array"
>>> env.render()
Traceback (most recent call last):
 ...
NotImplementedError

__init__(max_clauses: int = 1000, render_mode: str = 'human')¶

Initialise spaces et al.

Parameters:

max_clauses – maximal number of clauses to store in proof state
render_mode – a mode of running render method

get_task() → str¶

Get the task that the agent is performing in the current environment.

Returns:: a TPTP problem filename
Raises:: ValueError – is task is not set

on_truncated() → None¶: Prover-specific episode truncation.

render() → None¶

Return or print the proof state.

Returns:: proof state (TPTP formatted) or nothing (depending on render_mode)

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None) → Tuple[Tuple[Dict[str, Any], ...], Dict[str, Any]]¶

Reset the environment.

Parameters:

seed – seed for compatibility
options – options for compatibility

Returns:

observations and info

set_task(task: str) → None¶

Set the specified task to the current environment.

Parameters:: task – a TPTP problem filename

step(action: Any) → Tuple[Tuple[Dict[str, Any], ...], float, bool, bool, Dict[str, Any]]¶

Run one time-step of the environment’s dynamics.

When end of episode is reached, you are responsible for calling reset() to reset this environment’s state. Accepts an action and returns a tuple (observation, reward, terminated, truncated, info)

Parameters:

action – an action provided by the agent

Returns:

a tuple of four values:

observation: agent’s observation of the current environment
reward: amount of reward returned after previous action
terminated: Whether the proof was found
truncated: Whether the maximal number of clauses in the proof state were reached
info: contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

Saturation Environment with Vampire back-end¶

class gym_saturation.envs.vampire_env.VampireEnv(max_clauses: int = 1000, render_mode: str = 'human', prover_binary_path: str = 'vampire')¶

An RL environment wrapper around Vampire prover.

Refer to SaturationEnv for more documentation.

We can run a full Gymnasium environment check:

>>> from gymnasium.utils.env_checker import check_env
>>> import gymnasium as gym
>>> env = gym.make("Vampire-v0").unwrapped
>>> check_env(env)
cnf(1, ...).
...
cnf(5, ...).

repeating actions change nothing

>>> env = gym.make("Vampire-v0", max_clauses=5)
>>> _ = env.reset()
>>> one = env.step(0)
>>> two = env.step(0)
>>> one == two
True

episode is truncated if we have more than max_clauses clauses

>>> _, _, _, truncated, _ = env.step(1)
>>> _, _, _, truncated, _ = env.step(2)
>>> truncated
True

sometimes Vampire can solve a problem during pre-processing

>>> from gym_saturation.constants import MOCK_TPTP_PROBLEM
>>> trivial_problem = os.path.join(os.path.dirname(MOCK_TPTP_PROBLEM),
...     "TST002-1.p")
>>> env.unwrapped.set_task(trivial_problem)
>>> _, _ = env.reset()
>>> env.unwrapped.state.terminated
True
>>> _, _, terminated, _, _ = env.step(0)
>>> terminated
True

a test of an unexpected reply from Vampire

>>> from gym_saturation.constants import MOCK_TPTP_FOLDER
>>> vampire_binary = os.path.join(MOCK_TPTP_FOLDER, "..", "vampire-mock")
>>> vampire_env = VampireEnv(prover_binary_path=vampire_binary)
>>> vampire_env.reset()
Traceback (most recent call last):
 ...
ValueError: ('Unexpected response type: ', 'who could expect that?')

__init__(max_clauses: int = 1000, render_mode: str = 'human', prover_binary_path: str = 'vampire')¶

Initialise a VampireWrapper.

Parameters:

max_clauses – maximal number of clauses in proof state
render_mode – a mode of running render method
prover_binary_path – a path to Vampire binary; by default we expect it to be in the $PATH

close() → None¶: Terminate Vampire process.

on_truncated() → None¶: Terminate Vampire process.

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None) → Tuple[Tuple[Dict[str, Any], ...], Dict[str, Any]]¶

Reset the environment.

Parameters:

seed – seed for compatibility
options – options for compatibility

Returns:

observations and info

Saturation Environment with iProver back-end¶

class gym_saturation.envs.iprover_env.IProverEnv(max_clauses: int = 1000, render_mode: str = 'human', prover_binary_path: str = 'iproveropt')¶

An RL environment around iProver.

Refer to SaturationEnv for more documentation.

>>> from gymnasium.utils.env_checker import check_env
>>> import gymnasium as gym
>>> env = gym.make(
...     "iProver-v0",
...     max_clauses=5
... ).unwrapped
>>> env.relay_server
Traceback (most recent call last):
 ...
ValueError: run ``reset`` first!
>>> check_env(env)
cnf(c_53, ...).
...
cnf(c_49, ...).

when episode truncated, all threads are terminated

>>> _ = env.reset()
>>> _, _, _, truncated, _ = env.step(4)
>>> truncated
True

__init__(max_clauses: int = 1000, render_mode: str = 'human', prover_binary_path: str = 'iproveropt')¶

Initialise the environment.

Parameters:

max_clauses – maximal number of clauses in proof state
render_mode – a mode of running render method
prover_binary_path – a path to iProver binary; by default, we assume it to be iproveropt and in the $PATH

close() → None¶: Stop relay server.

on_truncated() → None¶: Terminate threads.

property relay_server: RelayServer¶

Return the relay server object.

Raises:: ValueError – if called before reset

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None) → Tuple[Tuple[Dict[str, Any], ...], Dict[str, Any]]¶

Reset the environment.

Parameters:

seed – seed for compatibility
options – options for compatibility

Returns:

observations and info