Environments

Saturation Environment

class gym_saturation.envs.saturation_env.SaturationEnv(max_clauses: int = 1000, render_mode: str = 'human')

Saturation algorithm in a reinforcement learning friendly way.

It’s an abstract class, so here we have only trivial smoke tests. One should override _do_deductions method in children classes.

Refer to SaturationEnv for more documentation.

>>> class DummyProver(SaturationEnv):
...     def _do_deductions(action):
...         pass
>>> env = DummyProver(render_mode="rgb_array")
Traceback (most recent call last):
 ...
ValueError: Expected a render mode among ['ansi', 'human'] but got rgb_a...
>>> env = DummyProver()
>>> env.render_mode = "rgb_array"
>>> env.render()
Traceback (most recent call last):
 ...
NotImplementedError
__init__(max_clauses: int = 1000, render_mode: str = 'human')

Initialise spaces et al.

Parameters:
  • max_clauses – maximal number of clauses to store in proof state

  • render_mode – a mode of running render method

get_task() str

Get the task that the agent is performing in the current environment.

Returns:

a TPTP problem filename

Raises:

ValueError – is task is not set

on_truncated() None

Prover-specific episode truncation.

render() None

Return or print the proof state.

Returns:

proof state (TPTP formatted) or nothing (depending on render_mode)

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None) Tuple[Tuple[Dict[str, Any], ...], Dict[str, Any]]

Reset the environment.

Parameters:
  • seed – seed for compatibility

  • options – options for compatibility

Returns:

observations and info

set_task(task: str) None

Set the specified task to the current environment.

Parameters:

task – a TPTP problem filename

step(action: Any) Tuple[Tuple[Dict[str, Any], ...], float, bool, bool, Dict[str, Any]]

Run one time-step of the environment’s dynamics.

When end of episode is reached, you are responsible for calling reset() to reset this environment’s state. Accepts an action and returns a tuple (observation, reward, terminated, truncated, info)

Parameters:

action – an action provided by the agent

Returns:

a tuple of four values:

  • observation: agent’s observation of the current environment

  • reward: amount of reward returned after previous action

  • terminated: Whether the proof was found

  • truncated: Whether the maximal number of clauses in the proof state were reached

  • info: contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)

Saturation Environment with Vampire back-end

class gym_saturation.envs.vampire_env.VampireEnv(max_clauses: int = 1000, render_mode: str = 'human', prover_binary_path: str = 'vampire')

An RL environment wrapper around Vampire prover.

Refer to SaturationEnv for more documentation.

We can run a full Gymnasium environment check:

>>> from gymnasium.utils.env_checker import check_env
>>> import gymnasium as gym
>>> env = gym.make("Vampire-v0").unwrapped
>>> check_env(env)
cnf(1, ...).
...
cnf(5, ...).

repeating actions change nothing

>>> env = gym.make("Vampire-v0", max_clauses=5)
>>> _ = env.reset()
>>> one = env.step(0)
>>> two = env.step(0)
>>> one == two
True

episode is truncated if we have more than max_clauses clauses

>>> _, _, _, truncated, _ = env.step(1)
>>> _, _, _, truncated, _ = env.step(2)
>>> truncated
True

sometimes Vampire can solve a problem during pre-processing

>>> from gym_saturation.constants import MOCK_TPTP_PROBLEM
>>> trivial_problem = os.path.join(os.path.dirname(MOCK_TPTP_PROBLEM),
...     "TST002-1.p")
>>> env.unwrapped.set_task(trivial_problem)
>>> _, _ = env.reset()
>>> env.unwrapped.state.terminated
True
>>> _, _, terminated, _, _ = env.step(0)
>>> terminated
True

a test of an unexpected reply from Vampire

>>> from gym_saturation.constants import MOCK_TPTP_FOLDER
>>> vampire_binary = os.path.join(MOCK_TPTP_FOLDER, "..", "vampire-mock")
>>> vampire_env = VampireEnv(prover_binary_path=vampire_binary)
>>> vampire_env.reset()
Traceback (most recent call last):
 ...
ValueError: ('Unexpected response type: ', 'who could expect that?')
__init__(max_clauses: int = 1000, render_mode: str = 'human', prover_binary_path: str = 'vampire')

Initialise a VampireWrapper.

Parameters:
  • max_clauses – maximal number of clauses in proof state

  • render_mode – a mode of running render method

  • prover_binary_path – a path to Vampire binary; by default we expect it to be in the $PATH

close() None

Terminate Vampire process.

on_truncated() None

Terminate Vampire process.

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None) Tuple[Tuple[Dict[str, Any], ...], Dict[str, Any]]

Reset the environment.

Parameters:
  • seed – seed for compatibility

  • options – options for compatibility

Returns:

observations and info

Saturation Environment with iProver back-end

class gym_saturation.envs.iprover_env.IProverEnv(max_clauses: int = 1000, render_mode: str = 'human', prover_binary_path: str = 'iproveropt')

An RL environment around iProver.

Refer to SaturationEnv for more documentation.

>>> from gymnasium.utils.env_checker import check_env
>>> import gymnasium as gym
>>> env = gym.make(
...     "iProver-v0",
...     max_clauses=5
... ).unwrapped
>>> env.relay_server
Traceback (most recent call last):
 ...
ValueError: run ``reset`` first!
>>> check_env(env)
cnf(c_53, ...).
...
cnf(c_49, ...).

when episode truncated, all threads are terminated

>>> _ = env.reset()
>>> _, _, _, truncated, _ = env.step(4)
>>> truncated
True
__init__(max_clauses: int = 1000, render_mode: str = 'human', prover_binary_path: str = 'iproveropt')

Initialise the environment.

Parameters:
  • max_clauses – maximal number of clauses in proof state

  • render_mode – a mode of running render method

  • prover_binary_path – a path to iProver binary; by default, we assume it to be iproveropt and in the $PATH

close() None

Stop relay server.

on_truncated() None

Terminate threads.

property relay_server: RelayServer

Return the relay server object.

Raises:

ValueError – if called before reset

reset(*, seed: int | None = None, options: Dict[str, Any] | None = None) Tuple[Tuple[Dict[str, Any], ...], Dict[str, Any]]

Reset the environment.

Parameters:
  • seed – seed for compatibility

  • options – options for compatibility

Returns:

observations and info