Environments¶
Saturation Environment¶
- class gym_saturation.envs.saturation_env.SaturationEnv(max_clauses: int = 1000, render_mode: str = 'human')¶
Saturation algorithm in a reinforcement learning friendly way.
It’s an abstract class, so here we have only trivial smoke tests. One should override
_do_deductions
method in children classes.Refer to SaturationEnv for more documentation.
>>> class DummyProver(SaturationEnv): ... def _do_deductions(action): ... pass >>> env = DummyProver(render_mode="rgb_array") Traceback (most recent call last): ... ValueError: Expected a render mode among ['ansi', 'human'] but got rgb_a...
>>> env = DummyProver() >>> env.render_mode = "rgb_array" >>> env.render() Traceback (most recent call last): ... NotImplementedError
- __init__(max_clauses: int = 1000, render_mode: str = 'human')¶
Initialise spaces et al.
- Parameters:
max_clauses – maximal number of clauses to store in proof state
render_mode – a mode of running
render
method
- get_task() str ¶
Get the task that the agent is performing in the current environment.
- Returns:
a TPTP problem filename
- Raises:
ValueError – is task is not set
- on_truncated() None ¶
Prover-specific episode truncation.
- render() None ¶
Return or print the proof state.
- Returns:
proof state (TPTP formatted) or nothing (depending on
render_mode
)
- reset(*, seed: int | None = None, options: Dict[str, Any] | None = None) Tuple[Tuple[Dict[str, Any], ...], Dict[str, Any]] ¶
Reset the environment.
- Parameters:
seed – seed for compatibility
options – options for compatibility
- Returns:
observations and info
- set_task(task: str) None ¶
Set the specified task to the current environment.
- Parameters:
task – a TPTP problem filename
- step(action: Any) Tuple[Tuple[Dict[str, Any], ...], float, bool, bool, Dict[str, Any]] ¶
Run one time-step of the environment’s dynamics.
When end of episode is reached, you are responsible for calling
reset()
to reset this environment’s state. Accepts an action and returns a tuple (observation, reward, terminated, truncated, info)- Parameters:
action – an action provided by the agent
- Returns:
a tuple of four values:
observation: agent’s observation of the current environment
reward: amount of reward returned after previous action
terminated: Whether the proof was found
truncated: Whether the maximal number of clauses in the proof state were reached
info: contains auxiliary diagnostic information (helpful for debugging, and sometimes learning)
Saturation Environment with Vampire back-end¶
- class gym_saturation.envs.vampire_env.VampireEnv(max_clauses: int = 1000, render_mode: str = 'human', prover_binary_path: str = 'vampire')¶
An RL environment wrapper around Vampire prover.
Refer to SaturationEnv for more documentation.
We can run a full Gymnasium environment check:
>>> from gymnasium.utils.env_checker import check_env >>> import gymnasium as gym >>> env = gym.make("Vampire-v0").unwrapped >>> check_env(env) cnf(1, ...). ... cnf(5, ...).
repeating actions change nothing
>>> env = gym.make("Vampire-v0", max_clauses=5) >>> _ = env.reset() >>> one = env.step(0) >>> two = env.step(0) >>> one == two True
episode is truncated if we have more than
max_clauses
clauses>>> _, _, _, truncated, _ = env.step(1) >>> _, _, _, truncated, _ = env.step(2) >>> truncated True
sometimes Vampire can solve a problem during pre-processing
>>> from gym_saturation.constants import MOCK_TPTP_PROBLEM >>> trivial_problem = os.path.join(os.path.dirname(MOCK_TPTP_PROBLEM), ... "TST002-1.p") >>> env.unwrapped.set_task(trivial_problem) >>> _, _ = env.reset() >>> env.unwrapped.state.terminated True >>> _, _, terminated, _, _ = env.step(0) >>> terminated True
a test of an unexpected reply from Vampire
>>> from gym_saturation.constants import MOCK_TPTP_FOLDER >>> vampire_binary = os.path.join(MOCK_TPTP_FOLDER, "..", "vampire-mock") >>> vampire_env = VampireEnv(prover_binary_path=vampire_binary) >>> vampire_env.reset() Traceback (most recent call last): ... ValueError: ('Unexpected response type: ', 'who could expect that?')
- __init__(max_clauses: int = 1000, render_mode: str = 'human', prover_binary_path: str = 'vampire')¶
Initialise a VampireWrapper.
- Parameters:
max_clauses – maximal number of clauses in proof state
render_mode – a mode of running
render
methodprover_binary_path – a path to Vampire binary; by default we expect it to be in the $PATH
- close() None ¶
Terminate Vampire process.
- on_truncated() None ¶
Terminate Vampire process.
- reset(*, seed: int | None = None, options: Dict[str, Any] | None = None) Tuple[Tuple[Dict[str, Any], ...], Dict[str, Any]] ¶
Reset the environment.
- Parameters:
seed – seed for compatibility
options – options for compatibility
- Returns:
observations and info
Saturation Environment with iProver back-end¶
- class gym_saturation.envs.iprover_env.IProverEnv(max_clauses: int = 1000, render_mode: str = 'human', prover_binary_path: str = 'iproveropt')¶
An RL environment around iProver.
Refer to SaturationEnv for more documentation.
>>> from gymnasium.utils.env_checker import check_env >>> import gymnasium as gym >>> env = gym.make( ... "iProver-v0", ... max_clauses=5 ... ).unwrapped >>> env.relay_server Traceback (most recent call last): ... ValueError: run ``reset`` first! >>> check_env(env) cnf(c_53, ...). ... cnf(c_49, ...).
when episode truncated, all threads are terminated
>>> _ = env.reset() >>> _, _, _, truncated, _ = env.step(4) >>> truncated True
- __init__(max_clauses: int = 1000, render_mode: str = 'human', prover_binary_path: str = 'iproveropt')¶
Initialise the environment.
- Parameters:
max_clauses – maximal number of clauses in proof state
render_mode – a mode of running
render
methodprover_binary_path – a path to iProver binary; by default, we assume it to be
iproveropt
and in the $PATH
- close() None ¶
Stop relay server.
- on_truncated() None ¶
Terminate threads.
- property relay_server: RelayServer¶
Return the relay server object.
- Raises:
ValueError – if called before
reset
- reset(*, seed: int | None = None, options: Dict[str, Any] | None = None) Tuple[Tuple[Dict[str, Any], ...], Dict[str, Any]] ¶
Reset the environment.
- Parameters:
seed – seed for compatibility
options – options for compatibility
- Returns:
observations and info