Random and age agents for Vampire and iProver

Random agent for Vampire

To make a gym-saturation environment, we have to import the package

import gymnasium as gym

import gym_saturation

then we can make a prover environment as any other Gymnasium one

env = gym.make("Vampire-v0")

before using the environment, we should reset it

observation, info = env.reset()

gym-saturation environments don’t return any info

print(info)
{}

observation is a tuple of JSON representations of logic clauses

from pprint import pprint

pprint(observation)
({'birth_step': 0,
  'inference_parents': (),
  'inference_rule': 'input',
  'label': '1',
  'literals': 'mult(X0,mult(X1,X2)) = mult(mult(X0,X1),X2)',
  'role': 'lemma'},
 {'birth_step': 0,
  'inference_parents': (),
  'inference_rule': 'input',
  'label': '2',
  'literals': 'mult(e,X0) = X0',
  'role': 'lemma'},
 {'birth_step': 0,
  'inference_parents': (),
  'inference_rule': 'input',
  'label': '3',
  'literals': 'e = mult(inv(X0),X0)',
  'role': 'lemma'},
 {'birth_step': 0,
  'inference_parents': (),
  'inference_rule': 'input',
  'label': '4',
  'literals': 'a = mult(a,a)',
  'role': 'lemma'},
 {'birth_step': 0,
  'inference_parents': (),
  'inference_rule': 'input',
  'label': '5',
  'literals': 'e != a',
  'role': 'lemma'})

We can render the environment state in the TPTP format. By default, we are trying to prove a basic group theory lemma: every idempotent element equals the identity

env.render()
cnf(1, lemma, mult(X0,mult(X1,X2)) = mult(mult(X0,X1),X2), inference(input, [], [])).
cnf(2, lemma, mult(e,X0) = X0, inference(input, [], [])).
cnf(3, lemma, e = mult(inv(X0),X0), inference(input, [], [])).
cnf(4, lemma, a = mult(a,a), inference(input, [], [])).
cnf(5, lemma, e != a, inference(input, [], [])).

here is an example of an episode during which we play random actions. We set the random seed for reproducibility.

env.action_space.seed(0)
terminated, truncated = False, False
while not (terminated or truncated):
    action = env.action_space.sample()
    observation, reward, terminated, truncated, info = env.step(action)
env.close()

the episode terminated with positive reward

print(terminated, truncated, reward)
True False 1.0

It means we arrived at a contradiction ($false) which proves the lemma. Notice the birth_step number of a contradiction, it shows how many steps we did to find proof.

pprint(observation[-1])
{'birth_step': 1077,
 'inference_parents': ('17', '5'),
 'inference_rule': 'subsumption_resolution',
 'label': '18',
 'literals': '$false',
 'role': 'lemma'}

Age agent for iProver

We initialise iProver-based environment in the same way

env = gym.make("iProver-v0")

Instead of a random agent, let’s use Age agent which selects actions in the order they appear

observation, info = env.reset()
terminated, truncated = False, False
action = 0
while not (terminated or truncated):
    observation, reward, terminated, truncated, info = env.step(action)
    action += 1
env.close()
Loop <_UnixSelectorEventLoop running=False closed=True debug=False> that handles pid 7094 is closed

We still arrive at contradiction but it takes a different number of steps

print(terminated, truncated, reward)
pprint(observation[-1])
True False 1.0
{'birth_step': 1,
 'inference_parents': ('c_85', 'c_53'),
 'inference_rule': 'forward_subsumption_resolution',
 'label': 'c_86',
 'literals': '$false',
 'role': 'lemma'}

Total running time of the script: (0 minutes 0.825 seconds)

Gallery generated by Sphinx-Gallery