Rewards are Python objects that convert one environment transition into a scalar reward. Maps do not contain reward values. Tasks select a reward module and may provide default reward settings.
| reward_id | Module | Intended use |
|---|---|---|
| `custom_reward` | `nesylink.rewards.custom_template` | starter template |
| `mathematical_logic/task_1` | `nesylink.rewards.mathematical_logic.task_1` | mathematical logic key-door task |
| `mathematical_logic/task_2` | `nesylink.rewards.mathematical_logic.task_2` | mathematical logic monster/key/exit task |
| `mathematical_logic/task_3` | `nesylink.rewards.mathematical_logic.task_3` | mathematical logic multi-room return task |
| `mathematical_logic/task_4` | `nesylink.rewards.mathematical_logic.task_4` | mathematical logic bridge/equipment/guardian task |
| `mathematical_logic/task_5` | `nesylink.rewards.mathematical_logic.task_5` | mathematical logic multi-room exploration task |
BaseReward is the unified reward core.
Responsibilities:
- keep `prev_obs` / `prev_info` from the previous step
- derive signals from `prev_obs` / `obs` / `prev_info` / `info` / `action`
- apply `reward_weights`
- call `extra_reward()`
- call `check_termination()`

Common signals:

- `step`
- `hp_delta`
- `hp_loss`
- `gold_delta`
- `keys_delta`
- `monster_hit`
- `monster_kill`
- `key_collected`
- `gold_collected`
- `item_collected`
- `agent_healed`
- `agent_damaged`
- `trap_triggered`
- `abyss_fall`
- `shield_block`
- `door_opened`
- `chest_opened`
- `chest_revealed`
- `button_pressed`
- `switch_activated`
- `bridge_rotated`
- `dynamic_object_state_changed`
- `talked_npc`
- `room_changed`
- `exit_reached`
- `environment_completed`
- `world_completed`
- `death`
- `invalid_action`
- `player_tile_changed`
- `monster_hp_total`
- `active_monsters`

`BaseReward.compute_reward(...)` multiplies each signal by its configured weight, then adds any `extra_reward(...)` returned by a subclass.
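
The weighted sum described above can be sketched in plain Python. This is an illustration only, not the actual `BaseReward` implementation: the helper name `weighted_reward` is hypothetical, and it assumes that signals without a configured weight contribute nothing.

```python
from typing import Mapping


def weighted_reward(
    signals: Mapping[str, float],
    weights: Mapping[str, float],
    extra: float = 0.0,
) -> float:
    """Sum weight * signal over the configured weights, then add a subclass bonus."""
    # Assumption: a signal that has no weight is ignored, and a weighted
    # signal that is missing from `signals` counts as 0.
    total = sum(weights[name] * signals.get(name, 0.0) for name in weights)
    return total + extra


signals = {"step": 1, "keys_delta": 1, "door_opened": 0, "monster_hit": 2}
weights = {"step": -0.01, "keys_delta": 5.0, "door_opened": 3.0}

# -0.01 (step) + 5.0 (key) + 0.0 (no door) + 0.5 (extra) = 5.49
reward = weighted_reward(signals, weights, extra=0.5)
print(round(reward, 2))
```

Here `monster_hit` fires twice but has no weight, so it does not change the result.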
Use a built-in reward:
```python
from nesylink.env import make_env

env = make_env(map_id="mathematical_logic/task_1", reward_id="mathematical_logic/task_1")
```
Override weights:
```python
env = make_env(
    map_id="mathematical_logic/task_1",
    reward_id="mathematical_logic/task_1",
    reward_kwargs={
        "step": -0.01,
        "keys_delta": 5.0,
        "door_opened": 3.0,
        "exit_reached": 20.0,
        "death": -10.0,
        "invalid_action": -0.05,
    },
)
```
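
One way to picture the override is as a key-by-key merge over the defaults. The sketch below assumes exactly that behaviour (overrides replace matching keys, untouched defaults survive); the names `task_defaults` and `resolve_weights` and the numeric values are hypothetical and do not come from nesylink.

```python
# Hypothetical defaults a task might ship with (illustrative values only).
task_defaults = {"step": -0.01, "keys_delta": 5.0, "exit_reached": 20.0}

# Overrides as they would be passed through reward_kwargs.
reward_kwargs = {"step": -0.02, "death": -10.0}


def resolve_weights(defaults, overrides):
    """Return a new dict in which overrides win over defaults, key by key."""
    merged = dict(defaults)   # copy so the defaults are never mutated
    merged.update(overrides)  # assumed semantics: plain dict update
    return merged


weights = resolve_weights(task_defaults, reward_kwargs)

# Report (old, new) for every key the override actually changed or added.
changed = {
    name: (task_defaults.get(name), value)
    for name, value in reward_kwargs.items()
    if task_defaults.get(name) != value
}
print(weights)
print(changed)
```

Printing the changed keys before a training run is a cheap check that an override landed on the intended signal name.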
Use a custom module:
```python
env = make_env(
    map_id="dungeon",
    reward_module="experiments.rewards.my_reward",
    reward_kwargs={"step": -0.02},
)
```
Each concrete reward module must expose:
```python
def make_reward(**kwargs):
    ...
```
Typical custom reward:
```python
from nesylink.rewards.base import BaseReward


class MyReward(BaseReward):
    reward_name = "my_reward"
    reward_weights = {
        "step": -0.01,
        "gold_delta": 1.0,
        "keys_delta": 5.0,
        "exit_reached": 50.0,
        "death": -20.0,
    }


def make_reward(**kwargs):
    return MyReward(**kwargs)
```
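
The `make_reward` contract makes it possible to load a reward from a dotted module path. The following is a minimal sketch of such a loader, assuming the path is resolved with `importlib`; the function `load_reward` and the in-memory demo module are hypothetical, and the real resolution logic in nesylink may differ.

```python
import importlib
import sys
import types


def load_reward(module_path, **reward_kwargs):
    """Import a module by dotted path and call its make_reward(**kwargs) factory."""
    module = importlib.import_module(module_path)
    factory = getattr(module, "make_reward", None)
    if not callable(factory):
        raise AttributeError(f"{module_path} does not define make_reward(**kwargs)")
    return factory(**reward_kwargs)


# Demo only: register a throwaway in-memory module so the sketch runs
# without any files on disk. A real module would return a BaseReward subclass.
demo = types.ModuleType("demo_reward_module")
demo.make_reward = lambda **kwargs: {"reward_name": "demo", "weights": kwargs}
sys.modules["demo_reward_module"] = demo

reward = load_reward("demo_reward_module", step=-0.02)
print(reward)
```

Failing early with a clear error when `make_reward` is missing is easier to debug than a failure deep inside the first environment step.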
A reward can terminate an episode by overriding check_termination(...):
```python
from nesylink.rewards.base import BaseReward


class ExitReward(BaseReward):
    reward_name = "exit_reward"
    reward_weights = {"step": -0.01, "exit_reached": 20.0}

    def check_termination(self, signals, obs, info, action=None):
        # End the episode as soon as the exit signal fires.
        if signals.get("exit_reached", 0) > 0:
            return True, "exit_reached"
        return False, None
```
The environment merges base termination, such as death or world completion, with reward-driven termination.
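
The merge can be pictured as a logical OR over the two sources. The sketch below is an assumption about how that could work, not the environment's actual code: the helper `merge_termination` is hypothetical, and letting the base reason take precedence when both sources fire is an arbitrary choice made for the example.

```python
from typing import Optional, Tuple


def merge_termination(
    base_terminated: bool,
    base_reason: Optional[str],
    reward_terminated: bool,
    reward_reason: Optional[str],
) -> Tuple[bool, Optional[str]]:
    """Terminate if either source says so and report a single reason."""
    if base_terminated:
        # Assumed precedence: base events such as death win over reward events.
        return True, base_reason
    if reward_terminated:
        return True, reward_reason
    return False, None


print(merge_termination(False, None, True, "exit_reached"))
print(merge_termination(True, "death", True, "exit_reached"))
print(merge_termination(False, None, False, None))
```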
Every step stores reward metadata in info["reward"]:
```python
obs, reward, terminated, truncated, info = env.step(action)

print(info["reward"]["reward_name"])
print(info["reward"]["reward_signals"])
print(info["reward"]["reward_weights"])
print(info["reward"]["terminated"])
print(info["reward"]["terminated_reason"])
```
Use this metadata during training. It is the fastest way to confirm whether a learning problem is caused by missing events, weak weights, or an unreachable map objective.
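
As a rough illustration of that kind of diagnosis, the sketch below totals the signals over one episode and lists those that never fired. It assumes `info["reward"]["reward_signals"]` is a dict mapping signal names to numbers; the helper `summarize_signals` and the hand-written `episode_infos` are hypothetical stand-ins for infos collected from `env.step`.

```python
from collections import Counter


def summarize_signals(infos):
    """Total each reward signal across the info dicts of one episode."""
    totals = Counter()
    for info in infos:
        for name, value in info["reward"]["reward_signals"].items():
            if value:
                totals[name] += value
    return totals


# Stand-in for the info dicts collected during a three-step episode.
episode_infos = [
    {"reward": {"reward_signals": {"step": 1, "keys_delta": 0, "exit_reached": 0}}},
    {"reward": {"reward_signals": {"step": 1, "keys_delta": 1, "exit_reached": 0}}},
    {"reward": {"reward_signals": {"step": 1, "keys_delta": 0, "exit_reached": 0}}},
]

totals = summarize_signals(episode_infos)

# Signals that were reported but never became non-zero in this episode.
known = episode_infos[0]["reward"]["reward_signals"]
never_fired = sorted(name for name in known if totals[name] == 0)

print(dict(totals))
print(never_fired)
```

A signal that stays at zero across many episodes points to a missing event or an unreachable objective, while a signal that fires but barely moves the return points to a weak weight.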