NesyLink Rewards

Rewards are Python objects that convert one environment transition into a scalar reward. Maps do not contain reward values. Tasks select a reward module and may provide default reward settings.

Built-in Rewards

| reward_id | Module | Intended use |
| --- | --- | --- |
| custom_reward | nesylink.rewards.custom_template | starter template |
| mathematical_logic/task_1 | nesylink.rewards.mathematical_logic.task_1 | mathematical logic key-door task |
| mathematical_logic/task_2 | nesylink.rewards.mathematical_logic.task_2 | mathematical logic monster/key/exit task |
| mathematical_logic/task_3 | nesylink.rewards.mathematical_logic.task_3 | mathematical logic multi-room return task |
| mathematical_logic/task_4 | nesylink.rewards.mathematical_logic.task_4 | mathematical logic bridge/equipment/guardian task |
| mathematical_logic/task_5 | nesylink.rewards.mathematical_logic.task_5 | mathematical logic multi-room exploration task |

BaseReward

BaseReward is the shared reward core that concrete rewards, such as the examples in this document, subclass.

Responsibilities:

Common signals:

BaseReward.compute_reward(...) multiplies each signal by its configured weight, then adds any extra_reward(...) returned by a subclass.
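
The weighted sum described above can be sketched in plain Python. This is a standalone illustration, not NesyLink's implementation: the function signature, the `extra` argument standing in for the value returned by extra_reward(...), and the choice to treat signals without a weight as contributing zero are all assumptions.

```python
def compute_reward(signals, weights, extra=0.0):
    """Weighted sum of signals plus an optional subclass bonus (hypothetical helper)."""
    # Assumption: a signal with no configured weight contributes nothing.
    weighted = sum(weights.get(name, 0.0) * value for name, value in signals.items())
    return weighted + extra


# One transition: a step was taken and a key was picked up.
signals = {"step": 1, "keys_delta": 1, "exit_reached": 0}
weights = {"step": -0.01, "keys_delta": 5.0, "exit_reached": 20.0}

# Sum of -0.01 (step) + 5.0 (key) + 0.0 (no exit) + 0.5 (extra bonus).
total = compute_reward(signals, weights, extra=0.5)
```

Keeping weights separate from signals is what lets reward_kwargs retune a reward without touching the signal logic.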

Reward Selection

Use a built-in reward:

from nesylink.env import make_env

env = make_env(map_id="mathematical_logic/task_1", reward_id="mathematical_logic/task_1")

Override weights:

env = make_env(
    map_id="mathematical_logic/task_1",
    reward_id="mathematical_logic/task_1",
    reward_kwargs={
        "step": -0.01,
        "keys_delta": 5.0,
        "door_opened": 3.0,
        "exit_reached": 20.0,
        "death": -10.0,
        "invalid_action": -0.05,
    },
)

Use a custom module:

env = make_env(
    map_id="dungeon",
    reward_module="experiments.rewards.my_reward",
    reward_kwargs={"step": -0.02},
)

Reward Module Contract

Each concrete reward module must expose a make_reward factory that accepts keyword arguments and returns the reward object:

def make_reward(**kwargs):
    ...

Typical custom reward:

from nesylink.rewards.base import BaseReward


class MyReward(BaseReward):
    reward_name = "my_reward"
    reward_weights = {
        "step": -0.01,
        "gold_delta": 1.0,
        "keys_delta": 5.0,
        "exit_reached": 50.0,
        "death": -20.0,
    }


def make_reward(**kwargs):
    return MyReward(**kwargs)
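
To show why the make_reward contract matters, here is a minimal sketch of how a loader could resolve a dotted module path and call the factory. The `load_reward` helper and the in-memory `demo_reward_module` are hypothetical and exist only for illustration; how NesyLink actually loads reward modules is not specified here.

```python
import importlib
import sys
import types


def load_reward(module_path, **reward_kwargs):
    """Import a module by dotted path and call its make_reward factory (hypothetical)."""
    module = importlib.import_module(module_path)
    factory = getattr(module, "make_reward", None)
    if not callable(factory):
        raise AttributeError(f"{module_path} does not expose make_reward(**kwargs)")
    return factory(**reward_kwargs)


# Register a throwaway in-memory module so the sketch runs without any files.
demo = types.ModuleType("demo_reward_module")
demo.make_reward = lambda **kwargs: {"reward_name": "demo", "weights": kwargs}
sys.modules["demo_reward_module"] = demo

# The keyword arguments play the role of reward_kwargs.
reward = load_reward("demo_reward_module", step=-0.02)
```

Because the loader only depends on the factory name, a custom module can live anywhere on the import path.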

Reward-driven Termination

A reward can terminate an episode by overriding check_termination(...):

from nesylink.rewards.base import BaseReward


class ExitReward(BaseReward):
    reward_name = "exit_reward"
    reward_weights = {"step": -0.01, "exit_reached": 20.0}

    def check_termination(self, signals, obs, info, action=None):
        if signals.get("exit_reached", 0) > 0:
            return True, "exit_reached"
        return False, None

The environment merges base termination, such as death or world completion, with reward-driven termination.
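
The merge can be pictured as combining two (terminated, reason) pairs. The sketch below is an assumption about the behavior, not the environment's code: `merge_termination` is a hypothetical helper, and giving the base reason priority when both sources fire is an illustrative choice.

```python
def merge_termination(base, reward_driven):
    """Combine two (terminated, reason) pairs into one (hypothetical helper)."""
    base_done, base_reason = base
    reward_done, reward_reason = reward_driven
    # Assumption: base termination (e.g. death) wins when both fire.
    if base_done:
        return True, base_reason
    if reward_done:
        return True, reward_reason
    return False, None


# The world is still running, but the reward reports that the exit was reached.
merged = merge_termination((False, None), (True, "exit_reached"))
```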

Inspecting Reward Metadata

Every step stores reward metadata in info["reward"]:

obs, reward, terminated, truncated, info = env.step(action)
print(info["reward"]["reward_name"])
print(info["reward"]["reward_signals"])
print(info["reward"]["reward_weights"])
print(info["reward"]["terminated"])
print(info["reward"]["terminated_reason"])

Inspect this metadata during training. It is the fastest way to tell whether a learning problem comes from missing events, weak weights, or an unreachable map objective.
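
One way to act on this is to tally which signals fired over an episode and flag weighted signals that never did. The sketch below assumes that reward_signals and reward_weights are dictionaries keyed by signal name; `audit_signals` and the sample data are hypothetical, and in practice the list would be filled from the info returned by env.step(...).

```python
from collections import Counter


def audit_signals(step_infos):
    """Count non-zero signals and list weighted signals that never fired."""
    fired = Counter()
    weighted = set()
    for info in step_infos:
        meta = info["reward"]
        weighted.update(meta["reward_weights"])
        for name, value in meta["reward_signals"].items():
            if value != 0:
                fired[name] += 1
    never_fired = sorted(weighted - set(fired))
    return fired, never_fired


# Two hand-written steps standing in for collected info dictionaries.
weights = {"step": -0.01, "keys_delta": 5.0, "exit_reached": 20.0}
infos = [
    {"reward": {"reward_signals": {"step": 1, "keys_delta": 0}, "reward_weights": weights}},
    {"reward": {"reward_signals": {"step": 1, "keys_delta": 1}, "reward_weights": weights}},
]
fired, never_fired = audit_signals(infos)
```

A signal that stays in never_fired across many episodes points to a missing event or an unreachable objective rather than a weight that is too small.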