Philosophy and AI Seminar
We will have the pleasure of hearing: Mehdi Khamassi (ISIR, CNRS)
Title: Improving AI systems' alignment with human values through hierarchical motivational reinforcement learning
Abstract: I'll first present our new proposal to distinguish between strong and weak alignment of AI systems with human values. To illustrate this distinction, we designed a series of prompts showing ChatGPT's, Gemini's and Copilot's failures to recognize situations where human values such as dignity, privacy or well-being are at stake. I'll then briefly introduce a novel extension of the reinforcement learning (RL) framework that we proposed, called the 'Purpose' framework, based on a three-level motivational system (operational level, motivational level, purpose level) for open-ended learning agents. Extending the motivational reinforcement learning formalism, it is intended to relate the purpose level (rules, conventions and norms at the societal level, as well as missions assigned to artificial agents by humans) to the motivational level (so as to modulate the agents' homeostatic, epistemic, social and mission drives), which in turn determines the multidimensional reward function used by the RL agents at the operational level.
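The hierarchy described in the abstract can be sketched in code. This is only an illustrative toy, not the authors' implementation: all class names, attributes and numeric values below are assumptions made for the sake of the example. It shows the general idea of a purpose level modulating drive gains, which then weight a multidimensional reward at the operational level.

```python
# Toy sketch of a three-level motivational hierarchy (purpose -> drives -> reward).
# All names and values here are illustrative assumptions, not the 'Purpose'
# framework's actual API.

class PurposeLevel:
    """Societal rules/norms and human-assigned missions (illustrative)."""
    def __init__(self, norm_weights):
        # How strongly each drive should be emphasized, given norms/missions.
        self.norm_weights = norm_weights

    def modulate(self, drive_gains):
        # The purpose level rescales the lower-level drive gains.
        return {d: g * self.norm_weights.get(d, 1.0)
                for d, g in drive_gains.items()}

class MotivationalLevel:
    """Homeostatic, epistemic, social and mission drives (illustrative)."""
    def __init__(self):
        self.drive_gains = {"homeostatic": 1.0, "epistemic": 1.0,
                            "social": 1.0, "mission": 1.0}

    def reward_weights(self, purpose):
        return purpose.modulate(self.drive_gains)

def multidimensional_reward(weights, drive_signals):
    """Operational-level reward: weighted sum over drive dimensions."""
    return sum(weights[d] * drive_signals[d] for d in drive_signals)

# Usage: a mission assigned by a human doubles the weight of the mission
# drive, which reshapes the scalar reward seen by the operational-level agent.
purpose = PurposeLevel({"mission": 2.0})
motivation = MotivationalLevel()
weights = motivation.reward_weights(purpose)
reward = multidimensional_reward(weights,
                                 {"homeostatic": 0.1, "epistemic": 0.3,
                                  "social": 0.0, "mission": 0.5})
print(round(reward, 2))  # 0.1 + 0.3 + 0.0 + 2.0*0.5 = 1.4
```

The design choice illustrated here is that the purpose level never acts on the environment directly: it only biases the motivational level, whose drive gains in turn define the reward the operational-level RL agent optimizes.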