Cognitive Systems for Cognitive Assistants

The PlayMate: An Object Manipulation Scenario


Year 1

The emphasis for the first year was on integrating spatial understanding from vision with that from language. This required building systems that could maintain several models of the world and exchange information between them. We were able to demonstrate a system that could couple the learning of objects' names with the learning of their appearance, and could then answer questions about the names and relative locations of these objects. The system allows the human speaker to move objects in and out of the scene in real time.

Year 2

During the second year we worked on integrating vision and dialogue with planning and the learning of ontologies. This required a new kind of software architecture that maintains a distributed set of representations of the scene, of the intentions of the actors, and of the system's general knowledge. In addition to all the functionality of the first-year system, the resulting system is capable of learning the meanings of colour words and of descriptions of shape and size. It is also able to follow instructions to move objects. These can be specified in terms of spatial relations that are quite natural for humans, for example "Put the blue thing to the left of the red box".
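Grounding such an instruction requires mapping a qualitative relation like "left of" onto scene geometry. A minimal sketch of this idea follows; the function names, the toy scene, and the simple x-axis semantics are all assumptions for illustration, not the system's actual spatial model:

```python
# Illustrative sketch of grounding "Put the blue thing to the left of
# the red box". All names and the x-axis semantics are assumed for this
# example, not taken from the PlayMate implementation.

def left_of(target_pos, landmark_pos, margin=0.05):
    """True if target lies to the landmark's left (smaller x) by a margin."""
    return target_pos[0] < landmark_pos[0] - margin

def goal_position(landmark_pos, offset=0.15):
    """Propose a placement satisfying 'left of the landmark'."""
    x, y = landmark_pos
    return (x - offset, y)

# A toy scene: object name -> (x, y) position on the table.
scene = {"red box": (0.50, 0.30), "blue thing": (0.70, 0.30)}
target = goal_position(scene["red box"])
assert left_of(target, scene["red box"])
```

In a real system the relation would of course be learned or defined over richer perceptual features, but the same two steps apply: evaluate the relation, and propose a placement that satisfies it.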

Year 3

During the third year we worked on integrating manipulation with continual planning. This means that the robot can replan manipulations to achieve qualitative states (for example, put the red thing to the left of the blue thing) even when the human interferes with its activities. If the world turns out differently from what it expected, it replans on the fly. The third year also saw the integration of incremental processing of utterances with information from other modalities via binding. Finally, we used the PlayMate scenario to investigate trade-offs in the space of architectures.
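The replanning behaviour described above can be sketched as a simple act-monitor-replan loop. All names here (plan_for, execute, and so on) are illustrative assumptions; the real system uses a full continual planner rather than this toy one:

```python
# Minimal sketch of a continual-planning loop: act, monitor the world,
# and replan when it diverges from the goal (e.g. a human intervenes).

def plan_for(goal, state):
    """Toy planner: one 'move' step per object not yet in its goal slot."""
    return [("move", obj, slot) for obj, slot in goal.items()
            if state.get(obj) != slot]

def execute(step, state):
    _, obj, slot = step
    state[obj] = slot          # stands in for a real manipulation action

def continual_execution(goal, state, perturb=None):
    """Interleave acting with monitoring; replan on divergence."""
    plan = plan_for(goal, state)
    while plan:
        execute(plan.pop(0), state)
        if perturb:            # simulate a human interfering mid-execution
            perturb(state)
            perturb = None
        if not plan and any(state.get(o) != s for o, s in goal.items()):
            plan = plan_for(goal, state)   # world changed: replan on the fly
    return state

state = {"red": "on_table", "blue": "on_table"}
goal = {"red": "left_of_blue"}
def human(s):
    s["red"] = "on_table"      # the human moves the red object back
final = continual_execution(goal, dict(state), perturb=human)
assert final["red"] == "left_of_blue"
```

The point of the sketch is the control structure, not the planner: even when the human undoes the robot's first move, monitoring detects the divergence and a fresh plan re-achieves the qualitative goal.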

Year 4

During Year 4 we built on the planning and binding work from Year 3 to produce a robotic control architecture based on a novel fusion of these technologies. Using cross-modal binding as an information-exchange substrate, we gave the MAPSIM continual planner the role of a process mediator able to trigger actions across the whole architecture. The planner was fed with goals produced by a motivation subsystem that converts cross-modal intentional content into MAPL (the language of MAPSIM). With this framework the control flow in the PlayMate became considerably simpler and more principled, and the abstract interface provided by the planner in CAST made the system considerably more modular (and thus more easily ported to the Explorer). Using this approach we integrated an action recognition subsystem into the PlayMate demonstrator, allowing us both to recognise and to play simple games with coloured shapes (using the manipulation work from Year 3). We also built upon previous work on visual learning to provide a cleaner interface for implicit and explicit property learning, while using this as a platform for further exploring the ideas about clarification that we first investigated in Year 1. Within the planner itself, we began to treat separate CAST subarchitectures as separate agents in a multi-agent planning problem, broadening the scope of the internal processing behaviours that we could reason about.
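The control pattern above, in which a motivation subsystem produces goals and the planner mediates execution across subarchitectures, might be sketched as follows. Every class, action name, and the trivial one-action-per-goal "planner" are invented for illustration; the real system uses MAPL, MAPSIM, and CAST working memories rather than anything shown here:

```python
# Hedged sketch of planner-mediated control: motivation turns cross-modal
# content into goals; a planner maps goals to actions; each action is
# dispatched to whichever subarchitecture can execute it.

class Subarchitecture:
    def __init__(self, name, actions):
        self.name, self.actions = name, set(actions)
        self.log = []                      # record of executed actions
    def can(self, action):
        return action in self.actions
    def execute(self, action, args):
        self.log.append((action, args))

def motivation(binding_content):
    """Convert bound cross-modal content into abstract goals."""
    return [("achieve", fact) for fact in binding_content]

def plan(goals):
    """Toy stand-in for a planner: one action per goal."""
    mapping = {"object-named": "ask-for-name", "object-placed": "move-object"}
    return [(mapping[fact[0]], fact[1:]) for _, fact in goals]

def mediate(goals, subarchs):
    """The planner as process mediator: route each action to an executor."""
    for action, args in plan(goals):
        executor = next(s for s in subarchs if s.can(action))
        executor.execute(action, args)

dialogue = Subarchitecture("dialogue", {"ask-for-name"})
manip = Subarchitecture("manipulation", {"move-object"})
content = [("object-named", "obj1"), ("object-placed", "obj2", "left_of obj1")]
mediate(motivation(content), [dialogue, manip])
```

The design point the sketch tries to capture is the one made above: because subsystems interact only through goals and actions, the controller never hard-codes which subarchitecture does what, which is what makes the architecture modular and portable.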


Last modified: 8.1.2009 15:55:58