HERMES will fulfil two main objectives. On the one hand, the goal will be description, i.e. the generation of conceptual descriptions based on acquired and analysed motion patterns. On the other hand, the aim will be communication through visualization, i.e. the generation of synthetic motion patterns based on textual descriptions.
Firstly, natural language text generation will be accommodated within HERMES based on the following considerations:
- Semantic descriptions will enable researchers to check details of the conceptual knowledge base.
- Semantic descriptions will allow communication with end-users of HERMES in the most natural manner.
- Semantic descriptions will support conceptual abstraction, thereby facilitating the communication of short messages or essential details, possibly in response to inquiries made through a microphone near the recording camera or through a UMTS mobile phone, for example by blind people.
Descriptive texts will be generated for outdoor or indoor scenes from different parts of the EU. The inclusion of videos from different parts of Europe will also constitute a means to prevent over-adaptation of HERMES to a small set of learning videos. In addition, once a system-internal conceptual representation has been built, it will be possible to extend it to natural language text generation in the languages of all groups cooperating within HERMES. We will also test whether the same video recordings are interpreted differently in different parts of Europe (or whether similar situations simply unfold differently, for example people queuing up neatly at a bus station in one country and habitually clustering around the bus doors in another). Thus, on the one hand, HERMES will achieve automatic translation of visual information and, on the other hand, it will make it possible to investigate how and why human motion may produce different descriptions, owing to the cultural characteristics of the areas where a given language is spoken.
Secondly, animation will be accommodated within HERMES based on the following considerations:
- Analysis-by-synthesis at the three stages of human behaviour, i.e. the motion of people, the analysis of their posture, and the characterization of their faces.
- Animated computer graphics as a visual language to quickly communicate essential aspects to the people involved, such as bus drivers, policemen helping people at pedestrian crossings, or waiters in a lobby.
- Animated computer graphics, again for the three motion categories, for checking the conceptual knowledge base underlying the entire approach. Since this knowledge base is expected to grow or need adaptation throughout the project, animated computer graphics will provide the means to quickly check larger parts of it.
Using both natural language text generation and animation, quantitative measures and qualitative descriptions will be developed to analyze the robustness and the efficiency of the proposed cognitive system. In fact, the performance of the system will be studied by considering the following strategy [Arens and Nagel 2003], [Nagel 2004]: let the system generate a synthetic image sequence using the textual descriptions obtained from a previously recorded image sequence. The synthetic and original sequences can then be compared to evaluate the suitability and correctness of the knowledge considered so far.
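As an illustration only, one possible quantitative measure for this comparison is the mean per-frame distance between a trajectory extracted from the original sequence and the corresponding trajectory in the synthetic sequence. The sketch below is not taken from HERMES or from the cited works: the function name `mean_trajectory_error`, the representation of a trajectory as a list of 2D positions, and the sample data are all assumptions made for the example.

```python
import math


def mean_trajectory_error(original, synthetic):
    """Mean per-frame Euclidean distance between two 2D trajectories.

    Both arguments are lists of (x, y) positions, one per frame.
    This is a hypothetical measure chosen for illustration.
    """
    if len(original) != len(synthetic):
        raise ValueError("trajectories must have the same number of frames")
    if not original:
        raise ValueError("trajectories must not be empty")
    total = sum(math.dist(p, q) for p, q in zip(original, synthetic))
    return total / len(original)


# Hypothetical positions of one person over three frames.
original = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
synthetic = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]

error = mean_trajectory_error(original, synthetic)
print(f"mean trajectory error: {error:.2f}")
```

A small error would suggest that the textual description preserved the essential motion; a large one would point to knowledge that is missing or incorrect in the conceptual knowledge base. In practice such a numeric measure would complement, not replace, the qualitative comparison described above.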