3D body pose detection
The basis of the Advanced Occupant Monitoring System is the real-time detection of body pose in 3D. For all vehicle occupants captured by the cameras, the body pose is recognized as a 3D skeletal model using machine learning techniques. The resulting model of each captured occupant includes the positions of the eyes, head, neck, shoulders, elbows, wrists, torso, pelvis, and upper and lower legs, provided they are visible in the camera image. The detection does not require biometric data and is therefore particularly privacy-friendly.
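The per-occupant output described above can be pictured as a sparse set of named 3D joints. The following is a minimal sketch of such a data structure; the joint names and the `Skeleton3D` class are illustrative assumptions, not the system's actual interface.

```python
from dataclasses import dataclass, field
import numpy as np

# Illustrative joint set matching the body parts listed above.
JOINTS = ["eyes", "head", "neck", "left_shoulder", "right_shoulder",
          "left_elbow", "right_elbow", "left_wrist", "right_wrist",
          "torso", "pelvis", "left_knee", "right_knee",
          "left_ankle", "right_ankle"]

@dataclass
class Skeleton3D:
    # Only joints currently visible to a camera are present.
    joints: dict = field(default_factory=dict)  # name -> np.ndarray (x, y, z) in meters

    def visible(self):
        """Names of the joints detected in the current frame."""
        return sorted(self.joints)

# Example: only two joints are visible in this frame.
s = Skeleton3D({"head": np.array([0.0, 1.3, 0.0]),
                "left_wrist": np.array([0.3, 1.0, 0.5])})
print(s.visible())
```

Keeping the skeleton sparse reflects the statement that joints are reported only "provided they are visible in the camera image".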
The system can work with individual 3D cameras or with several 2D cameras whose perspectives are combined to reconstruct the 3D joint positions. The cameras can be mounted in any position as long as they have a sufficient view of the respective occupants.
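Reconstructing a 3D joint from several 2D views is classically done by triangulation. The sketch below shows the standard linear (DLT) method for two calibrated cameras; the toy projection matrices and point are invented for the example and do not describe the actual system's calibration.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one joint from two camera views.
    P1, P2: 3x4 projection matrices; x1, x2: 2D observations."""
    A = np.array([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]                    # null-space vector of A
    return X[:3] / X[3]           # homogeneous -> Euclidean

def project(P, X):
    """Project a 3D point into a camera's image plane."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

# Two toy cameras: one at the origin, one shifted 1 m along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])

X_true = np.array([0.2, 0.1, 2.0])   # a joint 2 m in front of the cameras
X_est = triangulate(P1, P2, project(P1, X_true), project(P2, X_true))
print(X_est)                          # recovers X_true up to numerical precision
```

With more than two cameras, the same least-squares system simply gains two rows per additional view, which is what makes flexible camera placement possible.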
Gesture recognition
The determined body pose skeleton also yields the positions of the eyes, elbows, and wrists. This makes it possible to interpret both the direction of the forearm and the direction of the eye-hand extension as pointing gestures. Both variants are available as 3D vectors, so pointing gestures can be mapped directly and with centimeter accuracy to known objects inside or outside the vehicle. Since pointing gesture detection is based on 3D body pose recognition, left arms can be distinguished from right arms, and pointing gestures are recognized anywhere in the interior, not just at conventional seating positions or in prescribed interaction areas.
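Both pointing variants reduce to a ray defined by two joints. The sketch below computes the forearm and eye-hand rays and maps a ray to the nearest known object by perpendicular distance; the joint coordinates, object positions, and the nearest-object heuristic are illustrative assumptions, not the system's actual mapping algorithm.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v)

def pointing_rays(eye, elbow, wrist):
    """Two candidate pointing directions from 3D skeleton joints:
    forearm direction (elbow->wrist) and eye-hand extension (eye->wrist)."""
    return unit(wrist - elbow), unit(wrist - eye)

def nearest_object(origin, direction, objects):
    """Map a pointing ray to the known object with the smallest
    perpendicular distance to the ray (objects: name -> 3D position)."""
    best, best_d = None, np.inf
    for name, pos in objects.items():
        v = pos - origin
        t = max(v @ direction, 0.0)           # only consider points in front
        d = np.linalg.norm(v - t * direction)
        if d < best_d:
            best, best_d = name, d
    return best

# Invented example joints (meters) and in-cabin object positions.
eye = np.array([0.0, 1.2, 0.0])
elbow = np.array([0.2, 0.9, 0.2])
wrist = np.array([0.3, 1.0, 0.5])
forearm_dir, eye_hand_dir = pointing_rays(eye, elbow, wrist)

objects = {"side window": np.array([1.5, 1.0, 0.5]),
           "infotainment display": np.array([0.6, 1.0, 1.2])}
print(nearest_object(wrist, eye_hand_dir, objects))
```

Because both rays are ordinary 3D vectors, the same mapping works for objects outside the vehicle once their positions are known in the vehicle coordinate frame.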
Gesture recognition for interaction with (partially) automated vehicles
As vehicles become increasingly automated, the driver and other occupants can be expected to be offered more freedom for secondary activities in the interior. Concept vehicles are already breaking with the classic seating arrangement and extending the driver's range of movement and action to the entire interior, for example by allowing seats to be turned and moved. Against the background of such freedom of movement, the question arises as to how interaction with services in the interior should be realized. Pointing gestures and voice input are obvious options, but they must offer both the necessary robustness and the freedom to allow occupants room to maneuver in the interior.
Fraunhofer IOSB's Advanced Occupant Monitoring System enables the capture of free-space gestures in 3D from all occupants.
Activity detection in the vehicle interior
In manual driving situations, the driver ideally focuses their full attention on the road: they drive and steer the car and do not engage in any secondary activity. As the level of vehicle automation increases, however, the driver is relieved of driving responsibility and given the freedom to pursue secondary activities. Partially automated vehicles must take this into account in situations where driving responsibility is to be handed back to the driver: the driver may be distracted, asleep, or even experiencing a medical emergency.
Fraunhofer IOSB's Advanced Occupant Monitoring System detects the activity of all occupants inside the vehicle. It is able to distinguish between up to 35 activities, including drinking, eating, sleeping, reading, making phone calls, and more. For this purpose, state-of-the-art machine learning methods fuse the 3D body skeleton recognition of the occupants with object detection and an intelligent analysis of the movement behavior of all detected persons. This makes it possible to reliably distinguish whether someone is reaching for a cell phone and making a call or opening a bottle and bringing it to their mouth. The Advanced Occupant Monitoring System thus provides important information on the driver's state of distraction. In addition, the system provides important information on the prevailing situation in the vehicle interior and the context of a person's actions. This makes it possible, for example, to distinguish unintentional from intentional pointing gestures or to offer innovative assistance functions tailored to the individual needs of the occupants.
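The fusion idea — skeleton geometry plus object detections disambiguating otherwise similar motions — can be illustrated with a deliberately simplified sketch. The hand-written rules, joint values, and distance threshold below are assumptions for illustration only; the actual system uses learned models, not rules.

```python
import numpy as np

def classify_activity(skeleton, held_object):
    """Toy fusion of skeleton geometry and an object-detector label.
    skeleton: dict of joint name -> 3D position (meters);
    held_object: detected object label near the hand, or None."""
    wrist_to_head = np.linalg.norm(skeleton["wrist"] - skeleton["head"])
    hand_raised = wrist_to_head < 0.25        # wrist close to the head (assumed threshold)
    # The same raised-hand motion means different activities depending
    # on the object in the hand -- this is the point of the fusion.
    if hand_raised and held_object == "bottle":
        return "drinking"
    if hand_raised and held_object == "phone":
        return "phone call"
    if held_object == "book":
        return "reading"
    return "unknown"

skeleton = {"head": np.array([0.0, 1.3, 0.0]),
            "wrist": np.array([0.1, 1.2, 0.1])}   # wrist about 0.17 m from head
print(classify_activity(skeleton, "bottle"))
```

The example makes the cell-phone-versus-bottle distinction from the text concrete: the arm trajectory alone is ambiguous, and only the detected object resolves it.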
Intention recognition of the driver and the vehicle occupants
Activity recognition forms the basis for predicting the intention of the driver or vehicle occupants. Because activity recognition tells us what the driver is currently doing and with which objects, the driver's next actions can be predicted or narrowed down.
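The narrowing-down step can be pictured as a mapping from the currently recognized activity to a set of plausible next actions. The transition table below is a hand-crafted assumption purely for illustration; a real system would learn such transitions from data.

```python
# Illustrative activity -> likely-next-actions table (assumed, not learned).
NEXT_ACTIONS = {
    "reaching for phone": ["phone call", "texting"],
    "opening bottle": ["drinking"],
    "sleeping": ["waking up"],
}

def predict_next(activity):
    """Narrow down plausible next actions from the current activity."""
    return NEXT_ACTIONS.get(activity, [])

print(predict_next("opening bottle"))
```

Even this trivial lookup shows the value of intention prediction: once "opening bottle" is recognized, the vehicle can anticipate "drinking" before it happens.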