Unobtrusive tracking of interpersonal orienting and distance predicts the subjective quality of social interactions

Interpersonal coordination of behaviour is essential for smooth social interactions. Measures of interpersonal behaviour, however, often rely on subjective evaluations, invasive measurement techniques or gross measures of motion. Here, we constructed an unobtrusive motion tracking system that enables detailed analysis of behaviour at the individual and interpersonal levels, which we validated using wearable sensors. We evaluate dyadic measures of joint orienting and distancing, synchrony and gaze behaviours to summarize data collected during natural conversation and joint action tasks. Our results demonstrate that patterns of proxemic behaviours, rather than more widely used measures of interpersonal synchrony, best predicted the subjective quality of the interactions. Increased distance between participants predicted lower enjoyment, while increased joint orienting towards each other during cooperation correlated with increased effort reported by the participants. Importantly, interpersonal distance was most informative about the quality of the interaction when task demands and experimental control were minimal. These results suggest that interpersonal measures of behaviour gathered during minimally constrained social interactions are particularly sensitive to the subjective quality of social interactions and may be useful for interaction-based phenotyping in further studies.

To validate the performance of the Kinects against more traditional wearable sensors, we compared the face orientation data from the head IMUs and Kinects (Supplementary Figure 1C and D). Because the reference frames of the IMU angles (magnetic field of the Earth) and the Kinects (physical orientation of Kinect #2) differ, we focus here only on the correlation of the angle timecourses rather than the direct difference between the angles. The correspondence of the orientations was very high for almost all participants recorded by Kinect #1 (r-values approximately 0.8-0.9). Correlations were reduced for most participants recorded by Kinect #2 (r-values approximately 0.4-0.7), and for a few participants the orientation correspondence was relatively poor (r~0.1-0.3), reducing the overall similarity of the data (r=0.61±0.23 over all participants). The replicability of the acceleration timecourses was lower (r=0.37±0.14), probably due to the short durations of the event-like accelerations compared with the smooth and continuous orientation and location changes, and to the differences between the sensor types in sampling rate and sensitivity to small, short accelerations.
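The cross-sensor comparison above can be sketched as follows. Because the two sensors use different reference frames and sampling rates, the signals are put on a common timebase by interpolation and only their correlation is computed, not the absolute angle difference. The signal names, sampling rates and the sinusoidal head turn below are hypothetical illustrations, not the actual recorded data:

```python
import numpy as np

def orientation_correlation(angles_a, t_a, angles_b, t_b):
    """Pearson correlation of two face-orientation (yaw) timecourses.

    The IMU and Kinect report angles in different reference frames,
    so only the correlation of the timecourses is compared.  The
    second signal is linearly interpolated onto the timestamps of
    the first before correlating.
    """
    b_resampled = np.interp(t_a, t_b, angles_b)
    a = angles_a - angles_a.mean()
    b = b_resampled - b_resampled.mean()
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical example: the same head turn seen by both sensors,
# offset by a constant reference-frame difference
t_imu = np.linspace(0, 10, 1000)       # IMU at ~100 Hz
t_kinect = np.linspace(0, 10, 300)     # Kinect at ~30 Hz
yaw_imu = 20 * np.sin(0.5 * t_imu)
yaw_kinect = 20 * np.sin(0.5 * t_kinect) + 90  # shifted reference frame

r = orientation_correlation(yaw_kinect, t_kinect, yaw_imu, t_imu)
```

Mean-centering removes the constant reference-frame offset, so identical motion yields a correlation near 1 regardless of how the sensors are oriented.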

Classification of gaze behaviour based on face orientation and location
The accuracy of the classification of gaze behaviours based on the head location and orientation data depended on the features included in the classifier. As seen in the confusion matrices (Supplementary Figure 2A), when only the yaw of the face was used, only eye contact was predicted with high accuracy, while the other classes were confused with eye contact and with each other. Similarly, across participants, the prevalence of gaze behaviours in individual subjects was well predicted from yaw information alone only for eye contact.
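The pattern described above can be illustrated with a toy classifier. This is not the classifier used in the study: the class labels, yaw distributions and the nearest-centroid rule are all assumptions chosen to show why a single yaw feature separates eye contact well while confusing gaze targets that lie at similar yaw angles:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical yaw samples (degrees) for three gaze targets:
# 0 = eye contact, 1 = first off-partner target, 2 = second target.
# Eye contact sits at a distinct yaw; the other two overlap.
yaw = {0: rng.normal(0, 5, 200),
       1: rng.normal(35, 10, 200),
       2: rng.normal(45, 10, 200)}

# Nearest-centroid classifier using yaw as the only feature
centroids = {c: v.mean() for c, v in yaw.items()}

def classify(sample):
    return min(centroids, key=lambda c: abs(sample - centroids[c]))

# Confusion matrix: rows = true class, columns = predicted class
conf = np.zeros((3, 3), dtype=int)
for true_class, samples in yaw.items():
    for s in samples:
        conf[true_class, classify(s)] += 1

accuracy_per_class = conf.diagonal() / conf.sum(axis=1)
```

With these assumed distributions, eye contact is classified almost perfectly while the overlapping classes are frequently swapped, mirroring the confusion-matrix pattern reported for the yaw-only classifier.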

Relation of face and body orienting
Both face and body orientation showed significant correlations between effort and joint orienting behaviour (Supplementary Figure 3). Faces were oriented more directly toward the communication partner, as evidenced by the effect being closer to the origin (bottom-left corner, corresponding to direct joint face-to-face orienting) in the bottom panel than in the top panel. This reduced orienting of the body is particularly evident for Participant #1 of the dyads, presumably due to the way the participants were seated in the room. Generally, body and face orientations were correlated for all participants (Supplementary Figure 4), but the range of body angles was more limited than that of face angles, particularly for Participant #1 of each dyad (displayed in green in Supplementary Figure 4).

Supplementary discussion
Limitations
Unexpectedly, we saw few differences in interpersonal synchrony between conditions or as a function of behavioural ratings compared with the effects in the proxemic measures. The experimental conditions, particularly during the gameplay trials, introduce anticorrelations (or delayed synchrony) due to the turn-taking behaviour, which could mask some of the differential synchrony effects across dyads. Methodologically, calculating the windowed cross-correlations requires some critical choices to be made, such as the sliding window size within which the cross-correlations are calculated and the maximum time delay at which synchrony is estimated, which might differ between conditions if a particular turn-taking structure is enforced by the task. Moreover, the method produces a 2-dimensional matrix of correlation values that needs to be summarized for easier interpretation. Various measures could be used to find a representative value for the synchrony over the trials or the entire experiment. Previously, peak-picking algorithms have been suggested [1] to find the most appropriate synchrony values at a near-zero lag. However, the assumption of near-zero lag could be violated by the gameplay task, where the turn-taking introduces non-zero lags that depend on the pace of the gameplay. We did see (near) zero-lag synchrony in all conditions, but the synchrony was lower during gameplay, where we also saw synchrony peaks at non-zero lags. Because the pace of gameplay could differ between dyads, these additional peaks might be misaligned between dyads, and this difference in the pace of turn-taking might itself be of interest in some situations. In the future, it may be of interest to characterize which particular types of (delayed) synchrony and experimental conditions could reveal differences related to subjective evaluations of interactions.
However, exploring the most sensitive time delays, time windows and methods for summarizing the data is beyond the scope of the current study.
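The analysis choices discussed above can be made concrete with a minimal windowed cross-correlation sketch. The window length, maximum lag, step size and the simulated 10-sample follower delay are all illustrative assumptions, not the values used in the study:

```python
import numpy as np

def windowed_xcorr(x, y, win=100, max_lag=30, step=25):
    """Windowed cross-correlation of two movement timecourses.

    Returns a (n_windows x n_lags) matrix of Pearson correlations
    plus the lag axis.  Window length, maximum lag and step size are
    the analyst choices discussed in the text, and the matrix still
    has to be summarized (e.g. by peak-picking) before comparison.
    """
    starts = np.arange(max_lag, len(x) - win - max_lag, step)
    lags = np.arange(-max_lag, max_lag + 1)
    out = np.full((len(starts), len(lags)), np.nan)
    for i, s in enumerate(starts):
        xw = x[s:s + win] - x[s:s + win].mean()
        for j, lag in enumerate(lags):
            yw = y[s + lag:s + lag + win]
            yw = yw - yw.mean()
            denom = np.linalg.norm(xw) * np.linalg.norm(yw)
            if denom > 0:
                out[i, j] = np.dot(xw, yw) / denom
    return out, lags

# Hypothetical dyad: participant B follows participant A with a fixed
# delay of 10 samples, so the synchrony peak appears at a non-zero lag
rng = np.random.default_rng(1)
a = np.convolve(rng.standard_normal(2000), np.ones(20) / 20, mode="same")
b = np.roll(a, 10) + 0.1 * rng.standard_normal(2000)

corr, lags = windowed_xcorr(a, b)
peak_lag = lags[np.nanmean(corr, axis=0).argmax()]
```

Averaging the matrix over windows and locating its peak recovers the simulated delay; a near-zero-lag summary would underestimate the synchrony of such a turn-taking dyad.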
In the current implementation of the tracking system, the transformations between Kinect sensors were calculated based on one of the participants. This was done due to a lack of landmarks that would be trackable by the Kinects. Additionally, although the Kinects were fixed to stands, the sensors sometimes moved slightly between successive measurement days, so a separate transformation matrix had to be estimated for each data set. While the transformations were similar for all participants, using the same mean transformation for all subjects yielded results slightly inferior to the individual transformations for the majority of the subjects. In the future, adding reflective landmarks visible to the IR cameras of all Kinect sensors could improve the registration results and allow more flexible sensor placement.
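One standard way to estimate such a sensor-to-sensor transformation from corresponding points is the Kabsch algorithm; whether the study used exactly this estimator is not stated, so the sketch below is an assumption. The point sets stand in for the same participant's joint positions seen simultaneously by two Kinects (or, in the future, reflective markers visible to both IR cameras):

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Least-squares rigid transform (rotation R, translation t)
    mapping 3-D points `src` onto `dst` (Kabsch algorithm)."""
    src_mean = src.mean(axis=0)
    dst_mean = dst.mean(axis=0)
    h = (src - src_mean).T @ (dst - dst_mean)   # cross-covariance
    u, _, vt = np.linalg.svd(h)
    d = np.sign(np.linalg.det(vt.T @ u.T))      # guard against reflections
    r = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    t = dst_mean - r @ src_mean
    return r, t

# Hypothetical check: recover a known 30-degree rotation about the
# vertical axis plus a translation between two sensors' coordinates
rng = np.random.default_rng(2)
pts = rng.standard_normal((50, 3))              # joints seen by Kinect #1
theta = np.pi / 6
r_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0,            0.0,           1.0]])
t_true = np.array([0.5, -0.2, 1.0])
moved = pts @ r_true.T + t_true                 # same joints, Kinect #2

r_est, t_est = estimate_rigid_transform(pts, moved)
```

Averaging such estimates over many frames would make the registration robust to per-frame tracking noise, and re-running the estimation for each measurement day handles the small sensor movements mentioned above.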
While the spatial location tracking worked consistently with the two Kinects, the facial orientation tracking proved error-prone for some participants with the current spatial layout. On average, both Kinects' facial orientation estimates were correlated with the data from the IMUs, but they differed in accuracy: one sensor gave very consistent estimates for 17 of the 18 participants, while the second sensor's estimates were generally less accurate. To maximize the accuracy of such setups for facial orientation tracking, the distance to the tracked person, as well as the orientation of the face relative to the Kinect, should be considered carefully.
In addition to these spatial limitations, while for some dyads the temporal sampling rate stayed relatively constant at the 30 frames per second reported by the manufacturer, for other dyads the sampling rate varied considerably, sometimes dropping well below this theoretical maximum. With these technical issues in mind, quality control experiments should be designed whenever a new system is installed or changes are made to an existing installation, to maximize data quality.
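A minimal automated check of the kind recommended above could summarize the effective sampling rate directly from frame timestamps. The nominal rate, the gap threshold and the simulated recording are assumptions for illustration:

```python
import numpy as np

def frame_rate_report(timestamps, nominal_fps=30.0):
    """Summarize the effective sampling rate of a recording from its
    frame timestamps (in seconds) and flag likely dropped frames.

    A gap longer than 1.5 nominal frame periods is counted as a drop;
    this threshold is a simple heuristic, not a validated criterion.
    """
    dt = np.diff(timestamps)
    fps = 1.0 / dt
    dropped = int(np.sum(dt > 1.5 / nominal_fps))
    return {"median_fps": float(np.median(fps)),
            "min_fps": float(fps.min()),
            "dropped_frames": dropped}

# Hypothetical 10-second recording at a steady 30 fps with two
# simulated dropped frames
t = np.arange(0, 10, 1 / 30)
t = np.delete(t, [100, 200])

report = frame_rate_report(t)
```

Running such a report routinely after installation changes would catch both gradual rate degradation (falling median) and intermittent drops before they contaminate an experiment.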
Lahnakoski et al. 2020, Predicting the quality of social interactions, Royal Society Open Science, Supplementary material

Finally, the correspondence of the acceleration data between the Kinects and the IMUs was relatively low compared with the orientation correspondence. This is likely caused by the higher sensitivity to short accelerations and the higher sampling rate of the IMUs compared with the Kinects, which has been reported previously [2]. In addition, the IMU data of four participants contained a sustained acceleration lasting several minutes, apparently caused by a problem in the sensor software's subtraction of the acceleration due to gravity (these accelerations were not apparent in the raw acceleration data). To remove these effects, we calculated the correlations in the time window not affected by these obvious artefactual accelerations. Checking for such artefactual accelerations, or for misestimations of locations, in the Kinect as well as the IMU data requires careful quality control, which should ideally be automated in the future.