How infant‐directed actions enhance infants’ attention, learning, and exploration: Evidence from EEG and computational modeling

Abstract When teaching infants new actions, parents tend to modify their movements. Infants prefer these infant‐directed actions (IDAs) over adult‐directed actions and learn well from them. Yet, it remains unclear how parents’ action modulations capture infants’ attention. Typically, making movements larger than usual is thought to draw attention. Recent findings, however, suggest that parents might exploit movement variability to highlight actions. We hypothesized that variability in movement amplitude rather than higher amplitude is capturing infants’ attention during IDAs. Using EEG, we measured 15‐month‐olds’ brain activity while they were observing action demonstrations with normal, high, or variable amplitude movements. Infants’ theta power (4–5 Hz) in fronto‐central channels was compared between conditions. Frontal theta was significantly higher, indicating stronger attentional engagement, in the variable compared to the other conditions. Computational modelling showed that infants’ frontal theta power was predicted best by how surprising each movement was. Thus, surprise induced by variability in movements rather than large movements alone engages infants’ attention during IDAs. Infants with higher theta power for variable movements were more likely to perform actions successfully and to explore objects novel in the context of the given goal. This highlights the brain mechanisms by which IDAs enhance infants’ attention, learning, and exploration.

enhance infants' attention to the demonstrated action (Brand & Shallcross, 2008; Koterba & Iverson, 2009) and facilitate action learning (Schreiner et al., 2020; Williamson & Brand, 2014). Evidence for such attentional enhancement comes from looking time studies with infants and young children. Brand and Shallcross (2008) tested 6- to 8-month-olds and 11- to 13-month-olds in a preferential looking paradigm, presenting them with IDAs and ADAs. Their results show that infants of both age groups preferred looking at IDAs compared to ADAs. Similarly, Koterba and Iverson (2009) investigated the looking behavior of 8- to 10-month-old infants when observing their caregiver demonstrate actions on novel toys. In this study, parents were instructed to demonstrate an action with high or low amplitude movements and to repeat it or not. Infants looked longer at demonstrations that used high amplitude movements, repetition, or both than at non-repeated low amplitude demonstrations.
In addition to enhanced attention during IDAs, several studies have demonstrated that IDAs affect infants' action exploration and learning (Koterba & Iverson, 2009; van Schaik et al., 2020; Williamson & Brand, 2014). For instance, Williamson and Brand (2014) assessed the imitation performance of 2-year-old children in three groups: a baseline group that did not see any action demonstrations, an ADA group, and an IDA group.
Children who had IDA demonstrations were more likely to perform the demonstrated action compared to the ADA and baseline groups.
While this body of research suggests that IDAs play an influential role in infants' attention to and learning of actions, it is unknown how IDAs capture infants' attention and guide their learning.

How do IDAs draw infants' attention?
One prototypical feature associated with IDAs is the use of larger movements than usual. One possibility is, therefore, that IDAs draw infants' attention because they are larger than normal actions. Studies in which adults were instructed to make large movements when demonstrating an action found attentional and learning effects in infants (Koterba & Iverson, 2009; Williamson & Brand, 2014). However, movement amplitude was manipulated together with other features, like repetition, emotional engagement, and proximity to the infant (Koterba & Iverson, 2009; Williamson & Brand, 2014). Two other findings render it less likely that large movements are the driving force of infants' attention during IDAs. First, Koterba and Iverson (2009) observed increased looking time to IDAs performed repeatedly with small amplitude compared to a control condition. Second, large movements were found to depend on the action at hand rather than being universal across actions. While making large movements remains a possible factor underlying infants' attention to IDAs, recent findings suggest another potential driving factor: variability in movement.
In a study by Fukuyama et al. (2015), parents were found to dynamically modulate the variability in their IDAs depending on their children's behavior. More specifically, the authors measured mothers' movement kinematics and their 11- to 13-month-olds' behavioral performance during a dyadic interaction in which mothers demonstrated a cup-nesting action. Mothers increased the variance in their movements after their infant had performed irrelevant actions with the cups in between demonstrations and decreased their movement variance after their infant performed the target actions. This suggests that movement variability might be exploited by parents to direct their infants' attention to the demonstrated action when needed. Movements that deviate from the small, efficient movements parents usually make might represent such natural variation as well. Thus, movement variability and amplitude remain confounded in everyday life. Furthermore, this role of variability is supported by findings from other lines of research, such as statistical learning work which suggests that infants pay more attention to variable, less predictable input (Johnson & Munakata, 2005; Tummeltshammer & Kirkham, 2013). In sum, instead of solely large movement amplitudes, variability in movements could be responsible for capturing infants' attention, thereby influencing learning and exploration in IDAs.

RESEARCH HIGHLIGHTS
• 15-month-olds watched actions with normal, high, and variable movement amplitude
• Infants show increased frontal theta power when actions contain variable movements
• Frontal theta during variable movements relates to infants' subsequent learning and exploration
• Computational modelling suggests that surprise, induced by variability, drives theta

The current study
In this EEG study, we investigated the roles of variability and movement amplitude in enhancing infants' attention to IDAs. We hypothesized that variability in movement attracts infants' attention more than large movements per se. We also investigated the relation between infants' attention to variable movements and their learning.
So far, studies on IDAs have relied on looking time measures as a proxy for infants' attention. In the current study, we made use of a neural indicator of infants' attention, which provides a more sensitive measure that does not depend on behavioral changes. More specifically, we examined modulations in frontal theta band oscillations. Theta band activity in frontal brain regions has previously been linked to top-down and bottom-up attention in infants (Begus et al., 2015; Orekhova et al., 2006), young children (Meyer et al., 2019), and adults (Clayton et al., 2015). Frontal theta power has been proposed to signal the need for cognitive control (Cavanagh & Frank, 2014), often elicited by attention-capturing events. Frontal theta band modulations are also associated with memory processes (Begus et al., 2015; Jensen & Tesche, 2002) and are thought to reflect infants' learning of new information (Begus & Bonawitz, 2020). For instance, findings on neural processing of infant-directed speech provide evidence for an increase in infants' frontal theta power when listening to infant-directed speech compared to control conditions (Orekhova et al., 2006; Zhang et al., 2011). Thus, frontal theta band power is a promising neural measure to assess infants' attentional processing during IDAs.
Our experimental set-up consisted of a demonstration and an exploration phase. We measured 15-month-old infants' brain activity while they observed action demonstrations executed with normal, high, and variable movement amplitudes. After the demonstration phase, infants had the opportunity to perform the actions themselves. To investigate how the different conditions affected infants' attentional processing, we compared infants' theta power (4-5 Hz) in fronto-central midline channels (Fz, FCz, Cz) between conditions, controlling for multiple comparisons. In addition, we computed a linear regression model to examine the link between infants' attention as reflected by their frontal theta power to variable amplitude movements and their exploration and learning behavior. As an exploratory analysis, we further investigated which aspect of the movement variability might be driving any potential effects. Variable movements are both more surprising and more complex. To disentangle whether surprise or complexity better described infants' frontal theta power, we additionally used a computational modelling approach.

Participants
The final sample included 23 infants (10 girls).

Stimuli
For this study, we created avatar stimuli based on motion-tracking recordings of an adult model (see Figure 1). The advantage of using avatars and a virtual environment instead of video recordings was the precise manipulation of movement amplitude in the stimuli. That is, it allowed us to create instances of low and high amplitude movements with human kinematic features that could be combined into different stimulus movies. Using a Qualisys motion-tracking system (a Qualisys Oqus 5+ system with seven infrared cameras for motion-tracking and one video camera), we recorded the movements of an adult model performing three different actions: stacking rings on a peg, building a tower with cups, and putting balls in a bucket. For each of these actions, the model reached out to, picked up, and moved the target object (e.g., ring) to the corresponding goal base (e.g., peg). The model performed the same action five times, for instance putting five rings on the peg.
Crucially, once converted to avatar stimuli, we manipulated the amplitude of the goal-directed movements which preceded the final step of each action (e.g., the amplitude of lifting the ring to put it on the peg), resulting in two versions of each action, one with a normal amplitude and one with a high amplitude.
From this manipulation, we made stimulus videos in three different conditions (see Figure 2) with five normal amplitude movements, five high amplitude movements, and five variable amplitude movements for each action type (using balls, rings, cups). We created the variable condition by combining the high and normal amplitude movements into different orders (e.g., a sequence of normal, high, normal, high, high; four pseudo-randomized sequences, counterbalanced across participants).
Each stimulus video with five movements lasted for about 32 s, including an initial 2 s still frame phase followed by the five goal-directed movements of one action type. The avatar was shown with a cap that covered the upper half of the face to ensure infants would pay attention to the action rather than the actor's face and gaze. The initial visual scene of each experimental video included the model making eye contact and sitting at a table with all five target and ten distractor objects as well as the specific goal base in front of her (see Figure 1).
The experimental videos were silent. The objects and bases used in the stimuli were also available for infants to explore in real life at the test session (see Procedure for details).
Additionally, we created two other types of video stimuli: intro videos and peekaboo videos. Short intro videos were made in which the infant was greeted by the actor saying "Hey, baby". The intro video lasted for about 2 s and preceded each of the experimental videos.
Besides this, peekaboo videos were recorded with the actor playing the classic peekaboo game by hiding her face behind her hands for 1-4 s. The peekaboo videos served as attention getters between trials.
FIGURE 2 Illustration of the three conditions of interest. In the normal amplitude condition (top) the action was demonstrated with normal amplitude movements for five subsequent times. In the high amplitude condition (middle) the action was demonstrated with high amplitude movements for five subsequent times. In the variable amplitude condition (bottom) the order of high and normal amplitude movements was varied during action demonstration. It was counterbalanced between participants which action (acting on rings, balls, pegs) was associated with which condition (normal, high, variable).

Together the intro, the experimental and the peekaboo videos made up blocks of trials as described in more detail in the Procedure section. During the demonstration phase, infants were presented with three actions, each shown in one of the three conditions (normal, high, and variable). Across subjects we counterbalanced which action type (acting on balls, cups, rings) was demonstrated in which condition. Each experimental video (including five goal-directed movements) was presented in four blocks, making up a total of 20 goal-directed movements per condition. Each experimental video was preceded by a 1-s fixation cross (white cross on a grey screen) and the intro video to orient the infant to the screen. Each block contained all three experimental videos in pseudo-randomized order (i.e., one normal, one high, and one variable), followed by a peekaboo video. After the last block, all four peekaboo videos were played (with 1-4 s of hiding the actor's face) before commencing the exploration phase. All stimuli were presented using Presentation Software (Neurobehavioral Systems, Albany, CA) and the timing of the stimuli was automatically time-locked with the EEG signal.
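The block structure of the demonstration phase can be sketched in a few lines of Python to check the trial arithmetic (four blocks of three videos with five movements each). This is only an illustrative sketch: the function and variable names are hypothetical, a plain seeded shuffle stands in for the unspecified pseudo-randomization, and the additional peekaboo videos played after the last block are not modeled.

```python
import random
from collections import Counter

CONDITIONS = ("normal", "high", "variable")
N_BLOCKS = 4               # each experimental video was presented in four blocks
MOVEMENTS_PER_VIDEO = 5    # five goal-directed movements per video


def build_demonstration_phase(seed=0):
    """Return a list of (block, event_kind, condition) tuples.

    Assumption: a simple shuffle replaces the study's pseudo-randomization,
    whose exact constraints are not given in the text.
    """
    rng = random.Random(seed)
    events = []
    for block in range(1, N_BLOCKS + 1):
        order = list(CONDITIONS)
        rng.shuffle(order)
        for condition in order:
            # Every experimental video is preceded by a fixation cross and the intro video.
            events.append((block, "fixation_cross", None))
            events.append((block, "intro_video", None))
            events.append((block, "experimental_video", condition))
        # Each block ends with a peekaboo video as an attention getter.
        events.append((block, "peekaboo_video", None))
    return events


def movements_per_condition(events):
    """Count goal-directed movements shown per condition."""
    counts = Counter()
    for _block, kind, condition in events:
        if kind == "experimental_video":
            counts[condition] += MOVEMENTS_PER_VIDEO
    return counts


schedule = build_demonstration_phase(seed=1)
totals = movements_per_condition(schedule)
```

Under these assumptions, `totals` holds 20 movements for each of the three conditions, matching the count stated in the text.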
After the demonstration phase, the EEG recording was stopped, and both experimenters entered the test booth again to present infants with the objects previously shown. In this exploration phase, infants had the opportunity to act on the objects themselves. The exploration phase had two parts, the All Objects part and the Target Objects Only part. First, each of the three goal bases was presented with two target and four distractor objects (All Objects part). For instance, the peg (goal base) was presented with two rings (target objects), two balls and two cups (distractor objects) reachable for the child (see Figure 3, top right). After all three goal bases were presented in the All Objects part (order counterbalanced), each of the three goal bases was again presented one-by-one (order counterbalanced), but this time only with the three corresponding target objects (i.e., Target Objects Only part; see To start each exploration trial (i.e., for both All Object and Target Objects Only parts), E1 first mounted the target base in the middle of the

EEG data analysis
EEG data analysis was conducted using MATLAB (Mathworks, Inc.) and the open-source toolbox FieldTrip (Oostenveld et al., 2011). During pre-processing, the data were first band-pass filtered between 1 and 30 Hz and the mean signal was subtracted from each time point to account for potential differences in offset. In the variable condition, the first trial did not offer any information about the variability in movement amplitude, and it was thus excluded. Based on video coding of infants' gaze behavior, trials during which infants did not look at the screen were also discarded from further analysis. Then, four rounds of artifact rejection were conducted (blind to condition) and subsequently any missing channels were interpolated.
Target Objects Only.

Total N = 22
Successful Target Performance In the Target Objects Only part, infants had access only to the target objects corresponding to the respective goal base.

Behavioral data analysis
The exploration phase of the experiment was video coded offline using ELAN (ELAN, 2018

Analysis of relation between EEG and behavioral data
To investigate whether there is a link between infants' attentional processing of variable movements in IDAs and their subsequent exploration and learning, we calculated partial correlations. The neural measure (frontal theta power) was correlated separately with each of the behavioral measures, that is, with exploration (Object First Touched) in the All Objects part and with learning (Successful Target Performance) in the All Objects and Target Objects Only parts, while controlling for the other behavioral measures, respectively. All behavioral measures were taken during the time infants could explore the objects together with the goal base matching the variable condition. Because this study focused on the variable condition and the spread of frontal theta power at Fz was limited in the high and normal conditions, correlations for these conditions are not included in the main manuscript. For completeness, they are available in the Supplementary Materials.

EEG results
Figure 4 illustrates the grand average power across participants at predefined channels Fz, FCz, and Cz as a function of frequency (3-30 Hz) and condition (normal, high, variable). In addition to the inherent 1/f distribution of power, the figure shows a frequency-specific modulation by condition in the theta band (4-5 Hz). That is, power in the theta band is higher for the variable compared to both the normal and high amplitude movement conditions. This difference appears to decrease from frontal to central sites. Figure 5 shows Note that we used a Bonferroni-corrected alpha level of 0.016 to correct for multiple comparisons. Figure 6 shows the topographic distribution of the theta power difference across the scalp between the variable and normal as well as the variable and high conditions. In line with indications from the three predefined channels, the plot suggests a frontal distribution of the effect.
Together, the results provide evidence for an effect in the theta frequency range between conditions such that IDAs with variable movements elicit more frontal theta power than IDAs with normal or high amplitude movements.

Results on the relation between EEG and behavioral data
FIGURE 4 Power as a function of frequency, plotted per condition (normal, high, variable) for electrodes Fz, FCz, and Cz. Mean power values (solid lines) ± 1 SE (shaded areas) are displayed for all three electrodes.

We examined the relation between neural measures of attention during the variable condition (theta power) and behavioral measures using partial correlations. The results show a relation between higher theta power and successfully performing the target actions in the Target Objects Only part at least once, r(18) = -0.50, p = .025. Moreover, we found that higher theta power was related to first touching those objects that were novel in the context of the given goal in the All Objects part, r(18) = -0.49, p = .028. There was no evidence for a significant relation between theta power and successful target performance in the All Objects part, r(18) = -0.36, p = .12.
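For readers unfamiliar with partial correlations, the following Python sketch shows one standard way to compute a correlation between two measures while controlling for two others, using the recursive formula for partial correlation coefficients. It is not the authors' analysis code; all function names are hypothetical, and the toy data are invented purely to exercise the functions. The degrees of freedom helper assumes df = n - 2 - (number of covariates), which would yield the reported 18 degrees of freedom if n = 22 and two behavioral measures were controlled for; that reading is an inference, not something the text states explicitly.

```python
import math


def pearson_r(x, y):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def first_order_partial(r_ab, r_ac, r_bc):
    """Correlation of a and b after controlling for a single variable c."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def partial_r_two_covariates(x, y, z1, z2):
    """Correlation of x and y controlling for z1 and z2 (recursive formula)."""
    r_z1z2 = pearson_r(z1, z2)
    r_xz1, r_yz1 = pearson_r(x, z1), pearson_r(y, z1)
    # Step 1: remove z1 from every pairwise relation that is still needed.
    r_xy_z1 = first_order_partial(pearson_r(x, y), r_xz1, r_yz1)
    r_xz2_z1 = first_order_partial(pearson_r(x, z2), r_xz1, r_z1z2)
    r_yz2_z1 = first_order_partial(pearson_r(y, z2), r_yz1, r_z1z2)
    # Step 2: remove z2 from the z1-adjusted correlation of x and y.
    return first_order_partial(r_xy_z1, r_xz2_z1, r_yz2_z1)


def partial_df(n, n_covariates):
    """Degrees of freedom for testing a partial correlation (assumed convention)."""
    return n - 2 - n_covariates


# Toy data (invented): y is x plus both covariates, so x and y are
# perfectly related once z1 and z2 are partialled out.
x = [1, 2, 3, 4, 5, 6]
z1 = [1, 0, 1, 0, 1, 0]
z2 = [0, 1, 1, 0, 0, 1]
y = [a + b + c for a, b, c in zip(x, z1, z2)]
toy_partial = partial_r_two_covariates(x, y, z1, z2)
```

In practice, the same quantity can be obtained by correlating the residuals of two regressions on the covariates; the recursive form is used here only because it needs no external libraries.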

INTERIM DISCUSSION I
We found that frontal theta was significantly higher, indicating  ., 2015).

FIGURE 6 Topographic maps showing the difference in power of the theta frequency range (4-5 Hz) between variable versus normal (left), variable versus high (middle), and high versus normal (right). Warm colors represent more power in theta for the variable (left and middle) and high (right) conditions, respectively.

We further examined whether theta power in the variable condition was related to infants' learning (indexed by correctly performing the target action) and exploration (indexed by novel object-goal associations). The results show that infants with higher theta power were more likely to successfully perform the target actions and to explore objects that were novel in the context of the given goal. One might have expected that infants with higher theta power would first touch the objects corresponding to the goal base rather than objects novel in the context of that goal. Still, the findings fit well with the learning progress framework (Oudeyer et al., 2016), which posits that infants engage in an activity as long as they can still learn from it, but switch to new activities when they cannot learn from the activity anymore (see also Poli et al., 2020). For infants showing higher theta power, the input might have satisfied their learning needs, which in turn led them to invest their cognitive resources in exploring novel object-goal associations. Importantly, all toys (balls, cups, rings) were presented equally often and whether object-goal associations were shown in variable, high or normal movement demonstrations was counterbalanced across participants. Therefore, this effect cannot be attributed to differences in object or goal saliency per se. In sum, the current exploration effect linking frontal theta power and infants' first object exploration might reflect habituation processes during the demonstration phase.
shown that when talking to their infants, caregivers use more surprising, less predictable intonation in their speech than when talking to another adult (Räsänen et al., 2018). Given these previous findings, one might expect surprise in the variable IDAs to elicit enhanced attention in infants as indicated by frontal theta increase.

What is so special about variability?
From an information-theoretic perspective, both the complexity as well as the surprise level are higher in the variable condition than in the other two conditions. Therefore, whether infants are attracted by the complexity or surprise of these stimuli is an open question. Although complexity and surprise both affect the predictability of the stimulus, they make different assumptions on how infants process IDAs. To disentangle these two alternatives, we used a computational modelling approach in a post-hoc analysis to examine whether complexity or surprise in the current stimuli were predictive of infants' attentional processing as reflected in frontal theta power.

COMPUTATIONAL MODELLING INVESTIGATING THE ROLE OF COMPLEXITY AND SURPRISE FOR INFANTS' ATTENTION TO IDAS
We computed the amount of complexity and surprise of each stimulus infants saw, excluding all trials on which infants looked away. As a proxy for complexity, we adopted the measure of local redundancy (Jamieson & Mewhort, 2005) as modified and used by Addyman and Mareschal (2013):

(1)

where $k_{normal}$ and $k_{high}$ indicate how many times normal amplitude and high amplitude trials have occurred, and $N$ indicates the maximum number of trials to be considered at each timepoint (i.e., the moving-window size). Following Addyman and Mareschal (2013), we set $N = 6$.

FIGURE 7 Surprise (left) and Local Redundancy (right) as a function of trial number in example sequences of the normal, high, and variable conditions. The variable condition had several counterbalancing orders. The values were computed considering the conditions as independent of each other. Trials of each condition were treated as consecutive events in a sequence of goal-directed movements, making the assumption that infants dissociate the three different actions despite their presentation in separate blocks of videos.
An example of how local redundancy scores change over trials is illustrated in Figure 7. High values of Local Redundancy reflect low complexity in the stimulus. Local Redundancy is higher for normal and high amplitude conditions compared to the variable amplitude condition, thus capturing the difference in complexity between conditions.
As a second measure of interest, we computed conditional surprise at each trial as the amount of Shannon Information, $I$:

$$I(x) = -\log P(x \mid X, \alpha)$$

where $x$ is the type of movement that occurred in the current trial (high or low amplitude), $X$ consists of all the past actions that have been observed, and $\alpha$ captures the prior expectations at the start of the task.
More specifically, the prior expectation of the model at the start is that either a high or a low movement amplitude can occur with equal probability. This can be described in the prior Dirichlet distribution P(p|α) where all elements of alpha are one, that is, α = [1, 1]. This alpha is thus unbiased and expectations can be quickly updated with incoming observations. The past history of all actions X is defined by the number of observations for each event. In other words, how many high and how many low movement amplitudes were observed until the current trial.
The likelihood of a high or low amplitude movement to occur at a certain trial is defined by the total number of times this type of movement has occurred in the past in addition to the value of α for that movement type (i.e., 1), divided by the total number of observations up until this point, plus two (i.e., the sum of values of α). For example, this could be used to determine the likelihood that infants will see a high movement amplitude at trial 5 when previously they have seen high, low, low, high movements (i.e., trial 1 = high, trial 2 = low, trial 3 = low, trial 4 = high, current trial 5 = high). Then, the likelihood of trial 5 being a high movement amplitude is given by the total number of times this type of movement has occurred in the past (i.e., 2), plus alpha for this movement type (i.e., 1), divided by the total number of observations so far (i.e., 4) plus two, that is, (2 + 1)/(4 + 2) = 0.5.

Our modeling work allows us to disentangle which of these factors is related to the variation in infants' theta power. Moreover, it is important to stress that redundancy and surprise measures make very different assumptions on how infants process IDAs. Redundancy is based on stimulus frequency, and thus it assumes that infants keep track of how often a stimulus is presented. Instead, surprise is based on probabilities, which implies that infants make probabilistic models of the world and are surprised when such models are violated.
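The likelihood and surprise computation described above can be illustrated with a minimal Python sketch. It assumes a symmetric prior α = [1, 1] as stated in the text and, as an additional assumption not confirmed by the source, a base-2 logarithm so that surprise is expressed in bits. Function names and the example sequences are hypothetical and serve only to illustrate the calculation.

```python
import math

# Prior pseudo-counts: one per movement type, i.e. alpha = [1, 1].
ALPHA = {"high": 1.0, "low": 1.0}


def predictive_probability(history, event, alpha):
    """P(event | history, alpha): past count of the event plus its alpha,
    divided by the number of past observations plus the sum of alpha."""
    count = sum(1 for h in history if h == event)
    return (count + alpha[event]) / (len(history) + sum(alpha.values()))


def surprise_bits(history, event, alpha):
    """Shannon information of the event given the history.
    Assumption: log base 2 (the base is not specified in the text)."""
    return -math.log2(predictive_probability(history, event, alpha))


def sequence_surprise(sequence, alpha):
    """Surprise of every trial, each conditioned on all preceding trials."""
    return [surprise_bits(sequence[:i], x, alpha) for i, x in enumerate(sequence)]


# Worked example from the text: high, low, low, high observed; trial 5 is high.
example_history = ["high", "low", "low", "high"]
p_trial5 = predictive_probability(example_history, "high", ALPHA)  # (2 + 1) / (4 + 2)
s_trial5 = surprise_bits(example_history, "high", ALPHA)

# A constant sequence versus an illustrative variable sequence.
constant_seq = sequence_surprise(["low"] * 5, ALPHA)
variable_seq = sequence_surprise(["low", "high", "low", "high", "high"], ALPHA)
```

In this sketch, surprise falls steadily over a constant sequence because each repetition makes the next one more expected, whereas it stays comparatively high in the variable sequence, which mirrors the qualitative pattern the text attributes to the variable condition.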
To test whether we replicate previous research showing a condition-independent increase of theta power over time (Braithwaite et al., 2020), we also used the trial number as a proxy for time in our analysis.
Importantly, Trial Number did not vary as a function of condition. Then, trial-based theta power at channel Fz was extracted per participant for each trial and standardized using z-transformation. A Cullen and Frey (1999) graph of our data distribution showed that a lognormal distribution fit the normalized theta power values best. Accordingly, we used generalized linear models (GLMs) to fit our data. The GLMs were fitted using the gamlss package (Stasinopoulos & Rigby, 2007) in R. We ran models using z-transformed values of Local Redundancy, Surprise and Trial Number to estimate their relation to normalized frontal theta power.
The results are represented in Table 2

INTERIM DISCUSSION II
The findings of our post-hoc computational modeling analysis replicate previous findings of increasing frontal theta power over time (Braithwaite et al., 2020) and show that surprise more so than complexity induced by the variability in the action demonstrations is linked to infants' higher theta power. The role of surprise in IDAs and infants' attentional processing is consistent with Event Segmentation Theory (Kurby & Zacks, 2008;Zacks et al., 2007), which explains how we extract meaningful units from action streams. This theory states that, as observers, we use event models to constantly predict what happens next. When a stimulus is perceived as surprising, it leads to updating of the model in order to improve future predictability, also a core idea of the predictive-processing framework for infant learning (Köster et al., 2020). The temporary increase in processing at surprising moments thereby leads to more robust encoding and memory formation. While Event Segmentation Theory is focused on predictability troughs inherently present at action boundaries, parents artificially introduce predictability troughs in IDAs when interacting with their infants. This may suggest a similar underlying principle for both highlighting action goals in IDAs and segmenting events from a continuous action sequence (Baldwin et al., 2001). In other words, variable and thereby less predictable movements increase infants' attentional processing such that their attention level is high at the moment the goal of the action is demonstrated.
Additionally, the current findings are consistent with evidence showing that task-unrelated, unexpected stimuli increase children's arousal, which, in turn, leads to an improvement of task performance (Pozuelos et al., 2014). Increased movement variability in IDAs can also be interpreted as unrelated (or at most only marginally related) to the achievement of a goal, such as stacking rings on a peg. This additional layer of information tangential to the actual goal may increase the level of perceptual surprise, thereby increasing infants' arousal. Heightened attention might in turn lead to more efficient learning. Interestingly, the beneficial effect of task-unrelated stimuli for task performance disappears in late childhood (Pozuelos et al., 2014) and even reverses in adulthood, as the presence of task-unrelated stimuli hinders adults' performance on a given task (Wetzel et al., 2012). Hence, surprising stimuli might aid infants' learning in a unique way.

General discussion
Infants' attention is drawn to actions that are demonstrated in an infant-directed way to them (Brand & Shallcross, 2008). Such IDAs also seem to benefit infants' action learning (Williamson & Brand, 2014

Neural processes: Open questions and future directions
Based on our modeling findings, we might assume that surprise in movement variability elicits frontal theta power synchronization. This synchronization, in turn, leads to higher power during the subsequent goal attainment. The question arises whether the attentional benefit gained from the induced surprise has a temporal limitation and whether this limitation is based on the rhythm of theta and the magnitude of the surprise. For example, would the time-window during which theta power is increased last longer if the surprise value was higher? Previous findings with adults suggest that surprise systematically affects the magnitude of the theta increase (see e.g., Mas-Herrero & Marco-Pallarés, 2014). Also, one might speculate that the frequency range of the oscillations determines the duration needed to return to a baseline level of activity. These constraints in turn will have implications for teaching, both in terms of the degree of variability used and the temporal vicinity between the use of variability and the relevant aspect of the action. Future investigations are needed to address this question by systematically varying surprise magnitude and the temporal arrangement between the (goal-unrelated) variability and the following (goal-related) action unit.
Besides this, little is known about potential maturational effects of the medial prefrontal cortex (mPFC), which is the proposed neural generator of frontal theta oscillations (Ishii et al., 1999). Indeed, mPFC significantly develops throughout early childhood (Casey et al., 2000).
Questions include, for instance, whether structural maturation of this brain area underlies changes in processing of surprise and whether there are beneficial effects of variability for task performance. Interestingly, despite the maturational changes of mPFC in the first years of life, research suggests that this area is already crucially involved in social-cognitive abilities from early on in infancy (Grossmann, 2013).
Thus, it remains up to future research to unravel whether this might suggest any functional changes in processing surprise across development. Interdisciplinary research combining strengths of cognitive neuroscience techniques like fMRI and EEG as well as longitudinal developmental research is needed to address these questions.

Implications for teaching novel actions
Assuming that surprise is guiding infants' attention and driving their learning during IDAs, several implications arise. By introducing variability (and thus surprise) in the movement that precedes the goal of the action, parents can utilize the attentional and mnemonic effects of surprise to teach their infant new actions more effectively. This might also explain the prominent use of repetitions in IDAs. Repetitions are omnipresent in parental IDAs (Brand et al., 2002; van Schaik et al., 2020). They may serve two purposes at once: (1) showing the consistency of the goal of the action, while (2) allowing for variability across repetitions in the goal-unrelated movement of the IDAs.
Another important consideration regards individual differences in surprise levels, since the level of surprise is based on the prior probability distribution. In other words, what is surprising to an infant may largely depend on the infant's prior experience. For instance, when an infant observes the same actions frequently, like lifting a cup in the same way over and over again, any deviation from that action will elicit surprise. In contrast, an infant who has hardly ever seen a particular movement, or has had experience with variable movements, will react to the same action with a different level of surprise. When trying to optimize teaching of a new action to an infant, it might therefore be crucial to individually adapt the degree of variability to induce surprise and thus increase attention appropriately. Parents, who typically know much of their infant's prior experience, are well placed to estimate their infant's level of surprise in a particular situation.

In which aspect of their behavior caregivers introduce variability is another interesting point to consider. In other words, how specific are the current findings to movement amplitude? For instance, in the current design, as in everyday life, variability in movement amplitude covaries with variability in movement duration. This is not surprising, as a larger movement typically takes longer to perform. Change in movement amplitude has been highlighted as one of the characteristic features of IDAs, but other kinematic features of IDAs have also been identified (see e.g., van Schaik et al., 2020). We may speculate that it is the surprise induced by variability in input from the social partner more generally that is driving infants' attention, rather than this effect being limited to a specific behavior (e.g., movement amplitude). Yet, this remains an open question for now.
Whereas certain parts of actions are performed with variations during parents' action demonstrations, it is noteworthy that in a different context, dyadic interactions with infants benefit from the opposite, namely, predictability. That is, success in joint action coordination is achieved by making one's actions temporally predictable (Vesper et al., 2011). This research by Vesper et al. (2011) with adults suggests that reducing one's movement variability is used as a strategy for successful coordination with another person. Hence, when trying to coordinate with an infant, predictability in action performance is likely a more promising approach than introducing more variability.

CONCLUSIONS
Together, our findings show that the use of variability in movements induces surprise, which in turn serves as an attentional tool during IDAs and may have beneficial downstream effects for infants' action learning. As such, these findings advance our understanding of how IDAs capture infants' attention and how they affect infants' subsequent exploration and action learning. Making use of variability to elicit surprise might thus be a promising teaching tool to increase attention and foster memory formation.