Determining kernels in linear viscoelasticity

In this work, we investigate the inverse problem of determining the kernel functions that best describe the mechanical behavior of a complex medium modeled by a general nonlocal viscoelastic wave equation. To this end, we minimize a tracking-type data misfit function under this PDE constraint. We perform the well-posedness analysis of the state and adjoint problems and, using these results, rigorously derive the first-order sensitivities. Numerical experiments in a three-dimensional setting illustrate the method.


Introduction
When elastic waves propagate through complex media such as biological tissues, their amplitude is attenuated according to a frequency-dependent law. In recent years, it has become evident that these attenuation laws are more complicated than initially thought and that nonlocal wave equations are needed to model such behavior. Indeed, elastic wave equations with weakly singular kernels arise in various important medical and industrial applications of wave propagation in complex media. For example, frequency power laws with powers between zero and two are often encountered in shear-wave elastography; see [1][2][3].
In this work, we tackle the inverse problem of determining the kernel functions that best describe the mechanical behavior of a complex medium from over-specified data. To this end, we first derive and analyze a general nonlocal elastic wave model containing three kernels, cf. (2.2) below. By taking the Laplace transform, it can be seen that, in general, not all three kernels can be uniquely determined independently of each other. Therefore, we consider the inverse problem of determining two of these kernels (cf. (2.2) below) from over-specified data and study the PDE-constrained optimization problem resulting from the (optionally regularized) minimization of the data misfit. Using an adjoint-based calculation, we then rigorously derive the first-order sensitivities and develop a numerical kernel recovery method.
The remainder of our exposition is organized as follows. In Section 2, we derive a general nonlocal elastodynamic wave equation and discuss the setup of the inverse problem. Section 3 is concerned with the well-posedness analysis of the state problem. In Section 4, we derive and analyze the corresponding adjoint problem. Section 5 is focused on the estimation of the kernels from additional observations. Finally, in Section 6, we present numerical experiments that illustrate our theoretical results.

Modeling and problem setup
In this section, we provide the setup for the problem and derive the viscoelastic equation from the conservation law and the constitutive relation of stress and strain.
Consider a bounded domain Ω ⊂ R^d, d ∈ {1, 2, 3}, with Lipschitz boundary ∂Ω, acted upon by a force. Let u = u(x, t) be the displacement field and ρ = ρ(x) the density. The linearized strain due to the deformation is defined by ε(u) = ½(∇u + ∇u^T).
The balance of momentum in the body is given by (2.1), where σ denotes the stress tensor. In this paper we consider three- and two-kernel constitutive relations involving the hydrostatic part tr ε(u) and the deviatoric part of the strain: the three-kernel relation (2.2) and the two-kernel relation

σ = Cε(u) + k_ε * Aε(u_t) + k_tr ε * I tr ε(u_t). (2.3)

We point to Remark 1 for their relation to existing models in the literature. The operator * denotes the convolution on the positive half-line with respect to the time variable.
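As a purely illustrative sanity check on this notation, the causal convolution (k * v)(t) = ∫_0^t k(t − s) v(s) ds can be discretized on a uniform time grid as follows (all names are hypothetical and not tied to the discretization used later in the paper):

```python
import numpy as np

def causal_convolution(k, v, dt):
    """Left-endpoint rectangle rule for (k * v)(t_n) = int_0^{t_n} k(t_n - s) v(s) ds
    on a uniform time grid; k and v are sampled at t_j = j*dt."""
    n = len(v)
    out = np.zeros(n)
    for i in range(1, n):
        # sum_{j=0}^{i-1} k(t_i - t_j) v(t_j) * dt
        out[i] = np.dot(k[i:0:-1], v[:i]) * dt
    return out

# Example: k(t) = exp(-t), v(t) = 1, so (k * v)(t) = 1 - exp(-t)
dt = 1e-3
t = np.arange(0.0, 1.0, dt)
conv = causal_convolution(np.exp(-t), np.ones_like(t), dt)
```

The quadrature is only first-order accurate; it merely illustrates the causal (one-sided) nature of the convolution appearing in the constitutive law.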

The viscoelastic wave model
The model for wave propagation in complex media considered in this work is derived by coupling the conservation law with a more appropriate constitutive relation, as described above. The two hyperbolic models obtained by coupling (2.1) with the two constitutive relations (2.2) and (2.3) are as follows.
• Models 1 and 2: We can rewrite both of these equations in the common form ρ z_tt − div [Cε(z) + D * ε(u_t)] = g, by introducing the auxiliary variable z, the tensor D, and the forcing function g as in (2.4), where k_σ = 0 for Model 2. We note that the scalar counterpart of this model has been considered in the literature with fractional kernels; see, for example, [14]. We mention in passing that third-order-in-time models of viscoelasticity were introduced in [15] and have recently been a topic of extensive research, often referred to as Moore-Gibson-Thompson viscoelasticity; see, for example, [16,17] and the references therein.
Assumptions on the medium parameters, kernels, and tensors. To allow for heterogeneous viscoelastic materials, we assume that ρ ∈ L^∞(Ω) and that there exists ρ̲ > 0 such that ρ(x) ≥ ρ̲ > 0 a.e. in Ω. (2.5) The fourth-order tensors C and A are assumed to be constant and symmetric, C, A ∈ Sym_4(R^d), and C is positive definite.
(2.6) Moreover, we assume that the kernel k_σ and the tensor D from (2.4) are such that the non-negativity conditions (2.7) and (2.8) hold, where H^{−δ}(0, t) is the space of all functions whose extension by zero to R is in H^{−δ}(R). In the scalar case with M ∈ Sym_4(R^d) positive semidefinite, using the M-weighted bilinear form ⟨w, v⟩ = Mw : v as well as the induced seminorm, condition (2.7) follows from the integrated version of Lemma B.1, provided k_σ = k * m with k ≥ 0 nonincreasing, using w = m * y, which satisfies w(0) = 0, and assuming w ∈ H^1(0, T). In the same setting, assuming strict positivity of M and ℜ(Fm)(ω) ≥ γ (1 + ω²)^{−δ/2}, ω ∈ R (where ℜ denotes the real part and F the Fourier transform), we can conclude from Lemma B.2 that (2.8) holds.
Remark 1. We comment on the relation of (2.2) and (2.3) to other models in the literature. Assuming the material is homogeneous and isotropic, a very general relation between the deviatoric and hydrostatic parts of stress (σ_d, tr σ) and strain (ε_d(u), tr ε(u)) is given by (2.9a)-(2.9b), where μ, λ are the Lamé constants and k_σ, k_tr σ, k_ε, k_tr ε are four scalar-valued kernels.
Assuming ε(u)(0) = 0 and using the Laplace transform, as well as σ = σ_d + (1/d) I tr σ, it is not difficult to see that (2.9a) and (2.9b) can be reduced, non-uniquely, to a form with three kernels. One such reduction uses the kernels (k_σ, k_ε, k^(1)_tr ε); dropping the superscript (1), we obtain the form (2.2) in the case of the special choice (2.10). Using the same technique once again, the constitutive relation (2.2) can be reduced to a form with only two kernels, which we again denote by k_ε, k_tr ε, obtaining the form (2.3). Existing works consider one of the above three forms (2.2), (2.3), or (2.9) as the constitutive relation between stress and strain in 3D, with either fractional kernels or finite sums of such fractional kernels. The constitutive relation of the form (2.9) with k_σ = k_ε, k_tr σ = k_tr ε is used in, e.g., [18,19]. The form (2.2) with k_σ = k_ε = k_tr ε is studied in [20]. The two-kernel representation (2.3) is employed in [21] with fractional kernels. The four-kernel form (2.9) with the kernels being sums of fractional kernels is considered in [22].

The inverse problem
We next discuss the inverse problem of determining the kernels from measurements of the displacement u at a finite number of points on the boundary surface. We assume that the domain boundary ∂Ω is the disjoint union of a nonempty Dirichlet part, where homogeneous Dirichlet conditions will be imposed, and a Neumann part carrying the boundary excitation.
We use Tikhonov regularization (see, e.g., [23][24][25]) for recovering the unknown kernels from over-specified data and end up with a PDE-constrained optimization problem: min_{k,u ∈ X×U} J(k, u), (2.11) where the state equation (2.12) is satisfied in a weak sense, with D and g defined in (2.4). The regularity assumptions on the Neumann data h and the initial conditions (u_0, u_1) needed for the state problem to be well-posed will be specified in the upcoming Propositions 3.1 and 3.2.
A large variety of optimization applications exists in the context of viscoelasticity. For this reason, we here consider a general cost function J. In Section 5 below, we will specify some particular choices that can be made in the context of parameter identification problems arising from the estimation of the kernels from additional measurements.

Notation and theoretical preliminaries
Before proceeding with the analysis, we briefly introduce the function spaces and analytical techniques that will be used in the following sections.
We recall the assumptions made in Section 2. For notational brevity, we introduce the Sobolev space that incorporates homogeneous Dirichlet boundary conditions. We often use x ≲ y to denote x ≤ C y, where C > 0 is a generic constant that may depend on the final time. Throughout the paper, ⟨•, •⟩ denotes the duality pairing between H^1_D(Ω) and its dual. Auxiliary results. We recall the useful Leibniz integral rule, integration by parts, and the transposition identity, cf. (2.14) and (2.15), for some Banach space X. The last identity is obtained by changing the order of integration.
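The transposition identity ∫_0^T (k * w)(t) v(t) dt = ∫_0^T w(s) ( ∫_s^T k(t − s) v(t) dt ) ds can be verified numerically by comparing both orderings of the double sum on a grid (a minimal sketch; grid and data are arbitrary):

```python
import numpy as np

# Discrete check of the transposition identity: on the grid, both sides
# are the same double sum over pairs s < t, so they agree to rounding error.
rng = np.random.default_rng(0)
n, dt = 200, 0.01
k, w, v = rng.standard_normal((3, n))

# left-hand side: integrate (k*w)(t) against v(t)
lhs = sum(np.dot(k[i:0:-1], w[:i]) * dt * v[i] for i in range(n)) * dt
# right-hand side: integrate w(s) against the "transposed" convolution of v
rhs = sum(w[s] * np.dot(k[1:n - s], v[s + 1:]) * dt for s in range(n)) * dt
```

This is exactly the Fubini-type change of order of integration invoked in the text, written as a reordering of a finite double sum.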

Analysis of a nonlocal viscoelastic equation
In this section, we analyze the state problem (2.12) associated with our inverse problem, which can be written in terms of u only as (3.1), with g defined in (2.4), where the Neumann boundary condition results from the traction condition σn = h.
The well-posedness analysis follows in the spirit of the arguments in [20,19]; the main novelty arises from handling the k_σ term, which is not present in these references as a separate kernel, possibly different from the other kernels. In particular, we need to distinguish two cases in our well-posedness analysis based on whether the kernel k_σ is singular or not. This condition will influence the regularity of u.

Proposition 3.1. Let assumptions (2.5) and (2.6) hold. Then there exists a unique solution, and it satisfies the stated energy estimate.

Proof. The proof follows by employing a Galerkin discretization in space based on the eigenvectors {φ_j}_{j≥1} of the associated bilinear form; cf. [20, Theorem 4.1]. The basis can be chosen as orthonormal in the weighted Lebesgue space. We seek an approximate solution as a linear combination of basis functions. The semi-discrete problem can then be rewritten as a system for i = 1, ..., n and t ∈ [0, T]. In vector notation ξ = [ξ_1 ... ξ_n]^T, we obtain (3.4) with the right-hand side vector given in terms of ξ^0 and ξ^1, the vectors of coordinates of u^n(0) and u^n_t(0), respectively, in the basis. Then, since we have assumed that k_σ(0) > −1, we can rewrite equation (3.4) as a system of Volterra integral equations of the second kind, (3.5). According to existence theory for Volterra integral equations of the second kind (see, e.g., [26, Ch. 2, Theorem 3.5]), there exists a unique χ ∈ L^1(0, T) which solves (3.5). Combined with the imposed initial conditions, we recover a unique ξ ∈ W^{2,1}(0, T) and thus u^n ∈ W^{2,1}(0, T; H^1_D(Ω)). Note that the combined function z = k_σ * u^n_t + u^n then inherits this regularity as well, due to (2.13) and the identities above, which we will use in the energy analysis below. For ease of notation, we omit the superscript n in the semi-discrete solution from now on.
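The Volterra structure used in this existence argument can be illustrated by a minimal scalar solver for equations of the second kind, χ(t) = f(t) + ∫_0^t K(t − s) χ(s) ds (a sketch with a trapezoidal rule; all names are illustrative and unrelated to the Galerkin system above):

```python
import numpy as np

def solve_volterra2(K, f, T, n):
    """Time stepping (trapezoidal rule) for the Volterra equation of the
    second kind  chi(t) = f(t) + int_0^t K(t - s) chi(s) ds  on [0, T]."""
    dt = T / n
    t = np.linspace(0.0, T, n + 1)
    chi = np.zeros(n + 1)
    chi[0] = f(t[0])
    for i in range(1, n + 1):
        # explicit part of the trapezoidal sum
        acc = 0.5 * K(t[i]) * chi[0] + np.dot(K(t[i] - t[1:i]), chi[1:i])
        # the implicit diagonal term 0.5*dt*K(0)*chi_i is moved to the left
        chi[i] = (f(t[i]) + dt * acc) / (1.0 - 0.5 * dt * K(0.0))
    return t, chi

# With K = -1 and f = 1, the equation is equivalent to chi' = -chi, chi(0) = 1,
# whose solution is chi(t) = exp(-t).
t, chi = solve_volterra2(lambda s: -np.ones_like(s), lambda s: 1.0, 2.0, 2000)
```

The key feature exploited in the proof, that second-kind equations with integrable kernels have unique solutions depending continuously on the data, is visible here in the well-conditioned division by 1 − 0.5 dt K(0).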
Energy analysis. Testing the semi-discrete problem (3.8), written in terms of z = k_σ * u_t + u, and integrating in space and time leads at first to the identity (3.9). By Hölder's inequality and Young's inequality xy ≤ x² + ¼ y², we can estimate the right-hand side terms. Employing integration by parts with respect to time and recalling the definition of z, we can then employ our assumption (2.8) on the kernels, which guarantees nonnegativity of the resulting kernel term.
We can further use assumption (2.7) to conclude nonnegativity of the remaining time-integrated term. Thus, from (3.9), by relying on the assumptions on ρ and C, we obtain the estimate (3.10), at first in the semi-discrete setting; it can be extended to the continuous setting via standard (weak) compactness arguments.
Applying the L^∞(0, T; L²(Ω)^d) norm to (3.6), with monotonically decreasing kernel k_σ, and using the estimate derived above, we can obtain an additional estimate on z_tt as follows.
From here, using the identity relating z and u together with u_t(t = 0) = 0, we can also take p = ∞.
Standard compactness arguments then lead to a solution of the state problem attaining the initial conditions u(0) = u_0 and u_t(0) = u_1. Uniqueness follows by proving that the only solution of the homogeneous problem, with zero initial and boundary data, is u = 0. To prove uniqueness, we cannot test the above weak form with z_t as in the semi-discrete setting, due to its insufficient spatial regularity. Instead, following, e.g., [27, Ch. 7.2], we fix 0 ≤ s ≤ T and take a suitable test function v. Since v(s) = 0 and (with zero initial data) z_t(0) = 0, taking this test function in (3.12) and integrating by parts with respect to time yields the desired identity. Due to our assumptions on D, we conclude from (3.13) that z = 0, and thus, together with u|_{t=0} = 0, that u = 0.
We next consider the well-posedness of the state problem for a singular k_σ. Note that the analysis will lead to the same estimate (3.10) for the combined quantity z, but the resulting estimate for u crucially depends on whether k_σ is singular or not. We also point out that even for a singular kernel k_σ(t) ∼ t^{−γ} as t → 0, where (k_σ * u_tt)_t behaves similarly to a Riemann-Liouville derivative of order γ of u_tt, no initial condition on u_tt is imposed. This is in line with the theory of Riemann-Liouville fractional ODEs; see, e.g., [28,29], where initial conditions are imposed on the fractional derivatives up to one order less than the order of the fractional ODE, which here is γ < 1, resulting in no initial conditions. Since k_σ is singular, we can now introduce an auxiliary kernel, defined in (3.14); the fact that it belongs to L^1(0, T) is due to Tauberian theorems [30, Theorems 2, 3, Chapter XIII.5]. Additionally, we will assume (3.15). Proposition 3.2. Given T > 0, let assumptions (2.5), (2.6) and assumptions (2.7)-(2.8) on the involved kernels hold. Assume that k_σ ∈ L^1(0, T) is singular; that is, there exists γ ∈ (0, 1] such that k_σ(t) ∼ t^{−γ} as t → 0, with additionally (3.16) being satisfied and (3.15) holding for the kernel defined in (3.14). Furthermore, assume that k_ε, k_tr ε ∈ L^1(0, T). Consider problem (3.1) with the initial conditions as above, and assume that the source term and boundary data satisfy the assumptions of Proposition 3.1. Then there exists a unique solution.
The solution satisfies the corresponding a priori estimate. Proof. The proof in the singular case follows via a Galerkin approximation similarly to before. However, to prove existence of the solution of the semi-discrete problem (3.3), we use the kernel in L^1(0, T) defined in (3.14). We then set a new unknown as in (3.17), with ξ^0 and ξ^1 being the coordinates of u^n(0) and u^n_t(0) in the basis, respectively. Equation (3.4) can thus be rewritten as an equation for χ, which we can see as a system of Volterra integral equations of the second kind. Similarly to the proof of Proposition 3.1, existence theory for Volterra integral equations of the second kind yields a unique χ ∈ L^1(0, T) and thus z, u ∈ W^{2,1}(0, T; H^1_D(Ω)).
Energy analysis. We can derive the uniform estimate (3.10) for z as before. In order to conclude additional regularity of u, we use the fact that the auxiliary function v = k_σ * u_t, which vanishes at t = 0, satisfies, along with its derivatives, a Volterra equation. We then take the L^∞(0, T; L²(Ω)^d) norm of the resulting identities. To extract additional temporal regularity of u from these bounds, we consider k_σ * u_t = v as an Abel integral equation for u_t and strengthen the assumption k_σ ∼ t^{−γ} for t → 0 to (3.16). Then from, e.g., [31, Theorem 1], we can conclude the claimed regularity. The rest of the arguments follow as in Proposition 3.1.
Remark 2. Note that the use of a Hilbert space argument such as [31, Theorem 1] in the last part of the proof of Proposition 3.2 incurs a loss of sharpness compared to the best possible result; moreover, the differentiability assumption on the kernel can probably be relaxed.
We note that higher spatial regularity of the solution can be obtained in case the Neumann part of the boundary is empty; a discussion of this case is included in Appendix A.

The adjoint problem
In order to allow for efficient gradient computation in the iterative solution of the optimization problem (2.11)-(2.12), we next wish to derive the adjoint problem. Recall that we are considering the optimization problem min_{k,u ∈ X×U} J(k, u) such that u solves (3.1), where k = (k_σ, k_ε, k_tr ε) and U is the solution space for u; cf. Propositions 3.1 and 3.2. Here, the cost function J is a general Fréchet differentiable mapping from X × U to R. In Section 5 it will be specified for the purpose of parameter identification, but more generally one could think of other criteria to be optimized for, such as certain properties of the state u or of the kernels k_σ, k_ε, and k_tr ε. The Lagrange function is defined accordingly. In order to derive first-order optimality conditions by setting all derivatives of L to zero, we use a solution space that incorporates not only the homogeneous boundary conditions but also the initial data, by writing u = {u_0 + tu_1} + ũ. Formally writing the first-order optimality conditions at (k, u, p) = (k, {u_0 + tu_1} + ũ, p), the fifth and fourth conditions yield the state and adjoint equations, respectively. We will now focus on the second-to-last condition, that is, the adjoint equation (4.1) for all v ∈ Ũ, which is supposed to uniquely determine the adjoint state p ∈ P, with the test space P ⊆ L²(0, T; L²(Ω)^d) yet to be determined. Knowing that the adjoint state p will solve a backwards-in-time PDE with end conditions at T, we make a time flip right away and write p̃(t) = p(T − t). This allows us to use the elementary integration by parts and transposition identities (2.14) and (2.15). Therefore, in case of a singularity of k_σ at t = 0, we need to impose p_t(0) = 0. Furthermore, we define ∇_u J(k, u) ∈ Ũ* by the corresponding variational equation. Altogether, (4.1) becomes a variational identity; since v is arbitrary, this implies that p solves (in the weak sense) the adjoint problem (4.2). This way, we can not only characterize a minimizer of
(2.11), (2.12), but also compute the gradient of the reduced cost function j(k) = J(k, u(k)), where u(k) solves (2.12), by using the fact that j(k) = L(k, u(k), p(k)). By means of the chain rule, we further obtain the gradient expression; here we have used the fact that u(k) and p(k) solve the state and adjoint equations, respectively. The derivatives with respect to the remaining kernels follow analogously. Well-posedness of the adjoint problem (4.2) follows from Proposition 3.2 in case of a singular kernel k_σ, provided the corresponding regularity conditions hold.

Estimation of the kernels from additional observations
In this section, we turn to the inverse problem of parametrizing the model, that is, of determining the kernels. For this purpose, we will need additional observations of the state u. Since these are given in addition to the initial and boundary conditions, they are usually referred to as over-specified data.
By taking Laplace transforms, we have shown in Remark 1 that, at least in the isotropic case (2.10), the two- and three-kernel formulations are equivalent. Therefore, without loss of generality, concerning parameter identification we will focus on the two-kernel model (formally setting k_σ = 0): (5.1). Applying Tikhonov regularization to the kernel identification problem mentioned above, we end up with a PDE-constrained optimization problem of the form (2.11) under the constraint (5.1).
After having fixed the PDE constraint (5.1), we now define the cost function J in such a way that it takes into account the observations as well as some additional regularization that might be needed in order to solve this infinite-dimensional parameter identification problem. Measurements may be taken for multiple excitations, for example a pulling and a shearing experiment, in order to distinguish between k_ε and k_tr ε. To this end, note that the derivation from Section 4 easily extends to the case of multiple PDE constraints (5.1) that arise from considering states resulting from multiple different excitations f^n, n = 1, ..., N. The corresponding overposed data are measurements u_i^{n,meas}(t), t ∈ (0, T), of the state u^n at I given points x_i^n (typically on the boundary), i ∈ {1, ..., I}. In case of limited regularity of u^n (note that, according to Propositions 3.1 and 3.2, we have u^n ∈ L^∞(0, T; H^1_D(Ω))), point evaluations in space cannot be justified. We therefore consider a variant that locally averages over a boundary patch concentrated at x_i^n, possibly weighted with some L^∞ weight. Resulting examples of cost functions (abbreviating k = (k_ε, k_tr ε)) are then (5.2) for point measurements or (5.3) for locally averaged observations, where R_ε, R_tr ε are regularization functionals and γ_ε, γ_tr ε > 0 are the corresponding regularization parameters. Before specifying these, we point out that we can already compute the derivatives of J with respect to the state, as needed for defining the adjoint state according to (4.2); we refer to [32, Chapter 15] for well-definedness of the trace.
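A fully discrete version of such a tracking-type cost with a simple Tikhonov term might look as follows (a sketch only; the paper's J is stated abstractly, so the discretization and all names here are assumptions):

```python
import numpy as np

def tracking_misfit(u_sim, u_meas, dt, k, gamma):
    """Tracking-type misfit  J = 0.5 * sum_i int_0^T (u_i - u_i^meas)^2 dt
    plus a simple Tikhonov term  0.5 * gamma * int_0^T k^2 dt.
    u_sim, u_meas: (I, n_t) arrays of time series at the observation points;
    k: discretized kernel (rectangle rule in time throughout)."""
    data_term = 0.5 * np.sum((u_sim - u_meas) ** 2) * dt
    reg_term = 0.5 * gamma * np.sum(k ** 2) * dt
    return data_term + reg_term

dt = 0.01
u_meas = np.zeros((3, 100))
u_sim = np.ones((3, 100))
J0 = tracking_misfit(u_meas, u_meas, dt, np.zeros(100), 1.0)  # perfect fit
J1 = tracking_misfit(u_sim, u_meas, dt, np.zeros(100), 1.0)
```

The locally averaged variant (5.3) would simply replace each time series by a weighted average of the state over the corresponding boundary patch before forming the misfit.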
The definition of the regularization terms γ_ε R_ε(k_ε), γ_tr ε R_tr ε(k_tr ε) is closely related to the choice of function spaces for the kernels. A canonical choice of space for the kernels is L^p(0, T) with p > 1 close to one, since this space is still reflexive and also accommodates singular kernels as in Proposition 3.2. This choice allows forward solutions that are regular enough to admit locally averaged measurements according to (5.3) (but not point measurements as in (5.2); for these we would need u ∈ L²(0, T; H^s(Ω)) with s large enough that H^s(Ω) embeds continuously into the space of continuous functions). Well-definedness of the forward operator then allows us to apply regularization theory in (reflexive) Banach spaces (see, e.g., [33][34][35] and the references therein) with quite general regularization functionals R_ε, R_tr ε. A Hilbert space setting is most convenient for defining a gradient, though.
Thus, in place of L^p(0, T), we alternatively consider a weighted L² space X = L²_w(0, T) with a weight function w that vanishes at t = 0 (at a certain rate as t → 0) if k_ε is expected to have a singularity stronger than t^{−1/2}. The simplest example of a regularization term is then just the squared weighted L² norm, but one might also make more sophisticated choices, such as promoting closeness to a multi-term fractional derivative kernel with as few components as possible. As a matter of fact, in our numerical experiments we did not need to add any regularization term, as the problem appears to be only mildly ill-posed, so that the regularization induced by the discretization of the kernel was sufficient to deal with even high noise levels in the data.
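A discrete version of such a weighted-L² regularization term could read as follows (the weight w(t) = t^β is an illustrative choice consistent with a weight vanishing at t = 0):

```python
import numpy as np

def weighted_l2_reg(k, t, dt, beta=0.5):
    """R(k) = 0.5 * int_0^T w(t) k(t)^2 dt with the weight w(t) = t^beta
    vanishing at t = 0, so that kernels with a singularity stronger than
    t^(-1/2) remain admissible (the choice w(t) = t^beta is illustrative)."""
    return 0.5 * np.sum(t ** beta * k ** 2) * dt

# Midpoint grid on (0, 1]; for k = 1, R = 0.5 * int_0^1 sqrt(t) dt = 1/3
n = 10000
dt = 1.0 / n
t = (np.arange(n) + 0.5) * dt
R = weighted_l2_reg(np.ones(n), t, dt)
```

Working on a grid that avoids t = 0 also sidesteps the evaluation of singular kernels at the origin.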
Starting from the expression (4.3), we can then define the gradient of the reduced cost function by using the weighted inner product and the corresponding variational equation, where the function ∇R_ε(k_ε)(t) is itself determined by a variational equation. Note that in case of a strongly singular kernel k_ε, the gradient will also contain a singularity via the weight. Likewise, we obtain the gradient with respect to k_tr ε. Remark 3 (On uniqueness). To discuss uniqueness of the identification of two kernels from two additional observations, we consider the equivalent reformulation in terms of the deviatoric and hydrostatic parts of the strain and focus on the case of A and k_tr ε being scalar multiples of C and k_ε, respectively. An example of this is the isotropic case (5.5). Additionally, we assume a separable source term, so that we arrive at the form (5.6). Moreover, we consider an idealized setting in which we can excite the system via two different eigenpairs (λ_1, φ_1), (λ_2, φ_2) of the elliptic operator (equipped with mixed Dirichlet-Neumann boundary conditions), which is selfadjoint and positive with respect to the weighted L² inner product (v_1, v_2) = ∫ ρ v_1 • v_2 dx. We assume the eigenfunctions to be orthogonal to each other as well as normalized with respect to this inner product, and take measurements of the displacements resulting from excitations (5.6) with φ_k. Integration by parts with respect to space, the eigenvalue equation, and the orthogonality (φ_1, φ_2) = 0 then yield, due to linearity, u^(i)(t, x) = u_i(t)φ_i(x). Assuming that T = ∞, we can apply the Laplace transform; by injectivity of the Laplace transform, this yields uniqueness of k_i, i ∈ {1, 2}.
Of course, this setting is heavily idealized, but can be expected to be achieved -at least approximately -in the isotropic case (5.5) when excitation is done via longitudinal and shear forces, respectively.This is the setup we consider in our numerical experiments.

Numerical simulations
Let us consider a three-dimensional rectangular viscoelastic beam of size 1 × 0.1 × 0.04. The beam is clamped on the left end and excited on the right during the time interval [0, t_load], with t_load = 0.8, and then released. The final time is T = 4. In our experiments, we consider two types of load, bending in the (x_1, x_2) plane and uniaxial extension (x_1-direction), respectively:

B(t) = [0, B_0 t/t_load, 0], (6.1)
T(t) = [T_0 t/t_load, 0, 0], (6.2)

where the magnitudes are B_0 = 1 and T_0 = 100. The Young's modulus is 10³ and the Poisson ratio is 0.3. There is no external volumetric force. The beam is at rest at the initial time. Thus, the dynamical system is given by the balance equation (2.1) and the constitutive law (2.3). We employ the Newmark method (β = 0.25, γ = 0.5) for time-stepping, in combination with a sum-of-exponentials kernel representation; see, e.g., [36,37,47]. More precisely, we look for the kernels k_ε, k_tr ε in sum-of-exponentials form with unknown weights w_ε,k, w_tr ε,k and exponents λ_ε,k, λ_tr ε,k. For simplicity, we have fixed m_ε = m_tr ε. Let us denote by θ := {w_ε,k, w_tr ε,k, λ_ε,k, λ_tr ε,k} the vector of all learnable parameters. We assume that measurements of the tip displacement (more precisely, the mean displacement over the right end face) are available at some discrete time points in the interval [0, t_meas], with t_meas = T/2. We minimize the objective function with respect to the parameters θ, using the ℓ²-norm on the discretized interval [0, t_meas].
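One way to generate such a sum-of-exponentials representation is to discretize the integral representation of the fractional kernel, t^{α−1}/Γ(α) = (sin(πα)/π) ∫_0^∞ e^{−λt} λ^{−α} dλ. This is a quadrature-based alternative to the AAA rational approximation used below; the quadrature range and mode count here are illustrative:

```python
import numpy as np
from math import gamma, pi, sin

def soe_fractional_kernel(alpha, m=80, u_min=-26.0, u_max=12.0):
    """Weights w_k and exponents lam_k so that sum_k w_k * exp(-lam_k * t)
    approximates t^(alpha-1)/Gamma(alpha).  Uses the representation
    t^(alpha-1)/Gamma(alpha)
        = sin(pi*alpha)/pi * int_0^inf exp(-lam*t) * lam^(-alpha) dlam
    with the substitution lam = e^u and the midpoint rule in u."""
    du = (u_max - u_min) / m
    u = u_min + (np.arange(m) + 0.5) * du
    lam = np.exp(u)
    w = sin(pi * alpha) / pi * np.exp((1.0 - alpha) * u) * du
    return w, lam

alpha = 0.7
w, lam = soe_fractional_kernel(alpha)
t = np.array([0.1, 0.5, 1.0, 4.0])
approx = np.exp(-np.outer(t, lam)) @ w
exact = t ** (alpha - 1) / gamma(alpha)
```

The exponential modes make the convolution compatible with recursive (local-in-time) updates in the Newmark scheme, since each mode satisfies a scalar ODE in time.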
We use the FEniCS platform [38] for the finite element implementation of the forward problem and the dolfin-adjoint package [39] for the adjoint problem, in combination with PyTorch [40] through the Torch-FEniCS interface [41]. The minimization is performed using the LBFGS method [42] with the strong Wolfe line search. The beam is discretized using 60 × 10 × 5 linear tetrahedral elements (Fig. 1). The time interval is discretized uniformly with step size Δt = T/100.
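The calibration loop can be sketched in a few lines; here scipy's L-BFGS-B stands in for the PyTorch LBFGS optimizer, and a toy closed-form forward map replaces the FEniCS solver (everything in this snippet is illustrative):

```python
import numpy as np
from scipy.optimize import minimize

# Toy stand-in for the FEniCS forward solver: "tip displacement" of a
# single-mode model with one exponential kernel, parameters theta = (w, lam).
t = np.linspace(0.0, 2.0, 200)

def forward(theta):
    w, lam = theta
    return w * np.exp(-lam * t)   # stand-in for u_tip(theta)

theta_true = np.array([2.0, 1.5])
rng = np.random.default_rng(1)
u_meas = forward(theta_true) + 0.02 * rng.standard_normal(t.size)

def loss(theta):
    # discrete l2 misfit of the tip displacement against noisy measurements
    return 0.5 * np.sum((forward(theta) - u_meas) ** 2)

res = minimize(loss, x0=np.array([1.0, 1.0]), method="L-BFGS-B")
```

In the actual setup, `forward` is the Newmark/finite-element solve and the gradient of `loss` is supplied by dolfin-adjoint rather than by finite differences.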
Here, we have been assuming that the measurement data are given at all discrete time points of the time discretization scheme in the interval [0, t_meas]. Moreover, in our synthetic setting, we assume that for a given θ*, it holds that u^meas_tip = u_tip(θ*) + η, where η is additive white Gaussian noise. We consider in our numerical test cases different noise levels ranging from 2% to 8%. First, we consider the case k_ε(t) = k_tr ε(t) = k(t). We define the target kernel k(t) = k_true(t) as a sum of exponentials giving an accurate (22-mode) approximation of the fractional kernel t^{α−1}/Γ(α) with α = 0.7. The expansion is obtained by a rational approximation of the Laplace spectrum using the AAA algorithm [43]. Using this kernel, we numerically generate synthetic measurements of the tip displacement on the interval [0, t_meas], using only the bending-type load (6.1). Then, we predict the kernel k(t) = k_pred(t) only from these measurements. To do this, we infer the parameters θ = {w_k, λ_k} minimizing J(θ). We use a rational approximation (moderate accuracy, with 8 modes) of the fractional kernel t^{α_0−1}/Γ(α_0) with α_0 = 0.5 as the initial guess.
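Synthetic noisy measurements at a prescribed relative noise level can be generated, e.g., as follows (one common scaling convention; the paper does not spell out its exact noise normalization, so this is an assumption):

```python
import numpy as np

def add_noise(u_tip, level, seed=0):
    """Additive Gaussian noise eta scaled so that ||eta|| = level * ||u_tip||,
    i.e. level = 0.02 corresponds to 2% relative noise."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal(u_tip.shape)
    return u_tip + level * np.linalg.norm(u_tip) / np.linalg.norm(eta) * eta

u = np.sin(np.linspace(0.0, 4.0, 100))   # stand-in for a clean tip displacement
u_noisy = add_noise(u, 0.02)
rel = np.linalg.norm(u_noisy - u) / np.linalg.norm(u)
```

With this convention, the realized relative error equals the prescribed level exactly, which makes noise-level studies such as the 2% to 8% sweep reproducible.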
In Fig. 2, the resulting tip displacements are plotted on the complete interval [0, T] for the models calibrated from noisy data with distinct noise levels. We observe that fitting the noisy data quite accurately on a shorter interval results in a solution that deviates slightly from the truth when considered on a larger time interval. This can be explained by the fact that the measurement interval [0, t_meas] is not sufficiently large to accurately fit the tail of the kernel. Indeed, in Fig. 3, where the resulting kernels are compared with the target, we can see that the prediction fits the target at mid-range but is less accurate in the tail. In addition, we measured the error between the target and the predicted kernels in the L¹([Δt, t_meas]) norm: 2% noise: 0.032207; 4% noise: 0.085839; 6% noise: 0.144768; 8% noise: 0.159670.
The convergence of the loss function during the optimization process is depicted in Fig. 4. There we can see that, for all noise levels, the calibration process converges after 8 optimization steps. Moreover, the minimum of the loss function grows with the noise level. The evolution of the energy in time is depicted in Fig. 5 for the case of a 2% noise level. We observe that the energy of the calibrated model accurately fits the truth on the measurement interval [0, t_meas]. Moreover, the calibrated kernel allows predicting the subsequent evolution.
Next, we consider the case when k_tr ε^true(t) is given by a 22-mode rational approximation of t^{α_1−1}/Γ(α_1) with α_1 = 0.9, and k_ε^true(t) is given by a 22-mode rational approximation of t^{α_2−1}/Γ(α_2) with α_2 = 0.7. We use an 8-mode rational approximation of t^{α_0−1}/Γ(α_0) with α_0 = 0.5 as the initial guess for both kernels. Using these kernels, we numerically generate synthetic measurements of the tip displacement on the interval [0, t_meas], using both loading types, bending (6.1) and extension (6.2), and adding noise of level 2%. Then, we predict the kernels k_tr ε^pred(t) and k_ε^pred(t) from these measurements with the number of modes m_tr ε = m_ε = 8. To do this, we infer the parameters θ = {w_tr ε,k, λ_tr ε,k, w_ε,k, λ_ε,k} minimizing the loss function

J(θ) = ‖u_tip^bend(θ) − u_tip^bend,meas‖² + ω ‖u_tip^ext(θ) − u_tip^ext,meas‖²,

where the superscripts bend and ext correspond to the bending and extension excitation types, respectively; the norm is again the ℓ²-norm on the discretized interval [0, t_meas]. Above, the weight ω is introduced to balance the contributions of the bending and extension terms. In our experiment, we fix ω = 10 in order to take into account the tip displacement measurements under extension after the time t_meas, which have a scale an order of magnitude smaller than the global scale of u_tip^ext,meas on [0, t_meas] (see Fig. 6b). In Fig. 6, the norm of the resulting tip displacements is plotted on the complete interval [0, T] for the model calibrated from the noisy data. Again, we can see that the calibrated model can predict the evolution after the time t_meas. In Fig. 7, where the resulting kernels are compared with the target, we can observe that providing measurements for different loading types results in a good approximation of the integral kernels.

Declaration of competing interest
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Barbara Kaltenbacher reports financial support was provided by the Austrian Science Fund FWF under the grants P30054 and DOC 78. Ustim Khristenko and Barbara Wohlmuth report financial support was provided by the Deutsche Forschungsgemeinschaft (DFG) within the project WO 671 11-1. Mabel Lizzy Rajendran reports financial support was provided by a Laura Bassi Postdoctoral Fellowship (Technical University of Munich).

By using the L^∞(0, T; H¹) estimate on u that we obtain from both Propositions 3.1 and 3.2, we further have k_ε * Aε(u_t) + k_tr ε * I tr ε(u_t) ∈ L^∞(0, T; L²(Ω)^{d²}).

Appendix B. Auxiliary inequalities
Analogously to [44, Lemma 1], one can prove the following lower bound.

Fig. 1. 3D beam mesh. Color denotes the displacement magnitude at t = t_load, computed using the target kernel. (For interpretation of the colors in the figure(s), the reader is referred to the web version of this article.)

Fig. 2. Evolution of the tip displacement comparing the "true" model and the model calibrated from the noisy data with noise levels 2, 4, 6, 8%.

Fig. 5. Evolution of the kinetic and elastic energies for the "true" and calibrated models (noise level 2%).

Fig. 6. Evolution of the tip displacement comparing the "true" model and the model calibrated from the noisy data with noise level 2%, using two loading types: (a) bending, (b) extension.

Fig. 7. Comparison of the resulting kernels k_tr ε(t) (left) and k_ε(t) (right) in log scale.