## Introduction

The Competence Centre on Modelling (CC-MOD) of the Joint Research Centre (Directorate General of the European Commission) has been developing the present toolbox in order to help modellers perform uncertainty analysis and sensitivity analysis of their models.

The issue of uncertainty: Computer models can be highly parameterised and subject to different plausible assumptions. Therefore, defining the precise value of each input variable can be challenging. Meanwhile, good modelling practice requires that uncertainty in model inputs be accounted for in simulation studies. GSA (Global Sensitivity Analysis) can facilitate this task by pointing out those uncertain input variables (denoted $\mathbf{x}=(x_1,\dots,x_{d})$) mostly responsible for the model output (denoted $y$) uncertainty. Knowing this, modellers can focus their attention on the sensitive variables.

## About SIML@B

SIML@B provides a set of online tools to perform uncertainty analysis and sensitivity analysis (UASA) of model output. The diagram below provides a classification of some existing methods to perform global sensitivity analysis. For the time being, only the methods in the green boxes are implemented. While the data-driven methods only require an input/output Monte Carlo sample, design-driven methods require a specific design of experiments, which is provided by the module Sampler. See the tabs Moment-Free, Variance-Based and Screening for a short description of the methods and the definition of the associated sensitivity indices. By clicking on a green box, you will be redirected to the corresponding R-Shiny online tool, from which you can upload your dataset and perform the analysis.

Requirements for using SIML@B: The dataset must be an array $[\mathbf{X},\mathbf{y}]$ of size $N\times(d+1)$. The first $d$ columns must contain the Monte Carlo sample of $\mathbf{x}$ (i.e. the input vector), and the last column the corresponding values of the model response $y$ (i.e. the scalar output). These data, stored in a csv file (separator ; or ,) or a txt file (tab-separated) on the user’s computer, may contain the variables’ names in the first row. Note that Excel files (xls, xlsx,…) are not supported.
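Before uploading, the expected layout can be checked locally. The following is a minimal sketch in Python, not part of SIML@B itself: the function name `load_dataset` and the way the separator is guessed (the most frequent of `;`, `,` and tab in the first line) are assumptions made for illustration.

```python
import csv
import io

import numpy as np


def load_dataset(text, has_header=True):
    """Parse a delimited table into (names, X, y).

    Hypothetical helper: the separator (';', ',' or tab) is guessed as the
    one that occurs most often in the first line.
    """
    first_line = text.splitlines()[0]
    delimiter = max(";,\t", key=first_line.count)
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    names = rows[0] if has_header else None
    data = np.array(rows[1:] if has_header else rows, dtype=float)
    if data.ndim != 2 or data.shape[1] < 2:
        raise ValueError("expected at least one input column and one output column")
    # First d columns: input sample X; last column: model response y.
    return names, data[:, :-1], data[:, -1]


sample = "x1;x2;y\n0.1;0.2;0.3\n0.4;0.5;0.9\n0.7;0.8;1.5\n"
names, X, y = load_dataset(sample)
print(names, X.shape, y.shape)
```

Here `X` has shape $N\times d$ and `y` has length $N$, which matches the $N\times(d+1)$ array described above.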

Dependent vs independent inputs: In principle, UASA can be carried out whether the input variables are dependent on each other or not. However, sensitivity indices are easier to interpret in the case of independent input variables. Mara and Becker (2021) show that any GSA method can be applied to the case of dependent inputs provided that the latter are first transformed into independent variables (with one of the possible Rosenblatt transformations; see also tab GSA). Therefore, only datasets with an independent sample $\mathbf{X}$ should be uploaded to SIML@B.

## Diagram comments (see diagram above)

Sensitivity indices: To perform sensitivity analysis, one has to compute some statistical measure of importance called sensitivity index. The sensitivity index to be estimated depends on the purpose of the analysis. Sensitivity indices for GSA can be roughly classified into three categories: i) Screening, ii) Variance-Based and iii) Moment-Free.

Screening methods: These methods are used in the factor fixing setting. They are also called qualitative methods because they do not allow the input variables to be ranked by order of importance. The Morris method (1991) is an instance of a screening method. See tab Screening for more details.

Variance-based methods: Variance-based methods come from the work of Ilya M. Sobol' (1993). They assess the so-called Sobol' indices which are the quantitative measure of importance of interest. These indices stem from the analysis of variance (ANOVA decomposition) of $y$. See tab Variance-Based for more details.

Moment-free methods: Moment-free methods do not rely on any specific moment of $y$ (like its variance). They quantitatively assess the difference between the unconditional random variable $y$ and the conditional random variable $y\vert x_i$. One such quantitative measure of importance is the one proposed by Borgonovo (2007). See tab Moment-Free for more details.

## References

• Rosenblatt M., Remarks on a multivariate transformation, Ann. of Math. Stat., Vol. 23(3), 470-472, 1952.
• Morris M.D., Factorial sampling plans for preliminary computational experiments, Technometrics, Vol. 33(2), 161-174, 1991.
• Sobol' I.M., Sensitivity estimates for nonlinear mathematical models, Math. Mod. and Comput. Exp., Vol 1, 407-414, 1993.
• Borgonovo E., A new uncertainty importance measure, Reliab. Eng. & Syst. Saf., Vol. 92, 771-784, 2007.
• Mara T.A. & Becker W.E., Polynomial chaos expansion for sensitivity analysis of model output with dependent inputs, Reliab. Eng. & Syst. Saf., SAMO Special Issue, 2021.

## What is GSA?

GSA: GSA is the acronym for Global Sensitivity Analysis, as opposed to Local Sensitivity Analysis (LSA). In LSA, the input variables are assumed to be known fairly well, so that their associated uncertainty is narrow. Therefore, the relative importance of the input variables can be judged from, among others, the estimated Pearson correlation coefficients or Spearman rank correlation coefficients. By contrast, in GSA the input variables can be largely uncertain and are varied simultaneously when propagating their uncertainty through the model response of interest. The consequence is that GSA requires more sophisticated statistical methods to identify important and non-important input variables.

GSA settings: Good practice requires that GSA be framed within a setting that clarifies the objective of the analysis, so that ranking the input variables by order of importance makes sense. Instances of GSA settings are (Saltelli & Tarantola 2002; Saltelli et al. 2004):

• Factor fixing setting: Also called screening, the objective is to identify the non-important variables. In this setting, input ranking does not matter much. Any class of methods in the diagram below can be chosen to address this issue. For variance-based methods, the total-order Sobol' index is to be preferred.
• Factor prioritization setting: What would be the expected reduction in the output variance (or reduction of the uncertainty range) if an input variable (or subset of inputs) were known accurately (or its uncertainty drastically reduced)? In this setting, the input variables can be ranked by order of importance. One possible sensitivity index that addresses this issue is the first-order Sobol' index (variance-based).

GSA methods: There are basically two categories of methods: 1) design-driven and 2) data-driven. The former require a specific design of experiments (i.e. a specific way of generating the input sample). The latter do not require any specific design; any Monte Carlo sample will do.

Monte Carlo: SIML@B requires an input/output Monte Carlo sample. Different samplers have been proposed in the literature: simple random sampling, Latin hypercube sampling, low-discrepancy sequences, orthogonal arrays, etc. Some of them are available in the module Sampler.

Which method to use? The method to be used depends on the sensitivity indices that one wants to compute, which in turn depend on the GSA setting (see workflow below). SIML@B does not address all possible GSA settings. In SIML@B, data-driven methods do not require any specific sampling design, which means that the same (input-output) dataset can be used with the different data-driven methods.

## Independent vs dependent inputs

In the sequel, it is assumed that $y=\mathcal{M}(\mathbf{x})=f(\mathbf{u})$ is the computer model response of interest. The input vector $\mathbf{u}=(u_1,\dots,u_d)$ is a vector of random variables uniformly distributed over the unit hypercube $(0,1)^d$.

The model response is a function of a random vector $\mathbf{x}$ that is arbitrarily distributed. The $u$-variables, uniformly distributed over $(0,1)^d$, can be inferred with one of the following invertible transformations (depending on the case),

• Independent inputs: the $u$-variables are obtained by the probability integral transform, $u_{i_1} = F_{i_1}(x_{i_1})$
• Dependent inputs: the $u$-variables are obtained by one of the possible Rosenblatt transforms (Rosenblatt 1952), $\begin{aligned} u_{i_1} &= F_{i_1}(x_{i_1})\\ u_{i_2} &= F_{i_2\vert i_1}(x_{i_2}\vert x_{i_1})\\ &\;\;\vdots\\ u_{i_d} &= F_{i_d\vert \sim i_d}(x_{i_d}\vert x_{i_1},\dots,x_{i_{d-1}}) \end{aligned}$ where $F_{i_1}$ is the marginal cumulative distribution function (cdf) of $x_{i_1}$ and $F_{i_j\vert i_k,\dots,i_l}$ is the cdf of $x_{i_j}$ conditional on $(x_{i_k},\dots,x_{i_l})$.
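As an illustration of the dependent case, here is a minimal Python sketch of one Rosenblatt transform for a pair of standard normal inputs with correlation $\rho$. The bivariate Gaussian setting, the value $\rho=0.8$ and the function name are assumptions chosen because the conditional cdf is then available in closed form: $x_2\vert x_1\sim\mathcal{N}(\rho x_1, 1-\rho^2)$.

```python
from statistics import NormalDist

import numpy as np

# Standard normal cdf applied element-wise.
_phi = np.vectorize(NormalDist().cdf)


def rosenblatt_gaussian_pair(x, rho):
    """Map a correlated standard-normal pair (x1, x2) to independent uniforms.

    Assumes (x1, x2) is bivariate standard normal with correlation rho.
    """
    x1, x2 = x[:, 0], x[:, 1]
    u1 = _phi(x1)                                        # u1 = F_1(x1)
    u2 = _phi((x2 - rho * x1) / np.sqrt(1.0 - rho**2))   # u2 = F_{2|1}(x2 | x1)
    return np.column_stack([u1, u2])


rng = np.random.default_rng(0)
rho = 0.8
x = rng.multivariate_normal([0.0, 0.0], [[1.0, rho], [rho, 1.0]], size=5000)
u = rosenblatt_gaussian_pair(x, rho)
# Sample correlation before and after the transform.
print(np.corrcoef(x.T)[0, 1], np.corrcoef(u.T)[0, 1])
```

Swapping the roles of $x_1$ and $x_2$ gives the other possible Rosenblatt transform for this pair, which is why the text speaks of "one of the possible" transforms.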

Interpretation of the sensitivity indices: In the case of independent inputs, the interpretation is simple: the sensitivity indices of $u_{i_1}$ are those of $x_{i_1}$. When the inputs are dependent on each other, the interpretation is trickier (see Mara & Tarantola 2012, Mara et al. 2015, Tarantola & Mara 2017, Mara & Becker 2021).

## References

• Rosenblatt M., Remarks on a multivariate transformation, Ann. of Math. Stat., Vol. 23(3), 470-472, 1952.
• Saltelli A. & Tarantola S., On the relative importance of input factors in mathematical models: safety assessment for nuclear waste disposal, J. Am. Stat. Assoc., Vol 97, 702-709, 2002.
• Saltelli A., Tarantola S., Campolongo F. and Ratto M. Sensitivity analysis in practice, Probability and Statistics, Chichester, UK: J. Wiley and Sons, 2004.
• Mara T.A. and Tarantola S., Variance-based sensitivity indices for models with dependent inputs, Reliab. Eng. & Syst. Saf., Vol. 107, 115-121, 2012.
• Mara T.A., Tarantola S., Annoni P., Non-parametric methods for global sensitivity analysis of model output with dependent inputs, Envir. Modell. & Softw., Vol. 72, 173-183, 2015.
• Tarantola S. and Mara T.A., Variance-based sensitivity indices of computer models with dependent inputs: the Fourier amplitude sensitivity test, J. of Uncert. Quantif., Vol. 7, 511-523, 2017.
• Mara T.A. & Becker W.E., Polynomial chaos expansion for sensitivity analysis of model output with dependent inputs, Reliab. Eng. & Syst. Saf., SAMO Special Issue, 2021.

## Theory

Ilya M. Sobol' (1993) shows that it is possible to decompose $f(\mathbf{u})$ (provided it is square-integrable) into summands of different dimension as follows, $f(\mathbf{u}) = f_0 + \sum_{i_1=1}^{d}f_{i_1}(u_{i_1})+\sum_{i_2>i_1}^{d}f_{i_1,i_2}(u_{i_1},u_{i_2})+\dots+f_{1,\dots,d}(u_1,\dots,u_d)$ The decomposition is unique if one further imposes that the functions are orthogonal. The decomposition is then called ANOVA-HDMR.

Under the orthogonality assumption, it is straightforward to show that the total variance of $y=f(\mathbf{u})$ is equal to, $V_y = \sum_{i_1=1}^{d}V_{i_1}+\sum_{i_2>i_1}^{d}V_{i_1,i_2}+\dots+V_{1,\dots,d}$ where $V_{i_1}$ is an individual contribution of $x_{i_1}$ while $V_{i_1,i_2}$ is a mutual contribution of $(x_{i_1},x_{i_2})$ called first-order interaction, etc.

It is usually more convenient to normalise the previous relationship, which leads to the definition of the variance-based sensitivity indices (or Sobol’ indices), $1 = \sum_{i_1=1}^{d}S_{i_1}+\sum_{i_2>i_1}^{d}S_{i_1,i_2}+\dots+S_{1,\dots, d}$

The Sobol’ indices range within $[0,1]$. They measure the amount of the variance of $y$ due to $x_{i_1}$ alone (i.e. $S_{i_1}$) or to its interactions with the other variables (e.g. $S_{i_1,i_2}$, $S_{i_1,i_2,i_3}$). The higher $S_{i_1}$, the more sensitive $y$ is to $x_{i_1}$. It is also convenient to introduce the total-order sensitivity index that captures the overall contribution of $x_{i_1}$ (Homma and Saltelli 1996),

$ST_{i_1} = S_{i_1}+\sum_{i_2\neq i_1}^{d}S_{i_1,i_2}+\sum_{i_3\neq i_1,i_2}^{d}S_{i_1,i_2,i_3}+\dots+S_{1,\dots, d}$ Therefore, if $ST_{i_1}=0$, then $x_{i_1}$ is deemed non-important for the model response.

Bayesian notation: First-order and total-order sensitivity indices can also be defined from the law of total variance, as follows, $S_{i}=\frac{\mathbb{V}[\mathbb{E}[y\vert u_{i}]]}{\mathbb{V}[y]}$ $ST_{i}=\frac{\mathbb{E}[\mathbb{V}[y\vert \mathbf{u}_{\sim i}]]}{\mathbb{V}[y]}$ where $\mathbf{u}_{\sim i}= \mathbf{u}\setminus u_{i}$ denotes all the inputs except $u_i$.

N.B.: It is also possible to define first-order and total-order sensitivity indices for groups of inputs.
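As a hedged illustration (this toy model is an assumption, not taken from the references), consider $f(\mathbf{u})=u_1u_2$ with $d=2$ and independent uniform inputs. The ANOVA-HDMR summands, the partial variances and the sensitivity indices can be worked out by hand:

$$
\begin{aligned}
f_0 &= \tfrac{1}{4}, \qquad f_1(u_1)=\tfrac{1}{2}\left(u_1-\tfrac{1}{2}\right), \qquad f_2(u_2)=\tfrac{1}{2}\left(u_2-\tfrac{1}{2}\right), \qquad f_{1,2}(u_1,u_2)=\left(u_1-\tfrac{1}{2}\right)\left(u_2-\tfrac{1}{2}\right)\\
V_1 &= V_2 = \tfrac{1}{4}\cdot\tfrac{1}{12} = \tfrac{1}{48}, \qquad V_{1,2}=\tfrac{1}{12}\cdot\tfrac{1}{12}=\tfrac{1}{144}, \qquad V_y = \tfrac{1}{9}-\tfrac{1}{16}=\tfrac{7}{144}=V_1+V_2+V_{1,2}\\
S_1 &= S_2 = \tfrac{3}{7}, \qquad S_{1,2}=\tfrac{1}{7}, \qquad ST_1=S_1+S_{1,2}=\tfrac{4}{7}, \qquad ST_2=\tfrac{4}{7}
\end{aligned}
$$

The first-order indices sum to $6/7<1$ because of the interaction term, while the total-order indices sum to $8/7>1$ because $S_{1,2}$ is counted once for each input.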

## Data-driven method

### Polynomial chaos expansion

Polynomial chaos expansion is a spectral method that casts the model response onto an orthonormal polynomial basis, as follows, $f(\mathbf{u})=\sum_{r_1=0}^{+\infty}\dots\sum_{r_d=0}^{+\infty} c_{r_1\dots r_d}\psi_{r_1}(u_1)\dots\psi_{r_d}(u_d)$ where $\psi_{r_{i}}(u_{i})$ is the normalised shifted Legendre polynomial of degree $r_{i}$, since $u_{i}\sim\mathcal{U}(0,1)$. This method was introduced by B. Sudret (2008) for global sensitivity analysis. Indeed, the orthonormality of the polynomials yields the following variance decomposition, $\mathbb{V}[f(\mathbf{u})]=\sum_{r_1=0}^{+\infty}\dots\sum_{r_d=0}^{+\infty} \vert c_{r_1\dots r_d} \vert^2 - \vert c_{0\dots 0} \vert^2$ from which it is straightforward to infer any partial variance of interest (and so, any variance-based sensitivity index).

• First-order partial variance: $\mathbb{V}[\mathbb{E}[y\vert u_{i}]]=\sum_{r_{i}=1}^{+\infty}\vert c_{0\dots r_{i} 0\dots}\vert^2$
• Total-order partial variance: $\mathbb{E}[\mathbb{V}[y\vert \mathbf{u}_{\sim i}]]=\sum_{r_{i_1}=0}^{+\infty}\dots\sum_{r_{i}=1}^{+\infty}\dots\sum_{r_{i_d}=0}^{+\infty}\vert c_{r_{i_1}\dots r_i\dots r_{i_d}}\vert^2$
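The way partial variances are read off the expansion coefficients can be sketched in Python as follows. This is only an illustration under stated assumptions: it uses a full (non-sparse) expansion truncated at a total degree, fitted by ordinary least squares, rather than a sparse Bayesian algorithm, and it assumes the inputs are known to be uniform on $(0,1)$. The names `psi` and `pce_sobol` are hypothetical.

```python
import itertools

import numpy as np
from numpy.polynomial import legendre


def psi(r, u):
    """Normalised shifted Legendre polynomial of degree r on (0, 1)."""
    coeffs = np.zeros(r + 1)
    coeffs[r] = 1.0
    return np.sqrt(2 * r + 1) * legendre.legval(2 * u - 1, coeffs)


def pce_sobol(U, y, degree):
    """Least-squares PCE of total degree <= `degree`; returns (S, ST)."""
    d = U.shape[1]
    multi = [r for r in itertools.product(range(degree + 1), repeat=d)
             if sum(r) <= degree]
    basis = np.column_stack([
        np.prod([psi(r[i], U[:, i]) for i in range(d)], axis=0) for r in multi
    ])
    c, *_ = np.linalg.lstsq(basis, y, rcond=None)
    c2 = c**2
    # Total variance: all squared coefficients except the constant term.
    total_var = sum(v for r, v in zip(multi, c2) if any(r))
    S, ST = np.zeros(d), np.zeros(d)
    for r, v in zip(multi, c2):
        active = [i for i in range(d) if r[i] > 0]
        if len(active) == 1:
            S[active[0]] += v   # terms in which only u_i has non-zero degree
        for i in active:
            ST[i] += v          # every term in which u_i appears
    return S / total_var, ST / total_var


rng = np.random.default_rng(1)
U = rng.random((500, 2))
y = U[:, 0] * U[:, 1]           # toy model, exactly representable at degree 2
S, ST = pce_sobol(U, y, degree=2)
print(S, ST)                    # close to 3/7 and 4/7 for this toy model
```

Because the toy model $u_1u_2$ lies in the span of the chosen basis, the least-squares fit is exact and the indices match the analytical values $3/7$ and $4/7$ up to rounding.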

In practice, only sparse polynomial chaos expansions are investigated. In SIML@B, the Bayesian sparse polynomial chaos expansion algorithm of Shao et al. (2017) is implemented. This method does not need any specific sampling design: a single (quasi-) Monte Carlo sample is enough, the size of which depends on the complexity/smoothness of $f(\mathbf{u})$. Faster convergence can be achieved when working with the transformed input variables $\mathbf{x}$, which are the ones actually fed into the model. Because SIML@B does not know the distribution of the input sample provided, the orthonormal polynomials are derived with the Gram-Schmidt procedure.

## Design-driven methods

### The Fourier amplitude sensitivity test

The Fourier Amplitude Sensitivity Test (FAST) was introduced by Cukier et al. (1973) to compute the variance-based first-order sensitivity index. Later on, Saltelli et al. (1999) extended the approach to the computation of the total-order sensitivity index. FAST exploits the Parseval-Plancherel theorem, that is, $\mathbb{V}[f(\mathbf{u})]=\sum_{r_1=-\infty}^\infty\dots\sum_{r_d=-\infty}^\infty \vert c_{r_1\dots r_d} \vert^2 - \vert c_{0\dots 0} \vert^2$ where $c_{r_1\dots r_d}$ is the Fourier coefficient defined as follows, $c_{r_1\dots r_d}=\frac{1}{2\pi}\int_{0}^{2\pi}f(u_1(s),\dots,u_d(s))e^{-j(r_1\theta_1+\dots+r_d\theta_d)s}\textrm{d}s$ where $j$ is the imaginary unit, and the input variables are varied periodically within $(0,1)^d$, that is, $u_{i}(s)=\frac{1}{2}+\frac{1}{\pi}\arcsin(\sin(\theta_{i} s))$

Although the Fourier coefficients (i.e. the $c_{r_1\dots r_d}$'s) can be estimated by Monte Carlo integration, SIML@B uses the Fast Fourier Transform algorithm of R. FAST requires the selection of a frequency set $\mathbf{\theta}=(\theta_1,\dots,\theta_d)$ composed of incommensurate integer numbers. However, only frequency sets that do not interfere up to a given factor $M$ are feasible in practice.
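The classical FAST recipe described above can be sketched in a few lines of Python. This is an illustrative sketch, not the SIML@B implementation (which relies on R): the function name, the frequency set $(11, 35)$, the interference factor $M=4$ and the additive test model are all assumptions chosen for the example.

```python
import numpy as np


def fast_first_order(f, freqs, M=4):
    """Classical FAST estimate of the first-order sensitivity indices.

    `freqs` is assumed to be a set of integer frequencies whose first M
    harmonics do not interfere with each other.
    """
    freqs = np.asarray(freqs)
    N = 2 * M * int(freqs.max()) + 1          # odd size complying with Nyquist
    s = 2 * np.pi * np.arange(N) / N          # search variable over one period
    # u_i(s) = 1/2 + arcsin(sin(theta_i * s)) / pi, one column per input
    U = 0.5 + np.arcsin(np.sin(np.outer(s, freqs))) / np.pi
    y = f(U)
    power = np.abs(np.fft.fft(y) / N) ** 2    # |c_k|^2 for k = 0, ..., N-1
    total_var = power[1:].sum()               # Parseval: everything but k = 0
    harmonics = np.arange(1, M + 1)
    partial = [2 * power[theta * harmonics].sum() for theta in freqs]
    return np.array(partial) / total_var


# Additive toy model with analytical first-order indices 1/5 and 4/5.
S = fast_first_order(lambda U: U[:, 0] + 2 * U[:, 1], freqs=[11, 35], M=4)
print(S)
```

Truncating at $M$ harmonics discards a small share of each partial variance, so the estimates are expected to sit slightly below the analytical values.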

- First-order partial variance with scrambled FAST

The random balance design (RBD) trick introduced in Tarantola et al. (2006) overcomes the issue of selecting the frequency set. The idea is to select one single frequency (assuming no variable grouping), say $\theta=1$, and to randomise the search path $s$. One merely generates $d$ random search paths $(s_1(\omega),\dots,s_d(\omega))$ and samples the $u$-variables as follows, $u_{i}(\omega)=\frac{1}{2}+\frac{1}{\pi}\arcsin(\sin(s_{i}(\omega)))$ and the (first-order) Fourier coefficients are given by, $c_{0\dots r_{i}0\dots }=\frac{1}{2\pi}\int_{0}^{2\pi}f(\mathbf{u}(\omega))e^{-jr_{i}s_{i}(\omega)}\textrm{d}s_i(\omega)$ The first-order partial variance is then obtained as follows, $\mathbb{V}[\mathbb{E}[y\vert u_i]]=2\sum_{r_{i}=1}^{+\infty}\vert c_{0\dots r_{i}0\dots 0}\vert^2$ Only first-order Sobol' indices can be computed with RBD, but a single sample of size $N>2M$ suffices (in SIML@B, $M=5$). A generalisation of RBD, called EASI, has been proposed by Plischke (2010) with a de-biased estimator [see also Tissot and Prieur (2012) for the de-biased estimator].
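A minimal Python sketch of the random balance design idea is given below. It is an illustration under assumptions, not the SIML@B code: the search paths are obtained by randomly permuting a regular grid on $(-\pi,\pi)$, no bias correction is applied, and the function name, sample size and test model are hypothetical.

```python
import numpy as np


def rbd_first_order(f, d, N=2000, M=5, seed=0):
    """First-order indices from a single sample of size N with RBD."""
    rng = np.random.default_rng(seed)
    s0 = -np.pi + 2 * np.pi * (np.arange(N) + 0.5) / N      # regular grid
    paths = np.column_stack([rng.permutation(s0) for _ in range(d)])
    U = 0.5 + np.arcsin(np.sin(paths)) / np.pi               # frequency 1 for all inputs
    y = f(U)                                                  # N model runs in total
    S = np.empty(d)
    for i in range(d):
        y_sorted = y[np.argsort(paths[:, i])]   # reorder the output along s_i
        power = np.abs(np.fft.fft(y_sorted) / N) ** 2
        # First M harmonics carry the effect of u_i; power[1:] sums to V[y].
        S[i] = 2 * power[1:M + 1].sum() / power[1:].sum()
    return S


# Additive toy model with analytical first-order indices 1/5 and 4/5.
S = rbd_first_order(lambda U: U[:, 0] + 2 * U[:, 1], d=2)
print(S)
```

After reordering along $s_i$, the contribution of $u_i$ is periodic with frequency 1, whereas the other inputs behave like noise spread over all frequencies. This noise is the source of the small positive bias that the de-biased estimators address.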

- First-order and total-order partial variances with scrambled FAST

Saltelli et al. (1999) note that, $\mathbb{V}[\mathbb{E}[y\vert u_i]]=2\sum_{r_{i}=1}^{+\infty}\vert c_{0\dots r_{i}0\dots 0}\vert^2$ and, $\mathbb{E}[\mathbb{V}[y\vert \mathbf{u}_{\sim i}]]=2\sum_{r_{i_1}=0}^{+\infty}\dots\sum_{r_{i}=1}^{+\infty}\dots\sum_{r_{i_d}=0}^{+\infty}\vert c_{r_{i_1}\dots r_i\dots r_{i_d}}\vert^2$ and propose to estimate the total-order sensitivity index of $x_{i}$ by assigning to the latter a very high frequency, e.g. $\theta_{i}=2 M \max(\mathbf{\theta}_{\sim i})+1$. The partial variances above are then approximated by summing the Fourier coefficients from frequency $M \max(\mathbf{\theta}_{\sim i})+1$ to frequency $M \theta_{i}$. The drawback is that, to comply with the Nyquist criterion, the sample size must be such that $N=2 M \theta_{i}+1$. Therefore, this extended FAST (EFAST) is not implemented in SIML@B.

Mara (2009) proposes to use a scrambled version of FAST that has the advantage of allowing variable grouping. The idea is to assign to each variable a frequency taken as uniformly spanning the range $[1,\theta_{m}]$, with $\theta_{m}=\frac{N-200}{2M}$. If $\theta_{m}\leq d$, then the frequency set is $(1,2,\dots,d)$. To compute the sensitivity indices of $u_i$, one has to scramble the values of $u_i$ before running the model. Then, the model response is analysed with the Fast Fourier Transform to get the first- and total-order sensitivity indices of $u_i$ [see Mara (2009) for more details]. To compute the overall set of first- and total-order indices, $N_r=N d$ model runs are required.

### Monte Carlo estimators

- First-order sensitivity indices with RBD

Mara and Rakoto Joseph (2008) propose a sampling strategy to compute the overall first-order sensitivity indices with $N_r=2N$ model runs. The idea is to generate a first input sample $\mathbf{U}$ of size $N$ and then to derive a second input sample $\mathbf{U}'$ by permuting the elements of $\mathbf{U}$ column-wise. $\mathbf{U}$ and $\mathbf{U}'$ are thus two independent replicates of $\mathbf{u}$. By denoting $\mathbf{U}^{(s_i)}$ the sample rearranged such that the values of $u_i$ in the $i$-th column are sorted, the first-order sensitivity index of $u_i$ is obtained with the estimator studied in Sobol' and Levitan (1999), $S^{MC-RBD}_i=\frac{\mathbb{COV}\left(f(\mathbf{u}^{(s_i)}),f(\mathbf{u}'^{(s_i)})\right)}{\sqrt{\mathbb{V}[f(\mathbf{u}^{(s_i)})]\mathbb{V}[f(\mathbf{u}'^{(s_i)})]}}$ where $\mathbb{COV}$ is the covariance operator. Note that $S^{MC-RBD}_i$ is the Pearson correlation coefficient of $f(\mathbf{u}^{(s_i)})$ and $f(\mathbf{u}'^{(s_i)})$.

Grouped variables: Suppose that one wants to compute the first-order sensitivity index of the group of inputs $\mathbf{u}_{ij}=(u_i,u_j)$. This sensitivity index is also called the closed second-order effect of $(u_i,u_j)$ (Saltelli 2002). Then, $\mathbf{U}'$ is obtained by scrambling the elements of $\mathbf{U}$ column-wise, but with columns $i$ and $j$ scrambled simultaneously.
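The permutation-based estimator for single inputs can be sketched as follows in Python. The function name, sample size and additive test model are assumptions made for illustration; grouping is not handled in this sketch.

```python
import numpy as np


def mc_rbd_first_order(f, d, N=5000, seed=0):
    """First-order indices from two column-wise permuted samples (2N runs)."""
    rng = np.random.default_rng(seed)
    U = rng.random((N, d))
    # Second sample: each column of U permuted independently.
    U_prime = np.column_stack([rng.permutation(U[:, i]) for i in range(d)])
    y, y_prime = f(U), f(U_prime)
    S = np.empty(d)
    for i in range(d):
        order = np.argsort(U[:, i])
        order_prime = np.argsort(U_prime[:, i])
        # After sorting, both outputs share the same u_i values row by row,
        # while all the other inputs differ.
        S[i] = np.corrcoef(y[order], y_prime[order_prime])[0, 1]
    return S


# Additive toy model with analytical first-order indices 1/5 and 4/5.
S = mc_rbd_first_order(lambda U: U[:, 0] + 2 * U[:, 1], d=2)
print(S)
```

The same two output vectors `y` and `y_prime` are reused for every input, which is why the total cost stays at $2N$ model runs whatever $d$.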

- First- and total-order sensitivity indices

Azzini et al. (2021) propose the following fast-converging estimator relying on $N_r=2N(d+1)$ model runs to compute the overall set of first- and total-order sensitivity indices, $S^{IA}_i=\frac{2\int_{[0,1]^{2d}}\left(f(\mathbf{u})-f(u_{i}',\mathbf{u}_{\sim i})\right)\left(f(u_{i},\mathbf{u}_{\sim i}')-f(\mathbf{u}')\right)\textrm{d}\mathbf{u}\textrm{d}\mathbf{u}'}{\int_{[0,1]^{2d}}\left[\left(f(\mathbf{u})-f(\mathbf{u}')\right)^2+\left(f(u_{i}',\mathbf{u}_{\sim i})-f(u_{i},\mathbf{u}_{\sim i}')\right)^2\right]\textrm{d}\mathbf{u}\textrm{d}\mathbf{u}'}$ $ST^{IA}_i=\frac{\int_{[0,1]^{2d}}\left[\left(f(\mathbf{u})-f(u_{i}',\mathbf{u}_{\sim i})\right)^2+\left(f(u_{i},\mathbf{u}_{\sim i}')-f(\mathbf{u}')\right)^2\right]\textrm{d}\mathbf{u}\textrm{d}\mathbf{u}'}{\int_{[0,1]^{2d}}\left[\left(f(\mathbf{u})-f(\mathbf{u}')\right)^2+\left(f(u_{i}',\mathbf{u}_{\sim i})-f(u_{i},\mathbf{u}_{\sim i}')\right)^2\right]\textrm{d}\mathbf{u}\textrm{d}\mathbf{u}'}$ with the desired property that $ST^{IA}_i\geq S^{IA}_i$ whatever the sample size $N\geq1$, where the equality holds if and only if $u_i$ does not interact with the other inputs.
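A Monte Carlo version of these estimators can be sketched in Python by replacing the integrals with sample sums. This is an illustrative sketch with hypothetical names: `A` and `B` play the roles of the samples of $\mathbf{u}$ and $\mathbf{u}'$, and the total-order numerator uses the pairs of runs that differ only in $u_i$, in line with the definition $ST_i=\mathbb{E}[\mathbb{V}[y\vert \mathbf{u}_{\sim i}]]/\mathbb{V}[y]$.

```python
import numpy as np


def ia_sobol(f, d, N=4000, seed=0):
    """First- and total-order indices with 2N(d+1) model runs."""
    rng = np.random.default_rng(seed)
    A, B = rng.random((N, d)), rng.random((N, d))   # samples of u and u'
    fA, fB = f(A), f(B)
    S, ST = np.empty(d), np.empty(d)
    for i in range(d):
        AB = A.copy()
        AB[:, i] = B[:, i]            # (u_i', u_~i)
        BA = B.copy()
        BA[:, i] = A[:, i]            # (u_i, u_~i')
        fAB, fBA = f(AB), f(BA)
        a = fA - fAB                  # two runs differing only in u_i
        b = fBA - fB                  # two runs differing only in u_i
        denom = np.sum((fA - fB) ** 2 + (fAB - fBA) ** 2)
        S[i] = 2 * np.sum(a * b) / denom
        ST[i] = np.sum(a**2 + b**2) / denom
    return S, ST


# Toy model u1*u2, with analytical values S_i = 3/7 and ST_i = 4/7.
S, ST = ia_sobol(lambda U: U[:, 0] * U[:, 1], d=2)
print(S, ST)
```

Since $a^2+b^2\geq 2ab$ term by term, the sketch returns $ST_i\geq S_i$ for any sample size, which is the property stated above.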

## References

• Azzini, I., Mara, T. A., and Rosati, R., Comparison of two sets of Monte Carlo estimators of Sobol’ indices, Envir. Mod. and Soft., Vol. 144, 105167, 2021.
• Cukier, R. I., Fortuin, C. M., Shuler, K. E., Petschek, A. G., and Schaibly, J. H., Study of the sensitivity of coupled reaction systems to uncertainties in rate coefficients. I. theory, J. Chemical Physics, Vol. 59, 3873–3878, 1973.
• Mara, T. A. and Rakoto Joseph, O., Comparison of some efficient methods to evaluate the main effect of computer model factors, J. of Stat. Comput. and Simul., Vol. 78, 167–178, 2008.
• Plischke, E., An effective algorithm for computing global sensitivity indices (EASI), Reliab. Eng. & Syst. Saf., Vol. 95, 354–360, 2010.
• Saltelli, A., Making best use of model evaluations to compute sensitivity indices, Comput. Phys. Commun., Vol. 145, 280–297, 2002.
• Saltelli, A., Tarantola, S., and Chan, K., A quantitative model independent method for global sensitivity analysis of model output, Technometrics, Vol. 41, 39–56, 1999.
• Shao Q., Younes A., Fahs M., Mara T.A., Bayesian sparse polynomial chaos expansion for global sensitivity analysis, Comput. Meth. in Appl. Mech. & Eng., Vol. 318, 474-496, 2017.
• Sudret B., Global sensitivity analysis using polynomial chaos expansions, Reliab. Eng. & Syst. Saf., Vol. 93(7), 964–979, 2008.
• Sobol' I.M., Sensitivity estimates for nonlinear mathematical models, Math. Mod. and Comput. Exp., Vol 1, 407-414, 1993.
• Sobol’, I. M. and Levitan, Y. L., On the use of variance reducing multipliers in Monte Carlo computations of a global sensitivity index, Comput. Phys. Commun., Vol. 117, 52–61, 1999.
• Tarantola, S., Gatelli, D., and Mara, T. A., Random balance designs for the estimation of first-order global sensitivity indices, Reliab. Eng. & Syst. Saf., Vol. 91, 717–727, 2006.
• Tissot, J. Y. and Prieur, C., Bias correction for the estimation of sensitivity indices based on random balance designs, Reliab. Eng. & Syst. Saf., Vol. 107, 205–213, 2012.
• Homma, T., and Saltelli A., Importance measures in global sensitivity analysis of nonlinear models, Reliab. Eng. & Syst. Saf., Vol. 52, 1–17, 1996.

## Introduction

Moment-independent importance measures do not rely on any particular statistical moment of the model response of interest $y$. They usually take the following form, $\alpha_{i}=\mathbb{E}_{u_i}\left[A(y,y\vert u_{i})\right]$ where $u_{i}\in(u_1,\dots,u_d)$, $y \vert u_{i}$ is the output variable conditional on $u_{i}$, and the inner operator $A$ is a measure of the distance between $y$ and $y \vert u_{i}$ that is not related to any statistical moment (Borgonovo et al. 2014). There are many sensitivity measures that can be defined in this way (see also Da Veiga 2015). In SIML@B, only two of them are implemented.

## PDF-based importance measure

The PDF-based sensitivity measure computed by the Web-App is the one introduced by E. Borgonovo (2007). Let us denote by $p_y$ the (unconditional marginal) probability density function (pdf) of the model response $y$ and by $p_{y\vert u_{i}}$ its pdf conditional on the value of some input $u_{i}$. The sensitivity index measures how far $p_{y\vert u_{i}}$ is from $p_{y}$. It is defined from the following statistic, $D(u_{i}) = \frac{1}{2}\int_{\mathbb{R}}\vert p_{y} - p_{y\vert u_{i}} \vert \text{d}y$ which measures the half-distance between the two pdfs for some given value of $u_i$ (N.B.: the $\frac{1}{2}$ factor is for the sake of normalization). Finally, the average distance, namely the PDF-based sensitivity index, is inferred as follows, $\delta_{i} = \int_{0}^{1} D(u_{i}) \text{d}u_{i}$

where $\delta_{i}\in[0,1]$, and variable $u_{i}$ is deemed non-important for $y$ if $\delta_{i}=0$ (and vice versa).

## CDF-based importance measure

The CDF-based sensitivity measure computed by the Web-App is the one called PAWN in Pianosi and Wagener (2015) with the mean statistic. Let us denote by $F_y$ the (unconditional marginal) cumulative distribution function (cdf) of the model response $y$ and by $F_{y\vert u_{i}}$ its cdf conditional on the value of some input $u_{i}$. The sensitivity index measures how far $F_{y\vert u_{i}}$ is from $F_{y}$. This can be measured by computing the Kolmogorov-Smirnov distance, $T(u_{i}) = \max_{y}\vert F_{y} - F_{y\vert u_{i}} \vert$ which represents the maximum distance between the two cdfs for some given value of $u_{i}$. The CDF-based sensitivity index is defined as the average maximum distance over $u_{i}$, $\tau_{i} = \int_{0}^{1} T(u_{i}) \text{d}u_{i}$

and we have $\tau_{i}\in[0,1]$. Variable $u_{i}$ is deemed unimportant for $y$ if $\tau_{i}=0$ (and vice versa).

## Data-driven method

### The partitioning approach

Estimation of these sensitivity measures from given data is based on the partitioning approach described in Plischke et al. (2013). In short, given the Monte Carlo sample $[\mathbf{U},\mathbf{y}]$ of size $N$, the idea is to partition the sample associated with $u_{i}$ into $n$ subsamples and to divide $\mathbf{y}$ accordingly. Then, the conditional pdf $p_{y\vert u_{i}=u_{i}^{\ast}}$ and cdf $F_{y\vert u_{i}=u_{i}^{\ast}}$ are roughly estimated from the Monte Carlo subsample containing the value $u_{i}^{\ast}$. Note that while estimating the empirical cdf is quite easy, for the empirical pdf we resort to the kernel density estimator (Parzen 1962).
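The partitioning idea can be sketched in Python as follows. This is a rough illustration under several assumptions: partitions of equal size are built from the sorted values of $u_i$, empirical cdfs are compared at the observed output values, and a plain histogram is swapped in for the kernel density estimator to keep the example short. The function name and the default numbers of partitions and bins are hypothetical.

```python
import numpy as np


def moment_free_indices(U, y, n_parts=20, n_bins=15):
    """Given-data estimates of tau_i (CDF-based) and delta_i (PDF-based)."""
    N, d = U.shape
    y_sorted = np.sort(y)
    F_y = np.arange(1, N + 1) / N                       # unconditional ecdf at y_sorted
    edges = np.histogram_bin_edges(y, bins=n_bins)
    p_y, _ = np.histogram(y, bins=edges, density=True)  # histogram instead of a KDE
    width = np.diff(edges)
    tau, delta = np.zeros(d), np.zeros(d)
    for i in range(d):
        # Equal-size partitions of the sample along u_i.
        for idx in np.array_split(np.argsort(U[:, i]), n_parts):
            weight = len(idx) / N
            y_k = np.sort(y[idx])
            F_cond = np.searchsorted(y_k, y_sorted, side="right") / len(idx)
            tau[i] += weight * np.max(np.abs(F_y - F_cond))          # KS distance
            p_cond, _ = np.histogram(y[idx], bins=edges, density=True)
            delta[i] += weight * 0.5 * np.sum(np.abs(p_y - p_cond) * width)
    return tau, delta


rng = np.random.default_rng(0)
U = rng.random((4000, 2))
y = U[:, 0]                      # toy model: only u_1 matters
tau, delta = moment_free_indices(U, y)
print(tau, delta)
```

With a finite sample, the estimates for an irrelevant input are small but not exactly zero, because each conditional distribution is estimated from a subsample. The number of partitions and bins therefore trades bias against noise.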

## References

• Parzen E., On estimation of a probability density function and mode, Ann. of Math. Stat., Vol.33(3), 1065-1076, 1962.
• Borgonovo E., A new uncertainty importance measure, Reliab. Eng. & Syst. Saf., Vol. 92, 771-784, 2007.
• Plischke E., Borgonovo E., Curtis C. L., Global sensitivity measures from given data, Europ. J. of Operat. Res., Vol 226, 536-550, 2013.
• Borgonovo E., Tarantola S., Plischke E., Morris M.D., Transformations and invariance in the sensitivity analysis of computer models, J. Roy. Stat. Soc., Serie B, 1-23, 2014.
• Pianosi F., Wagener T., A simple and efficient method for global sensitivity analysis based on cumulative distribution functions, Environ. Model. & Softw., Vol 67, 1-11, 2015.
• Da Veiga S., Global sensitivity analysis with dependence measures, J. of Statist. Comput. & Simul., Vol. 85(7), 1283-1305, 2015.

## Introduction

Screening methods are qualitative approaches, rather computationally cheap, that in principle only allow us to address the factor fixing setting. By qualitative, it is meant that one cannot rely on the computed sensitivity indices to rank the inputs by order of importance. In the factor fixing setting, ranking is not needed anyway.

## Data-driven method

### Statistical test

Wu and Mohanty (2006) have proposed a partitioning approach for screening purposes. Let $[\mathbf{U},\mathbf{y}]$ be a Monte Carlo sample of size $N$. Consider instead $\mathbf{Z}$, the standard normally distributed counterpart of $\mathbf{U}$, obtained with the transformation $z_i = \Phi^{-1}(u_i)$, where $\Phi$ is the standard normal cumulative distribution function.

The idea is to divide the range of variation of the model response into $n$ partitions. Each partition is associated with a subsample $\mathbf{Z}^{(k)}$ of size $N_k$, $k=1,\dots,n$. By computing the following statistics for each subsample,

• mean, $\mu^{(k)}_{i}=\frac{1}{N_k}\sum_{j=1}^{N_k}Z^{(k)}_{j,i}$
• variance minus one, $\nu^{(k)}_{i}=\frac{1}{N_k}\sum_{j=1}^{N_k}\left(Z^{(k)}_{j,i}\right)^2-1$

one can then test whether $\mu^{(k)}_{i}$ and $\nu^{(k)}_{i}$ are significantly different from zero. If they are not, for every $k=1,\dots,n$, then it can be concluded that $u_i$ is not important (although only the first two moments are considered). Defining the null hypothesis as $H_0:\mu^{(k)}_{i}=\nu^{(k)}_{i}=0$, the following probability statements can be made, $Prob\left[-\frac{Z_{\alpha/2}}{\sqrt{N_k}}\leq\mu^{(k)}_{i}\leq \frac{Z_{\alpha/2}}{\sqrt{N_k}}\right]=1-\alpha$ $Prob\left[\frac{\chi^2_{\alpha/2,N_k}}{N_k}-1\leq\nu^{(k)}_{i}\leq \frac{\chi^2_{1-\alpha/2,N_k}}{N_k}-1\right]=1-\alpha$ where $Z_{\alpha/2}$ is the upper $\alpha/2$ quantile (i.e. the $(1-\alpha/2)$-quantile) of the standard normal variable and $\chi^2_{\alpha/2,N_k}$ is the $\alpha/2$-quantile of the chi-square variable with $N_k$ degrees of freedom. SIML@B computes the following statistic as a sensitivity indicator (its magnitude has no meaning and therefore cannot be used for ranking), $\hat{S}_i^{WM} = \frac{1}{n}\sum_{k=1}^n\sqrt{\left(\hat{\mu}^{(k)}_{i}\right)^2+\left(\hat{\nu}^{(k)}_{i}\right)^2}$ with, $\hat{\mu}^{(k)}_{i}=\left\lbrace\begin{matrix}0,& { \rm if\;} \vert\mu^{(k)}_{i}\vert\leq\frac{Z_{\alpha/2}}{\sqrt{N_k}}\\ \mu^{(k)}_{i},& {\rm otherwise}\end{matrix}\right.$ and $\hat{\nu}^{(k)}_{i}=\left\lbrace\begin{matrix}0,& {\rm if\;} \frac{\chi^2_{\alpha/2,N_k}}{N_k}-1\leq\nu^{(k)}_{i}\leq \frac{\chi^2_{1-\alpha/2,N_k}}{N_k}-1\\ \nu^{(k)}_{i},& {\rm otherwise}\end{matrix}\right.$ A value of $\hat{S}_i^{WM}$ equal to zero indicates the irrelevance of $u_i$.
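The test can be sketched in Python as follows. Several choices here are assumptions made for illustration: the output range is split into partitions of equal size (quantile-based), the chi-square quantiles are obtained with the Wilson-Hilferty approximation instead of an exact routine, and the function names and default values of $n$ and $\alpha$ are hypothetical.

```python
from statistics import NormalDist

import numpy as np

_norm = NormalDist()


def chi2_quantile(p, k):
    """Wilson-Hilferty approximation of the p-quantile of a chi-square with k dof."""
    z = _norm.inv_cdf(p)
    return k * (1 - 2 / (9 * k) + z * np.sqrt(2 / (9 * k))) ** 3


def wu_mohanty(U, y, n_parts=5, alpha=0.05):
    """Screening indicator: zero when no test rejects H0 in any partition."""
    N, d = U.shape
    # z_i = Phi^{-1}(u_i); clipping avoids infinite values at exactly 0 or 1.
    Z = np.vectorize(_norm.inv_cdf)(np.clip(U, 1e-12, 1 - 1e-12))
    z_crit = _norm.inv_cdf(1 - alpha / 2)
    S = np.zeros(d)
    for idx in np.array_split(np.argsort(y), n_parts):   # partitions of the output
        Nk = len(idx)
        mu = Z[idx].mean(axis=0)
        nu = (Z[idx] ** 2).mean(axis=0) - 1
        lo = chi2_quantile(alpha / 2, Nk) / Nk - 1
        hi = chi2_quantile(1 - alpha / 2, Nk) / Nk - 1
        mu_hat = np.where(np.abs(mu) <= z_crit / np.sqrt(Nk), 0.0, mu)
        nu_hat = np.where((lo <= nu) & (nu <= hi), 0.0, nu)
        S += np.sqrt(mu_hat**2 + nu_hat**2) / n_parts
    return S


rng = np.random.default_rng(0)
U = rng.random((2000, 2))
y = U[:, 0]                       # toy model: only u_1 matters
S_wm = wu_mohanty(U, y)
print(S_wm)
```

Because $2n$ tests are performed per input at level $\alpha$, an irrelevant input can occasionally receive a small non-zero value through false rejections.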

## Design-driven method

### Morris method

Proposed by Max Morris (1991), this method is based on the computation of elementary effects, defined as follows, $EE_{k,i} = \frac{f(u_{k,1},\dots,u_{k,i}+\Delta_i,\dots,u_{k,d})-f(u_{k,1},\dots,u_{k,d})}{\Delta_i}$ where $\Delta_i$ is such that $(u_{k,i}+\Delta_i)\in(0,1)$ and $(u_{k,1},\dots,u_{k,d})\in(0,1)^d$ is drawn randomly $r$ times. The sensitivity index of $u_i$ is then assessed by the following statistic (Campolongo et al. 2007), $\mu_i^{\ast}=\frac{1}{r}\sum_{k=1}^r \vert EE_{k,i}\vert$ Note that this sensitivity index is not normalised, i.e. $\mu_i^{\ast}\notin [0,1]$ in general. Applying the Morris method requires a specific sampling design that consists of varying one factor at a time. By using a winding-stairs design, the computational cost is $N_r=rd+1$, while with radial sampling $N_r=r(d+1)$ [see Saltelli et al. (2010)]. Note that the Morris method in SIML@B does not take input factor grouping into account.
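A minimal Python sketch of the elementary effects with a radial design is given below. The function name, the common step $\Delta_i=0.25$ for all inputs and the linear test model are assumptions made for illustration.

```python
import numpy as np


def morris_mu_star(f, d, r=20, delta=0.25, seed=0):
    """mu* from r radial one-at-a-time designs, i.e. r(d+1) model runs."""
    rng = np.random.default_rng(seed)
    base = rng.random((r, d)) * (1 - delta)   # keeps u + delta inside (0, 1)
    f_base = f(base)
    mu_star = np.zeros(d)
    for i in range(d):
        moved = base.copy()
        moved[:, i] += delta                  # move only factor i
        elementary_effects = (f(moved) - f_base) / delta
        mu_star[i] = np.mean(np.abs(elementary_effects))
    return mu_star


# Linear toy model: elementary effects equal the coefficients 1, 2 and 0.
mu_star = morris_mu_star(lambda U: U[:, 0] + 2 * U[:, 1] + 0 * U[:, 2], d=3)
print(mu_star)
```

For the third input, $\mu_3^{\ast}=0$ flags it as a candidate for factor fixing. The value $\mu_2^{\ast}=2$ also shows that the index is not confined to $[0,1]$.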

## References

• Wu Y.-T. and Mohanty S., Variable screening and ranking using sampling-based sensitivity measures, Reliab. Eng. & Syst. Saf., Vol. 91, 634-647, 2006.
• Morris, M. D., Factorial sampling plans for preliminary computational experiments, Technometrics, Vol. 33, 161–174, 1991.

SIML@B

Version: 1.0

© European Union 2022

## Applicable terms

Access to, and use of, any part of SIML@B, including any document, material or other information (data, text, images, sound and video) made available on the website, and any of the informatics tools provided as services via SIML@B, are governed by these terms and conditions of use, and such access and use constitute acceptance of them by the User.

BY USING SIML@B, YOU (HEREINAFTER “THE USER”) AGREE TO BE BOUND TO THESE TERMS AND CONDITIONS. IF YOU DO NOT AGREE TO ANY OF THE TERMS OR CONDITIONS PROVIDED HEREIN, PLEASE DO NOT USE SIML@B OR ANY INFORMATION OR TOOLS HEREIN.

The European Commission reserves the right to amend these terms and conditions and any other specific terms on SIML@B at any time by posting amended terms and conditions on the website. Such amendments will take effect on the date on which they are posted.

## Ownership

Information contained in SIML@B may be protected by intellectual property rights. Proprietary rights, including copyright, subsisting in any of the information available via this website are vested in their respective owners, these being the European Union, contributors or any other third party credited as such. Access to SIML@B does not give the User any ownership title in the information and informatics tools made available to them via SIML@B. All rights not specifically granted herewith are reserved by the respective owner(s).

Any third-party materials which may be offered on the website, as may have been identified in the accompanying release notes, copyright notices or any other written documentation provided alongside them, are the property of their respective owners and their use may be subject to separate or additional terms which the User accepts. The User undertakes to carefully observe the rights and obligations applicable to each of the materials used within the analysis performed.

## Disclaimer

Disclaimer of Warranty: SIML@B is a work in progress, which may be continuously improved by numerous contributors. It is not a finished work and may therefore contain defects or 'bugs' inherent to this type of development. For the above reason, SIML@B is provided on an 'as is' basis and without warranties of any kind, including, without limitation, for merchantability, fitness for a particular purpose, absence of defects or errors, accuracy and non-infringement of intellectual property rights. This disclaimer of warranty is a condition for the grant of any rights to the tool. Should errors or omissions concerning the results provided via SIML@B be brought to its attention, the European Commission will endeavour to correct them to the extent possible.

Whilst the European Commission is committed to ensuring that the availability of SIML@B and the access to the results will be essentially uninterrupted and that transmissions will be error-free, this cannot be guaranteed. Access to SIML@B may also occasionally be suspended, restricted or impeded in order to perform repairs, maintenance operations or to introduce new services. The European Commission will not be liable for any incidental, consequential, direct or indirect damages including but not limited to the loss of data, lost profits, or any other financial loss arising from the use of, or inability to use SIML@B even if the European Commission has been notified of the possibility of such loss, damages, claims or costs or for any claim by any third party. The entire risk as to the use, quality, and performance of SIML@B is with the User.

Disclaimer of Liability: Except in the cases of wilful misconduct or damages directly caused to natural persons, the European Union and the European Commission will in no event be liable for any direct or indirect, material or moral, damages of any kind, arising out of the use of SIML@B, including without limitation, damages for loss of goodwill, work stoppage, computer failure or malfunction, loss of data or any commercial damage, even if the European Union or the European Commission has been notified of the possibility of such damage.

While it strives to keep SIML@B and the results therein generated accurate, the European Commission makes no claims, assurances, or guarantees about the accuracy, completeness, or adequacy of the results available via SIML@B and, therefore, expressly disclaims its liability for errors and omissions to the maximum extent permitted by law. This disclaimer is not intended to limit the liability of the European Union or the European Commission in contravention of any requirements laid down in applicable national law nor to exclude its liability for matters which may not be excluded under that law.

CC-MOD has not reviewed, and cannot review, all of the data made available in SIML@B and cannot, therefore, be responsible for that data’s content, use or effects. By operating SIML@B, CC-MOD does not represent or imply that it endorses the data or results there generated, or that it believes such data or results to be accurate, useful or non-harmful.

## Rights and limitations

Subject to the terms and conditions provided herein, access to SIML@B is granted to the User on a non-exclusive, non-transferrable and royalty-free basis within the scope and objectives pursued by SIML@B and in accordance with any applicable law thereto.

The User may not use SIML@B in a manner which may mislead or confuse people into believing that any products and services provided by the User are in some way endorsed or certified by the European Union. Under no circumstance shall the User use the European Commission logo or otherwise use the official name and credentials of the European Commission to imply affiliation, endorsement or other official link to the European Commission unless explicitly permitted under the European Commission's visual identity policy (https://ec.europa.eu/info/resources-partners/european-commission-visual-identity_en). The User shall comply with any and all applicable laws and regulations. The User must refrain from attempting or performing any activity that could harm or violate the website's network performance and/or security, or any other activity driven by unlawful purposes.

By submitting any information (i.e. datasets) through SIML@B the User warrants that he/she holds all necessary rights in the information provided, including but not limited to copyright where applicable and, therefore, that he/she is entitled to submit the said information for the purposes listed herein and for its use by the European Commission and by all users of SIML@B. The User also warrants that he/she is fully compliant with any third-party rights or licences relating to the information and has taken all necessary steps to successfully pass through any required terms. Users uploading or otherwise submitting content to SIML@B grant the European Union a worldwide licence to use, host, store, reproduce, modify, create derivative works (such as those resulting from analysis, translations, adaptations or other changes), communicate, publish, publicly present, publicly display and distribute such content, within the scope and objectives pursued by SIML@B and in accordance with any applicable law thereto.

The User undertakes not to decompile, reverse engineer, disable and attempt to decrypt, make any modification to, copy or reproduce in any form the software underlying the informatics tools made available through SIML@B. The European Union reserves any other right not expressly granted herein to the User. Any violation of these terms and conditions shall automatically terminate the licence granted herein.

Without limiting any of those representations or warranties, the European Commission has the right (though not the obligation), in its sole discretion, to (i) refuse or remove any content that, in the European Commission’s opinion, violates any official policy or is in any way harmful or objectionable, or (ii) terminate or deny access to and use of SIML@B to any individual or entity for any reason.

CC-MOD may terminate access to all or any part of SIML@B with or without cause, with or without notice, effective immediately. In order to terminate this agreement or any account, the User may simply discontinue using SIML@B. The termination shall not relieve the User from its liability to respect all the obligations claimable before the termination date. In particular, the provisions of the obligations relating to the performance, the disclaimer of guarantees and warranties and limitations of liabilities shall survive the termination of the authorisation under these terms and conditions, howsoever caused, but this shall not imply or create any continued right to use SIML@B after the termination.

## Acknowledgement

Unless otherwise indicated (e.g. in individual copyright notices), content owned by the European Union on SIML@B is made available to the User under the terms of the Commission's reuse policy, implemented by Commission Decision 2011/833/EU of 12 December 2011 on the reuse of Commission documents and is licensed under the Creative Commons Attribution 4.0 International (CC BY 4.0) licence. Reuse is authorised, provided the source is acknowledged. Any modifications shall be clearly indicated. The European Commission shall not be liable for any consequence stemming from the reuse.

The User may be required to clear additional rights if specific content depicts identifiable private individuals or includes third-party works. To use or reproduce content that is not owned by the European Union, the User may need to seek permission directly from the right holders. Software or documents covered by industrial property rights, such as patents, trade marks, registered designs, logos and names, are excluded from the Commission's reuse policy and are not licensed to the User.

In addition to the above, where information generated or retrieved via SIML@B is utilised by the User in a publication of scientific, technical or academic nature, the User undertakes to duly reference SIML@B in accordance with standard academic practices.

## Data processing and protection

Information hosted on this website is treated according to the principles governing the protection of personal data in the European Union Regulation (EU) 2018/1725.

## Acknowledgement

SIML@B has been developed with the R Shiny package. The diagram displayed in the tab Siml@b was made with the DiagrammeR package. We also make use of the fantastic shinyjs package. SIML@B makes use of the randtoolbox package for generating random numbers. The authors are indebted to the developers of the aforementioned packages.

## References

• Winston Chang, Joe Cheng, JJ Allaire, Carson Sievert, Barret Schloerke, Yihui Xie, Jeff Allen, Jonathan McPherson, Alan Dipert and Barbara Borges (2021). shiny: Web Application Framework for R. R package version 1.6.0. https://CRAN.R-project.org/package=shiny
• Richard Iannone (2020). DiagrammeR: Graph/Network Visualization. R package version 1.0.6.1. https://CRAN.R-project.org/package=DiagrammeR
• Dean Attali (2020). shinyjs: Easily Improve the User Experience of Your Shiny Apps in Seconds. R package version 2.0.0. https://CRAN.R-project.org/package=shinyjs
• Christophe Dutang and Petr Savicky (2020). randtoolbox: Generating and Testing Random Numbers. R package version 1.30.1.