The selection of the DMUs is an intrinsic and important step in a non-parametric model and involves two issues: (1) the number of DMUs and (2) the level of the DMUs. Firstly, consider the number of DMUs. As in parametric regressions, the researcher should try to include as many observations as possible to obtain meaningful estimates. Indeed, the relative nature of DEA makes it vulnerable to problems with the degrees of freedom. The number of degrees of freedom increases with the number of DMUs in the dataset and decreases with the number of input and output variables. Banker et al. (1989) suggest a rough rule of thumb. Let p be the number of inputs and q the number of outputs used in the analysis; then the sample size n should satisfy n ≥ max{p × q, 3(p + q)}. In addition, if observations are added, the ‘world best practice frontier’ will be better approximated (Estache et al., 2004), although due to the sample size bias average efficiency will decrease (see below; Zhang and Bartels, 1998). Secondly, consider the level of the DMUs, which influences the shape of the production possibility set (i.e., the frontier) and is therefore included in this phase. If the analysis is performed on a different level (e.g., macro versus micro units), different results can be obtained. For example, when comparing universities, we may select only research-focused universities, only teaching-focused universities, or all universities. Each case results in a different production possibility set and, as such, in a different efficiency score.
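The rule of thumb of Banker et al. (1989) can be checked mechanically before any model is run. The following is a minimal sketch; the function names `min_sample_size` and `has_enough_dmus` are hypothetical helpers introduced here for illustration and do not come from any DEA library.

```python
def min_sample_size(p: int, q: int) -> int:
    """Smallest n satisfying n >= max{p * q, 3 * (p + q)}.

    p is the number of inputs, q the number of outputs.
    """
    if p < 1 or q < 1:
        raise ValueError("need at least one input and one output")
    return max(p * q, 3 * (p + q))


def has_enough_dmus(n: int, p: int, q: int) -> bool:
    """True if a sample of n DMUs meets the rule of thumb."""
    return n >= min_sample_size(p, q)


# Three inputs and two outputs: max{6, 15} = 15 DMUs are needed.
print(min_sample_size(3, 2))      # 15
# With many variables the product term dominates: max{48, 42} = 48.
print(min_sample_size(6, 8))      # 48
print(has_enough_dmus(10, 3, 2))  # False
```

The rule is only a lower bound on the sample size; it does not remove the sample size bias discussed in the text.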

Selecting different input and output variables can heavily influence the results of the DEA model. Indeed, DEA estimates relative efficiencies (i.e., relative to a best practice frontier) and allows for specialization in one or another input or output variable. The researcher should be aware of this important choice. The inputs and outputs can be justified by the existing literature, by managerial analysis (i.e., what are the best inputs and outputs according to the entities), by multivariate analysis (e.g., is there multicollinearity between the different inputs and outputs) or by simple ratio analysis. Cook and Zhu (2008) suggest using a ratio when it is not clear whether a variable should be classified as an input or an output. The ratio form generalizes the one-dimensional engineering-science definition of efficiency (which considers the simple ratio output / input) to a more general and multidimensional ratio: (weighted sum of outputs) / (weighted sum of inputs). If an increase in the value of the variable results in an increase in the efficiency score, then it belongs to the numerator and is an output variable. If an increase in its value results in a decrease in the efficiency ratio, then it belongs to the denominator and is an input variable.

As a rule of thumb, Dyson et al. (2001) suggest that the selected inputs and outputs should cover the full range of resources used and outputs produced among the evaluated entities. We have already pointed out the importance of exogenous variables. If the researcher wants to provide an accurate picture of reality (i.e., without assigning higher efficiency scores to observations operating in a more favourable environment), he or she needs to include exogenous characteristics. As with the selection of inputs and outputs, exogenous variables can be selected by considering managerial information or by drawing on previous studies in the literature.

Because DEA assumes free disposability and convexity (see Fried et al., 2008), the model is further restricted by an assumption on the shape of the convex hull or convex cone (Kleine, 2004). The initial DEA model of Charnes, Cooper and Rhodes (1978) (the so-called CCR model) assumed a convex cone, i.e., constant returns to scale (CRS). As such, in a two-dimensional picture, the production frontier corresponds to a piecewise linear frontier (determined by the observation with the highest average efficiency as measured by the ratio of outputs to inputs). Under this model, technical inefficiency can stem both from the ineffective operation of the production process in transforming inputs into outputs and from the divergence of the entity from the Most Productive Scale Size (MPSS). As indicated in Banker (1984), the most productive scale size is the scale at which the measured average productivity is maximized (i.e., operating at optimal returns to scale). The DEA model with variable returns to scale (VRS) is often referred to as the BCC model after Banker et al. (1984), who introduced a convex hull instead of a convex cone around the data. More recently, Kerstens and Vanden Eeckaut (1999) and Podinovski (2004) introduced returns to scale in the non-convex FDH model as well. The returns to scale can be tested by bootstrap procedures (Simar and Wilson, 2002) or statistical tests (Kittelsen, 1993; Banker and Natarajan, 2004). In particular, the bootstrap procedure tests whether there is a significant difference between CRS and VRS. In most applications the returns to scale specification (CRS versus VRS) can deliver significantly different outcomes and, as such, a well-considered model should be selected. The consistency of the estimates also depends on the model specification. If the ‘true’ underlying production function exhibits VRS, then only the VRS assumption delivers consistent results. However, if the true underlying model is CRS, both the VRS and the CRS assumption deliver consistent results. Note that the non-convex FDH model also delivers consistent results, albeit at a lower rate of convergence because the model imposes less structure (Daraio and Simar, 2007).
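To make the difference between the frontier assumptions concrete, the sketch below computes input-oriented scores under CRS and under FDH. It is an illustration under stated assumptions, not a general DEA solver: the CRS function only covers the single-input, single-output case (where no linear program is needed), the function names are hypothetical, and the four DMUs are made-up data. The VRS (BCC) model requires a linear program and is not shown.

```python
def crs_efficiency(x, y):
    """Input-oriented CRS (CCR) scores for ONE input and ONE output.

    In this special case the CRS frontier is the ray through the origin and
    the DMU with the highest average productivity y/x, so each score equals
    the DMU's productivity relative to that maximum.
    """
    productivity = [yi / xi for xi, yi in zip(x, y)]
    best = max(productivity)
    return [p / best for p in productivity]


def fdh_input_efficiency(X, Y):
    """Input-oriented FDH scores for multiple inputs and outputs.

    X[j] and Y[j] are the (strictly positive) input and output vectors of
    DMU j. Each DMU is compared only with observed DMUs that produce at
    least as much of every output; no convex combinations are formed.
    """
    scores = []
    for xo, yo in zip(X, Y):
        candidates = [
            max(xj_i / xo_i for xj_i, xo_i in zip(xj, xo))
            for xj, yj in zip(X, Y)
            if all(yj_r >= yo_r for yj_r, yo_r in zip(yj, yo))
        ]
        # The DMU itself is always a candidate, so the score is at most 1.
        scores.append(min(candidates))
    return scores


# Hypothetical data: four DMUs, one input and one output each.
x = [2, 4, 6, 8]
y = [1, 4, 5, 5]

crs = crs_efficiency(x, y)  # DMU 2 (x=4, y=4) has the highest y/x
fdh = fdh_input_efficiency([[v] for v in x], [[v] for v in y])
print(fdh)  # [1.0, 1.0, 1.0, 0.75]
```

In this example only the second DMU is CRS-efficient, whereas three of the four DMUs are FDH-efficient. Every CRS score is at most the corresponding FDH score, because the FDH production possibility set is contained in the CRS one.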

The DEA model basically weights the heterogeneous inputs and outputs such that the highest efficiency score can be obtained. The researcher can also decide to attach specific weight restrictions to the DEA model. These weight restrictions function as value judgements on the different inputs and outputs (Allen et al., 1997; Pedraja-Chaparro et al., 1997; and for a caveat Podinovski, 1999).

Once some assumptions on the production possibility set are made and tested, the researcher can focus on the orientation of the model. Different options are possible. The input-oriented framework minimizes the input set for a given output production. The output-oriented model maximizes the potential output production for a given input set. Under the CRS assumption, the input-oriented efficiency scores are the reciprocal of the output-oriented efficiency scores. This is no longer the case under VRS. In many interesting real-life applications, the managers of an entity do not consider input reductions and output expansions separately. Non-oriented models consider simultaneous input reductions and output expansions. The literature has developed several procedures to estimate non-oriented efficiency: see, e.g., the additive model of Charnes et al. (1985), the Russell measure of Färe and Lovell (1978), the range-adjusted measure of Cooper et al. (1999) or the geometric distance function of Portela and Thanassoulis (2002) (for a survey, see Fried et al., 2008).
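The reciprocal relation between the two orientations under CRS can be verified in the simplest setting. The sketch below assumes a single input and a single output, so that the CRS frontier is fixed by the highest observed output-to-input ratio. The function name and the data are hypothetical.

```python
def crs_orientation_scores(x_o, y_o, x_all, y_all):
    """Input- and output-oriented CRS scores for one input and one output.

    theta (<= 1) is the factor by which the input can be contracted while
    still producing y_o; phi (>= 1) is the factor by which the output can
    be expanded while still using x_o.
    """
    best = max(yj / xj for xj, yj in zip(x_all, y_all))
    own = y_o / x_o
    theta = own / best  # input orientation
    phi = best / own    # output orientation
    return theta, phi


# Hypothetical sample; the evaluated DMU uses x = 8 to produce y = 5.
x_all = [2, 4, 6, 8]
y_all = [1, 4, 5, 5]
theta, phi = crs_orientation_scores(8, 5, x_all, y_all)

input_target = theta * 8   # input level that would make the DMU efficient
output_target = phi * 5    # output level that would make the DMU efficient
print(theta, phi, theta * phi)
```

Here the product of the two scores equals one up to rounding, which is the CRS reciprocity stated in the text. Under VRS the two projections reach different parts of the frontier, so no such identity holds in general.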

The non-oriented measures are non-radial measures of efficiency. This branch of measures does not preserve the input-output mix in the efficiency score. This contrasts with the input- and output-oriented measures, which are typically radial measures of efficiency. In a radial approach, the input-output mix is preserved. In most situations, a radial efficiency score is easier to work with (De Borger and Kerstens, 1996).

If panel data are available, it could be worthwhile to examine the efficiency in the larger panel dataset. In contrast to a cross-section analysis (only variables for one specific year), more observations are available, as the observations are typically also evaluated against their own previous performance. Several procedures have been developed to handle panel data non-parametrically. First, there are the productivity measures such as the Törnqvist index, the Fisher index or the Malmquist index (Cooper et al., 2004). The Malmquist index differs from the others because it decomposes productivity change into technical change (i.e., best practice frontier improvements) and efficiency change (i.e., changes relative to the best practice frontier). Malmquist indices can be bootstrapped to obtain statistical inference (Simar and Wilson, 1999). Second, in sequential methods the entity is assessed against all entities (including itself) in the current period and in all previous periods. As such, sequential models reflect their history (see Grifell-Tatjé and Lovell, 1999). However, sequential models suffer from a sample size problem, as the number of potential reference units grows as time progresses. The average and individual efficiency scores decrease as the number of observations in the sample increases (Zhang and Bartels, 1998), which is what happens in sequential models over time. As an alternative to sequential DEA, dynamic DEA (Emrouznejad, 2003; Emrouznejad and Thanassoulis, 2005, 2010) and network DEA (Chen, 2009) can be used, especially for entities with capital inputs or when the data include inter-temporal input/output variables.
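The Malmquist decomposition can be written down compactly once the four required distance function values are available. The sketch below assumes output-oriented distance functions (values at most one for feasible points, so an index above one signals productivity growth) and takes the four values as given; estimating them requires solving DEA programs that are not shown. The function and argument names are hypothetical.

```python
import math


def malmquist(d_t_obs_t, d_t_obs_t1, d_t1_obs_t, d_t1_obs_t1):
    """Malmquist productivity index and its two components.

    d_<frontier>_<observation>: output distance of the DMU as observed in
    period t or t+1 ("t1"), measured against the frontier of period t or t+1.
    """
    # Catching up: change in the distance to the contemporaneous frontier.
    efficiency_change = d_t1_obs_t1 / d_t_obs_t
    # Frontier shift: geometric mean of the shift measured at both observations.
    technical_change = math.sqrt(
        (d_t_obs_t1 / d_t1_obs_t1) * (d_t_obs_t / d_t1_obs_t)
    )
    # Geometric mean of the productivity change measured against each frontier.
    index = math.sqrt((d_t_obs_t1 / d_t_obs_t) * (d_t1_obs_t1 / d_t1_obs_t))
    return index, efficiency_change, technical_change


# Hypothetical single-input, single-output CRS case: the DMU's productivity
# rises from 0.50 to 0.72 while the best observed productivity rises from
# 1.0 to 1.2, so each distance is productivity / best productivity.
m, ec, tc = malmquist(0.50 / 1.0, 0.72 / 1.0, 0.50 / 1.2, 0.72 / 1.2)
print(m, ec, tc)
```

In this example the index equals the product of the two components, and both the efficiency change and the frontier shift are roughly 1.2, giving a productivity change of roughly 1.44 (the ratio 0.72 / 0.50).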

A third procedure to handle panel data is a “window analysis” (Cooper et al., 2004). The procedure works in a manner analogous to ‘moving averages’: an observation in period t is evaluated against observations from period t-s to period t+s (with s the size of the window, for which a sensitivity analysis is normally performed). The best procedure to handle panel data depends on the research question and on the available data (see also Fried et al., 2008 for an extensive discussion).
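The construction of the reference set in a window analysis can be sketched as follows. This only illustrates which observations enter the comparison for a given period t and window size s; truncating the window at the first and last period of the panel is an assumption made here, and the function names and the tuple layout of the observations are hypothetical.

```python
def window_periods(t, s, first_period, last_period):
    """Periods t-s, ..., t+s, truncated to the available panel."""
    lo = max(first_period, t - s)
    hi = min(last_period, t + s)
    return list(range(lo, hi + 1))


def window_reference_set(observations, t, s):
    """Observations against which a DMU observed in period t is evaluated.

    observations: list of (dmu_id, period, inputs, outputs) tuples.
    Every DMU-period pair inside the window counts as a separate unit.
    """
    periods = [obs[1] for obs in observations]
    window = set(window_periods(t, s, min(periods), max(periods)))
    return [obs for obs in observations if obs[1] in window]


# Hypothetical panel: two DMUs observed yearly from 2000 to 2005.
panel = [
    (dmu, year, [10.0], [5.0])
    for dmu in ("A", "B")
    for year in range(2000, 2006)
]
print(window_periods(2003, 1, 2000, 2005))        # [2002, 2003, 2004]
print(len(window_reference_set(panel, 2003, 1)))  # 6
```

Repeating the evaluation for several values of s gives the sensitivity analysis mentioned in the text.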

Finally, once the various decisions on the model specifications are taken, these are combined and the model is run. In the final description, it is important to justify each of the previous phases (e.g., why did the researcher opt for a VRS model with input-orientation in a window analysis sample). The efficiency scores are initially reported and for each of the observations the weights, targets and slacks are carefully examined.

Besides evaluating heterogeneity, (one-stage) bootstrap procedures are applied to obtain statistical inference (Simar and Wilson, 1998). In particular, the bootstrap estimates the noise (and bias) that arises from using the observed sample. By estimating the bias between the ‘true’ unobserved variables and the ‘biased’ observed variables, bias-corrected efficiency estimates can be obtained. The bootstrap also yields standard deviations and confidence intervals, which allows the researcher to report statistical inference on the estimates.
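Once bootstrap replicates of an efficiency score are available, the bias correction itself is simple arithmetic. The sketch below shows only this last step: the replicates are made-up numbers, and generating them (e.g., by the smoothed bootstrap of Simar and Wilson, 1998) is not shown. The function name is hypothetical.

```python
import statistics


def bias_corrected(theta_hat, boot_scores):
    """Bootstrap bias estimate, bias-corrected score and standard deviation.

    theta_hat: the original DEA efficiency estimate of one DMU.
    boot_scores: bootstrap replicates of that estimate.
    """
    bias = statistics.mean(boot_scores) - theta_hat
    corrected = theta_hat - bias  # equals 2 * theta_hat - mean(boot_scores)
    std_dev = statistics.stdev(boot_scores)
    return bias, corrected, std_dev


# Hypothetical replicates for a DMU with an estimated score of 0.80.
bias, corrected, std_dev = bias_corrected(0.80, [0.84, 0.86, 0.82, 0.88, 0.85])
print(bias, corrected, std_dev)
```

In this example the replicates average 0.85, so the estimated bias is about 0.05 and the bias-corrected score is about 0.75. In practice far more replicates would be drawn, and the correction is usually applied only when the estimated bias is large relative to the standard deviation, since the correction adds noise of its own.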

Finally, the evaluation phase is concluded by drawing up a list of possible actions for further improvement. If necessary, the researcher has to return to the first phase and check each of the sub-phases again. Only when this loop of continuous improvement is finished can the next phase be started.