ORGAPET Section D1:
Integrating and Interpreting Results of Evaluations
Nic Lampkin
Aberystwyth University, UK
Version 6, April 2008
Many of the evaluation tools
presented in ORGAPET have been developed in the context of relatively simple
programmes where the use of one or a few indicators presents no major problems.
The challenge with complex policy programmes, such as organic action plans with
their multiple objectives, action points and policy instruments, as well as
multiple stakeholders and beneficiaries, is to reach a conclusion that reflects
all the different elements fairly and appropriately.
This section examines how a diverse range
of results can be integrated to provide an overall assessment or
evaluative judgement of action
plans, allowing for trade-offs and conflicts between objectives and the differing
priorities of stakeholders. A range of approaches is described, from stakeholder feedback and expert judgement to formal methods such as multi-criteria and cost-benefit analysis, building on both qualitative information and quantitative indicators.
D1-2 Issues to be considered
It is important when synthesising the results of an
evaluation to remember the original purpose of the evaluation (formative or summative) and the stage in the policy cycle (ex-ante, mid-term,
ex-post),
as this will have a significant influence on the way results are interpreted.
Key issues that may need to be considered include:
- Is the quality of the evaluation acceptable so that it can provide a sound basis for learning and future actions?
- How can the indicator results be interpreted, and do all stakeholders perceive the results in the same way?
- How can consensus-building approaches be employed to bring the different stakeholder perspectives together?
- Can combinations of indicator results provide greater insights?
- Were certain policy instruments more cost-effective than others (i.e. would the same amount of resources achieve greater results when used in another way)?
- Were the overall environmental and economic benefits positive?
- Were the results obtained actually due to the policy programme?
- What external factors (economic or policy shocks, animal health epidemics) might have influenced the outcomes significantly?
- In the case of EU policies, how might national policies in the same area have affected the outcome?
- What would have happened if there had been no action plan in place (the counter-factual situation)?
- Are there unmet needs that still need to be addressed?
- How can the impact of the evaluation itself on achieving change, learning etc. be assessed, and did this reflect stakeholder goals and expectations?
D1-3 Interpreting and comparing indicator results
Indicators can rarely be interpreted in isolation and need
to be looked at in a comparative framework in conjunction with qualitative
findings. It is necessary to consider the context as a whole, the factors which
help to facilitate or hinder the performance of the programme, the
rationales of the programme and
the process of implementation.
Evalsed provides
some indications as to how the process of
interpretation and comparison can be approached. Indicators can be compared
with context or programme indicators,
a procedure that can be carried out concisely by expressing the result as a per cent of the
relevant context indicator (see
Section C3 for examples). Comparison with other related indicators may also
be worthwhile: for example, if planned expenditure, committed expenditure and actual expenditure are looked at together, is there evidence of implementation failure that might not be noted when the indicators are examined individually?
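As a simple illustration of these comparisons, the hypothetical Python sketch below (all figures invented) expresses a result indicator as a percentage of a context indicator and then compares planned, committed and actual expenditure for the same measure.

# Hypothetical figures for one action-plan measure (illustrative only).
result_indicator = 1250.0        # e.g. hectares converted under the measure
context_indicator = 50000.0      # e.g. total agricultural area (ha) in the region

share_of_context = 100.0 * result_indicator / context_indicator
print(f"Result as % of context indicator: {share_of_context:.1f}%")

# Comparing related financial indicators for the same measure.
planned_expenditure = 2.0e6      # EUR
committed_expenditure = 1.6e6    # EUR
actual_expenditure = 0.9e6       # EUR

commitment_rate = committed_expenditure / planned_expenditure
spending_rate = actual_expenditure / committed_expenditure
print(f"Committed / planned: {commitment_rate:.0%}")
print(f"Actual / committed:  {spending_rate:.0%}")

# A low spending rate relative to commitments may point to implementation
# failure that would not be visible from any single indicator on its own.
if spending_rate < 0.75:
    print("Possible implementation problem: spending lags well behind commitments")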
The results and conclusions can be summarised in a
synoptic table (Table D1-1), where policy measures or actions are
shown in individual rows and the impacts (e.g. economic, environmental) are identified in columns. The available qualitative/quantitative data and
(provisional) conclusions can be summarised in each cell.
Table D1-1: Extract from synoptic presentation of conclusions relative to an impact

Measure | Description of the equal opportunities impact
1. Training intended for the long-term unemployed | The training had a positive impact in terms of orientating women towards secure jobs traditionally occupied by men
2. Aid for new business creation | 45% of the new businesses were created by women, as compared to the national average which is 30%. The difference can be imputed to the programme.
Other measures | Etc.

Source: Evalsed
Whilst it might be tempting to combine the results from
several indicators into a single overall score or index, this is not advisable
as important details may be lost, and the weightings used (if any) are likely to
reflect only one particular perspective amongst many.
An alternative way of dealing with multiple indicators,
especially when comparing different policy options, is to visualise them using
radar or ‘cobweb’ diagrams (Figure D1-1).
One policy option may perform very well with respect to one indicator but relatively poorly with respect to the others, while another option may not score so well on that particular indicator but, with higher scores for the others, may give the better result overall.
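A diagram of this kind can be produced with standard charting tools; the minimal Python sketch below, assuming matplotlib is available and using purely invented scores for two hypothetical policy options, illustrates one way of doing so.

# Minimal sketch of a radar ('cobweb') diagram comparing two hypothetical
# policy options across several indicators (requires numpy and matplotlib).
import numpy as np
import matplotlib.pyplot as plt

indicators = ["Land area", "Market growth", "Biodiversity", "Farm income", "Uptake"]
option_a = [8, 3, 6, 5, 7]   # illustrative 0-10 scores, not real data
option_b = [4, 7, 5, 6, 6]

angles = np.linspace(0, 2 * np.pi, len(indicators), endpoint=False).tolist()
angles += angles[:1]  # close the polygon

fig, ax = plt.subplots(subplot_kw={"polar": True})
for label, scores in [("Option A", option_a), ("Option B", option_b)]:
    values = scores + scores[:1]
    ax.plot(angles, values, label=label)
    ax.fill(angles, values, alpha=0.1)

ax.set_xticks(angles[:-1])
ax.set_xticklabels(indicators)
ax.legend(loc="upper right")
plt.show()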
Figure D1-1: A hypothetical radar or ‘cobweb’ diagram
When considering performance or
cost-effectiveness, certain policy
interventions may appear to perform less well than others, but there is also the need
to consider how difficult the specific environment is where the
intervention is being applied, and the relative size of the effect that is being
achieved. For example, schools in poorer areas may achieve lower grades than
schools in richer areas for the same investment but, if they are starting from a
lower position, the value added could be greater. This may be resolved by a
benchmarking approach using a cluster of similar
situations as the basis for comparison.
Evalsed also
identifies how indicators can be used to
improve management. Here the focus is on managing performance rather than resources: assessments concentrate not on controlling the use of resources and the outputs achieved, but on overall performance in terms of results and impacts. Using this performance management approach, operators are
endowed with greater autonomy in the use of their resources. In
return, they commit themselves to clearer objectives as regards
the results and impacts to be obtained. The indicators are used to measure
their performance in this context.
However, the use of indicators in this way may be limited
by potential
adverse effects, which include:
- Skimming-off or creaming;
- Convergence to the average;
- Unanticipated effects where results are subordinated to indicator scores.
Skimming-off or creaming effects can occur
when organisations preferentially select beneficiaries who are
most likely to provide good results or high indicator scores.
For example, high examination performance grades can be achieved
if only the best students are allowed to be entered. This effect
is undesirable because it focuses assistance on those who are in
relatively less need.
Convergence towards the average can occur
if undue weight is given to poorly performing areas of activity.
If resources are moved from better performing areas of activity
to try to improve the poorly performing ones, improvement at the
bottom end may be achieved at the cost of reduced performance at
the top end, resulting in convergence towards the middle, rather
than focusing on excellence.
Unanticipated effects can occur where
indicators reward undesired results, or where operators work to
deliver the indicator rather than the objective that the indicator is
intended to reflect. For example, if the objective is to provide
information to a target group but the indicator focuses
strongly on the number of meetings to be held, then other forms
of communication may be ignored if the operator adheres strictly
to the target number of meetings.
Adverse effects inevitably appear after
a system of indicators has functioned for two or three years, no
matter how well it is designed. These undesirable effects are
generally not foreseeable, but the possible appearance of
such effects should not be an argument for refusing to
measure performance. It is possible to minimise adverse effects,
either by amending the indicator causing the problem, or by
creating a procedure for interpretation of the indicator by
expert panels. It is then important to watch out for the
appearance of adverse effects and to correct the system when
these effects appear.
The consideration of indicators so far has focused on
specific issues of interpretation that might be addressed by the evaluation team or
programme managers. However, their interpretations may be very different from
those of beneficiaries and other stakeholders. In order to address this, one
option is to present the results to the action plan steering group or a special workshop of
stakeholders, and to invite comments on the interpretations that have been made.
There is, of course, no guarantee that the different perspectives that might be
presented will be taken up by the evaluators in the final report. There is also
a potential conflict between the need for the evaluation team to be impartial
and the perception that stakeholder views may be partial, i.e. be focused on
their specific (particularly business or political) interests.
An alternative approach
to constructing a synthetic judgement of the programme being evaluated is to use
specially constituted
expert panels,
through procedures similar to those described in ORGAPET Sections A4 and
C4. In this context, however, the
expert panel is not being used to develop policy proposals or evaluate
impacts in the context of individual indicators, but to collectively produce a
value judgement on the programme as a whole. Expert panels are used to reach
consensus on complex and ill-structured questions for which other tools do not
provide univocal or credible answers. They are a particularly useful tool in
relation to complex programmes, when it seems too difficult or complicated, in
an evaluation, to embark on explanations or the grading of criteria in order to
formulate conclusions. Expert panels can take account of the quantitative and
qualitative information assembled as part of the evaluation, as well as the
previous and external experiences of the experts. The practical steps involved
in setting up expert panels are outlined in
Evalsed.
The experts should be chosen to represent all points of view
in a balanced and impartial way. These experts are independent specialists,
recognised in the domain of the evaluated programme. The
definition of expert can include
stakeholders that meet pre-defined selection criteria. They are asked to examine
all data and analyses made during the evaluation, in order to
highlight areas of consensus on the conclusions that the evaluation must draw and,
particularly, on the answers to the key evaluative questions. The panel does not
fully explain its judgement references nor its trade-offs between criteria, but
the credibility of the evaluation is guaranteed by the fact that the conclusions
result from consensus between people who are renowned specialists and represent
the different 'schools of expertise'. The advantage of this type of approach is
that it takes account of the different possible interpretations of results
that might be made by different experts.
However, Evalsed also identifies
potential weaknesses of an expert panel approach. The experts must have extensive experience in the
field and, therefore, are at risk of bias and unwillingness to criticise the
relevance of objectives or to focus on any undesirable effects. Moreover,
the comparison of opinions often leads to the under-evaluation of minority
points of view. The consensual mode of functioning on which the dynamics of the panel are based produces a convergence of opinions around majority values, which are not necessarily the most relevant. To some extent the potential weaknesses
of expert panels can be avoided by taking precautions in the way they are
assembled and organised. This could include:
- limiting its work to only a part of the evaluation, in order to ensure a clear focus and that its significance will be recognised;
- having a broad range of interests represented, including independent experts who are objective;
- using suitable workshop facilitation and consensus-forming techniques.
Evalsed provides an example of the use of
scoring systems in an expert panel context, whereby an evaluation
team had evidence of the impact of a
range of measures, but where the impacts
were not directly comparable and the opinion of an expert panel was
required. In a half-day seminar, the evaluation conclusions were
presented to the participants, measure
by measure, so that they could be validated and their credibility
verified. The experts were then presented with a synoptic table of conclusions
and each participant was asked to situate, intuitively, the impacts of each measure on a
scale (from maximum positive impact, through neutral impact, to
maximum negative impact). The participants' classifications were
compared,
discussed with them and made to converge as much as possible. After the seminar, the classifications
were converted into scores ranging
from -10 (maximum negative impact) to +10 (maximum positive impact),
through 0 (neutral impact). The synoptic table was converted into a table of
ratings (impact scoring matrix). The construction of scoring scales
allows for comparisons to be made within the same column (e.g. one
particular measure has a better rating than another with respect to a
particular impact). On the other hand,
scores in different columns cannot be compared (e.g. a score of 5 for
employment is not comparable to the same score for an environmental
impact). Many of the procedures described here are similar to those used
in the more formalised Nominal Group Technique described in detail in ORGAPET Section C4.
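As an illustration of how such panel classifications can be turned into an impact scoring matrix, the hypothetical Python sketch below uses invented measures, impact categories and intermediate classification labels; only the end-points and the neutral point of the -10 to +10 scale are taken from the description above.

# Illustrative conversion of panel classifications into an impact scoring
# matrix with scores from -10 (maximum negative) to +10 (maximum positive).
# Measures, impacts, intermediate labels and classifications are hypothetical.
classification_to_score = {
    "maximum negative": -10,
    "strong negative": -7,
    "moderate negative": -4,
    "neutral": 0,
    "moderate positive": 4,
    "strong positive": 7,
    "maximum positive": 10,
}

panel_classifications = {
    "Conversion support":  {"economic": "strong positive", "environmental": "moderate positive"},
    "Marketing measures":  {"economic": "maximum positive", "environmental": "neutral"},
    "Training and advice": {"economic": "moderate positive", "environmental": "moderate positive"},
}

impact_scoring_matrix = {
    measure: {impact: classification_to_score[label] for impact, label in impacts.items()}
    for measure, impacts in panel_classifications.items()
}

for measure, scores in impact_scoring_matrix.items():
    print(measure, scores)

# Scores can be compared within a column (the same impact across measures),
# but not between columns (e.g. an economic score of 5 is not comparable
# with an environmental score of 5).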
D1-5 Formal methods
More
formalised techniques for making
evaluative judgements include multi-criteria,
cost-benefit and cost-effectiveness
analysis, as well as benchmarking and
Environmental Impact Assessment. The application of
many of these approaches to agri-environmental policy evaluation was reviewed
as part of an OECD-sponsored workshop (OECD, 2004). Some
involve the
allocation of monetary values to outcomes that are normally
unpriced (due to the absence of a market for the 'goods'),
potentially making them more difficult to apply. However, if it can be done,
it might be worth determining a measure of the return to resources invested in the action plan, provided that the analysis focuses on the multiple objectives
that organic farming and the organic action plan are seeking to deliver, not
just on a single measure or objective.
D1-5.1 Multi-criteria analysis
Multi-criteria analysis
(see also Bouyssou et al., 2006)
is a decision-making tool
used to assess alternative projects or heterogeneous policy measures, taking several criteria into account
simultaneously in a complex situation. The method is designed to reflect the
opinions of different actors – their participation is central to the approach.
It may result in a single synthetic conclusion, or a range reflecting the
different perspectives of partners. The approach described here was used as part
of the EU-CEE-OFP project to evaluate different organic farming
policies (Annex D1-1).
Many of the
stages in applying the multi-criteria analysis approach, in
particular the definition of the actions to be judged and the
relevant performance criteria, are similar to the procedures
outlined in ORGAPET Sections
C1 and
C2 for structuring objectives and defining indicators.
The key issue at the synthesis stage is how the weightings for (and trade-offs
between) performance criteria are determined by the evaluators and by the
stakeholders.
The first step is the
construction of a multi-criteria evaluation matrix which should
have as many columns as there are criteria and as many rows as
there are measures to be compared. Each cell represents the
evaluation of one measure for one criterion. Multi-criteria
analysis requires an evaluation of all the measures for all the
criteria (no cell must remain empty), but does not require that
all the evaluations take the same form, and can include a mix of
quantitative criteria expressed by indicators, qualitative
criteria expressed by descriptors, and intermediate criteria
expressed by scores (similar to a synoptic
table comparing measures and impacts).
The
relative merits of the different measures can then be compared
by one of two scoring techniques:
compensation or outranking. Outranking does not always
produce clear conclusions, whereas analysis based on
compensation is always conclusive. From a technical point of
view, the compensation variant is also easier to implement. The
most pragmatic way of designing the multi-criteria evaluation
matrix is for the evaluation team to apply scoring scales to all the evaluation conclusions, whether quantitative or
qualitative. The multi-criteria evaluation matrix is then
equivalent to the impact scoring matrix. Usually the
compensation method is used unless members of the steering
group identify a problem which might justify the use of the veto
system.
The next
step is to evaluate the impacts or effects of the actions in
terms of each of the selected criteria. If the compensation
method is used, the process involves allocating scores and a
simple analysis using a basic spreadsheet. For the outranking
variant, the approach will differ according to the
type of analysis. The process could be based on quantitative
data or undertaken, more subjectively, by experts or the
stakeholders of the evaluation themselves. In reality, the
technique usually combines factual and objective elements
concerning impacts, with the points of view and preferences of
the main partners or 'assessors' (e.g. evaluation steering
group, using individual or focus group interviews). The
assessors' preferences may be taken into account by:
- direct expression in the form of a weighting attributed to each criterion (e.g. distributing points in a voting system; see the sketch after this list);
- revealing preferences by classification of profiles, where successive pairs of profiles are presented, preferences for one compared with the other in the pair are expressed as weak, average, strong or very strong, and the results are analysed using dedicated software;
- revealing preferences through the ranking of real projects, which may be seen by participants as more realistic than the classification of profiles approach.
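As a minimal sketch of the first ('direct expression') option, the hypothetical Python fragment below assumes each assessor distributes 100 points across the criteria; the allocations are normalised per assessor and then averaged to give a weight for each criterion. The assessors, criteria and point allocations are invented.

# Hypothetical point allocations by three assessors across three criteria.
assessor_points = {
    "assessor_1": {"economic": 50, "environmental": 30, "social": 20},
    "assessor_2": {"economic": 25, "environmental": 50, "social": 25},
    "assessor_3": {"economic": 40, "environmental": 40, "social": 20},
}

criteria = ["economic", "environmental", "social"]
weights = {}
for criterion in criteria:
    # Normalise each assessor's allocation, then average across assessors.
    shares = [points[criterion] / sum(points.values()) for points in assessor_points.values()]
    weights[criterion] = sum(shares) / len(shares)

print(weights)   # average weights, summing to 1.0 across criteria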
In the final step,
computer software can be used to sort the actions in relation to
each other. A single weighting system
for criteria can then be deduced, or the evaluation team and
steering group can decide to establish average weightings, which has the effect of downplaying the different points of view among the
assessors. There are three different approaches to the
aggregation of judgements:
- Personal judgements: the different judgement criteria are not synthesised in any way. Each participant constructs their own personal judgement based on the analysis and uses it to argue their point of view.
- Assisting coalition: the different judgement criteria are ranked using a computer package. An action will be classified above another one if it has a better score for the majority of criteria (maximum number of allies) and if it has fewer 'eliminatory scores' compared with the other criteria (minimum number of opponents).
- Assisting compromise: a weighting of the criteria is proposed by the evaluator or negotiated by the participants. The result is a classification of actions in terms of their weighted score.
It is now possible to
calculate global weighted scores for the
different measures. The results and impacts of
each measure will have been evaluated in
relation to the same criteria; all these
evaluations will have been presented in the form
of scores in an impact scoring matrix; and there is
a weighting system which expresses the average
preferences of assessors for a particular
criterion. The global score is calculated by
multiplying each elementary score by its
weighting and by adding the elementary weighted
scores. Based on weighted average scores, the
evaluation team can classify measures by order
of contribution to the overall success of the
programme.
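In other words, the global score of a measure is the sum, over all criteria, of the criterion weight multiplied by the measure's elementary score for that criterion. The short Python sketch below illustrates the calculation with purely hypothetical weights and scores.

# Illustrative calculation of global weighted scores: each elementary score
# in the impact scoring matrix is multiplied by the weight of its criterion
# and the weighted scores are summed for each measure. All figures are hypothetical.
weights = {"economic": 0.4, "environmental": 0.4, "social": 0.2}

impact_scoring_matrix = {
    "Conversion support":  {"economic": 7, "environmental": 4, "social": 2},
    "Marketing measures":  {"economic": 10, "environmental": 0, "social": 3},
    "Training and advice": {"economic": 4, "environmental": 4, "social": 6},
}

global_scores = {
    measure: sum(weights[criterion] * score for criterion, score in scores.items())
    for measure, scores in impact_scoring_matrix.items()
}

# Rank measures by their contribution to the overall success of the programme.
for measure, score in sorted(global_scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{measure}: {score:.1f}")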
The synthesised judgement
on the effectiveness of measures is usually
considered sound and impartial provided that:
- the evaluation criteria have been validated by the steering group;
- the conclusions on the impacts of each measure, as well as the impact scoring matrix summarising them, have been validated;
- the weighting coefficients for criteria have been established with the assistance of the assessors and the agreement of the steering group.
Experience also shows that
the partners are far more willing to accept the
conclusions of the report if the evaluation team
has recorded their opinions carefully and taken
the trouble to take their preferences into
account in presenting its conclusions. If, on
the contrary, the evaluation team chooses and
weights the criteria itself, without any
interaction with its partners, the impartiality
of the results will suffer and the
multi-criteria analysis will be less useful.
Cost-benefit analysis (CBA) (see also
Pearce et al., 2006) is a method of evaluating the
net economic impact of a public project which has some similarities to
multi-criteria analysis, but with the aim of expressing the
result in monetary terms.
Various techniques can be applied to the valuation of
non-financial benefits so that externalities can also be taken
into account. Projects typically involve
public investments but, in principle, the same methodology is
applicable to a variety of interventions, for example, subsidies
for private projects, reforms in regulation, new tax rates. CBA
is normally used in ex-ante evaluation to make a selection
between projects, typically of a large infrastructure nature. It
is not normally used to evaluate programmes and policies, even
though, in principle, it could be used to study the effect of
changes in specific political parameters (for example customs
tariffs, pollution thresholds, etc.).
Cost-effectiveness analysis (CEA)
(see also OECD, 2004 and
Annex D1-1) is a tool that can help to
ensure efficient use of resources in sectors where
benefits are difficult to value. It is a tool for the selection
of alternative projects with the same objectives (quantified in
physical terms). CEA can identify the alternative that, for a
given output level, minimises the actual value of costs or,
alternatively, for a given cost, maximises the output level.
This might, for example, be relevant if organic farming is being compared with
other agri-environment schemes in terms of biodiversity outputs and the costs of
achieving those outputs. CEA is used
when measurement of benefits in monetary terms is impossible,
where
the information required is difficult to determine, or where any attempt to make a precise monetary
measurement of benefits would be open to considerable
dispute. It does not, however, consider subjective judgements and is not
helpful in the case of projects with multiple objectives. In the
case of multiple objectives, a more sophisticated version of the
tool could be used, weighted cost-effectiveness analysis,
which gives weights to objectives in order to reflect their relative priority. Another alternative is a multi-criteria analysis. The
CEA technique, which looks at the cost of an intervention and
relates it to the benefits created, is also closely related to
the use of a Value for Money Assessment (though value for money does not
necessarily mean achieving outcomes at the lowest cost).
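As a rough illustration, the hypothetical Python sketch below (all scheme names, costs, outputs and weights are invented) computes simple cost-effectiveness ratios for alternative schemes delivering the same kind of output, and a weighted variant for the case of multiple objectives.

# Simple illustration of cost-effectiveness comparisons between hypothetical
# schemes delivering the same kind of output (e.g. a biodiversity indicator).
schemes = {
    "Organic farming support":   {"cost": 4.0e6, "output": 800.0},   # EUR, output units
    "Agri-environment scheme A": {"cost": 2.5e6, "output": 400.0},
    "Agri-environment scheme B": {"cost": 3.0e6, "output": 650.0},
}

for name, data in schemes.items():
    cost_per_unit = data["cost"] / data["output"]
    print(f"{name}: EUR {cost_per_unit:,.0f} per unit of output")

# Weighted cost-effectiveness for multiple objectives: weight each output by
# its priority before relating it to cost (weights are hypothetical).
objective_weights = {"biodiversity": 0.6, "water_quality": 0.4}
outputs = {"biodiversity": 800.0, "water_quality": 300.0}
weighted_output = sum(objective_weights[k] * outputs[k] for k in objective_weights)

organic_cost = schemes["Organic farming support"]["cost"]
print(f"Weighted cost-effectiveness: EUR {organic_cost / weighted_output:,.0f} per weighted output unit")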
Adopted from the private sector,
benchmarking
has become an increasingly popular tool for improving the policy
implementation processes and outcomes of the public sector.
Benchmarking was originally developed by companies operating in
an industrial environment to improve their competitiveness and has
therefore been applied most widely at the level of the business
enterprise. The technique is based on the exchange and
comparison of information between organisations in a given
field, one or more of which is regarded as an example of good or
best practice. This is potentially relevant in a policy
framework, including organic action plans, where, for example,
comparisons are being made between countries or regions.
In certain situations,
Environmental Impact Assessment (EIA) may be relevant as a
method of assessing the environmental impact of a project before
it is undertaken. This might be relevant where a significant
capital investment in processing or distribution facilities
might be involved, in particular where there might be concerns
that the negative environmental impacts of a development of this
type might outweigh the benefits to be derived from organic land management/production of the raw materials. EIA is seldom applied in a mid-term or ex-post situation, where the use of appropriate environmental indicators, analysed using the other techniques outlined above, is likely to be more relevant.
An evaluation is incomplete if it only
includes monitoring results for a series of indicators. There is a need for
evaluative or synthetic judgements to be derived, a process which needs the
input of stakeholders and impartial experts, so that different perspectives on
interpreting the results can be considered. Where possible, a consensus on the
overall effect of the programme is desirable - this should also include the
answering of the key evaluation questions identified at the outset. To achieve
this, adequate resources need to have been allocated to the evaluation process
from the outset, to ensure monitoring systems can be put in place and to permit
the final stages of the evaluation to take place as outlined in this section.
It is pointless, however, to complete an evaluation successfully if the report is then filed away
and nothing is done with it. There is a need to reflect and act on the results
in an appropriate stakeholder context, such as that of an action plan steering
group, and
there is a need to be clear about who is responsible for
taking actions arising from the evaluation and for monitoring that the actions
have been taken. In an ex-ante or
mid-term review, this may involve adjusting objectives, improving monitoring
procedures, refining the measures or re-targeting resources. In an ex-post, summative
context, the emphasis might be more on highlighting best practice and the
general lessons learned (see ORGAPET
Section A5-4).
The results
of the evaluation also need to be communicated effectively, for example through seminars and
publications, to a range of groups:
- Programme administrators, particularly where adjustments to programmes are required or lessons need to be learned to avoid implementation problems that may have arisen;
- Beneficiaries and other industry stakeholders, to demonstrate that lessons have been learned and that feedback has been taken seriously and acted upon;
- Policy-makers who may be involved in the design of future programmes.
Ideally, the
impact of the evaluation itself on achieving change, learning etc. should be
assessed, including whether the evaluation reflected stakeholder goals and
expectations.
- Has stakeholder/expert input into the evaluative judgements been included?
- Are formal methods for assessing the overall effects relevant?
- Have the key issues to be considered, identified above, been addressed?
- Have the key evaluative questions (defined as part of the scope of the evaluation) been answered?
- Has a process been put in place to ensure that the results of the evaluation are communicated and applied?
Bouyssou, D., T. Marchant, M. Pirlot, A. Tsoukiàs and P. Vincke (2006) Evaluation and decision models with multiple criteria: Stepping stones for the analyst. International Series in Operations Research and Management Science, Volume 86. Springer, Boston.

OECD (2004) Evaluating Agri-Environmental Policies: Design, Practice and Results. Organisation for Economic Co-operation and Development, Paris.

Pearce, D., G. Atkinson and S. Mourato (2006) Cost-benefit analysis and the environment. Organisation for Economic Co-operation and Development, Paris.
Annex D1-1 Application of multi-criteria analysis to the
evaluation of organic farming policies in the EU-CEE-OFP project