MY GOODNESS - AND YOURS
A history, and some possible futures, of DALY meanings and valuation procedures.
Erik Nord, PhD, National Institute of Public Health, Oslo, Norway.
Paper for WHO’s Global Conference on
Summary Measures of Population Health, Marrakech, Dec 6-9 1999
There is a somewhat disturbing lack of continuity in the thinking about valuation in the Global Burden of Disease Project. The change from a rating scale to the person trade-off in 1995 seemed to be based on careful reflection at the conceptual level: DALYs are supposed to inform resource allocation decisions. Therefore the person trade-off was chosen, since this technique consists in asking resource allocation questions. The essence of the person trade-off is that it captures not only concerns for producing as much health as possible, but also societal concerns for fair distribution of health care. Later events showed that the specific version of the person trade-off that was designed in the GBD project (PTO1 and PTO2) was unfortunate. This, however, does not in itself give reason for changing the underlying concept or aim of the DALY measure from societal value back to individual utility, as is now being suggested by Chris Murray and colleagues. I personally think that DALYs de facto will tend to be used as measures of societal value, i.e. of worthiness of receiving resources. For this reason I am inclined to think that disability weights should incorporate societal concerns for fairness. I indicate possible ways of doing this.
Regardless of how disability weights are determined, DALYs face a problem when gains in life years are concerned. If value is measured in terms of DALYs gained, the value of life extension in chronically ill or disabled people is smaller than the value of life extension in otherwise healthy people. This is in most people’s eyes unethical and offensive in the same way as PTO1 is unethical and offensive. This critique applies as much to QALYs as it does to DALYs. A proposed solution is to count each lost or gained life year as 1 irrespective of disability as long as life is preferred to being dead by the person concerned. This is already the case in the calculation of burden of disease in terms of DALYs lost. In my view it should also be the case in the valuation of health programs in terms of DALYs (and QALYs).
Summary measures of population health are measures that combine information on mortality and non-fatal outcomes to represent the health of a particular population as a single number.
QALYs and DALYs are examples of such measures.
This paper is written for the session on ’Goodness: conceptual and ethical issues’ at the Conference. The session is supposed to address the following issues:
The session comes before sessions on ’Social value choices’ and ’Fairness and Equity’. It is not clear to me that these are entirely separate issues. I think what I have to say touches on all three of them.
I concentrate on burden of disease and health outcome measurement in terms of DALYs.
I first distinguish between goodness in terms of objective health gain, individual utility and societal value. I then show the role of these concepts, and their operationalisation, in the history of DALYs generally and in an ongoing European DALY project specifically. I proceed to discuss the latest proposal for defining health in the GBD project (Murray, Salomon and Mathers, 1999), and its possible implications for procedures for determining disability weights. I discuss pros and cons of different candidate procedures. Finally I argue against the use of disability weights in evaluations of programs that extend lives rather than increase quality of life.
Goodness can be discussed with respect to a given state of affairs in population health or with respect to improvements in population health. Since ’benefit’ is a key word in the session, I focus for simplicity on the latter in what follows.
It is common to distinguish between three kinds of good or goodness in health care:
Objective health gains: Gains in functioning or life expectancy.
Individual utility: Individuals’ valuations of health improvements for themselves (determined by their subjective perceptions of quality of life and personal valuations of gained life years).
Societal value: Judgements by representatives of society at large of the relative goodness of different health programs (determined by objective health gains, gains in subjectively perceived quality of life and concerns for fairness and equity across individuals).
There is also a fourth concept: Individuals’ valuations of health programs from behind a so-called ’veil of ignorance’ about consequences for themselves. This may be seen as a mixture of individual utility judgement and societal value judgement. For a thorough discussion, see Menzel, 1999. I return to this below. Work hitherto on summary measures of health has mainly focused on the former three concepts of goodness, and I start by making some general observations about these.
Quantities and values
Some objective health gains may be compared in terms of size. For instance, ten gained life years is twice as much as five gained life years, and ten life years gained in a healthy person may be said to be a bigger health gain – meaning a bigger production of ’well-life’ - than ten life years gained in a disabled person.
On the other hand, far from all objective health gains can be compared in terms of size. One cannot for instance say that getting one’s eyesight back is a ’bigger’ or ’smaller’ gain than getting one’s hearing back, or ’bigger’ or ’smaller’ than getting to live five extra years. Nor can one say that a movement from ’bedridden’ to ’bound to a wheelchair’ is ’bigger’ or ’smaller’ than a movement from ’bound to a wheelchair’ to ’dependent on crutches’. To make such comparisons, one has to look at the value of the gains – to the individuals concerned (=individual utility) or to society (=societal value). Both these concepts of value are amenable to quantification. This can be done at an ordinal level of measurement, for instance by means of a rating scale, or at a cardinal level of measurement by means of techniques like the standard gamble, the time trade-off, the person trade-off and willingness to pay (all of which measure individuals’ willingness to sacrifice some ’countable’ good in order to obtain a given health outcome).
There is no simple linear relationship between ’quantifiable’ objective health gains and individual utility. Ten extra years may be valued less than twice as much as five added years (in terms of willingness to sacrifice to obtain them), and disabled people may value avoided premature death (= gained life years) just as much as able people do.
There is also no simple linear relationship between individual utility and societal value. For given gains in health and utility, the general public tends to value programs more the more severe the initial state of illness is (Nord, 1993; Callahan, 1994; Richardson, 1997; Ubel, 1999). There is also evidence that society may not want to discriminate very much between groups of patients who are equally ill, but have different potentials for becoming better (= obtaining utility), as long as significant improvements can be provided to each of them (Daniels, 1985; Nord, 1993b; Nord and Richardson et al, 1995; Pinto and Perpinan, 1999).
There is agreement that ’because of their potential influence on international and national resource allocation decisions, summary measures must be considered as normative measures’ (Field and Gold, 1998; Murray, Salomon and Mathers, 1999).
Taken together, these observations suggest that summary measures of population health (somewhat paradoxically) cannot be based on the measurement of health as ’objective functioning’. They must be based either on the concept of individual utility or on the concept of societal value (or as we shall see later, a mixture of these), and it makes a difference which of them is chosen.
In the history of DALYs, views on this choice have changed over time.
The GBD valuation history
In the original DALY work, it seems that disability weights were meant to express the ’badness’ of different states of illness (i.e. the loss of quality of life) to the individuals concerned (= individual disutility). The weights were then obtained by asking panels of people to locate selected states of illness on a rating scale from zero to unity (Murray, 1994).
Since one of the purposes of DALYs was to inform resource allocation decisions, it was later felt that disability weights should encapsulate not only individual (personal) quality of life considerations, but also a broader set of societal distributive values (Murray, 1996). For this reason (as I understand it), the valuation protocol was changed entirely in 1995, and the rating scale was replaced by the person trade-off (PTO) technique. Preliminary PTO-valuations for 22 indicator conditions were subsequently published by Murray and Lopez (1996), and the 1995 protocol has since been used in various settings in the GBD enterprise.
Independently of the GBD enterprise, the 1995 PTO-protocol (with some minor modifications) was used in a fairly large pilot study in Holland with a view to estimating disability weights for the most common diseases in that country (Stouthard et al, 1997). A grant was later obtained from the European Union’s Biomed Program to proceed with a comparative European study, still using in the main the 1995 protocol, in collaboration with researchers from six other countries (van der Maas, 1997).
At the first meeting in the European study in Rotterdam in May 1998 there were widespread concerns about the specific versions of the person trade-off technique adopted in the 1995 DALY protocol and the Dutch pilot study (Arnesen and Nord, 1999). The protocol includes two different ways of asking person trade-off questions - ’PTO1’ and ’PTO2’. PTO1 basically asks: If program A can extend the life of 1000 healthy people, and program B can extend the life of N people with a given disability (e.g. blindness), how big must N be for you to be indifferent between the two programs? Only responses higher than ’1000’ are accepted. There is in other words an underlying assumption that the value of extending the life of disabled or chronically ill people is less than the value of extending the life of healthy people. Researchers in the European project rejected this idea as unethical. PTO2, on the other hand, compares health improving programs with life saving ones. There is nothing unethical about PTO2, but participants in the European project found it difficult to understand. The European research teams were thus forced to go back to the drawing board.
Following the strong reactions to PTO1 and PTO2 in the European project, Murray, Salomon and Mathers (1999) argue that the change in 1995 in the construction and perspective of DALYs may have turned it into a too complex concept, with a less clear, common sense meaning. They suggest that perhaps it is better after all to keep the DALY as a summary measure of health only and to ’distance the development of this measure from the complex value that must be considered in the allocation of scarce resources’ (p.8). This again could mean that the current method for establishing disability weights will have to be modified or replaced by a new one (p.15).
In the European project, a different phrasing of the person trade-off question (’PTO3’) was developed that overcomes the problems of both ethics and understandability in the DALY protocol of 1995. The format and contents are essentially as follows:
Imagine that you are a decision maker. You have a choice between two programs that will reduce the incidence of disease in a few years from now.
Program A will prevent the occurrence of a rapidly fatal disease in 100 people in your country.
Program B will prevent the occurrence of disease X (chronic state described in detail) in N people in your country.
The programs are in all other respects equal.
Choose the value for N that would make you indifferent between the two programs. When answering, please disregard possible economic aspects.
This phrasing maintains the societal value perspective. The prevailing inclination among researchers in the European project is to use a phrasing like this as the main valuation technique in a series of valuation panels in their respective countries next year.
Update Oct 18 2000: The European project ended up doing both person trade-off and time trade-off valuations, and decided to base DALY calculations in an initial joint publication on time trade-off values. Results were presented at the EUPHA meeting in Paris in December 2000.
The current proposal from the GBD-project
Murray, Salomon and Mathers (1999) tentatively propose a specific way of defining health that arguably would allow disability weights to be estimated without societal distributive concerns being invoked. The proposal is based on an application of the principle of a veil of ignorance (Harsanyi, 1953). In this construct, an individual faces a choice of being one of the individuals in either population A or population B. He does not know which individual he would be in either of these two. Murray et al propose that population A should be deemed healthier than population B if and only if an individual behind a veil of ignorance would prefer to be in population A rather than in population B, when all non-health characteristics of A and B are the same (p. 9).
Murray et al presumably have in mind the preference of not only one individual, but rather the collective preference of all those who actually would be either in population A or B. Determining this collective preference is in itself not without difficulties, but I leave that issue aside here.
It should be noted that Murray et al’s proposal is in terms of individual preference. Preference follows from valuation. The proposal is thus in agreement with my own suggestion above that summary measures of health should be based on assignments of value rather than on measurements of objective functioning.
Since the purpose of the proposal is to avoid the inclusion of societal distributive concerns in a summary measure of health, Murray et al must be assuming that individual preferences expressed behind a veil of ignorance reflect self-interest only. Under this assumption, Murray et al are in effect saying that summary measures of health should be understood in terms of individual utility. More precisely, they are suggesting that if population A scores higher than population B on a summary health measure, the meaning should be that expected health related individual utility is higher in A than in B. They stress that the procedure for establishing disability weights should be consistent with this interpretation.
I think the proposal of Murray et al is an interesting one, and I can see a fairly straightforward way of operationalising it. Suppose one wants to obtain a disability weight for severe asthma. Subjects can then be faced with two hypothetical cohorts A and B of 100 people each:
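[Table describing cohorts A and B; only the fragment ’Asthma at 40 (at specified level’ survives. As the text that follows indicates, one cohort contains cases of a fatal disease Y and the other a variable number of cases of severe asthma.]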
Each subject can then be asked: Behind a veil of ignorance, to which cohort would you rather belong? One can then change the number of asthma cases until the subject is indifferent. If the median indifference number in an appropriate sample of subjects is in fact 20, the disability weight for asthma will be 5:20 = 0.25.
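As an illustration only, the arithmetic of this procedure can be sketched in Python. The function name and the sample of responses are hypothetical and are not taken from any GBD protocol; the only figures drawn from the example above are the 5 cases of fatal disease and the median indifference number of 20.

```python
from statistics import median

def disability_weight(fatal_cases, indifference_numbers):
    """Disability weight implied by a veil of ignorance valuation.

    fatal_cases: number of fatal cases in the comparison cohort.
    indifference_numbers: for each subject, the number of chronic cases
        at which that subject is indifferent between the two cohorts.
    The weight is the ratio of fatal cases to the median indifference number.
    """
    if fatal_cases <= 0:
        raise ValueError("fatal_cases must be positive")
    if not indifference_numbers or min(indifference_numbers) <= 0:
        raise ValueError("indifference numbers must be positive")
    return fatal_cases / median(indifference_numbers)

# Hypothetical responses from five subjects; their median is 20.
hypothetical_responses = [10, 15, 20, 25, 40]
weight = disability_weight(5, hypothetical_responses)
print(weight)  # 0.25
```

Using the median rather than the mean follows the wording of the example; a different aggregation rule would be a separate methodological choice.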
This operationalisation may be seen either as a variant of the person trade-off technique or as a probability trade-off resembling the standard gamble technique (Menzel, 1990; Menzel, 1999; Nord, 1999). Whether it should be seen as the former or the latter depends on the considerations that people take into account when they respond.
Murray et al are assuming that by using the veil of ignorance construct in their definition of health they exclude societal concerns for fairness and thus capture individual utility only. But the veil of ignorance construct does not necessarily work that way. Suppose in the asthma example above that a subject is indifferent between A and B. This could be because he is a maximiser of expected utility and thinks the disutility to him personally of severe asthma is one fourth of that of fatal disease. He would then be thinking in terms of personal risk, and the measurement would in effect capture a probability trade-off. But his indifference could also be the net result of two considerations: Personally he thinks the disutility of fatal disease Y is less than four times that of severe asthma, but on the other hand, being a person with a social conscience, he prefers to belong to the cohort where the distribution of health is more equitable. In the example, he might feel that that is cohort A. Both aspects considered, he is indifferent. The judgement of personal value (utility) then mixes with concerns for equity, and the response resembles more a person trade-off.
This example is more than a theoretical speculation. In an Australian study, preferences for belonging to a fair system seemed to override interests in utility maximisation when people were asked what role treatment costs should have in priority setting across patient groups (Nord and Richardson et al, 1995). Similarly, preferences for letting severity of illness strongly influence priority setting (as opposed to setting priorities on the basis of expected benefit only) seem to be strong even when people are asked behind a veil of ignorance (Nord, 1994; Richardson and Nord, 1997; Ubel, 1997).
The point is that in many, perhaps most countries, people are inclined not to think only about themselves when they think about health policy or ’which population they would prefer to belong to’. To avoid ambiguities and difficulties of interpretation, the veil of ignorance approach therefore needs some further specification. To achieve the goal of Murray et al, the instructions to the valuation procedure outlined above could for instance say that ’in this thought experiment we ask you to think only of your own self-interest and to have in mind how different kinds of health problems would affect quality of life for you personally’. This seems quite feasible. Alternatively, subjects could be instructed to think of both self-interest and concerns they might have for fairness and equity. The operationalisation above could then be seen as an alternative to PTO3 in asking about societal value. For a more detailed discussion, see Menzel 1999.
Comparison of methods
With the above operationalisations there seem to be (at least) three interesting candidate procedures for establishing disability weights in future Global Burden of Disease statistics: A revised person trade-off with an explicit societal perspective (PTO3 above), a veil of ignorance approach with an individual, self-interest perspective and a veil of ignorance approach with a broader ethical perspective. The following are some points to consider in a choice between these three. Some of them are factual or logical, others are merely my own subjective impressions and views.
1. I cannot see any ethical problems with any of the approaches.
2. All three valuation tasks seem quite easy for subjects to understand.
3. I agree that a summary measure of population health should as much as possible accord with ordinary people’s use of the word ’health’ (= common sense). It is possible that the introduction of societal concerns for fairness in a summary measure may run counter to this goal. On the other hand, it is not clear that ordinary people will immediately nod their heads understandingly when they see health defined as above by means of the veil of ignorance construct.
4. One might argue that ordinary people will never understand summary measures of health, and that these are primarily aids for reasonably well educated and informed policy makers. If this is true, it is the common sense of the latter that is of interest. Murray et al are then in effect suggesting that population health understood as ’average population health’ is more in keeping with policy makers’ common sense than ’average health adjusted for skewness in distribution’. I am not so convinced that this is true in most countries. Are we dealing with culture specific values and a myth of objectivity here?
5. The previous point highlights the value side of DALYs. Murray et al argue that ’we can quite reasonably choose to measure population health in one way and conclude that scarce resources should not be allocated simply to maximize population as so measured’ (p.9). On the other hand, they previously stress that ’summary measures must be considered as normative measures’ and that ’great care must be taken in the construction of summary measures precisely because they have far reaching effects’ (p.4). So perhaps the former point is more a theoretical than a real one. It will be tempting to use DALYs as a measure of benefit in cost-effectiveness analysis. Perhaps they should then not fail to incorporate concerns for fairness.
6. There is a somewhat disturbing lack of continuity in the thinking about valuation in the Global Burden of Disease Project. The change from a rating scale to the person trade-off in 1995 seemed to be based on very careful reflection at the conceptual level: What are DALYs supposed to measure, given their purported use in health policy making? Later events showed that the specific operationalisation that was then chosen (PTO1 and PTO2) was unfortunate. This, however, does not in itself give reason for changing the underlying concept or aim of the measure from societal value back to individual utility. PTO3 was developed to cover what PTO1 and PTO2 were meant to cover, and thus to be in keeping with the idea of the 1995 DALY. A veil of ignorance approach that draws attention to distributive concerns in addition to self-interest would also be in keeping with this idea. One would perhaps expect that the feasibility of these approaches be carefully examined before a more fundamental conceptual change in burden of disease measurement is proposed.
Assigning value to life in measures of health
Regardless of how disability weights are determined, DALYs face a problem when losses and gains in life years are concerned. An example will demonstrate this. Consider a person who lives for sixty years with a disability that has a weight of 0.2 and then dies, where life expectancy would otherwise have been eighty years. The burden of disease is 20 DALYs + 60 x 0.2 DALYs = 32 DALYs. That may be fair enough. But assume that the premature death is prevented. The benefit is 20 x 0.8 = 16 DALYs. This is problematic in two ways. First, the value of life extension is smaller than the loss caused by early death (16 versus 20). This seems illogical. Second, the value of life extension in a moderately disabled person is smaller than the value of life extension in an otherwise healthy person (16 versus 20). This is in most people’s eyes unethical and offensive in the same way as PTO1 is unethical and offensive.
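The figures in this example can be reproduced with a short calculation. The sketch below is illustrative only: the function names are hypothetical, a life expectancy of eighty years is assumed (it is implied by the 20 DALYs lost through death at sixty), and age weighting and discounting are ignored.

```python
def burden_dalys(weight, years_lived, life_expectancy):
    """DALYs lost: each year lost to premature death counts as 1,
    each year lived with disability counts as the disability weight."""
    years_lost = life_expectancy - years_lived
    return years_lost + years_lived * weight

def conventional_benefit_dalys(weight, years_gained):
    """DALYs gained by preventing death when each gained year
    is valued at (1 - disability weight)."""
    return years_gained * (1 - weight)

# Assumed life expectancy of 80 years; weight 0.2 and death at 60 as in the text.
burden = burden_dalys(weight=0.2, years_lived=60, life_expectancy=80)
benefit_disabled = conventional_benefit_dalys(weight=0.2, years_gained=20)
benefit_healthy = conventional_benefit_dalys(weight=0.0, years_gained=20)

print(f"Burden of disease:             {burden:.1f} DALYs")
print(f"Benefit, disabled person:      {benefit_disabled:.1f} DALYs")
print(f"Benefit, healthy person:       {benefit_healthy:.1f} DALYs")
print(f"Loss attributed to early death: {80 - 60:.1f} DALYs")
```

The gap between the last line and the benefit for the disabled person is the inconsistency described above; the gap between the two benefit lines is the ethical problem.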
This critique applies as much to QALYs as it does to DALYs. A proposed solution is to count each lost or gained life year as 1 irrespective of disability as long as it is preferred to being dead by the person concerned (Menzel et al, 1999; Nord et al, 1999). This is already the case in the calculation of burden of disease in terms of DALYs lost. In my view it should also be the case in the valuation of health programs in terms of DALYs (and QALYs). Both the ethical and the inconsistency problem would then be overcome.
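A minimal sketch of how the proposed rule changes the valuation of life extension is given below. The function name and the `unit_value_rule` flag are hypothetical, and the example reuses the figures from the case described earlier (20 gained life years, disability weight 0.2). The rule is applied only on the assumption that the person prefers life to being dead; the sketch does not cover the opposite case.

```python
def life_extension_value(years_gained, weight, unit_value_rule):
    """Value of preventing premature death, in DALYs.

    unit_value_rule=True: each gained life year counts as 1,
        irrespective of disability (the proposed rule).
    unit_value_rule=False: each gained life year counts as
        (1 - disability weight), as in conventional calculations.
    """
    if unit_value_rule:
        return float(years_gained)
    return years_gained * (1 - weight)

years_gained = 20
for label, weight in [("otherwise healthy", 0.0), ("disabled, weight 0.2", 0.2)]:
    conventional = life_extension_value(years_gained, weight, unit_value_rule=False)
    proposed = life_extension_value(years_gained, weight, unit_value_rule=True)
    print(f"{label}: conventional {conventional:.1f}, proposed {proposed:.1f}")
```

Under the proposed rule the two persons receive the same value for the same life extension, and that value equals the 20 DALYs attributed to the early death in the burden calculation.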
I personally think that DALYs de facto will tend to be used as measures of societal value. For this reason I am inclined to think that disability weights should incorporate societal concerns for fairness. I definitely think that their use in evaluating life extending programs should be avoided.
References
Arnesen T, Nord E. The value of DALY life. British Medical Journal 1999,319,1423-5.
Callahan D. Setting mental health priorities: Problems and possibilities. The Milbank Quarterly 1994,72,451-470.
Daniels N. Just Health Care. Cambridge, MA: Harvard University Press, 1985.
Field M, Gold M (eds). Summarizing population health: Directions for the development and application of population metrics. Institute of Medicine, Washington D.C. National Academy Press 1998.
Harsanyi JC. Cardinal utility in welfare economics and in the theory of risk taking. Journal of Political Economy 1953,61,434-435.
Menzel P. Strong Medicine. New York: Oxford University Press 1990.
Menzel P. How should what economists call ’social values’ be measured? Journal of Ethics 1999, vol 3 no 3 (in press).
Menzel P, Gold M, Nord E, Pinto JL, Richardson J, Ubel P. Toward a broader view of values in cost-effectiveness analysis of health. Hastings Center Report 1999,29,7-15.
Murray C. Quantifying the burden of disease: the technical basis for disability adjusted life years. Bulletin of the World Health Organisation 1994, 72,429-445.
Murray C. Rethinking DALYs. In Murray C and Lopez A (eds). The Global Burden of Disease. WHO/Harvard University Press 1996.
Murray C, Salomon J, Mathers C. A critical examination of summary measures of population health. GPE discussion paper no 2 1999. Geneva: WHO.
Nord E. The trade-off between severity of illness and treatment effect in cost-value analysis of health care. Health Policy 1993, 24,227-238.
Nord E. The relevance of health state after treatment in prioritising between patients. Journal of Medical Ethics 1993b, 19,37-42.
Nord E. Workshops on a value table for prioritising in health care. Working paper 1/1994. Oslo: National Institute of Public Health, 1994 (text in Norwegian).
Nord E. Cost-value analysis in health care. Making sense out of QALYs. Cambridge University Press 1999.
Nord E, Richardson J et al. Maximising health benefits versus egalitarianism: An Australian survey of health issues. Social Science & Medicine 1995,41,1429-1437.
Nord E, Richardson J et al. Who cares about cost? Health Policy 1995,34,79-94.
Nord E, Pinto JL, Richardson J, Menzel P, Ubel P. Incorporating societal concerns for fairness in numerical valuations of health programmes. Health Economics 1999, 8, 25-39.
Pinto JL, Perpinan JMA. Health state after treatment: A reason for discrimination. Health Economics, 1999 (in press).
Richardson J. Critique and some recent contributions to the theory of cost-utility analysis. Working paper 77. Melbourne: Centre for Health Program Evaluation.
Richardson J, Nord E. The importance of perspective in the measurement of quality adjusted life years. Medical Decision Making 1997,17,33-41.
Stouthard MEA, Essink-Bot M-L, Bonsel GJ et al. Disability weights for diseases in the Netherlands. Rotterdam: Erasmus University, Department of Public Health 1997.
Ubel P. How stable are people’s preferences for giving priority to severely ill patients? Social Science & Medicine 1999,49,895-903.
Van der Maas P. Disability weights for diseases in Europe. Project programme. Mimeo. Rotterdam: Erasmus University, Department of Public Health 1997.
Acknowledgement: I am grateful to Paul Menzel for helping me clarify concepts in this paper.