Expected Ratio of Relevant Units:
A Measure for Structured Information Retrieval
Benjamin Piwowarski (LIP 6, Paris, France)
Patrick Gallinari (LIP 6, Paris, France)

Since the 1960s, evaluation has been a key problem for Information Retrieval (IR) systems and has been extensively discussed in the IR community. New IR paradigms, like Structured Information Retrieval (SIR), make classical evaluation measures inappropriate. A few tentative extensions to these measures have been proposed, but they are also inadequate. We propose in this paper a new measure which is a generalisation of recall. This measure takes into account the specificity of SIR, where the elements to be retrieved are linked by structural relationships. We show an instantiation of this measure on the INEX database and present experiments to show how well it is adapted to SIR evaluation.

Information Retrieval systems aim at retrieving documents that are relevant to a given user information need. The notion of relevance is not only ill-defined and ambiguous [13, 9], it is also user specific. The evaluation of IR systems appeared very early as a key problem of IR. Cleverdon's experiments on the Cranfield collection [3] were the first experiments that justified the development of entirely automatic IR systems. Evaluation is useful for comparing different systems and is used to justify theoretical and/or pragmatic developments of IR systems.

Many different parameters can be used to measure the performance of an IR system, for example the time and space taken by the system to answer the query, or the user effort needed to find relevant documents. Swets [14] was the first to clearly define how a metric should be designed in order to provide an objective evaluation of IR systems: a measure should only reflect the ability of the system to discriminate relevant documents from irrelevant ones.

A number of hypotheses, even if implicit, are also necessary to develop evaluation measures. We can distinguish two kinds of hypotheses: those which are necessary to the computation of the measure and those which are priors on user behaviour. Typical assumptions are the following: (1) the user follows the ordered list of retrieved elements, beginning with the first element; (2) a relevant document is still relevant even if the user has already seen the same information in another document higher in the retrieved list. We will make such hypotheses explicit when describing our measure.

In this paper, we propose a measure to evaluate SIR systems. We will first introduce the new problem of SIR. We will show how standard recall/precision have been extended to evaluate such systems and why this extension is not well adapted to SIR evaluation. We will then introduce a new measure which is related to recall. Finally, we will compare our measure and the precision/recall extension on stereotypical systems, using the corpus provided by INEX.

There are many different approaches to IR evaluation [15, 1]. The expected search length [4] measures the number of irrelevant documents a user will consult before finding a certain number of relevant documents. Some measures are based on the definition of a metric over some predefined statistics [2, 15], others derive from rank correlation [10]. But the most famous measures in IR are recall and precision. Recall is defined as the ratio of the number of relevant documents that are retrieved to the total number of relevant documents. Precision is the ratio of the number of relevant documents that are retrieved to the total number of retrieved documents.

Raghavan [12] proposed a probabilistic version of recall-precision which, unlike standard precision/recall, is not inconsistent, especially when documents are not fully ordered. We will not define their measure more precisely here. Instead, we will detail an extension of precision and recall to a non-binary relevance scale, as it was used to evaluate Structured Information Retrieval systems in the 2002 INEX workshop. This extension was proposed by Kekäläinen and Järvelin [7]. In that case, the set R of relevant documents is defined in a fuzzy way: a document can be more or less relevant. When the document is highly relevant, it belongs to the set of relevant documents with a degree of 1. When the document is not relevant, it belongs to this set with a degree of 0. Every value between 0 and 1 is a measure of the relevance of the document. This scale thus generalises the classic binary scale (relevant/not relevant) used in IR. Let us denote by j(d) the degree with which the document d belongs to the set of relevant documents for a given query. Then, recall and precision are computed as:

$$\text{recall} = \frac{\sum_{d \in L} j(d)}{\sum_{d \in E} j(d)} \qquad \text{precision} = \frac{\sum_{d \in L} j(d)}{N}$$

where N is the number of documents in the list, E is the set of documents and L is the set of documents in the list. Those two formulas generalise standard recall-precision: when j(d) takes only the values 0 or 1, they give the same results.
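As an illustration of the generalised recall and precision formulas, here is a minimal Python sketch. It assumes the graded assessments j(d) are given as a dictionary from (hypothetical) document identifiers to values in [0, 1] and the retrieved list as a list of identifiers; the function name is illustrative only.

```python
from typing import Dict, List, Tuple


def generalised_recall_precision(
    ranking: List[str], j: Dict[str, float], n: int
) -> Tuple[float, float]:
    """Generalised recall and precision for the first n documents of a ranking.

    j maps each document d of the collection E to its degree of relevance
    j(d) in [0, 1]; documents missing from j count as non-relevant.
    """
    retrieved = ranking[:n]                       # L, the list of length N
    gain = sum(j.get(d, 0.0) for d in retrieved)  # sum of j(d) over L
    total = sum(j.values())                       # sum of j(d) over E
    recall = gain / total if total > 0 else 0.0
    precision = gain / len(retrieved) if retrieved else 0.0
    return recall, precision


if __name__ == "__main__":
    # Hypothetical graded assessments on a five-document collection.
    j = {"d1": 1.0, "d2": 0.5, "d3": 0.0, "d4": 0.25, "d5": 0.0}
    ranking = ["d2", "d3", "d1", "d5", "d4"]
    # recall = 1.5 / 1.75 = 6/7, precision = 1.5 / 3 = 0.5
    print(generalised_recall_precision(ranking, j, 3))
```

With binary values of j(d), the same function returns standard recall and precision, which is the reduction property mentioned for the two formulas.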
EVALUATION AND STRUCTURED INFORMATION RETRIEVAL

Atomic units are usually documents in classical IR. With the current growth of structured documents², the atomic unit is no longer the whole document but any logical element in the document. We will call such an element a doxel (for DOCument ELement) in the remainder of this paper. Compared to IR on unstructured collections, Structured Information Retrieval (SIR) should not focus on returning documents but on returning the smallest doxel that contains the answer to the query. While that query can be only free text as in standard IR (in the INEX terminology, those are Content Only queries, CO in short), a query can also specify constraints on both the structure and the content (those are called Content And Structure queries, CAS in short). We are interested in the evaluation of systems that answer CAS and CO queries, but we will focus here mainly on CO. We will say that a good answer (the smallest doxel) is SIR-relevant, to distinguish this notion from usual relevance.

² Where the textual (or multimedia) content of the document is usually organised in a tree.

Our work was greatly influenced by the recent INEX initiative [6]. In this section, we briefly describe how SIR systems were evaluated in INEX 2002, which was the first initiative where a corpus of assessed XML documents was built. We will show why the current evaluation methodology is not well suited to SIR.

Let us first describe the INEX scale used for the user assessments. This scale is neither binary, nor between 0 and 1, but two-dimensional. The first dimension is related to the extent to which the element is relevant. Relevance does not take into account the non-relevant part of the doxel, even if that part is 99% of the doxel. For example, the common ancestor of the whole database will be considered highly relevant even if only a small paragraph is highly relevant. In INEX'02, four levels of relevance were distinguished: the doxel can be irrelevant (0) if it does not contain any information about the topic of the request; marginally relevant (1) if it mentions the topic of the request, but only in passing; fairly relevant (2) if it contains more information than the topic description, but this information is not exhaustive; highly relevant (3) if it discusses the topic of the request exhaustively.

The second dimension, coverage, is specific to structured documents. Document coverage describes how much of the document component is relevant to the request topic. Again, there are four levels: no coverage (N) when the query topic is not a theme of the document component; too large (L) when the topic is only a minor theme of the document component; too small (S) when the topic or an aspect of the topic is the main or only theme of the document component, but the component is too small to act as a meaningful unit of information; finally, exact coverage (E) when the topic is the main theme of the doxel.

The two dimensions are not fully independent: a non-relevant element (0) must have no coverage (N). There are thus only 10 different values in this scale and not 16 (the 9 combinations of relevance 1 to 3 with coverage L, S or E, plus 0N). In the remainder of this paper, $J_{INEX}$ denotes this set of 10 values. Each of these values is a digit (relevance) followed by a letter (coverage). Thus, 2E means "fairly relevant with exact coverage". Within this scale, the doxels that should be returned by a perfect SIR system are all the doxels with an exact coverage, beginning with those with high relevance: in the case of the INEX scale, SIR-relevant doxels are those that have an exact coverage. Doxels with too small or too big coverage in this scale are considered not relevant. The motivation is that exact doxels are the doxels a user is searching for, while "too small" doxels are contained in an "exact" doxel and "too big" doxels contain an "exact" doxel.

$$f_s(j) = \begin{cases} 1 & \text{if } j = 3E \\ 0 & \text{otherwise} \end{cases} \qquad f_g(j) = \begin{cases} 1 & \text{if } j = 3E \\ 0.75 & \text{if } j \in \{2E, 3L, 3S\} \\ 0.5 & \text{if } j \in \{1E, 2L, 2S\} \\ 0.25 & \text{if } j \in \{1S, 1L\} \\ 0 & \text{if } j = 0N \end{cases}$$

Table 1: Quantisations are used to convert an assessment from the INEX scale $J_{INEX}$ to a binary or real scale used to compute recall and precision. In INEX, two quantisations were proposed: $f_s$ is a "strict" quantisation, $f_g$ is a "generalised" quantisation.

LIMITS OF CURRENT MODELS

The first measure proposed in INEX 2002 was standard recall and precision (i.e. using $f_s$, see Table 1). In this case, only doxels with exact coverage and high relevance (INEX scale) are relevant elements (for the binary scale). A system that always returns a near match will have a recall and a precision of 0. This should be avoided since the task is very complex. Moreover, when assessing the corpus, one can find it difficult to give the exact match to one doxel rather than to a smaller one. For example, the list element in INEX often contains only one paragraph; the textual content of both elements (list and paragraph) is thus the same. It is impossible to make a choice, and if we give an exact coverage to both, a SIR system will have to return both elements in order to have a perfect recall.

In order to cope with that problem, Gövert [5] proposed to add some relevance to neighbouring doxels, using $f_g$ to convert an assessment from the INEX assessment scale to a value between 0 and 1. A highly relevant doxel with an exact match will have a relevance of 1 in the [0, 1] scale. Some of the doxel's neighbours will also have a non-null relevance: its ancestors (within the document boundary) will have a relevance of 0.75 (too big); some of its children will have a relevance of 0.25 (too small). Non-relevant doxels will have a relevance of 0. This choice might seem better than the first one, but is still not adequate:

• For every SIR-relevant doxel, there will be a new set of IR-relevant doxels. To give an example of what it implies, consider a system that returns a doxel and its ancestors: this system will have a recall of 2.25, which is better than a system that returns two highly […]
• A system that returns all the SIR-relevant doxels will not be considered as having retrieved all the relevant information: this system will not have a recall of 1.

Those problems are more connected to relevance assessments for free-text queries, where there is no constraint on the structure of the retrieved doxels. Nevertheless, the case of structured queries can also be discussed. We will distinguish two different cases:

• The topic formulation does not have any constraint that forbids a doxel and a sub-doxel (a doxel contained in this doxel, e.g. a paragraph in a section) from being both retrieved, for example the query "find a paragraph or a section that talks about cats". Recall/precision are clearly not adapted to this case;
• The topic formulation does not allow a doxel and its sub-doxel to be both retrieved ("chapters that talk about photography"). In this case, we can use standard (or generalised) recall and precision without any problem.

Classical measures require the definition of the typical behaviour of a system user. This user consults the list of retrieved doxels one by one, beginning with the first returned doxel and continuing in the returned order. In the next section, we propose a measure based on a specific user behaviour, which takes into account the structure of the documents. In particular, we integrate in our measure the fact that a user might explore the doxels which are near the returned doxel in the structure.

In Web-based IR, classical precision/recall can be problematic. Even if the problem is slightly different, some authors have considered using the structural information (hyperlinks) of the corpus. For instance, Quintana, Kamel and McGeachy [11] proposed a measure that takes into account data on the displayed list of documents, on the user's knowledge of the topic and also on the links between the documents. They propose to estimate the mean time that a user will spend before finding a relevant document. We follow somewhat the same approach. The main difference is that we rely upon a probabilistic model, which makes our measure sound and easily adaptable to new corpora.

A MEASURE FOR SIR

We will suppose an ideal situation where assessments in the INEX 2002 corpus strictly follow the definition of SIR-relevance (which is not the case). We thus make the assumption that a SIR-relevant doxel can only contain SIR-relevant doxels that are less relevant or have a smaller coverage. This constraint states that the same relevant information is assessed with "exact coverage" only once.

In this section, we describe our measure, beginning with some general hypotheses and its definition. Then we present the probabilistic events and the assumptions we made on them, and finally we show how to calculate our measure.

The definition of a measure is based on a hypothetical user behaviour. Hypotheses used in classical measures are subjective but do reflect a reality. In the SIR framework, we propose a measure that estimates the number of relevant doxels a user might see. We will now describe how a typical user behaves in the context of SIR. This behaviour is defined by three different aspects: the doxel list returned by the SIR system, the structure of the documents and the known relevance of doxels to a query. The following hypotheses are similar to those made in classical IR:

Order: The user follows the list of doxels, beginning with the first one returned. He never gets discouraged, nor does he jump randomly from one doxel to another;
Absolute relevance: A doxel is still relevant even if the user has already seen another doxel that contains the same (or a part of the same) information;
Non-additivity: Two non-relevant doxels will never be relevant, even if they are merged.

The last three hypotheses are specific to our measure:

Structure browsing: The user may consult the structural context (parent, children, siblings) of a returned doxel. This hypothesis is related to the inner structure of the documents;
Coverage influence: The coverage of a doxel influences the behaviour of the user. If the doxel is "too large", then the user will most probably consult its children. If the doxel is "too small", the user will most probably consult the doxel's ancestors;
No hyperlink: The user will not use any hyperlink. More precisely, he will not jump to another document. This hypothesis is valid in the INEX corpus but can easily be removed in order to cope with hyperlinked corpora.

The measure we propose is the expectation of the number of relevant doxels a user sees when he consults the list of the k first returned doxels divided by the expectation of the
number of relevant doxels a user sees if he explores all the doxels of the database. We denote this measure by ERR (for Expected Ratio of Relevant documents):

$$\mathrm{ERR}(k) = \frac{E[N_R / N = k]}{E[N_R / N = |\mathcal{E}|]}$$

This measure is computed for one query. The measure ERR is normalised (ERR ∈ [0, 1]), as $E[N_R/N = |\mathcal{E}|]$ represents the maximum number of SIR-relevant doxels a user can see in the whole corpus. The measure can thus be averaged over different queries. We now have to compute the expectation under the assumptions on the user behaviour we just made.

We now have to derive this measure from the behaviour of a typical user. We will thus introduce a set of probabilities, each of which describes a part of the user behaviour, and we will make several hypotheses, related to the relevance assessments, to the returned list and to the structure of the documents, in order to make this measure computable.

We will introduce some events that are used to formally model the user behaviour and will make some hypotheses on the (probabilistic) relationships between these events. The three different probabilities we introduce are respectively related to the assessments, to the retrieved doxels and to the document structure. The set of events we use in this paper is summarised in Table 2.

Let us denote by $\mathcal{E}$ the set of doxels, e or e′ a doxel from $\mathcal{E}$ and q a given query. A doxel e can be more or less relevant with respect to the query. We will denote the probability of SIR-relevance of a given doxel by $P(R_e/q)$. The list returned by the SIR system is only partially ordered, so that some rearrangements of the list are possible. Depending on the length N of the list, a doxel is then consulted by the user with a probability $P(L_e/q, N = k)$.

When a user consults a doxel e′ from the list, he may use the structure to navigate to another doxel e of the document. As it is difficult to make this process deterministic, we will use $P(e' \to e/q)$ as the probability that the user goes from e′ to e. Note that this probability depends upon the query; this will be illustrated in the next sections.

We will suppose that the user sees the doxel e iff:

• e is in the list;
• or a doxel e′ is in the list and the user browses from e′ to e.

This event is denoted $S_e$ and we can write:

$$S_e \equiv L_e \lor \left(\exists e' \in \mathcal{E},\ L_{e'} \land e' \to e\right)$$

Table 2: Events and variables used in this paper.
  $N$          Number of doxels in the list consulted by the user
  $N_R$        Number of SIR-relevant doxels that have been seen by the user
  $R_e$        The doxel e is SIR-relevant
  $L_e$        The doxel e is in the list consulted by the user
  $S_e$        The user has seen the doxel e (either in the list or by browsing from a doxel in the list)
  $e' \to e$   The user sees the doxel e after he consulted the doxel e′

For simplicity, we will now drop the query q from the formulas, as the measure is computed independently for every query.

The following hypotheses are necessary for the computation of the measure. Note that all these assumptions are made knowing the query q and the length N of the list. The first two hypotheses are intuitive. The first hypothesis states that the relevance of a doxel does not depend on whether the user sees it:

$$P(S_e \land R_e) = P(S_e)\,P(R_e) \qquad \text{(H1)}$$

The second states that the behaviour of a user (going from a doxel e′ in the retrieved list to another doxel e, $e' \to e$) does not depend on whether the doxel e′ is in the list ($L_{e'}$):

$$P(L_{e'} \land e' \to e) = P(L_{e'})\,P(e' \to e) \qquad \text{(H2)}$$

The third states that events R or L related to different doxels are independent, and in particular that, for two different doxels e ≠ e′, the events

$$S_e \land R_e \ \text{or}\ \lnot(S_e \land R_e) \quad \text{and} \quad S_{e'} \land R_{e'} \ \text{or}\ \lnot(S_{e'} \land R_{e'}) \qquad \text{(H3)}$$

are independent. This hypothesis has no intuitive meaning and has been introduced only to allow the computation of the measure. Nevertheless, it can be justified by two statements. First, relevance is assigned by the user, and thus the probability of SIR-relevance of a doxel does not depend upon the SIR-relevance of another doxel but on the user assessment (which is represented by our event q). Second, the fact $S_e$ that the user sees a doxel e only depends on whether a doxel e′ is in the list (which is known when we know the length N of the list, which is the case here) and on whether the user moves from a doxel e′ in the list to the doxel e.

The last hypothesis is also a simplification of reality, but is as necessary as the previous ones. It is related to the probability of $S_e$, that is, that the user sees a doxel e. The more the user can access this doxel from the retrieved doxels by navigating along the document structure, the more "chances" he has to see that doxel. As it is not possible to evaluate all the interactions between previously seen doxels and this event, we make the hypothesis that corresponds to the "noisy-or". This hypothesis is used to compute the probability of the logical implication $A_1 \lor \dots \lor A_n \Rightarrow B$ as $1 - P(\lnot A_1) \cdots P(\lnot A_n)$.

In this subsection, we describe how to compute the measure. We want to calculate $E[N_R/N = k]$, with $1 \le k \le |\mathcal{E}|$. By definition,

$$E[N_R / N = k] = \sum_{r=0}^{|\mathcal{E}|} r\, P(N_R = r / N = k)$$

The user has seen r SIR-relevant doxels ($N_R = r$) iff these two conditions are both met: (1) there exists a subset $\{e_1, \dots, e_r\} \subseteq \mathcal{E}$ of SIR-relevant doxels that the user has seen and (2) for every other doxel, either the doxel is not SIR-relevant or the user has not seen it. If one considers the set of all sets A that contain r doxels from $\mathcal{E}$, this condition can be written formally as:

$$(N_R = r) \equiv \bigvee_{A \subseteq \mathcal{E},\ |A| = r} \Big( \bigwedge_{e \in A} (S_e \land R_e) \land \bigwedge_{e \in \mathcal{E} \setminus A} \lnot (S_e \land R_e) \Big)$$

Events for two different sets are exclusive and, using hypothesis (H3), we can state that:

$$P(N_R = r / N = k) = \sum_{A \subseteq \mathcal{E},\ |A| = r}\ \prod_{e \in A} P(S_e \land R_e / N = k) \prod_{e \in \mathcal{E} \setminus A} \big(1 - P(S_e \land R_e / N = k)\big)$$

$N_R$ is thus distributed as a sum of independent indicator variables, one per doxel, so that its expectation is:

$$E[N_R / N = k] = \sum_{e \in \mathcal{E}} P(S_e \land R_e / N = k)$$

This formula can be reduced using hypothesis (H1):

$$E[N_R / N = k] = \sum_{e \in \mathcal{E}} P(S_e / N = k)\, P(R_e)$$

Using the definition of $S_e$ and the noisy-or hypothesis, we get:

$$P(S_e / N = k) = 1 - \prod_{e' \in \mathcal{E}} P\big(\lnot (L_{e'} \land e' \to e) / N = k\big)$$

In this equation, we assumed that the event $e \to e$ is certain (identity move), that is $P(e \to e) = 1$, as the logical or is over all doxels in $\mathcal{E}$. Note that $E[N_R/N = |\mathcal{E}|]$ can easily be computed, as $P(S_e/N = |\mathcal{E}|) = 1$. Then, using hypothesis (H2), we finally obtain:

$$\mathrm{ERR}(k) = \frac{\sum_{e \in \mathcal{E}} P(R_e)\left(1 - \prod_{e' \in \mathcal{E}} \left(1 - P(L_{e'}/N = k)\, P(e' \to e)\right)\right)}{\sum_{e \in \mathcal{E}} P(R_e)}$$

In the last section, we derived the computation of the measure ERR, but we did not instantiate it in a practical case. We now propose a way to compute some of the probabilities for the INEX database³, namely, for a query, the probability $P(R_e)$ of relevance of a doxel and the probability $P(e \to e')$ that the user browses from a doxel to another.

³ Note that one can use the same definitions for any corpus of […]

Computing $P(R_e)$

INEX relevance assessments are given in a two-dimensional scale (coverage and relevance). For a given query, we compute $P(R_e)$ as a function⁴ of j(e), the assessment of the doxel e for the given query in the scale $J_{INEX}$. To avoid counting the same relevant information twice, we furthermore suppose that the probability of SIR-relevance of a doxel is zero whenever the doxel has an ancestor that is relevant with exact match:

$$P(R_e) = 0 \quad \text{if } \exists e',\ j(e') \in \{1E, 2E, 3E\} \text{ and } e' \text{ is an ancestor of } e$$

⁴ Other functions are of course possible; we chose one that seemed "reasonable" to us.

Computing $P(e' \to e)$

To compute the probability that the user jumps from a doxel to another, we distinguish several relationships between those doxels. The formulas below were only justified by our intuition and can easily be replaced by others. We denote by length(e) the length of doxel e. This length will usually be the number of words contained in the doxel. We denote by d(e, e′) the distance between two doxels. We used the number of words that are between those two doxels: for example, the distance between the last paragraph of section 1 and the second paragraph of section 2 is the number of words in the first paragraph of section 2 (plus the number of words of the section title). We can now give the formulas, distinguishing four different cases.

e′ and e are not in the same document: We made the hypothesis that the user does not follow any hyperlink:

$$P(e' \to e) = 0$$

e′ is a descendant of e: We suppose that the more e′ is an important part of e, the greater the probability that a user goes from e′ to e. The relevance of e′ also has an influence on this probability: if the coverage of e′ is S (or better, E), the probability is higher. The formula for this case uses a coefficient a, which is 7 when the coverage is exact, 3 when the coverage is too small and 1 otherwise.

e is in e′: This is a symmetric case. The only difference is the coverage influence: a is 7 when the coverage is exact, 3 when the coverage is too big and 1 otherwise.

Other cases: If, in the same document, two doxels are one after another (like two sibling paragraphs), we state that the probability that the user follows the path between the two doxels is inversely proportional to the distance between them:

$$P(e' \to e) = \big(2 + d(e', e)\big)^{-1}$$

In this section, we show how the measure discriminates between different IR systems. In order to compare the behaviour of generalised precision-recall with that of our measure, we considered six different hypothetical "SIR systems" which make use of known assessments. These systems exhibit "extreme" behaviours which illustrate a whole set of different situations. The six systems are named:

perfect: A system that returns the SIR-relevant doxels;
document: A system that returns all documents in which a SIR-relevant doxel appears;
parent: A system that always returns the parent of a SIR-relevant doxel;
ancestors: A system that returns the ancestors of a SIR-relevant doxel with a score;
biggest child: The SIR system returns the biggest child (in […]

In all these experiments, the score of a doxel was given by the relevance (first dimension of $J_{INEX}$) of its SIR-relevant doxel: we scored 1 for a doxel which was highly relevant, 0.5 for a fairly relevant doxel and finally 0 for a marginally relevant doxel.

In our experiments, we used all the Content Only queries for which there were some assessments. We only kept the 1000 first documents returned by the different systems. Given that scores can only take three values, the P/R curve was computed using the probabilistic definition of precision and recall of Raghavan [12] (with a step of 0.1). We computed the values of our own measure for list lengths N from 0 to 1000. We averaged our results for P/R and ERR over the queries in order to hide the specificities of each assessment. We did not consider the case of standard precision/recall (i.e. using $f_s$), as almost all of the models proposed here would have a near-null precision-recall curve.

In Figure 1, we present the curves obtained with our measure and in Figure 2 the generalised recall/precision (GRP). We comment on those curves in this subsection: we point out the shortcomings of GRP and show how our measure corrects them. When we analyse those curves, we can identify at least four problems with GRP:

1. The model perfect is not perfect for GRP. This can be seen as it is not the best model and as precision falls very quickly between recall 0.2 and 0.6. This is because, when using the generalised quantisation $f_g$, we are adding relevant doxels (for precision/recall) that are not SIR-relevant. Thus, even if the system returns all the SIR-relevant doxels, it does not return the other relevant doxels. For our measure ERR, we can see that after almost 400 doxels, model perfect has retrieved all SIR-relevant doxels.
2. The model ancestors has a higher performance than model perfect. This point is related to the previous one: because the model ancestors returns more doxels that are relevant (due to the quantisation), recall is better. Due to the limited size of the list and to the 4 possible values for scores, examination of the retrieved doxels shows another thing: every SIR-relevant doxel in the returned list is preceded by a list of its ancestors. We can see this effect with our measure, as the measure increases slowly with the number of retrieved documents for the model ancestors. Our measure also correctly discriminates those two models, as the performance of model ancestors is far below the performance of model perfect.
3. The model parent is much higher than the model biggest child. This is not what could be expected, as the parent can contain many irrelevant parts. This effect is due to the fact that doxels with coverage "too small" have a lower value in the real scale than those with coverage "too big". With our measure, performances are much closer.
4. The model document is close to the model biggest child. This is not a good property of GRP, since we want a measure that favours systems that retrieve elements of smaller granularity than documents, and since the biggest child is very often close to the SIR-relevant doxel (maybe as close as the document). With ERR, this is not the case.

Those four observations show that our measure is better suited to SIR evaluation than GRP. If we consider the theoretical foundations of our measure, they give some guarantees about its validity.

In this article, we have described a new measure for SIR systems called the Expected Ratio of Relevant documents (ERR). This measure is a generalisation of recall in classical IR: when the probability of going from a doxel to another is always null, the measure reduces to a form of generalised recall. This measure is consistent with SIR, in the sense that it favours systems that find the smallest relevant doxels. Other proposed measures like standard or generalised precision and recall are not good indicators of the performance of a SIR system, as was shown in the last section.

Note, however, that the results presented here should be interpreted with care, as we took very specific systems to underline the strange behaviour of GRP. Our measure has the advantage of a sound theoretical foundation and explicitly integrates the structure of the documents in the modelling of user behaviour⁵.
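The final ERR formula can be made concrete with a small Python sketch. This is an illustration under stated assumptions, not the authors' implementation: $P(R_e)$, $P(L_{e'}/N = k)$ and $P(e' \to e)$ are supplied as plain dictionaries (hypothetical inputs), the identity move $P(e \to e) = 1$ is added explicitly, and the toy example only uses the "other cases" rule $(2 + d(e', e))^{-1}$; the descendant and ancestor cases are left out. The function names `err` and `sibling_move` are hypothetical.

```python
from typing import Dict, Tuple


def err(
    p_rel: Dict[str, float],
    p_list: Dict[str, float],
    p_move: Dict[Tuple[str, str], float],
) -> float:
    """ERR for one query and one list length k.

    p_rel[e]        : P(R_e), probability that doxel e is SIR-relevant
    p_list[e]       : P(L_e / N = k), probability that e is in the consulted list
    p_move[(e2, e)] : P(e2 -> e), probability of browsing from e2 to e
    Missing entries are treated as 0, except the identity move P(e -> e) = 1.
    """
    doxels = set(p_rel) | set(p_list)
    numerator = 0.0
    for e in doxels:
        # Noisy-or combined with H2:
        # P(S_e) = 1 - prod_{e'} (1 - P(L_e') P(e' -> e))
        p_not_seen = 1.0
        for e2 in doxels:
            move = 1.0 if e2 == e else p_move.get((e2, e), 0.0)
            p_not_seen *= 1.0 - p_list.get(e2, 0.0) * move
        # H1: P(S_e and R_e) = P(S_e) P(R_e)
        numerator += (1.0 - p_not_seen) * p_rel.get(e, 0.0)
    # Denominator E[N_R / N = |E|] = sum_e P(R_e), since then P(S_e) = 1.
    denominator = sum(p_rel.values())
    return numerator / denominator if denominator > 0 else 0.0


def sibling_move(distance_in_words: int) -> float:
    """'Other cases' rule: P(e' -> e) = (2 + d(e', e))^-1 for consecutive doxels."""
    return 1.0 / (2 + distance_in_words)


if __name__ == "__main__":
    # Hypothetical document with three consecutive paragraphs p1, p2, p3.
    p_rel = {"p1": 0.0, "p2": 1.0, "p3": 0.5}
    p_list = {"p1": 1.0}  # only p1 is retrieved
    p_move = {("p1", "p2"): sibling_move(0), ("p1", "p3"): sibling_move(30)}
    print(err(p_rel, p_list, p_move))  # (0.5 + 0.5 / 32) / 1.5 = 0.34375
```

In the toy example only p1 is retrieved, yet ERR is non-zero because the user may browse to the relevant neighbours p2 and p3. With an empty `p_move`, the function reduces to the expected generalised recall $\sum_e P(R_e) P(L_e) / \sum_e P(R_e)$, matching the reduction to generalised recall claimed for ERR.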
The presented measure could also be very easily adapted in order to evaluate the performance of systems in the case of web retrieval. Another interesting property is that it could favour systems that provide Best Entry Points to the document structure [8], from which users can browse to access relevant information. In this case, if from a retrieved doxel there is a high probability that the user goes to some (SIR-)relevant doxels, the measure will increase faster than if the doxel is (SIR-)relevant but provides no (structural) links to other (SIR-)relevant doxels.

The last step would have been to provide an extension of precision as we did for recall. But when we tried to follow the probabilistic approach of Raghavan, a number of problems arose⁶ and it is still not clear which set of hypotheses could be used to solve the problem. However, the curves we can draw with the proposed measure are informative enough and have good properties, so it could replace or complement the GRP used for the evaluation of SIR systems.
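The step from the subset formula for $P(N_R = r/N = k)$ to $E[N_R/N = k] = \sum_e P(S_e \land R_e/N = k)$ can be checked numerically. The sketch below is a minimal illustration, assuming the per-doxel probabilities $P(S_e \land R_e/N = k)$ are already known and independent as in (H3); the function names and numbers are hypothetical. It enumerates every subset A, so its cost grows as $2^{|\mathcal{E}|}$, which shows why full distributions of relevant-doxel counts (the kind of quantity footnote 6 refers to for precision) quickly become intractable, while the expectation only needs a per-doxel sum.

```python
from itertools import combinations
from math import prod
from typing import Dict, List


def distribution_of_nr(p: Dict[str, float]) -> List[float]:
    """P(N_R = r / N = k) for r = 0..|E|, by enumerating subsets A with |A| = r.

    p[e] is P(S_e and R_e / N = k). Events of distinct doxels are assumed
    independent (H3), so each subset contributes a product of probabilities.
    The double loop visits all 2^|E| subsets, hence the exponential cost.
    """
    doxels = list(p)
    dist = []
    for r in range(len(doxels) + 1):
        total = 0.0
        for subset in combinations(doxels, r):
            chosen = set(subset)
            total += prod(p[e] if e in chosen else 1.0 - p[e] for e in doxels)
        dist.append(total)
    return dist


def expected_nr(p: Dict[str, float]) -> float:
    """E[N_R / N = k] = sum_r r P(N_R = r / N = k), from the enumerated distribution."""
    return sum(r * pr for r, pr in enumerate(distribution_of_nr(p)))


if __name__ == "__main__":
    p = {"a": 0.5, "b": 0.25, "c": 1.0}  # hypothetical P(S_e and R_e)
    print(distribution_of_nr(p))  # [0.0, 0.375, 0.5, 0.125]
    print(expected_nr(p), sum(p.values()))  # both equal 1.75
```

Both printed expectations agree, as the linearity argument predicts; only the distribution itself requires the exponential enumeration.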
⁵ This behaviour should be empirically validated.
⁶ In particular, we need to calculate the probability of finding $N_R$ relevant doxels in the retrieved list if this list has a given length. This probability can only be computed in $O(2^{M_R})$, where $M_R$ is the number of relevant doxels for the query.
Figure 1: Measure ERR (log scale on the x-axis). The x-axis represents the length of the list of retrieved doxels; the y-axis represents the measure ERR (in %). The measures are averaged over the queries.
Figure 2: Generalised precision-recall. The x-axis represents recall and the y-axis precision. Precision values are averaged over the queries.
[1] Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. Addison Wesley, New York, USA, 1999.
[2] Peter Bollmann and Vladimir S. Cherniavsky. Measurement-Theoretical Investigation of the MZ-Metric. In Robert N. Oddy, Stephen E. Robertson, C. J. van Rijsbergen, and P. W. Williams, editors, Proc. Joint ACM/BCS Symposium in Information Storage and Retrieval, pages 256–267, 1980.
[3] C. W. Cleverdon. The Cranfield tests on index language devices. In Aslib Proceedings, volume 19, pages 173–192, 1967.
[4] William S. Cooper. Some inconsistencies and misidentified modelling assumptions in probabilistic information retrieval. In Nicholas J. Belkin, Peter Ingwersen, and Annelise Mark Pejtersen, editors, Proceedings of the 14th ACM SIGIR, Copenhagen, Denmark, 1992. ACM Press.
[5] Norbert Gövert. Assessments and evaluation measures for XML document retrieval. In Proceedings of the First Annual Workshop of the Initiative for the Evaluation of XML retrieval (INEX), DELOS workshop, Dagstuhl, Germany, December 2002.
[6] Norbert Gövert and Gabriella Kazai. Overview of the Initiative for the Evaluation of XML retrieval (INEX) 2002. In Proceedings of the First Annual Workshop of the Initiative for the Evaluation of XML retrieval (INEX), DELOS workshop, Dagstuhl, Germany, December 2002. ERCIM.
[7] Jaana Kekäläinen and Kalervo Järvelin. Using graded relevance assessments in IR evaluation. Journal of the American Society for Information Science (JASIS), 53(13):1120–1129, 2002.
[8] Mounia Lalmas and Ekaterini Moutogianni. A Dempster-Shafer indexing for the focussed retrieval of a hierarchically structured document space: Implementation and experiments on a web museum collection. In 6th RIAO Conference, Content-Based Multimedia Information Access, Paris, France, April 2000.
[9] Stefano Mizzaro. How many relevances in information retrieval? Interacting With Computers, 10(3):305–322, 1998.
[10] Stephen M. Pollock. Measures for the Comparison of Information Retrieval Systems. American Documentation, 19(3):387–397, October 1968.
[11] Yuri Quintana, Mohamed Kamel, and Rob McGeachy. Formal methods for evaluating information retrieval in hypertext systems. In Proceedings of the 11th annual international conference on Systems documentation, pages 259–272, Kitchener-Waterloo, Ontario, Canada, October 1993. ACM Press.
[12] Vijay V. Raghavan, Gwang S. Jung, and Peter Bollmann. A critical investigation of recall and precision as measures of retrieval system performance. ACM Transactions on Information Systems, 7(3):205–229, 1989.
[13] Don R. Swanson. Historical Note: Information Retrieval and the Future of an Illusion, pages 555–561. Multimedia Information and Systems. Morgan Kaufmann, July 1997.
[14] John A. Swets. Effectiveness of Information Retrieval Methods. American Documentation, 20(1):72–89, January 1969.
[15] Cornelis J. Van Rijsbergen. Information Retrieval. Butterworths, 1979.



