Calculating precision and recall in Lucene


I finally solved the problem of how to calculate precision, recall, and MRR in the Lucene framework. After asking around on the internet for a while, I finally have a workaround. Let me show you how.

First, you need to run the program named PrecisionRecall.java and supply it with three inputs: the topics file, the qrels file, and your index directory. After that, the results are printed in the format shown at the end of this post.

A little background: recall measures how well the search system finds relevant documents, while precision measures how well the system filters out irrelevant documents.
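To make the two definitions concrete, here is a minimal sketch in plain Java that computes both numbers for one query. It is only an illustration, not part of the Lucene benchmark code: the class name `PrecisionRecallSketch`, the method names, and the sample file names are all made up for this example, and it assumes the retrieved list contains no duplicates.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration only; not a Lucene class.
final class PrecisionRecallSketch {

    /** Precision: the fraction of retrieved documents that are relevant. */
    public static double precision(List<String> retrieved, Set<String> relevant) {
        if (retrieved.isEmpty()) {
            return 0.0;
        }
        return (double) countHits(retrieved, relevant) / retrieved.size();
    }

    /** Recall: the fraction of relevant documents that were retrieved. */
    public static double recall(List<String> retrieved, Set<String> relevant) {
        if (relevant.isEmpty()) {
            return 0.0;
        }
        return (double) countHits(retrieved, relevant) / relevant.size();
    }

    /** Counts retrieved documents that appear in the relevant set (assumes no duplicates). */
    private static long countHits(List<String> retrieved, Set<String> relevant) {
        return retrieved.stream().filter(relevant::contains).count();
    }

    public static void main(String[] args) {
        // Made-up data: 4 documents retrieved, 3 documents judged relevant, 2 in common.
        List<String> retrieved = Arrays.asList("a.txt", "b.txt", "c.txt", "d.txt");
        Set<String> relevant = new HashSet<>(Arrays.asList("a.txt", "c.txt", "e.txt"));

        System.out.println("precision = " + precision(retrieved, relevant)); // 2 of 4 retrieved are relevant
        System.out.println("recall    = " + recall(retrieved, relevant));    // 2 of 3 relevant were found
    }
}
```

In this made-up example the search found two of the three relevant documents, so recall is 2/3, and two of the four returned documents were relevant, so precision is 2/4.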

package lia.benchmark;
/**
 * Copyright Manning Publications Co.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *     http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific lan
*/

import java.io.File;
import java.io.PrintWriter;
import java.io.BufferedReader;
import java.io.FileReader;
import org.apache.lucene.index.*;
import org.apache.lucene.search.*;
import org.apache.lucene.store.*;
import org.apache.lucene.benchmark.quality.*;
import org.apache.lucene.benchmark.quality.utils.*;
import org.apache.lucene.benchmark.quality.trec.*;

// From appendix C

/* This code was extracted from the Lucene
   contrib/benchmark sources */

public class PrecisionRecall {

  public static void main(String[] args) throws Throwable {

    File topicsFile = new File("C:/Users/Raden/Documents/lucene/LuceneHibernate/LIA/lia2e/src/lia/benchmark/topics.txt");
    File qrelsFile = new File("C:/Users/Raden/Documents/lucene/LuceneHibernate/LIA/lia2e/src/lia/benchmark/qrels.txt");
    Directory dir = FSDirectory.open(new File("C:/Users/Raden/Documents/myindex"));
    Searcher searcher = new IndexSearcher(dir, true);

    String docNameField = "filename";

    PrintWriter logger = new PrintWriter(System.out, true);

    TrecTopicsReader qReader = new TrecTopicsReader();   //#1
    QualityQuery qqs[] = qReader.readQueries(            //#1
        new BufferedReader(new FileReader(topicsFile))); //#1

    Judge judge = new TrecJudge(new BufferedReader(      //#2
        new FileReader(qrelsFile)));                     //#2

    judge.validateData(qqs, logger);                     //#3

    QualityQueryParser qqParser = new SimpleQQParser("title", "contents");  //#4

    QualityBenchmark qrun = new QualityBenchmark(qqs, qqParser, searcher, docNameField);
    SubmissionReport submitLog = null;
    QualityStats stats[] = qrun.execute(judge,           //#5
            submitLog, logger);

    QualityStats avg = QualityStats.average(stats);      //#6
    avg.log("SUMMARY",2,logger, "  ");
    dir.close();
  }
}

/*
#1 Read TREC topics as QualityQuery[]
#2 Create Judge from TREC Qrel file
#3 Verify query and Judge match
#4 Create parser to translate queries into Lucene queries
#5 Run benchmark
#6 Print precision and recall measures
*/

You need to fill the qrels file with every document you expect the search to return for each query, that is, the documents you judge to be relevant.

# Format:
#
#       qnum   0   doc-name   is-relevant
#
#

0   0   C:\Users\Raden\Documents\Yammer\adiDATA\adi\12.txt       1
0   0   C:\Users\Raden\Documents\Yammer\adiDATA\adi\16.txt       1
0   0   C:\Users\Raden\Documents\Yammer\cisiDATA\cisi\555.txt    1
0   0   C:\Users\Raden\Documents\Yammer\cisiDATA\cisi\1037.txt   1
0   0   C:\Users\Raden\Documents\Yammer\cisiDATA\cisi\1120.txt   1
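If you want to sanity-check a qrels file before running the benchmark, a rough sketch like the one below can help. It reads the four whitespace-separated columns described in the format comment and collects the relevant document names per query number. This is my own illustration and not how TrecJudge is implemented; the class `QrelsSketch`, the `parse` method, and the short `C:\docs\...` paths are hypothetical. Because the sketch splits on whitespace, a document name containing spaces would be rejected by it.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration only; not a Lucene class.
final class QrelsSketch {

    /** Maps each query number to the set of document names marked relevant for it. */
    public static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> relevantByQuery = new LinkedHashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue; // skip blank lines and comment lines
            }
            // Columns: qnum, a literal 0, doc-name, is-relevant
            String[] cols = line.split("\\s+");
            if (cols.length != 4) {
                throw new IllegalArgumentException("Expected 4 columns but got: " + raw);
            }
            String queryNumber = cols[0];
            String docName = cols[2];
            boolean relevant = !"0".equals(cols[3]);

            Set<String> docs = relevantByQuery.computeIfAbsent(queryNumber, k -> new LinkedHashSet<>());
            if (relevant) {
                docs.add(docName);
            }
        }
        return relevantByQuery;
    }

    public static void main(String[] args) {
        // Made-up lines in the same shape as the qrels file above.
        List<String> lines = Arrays.asList(
            "# qnum 0 doc-name is-relevant",
            "0 \t 0 \t C:\\docs\\12.txt \t 1",
            "0 \t 0 \t C:\\docs\\16.txt \t 1");
        System.out.println(parse(lines));
    }
}
```

The doc-name column has to match whatever is stored in the field that the benchmark reads document names from ("filename" in the program above).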

Next, you also need to fill the topics file with the search keywords whose relevance you want to measure.

<top>
<num> Number: 0
<title> elements
<desc> Description:
<narr> Narrative:
</top>
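If you have many keywords to test, typing these blocks by hand gets tedious. The sketch below builds one topic entry in the same layout as the example above, with the description and narrative left empty. The class `TopicsSketch` and the method `formatTopic` are names I made up for illustration. Note that the number after "Number:" should be the same query number you used in the first column of the qrels file, since the program checks that the queries and the judgments match.

```java
// Hypothetical helper for illustration only; not a Lucene class.
final class TopicsSketch {

    /** Builds one <top> entry with the given query number and title keywords. */
    public static String formatTopic(int number, String title) {
        return String.join("\n",
            "<top>",
            "<num> Number: " + number,
            "<title> " + title,
            "<desc> Description:",
            "<narr> Narrative:",
            "</top>") + "\n";
    }

    public static void main(String[] args) {
        // Reproduces the topic used in this post.
        System.out.print(formatTopic(0, "elements"));
    }
}
```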
 

And here is the final result:

0  -  contents:elements

0 Stats:
  Search Seconds:         0.034
  DocName Seconds:        0.039
  Num Points:            44.000
  Num Good Points:        5.000
  Max Good Points:        5.000
  Average Precision:      1.000
  MRR:                    1.000
  Recall:                 1.000
  Precision At 1:         1.000
  Precision At 2:         1.000
  Precision At 3:         1.000
  Precision At 4:         1.000
  Precision At 5:         1.000
  Precision At 6:         0.833
  Precision At 7:         0.714
  Precision At 8:         0.625
  Precision At 9:         0.556
  Precision At 10:        0.500
  Precision At 11:        0.455
  Precision At 12:        0.417
  Precision At 13:        0.385
  Precision At 14:        0.357
  Precision At 15:        0.333
  Precision At 16:        0.312
  Precision At 17:        0.294
  Precision At 18:        0.278
  Precision At 19:        0.263
  Precision At 20:        0.250

SUMMARY
  Search Seconds:         0.034
  DocName Seconds:        0.039
  Num Points:            44.000
  Num Good Points:        5.000
  Max Good Points:        5.000
  Average Precision:      1.000
  MRR:                    1.000
  Recall:                 1.000
  Precision At 1:         1.000
  Precision At 2:         1.000
  Precision At 3:         1.000
  Precision At 4:         1.000
  Precision At 5:         1.000
  Precision At 6:         0.833
  Precision At 7:         0.714
  Precision At 8:         0.625
  Precision At 9:         0.556
  Precision At 10:        0.500
  Precision At 11:        0.455
  Precision At 12:        0.417
  Precision At 13:        0.385
  Precision At 14:        0.357
  Precision At 15:        0.333
  Precision At 16:        0.312
  Precision At 17:        0.294
  Precision At 18:        0.278
  Precision At 19:        0.263
  Precision At 20:        0.250
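The numbers above are what you would expect if all 5 relevant documents come back in the first 5 positions of the 44 results: precision at n stays at 1.000 up to n = 5 and then falls as 5/n (5/6 = 0.833, 5/10 = 0.500, 5/20 = 0.250). The sketch below recomputes these values from a ranked list of relevant/not-relevant flags. It uses the usual textbook definitions of precision at n, reciprocal rank, and average precision, and it is not the QualityStats source; the class `RankedMetricsSketch` and its method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper for illustration only; not a Lucene class.
final class RankedMetricsSketch {

    /** Precision at n: relevant documents among the first n results, divided by n. */
    public static double precisionAt(List<Boolean> ranked, int n) {
        int hits = 0;
        for (int i = 0; i < n && i < ranked.size(); i++) {
            if (ranked.get(i)) {
                hits++;
            }
        }
        return (double) hits / n;
    }

    /** Reciprocal rank: 1 / position of the first relevant result, or 0 if there is none. */
    public static double reciprocalRank(List<Boolean> ranked) {
        for (int i = 0; i < ranked.size(); i++) {
            if (ranked.get(i)) {
                return 1.0 / (i + 1);
            }
        }
        return 0.0;
    }

    /** Average precision: mean of the precision values taken at each relevant result. */
    public static double averagePrecision(List<Boolean> ranked, int totalRelevant) {
        if (totalRelevant == 0) {
            return 0.0;
        }
        int hits = 0;
        double sum = 0.0;
        for (int i = 0; i < ranked.size(); i++) {
            if (ranked.get(i)) {
                hits++;
                sum += (double) hits / (i + 1);
            }
        }
        return sum / totalRelevant;
    }

    public static void main(String[] args) {
        // Assumed ranking: 44 results, the 5 relevant ones in positions 1 to 5.
        List<Boolean> ranked = new ArrayList<>();
        for (int i = 0; i < 44; i++) {
            ranked.add(i < 5);
        }
        for (int n : new int[] {1, 5, 6, 10, 20}) {
            System.out.println(String.format(Locale.ROOT, "Precision At %d: %.3f", n, precisionAt(ranked, n)));
        }
        System.out.println(String.format(Locale.ROOT, "MRR: %.3f", reciprocalRank(ranked)));
        System.out.println(String.format(Locale.ROOT, "Average Precision: %.3f", averagePrecision(ranked, 5)));
    }
}
```

Since there is only one query in the topics file, the SUMMARY block is an average over a single query and therefore repeats the per-query numbers.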


3 comments on “calculating precision and recall in lucene”

  1. m says:

    Where can I download the “qrels” file?

    please help.

  2. m says:

    I can’t find the “qrels” file in TREC.

    please help.

  3. ibouce says:

    Can you tell me why my results are zero !!
