Project Structure

The project structure is as follows:

  • data: Contains the dataset in JSON format.
  • src: Contains the source code for the project.
  • requirements.txt: Lists the Python packages required by the project.
  • Contains the project documentation.


The refined Bench4BL dataset used for this project is provided in the data directory in JSON format. Each record describes one bug report and contains the following fields:

  • bug_id: Unique identifier for the bug.
  • bug_title: Title of the bug.
  • bug_description: Description of the bug.
  • project: Project to which the bug belongs.
  • sub_project: Sub-project to which the bug belongs.
  • version: Version of the project in which the bug was reported.
  • fixed_version: Version in which the bug was fixed.
  • fixed_files: Files changed to fix the bug, as a JSON array.
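The field list above can be checked programmatically before a file is passed to the localizer. The following is a minimal sketch, assuming the top-level JSON value is an array of records; the helper names `missing_fields` and `load_bug_reports` and the sample record are hypothetical and are not part of the project.

```python
import json

# Field names taken from the list above.
REQUIRED_FIELDS = (
    "bug_id", "bug_title", "bug_description", "project",
    "sub_project", "version", "fixed_version", "fixed_files",
)


def missing_fields(record: dict) -> list:
    """Return the required fields that are absent from one bug report."""
    return [name for name in REQUIRED_FIELDS if name not in record]


def load_bug_reports(text: str) -> list:
    """Parse a JSON document and keep only the complete bug reports.

    Assumption: the top-level JSON value is an array of records.
    """
    records = json.loads(text)
    return [record for record in records if not missing_fields(record)]


# Made-up sample for illustration only; it is not taken from the dataset.
sample = json.dumps([
    {
        "bug_id": "EXAMPLE-1",
        "bug_title": "Example title",
        "bug_description": "Example description",
        "project": "ExampleProject",
        "sub_project": "EXAMPLE",
        "version": "1.0.0",
        "fixed_version": "1.0.1",
        "fixed_files": ["src/main/java/example/Foo.java"],
    },
    {"bug_id": "EXAMPLE-2"},  # incomplete record, filtered out below
])

reports = load_bug_reports(sample)
print(len(reports), reports[0]["fixed_files"])
```

Records that lack any field are dropped here; a stricter variant could raise an error instead so that malformed input is noticed early.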


The project has the following prerequisites:

  • Python 3.10
  • Elasticsearch
  • NVIDIA CUDA enabled GPU
  • Required Python Packages

Installing Required Packages

Python 3.10:

We recommend using a virtual environment to install the packages and run the application. Learn to use a virtual environment here.


Windows:

  1. Download Python 3.10:

    • Visit the official Python downloads page.
    • Download the Windows installer (Windows Installer (64-bit) recommended).
    • Run the installer.
    • Check the box to add Python to PATH during installation.
  2. Verify Installation:

    • Open Command Prompt.
    • Type python --version.
    • You should see Python 3.10.x.

Linux (Ubuntu/Debian):

  1. Install Python 3.10:

    • Open Terminal.
    • Run the following commands:
      sudo apt update
      sudo apt install python3.10
  2. Verify Installation:

    • Type python3.10 --version.
    • You should see Python 3.10.x.
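The same version check can be done from inside Python, which is useful when several interpreters are installed or a virtual environment is active. This is a small sketch; the function name `is_required_python` is hypothetical, and treating anything other than 3.10 as unsupported is an assumption based on the prerequisites listed above.

```python
import sys

# Major and minor version named in the prerequisites.
REQUIRED = (3, 10)


def is_required_python(version_info=sys.version_info) -> bool:
    """Return True when the major.minor version is exactly 3.10."""
    return tuple(version_info[:2]) == REQUIRED


if not is_required_python():
    # Only a warning: other versions may or may not work with this project.
    print("Warning: expected Python 3.10, found %d.%d" % tuple(sys.version_info[:2]))
```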

Install PyTorch:

PyTorch with CUDA 11.3 support is required for the project.

Use the following command to install PyTorch with CUDA support:

pip install torch==1.10.0+cu113 torchvision==0.11.0+cu113 torchaudio==0.10.0+cu113 torchtext==0.11.0 -f

Verify the installation by running the following command:

python -c "import torch; print(torch.cuda.is_available())"

You should see True if PyTorch is installed correctly with CUDA support.

If you do not have a CUDA-enabled GPU, install the CPU version of PyTorch. Learn more about PyTorch with CUDA support here.
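In a script, the CUDA check above is usually combined with a fallback so that the same code runs on machines without a GPU. A minimal sketch, assuming the CPU build is acceptable for your use; the helper name `pick_device` is hypothetical and not part of the project.

```python
def pick_device() -> str:
    """Return "cuda" when PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so nothing can run on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


device = pick_device()
print(device)
```

The returned string can be passed to `torch.device(...)` or to a tensor's or module's `.to(...)` method.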



Install Elasticsearch:

Windows:

  1. Download Elasticsearch:

  2. Extract and Start Elasticsearch:

    • Extract the downloaded ZIP file.
    • Navigate to the extracted directory.
    • Run bin\elasticsearch.bat in Command Prompt.
  3. Verify Installation:

    • Open a web browser.
    • Go to http://localhost:9200.
    • Check for a JSON response indicating Elasticsearch is running.

Linux (Ubuntu/Debian):

  1. Download and Install Elasticsearch:

    • Open Terminal.
    • Run the following commands:
      sudo dpkg -i elasticsearch-<version>-amd64.deb
  2. Start Elasticsearch Service:

    • Run:
      sudo systemctl start elasticsearch
      sudo systemctl enable elasticsearch
  3. Verify Installation:

    • Open a web browser.
    • Go to http://localhost:9200.
    • Ensure Elasticsearch is running by checking for a JSON response.
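The browser check can also be scripted. The sketch below uses only the Python standard library to request the root URL and read `version.number` from the JSON response. The function names are hypothetical, and it assumes an unsecured HTTP endpoint on the default port; installations with security enabled (HTTPS and authentication) will make this plain request fail, in which case the function returns `None`.

```python
import json
import urllib.request
from typing import Optional


def parse_version(body: str) -> Optional[str]:
    """Extract version.number from the JSON served at the root URL."""
    try:
        info = json.loads(body)
    except json.JSONDecodeError:
        return None
    if not isinstance(info, dict):
        return None
    version = info.get("version")
    if isinstance(version, dict):
        return version.get("number")
    return None


def elasticsearch_version(url: str = "http://localhost:9200",
                          timeout: float = 3.0) -> Optional[str]:
    """Return the server version, or None if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_version(response.read().decode("utf-8"))
    except OSError:
        # URLError and HTTPError are subclasses of OSError; this also
        # covers refused connections and timeouts.
        return None


if __name__ == "__main__":
    print(elasticsearch_version() or "Elasticsearch is not reachable")
```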

Install Required Python Packages:

  1. Navigate to Project Directory:

    • Open terminal/command prompt.
    • Use cd to move to the directory containing requirements.txt.
  2. Install Packages:

    • Run pip install -r requirements.txt.


Index Documents in Elasticsearch for Each Version of the Project:

  1. Create Index:
    • Run 'src/IR/Indexer/' to create an index in Elasticsearch. The configuration for the index is provided in 'IR_Config.yaml'.
    • Extract the source files from Git Projects per version and index them in Elasticsearch using ''. The GitHub Repositories are listed in the Bench4BL repository.
    • The default port for Elasticsearch is 9200.
  2. Train or download the models from the following link:

Localizing the bugs:

Run the command below to localize the bugs:

python src --br-path /path/to/input/data --kw-model-dir /path/to/keyword/model --ce-model-dir /path/to/cross-encoder/model --L 10 --topK_rerank 50 --topN 10

  • --br-path: Path to the input data in JSON format. The file must follow the format of the dataset provided in the data directory.
  • --kw-model-dir: Path to the keyword model.
  • --ce-model-dir: Path to the cross-encoder model.
  • --L: Length of the keywords.
  • --topK_rerank: Number of bugs to rerank.
  • --topN: Number of top outputs to return.
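To make the option types explicit, the flags above can be mirrored with `argparse`. This is an illustrative sketch only, not the project's actual argument parser: the flag names come from the command above, while the integer types, the required/optional split, and the defaults (copied from the example values 10, 50, and 10) are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mirrors the documented flags (illustrative)."""
    parser = argparse.ArgumentParser(description="Localize bugs (sketch).")
    parser.add_argument("--br-path", required=True,
                        help="Path to the input data in JSON format.")
    parser.add_argument("--kw-model-dir", required=True,
                        help="Path to the keyword model.")
    parser.add_argument("--ce-model-dir", required=True,
                        help="Path to the cross-encoder model.")
    # Defaults below are assumptions taken from the example command.
    parser.add_argument("--L", type=int, default=10,
                        help="Length of the keywords.")
    parser.add_argument("--topK_rerank", type=int, default=50,
                        help="Number of bugs to rerank.")
    parser.add_argument("--topN", type=int, default=10,
                        help="Number of top outputs to return.")
    return parser


# Placeholder paths for illustration.
args = build_parser().parse_args([
    "--br-path", "data/bugs.json",
    "--kw-model-dir", "models/keyword",
    "--ce-model-dir", "models/cross-encoder",
    "--topN", "5",
])
print(args.br_path, args.L, args.topK_rerank, args.topN)
```

Note that `argparse` converts the hyphens in `--br-path` to underscores, so the value is read as `args.br_path`.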

