
# Reading Comprehension with BERT

GitHub Link
# Project Overview
This project focuses on a reading comprehension task where the model reads a passage, processes a related question, and selects the correct answer from three available choices. The task requires a deep understanding of context, details, and nuances within the text.
## Task Description
- **Objective:**
  - Read a passage.
  - Process a corresponding question.
  - Select the correct answer from three choices.
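To make the task format concrete, one training example can be pictured as a passage, a question, three candidate answers, and the index of the correct choice. A minimal illustration (the field names and text below are invented for clarity and are not the actual CSV columns):

```python
# A single illustrative reading-comprehension record.
# Field names and content are hypothetical, not the real dataset schema.
example = {
    "context": "The committee postponed the vote because two members were absent.",
    "question": "Why was the vote postponed?",
    "choices": [
        "Two members were absent.",
        "The agenda was too long.",
        "The room was unavailable.",
    ],
    "label": 0,  # index (0, 1, or 2) of the correct choice
}
```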
## Model
### BERT
- **BERT (Bidirectional Encoder Representations from Transformers):**
Utilized as the core language model to understand context in both directions.
### Fine-Tuning
- A pre-trained BERT model is fine-tuned on a reading comprehension dataset.
- The project uses a tiny BERT model as a baseline.
- Options available for further experimentation include larger models:
  - `bert-base-uncased`
  - `bert-large-uncased`
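A minimal loading sketch, assuming the Hugging Face `transformers` library is used; the tiny checkpoint name `prajjwal1/bert-tiny` is only an assumption for illustration and can be swapped for the larger checkpoints listed above:

```python
from transformers import AutoTokenizer, AutoModelForMultipleChoice

# Assumed tiny baseline checkpoint; replace with "bert-base-uncased"
# or "bert-large-uncased" for the larger-model experiments.
checkpoint = "prajjwal1/bert-tiny"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMultipleChoice.from_pretrained(checkpoint)
```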
## Dataset
- **Dataset Link:**
[View Dataset on Google Drive](https://drive.google.com/file/d/1LABaYT-2gWthtNnW7PKlG9pM8Mh3NvuA/view)
- **Training Data (`train_data.csv`):**
  - 8,488 rows and 7 columns:
    - Question ID
    - Context (passage)
    - Question text
    - Three answer choices
    - Label indicating the correct answer (choice number)
- **Test Data (`test_data.csv`):**
  - 2,122 rows and 7 columns (the label column is null; these are the answers to be predicted)
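A quick sanity check of both files with pandas, assuming they sit in the working directory; the expected shapes come from the description above:

```python
import pandas as pd

train_df = pd.read_csv("train_data.csv")  # expected shape: (8488, 7)
test_df = pd.read_csv("test_data.csv")    # expected shape: (2122, 7), label column null

print(train_df.shape, test_df.shape)
print(train_df.columns.tolist())  # inspect the actual column names
print(train_df.head())
```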
## Implementation Details
### Data Processing
- Processes CSV data.
- Tokenizes passages and questions.
- Prepares the dataset for model training.
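One common way to tokenize for BERT multiple choice is to pair the passage with the question concatenated to each candidate answer, giving three encoded sequences per example. The helper below is a hedged sketch of that idea, not the notebook's exact code:

```python
def encode_example(tokenizer, context, question, choices, max_length=256):
    """Encode one (passage, question, 3 choices) example for a
    multiple-choice BERT head, which expects inputs shaped
    (batch_size, num_choices, seq_len)."""
    first = [context] * len(choices)
    second = [f"{question} {choice}" for choice in choices]
    enc = tokenizer(
        first,
        second,
        truncation=True,
        max_length=max_length,
        padding="max_length",
        return_tensors="pt",
    )
    # Add the num_choices dimension: (1, 3, max_length)
    return {key: tensor.unsqueeze(0) for key, tensor in enc.items()}
```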
### Model Training
- Fine-tuning process details:
- Approximately 2 minutes for data processing.
- Approximately 2 minutes for training using the sample code.
- Hyperparameters (learning rate, batch size, number of epochs) are tuned to optimize performance.
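A fine-tuning sketch using the `transformers` Trainer; the `model` and `train_dataset` names refer to objects prepared in the earlier steps, and the hyperparameter values shown are common starting points, not necessarily those used in the notebook:

```python
from transformers import Trainer, TrainingArguments

training_args = TrainingArguments(
    output_dir="bert-mc-checkpoints",
    learning_rate=2e-5,              # assumed starting point; tuned in practice
    per_device_train_batch_size=16,  # assumed
    num_train_epochs=3,              # assumed
    logging_steps=50,
)

trainer = Trainer(
    model=model,                  # multiple-choice BERT loaded earlier
    args=training_args,
    train_dataset=train_dataset,  # tokenized training examples from the data step
)
trainer.train()
```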
### Advanced Options
- For more advanced experimentation, larger pre-trained models such as `bert-base-uncased` or `bert-large-uncased` can be used to potentially improve accuracy.
- Note: These models may require longer processing times.
## Output & Evaluation
### Output
- Final predictions on the test set are saved in a CSV file containing:
  - **id:** Row index
  - **label:** Predicted answer label (0, 1, or 2)
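A sketch of how such a file might be written once per-example logits have been collected; `test_logits` (shape: number of test rows × 3) is an assumed variable name:

```python
import pandas as pd
import torch

# test_logits: tensor of shape (num_test_examples, 3) gathered from the model
predicted_labels = torch.argmax(test_logits, dim=-1).tolist()

submission = pd.DataFrame({
    "id": range(len(predicted_labels)),  # row index
    "label": predicted_labels,           # predicted answer: 0, 1, or 2
})
submission.to_csv("submission.csv", index=False)
```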
### Evaluation
- Model performance is measured by categorization accuracy; higher accuracy indicates a better understanding of the text.
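Because the test labels are withheld, accuracy is typically checked on a held-out slice of the training data before submitting; a minimal sketch with assumed variable names:

```python
# val_preds and val_labels: lists of choice indices (0, 1, or 2)
# for a validation split held out from train_data.csv.
correct = sum(int(p == y) for p, y in zip(val_preds, val_labels))
accuracy = correct / len(val_labels)
print(f"Validation accuracy: {accuracy:.4f}")
```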
## Files Included
- **Jupyter Notebook:**
  - `DL_Weeek10 (2).ipynb` – Contains the full implementation for data processing, model fine-tuning, and evaluation.
- **Datasets:**
  - `train_data.csv`
  - `test_data.csv`
- **Output CSV File:**
  - Contains two columns: `id` and `label` (for Kaggle submission).
## Conclusion
This project demonstrates the use of BERT for a multiple-choice reading comprehension task. By fine-tuning a pre-trained BERT model on a dataset of passages, questions, and answer choices, the system learns to select the correct answer. The repository includes all code, data processing steps, and evaluation details needed to replicate and extend the project.