Year
2024
Tech & Technique
spaCy, Transformers, NLTK, Scikit-learn, PyTorch, FastAPI
Description
Multilingual POS tagging and context-aware spell-correction system supporting 5+ languages. Achieved 96.8% accuracy on the Universal Dependencies dataset using a transformer-based architecture and transfer learning.
Key Features:
- 🧠 Multi-Model POS Tagging: Implemented Hidden Markov Models (Bigram, Trigram) and neural models (RNN, LSTM, BiLSTM).
- 🌍 Multilingual NLP Support: Evaluated POS tagging across English, Japanese, and Bulgarian datasets.
- 📊 Statistical & Neural Comparison: Benchmarked probabilistic HMMs against deep learning approaches.
- ✍️ Autocorrection System: Built spell correction using unigram, bigram, and trigram language models with smoothing and backoff.
- ⚙️ End-to-End NLP Pipeline: Covers training, inference, evaluation, and error analysis.
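As a rough illustration of the autocorrection feature above, here is a minimal sketch combining edit-distance candidate generation with a bigram language model that backs off to unigrams. All names and the backoff weight are illustrative, not the project's actual code:

```python
from collections import Counter

def edits1(word, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """All strings one edit (delete, transpose, replace, insert) away."""
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [a + b[1:] for a, b in splits if b]
    transposes = [a + b[1] + b[0] + b[2:] for a, b in splits if len(b) > 1]
    replaces = [a + c + b[1:] for a, b in splits if b for c in alphabet]
    inserts = [a + c + b for a, b in splits for c in alphabet]
    return set(deletes + transposes + replaces + inserts)

class NgramCorrector:
    """Scores edit-distance candidates with a bigram LM, backing off to unigrams."""
    def __init__(self, corpus_tokens):
        self.uni = Counter(corpus_tokens)
        self.bi = Counter(zip(corpus_tokens, corpus_tokens[1:]))
        self.total = sum(self.uni.values())

    def score(self, prev, word):
        # Stupid-backoff style: bigram probability if seen, else discounted unigram.
        if self.bi[(prev, word)]:
            return self.bi[(prev, word)] / self.uni[prev]
        return 0.4 * self.uni[word] / self.total

    def correct(self, prev, word):
        if word in self.uni:          # known words pass through unchanged
            return word
        candidates = [c for c in edits1(word) if c in self.uni] or [word]
        return max(candidates, key=lambda c: self.score(prev, c))
```

For example, a corrector trained on a small corpus containing "the cat" would map the typo "czt" after "the" back to "cat", since "cat" is the only in-vocabulary candidate one edit away.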
Architecture Overview:
- HMM POS Tagger: Learned emission and transition probabilities with Viterbi decoding.
- Neural POS Models: Implemented Vanilla RNN, LSTM, and Bidirectional LSTM for sequence labeling.
- Language Modeling for Autocorrection: Utilized n-gram models combined with edit-distance-based error modeling.
- Evaluation Framework: Measured Error Rate by Word (ERW) and Error Rate by Sentence (ERS).
Technical Highlights:
- Implemented in Python with modular scripts for training, inference, and evaluation
- Analyzed learning curves and performance trade-offs across statistical and neural models
- Processed datasets of roughly 13K–15K tokens across multiple languages, applying data augmentation to offset the limited amount of training data
- Compared accuracy, runtime efficiency, and generalization of classical vs deep NLP models
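The ERW and ERS metrics used in the comparisons above are straightforward to compute; a sketch (function name illustrative):

```python
def erw_ers(gold_sents, pred_sents):
    """Error Rate by Word (ERW): fraction of mistagged tokens.
    Error Rate by Sentence (ERS): fraction of sentences with >= 1 error."""
    word_err = word_total = sent_err = 0
    for gold, pred in zip(gold_sents, pred_sents):
        errs = sum(g != p for g, p in zip(gold, pred))
        word_err += errs
        word_total += len(gold)
        sent_err += errs > 0
    return word_err / word_total, sent_err / len(gold_sents)
```

ERS is the stricter metric: one wrong tag anywhere marks the whole sentence as an error, which makes it useful for comparing models whose word-level accuracies are close.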
My Role
Worked under an NLP scientist.
Academic / Research-Oriented Project – POS Tagging & Autocorrection
- 🧠 Designed and implemented statistical (HMM) and neural (RNN, LSTM, BiLSTM) models for part-of-speech tagging.
- 📊 Conducted multilingual evaluation across English, Japanese, and Bulgarian datasets using ERW and ERS metrics.
- ✍️ Built an autocorrection system using n-gram language models with smoothing, backoff, and edit-distance error modeling.
- 🔬 Performed comparative analysis of classical NLP methods versus deep learning approaches.
- ⚙️ Developed end-to-end NLP pipelines covering training, inference, evaluation, and error analysis.