Machine Learning and Pattern Recognition

AI3011 | Spring 2026




Advanced Predictive Modeling for Flight Delay Mitigation

Kartik Kaushik, Saksham Bhasin, Vedant Kapoor

Flight delays cost the aviation industry billions annually, yet existing predictive models focus only on whether a flight will be delayed — not whether it can recover. This project addresses that gap by building a system that predicts, before departure, both the likelihood and magnitude of in-air delay recovery for commercial flights.

Using the Aeolus 2024 dataset (6.28M flights, 22 features), we engineered a physics-grounded feature set — including Padding Ratio, Available Buffer, and cyclical departure time encodings — applied strict temporal data splitting to eliminate leakage, and used target encoding to efficiently handle hundreds of airports without dimensionality explosion.
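
As an illustration of the target encoding and temporal split mentioned above, here is a minimal sketch. It is not the project's code: the airport codes, the toy recovery values, and the `smoothing` parameter are all hypothetical. It assumes rows are already sorted by date, so that the encoding is learned from earlier flights only and later flights never leak into it.

```python
from collections import defaultdict

def fit_target_encoding(categories, targets, smoothing=10.0):
    """Learn smoothed per-category target means from training rows only."""
    global_mean = sum(targets) / len(targets)
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, targets):
        sums[cat] += y
        counts[cat] += 1
    # Shrink each category mean toward the global mean; rare categories shrink most.
    encoding = {
        cat: (sums[cat] + smoothing * global_mean) / (counts[cat] + smoothing)
        for cat in counts
    }
    return encoding, global_mean

def apply_target_encoding(categories, encoding, global_mean):
    """Categories unseen in training fall back to the training global mean."""
    return [encoding.get(cat, global_mean) for cat in categories]

# Hypothetical toy rows, sorted by date: (airport, recovered minutes).
rows = [("DEL", 4.0), ("DEL", 6.0), ("BOM", 10.0), ("BOM", 12.0),
        ("DEL", 5.0), ("GOI", 9.0)]
split = 4  # temporal split: the earliest 4 rows train, the later rows test
train, test = rows[:split], rows[split:]

encoding, global_mean = fit_target_encoding(
    [airport for airport, _ in train], [y for _, y in train], smoothing=2.0
)
test_encoded = apply_target_encoding(
    [airport for airport, _ in test], encoding, global_mean
)
```

Each airport becomes a single numeric column regardless of how many airports exist, which is what avoids the dimensionality growth of one-hot encoding.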

Our architecture, a sequential Two-Brains XGBoost pipeline, separates the problem into two stages: a Gatekeeper Classifier that screens flights incapable of recovery, and a Predictor Regressor that estimates exact recovery minutes for viable flights. Stage 1 Precision (73.58%) is the architectural bottleneck — low precision floods the regressor with unrecoverable flights, corrupting its training gradient. Stage 2 achieves a Test MAE of 10.88 minutes, improving to 5.41 minutes after Optuna-based Bayesian hyperparameter optimization. SHAP values provide full prediction explainability, making the system interpretable for aviation stakeholders.


Affective State–Driven Adaptive Gaming: Modulating Player Experience Using Behavioral Telemetry and Machine Learning

Vageesh Chandra Srivastava, Garvit Jain, Siddhartha Kumar

Modern video games typically rely on static difficulty systems that fail to adapt to a player’s emotional state, often leading to boredom or frustration. This project addresses the problem of dynamically adjusting gameplay by predicting a player’s affective state using behavioral telemetry such as aiming accuracy, mouse movement patterns, and interaction intensity.

Machine learning models, including Random Forest, Support Vector Machine, Logistic Regression, and XGBoost, were trained to classify player states such as Flow, Boredom, Enjoyment, and Negative High Arousal. The system then uses these predictions to modulate game difficulty in real time, aiming to maintain optimal engagement. The study demonstrates that gameplay behavior contains meaningful signals for inferring emotional states, though challenges such as overlapping behavioral patterns and noisy self-reported labels limit classification accuracy.

Techniques like feature analysis, label refinement, and SHAP-based interpretability were used to improve understanding of model behavior. The resulting system provides a foundation for adaptive gaming experiences, with potential applications in personalized entertainment, education, and human-computer interaction, ultimately contributing to more engaging and responsive interactive systems.


An Integrated Machine Learning Framework for Credit Rating Prediction and Financial Fraud Detection

Avneet Kaur Sandhu, Hunnar Khurana, Renaya Gupta

This project builds an integrated machine learning framework for two linked financial risk tasks: predicting corporate credit ratings and detecting financial fraud. Credit ratings from agencies such as Standard & Poor's and Moody's are expensive, infrequently updated, and unavailable for many firms, while fraudulent financial reporting often remains hidden for years. The framework uses public financial statements, macroeconomic indicators, SEC EDGAR filings, AAER labels, accrual patterns, earnings-manipulation signals, the Beneish M-score, and a derived rating gap to support credit analysis, audit prioritization, and investment screening.

The credit rating module predicts a firm's rating on a 22-point ordered scale using Ridge Regression, Random Forest, XGBoost, and Ordered Logistic Regression. The fraud detection module uses Isolation Forest, supervised XGBoost, Positive-Unlabeled Learning, and an ensemble meta-learner trained on 2,344 firm-year observations across 401 firms, including 86 confirmed fraud firm-years. Data leakage was corrected by ensuring that imputation, normalization, model fitting, and PU sampling used only training data, while class imbalance was handled through weighted learning and PU-style treatment of unlabeled firms.

The Random Forest rating model achieved an RMSE of 3.00 and MAE of 2.23, while Ordered Logistic Regression reached 41.5% within-one-notch accuracy and 60.9% within-two-notch accuracy. For fraud detection, PU Learning achieved an AUC-ROC of 0.76 and Average Precision of 0.38 after leakage correction, with precision of 0.25, recall of 0.81, and F1 of 0.38 at the selected threshold. The system can be deployed as a quarterly EDGAR-based dashboard at Plaksha or similar institutions, though scaling will require careful handling of EDGAR rate limits, incomplete XBRL coverage, historical-label bias, and the computational cost of PU Learning.
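
The reported fraud-detection F1 can be checked from the stated precision and recall, and the within-k-notch metric is easy to state precisely. The sketch below does both. The rating lists are hypothetical toy values on the 22-point scale, not project data.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def within_k_notch_accuracy(true_ratings, predicted_ratings, k):
    """Share of predictions at most k notches away from the true rating."""
    hits = sum(abs(t - p) <= k for t, p in zip(true_ratings, predicted_ratings))
    return hits / len(true_ratings)

# Precision 0.25 and recall 0.81 are the values reported above.
reported_f1 = f1_score(0.25, 0.81)

# Hypothetical ratings encoded as integers 1..22.
true_ratings = [3, 10, 15, 20]
predicted_ratings = [4, 13, 15, 18]
within_one = within_k_notch_accuracy(true_ratings, predicted_ratings, 1)
within_two = within_k_notch_accuracy(true_ratings, predicted_ratings, 2)
```

With precision 0.25 and recall 0.81, the harmonic mean rounds to 0.38, which matches the reported F1.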


Art Authentication Using ResNet-50 and GMM-Based Anomaly Detection

Ayushmaan, Ronak Tiwari, Shloka Srivastava

Art forgery challenges cultural heritage preservation, with millions lost annually to counterfeit paintings. Existing authentication methods rely heavily on expert judgment, which is costly, subjective, and unscalable. Our project automates painting authenticity verification using the WikiArt dataset, with potential applications in museum cataloguing, auction house verification, and digital art provenance tracking.

Our methodology integrates supervised deep learning with probabilistic density estimation. A ResNet-50 backbone (pretrained on ImageNet) was fine-tuned on 22 artists, balanced at 400 images each to avoid class bias. We extracted 4096-dimensional patch-aggregated embeddings using mean and max pooling over overlapping 224×224 patches to capture both average and extreme stylistic features. A Gaussian Mixture Model (GMM) was trained per artist to model authentic embeddings, and forgery detection was performed via log-likelihood thresholding (mean − 2.2σ of the best-matching distribution). The absence of real forgery data was addressed through one-class density modeling, while high-dimensional GMM complexity was managed using diagonal covariance.
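
The rejection rule described above (flag a painting when its log-likelihood falls below mean − 2.2σ of the training log-likelihoods) can be sketched as follows. This is a simplified illustration rather than the project's pipeline: a single diagonal-covariance Gaussian stands in for the per-artist GMM, and the 2-D "embeddings" are hypothetical stand-ins for the 4096-dimensional vectors.

```python
import math
from statistics import mean, pstdev

def fit_diag_gaussian(vectors, var_floor=1e-6):
    """Per-dimension mean and variance (diagonal covariance)."""
    dims = list(zip(*vectors))
    mu = [mean(d) for d in dims]
    var = [max(pstdev(d) ** 2, var_floor) for d in dims]
    return mu, var

def diag_gaussian_logpdf(x, mu, var):
    """Log-density of x under a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mu, var)
    )

def rejection_threshold(log_likelihoods, k=2.2):
    """Threshold at mean - k * std of the authentic log-likelihoods."""
    return mean(log_likelihoods) - k * pstdev(log_likelihoods)

# Hypothetical authentic embeddings for one artist.
authentic = [[1.0, 2.0], [1.2, 1.9], [0.9, 2.1], [1.1, 2.0], [0.8, 2.2]]
mu, var = fit_diag_gaussian(authentic)
train_ll = [diag_gaussian_logpdf(x, mu, var) for x in authentic]
threshold = rejection_threshold(train_ll)

def is_suspected_forgery(embedding):
    """True when the embedding is less likely than the threshold allows."""
    return diag_gaussian_logpdf(embedding, mu, var) < threshold
```

Because the model is fitted on authentic works only, no forged examples are needed to set the threshold, which is the point of the one-class formulation.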

The proposed framework achieved strong performance with 84.32% test accuracy, 0.83 precision, 0.84 recall, an F1-score of 0.84, and a log-loss of 0.53. The GMM-based rejection framework provides statistically grounded authenticity scores. These methods were selected through literature review on automated painter identification and patch-based feature extraction (Choudhury, 2021; Gencarelli et al., 2023; Afifi et al., 2025; Sabino et al., 2026), along with trial-and-error and computational feasibility considerations. Overall, the pipeline replaces subjective analysis with mathematically grounded authenticity scoring, contributing to digital provenance tracking and the protection of global cultural heritage.


Attention Guided Retinal Age Gap Prediction

Agamvir Singh, Hitesh Jindal, Subhi Sabharwal

Retinal fundus imaging provides a non-invasive approach for analyzing vascular patterns associated with biological aging. This project presents an explainable retinal age gap prediction framework using deep learning and vascular biomarker analysis. The objective of the proposed system is to predict retinal biological age from retinal fundus images and improve interpretability through vascular feature analysis.

The methodology combines retinal image preprocessing, vessel segmentation, biomarker extraction, and deep learning-based age prediction. Multiple machine learning architectures including EfficientNet-B0, EfficientNet-B3, RETFound Linear Probe, and fine-tuned RETFound Vision Transformer (ViT) models were explored and compared. RETFound-based architectures demonstrated stronger retinal representation learning, while the linear probe approach achieved the best Mean Absolute Error (MAE) of 6.77 years. Vascular biomarkers such as vessel density, tortuosity, branching points, fractal dimension, and vessel thickness were extracted using skeletonization and morphological analysis. SHAP-based explainability was further applied to identify feature contributions influencing retinal age prediction.

Major challenges included retinal image quality variability, segmentation noise, and limited computational resources, which were addressed using preprocessing, augmentation, and transfer learning strategies. The proposed framework can be deployed as an AI-assisted retinal screening system in hospitals, diagnostic centers, and research environments for automated retinal aging analysis. With larger datasets and clinical validation, the system has potential for scalable and interpretable medical imaging applications.


Beyond EWMA: Smarter Volatility Forecasting with Sentiment and Residual Machine Learning

Himesh Agarwal, Rahil Shah, Sahil Patel

Every trading desk on Dalal Street relies on a thirty-year-old framework called EWMA to predict market risk. It is stable and predictable, but it reacts slowly to sudden regime shifts. When unexpected panics hit, traditional risk models fail because they only look backward. We changed that by building a robust machine learning pipeline designed to forecast stock volatility with far greater precision.

Working with financial data is notoriously difficult due to hidden structural flaws like look-ahead bias, overlapping records, and highly correlated features that cause models to memorize historical noise rather than predict the future. We solved these issues by implementing a strict 21-day data embargo and creating a "Hybrid-5" framework that cleanly structures our inputs. Crucially, we integrated NLP-driven sentiment analysis to capture public mood and news momentum before those shifts reflect in price charts.

Instead of predicting risk from scratch, our ensemble is supported by a highly disciplined XGBoost model that works directly alongside EWMA to correct its lagging errors in real time. Our testing proved with statistical certainty that combining market psychology with advanced data engineering vastly outperforms traditional financial standards. We engineered a reliable, clear system built to withstand market chaos.
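
To make the residual idea concrete, here is a minimal sketch of an EWMA variance forecast and the residual a correction model would be trained to predict. The decay factor 0.94 is the common RiskMetrics daily default and is an assumption here, as are the use of squared returns as the realized-variance proxy and the toy return series. The correction model itself is not shown.

```python
def ewma_variance_forecasts(returns, initial_variance, lam=0.94):
    """One-step-ahead forecasts: forecasts[t] uses only returns before t."""
    variance = initial_variance
    forecasts = []
    for r in returns:
        forecasts.append(variance)  # forecast issued before r is observed
        variance = lam * variance + (1 - lam) * r ** 2
    return forecasts

def residual_targets(returns, forecasts):
    """Training target for a correction model: realized proxy minus EWMA."""
    return [r ** 2 - f for r, f in zip(returns, forecasts)]

# Hypothetical daily returns.
returns = [0.01, -0.02, 0.03]
forecasts = ewma_variance_forecasts(returns, initial_variance=0.0001)
residuals = residual_targets(returns, forecasts)

# A final forecast would be: EWMA forecast + model-predicted residual.
```

Learning only the residual leaves the stable EWMA baseline in place, so the second model has to explain just the part EWMA gets wrong.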


Children's Phonetic Speech Recognition

Armaan Raisinghani, Rohan Gupta, Shreya Khanna

Children’s speech recognition remains substantially more difficult than adult ASR due to acoustic and phonetic differences such as higher pitch, shorter vocal tracts, and developing pronunciation patterns. In this work, we developed a machine learning system for predicting IPA phonetic sequences directly from children’s speech audio. Potential applications include AI literacy tutors, pediatric speech therapy support tools, and more inclusive voice assistants, helping address the large accuracy gap between adult and child ASR systems.

We formulated the task as unsegmented phonetic speech recognition using Connectionist Temporal Classification (CTC). Our final architecture was a compact ASR-native CLDNN model operating on log-Mel spectrograms with temporal sequence modelling through bidirectional LSTMs (BiLSTMs). The model preserves temporal acoustic structure while remaining lightweight enough for low-resource children’s speech data.
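
For readers unfamiliar with CTC, the sketch below shows how a per-frame label sequence is turned into a phoneme sequence at inference time using greedy decoding: collapse consecutive repeats, then drop blanks. The blank symbol and the frame labels are hypothetical, and whether the project uses greedy or beam-search decoding is not stated.

```python
from itertools import groupby

BLANK = "_"  # hypothetical symbol for the CTC blank label

def ctc_greedy_decode(frame_labels, blank=BLANK):
    """Collapse consecutive repeats, then remove blank labels."""
    collapsed = [label for label, _ in groupby(frame_labels)]
    return [label for label in collapsed if label != blank]

# One most-likely label per audio frame (toy example).
frames = ["_", "k", "k", "_", "æ", "æ", "æ", "_", "t", "t", "_"]
decoded = ctc_greedy_decode(frames)

# A blank between identical labels keeps a genuinely repeated phoneme.
repeated = ctc_greedy_decode(["a", "_", "a", "a"])
```

This is why the task can be "unsegmented": the model never needs frame-level phoneme boundaries, only the final sequence.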

The best-performing configuration, CLDNN h384, achieved a test CER of 0.561 and test PER of 0.581. Results demonstrate that smaller sequence-oriented acoustic architectures can effectively model children’s IPA recognition and achieve strong phonetic transcription performance in low-resource settings.
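
Error rates such as PER are conventionally computed as total edit distance divided by total reference length. The sketch below assumes that convention. The phoneme sequences are hypothetical, and the project's exact tokenization for CER versus PER is not specified in the abstract.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (row-by-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def error_rate(references, hypotheses):
    """Total edits divided by total reference length."""
    edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    length = sum(len(r) for r in references)
    return edits / length

# Hypothetical phoneme-token sequences.
references = [["k", "æ", "t"], ["d", "ɔ", "g"]]
hypotheses = [["k", "a", "t"], ["d", "ɔ"]]
per = error_rate(references, hypotheses)
```

Under this definition, a PER of 0.581 corresponds to roughly 58 edits per 100 reference phonemes.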


Circadian Rhythm Modelling

Naisha Shah, Nandini Aggarwal, Raghav Saboo

Disruptions in circadian rhythms affect health, sleep quality, and daily productivity, yet accurately tracking internal body clocks in real-world settings remains difficult. Traditional methods such as DLMO testing or sleep lab studies are invasive, expensive, and impractical for continuous monitoring. This project aims to develop a non-invasive, scalable solution to predict a user’s Acrophase (their daily peak physiological time) using data from the NHANES (National Health and Nutrition Examination Survey) dataset.

Such a system could enable personalized health insights, improved sleep recommendations, and better scheduling for productivity and well-being. To address this, we preprocess wearable-style features such as activity levels, heart rate patterns, and sleep behavior, while encoding time cyclically using sine and cosine transformations to reflect its 24-hour nature. We implement and compare three machine learning models: K-Nearest Neighbors (KNN) as a baseline, LightGBM for efficient handling of complex feature interactions, and a Multi-Layer Perceptron (MLP) neural network for capturing nonlinear patterns.

Key challenges included limited directly labeled circadian data and handling cyclical time, which we addressed through feature engineering and circular transformations. Model performance is evaluated using Circular RMSE and Circular R², along with accuracy, precision, recall, and confusion matrices after grouping predictions into time-of-day categories. Results indicate reliable prediction of circadian phase, demonstrating feasibility for deployment in wearable-based health monitoring systems at scale, though challenges such as user variability and real-time integration remain.
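
A circular error metric treats 23:30 and 00:30 as one hour apart rather than 23 hours apart. The sketch below shows one common way to compute a circular RMSE on a 24-hour clock. The project's exact definition is not given, so this formulation and the toy values are assumptions.

```python
import math

def circular_diff_hours(pred, true, period=24.0):
    """Signed shortest difference on the clock, in (-period/2, period/2]."""
    d = (pred - true) % period
    return d - period if d > period / 2 else d

def circular_rmse(preds, trues, period=24.0):
    """RMSE computed on wrapped differences."""
    squared = [circular_diff_hours(p, t, period) ** 2
               for p, t in zip(preds, trues)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical acrophase values in hours since midnight.
predicted = [23.5, 1.0]
actual = [0.5, 23.0]

wrapped = circular_rmse(predicted, actual)
naive = math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, actual)) / 2)
```

On these values the naive RMSE is above 22 hours, while the wrapped version is under 2 hours, which reflects the true distance on the clock.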


Credit Risk Assessment for Financial Institutions

Anahad Singh, Manavi Nakra, Tanmay Mohan

Credit default prediction is a high-stakes machine learning problem aimed at estimating the likelihood that a borrower will fail to repay a loan. While modern models can learn complex patterns from financial data, their effectiveness depends heavily on how borrower history is represented. This project, built on the Home Credit Default Risk dataset, focuses on transforming raw financial records into meaningful behavioral signals such as repayment capacity, recent payment stress, debt burden, credit utilization, delinquency progression, and prior credit-seeking behavior. In addition to these interpretable features, cross-table aggregations and risk summaries are used to capture latent patterns from historical financial records that individual variables alone cannot express.

The proposed system combines domain-driven feature engineering with an ensemble of gradient boosting models — LightGBM, XGBoost, and CatBoost — trained and validated using strict 5-fold stratified cross-validation. More than 55 experimental runs were conducted to iteratively evaluate feature families, aggregation strategies, and ensemble configurations. The final pipeline achieves a ROC-AUC score of 0.80004 while remaining modular, scalable, and interpretable, making it suitable for deployment in real-world credit risk assessment and financial decision-support systems.


Culturally-Adaptive Helmet Detection System for Indian Traffic

Advika Ramesh, Rithwik Agarwal, Sukrit Sharma

Standard helmet detection systems are usually trained on Western datasets, which makes them unreliable in Indian traffic environments where cultural headwear and regional garments are common. This creates false positives when Sikh riders wearing legally exempt pagdis are flagged as violations, and false negatives when bulky fabrics such as hijabs, dupattas, or burqas are misclassified as helmets. This project develops a culturally aware three-class detection pipeline for Helmet, Pagdi, and No-Helmet cases, with the goal of making automated enforcement more fair and reducing manual verification work for traffic authorities.

The solution frames the task as a three-class object detection problem and uses YOLOv11 because it supports real-time embedded deployment and uses Distribution Focal Loss for more precise bounding-box learning. Instead of treating box edges as rigid lines, the model represents them as probability distributions, which helps when textile boundaries are blurry or visually ambiguous. Region-specific traffic data, zero-shot auto-annotation with Grounding DINO, and LLaVA-assisted label curation are used to build a more context-aware dataset.
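
The "edges as probability distributions" idea can be illustrated with a small sketch: the detector outputs logits over discrete offset bins for each box edge, and the predicted offset is the expectation under the softmax of those logits. The four-bin setup and the logit values are hypothetical and far smaller than a real detector head.

```python
import math

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def expected_edge_offset(logits):
    """Expected bin index under the predicted distribution."""
    probs = softmax(logits)
    return sum(index * p for index, p in enumerate(probs))

# A crisp edge: almost all probability mass on bin 2.
sharp = expected_edge_offset([0.0, 0.0, 10.0, 0.0])

# A blurry textile boundary: mass split between bins 1 and 2.
ambiguous = expected_edge_offset([0.0, 5.0, 5.0, 0.0])
```

When the boundary is ambiguous, the prediction lands between bins instead of being forced onto one of them.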

The main challenges were limited region-specific data, ambiguous annotations for cultural garments, and false positives under noisy traffic-camera conditions. These were addressed through targeted data sourcing, separate preprocessing for heterogeneous datasets, and confidence-thresholding logic at inference time. The resulting system is intended for scalable, fairer traffic enforcement in Indian road settings.


Data-Centric Training Strategies for Robust Object Detection in Unstructured Indian Traffic Environments

Daiwik Chilukuri, Farhan Hussain, Niksh Hiremath

Object detection on Indian roads presents unique challenges that Western-validated datasets and preprocessing pipelines do not adequately address. This project systematically evaluates whether standard data-side techniques, including augmentation strategies and small object handling, actually improve detection performance when applied to the IDD-95k dataset, which reflects the real heterogeneity of Indian traffic: autorickshaws, two-wheelers, dense crowds, and irregular lane behavior across roughly 75,000 images spanning 11 object classes.

Three architectures are compared under controlled configurations: YOLOv11s and YOLOv26s (single-stage, COCO-pretrained) and Faster R-CNN with ResNet-50-FPN backbone (two-stage, COCO-pretrained). Each runs under a strict no-augmentation baseline and a full augmentation configuration including mosaic, mixup, HSV shifts, geometric shear with exact bounding box transforms, AutoContrast, and Gaussian noise. A third configuration adds multi-scale small object handling.

All evaluation uses pycocotools uniformly, computing mAP@0.5, mAP@0.5:0.95, AP by object size, per-class AP, and a custom COCO-style confusion matrix with background class. Validated performance improvements under augmentation and small object handling would confirm that these techniques generalize beyond Western benchmarks to Indian road conditions, providing a defensible empirical basis for their adoption in ADAS and traffic monitoring pipelines deployed in India. Deployment at Plaksha would face challenges including campus-specific distribution shift, night and monsoon degradation, and multi-camera inference cost at scale.
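
The mAP thresholds above refer to intersection over union (IoU) between predicted and ground-truth boxes. The sketch below computes IoU and counts how many of the ten COCO-style thresholds (0.50 to 0.95 in steps of 0.05) a prediction clears. Boxes here use a hypothetical `(x1, y1, x2, y2)` corner format for readability, whereas COCO annotations store `[x, y, width, height]`.

```python
def iou(box_a, box_b):
    """Intersection over union for (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    intersection = inter_w * inter_h
    union = ((ax2 - ax1) * (ay2 - ay1)
             + (bx2 - bx1) * (by2 - by1)
             - intersection)
    return intersection / union if union > 0 else 0.0

# IoU thresholds 0.50, 0.55, ..., 0.95.
thresholds = [0.5 + 0.05 * i for i in range(10)]

ground_truth = (0.0, 0.0, 10.0, 10.0)
prediction = (2.0, 0.0, 12.0, 10.0)  # shifted 2 units to the right

overlap = iou(ground_truth, prediction)
thresholds_passed = sum(overlap >= t for t in thresholds)
```

A box that counts as correct at mAP@0.5 can still fail most of the stricter thresholds, which is why mAP@0.5:0.95 is the harder number to improve.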


Deceptive Opinions Spam Detection

Krrish Singhania, Mukund Saraf, Vansh Jain

Addressing the 2026 World Economic Forum’s ranking of "Misinformation & Disinformation" as a top global risk, this project tackles fake and incentivized reviews that severely impact e-commerce ecosystems and consumer trust. To address the "vocabulary overfit" observed in prior research and the limitations of arbitrary classification thresholds, we implement a novel Semi-Supervised Domain Adaptation pipeline.

We pivot from forced binary choices by introducing an explicit "Uncertain" probability zone, allowing the foundational model to isolate genuinely ambiguous reviews. By leveraging high-confidence predictions as pseudo-labels for domain-specific fine-tuning, our approach adapts to new product categories without requiring manual annotation. This methodology yields a significant improvement in cross-domain generalization, decreasing the uncertainty rate by nearly 20% across evaluated domains and providing a highly scalable framework for real-time review moderation.
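
The three-way decision and pseudo-label selection described above can be sketched as follows. The thresholds 0.2 and 0.8, the review identifiers, and the probabilities are all hypothetical. The abstract does not state the actual cut-offs.

```python
def triage(prob_deceptive, low=0.2, high=0.8):
    """Map a model probability to one of three zones."""
    if prob_deceptive >= high:
        return "deceptive"
    if prob_deceptive <= low:
        return "genuine"
    return "uncertain"

def select_pseudo_labels(reviews, probs, low=0.2, high=0.8):
    """Keep only confident predictions as training labels for a new domain."""
    labelled = []
    for review, p in zip(reviews, probs):
        label = triage(p, low, high)
        if label != "uncertain":
            labelled.append((review, label))
    return labelled

def uncertainty_rate(probs, low=0.2, high=0.8):
    """Share of reviews that fall in the uncertain zone."""
    return sum(triage(p, low, high) == "uncertain" for p in probs) / len(probs)

# Hypothetical model outputs on an unlabelled product category.
reviews = ["r1", "r2", "r3", "r4"]
probs = [0.95, 0.50, 0.05, 0.70]

pseudo_labels = select_pseudo_labels(reviews, probs)
rate_before_adaptation = uncertainty_rate(probs)
```

After fine-tuning on the pseudo-labels, recomputing `uncertainty_rate` on the same domain gives the kind of before/after comparison the abstract reports.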


Deciphering Visually Similar Indian Scripts

Anil Aleti, Chandrakant Singh, Yukteswar Mantha

Script identification is the rate-limiting step in Indian multilingual OCR. Fixing this single stage alone doubles word recognition accuracy from 36% to 71%, yet the best published model still misclassifies Assamese as Bengali 39% of the time and Hindi as Marathi 20% of the time. We took those two failure modes as our starting point.

Working on the Bharat Scene Text Dataset across 12 Indian languages, we ran a 3-phase investigation. In Phase 1, we benchmarked five architectures to establish whether model capacity is the primary bottleneck at this data scale. In Phase 2, a few targeted training improvements elevated ResNet-18 from 67.68% to 85.41% on 12-class classification, while a fine-tuned CLIP ViT-B/16 reached 86.5%. ResNet-18 solves Hindi-Marathi but breaks on Odia-Kannada, while CLIP ViT-B/16 resolves Odia-Kannada but regresses on Hindi-Marathi.

In Phase 3, we applied five ensemble strategies to resolve this tradeoff. A stacking meta-learner achieved the best ensemble result at 86.44%, yet Marathi recall remained at 63%. Ambiguity-based routing revealed that CLIP's Hindi-Marathi errors are confidently wrong, not uncertain. A specialist override fired zero times at its optimal threshold. Each experiment showed that these two model families cannot be fused to simultaneously solve both tradeoffs. A joint architecture that unifies local stroke analysis with global spatial reasoning is what the problem truly demands.


DecoyNet: Decoy Data Injection for Threat Reduction

Aashna Baldi, Keerthana K S, Proshita Agarwal

Data breaches and stealthy data exfiltration attacks remain major cybersecurity challenges, particularly in financial systems where attackers increasingly employ bulk theft, targeted extraction, mimicry, and slow incremental access to evade conventional anomaly detectors. DecoyNet proposes a machine-learning-driven deception framework that injects statistically realistic decoy records into real financial transaction datasets to enable early breach detection. The system uses a three-layer architecture consisting of decoy generation, strategic injection, and real-time detection.

Using the PaySim synthetic financial dataset containing 6.3 million mobile-money transactions, legitimate transaction patterns are learned through a deep autoencoder with latent-space representation learning. A PCA + Gaussian Mixture Model fallback pipeline is additionally implemented to maintain decoy realism under overfitting conditions. Generated decoys are validated using a Random Forest discriminator, KL-divergence, and KS statistical testing to ensure similarity with legitimate records. Decoys are strategically injected using random, edge-case, cluster-based, and high-value transaction placement strategies, with all decoy entries securely stored using salted SHA-256 hashing.

Experimental evaluation across multiple simulated attack scenarios demonstrated 100% detection across bulk, targeted, mimicry, and slow-theft attacks while maintaining a 0% false-positive rate during legitimate access. Downstream fraud-detection performance degraded by only 0.0061 AUC-ROC after decoy injection, confirming preservation of dataset integrity. The proposed framework demonstrates the feasibility of scalable ML-based cyber deception for proactive database security and insider-threat detection.
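
One way to store decoy identities with salted SHA-256 and check accessed records against them is sketched below. The class name, the record identifiers, and the use of a single per-registry salt are hypothetical choices for illustration, not the project's implementation.

```python
import hashlib
import hmac
import secrets

def fingerprint(record_id, salt):
    """Salted SHA-256 digest of a record identifier."""
    return hashlib.sha256(salt + record_id.encode("utf-8")).hexdigest()

class DecoyRegistry:
    """Stores only salted hashes, so the registry does not reveal decoy IDs."""

    def __init__(self, decoy_ids):
        self._salt = secrets.token_bytes(16)
        self._hashes = {fingerprint(i, self._salt) for i in decoy_ids}

    def is_decoy(self, record_id):
        candidate = fingerprint(record_id, self._salt)
        # Constant-time comparison against each stored digest.
        return any(hmac.compare_digest(candidate, h) for h in self._hashes)

    def accessed_decoys(self, accessed_ids):
        """Decoy records touched by an access; any hit warrants an alert."""
        return [i for i in accessed_ids if self.is_decoy(i)]

# Hypothetical transaction identifiers.
registry = DecoyRegistry(["TX-9001", "TX-9002"])
bulk_read = ["TX-0001", "TX-9001", "TX-0002", "TX-9002"]
normal_read = ["TX-0001", "TX-0002"]

alerts_bulk = registry.accessed_decoys(bulk_read)
alerts_normal = registry.accessed_decoys(normal_read)
```

Legitimate workflows have no reason to touch a decoy, so an access that hits one is a strong signal with a very low false-positive rate.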


Deepfake Detection using Behavioural Patterns

Manaasvi Vij, Manshika Jain, Roshni Rai

Deepfakes are particularly interesting not only in and of themselves but also because they are a prototype for problems to come. They are one of the latest technological developments in artificial intelligence and are creating a “crisis of knowing”, because they are convincing and differ significantly from traditional disinformation. We acquired the FakeAVCeleb dataset, which consists of real and fake audio-visual videos. Our work proposes a multimodal deepfake detection framework that moves beyond visual appearance and interrogates behavior to assess the authenticity of the media.

The system analyses markers such as eye-blink patterns, consistency across frames, and the synchronization of lip movement, audio, and expression. Facial features were extracted using EfficientNetB0, and MediaPipe Face Mesh was used to capture fine-grained behavioural cues. Temporal inconsistencies are analyzed through optical flow and frame-level feature variations, while audio patterns are encoded using MFCCs. These signals are fused into a unified representation and classified using LightGBM, which enables efficient learning from complex, high-dimensional data. To address the black-box nature of machine learning models, the framework incorporates SHAP.

The final hyperparameter-tuned LightGBM model achieved 98.39% accuracy, 96.75% balanced accuracy, 0.9900 F1-score, and an AUC-ROC of 0.9994, demonstrating class separability, scalability, interpretability, and strong generalization across both real and fake classes.


Detecting AI-Assisted Interview Cheating Through Speech Analysis

Hardik Nanda, Narasimha P, Vyshnav KP

The rapid rise of AI-assisted interview tools that can generate live scripted responses has created a serious challenge for the integrity of online hiring. This project tackles the problem by developing a speech-based classification system that differentiates between read speech and spontaneous conversational speech, using the behavioral and timing differences between the two styles. This system could be integrated into interview monitoring platforms or large-scale remote proctoring systems to flag potentially AI-assisted responses.

A major challenge was the significant difference in recording quality between the clean LibriSpeech corpus and the noisy Buckeye conversational corpus. Initial experiments using 98-dimensional acoustic feature vectors extracted with openSMILE eGeMAPSv02 achieved nearly perfect accuracy. This showed that the model was relying on specific recording artifacts from the dataset rather than real speech behavior. To fix this, a domain-knowledge ablation strategy was introduced to gradually remove features that risked leaking information, like MFCCs, spectral descriptors, and loudness-dependent cues.

The final system utilizes a Dynamic Voice Activity Detection (VAD) pipeline that separates speech from silence using noise-aware thresholding. From this, eight strong behavioral features related to pauses, speech rhythm, and speaking variability were extracted and used to train an RBF-kernel Support Vector Machine (SVM). The resulting model achieved about 86% classification accuracy with strong AUC and balanced F1-scores, while significantly reducing dataset leakage and improving the understanding of behavioral aspects.


Detecting Anomalies in Industrial Machine Sounds

Adhwaythaprakash, Amrita, Padavala Gayathri Thanmai

Industrial machines often develop faults that first manifest as subtle changes in operating sounds. These anomalies frequently go undetected by human operators, resulting in unplanned downtime, costly repairs, and safety hazards across manufacturing, energy, and logistics sectors. Early automated detection of such sounds can enable predictive maintenance, reducing operational losses significantly.

We propose a two-stage unsupervised machine learning pipeline to address the domain shift problem – the key limitation identified across existing literature. A Convolutional Autoencoder is first pretrained on the MIMII 2019 dataset using only normal machine sounds, learning to reconstruct healthy acoustic patterns. Anomalies are flagged when reconstruction error exceeds an adaptive threshold. The model is then fine-tuned on the MIMII DG 2022 dataset, which contains recordings across varied environmental conditions, enabling robust generalization across deployment environments.

Features are extracted as Mel-Spectrograms and MFCCs using Librosa. Key challenges included the huge dataset size, which we managed using Google Colab with streaming access, and catastrophic forgetting during fine-tuning, mitigated by freezing encoder weights at a low learning rate. Performance is evaluated using AUC-ROC, F1-Score, Precision, and Recall, targeting above 80% AUC. The solution is deployable in any factory environment using low-cost microphones connected to edge devices, with the model running inference in real time.


Detecting Brain Anomalies Using Conditioned Diffusion Modelling

Keshav Kant Ahuja, Prajna Sharma, Sukhmani Kaur

Unsupervised anomaly detection in brain MRI aims to identify pathological tissue without requiring pathology-specific training labels. This project develops a 3D-context conditioned diffusion model for healthy-only T2-weighted brain MRI anomaly detection, with external evaluation on BraTS21 tumour cases. The method adapts a conditioned diffusion framework by adding volumetric context from a 3D MONAI ResNet-50 encoder pretrained using SparK-style masked reconstruction. For each fold, the encoder is pretrained on healthy T2 volumes and then used to condition a 2D DDPM U-Net that reconstructs input slices as pseudo-healthy anatomy. Anomaly maps are generated from reconstruction residuals between the input and reconstructed image.

The healthy training cohort consisted of 691 curated Gold_700 T2 MRI scans from OpenNeuro, HCP Wu-Minn, and WAND. We used five Behrendt et al.-adapted train/validation folds with a fixed healthy test set, ensuring zero train/validation/test overlap. BraTS21 T2 tumour scans were used only for external pathological evaluation, with 1251 prepared cases evaluated per fold and zero failed evaluations. Performance was assessed using AUPRC and best possible Dice after residual-map post-processing. Across five folds, the model achieved 57.05% ± 4.85% AUPRC and 60.31% ± 2.64% best possible Dice.
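
"Best possible Dice" is usually understood as an upper bound: the residual map is binarized at many thresholds and the highest Dice against the ground-truth mask is reported. The sketch below assumes that reading. The residual values, the mask, and the threshold grid are hypothetical, and the post-processing step mentioned above is omitted.

```python
def dice(pred_mask, true_mask):
    """Dice coefficient between two flat boolean masks."""
    intersection = sum(p and t for p, t in zip(pred_mask, true_mask))
    total = sum(pred_mask) + sum(true_mask)
    return 2 * intersection / total if total else 1.0

def best_possible_dice(residuals, true_mask, thresholds):
    """Highest Dice over all candidate binarization thresholds."""
    return max(
        dice([r > th for r in residuals], true_mask) for th in thresholds
    )

# Hypothetical flattened residual map and tumour mask (6 voxels).
residuals = [0.05, 0.40, 0.70, 0.80, 0.30, 0.02]
true_mask = [False, False, True, True, True, False]
thresholds = [i / 10 for i in range(10)]

best = best_possible_dice(residuals, true_mask, thresholds)
```

Because the threshold is chosen with knowledge of the ground truth, the figure measures how separable the residuals are rather than the performance of a fixed deployed threshold.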


Detecting Image Forgery in Lung CT Scans

Saher Mohammed, Sneha Mahato, Uditi Bansal

Medical image forgery poses a critical threat to healthcare integrity, enabling fraudulent insurance claims and potentially leading to misdiagnosis. This project addresses the problem of detecting tampering in 3D lung CT scans, motivated by India's estimated ₹8,000–10,000 crore annual loss to healthcare fraud involving manipulated medical documents. The developed solution supports applications in medical image authentication, cybersecurity in healthcare, regulatory investigations, and ensuring data integrity in clinical research.

The methodology employs a transfer learning-based ResNet50V2 architecture enhanced with contextual residual learning using adjacent CT slices. Each scan is represented as a 3-channel forensic input comprising previous, current, and next slices, with residual extraction (R = I − G(I)) to highlight tampering artifacts. Augmentation techniques and threshold optimization further improved generalization. This approach directly addressed limitations of prior work, which ignored inter-slice correlations and suffered from computational inefficiency.

The enhanced model achieved 92% accuracy, 93% precision, 92% recall, and 0.92 F1-score, outperforming the baseline ResNet50V2 (86% accuracy) and matching state-of-the-art 3D CNN benchmarks with far lower computational cost. These metrics confirm reliable tampered vs. untampered classification.

While the solution cannot be directly deployed at Plaksha as a university setting lacks the medical infrastructure required, it could be deployed in partnership with hospitals or insurance verification systems. Scaling challenges include dataset diversity across imaging equipment, evolving adversarial forgery techniques, and the need for continuous retraining to maintain detection robustness over time.


Development of an Interactive Diagnostic Tool to Boost F1 Performance

Akansksha Parija, Argh Jain, Vaibhav Verma

CODE PDF

Formula 1 strategy suffers from "signal dilution," where lap-aggregate statistics mask localized performance deficits. This project addresses the unknown "ideal" speed baseline by engineering a high-resolution 1-meter spatial grid to quantify Strategic Drift: the gap between car potential and real-world execution. We developed a competitive modeling framework featuring a Bidirectional LSTM, Time2Vec Transformer, and XGBoost. The Bi-LSTM was selected as the primary benchmark due to its research-validated ability to model dual-directional temporal dependencies, capturing both corner exit momentum and upcoming braking constraints. Technical hurdles like high-frequency sensor noise were mitigated via a third-order Savitzky-Golay filter, while cyclic track encoding resolved start-finish line discontinuities.
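A cyclic track encoding is commonly built by mapping distance along the lap onto a circle. The sketch below shows that idea only; the function name and the 5,000 m lap length are hypothetical, and the source does not state the exact encoding the team used.

```python
import math


def cyclic_track_encoding(distance_m, track_length_m):
    """Map lap distance to (sin, cos) so positions near the start-finish line stay close."""
    angle = 2.0 * math.pi * (distance_m % track_length_m) / track_length_m
    return math.sin(angle), math.cos(angle)


TRACK_LENGTH_M = 5000.0  # hypothetical lap length, for illustration only

before_line = cyclic_track_encoding(4999.0, TRACK_LENGTH_M)
after_line = cyclic_track_encoding(1.0, TRACK_LENGTH_M)
gap = math.dist(before_line, after_line)
```

The two grid cells are 4,998 m apart in raw distance but only 2 m apart on track, and the encoded gap reflects the latter.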

The Bi-LSTM achieved a robust R² of 0.9343 and average error of 15.10 kph; conversely, the Transformer failed to generalize, and XGBoost demonstrated a lack of physical grounding by merely mirroring driver habits. A localized diagnostic audit successfully pinpointed an 11.91 kph loss caused by over-braking. This solution is designed for real-time strategy dashboards to provide engineers with live "Why" reports. Scaling challenges include processing over 1.1 million telemetry points per second and managing "Sim-to-Real" gaps in non-stationary track conditions.


Early Detection of Mental Health Deterioration from Online Behavior

Arya Vachhani, Maan Kumawat, Manya Agrawal

CODE PDF

This project focuses on developing an early warning system to detect mental health crises using longitudinal Reddit data. The main problem we address is the lack of timely identification of individuals at risk before critical events, such as posting in suicidal forums. Early detection allows preventive action and improves access to support. This system can be applied to online platforms for automated support alerts, mental health monitoring tools, and institutional well-being systems.

Our methodology follows a two-stage machine learning pipeline. First, a user-level XGBoost model checks whether behavioral and linguistic features contain predictive signals. Then, we build a temporal window-based model for early detection using transformer-based embeddings, linguistic features, and behavioral patterns. XGBoost handles mixed features well and manages class imbalance using weighted learning. Major challenges included extreme class imbalance (~2.6% positive cases), noisy social media data, and computational limitations, which we addressed using class weighting, preprocessing, and efficient feature engineering.

The model achieves an AUROC of ~0.88 and detects about 71% of pre-crisis cases. Although precision is lower (~11.8%), we prioritize recall since missing a crisis is more critical. We will create a dedicated Reddit forum page for Plaksha and deploy the system there to identify students who may require immediate help, while carefully managing privacy, false positives, and scalability.
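To make the recall-over-precision choice concrete, the arithmetic below converts the reported recall (~71%) and precision (~11.8%) into alert counts. It assumes, purely for illustration, that the ~2.6% positive rate also holds among the cases being scored and that 1,000 cases are screened; neither figure is an evaluation-set count from the project.

```python
def alert_breakdown(n_cases, prevalence, recall, precision):
    """Expected alert counts implied by prevalence, recall and precision."""
    positives = n_cases * prevalence
    true_alerts = positives * recall            # crises that are flagged
    total_alerts = true_alerts / precision      # precision = true_alerts / total_alerts
    return {
        "true_alerts": true_alerts,
        "false_alerts": total_alerts - true_alerts,
        "missed": positives - true_alerts,
    }


# 1,000 screened cases is a hypothetical volume chosen for readability.
counts = alert_breakdown(1000, prevalence=0.026, recall=0.71, precision=0.118)
false_per_true = counts["false_alerts"] / counts["true_alerts"]
```

Independent of the assumed volume, a precision of 11.8% implies about 7.5 false alerts for every true one, which is the cost accepted in exchange for the higher recall.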


Early Detection of Phone Addiction to Model Student Performance

Lakshya Chandhoke, Parthiv Sardena, Tanvir Singh

CODE PDF

This work focuses on the early detection and interpretation of smartphone addiction risk among college students using longitudinal behavioral sensing data from the Dartmouth StudentLife dataset. The proposed system predicts a continuous addiction score using a shallow neural network trained on multimodal smartphone-derived behavioral features, including activity patterns, sleep behavior, mobility, sociability, and device interaction data. Based on predefined thresholds, the predicted scores are further categorized into low, medium, and high addiction-risk levels.

To improve interpretability and provide actionable insights, GradientSHAP is used to identify the most influential behavioral features contributing to each prediction. The system outputs the top three contributing features for every student, enabling both personalized behavioral analysis and identification of broader addiction-related trends.

The dataset consists of over 200,000 sensing samples collected from approximately 220 participants, containing high-dimensional temporal behavioral features. Preprocessing involved removing features with more than 80% missing values, normalizing the remaining features, and handling class imbalance through oversampling and undersampling techniques. Temporal ordering of data was preserved to retain behavioral progression patterns.

A shallow neural network was selected due to its ability to model nonlinear behavioral relationships while remaining computationally efficient and interpretable for explainability analysis and future intervention-focused deployment within academic environments.


Early Failure Prediction in Distributed Systems Using Log Event Sequences

Aarav Jhawar, Dhruv Nyati, Phani Srivatsav

CODE PDF

Modern distributed systems generate large volumes of log data that capture detailed information about system behavior. These logs provide valuable insights into performance, anomalies, and potential failures. However, extracting meaningful patterns from such high-dimensional and sequential data is challenging. Failures often occur without explicit early warnings, making reactive approaches insufficient and highlighting the need for predictive solutions.

This project focuses on early failure detection by analyzing sequences of log events to identify patterns that precede system failures. The objective is to predict failures before they occur, enabling proactive intervention. Such a system can be applied in real-time monitoring of distributed systems, cloud infrastructures, and large-scale data processing platforms, improving reliability and reducing downtime.

We explored multiple machine learning approaches including CNN, XGBoost, and BiLSTM. CNN captured local patterns, while XGBoost leveraged structured representations of event windows. However, BiLSTM was selected as the final model due to its ability to learn sequential dependencies in log events. The model processes ordered event sequences and predicts the likelihood of failure within a future window. Key challenges included class imbalance, sequence representation, and model optimization, which were addressed through sliding window techniques, balanced sampling, and threshold tuning.
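The sliding-window setup described above can be sketched as follows. The window and horizon sizes, the function name, and the toy event IDs are assumptions for illustration; the source does not give the values the team used.

```python
def make_windows(events, failure_flags, window, horizon):
    """Pair each window of `window` events with a label: 1 if any failure
    occurs in the `horizon` events that immediately follow, else 0."""
    samples = []
    for start in range(len(events) - window - horizon + 1):
        end = start + window
        sequence = events[start:end]
        label = int(any(failure_flags[end:end + horizon]))
        samples.append((sequence, label))
    return samples


# Toy log: ten event IDs with a single failure at position 7.
events = list(range(10))
failure_flags = [0] * 10
failure_flags[7] = 1
samples = make_windows(events, failure_flags, window=3, horizon=2)
labels = [label for _, label in samples]
```

Because the label looks only at events after the window, the model never sees the failure it is asked to predict, which is what makes the task early detection rather than detection.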

The BiLSTM model achieved an F1-score of 0.84 with a recall of 0.87, demonstrating strong detection capability. The solution can be deployed in real-time monitoring pipelines, with scalability addressed through efficient data processing, streaming architectures, and distributed computation frameworks.


EEG-based Brain Stroke Detection and Localization

Avni Gaur, Kashika Kapoor, Om Satpute

CODE PDF

Stroke remains one of the leading causes of death and long-term disability worldwide, with approximately 11.9 million new cases annually. Conventional diagnostic methods such as CT and MRI scans are expensive, time-intensive, and often inaccessible in emergency or resource-limited settings, creating a critical need for faster and more accessible stroke screening approaches. EEG-based analysis offers a promising alternative; however, existing approaches often rely on isolated EEG metrics, lack standardised preprocessing pipelines, and insufficiently capture complex spatial brain activity patterns.

To address these challenges, we develop a dual-pipeline EEG-based stroke classification system using data from stroke patients and healthy subjects. An SVM-based machine learning pipeline is trained on frequency-band power features to establish baseline performance. In parallel, a CNN-based pipeline converts EEG signals into topographic scalp maps, enabling learning of hemispheric asymmetry and spatial abnormalities. Dataset heterogeneity is handled through signal resampling, electrode alignment, band-pass filtering, and artifact removal.

The SVM model achieved 99.75% accuracy with an F1-score of 99.75%, while the CNN model achieved 99.0% accuracy with 99.67% recall. A CNN-LSTM hybrid model was also explored for temporal EEG learning, achieving 75.0% accuracy and an F1-score of 66.67%, but showed lower generalisation performance due to dataset limitations.


Estimating Solar Panel Efficiency Using Weather and Pollution Data

Kanishk Khandelwal, Prajyot Raut, Rahul Aggarwal

CODE PDF

The project focuses on estimating Global Horizontal Irradiance (GHI) using pollution, weather, and solar geometry data, addressing a major challenge in India where air pollution significantly reduces the sunlight reaching the Earth’s surface. Accurate GHI forecasting is important for solar energy planning and power generation optimization. Existing studies typically combine only weather and pollution data or weather and solar geometry, while few use station-level pollution telemetry for large-scale forecasting across India.

To address this gap, the project proposes a nationwide GHI forecasting pipeline using hourly CPCB PM2.5 and PM10 data, ERA5 weather variables, and computed solar geometry features such as solar zenith and azimuth angles. Multiple machine learning models including XGBoost, LightGBM, and LSTM were explored, along with preprocessing techniques such as cyclical time encoding, temporal lag feature engineering, and Yeo-Johnson normalization. One major challenge involved aligning large-scale multi-source datasets while balancing dataset reduction with maintaining nationwide robustness.

Among all models, the LSTM achieved the best performance with an R² score of 0.8669 and MAE of 39.1769 W/m². The proposed system demonstrates strong potential for pollution-aware solar forecasting and scalable deployment for solar energy planning.


Expression Intensity Alignment with Dialogue Context

Abhishek, Manan Singla, Shivika Dhawan

CODE PDF

In real-world performances, what matters isn't just the emotion, but whether the intensity matches the intent behind the script. To address this, we built an Expression Intensity Alignment system with Context of the Dialogue that goes beyond basic emotion detection by evaluating the expected emotional intensity of a script against an actor's actual physical and vocal performance, enabling objective, data-driven feedback for rehearsals and recorded performances.

Because human expression relies on complex cues across text, audio, and video, we implemented a Late Fusion architecture using BiLSTMs to track behavioral flow over time and Temporal Attention to highlight critical frames. Training was challenging: we encountered severe numerical failures, including mode collapse and NaN saturation, alongside missing continuous labels. We overcame these by adjusting activation functions, restructuring our fusion strategy, and applying knowledge distillation from RoBERTa and EmoRoBERTa onto our offline models.

The resulting Tri-Modal Ensemble proved highly reliable, achieving a Pearson Correlation of 0.6462 and a Text Brain Validation MAE of 0.13. Deployed as an offline Gradio dashboard at Plaksha, the system is already used by the university drama club to refine live performances. A live demo is available at: https://huggingface.co/spaces/abhi-s/Multimodal_Emotion. Future work will focus on handling variable microphone quality, diverse lighting conditions, and expressions outside our CMU-MOSEI training distribution.


Flood Prediction in the Brahmaputra Basin Using Terrain Modeling and Weather Conditions

Avi Gautam, Rakshaan Thareja, Shourrya Gupta

CODE PDF

Dhemaji district in Assam, India, is among the most flood-prone regions in the country, experiencing severe seasonal floods due to Brahmaputra river overflow during monsoon months. Timely and accurate flood prediction at a granular spatial scale remains a critical challenge for disaster preparedness and response. This project develops a machine learning-based geospatial flood prediction system for Dhemaji district using a 500m × 500m grid framework, integrating multi-source satellite and environmental data.

We used Sentinel-1 SAR-derived flood labels as ground truth, which provided dynamic daily flood extent maps across six monsoon seasons from 2019 to 2024, overcoming the cloud cover limitations of optical satellite imagery. Features were derived from multiple sources including ERA5 land surface runoff, HydroSHEDS satellite-derived river network data, SRTM elevation and slope, and MODIS tree cover. A key contribution of this work was the derivation of distance to major rivers as a feature, computed from HydroSHEDS river order data, which emerged as the strongest predictor of flood occurrence.

Eleven machine learning models were evaluated, including Random Forest, Gradient Boosting, XGBoost, LightGBM, CatBoost, Neural Networks, and others. Gradient Boosting achieved the best overall performance, with a precision of 0.845, a recall of 0.821, and an F1 score of 0.833. The final model was trained on 1.09 million spatial-temporal observations and tested on 165,337 unseen observations from the 2024 monsoon season. The system can be deployed as a district-level flood risk assessment tool to support early warning systems, infrastructure planning, and emergency resource allocation in Brahmaputra floodplain regions.
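As a consistency check, the reported F1 score follows from the reported precision and recall through the standard harmonic-mean definition:

```latex
F_1 = \frac{2PR}{P + R}
    = \frac{2 \times 0.845 \times 0.821}{0.845 + 0.821}
    = \frac{1.38749}{1.666}
    \approx 0.833
```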


Forecasting Stubble-Burning Activity in Punjab Using Satellite, Climate, and Policy Data

Aditt Singh, Aditya Pratap Singh Parmar, Arnav Jain, Tanush Kalhan

CODE PDF

Post-monsoon stubble burning in Punjab contributes over 40 percent of peak-season fine particulate matter in the National Capital Region, yet operational monitoring remains retrospective. This work develops a machine learning framework for one-week-ahead forecasting of fire activity at 7-kilometre grid resolution. A feature table covering 1,036 active grid cells across six burning seasons (2018–2023) integrates NASA FIRMS active-fire detections, MODIS MOD13Q1 vegetation indices, ERA5-Land daily climate reanalysis, and hand-coded Punjab policy indicators, yielding 56,160 observations and 74 engineered features. Five models were evaluated under a strict temporal hold-out: training on 2018–2021, validation on 2022, and testing on 2023. The Optuna-tuned XGBoost-Tweedie model achieved a test PR-AUC of 0.869 and mean absolute error of 1.88, representing a 21 percent reduction over the persistence baseline. Spatial autocorrelation in prediction residuals (Moran's I = 0.54, p < 0.001) motivated a ConvLSTM stretch model, which further reduced mean absolute error to 0.43. Methodological rigour is supported by an eight-check leakage audit and external validation against four Central Pollution Control Board air quality monitoring stations (Pearson r = 0.83).
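The temporal hold-out and the persistence baseline can be sketched as below. The season boundaries come from the abstract; the row layout, the function names, and the definition of persistence as "next week equals this week" are assumptions for illustration.

```python
def temporal_split(rows, train_seasons, val_seasons, test_seasons):
    """Assign each row to a split by burning season; rows from other seasons are dropped."""
    splits = {"train": [], "val": [], "test": []}
    for row in rows:
        season = row["season"]
        if season in train_seasons:
            splits["train"].append(row)
        elif season in val_seasons:
            splits["val"].append(row)
        elif season in test_seasons:
            splits["test"].append(row)
    return splits


def persistence_mae(weekly_counts):
    """MAE of a persistence forecast: each week's prediction is the previous week's count."""
    errors = [abs(nxt - cur) for cur, nxt in zip(weekly_counts, weekly_counts[1:])]
    return sum(errors) / len(errors)


# Toy feature table: three hypothetical grid cells per season.
rows = [{"season": s, "cell": c} for s in range(2018, 2024) for c in range(3)]
splits = temporal_split(rows, {2018, 2019, 2020, 2021}, {2022}, {2023})
baseline = persistence_mae([0, 2, 5, 1])
```

Splitting by whole seasons, rather than sampling rows at random, keeps every validation and test observation strictly later than the training data.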


Framing Bias Detection

Kanav Nanda, Sartajdeep Singh, Shlok Gupta

CODE PDF

News media misleads through framing, tone, word choice, and selective emphasis without factual inaccuracy. Existing tools only output binary "biased/not biased" labels, giving no actionable severity insight. This project builds a continuous 1–10 bias severity scorer trained on BABE, a dataset of 3,700 expert-annotated news sentences. A custom formula combining expert bias labels, subjectivity tiers, and lexical density converted categorical annotations into regression targets. Applications include newsroom editorial screening, social media content ranking, media literacy education, and computational journalism research. The societal impact lies in making framing bias measurable and actionable for readers, editors, and platforms alike.

Three ML pipelines were explored: TF-IDF with tree models proved too sparse and semantically weak; PCA reduced dimensionality but lost context; the final Embedding + XGBoost hybrid, combining dense semantic embeddings with 28 handcrafted linguistic features, achieved MAE 1.20, RMSE 1.63, and R² 0.78. Key challenges were the absence of continuous labels, high-dimensional sparse features, and annotator disagreement, tackled through formula-based weak supervision, PCA, and noise filtering. The lightweight pipeline is deployable at Plaksha as a browser extension or API, though scaling challenges include domain drift, evolving language, and adversarial framing evasion.


Hemispheric Functional Connectivity-Based Brain Age Prediction for Alzheimer’s Disease using fMRIs

Aadi Arora, Ananya Ramesh, Rushil Gargash, Sara Hanspal

CODE PDF

Alzheimer's Disease (AD) is a progressive neurodegenerative disorder characterized by gradual cognitive decline, affecting millions worldwide. Early and accurate detection remains a critical clinical challenge, as structural changes visible on MRI often appear only after significant neurodegeneration has already occurred. Resting-state functional MRI (fMRI) offers a promising alternative by capturing functional connectivity (FC) patterns which signify the synchronization of neural activity between brain regions; these are sensitive to disease-related disruption before structural damage becomes apparent. Previous studies have demonstrated that brain-predicted age, derived from whole-brain FC matrices, is significantly elevated in symptomatic AD compared to cognitively normal controls, establishing the Brain Age Gap (BAG) as a sensitive neuroimaging biomarker.

We propose a pipeline that decomposes the FC matrix extracted from resting-state fMRI into three anatomically motivated components: left intra-hemispheric, right intra-hemispheric, and inter-hemispheric connectivity. We then train three separate brain age regression models on the respective components using resting-state fMRI data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Our hypothesis is that inter-hemispheric disconnection, reflecting degeneration of corpus callosum pathways, is a particularly sensitive driver of accelerated brain aging in AD compared to within-hemisphere connectivity changes. Our methodology applies PCA for dimensionality reduction of high-dimensional FC feature vectors, followed by SVR trained exclusively on cognitively normal (CN) subjects to establish a healthy aging baseline. After training, the models are applied to CN and AD subjects, first to predict each subject's brain age and then to compute per-subject Brain Age Gap (BAG) scores for each hemispheric component.
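The decomposition step can be sketched as follows, assuming each region in the FC matrix has already been assigned to a hemisphere. The index lists, function names, and toy 4x4 matrix are hypothetical; in practice the assignment depends on the brain atlas, which the source does not name. The gap itself uses the usual definition, predicted age minus chronological age.

```python
def split_fc(fc, left_idx, right_idx):
    """Split a symmetric FC matrix into left-intra, right-intra and inter-hemispheric edges."""
    def intra(idx):
        # Upper triangle only, so each within-hemisphere edge is counted once.
        return [fc[i][j] for pos, i in enumerate(idx) for j in idx[pos + 1:]]

    inter = [fc[i][j] for i in left_idx for j in right_idx]
    return {"left": intra(left_idx), "right": intra(right_idx), "inter": inter}


def brain_age_gap(predicted_age, chronological_age):
    """BAG: positive values mean the brain looks older than the subject's actual age."""
    return predicted_age - chronological_age


# Toy 4-region matrix: regions 0-1 are treated as left, 2-3 as right (hypothetical).
fc = [
    [1.0, 0.8, 0.2, 0.1],
    [0.8, 1.0, 0.3, 0.4],
    [0.2, 0.3, 1.0, 0.7],
    [0.1, 0.4, 0.7, 1.0],
]
parts = split_fc(fc, left_idx=[0, 1], right_idx=[2, 3])
```

Each of the three edge vectors would then feed its own PCA and SVR model, giving one BAG score per component for every subject.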

Group differences in BAG are assessed using t-tests and ANOVA, and BAG scores are also correlated with Mini-Mental State Examination (MMSE) scores to validate clinical relevance. An SVM classifier then uses the hemispheric BAG scores as features to discriminate between diagnostic groups, evaluated using confusion matrices, ROC curves, and F1 scores under stratified cross-validation. Our results identify which of the hemispheric connectivity components contributes most to brain age acceleration in AD, offering a biologically interpretable decomposition of the brain age signal that extends beyond prior whole-brain approaches. The pipeline culminates in a per-subject report card combining the predicted brain age, hemispheric BAG breakdown, cognitive score correlation, and diagnostic classification. Ultimately, our project aims to break down complex fMRI data into interpretable diagnostic results that rely primarily on brain age as a biomarker.


Image Dating Computer Vision Modelling

Mihir Rao, Mukund Arun, Sidhant Jain

CODE PDF

There exist many archives with millions of images where the year timestamp is missing or only roughly estimated, so our image-year prediction model can help infer a more precise time period. It can be used to organize and verify large collections like the Library of Congress, detect inconsistencies in news or online media, and also be offered as a service to help individuals date old family photographs more accurately.

We’ve implemented a pipeline that combines numerical features with high-level semantic representations. Specifically, we make use of CLIP embeddings to capture rich visual semantics, which are then used to train multiple models: Random Forest, Linear SVM, SGD Classifier, and Logistic Regression. We compare these methods to determine the most effective approach for our task. Our dataset consists of approximately one million images spanning seven decades, each labeled with its actual year of capture.

Performance is evaluated primarily using mean absolute error (MAE), alongside classification accuracy. We count a prediction as accurate if the predicted year is within ±5 years of the actual year. Currently, our models achieve an accuracy of approximately 60%, representing a significant improvement over existing approaches in the literature. This suggests that combining semantic embeddings with traditional machine learning models is effective. The pipeline could be deployed in institutional settings like Plaksha by integrating it with digital archiving systems.
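The two metrics can be sketched as below. The ±5-year tolerance comes from the text; the function names and the toy years are illustrative only.

```python
def tolerance_accuracy(true_years, predicted_years, tolerance=5):
    """Share of predictions that fall within `tolerance` years of the true year."""
    hits = sum(abs(t - p) <= tolerance for t, p in zip(true_years, predicted_years))
    return hits / len(true_years)


def mean_absolute_error(true_years, predicted_years):
    """Average absolute difference, in years, between truth and prediction."""
    total = sum(abs(t - p) for t, p in zip(true_years, predicted_years))
    return total / len(true_years)


# Toy example: errors of 3, 6, 0 and 5 years.
true_years = [1950, 1960, 1970, 1980]
predicted_years = [1953, 1966, 1970, 1975]
accuracy = tolerance_accuracy(true_years, predicted_years)
mae = mean_absolute_error(true_years, predicted_years)
```

The two numbers answer different questions: the tolerance accuracy treats a 6-year miss and a 40-year miss identically, while MAE penalises the larger miss more.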


Improving Vehicle Classification Under Domain Shift

Rejin Prasad, Rishabh Dhiman, Vibhav Sahu

CODE PDF

This project investigates label-efficient day-to-night vehicle classification under domain shift using Domain-Adversarial Neural Networks (DANN). Traffic models trained in daytime conditions can become less reliable at night because of reduced visibility, glare, reflections, blur, and sensor noise. Fully labelling nighttime target data can improve adaptation, but it is costly to repeat across cameras, roads, and environments. To study this in a controlled way, we use crop-level bus, car, and truck classification from BDD100K, keeping the recognition task fixed while varying the amount of labelled nighttime data.

We conduct a target-label sweep from 0% to 100% labelled nighttime crops using a warm-started DANN framework and compare it with supervised baselines. Performance improves quickly at low label ratios and saturates around 75% labelled night data. DANN 75% reaches 86.76% accuracy, nearly matching the 86.81% achieved with full nighttime labels; the final 25% labels add only +0.05 percentage points.

To test whether this selected representation remains useful outside BDD, we transfer the trained backbones to IDD under few-shot settings for the same three classes. DANN 75% achieves the strongest IDD result, reaching 96.1% accuracy at K=50 with the lowest variance across seeds. Overall, DANN 75% provides the best evaluated balance between nighttime label efficiency, BDD performance, and cross-environment backbone reuse. This supports partial target supervision rather than assuming full labelling is always best.


Infant Cry Longitudinal Analysis

Aditya Tomar, Apurv Bhushan, Lakshya Gupta

CODE PDF

Infant health monitoring remains largely reactive, with caregivers unable to identify health issues until symptoms become visible. Research suggests physiological changes often manifest as shifts in cry acoustics hours or days before this point.

ICLAS uses a fine-tuned ECAPA-TDNN backbone to build per-infant acoustic profiles from which longitudinal drift is measured. Each cry is encoded into a 192-dimensional embedding representing acoustic identity. An autoencoder anomaly detection layer then scores each new cry against the infant's personal baseline. A gated continual updating mechanism inspired by pyCLAD ensures illness-related changes are flagged rather than absorbed into the baseline. Key challenges included catastrophic forgetting, microphone variability across consumer devices, poor performance in noisy environments, and false positives before a stable baseline is established.

The backbone achieved an EER of 21.38%, approximately 4% better than the CryCeleb 2023 benchmark. The gating classifier achieved 98.57% accuracy distinguishing infant cries from non-cry sounds including snoring, pink, white, and brown noise. The EER improvement confirms domain adaptation produces more discriminative per-infant embeddings, and the gating accuracy confirms the system reliably filters which audio is eligible for baseline updates.

ICLAS is designed for integration into existing smart baby cameras as a lightweight software layer requiring no additional hardware. By surfacing early-warning signals before symptoms appear, ICLAS has the potential to meaningfully improve infant health outcomes in home settings where continuous clinical monitoring is unavailable.


IPO Quantitative Analytics and Listing Predictor

Bandaru Jaya Akshitha, Divy Gupta, Mokshith Royal, Nagaveni

CODE PDF

Initial Public Offerings (IPOs) often exhibit significant price volatility on the listing day, making investment decisions highly uncertain and speculative. Accurate prediction of IPO listing-day returns can improve investment efficiency, reduce information asymmetry, and enhance risk management in primary markets. This study aims to develop a machine learning–based predictive framework for forecasting IPO listing-day returns by integrating company fundamentals, subscription demand patterns, and market sentiment indicators. The dataset consists of IPOs issued during the study period, with explanatory variables capturing firm-level financial characteristics, investor subscription behavior across different categories, and prevailing market sentiment indicators such as grey market premium and overall market conditions. Several machine learning models, including regression-based and tree-based algorithms, are employed to model the nonlinear relationships between these variables and IPO listing-day returns. Model performance is evaluated using standard metrics such as Mean Absolute Error, Root Mean Squared Error, and out-of-sample prediction accuracy. The findings demonstrate that machine learning models outperform traditional linear approaches in capturing complex interactions between fundamentals and market sentiment. Subscription demand and sentiment indicators emerge as strong predictors of listing-day performance, highlighting the behavioral aspects of IPO pricing. This research contributes to the literature by illustrating the effectiveness of machine learning techniques in financial prediction and offers practical insights for investors, issuers, and policymakers seeking to improve decision-making and pricing efficiency in IPO markets.


KEY MONKEY: Machine Learning Model for Piano Note Prediction

Khant Mota, Sameera John, Smriti Kinra

CODE PDF

Predicting the next sequence of piano notes from raw piano audio remains a challenging problem due to simultaneous chords, sustain pedal effects, and complex acoustic overlaps. This project addresses the gap between frame-based CNN models (strong on timing, weak on musicality) and sequence-based Transformer models (strong on context, struggling with polyphonic overlaps and hallucinations). We developed a hybrid LSTM-based architecture trained on the MAESTRO dataset (1,184 performances, 200+ hours) to predict piano notes with both temporal precision and musical coherence. We implemented and compared three recurrent architectures: Vanilla RNN (1.4M parameters), GRU (4.1M parameters), and LSTM (5.5M parameters), alongside Random Forest and SVM baselines using sliding-window feature extraction.

The LSTM model achieved 94.91% test accuracy and 0.1082 validation loss, outperforming simpler approaches on long-range musical sequences while maintaining frame-level precision. Key challenges tackled included handling note offset ambiguity (acoustic decay after key release), preventing hallucination in polyphonic chords, and addressing class imbalance (19:1 silence-to-note ratio) through weighted loss functions. The solution is deployable at Plaksha as a real-time piano transcription tool in music education software, with scalability challenges including inference latency on long performances and memory constraints for songs exceeding 15 minutes.


Modeling Access and Barriers to Financial Inclusion

Aarav Mathur, Malini Sen, Vincent Zacharias

CODE PDF

Financial inclusion is often measured by account ownership, yet a substantial gap persists between access and actual usage of digital financial services. This project addresses the problem of “digital financial inactivity” among account holders, aiming to identify and rank the key barriers preventing active participation. Solving this problem has direct applications in policymaking, fintech design, and targeted interventions to improve financial engagement, particularly in developing economies.

The potential impact includes more effective inclusion strategies, reduced dormant accounts, and improved economic resilience. Using the 2025 Global Findex dataset (144,000 observations, 199 features), we restrict analysis to account holders in emerging economies and define inactivity as the target variable. After extensive preprocessing (removing leakage, handling 65% missingness through feature dropping and median imputation, and addressing collinearity), we apply Logistic Regression as a baseline and XGBoost to capture non-linear patterns.

Models are validated using Leave-One-Country-Out Cross-Validation (LOCO) to ensure cross-country generalizability, with SHAP used for interpretability and barrier ranking. Key challenges included high missingness, feature leakage, and heterogeneity across countries, addressed through robust filtering and validation strategies. The model achieves a ROC-AUC of 0.82, indicating strong discriminatory power, supported by Precision-Recall performance. The solution is deployable at Plaksha as a policy analytics tool, though scaling may face challenges such as data availability, regional biases, and evolving digital behaviors.
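Leave-One-Country-Out validation can be sketched with the standard library alone. The generator below is an illustration with hypothetical country codes, not the project's code.

```python
def leave_one_country_out(countries):
    """Yield (held_out_country, train_indices, test_indices), one fold per country."""
    for held_out in sorted(set(countries)):
        train = [i for i, c in enumerate(countries) if c != held_out]
        test = [i for i, c in enumerate(countries) if c == held_out]
        yield held_out, train, test


# One country label per respondent (hypothetical codes).
countries = ["IN", "KE", "IN", "BR", "KE"]
folds = list(leave_one_country_out(countries))
```

Every respondent from the held-out country is excluded from training, so each fold's score measures transfer to a country the model has never seen. scikit-learn's `LeaveOneGroupOut` splitter produces the same folds when the country column is passed as `groups`.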


Modeling the Spatio-temporal Progression of Glioma Tumour

Gaganpreet, Karri Divya Naidu, THS Eswar Reddy

CODE PDF

Glioma, an aggressive primary brain tumor, presents significant clinical challenges due to its heterogeneous progression patterns and complex longitudinal behavior. Predicting post-treatment progression risk remains difficult, as traditional approaches fail to capture the spatiotemporal dynamics embedded across sequential MRI scans.

This project develops a deep learning survival analysis framework that integrates longitudinal multi-modal MRI data with clinical features to predict individualized progression risk for glioma patients using the MU-Glioma-Post dataset. The methodology combines a 3D Convolutional Neural Network (CNN) encoder, extracting tumor-centred spatial features from eight-channel MRI patches (T1, T1-contrast, T2, FLAIR, and four segmentation mask channels), with a Long Short-Term Memory (LSTM) network that aggregates features across up to six sequential timepoints. Risk scores are optimized using the Cox Proportional Hazards loss, which handles censored patients who did not experience progression during follow-up.
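The Cox loss referred to above is the negative partial log-likelihood. The sketch below is a plain-Python version with Breslow-style handling of tied times, averaged over observed events; the function name and toy inputs are illustrative, and the project's training code presumably uses a differentiable tensor implementation instead.

```python
import math


def cox_ph_loss(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk  : model risk scores (higher means earlier expected progression)
    time  : follow-up time per patient
    event : 1 if progression was observed, 0 if the patient was censored
    """
    total, n_events = 0.0, 0
    for i in range(len(risk)):
        if not event[i]:
            continue  # censored patients contribute only through others' risk sets
        at_risk = [risk[j] for j in range(len(risk)) if time[j] >= time[i]]
        peak = max(at_risk)  # subtract the maximum for a stable log-sum-exp
        log_sum = peak + math.log(sum(math.exp(r - peak) for r in at_risk))
        total += risk[i] - log_sum
        n_events += 1
    return -total / n_events if n_events else 0.0


# Three patients: progression at t=1 and t=2, one censored at t=3.
time, event = [1, 2, 3], [1, 1, 0]
good = cox_ph_loss([2.0, 1.0, 0.0], time, event)  # higher risk progresses earlier
bad = cox_ph_loss([0.0, 1.0, 2.0], time, event)   # ranking reversed
```

The loss is lower when patients who progress earlier receive higher risk scores, and the censored patient still counts as "at risk" for both events without needing an event time of their own.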

Clinical variables including genomic markers, treatment history, and demographic features were preprocessed and dimensionality-reduced via PCA before integration. The model is evaluated using Concordance Index (C-index), time-dependent AUC, and Integrated Brier Score across five-fold cross-validation with a held-out test set. A well-performing solution could be deployed at Plaksha as a clinical decision-support tool, helping oncologists stratify patients into risk groups and personalize treatment planning, though scaling challenges include data privacy, compute infrastructure, and prospective validation requirements.


Molecular Property Cliff Detection

Dev Vasani , Sarthak Goel, Yesha Ravani

CODE PDF

This project addresses the detection of property cliffs, where structurally similar molecules exhibit large differences in a property value such as the HOMO-LUMO gap. Because traditional ML models assume a smooth structure-property relationship, they detect cliffs inconsistently. Detecting molecular property cliffs accurately is important for applications such as drug design and materials design, as unexpected molecular behavior leads to expensive experimental errors.

To tackle this challenge, we developed a graph neural network (GNN) based pairwise classification approach, since molecules are naturally graph-structured. We used the QM9 database: molecules are converted into graphs using RDKit, and pairs of structurally similar molecules are formed using Morgan fingerprints and Tanimoto similarity. The model uses graph convolutional layers (GINEConv) with message passing, followed by a multilayer perceptron for classification. We used class weighting and threshold tuning to address class imbalance and the lack of diversity in the dataset.
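
The pairing step can be sketched as follows. This is an illustrative example rather than the project's code: fingerprints are represented as plain sets of "on" bit indices instead of RDKit objects, and the thresholds and the `label_pair` helper are hypothetical.

```python
def tanimoto(bits_a, bits_b):
    """Tanimoto similarity between two fingerprints given as sets of on-bits."""
    a, b = set(bits_a), set(bits_b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def label_pair(similarity, gap_a, gap_b, sim_threshold, gap_threshold):
    """Label a pair as a cliff (1), a non-cliff (0), or skip it (None).

    A pair is only considered when the molecules are structurally similar;
    it counts as a cliff when the property values still differ strongly.
    Both thresholds are assumed values chosen for illustration.
    """
    if similarity < sim_threshold:
        return None  # not similar enough to form a candidate pair
    return 1 if abs(gap_a - gap_b) >= gap_threshold else 0
```

For example, two fingerprints sharing two of four distinct on-bits have a Tanimoto similarity of 0.5.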

Our model achieved a PR-AUC of 0.9361, ROC-AUC of 0.9922, an F1 score of 0.8477, a precision of 0.8016, and a recall of 0.8996. This demonstrates effective detection of rare cliff cases. At larger scale, reliability on unseen molecules may degrade and computational cost may become high.


Mood Classification of Hindi/Hinglish Songs Using Audio and Emotional Features

Akshit Bansal , Navya Dhody, Saanvi Bhaskar, Sia Jain

CODE PDF

Music is deeply tied to human emotion, yet most recommendation systems still organize songs using broad genre labels that fail to capture how music is actually experienced. This challenge becomes even more significant for Hindi and Hinglish songs, where emotional expression often overlaps across lyrics, instrumentation, and storytelling styles. Our project, Moodify, presents an automated music mood classification system designed specifically for Hindi/Hinglish music, categorizing songs into four emotional classes: Happy, Sad, Romantic, and Excited.

To address the absence of suitable public datasets, we manually curated and labelled a dataset of over 1600 songs spanning multiple decades, artists, and musical styles. Audio features including MFCCs, chroma, spectral contrast, tempo, RMS energy, and rhythm descriptors were extracted using Librosa, while high-level emotional representations such as valence and arousal were explored using Essentia.

The project followed an iterative experimental pipeline beginning with Logistic Regression as a baseline model, followed by dataset refinement, rebalancing techniques, feature engineering, and advanced machine learning approaches including Random Forest, XGBoost, SVM, and Gradient Boosting. Deep learning architectures such as TabTransformer were further explored to capture complex nonlinear relationships between acoustic features, achieving the best overall performance with an accuracy of 62.45% and F1-score of ~0.61.

The proposed system demonstrates the potential of culturally aware mood recognition for improving personalized recommendation systems, intelligent playlist generation, and AI-driven music understanding for Indian music ecosystems.


Multi-modal Detection of Alzheimer's Using Health History and MRI Images

Gautam Ganesh, Krish Chhugani, Sahasra Kovvuru

CODE PDF

Our project is a multimodal deep learning framework for Alzheimer’s disease detection using MRI images and clinical data from the OASIS-2 longitudinal dataset. The objective is to improve early diagnosis accuracy and predict severity while reducing computational complexity using a lightweight ResNet-18 architecture.

MRI images were preprocessed using Deep-BET for brain extraction, followed by augmentation techniques such as small rotations and magnification to improve model robustness and reduce overfitting. Clinical features like MMSE, CDR, and brain volume were standardized and combined with extracted MRI features. ResNet-18 was used for feature extraction, and multiple approaches including T-LSTM, fine-tuned ResNet-18 classification, and ResNet-18 with XGBoost were explored.

Among all approaches, the multimodal ResNet-18 + XGBoost framework achieved the best performance, obtaining a weighted precision of 0.6347, weighted recall of 0.6469, and weighted F1-score of 0.6396. The model demonstrated improved classification capability by effectively combining MRI features with clinical features.


Multimodal Phishing Detection

Sighakolli Jahnavi, Cheerla Parthiv Sagar

CODE PDF

Sophisticated phishing attacks increasingly use polymorphic techniques and visual spoofing to bypass traditional, single-modality security filters, leading to severe credential theft. This project addresses this by developing a robust Multimodal Late-Fusion Ensemble to detect zero-day phishing threats by cross-analyzing three distinct webpage dimensions. Potential applications include enterprise proxy gateways and browser extensions, impacting cybersecurity by neutralizing breaches before user interaction.

We engineered a "Wide-Stacking" architecture deploying nine specialized machine learning experts, matched to modality because different data modalities require specific mathematical logic: Tree-based models (XGBoost, LightGBM, Random Forest) evaluate 56 deterministic URL lexical features via non-linear cutoffs; Distance and Neural models (SVM, MLP) analyze MPNet-encoded, high-dimensional semantic HTML text; and Linear Classifiers evaluate ResNet50 visual embeddings to prevent overfitting. These models evaluate webpage artifacts independently, preventing visual features from mathematically overshadowing subtle lexical cues, before a Logistic Regression meta-classifier synthesizes the final prediction.
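
The late-fusion step can be pictured with a small sketch: each expert emits a phishing probability for its own modality, and a logistic meta-classifier combines them. The weights and bias below are placeholders, not the trained values, and the function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def late_fusion(expert_probs, weights, bias):
    """Combine per-expert phishing probabilities with a logistic meta-classifier.

    expert_probs: one probability per expert (URL, HTML text, visual, ...)
    weights/bias: meta-classifier parameters; in a real stacking setup these
                  are learned from held-out expert predictions.
    """
    z = bias + sum(w * p for w, p in zip(weights, expert_probs))
    return sigmoid(z)

# Hypothetical example: three experts, one per modality.
score = late_fusion([0.92, 0.40, 0.75], weights=[2.0, 1.5, 1.0], bias=-2.0)
```

Because each expert only sees its own modality, no single feature block can dominate the others before the meta-classifier weighs their outputs.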

Challenges included hardware memory faults during concurrent training and the algorithmic complexity of feature overshadowing, tackled via constrained single-core execution and Late-Fusion design. Achieving 98.20% accuracy, 0.9843 recall, and a 0.9820 F1-score, the metrics indicate that the system traps active threats (high recall) while keeping operational false alarms low (high F1). At Plaksha, this can be deployed at the network perimeter, though scaling requires mitigating deep learning inference latency during high traffic.


Ocean Subsurface Reconstruction via 2DV-Unet

Aditya Jaitly, Rohan Goyal, Vir Dang

CODE PDF

Accurate subsurface ocean data is vital for climate analysis and forecasting, yet in situ vertical profiles remain sparse. This project reconstructs dense, multi-depth subsurface temperature and salinity fields from widely available satellite surface observations (SST, SSS, SLA, geostrophic velocities, and wind forcing).

To achieve this, we implement a physics-informed deep learning approach based on a 2D U-Net architecture. Operating as a supervised spatial regression framework, the model maps gridded, multi-source surface inputs to 52-channel multi-depth outputs. The U-Net's encoder-decoder structure captures broad regional patterns, while its skip connections transfer high-resolution spatial information to preserve sharp horizontal gradients and fine-scale structures like eddies. Training is supported by a scalable pipeline utilizing random patch-based spatial sampling from preprocessed NetCDF inputs.

Crucially, the network is optimized using a custom composite loss function that enforces physical plausibility alongside data accuracy. This objective combines a primary mean squared error (MSE) loss against reference GLORYS fields with two soft physical constraints: a surface consistency loss anchoring top-depth predictions to satellite inputs, and a thermal stability penalty punishing unphysical vertical temperature inversions. Evaluated via depth-profile performance metrics, the model achieves a Temperature RMSE of 0.2–0.4°C and a Salinity RMSE of 0.5–0.8 psu, successfully translating surface dynamics into physically reliable 3D ocean structures.
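
A composite loss of this kind can be sketched for a single temperature profile as below. This is a simplified illustration, not the project's loss: the weights `w_surface` and `w_stability` are assumed values, the profile is a plain list ordered from the surface downward, and an inversion is taken to mean temperature increasing with depth.

```python
def composite_loss(pred_temp, ref_temp, sst, w_surface=0.1, w_stability=0.1):
    """MSE plus two soft physical penalties for one vertical profile.

    pred_temp, ref_temp: temperatures by depth level, index 0 = shallowest
    sst:                 satellite sea surface temperature at this location
    """
    n = len(pred_temp)
    # Data term: mean squared error against the reference profile.
    mse = sum((p - r) ** 2 for p, r in zip(pred_temp, ref_temp)) / n
    # Surface consistency: the top level should agree with the satellite SST.
    surface = (pred_temp[0] - sst) ** 2
    # Thermal stability: penalise only where a deeper level is warmer
    # than the level directly above it.
    inversions = [max(0.0, pred_temp[k + 1] - pred_temp[k]) for k in range(n - 1)]
    stability = sum(v ** 2 for v in inversions) / max(n - 1, 1)
    return mse + w_surface * surface + w_stability * stability
```

A profile that matches the reference, agrees with the SST at the surface, and cools monotonically with depth incurs zero loss; any warming with depth adds a penalty even when the data term is small.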


Onset of Disease Detection in Paddy Plants

Aastha, Jaspreet Yadav, Reshma Murali

CODE PDF

Rice leaf diseases significantly reduce crop yield and pose a major threat to agricultural productivity, especially in regions dependent on rice cultivation. Early and accurate detection of diseases is crucial for preventing large-scale crop damage and enabling timely intervention. This project addresses the problem of automated rice leaf disease classification using a hybrid machine learning approach that combines deep learning and handcrafted feature engineering.

The proposed methodology uses a CLAHE-based preprocessing pipeline to enhance image contrast followed by resizing images to 128×128 pixels. Feature extraction is performed using a fusion of MobileNetV2-based deep features and classical descriptors including GLCM texture features, Local Binary Patterns (LBP), Histogram of Oriented Gradients (HOG), and HSV-based color moments. These features collectively capture semantic, texture, shape, and color variations in diseased leaves. The extracted features are standardized and reduced using PCA to retain 95% variance, improving efficiency and reducing redundancy. A Random Forest classifier is then trained for final disease prediction due to its robustness on high-dimensional heterogeneous feature spaces.

The model achieves strong performance with high accuracy and stable cross-validation results, indicating good generalization. Additionally, a rule-based severity estimation module provides interpretable disease staging. The solution can be deployed in agricultural monitoring systems or low-cost diagnostic tools at Plaksha University for real-time crop health assessment. However, scalability challenges may include computational cost of feature extraction and adaptation to field images with variable lighting and background noise.


Pay Parity across Demographics

Kavyaa Agrawal, Sadhika Anand , Sahil Gada

CODE PDF

Gender inequality is often reduced to average wage gaps, overlooking deeper disparities in leadership access, career progression, and demographic inequality. This study develops an explainable machine learning framework to analyze gender disparities across wages, seniority, and employment outcomes using the 2022 American Community Survey (ACS) dataset containing over 567,000 observations.

Using XGBoost, Neural Networks, and SHAP (SHapley Additive exPlanations), the framework models wage outcomes and estimates the probability of individuals attaining senior occupational positions across industries while analyzing how these outcomes vary by gender. The study further incorporates intersectional factors such as race, marital status, education, occupation, class of worker, and industry to examine how demographic groups experience inequality under similar labor-market conditions. This enables comparative analyses between groups such as White women vs White men and Hispanic women vs Hispanic men, highlighting how structural disparities vary across both gender and demographic categories.

To isolate the effect of gender, a counterfactual approach compares predicted outcomes under alternate gender assignments while holding all other variables constant. Results show that disparities are highly sector-specific and that leadership gaps persist even among individuals with comparable qualifications and work patterns. This framework can help policymakers, organizations, and institutions identify industries and demographic groups where structural wage and leadership disparities are most prominent, enabling more targeted interventions for workforce equity and inclusive labor-market policies.
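
The counterfactual comparison can be sketched as follows: the same record is scored twice, differing only in the attribute of interest. The `toy_predict` function is a placeholder with arbitrary coefficients standing in for a fitted model, and the field names are hypothetical.

```python
def counterfactual_gap(predict, record, attribute, value_a, value_b):
    """Difference in predicted outcome when only `attribute` is changed.

    All other fields of the record are held constant; the input record
    itself is not modified.
    """
    record_a = {**record, attribute: value_a}
    record_b = {**record, attribute: value_b}
    return predict(record_b) - predict(record_a)

def toy_predict(record):
    # Placeholder scoring rule with arbitrary coefficients, NOT a fitted
    # model and NOT a result from the study.
    score = 10.0 + 2.0 * record["years_education"]
    if record["group"] == "B":
        score += 1.5
    return score

person = {"years_education": 16, "group": "A"}
gap = counterfactual_gap(toy_predict, person, "group", "A", "B")
```

Averaging such gaps within an industry or demographic subgroup gives a per-group estimate of how much the model's prediction depends on the changed attribute alone.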


Physics Aware Machine Learning Models for Exoplanet Identification

Anirudh Puri, Esraaj Sarkar Gupta, Suyash Prakash

CODE PDF

The challenge in exoplanet detection involves distinguishing true planetary transits from astrophysical false positives that mimic these signals. By utilizing the Kepler Object of Interest (KOI), TESS Object of Interest (TOI), and K2 datasets, we developed a classification system to compare physics-aware classical machine learning against purely data-driven models. Our methodology employed Random Forest and Histogram-based Gradient Boosting to process features anchored in four theoretical physical principles: transit depth consistency based on the planet-to-star area ratio, duration consistency assuming circular orbits, impact parameter squared, and thermal consistency derived from the Stefan-Boltzmann Law. While physics-aware models reached an accuracy of 0.9550 and a PR-AUC of 0.9913 on the high-quality KOI dataset, performance varied across different satellite missions. In highly noisy environments like K2, purely data-driven models were preferred because they could more effectively fit to the complex systemic noise that often violates idealized physical assumptions. Conversely, on the TESS dataset, physics-aware features improved the precision-recall balance, suggesting that the mission's noise profile does not prevent physical anchoring from outperforming purely statistical models. This automated solution could be deployed at Plaksha to accelerate the discovery of new worlds, though scaling will require addressing the increased algorithmic complexity and varying sensitivity of physical anchors across different noisy data sources.
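
As an illustration of the first physical principle, the expected transit depth follows from the planet-to-star area ratio, depth ≈ (R_p / R_s)². The sketch below compares an observed depth with that expectation; the function names, the choice of units, and the ratio-style feature are assumptions rather than the project's exact feature definition.

```python
R_SUN_KM = 695_700.0   # nominal solar radius
R_EARTH_KM = 6_371.0   # mean Earth radius

def expected_transit_depth(planet_radius_earth, star_radius_sun):
    """Fractional flux drop expected from the planet-to-star area ratio."""
    planet_km = planet_radius_earth * R_EARTH_KM
    star_km = star_radius_sun * R_SUN_KM
    return (planet_km / star_km) ** 2

def depth_consistency(observed_depth, planet_radius_earth, star_radius_sun):
    """Ratio of observed to expected depth; values near 1 are consistent."""
    return observed_depth / expected_transit_depth(planet_radius_earth, star_radius_sun)
```

An Earth-sized planet crossing a Sun-like star produces a depth below 0.01%, while a Jupiter-sized planet produces roughly 1%; a candidate whose observed depth departs strongly from the value implied by its catalogued radii is a plausible false positive.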


Pre-Release Movie Success Prediction using ROI-Driven Feature Engineering and Machine Learning

Ashmith K P, Swaminath Reddy, Varun K N

CODE PDF

Predicting a movie’s commercial outcome before its release is a critical challenge for producers, investors, OTT platforms, and distributors aiming to reduce financial risk and improve decision-making. This project develops a data-driven machine learning approach to classify movies into success categories (Hit, Average, or Flop) using features such as genre, budget, cast, and production details. The system can assist stakeholders in investment planning, content acquisition, and marketing strategy optimization, while demonstrating the practical application of AI in the entertainment industry.

The methodology involved constructing custom datasets for Indian and Hollywood films through data collection from TMDB and Wikipedia to ensure quality and flexibility. A key contribution is the use of ROI-based engineered features representing the historical performance and “star power” of actors, directors, and production companies. Multiple supervised learning models, including Logistic Regression, Random Forest, Gradient Boosting, and XGBoost, were trained to capture patterns in historical data.

Challenges included limited and inconsistent datasets, missing financial information, and overfitting due to small sample sizes. These were addressed through preprocessing, feature engineering, and dataset balancing. The best performance was achieved by Random Forest (84% accuracy, 84% F1-score) on binary classification and by Gradient Boosting (75% accuracy, 76% F1-score) on three-class classification. The solution can be extended into a web-based decision support system for the film industry.


Predicting Real Estate Prices Using Satellite Imagery

Aditya Mishra, Atri Somanchi, Siddharth Singh

CODE PDF

This project develops a city-wise housing price prediction framework for India using large-scale scraped real estate listings with geospatial, infrastructural, and socioeconomic context. Beyond conventional listing attributes such as price, area, and BHK configuration, the pipeline integrates spatial indexing at a coarser-than-standard resolution for better merge quality, road accessibility measures, satellite-derived land and built-environment indicators, and district-level socioeconomic proxies to better capture urban spatial structure and neighborhood effects.

A central focus of the work is data reliability and spatial consistency. The system emphasizes improving merge quality, geocoding precision, locality resolution, and spatial coverage so that external contextual features are incorporated only when spatial alignment is sufficiently trustworthy. To reduce overly optimistic evaluation commonly caused by spatial leakage, the modeling framework employs city-wise and spatially aware train-test splits instead of naive random sampling.
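
A city-wise split can be sketched as below. The record layout and the helper name are hypothetical; the point is only that every listing from a held-out city goes to the test set, so nearby listings never straddle the train-test boundary.

```python
def citywise_split(listings, test_cities):
    """Split listings so that no city appears in both train and test."""
    test_cities = set(test_cities)
    train = [row for row in listings if row["city"] not in test_cities]
    test = [row for row in listings if row["city"] in test_cities]
    return train, test

# Hypothetical listings; prices are placeholders, not scraped data.
listings = [
    {"city": "Pune", "area_sqft": 900, "price": 1.0},
    {"city": "Pune", "area_sqft": 1200, "price": 1.4},
    {"city": "Jaipur", "area_sqft": 1000, "price": 0.8},
    {"city": "Kochi", "area_sqft": 1100, "price": 0.9},
]
train, test = citywise_split(listings, test_cities=["Jaipur"])
```

With a random split, two flats in the same building could land on opposite sides and make test error look better than it would be in an unseen neighborhood; grouping by city removes that shortcut.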


Predicting Region-wise Agricultural Production

Adyant Jha, Aksh Jhawar, Krissh Modi

CODE PDF

Agricultural production in Gujarat is highly influenced by weather conditions such as temperature, rainfall, humidity, solar radiation, and wind speed. However, accurately estimating wheat production before harvest remains a major challenge for policymakers and agricultural authorities, as current estimates largely depend on surveys that may often be inaccurate. This study develops a machine learning-based prediction system to estimate total wheat production before harvest using weather data from the crop growing season. The model focuses on cotton production in Gujarat, one of the leading cotton-producing states in India.

A diverse set of weather-related predictive features was constructed using seasonal climate data, including temperature, rainfall, rainfall count (number of rainy days), humidity, solar radiation, and wind speed. These variables were selected because of their direct impact on crop growth, water availability, photosynthesis, and overall wheat productivity. By analysing these weather conditions during the growing months, the model predicts the expected wheat production before harvesting takes place.

Several machine learning approaches were considered for crop production prediction, including Ridge Regression, Elastic Net, Random Forest, Gradient Boosting, SVR (RBF), and a Multi-layer Perceptron Neural Network. The best-performing model was Ridge Regression, with an R² of 0.5969 and MAE of 34.70%.

The proposed system provides a scalable framework for weather-driven agricultural production forecasting with applications in agricultural planning, procurement, storage management, irrigation planning, and farmer relief measures. Early prediction of wheat production can reduce dependence on inaccurate survey-based estimates, help policymakers take timely decisions, improve resource allocation, and strengthen food security and agricultural management.


Predicting Shipment Delays and Identifying Supply Chain Bottlenecks

Anvita Ghosh, Ishita Sapra, Twisha Agrawal

CODE PDF

Pharmaceutical shipments to developing countries face delays driven by operational inefficiencies, structural barriers such as poor customs infrastructure, and environmental disruptions. Existing models predict delays but fail to explain why, limiting actionable response. Our solution creates a novel dataset, vertically combining shipment data, World Bank logistics scores and disaster records – capturing both operational and macro-environmental conditions at the destination country level – to decompose each predicted delay into feature-level contributions.

This enables targeted interventions like disaster-responsive rerouting and vendor accountability, ultimately reducing life-critical medicine stockouts in vulnerable regions. We employed XGBoost as our primary model, selected for its native handling of class imbalance via scale_pos_weight, with SHAP decomposition attributing each prediction to operational, structural or environmental causes. Alternative models were benchmarked but deprioritized, as XGBoost maximized F1-score while maintaining high recall, which is critical since missing a delayed shipment is costlier than a false alarm.

Key challenges included 88/12 class imbalance (addressed via SMOTE), 18.7% missing disaster data (resolved through hierarchical median imputation) and temporal leakage (prevented using walk-forward temporal validation). Our model achieves 79.63% balanced accuracy, 73% recall, 52.34% F1-score, and 88.8% ROC-AUC. Scaling challenges include maintaining real-time disaster data feeds, adapting to regions with sparse logistics ratings, and ensuring prediction latency remains low for operational decisions.
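
For the 88/12 imbalance mentioned above, a common starting value for XGBoost's `scale_pos_weight` is the ratio of negative to positive training examples. The sketch below computes that ratio; treating delayed shipments as the positive minority class is an assumption, and the value the project actually used is not stated.

```python
def scale_pos_weight(labels):
    """Ratio of negative to positive examples, a usual starting heuristic."""
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0:
        raise ValueError("no positive examples in labels")
    return negatives / positives

# Illustrative labels with an 88/12 split (1 = delayed, assumed minority).
labels = [0] * 88 + [1] * 12
weight = scale_pos_weight(labels)  # roughly 7.33
```

The ratio should be computed on the training portion of each walk-forward fold only, so that later periods do not influence the weight applied to earlier ones.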


Predicting Sleep Onset Latency using wearables

Divyannsh Pincha, Kabir Bhalla , Sambhav Banthia , Shikhraj Singh

CODE PDF

Sleep onset latency is a key indicator of sleep readiness, but most consumer wearables still provide only retrospective sleep summaries rather than actionable predictions before bedtime. This project addresses the problem of predicting whether a person will fall asleep within 15 minutes, enabling more timely and personalized sleep guidance.

We developed a machine learning pipeline using wearable physiological signals and contextual features from the Dryad dataset, and also evaluated the approach across multiple models to test generalizability. The methods explored include CatBoost for structured feature-based learning, as well as Logistic Regression and Random Forest models. These models were designed to learn from features such as heart rate, heart rate variability, accelerometer-based activity, skin temperature, and time-based context.

The project is useful for applications such as bedtime decision support, sleep coaching, and personalized sleep optimization. Its broader impact lies in helping users make better sleep decisions using objective physiological data. Although dataset differences and limited sample sizes posed challenges, the work demonstrates a practical framework for real-time sleep onset prediction using wearable data.


Respiratory disease classification using acoustic features from cough and vowel sounds

Gracy Tanna, Ishrat Bombaywala, Pahul Singh

CODE PDF

Respiratory diseases such as asthma, COPD, and COVID-19 affect millions globally, yet overlapping symptoms often make diagnosis difficult. This project presents acoustic Health, a machine learning-based respiratory disease screening system that analyses cough and sustained vowel sounds to classify patients into four categories: healthy, asthma, COPD, and COVID-19. The system aims to provide a non-invasive, scalable, and low-cost preliminary screening solution for healthcare settings with limited diagnostic infrastructure.

The study utilised over 78 hours of respiratory audio recordings from the Coswara and Corp datasets. Audio preprocessing included Butterworth bandpass filtering, silence trimming, amplitude normalisation, and feature extraction. A total of 121 acoustic and metadata-based features were generated from cough and vowel recordings. Multiple classification pipelines were explored, including cough-only, vowel-only, cough + vowel combined, CNN-based audio classification, hybrid CNN + XGBoost models, and a two-layer framework. Models evaluated included XGBoost, Random Forest, Histogram Gradient Boosting, Logistic Regression, SVM, and CNNs using transfer learning with EfficientNet-B0.

To address class imbalance, Random Oversampling and balanced class weights were applied. The best-performing model, a hybrid CNN + XGBoost architecture trained on extracted cough and vowel features, achieved 69.13% accuracy, 69.13% recall, and a macro F1-score of 68.87%, demonstrating strong performance for multi-disease respiratory classification.


Resume Skill-Gap Analyser

Aditya Arora, Kuhuk Katiyar, Reya Saigal

CODE PDF

Modern resume–job matching systems primarily rely on keyword overlap or semantic similarity, which often fails to capture true hiring relevance, capability gaps, and missing critical skills. In this project, we develop a hybrid resume–job reranking framework that combines semantic, lexical, skill-aware, and structural features to improve ranking reliability beyond SBERT-only retrieval.

Our pipeline first uses SBERT for large-scale semantic retrieval, reducing over 2 million possible resume–job combinations into a manageable candidate pool. We then apply a supervised reranking model using features such as TF-IDF similarity, weighted skill overlap, missing-skill importance, title similarity, and years of experience. Skill extraction is enhanced using an ESCO-derived ontology containing approximately 20,000 normalized skill expressions.

To address the lack of reliable ground-truth labels, we construct a gold-standard dataset using majority-vote consensus from three large language models: Claude, GPT, and Gemini. We further compare weak supervision, self-training, and active-learning-based supervision strategies. Experimental results show that confidence-based pseudo-labeling fails to generalize effectively, while uncertainty-based active learning significantly improves performance.
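
The majority-vote labelling can be sketched in a few lines. The label strings are hypothetical, and returning `None` when no label wins a strict majority is an assumed policy for ties, not something the abstract specifies.

```python
from collections import Counter

def majority_label(votes):
    """Return the label chosen by a strict majority of annotators, else None."""
    label, count = Counter(votes).most_common(1)[0]
    return label if count > len(votes) / 2 else None

# Hypothetical votes from three annotators for one resume-job pair.
consensus = majority_label(["match", "match", "no_match"])
```

With three annotators and two possible labels a strict majority always exists; the `None` branch only matters if more label categories are used.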

Our final Logistic Regression reranking model achieves an F1-score of 0.769 on a held-out gold-standard test set, outperforming the SBERT-only baseline by approximately 16.5 percentage points while providing more interpretable skill-gap analysis.


Robust and Interpretable Deep Learning for Diabetic Retinopathy Grading under Realistic Image Degradation

Gitanjali Atri , Lavanya Gupta, Sukant

CODE PDF

Diabetic retinopathy grading from retinal fundus images is a critical task in automated medical screening, where disease severity is inherently ordinal in nature. In this work, we propose a deep learning-based framework that integrates model development, ordinal-aware learning, and systematic robustness evaluation under realistic image corruptions. We train and compare two models: a standard cross-entropy classifier and an ordinal regression model designed to exploit the ordered structure of disease grades. To assess model reliability beyond clean test conditions, we introduce a controlled degradation pipeline that simulates clinically relevant variations, including brightness shifts, contrast reduction, motion blur, and sensor noise across multiple severity levels. Model performance is evaluated using accuracy and Quadratic Weighted Kappa (QWK), which provides a more appropriate measure for ordinal prediction tasks by penalizing predictions based on their distance from the ground truth. Furthermore, we employ Grad-CAM to analyze model interpretability and examine attention drift under increasing degradation severity. Experimental analysis demonstrates differences in robustness and stability between ordinal and cross-entropy formulations, highlighting the importance of ordinal modeling and stress testing for reliable deployment in real-world clinical settings.
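
To show why QWK suits ordinal grades, here is a minimal plain-Python sketch of the metric. It is written for illustration and assumes integer grades in the range 0 to `n_classes - 1`; returning 1.0 when the expected disagreement is zero is a convention chosen here for the degenerate single-class case.

```python
def quadratic_weighted_kappa(y_true, y_pred, n_classes):
    """Cohen's kappa with quadratic disagreement weights."""
    n = len(y_true)
    # Observed confusion counts: rows are true grades, columns predictions.
    observed = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        observed[t][p] += 1
    true_hist = [sum(row) for row in observed]
    pred_hist = [sum(observed[i][j] for i in range(n_classes))
                 for j in range(n_classes)]
    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n_classes):
        for j in range(n_classes):
            # Penalty grows with the squared distance between grades.
            weight = (i - j) ** 2 / (n_classes - 1) ** 2
            expected = true_hist[i] * pred_hist[j] / n
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected
    if weighted_expected == 0:
        return 1.0
    return 1.0 - weighted_observed / weighted_expected
```

Mistaking grade 0 for grade 1 lowers the score only slightly, whereas mistaking grade 0 for grade 4 lowers it sharply, which plain accuracy cannot distinguish.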


Sentiment Analysis Using Speech For Indian Languages

Het Patel, Hussein Akolawala, Jorawar Singh

CODE PDF

Humans rely heavily on speech to communicate content and emotions simultaneously. Therefore, knowledge of emotions should be used when developing speech systems, as it allows systems for speaker recognition, synthesis, and language identification to understand the speaker's state of mind and reactions.

While Speech Emotion Recognition (SER) systems have come a long way, little work has been done to build such systems for Indian languages. In addition, most systems focus only on detecting emotions rather than helping speakers improve their speech. In this project, we propose a system that not only classifies emotions but also helps speakers understand how to speak to express a specific emotion.

Acoustic and spectral features are extracted from speech and are used in an SVM to first detect which emotion is being conveyed. After that a neural network-based regression model is trained to learn the difference between acoustic feature distributions of different emotions. Given a user’s speech features and a desired target emotion, the model predicts adjustments indicating how pitch, energy, and articulation should change.


Swipester: The next generation of Thrifting

Agastya Tiwari, Sana Arora, Siddhanth Karthikeyan

CODE PDF

This project addresses two key challenges in thrifting: accurate personalisation from noisy real-world images, and lack of trust in thrift marketplaces due to uncertain garment quality. These problems impact user engagement, reduce returns, and undermine Swipester's platform credibility. The solution has applications in e-commerce platforms, resale marketplaces, and automated moderation systems, with the potential to improve recommendation quality, enhance trust, and streamline quality control.

For personalisation, we developed a multimodal machine learning pipeline using EfficientNet image embeddings combined with captions, metadata, and attribute labels. Models including logistic regression, LinearSVC, boosting, and soft-voting ensembles were used, as they handle high-dimensional features and improve robustness. CNN fine-tuning was also explored for feature refinement. For quality classification (SwipesterQC-v2), we used engineered image features and an ensemble model with probability calibration and edge-tear detection.

Key challenges included class imbalance, noisy backgrounds, incomplete annotations, and limited defect datasets. These were addressed using preprocessing, segmentation-aware inputs, dataset merging, and feature fusion.

The classifier achieved ~86% accuracy, while the recommendation system reached a recall@3 of 94.29%. The quality model achieved 91.03% accuracy with a macro-F1 of 0.909, indicating strong real-world performance.

The system is deployable via a Streamlit interface with backend APIs and can scale with improved data pipelines, though challenges such as latency, dataset drift, and real-time updates remain.


Traffic Volume and Accidents-based Intersection Safety Classification

Abhineet Gulati, Avani Mahawar, Jaanya Goyal

CODE PDF

Traditional road safety analysis often focuses only on crash events, overlooking exposure factors such as traffic volume. Without incorporating Annual Average Daily Traffic (AADT), it becomes difficult to distinguish between a low-volume road with inherently high crash risk and a high-volume road where increased interactions naturally lead to more accidents. To address this limitation, our project integrates crash data with AADT to develop a more representative safety risk index for intersections. This framework can help city authorities prioritize infrastructure investments, support urban planners in safety simulations, and assist insurers in improving risk assessment.
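
One standard way to normalise crashes by exposure is the crash rate per million entering vehicles. The sketch below uses that formula as an illustration; whether the project's safety score uses this exact normalisation is an assumption, and the example counts are hypothetical.

```python
def crash_rate_per_million_vehicles(crashes, aadt, years=1.0):
    """Crashes per million entering vehicles at an intersection.

    crashes: number of crashes observed over the study period
    aadt:    Annual Average Daily Traffic entering the intersection
    years:   length of the study period in years
    """
    exposure = 365.0 * years * aadt  # vehicles entering during the period
    return crashes * 1_000_000 / exposure

# Hypothetical intersections: few crashes on a quiet road can still mean
# higher risk per vehicle than many crashes on a busy one.
quiet = crash_rate_per_million_vehicles(crashes=5, aadt=2_000)
busy = crash_rate_per_million_vehicles(crashes=20, aadt=40_000)
```

Here the quiet intersection has a quarter of the crashes but a per-vehicle rate about five times higher, which is the distinction raw crash counts hide.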

For the classification task, we adopted a deep learning approach using a Convolutional Neural Network (CNN), specifically a pretrained ConvNeXt architecture. The final fully connected layer was modified to suit our classification objective. Ground truth labels were generated by combining crash data with AADT values to assign a safety score to each intersection.

A major challenge was dataset acquisition, which required extensive communication with institutions. Integrating datasets was also difficult because the AADT dataset lacked geolocation information, requiring records to be matched using street names. Additionally, collecting intersection images required automated scripting.

Model performance was evaluated using accuracy, with current results at 60%. However, greater importance was placed on recall for the high-risk class to minimize the misclassification of unsafe intersections. A confusion matrix was also used for evaluation.


Trans-language Transliteration Using Shared Phonetics

Anirudh Sahijwani, Haridas Mahato, Manish Baghel, Shubham Kumar

CODE PDF

Transliteration converts text between scripts while preserving pronunciation, such as mapping 'namaste' to 'नमस्ते'. Existing systems usually learn direct script-pair mappings, requiring O(N²) parallel datasets across 21 Indic languages, which is difficult to scale. This project proposes a shared phonetic space that supports cross-lingual search, named-entity transfer, multilingual keyboard input, and low-resource transliteration without needing pairwise datasets for every language pair.

The system uses a character-level Transformer encoder shared across 21 languages, InfoNCE contrastive loss with τ = 0.07, mean pooling, and L2 normalization to produce a 128-dimensional unit vector. Aksharantar is used because it provides 26 million pairs across 21 languages, offering broader coverage than Dakshina or smaller Neural MT datasets. The encoder maps Unicode characters through transformer layers, pools the sequence, and learns to pull matching native-Roman pairs together while pushing unrelated pairs apart.
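The pooling, normalization, and contrastive objective described above can be sketched in plain Python. This is a one-directional InfoNCE over a toy batch of 2-dimensional vectors rather than the 128-dimensional encoder outputs; the function names and toy values are assumptions, and the project may use a symmetric or otherwise different variant.

```python
import math

TAU = 0.07  # temperature stated in the abstract


def mean_pool(token_vectors):
    """Average a sequence of per-character vectors into a single vector."""
    n = len(token_vectors)
    return [sum(col) / n for col in zip(*token_vectors)]


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce(native, roman, tau=TAU):
    """native[i] and roman[i] are a matching pair; every other roman[j]
    in the batch acts as a negative for native[i]."""
    losses = []
    for i, anchor in enumerate(native):
        logits = [sum(a * b for a, b in zip(anchor, other)) / tau for other in roman]
        m = max(logits)  # subtract the max for numerical stability
        log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Toy per-character vectors for two native-script words.
native = [
    l2_normalize(mean_pool([[1.0, 0.0], [0.8, 0.2]])),
    l2_normalize(mean_pool([[0.0, 1.0], [0.1, 0.9]])),
]
roman_aligned = [l2_normalize([1.0, 0.0]), l2_normalize([0.0, 1.0])]
roman_shuffled = list(reversed(roman_aligned))

loss_aligned = info_nce(native, roman_aligned)
loss_shuffled = info_nce(native, roman_shuffled)
print(loss_aligned < loss_shuffled)  # True
```

A small temperature such as 0.07 sharpens the softmax, so even modest similarity differences between the matching pair and the in-batch negatives produce a large change in loss.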

The learned phonetic space achieved a mean positive similarity of 0.9054, mean negative similarity of −0.0708, a discrimination gap of 0.9761, R@1 of 81.6%, R@5 of 97.8%, and R@10 of 98.7%. These results show that the space cleanly separates matching cross-script pairs from unrelated ones. Scaling challenges include weaker performance on low-resource languages such as Kashmiri, real-world spelling noise absent from Aksharantar, and ambiguity in IPA-to-vector conversion.
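The retrieval metrics above can be computed from two aligned lists of unit vectors. The sketch below assumes that query i's true match sits at index i of the gallery and that the discrimination gap is the mean positive similarity minus the mean negative similarity, which is consistent with the reported figures up to rounding. The toy vectors are illustrative only.

```python
def dot(u, v):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def recall_at_k(queries, gallery, k):
    """Fraction of queries whose true match (same index) ranks in the top k."""
    hits = 0
    for i, q in enumerate(queries):
        sims = [dot(q, g) for g in gallery]
        ranked = sorted(range(len(gallery)), key=lambda j: sims[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(queries)


def discrimination_gap(queries, gallery):
    """Mean similarity of matching pairs minus mean similarity of all others."""
    pos = [dot(q, gallery[i]) for i, q in enumerate(queries)]
    neg = [
        dot(q, g)
        for i, q in enumerate(queries)
        for j, g in enumerate(gallery)
        if i != j
    ]
    return sum(pos) / len(pos) - sum(neg) / len(neg)


# Toy unit vectors; only the first query retrieves its match at rank 1.
queries = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.8)]
gallery = [(0.8, 0.6), (0.6, 0.8), (0.0, 1.0)]

r1 = recall_at_k(queries, gallery, 1)
r2 = recall_at_k(queries, gallery, 2)
r3 = recall_at_k(queries, gallery, 3)
gap = discrimination_gap(queries, gallery)
print(r1, r2, r3)
```

Recall@k can only grow with k, which is why the reported R@5 and R@10 sit well above R@1.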


Urban Heat Island Intensity Prediction

Arnav Nathani, Sahil Aleem, Vayun Gupta

CODE PDF

This project predicts Urban Heat Island intensity across central Bengaluru using a tuned ensemble of XGBoost, LightGBM, and CatBoost. We built a dataset of 2,433 sample points at 50-meter resolution from Landsat, ERA5, Sentinel-5P, MODIS, OpenStreetMap, WorldPop, and Meta's Relative Wealth Index. The final model uses 20 features covering vegetation indices, urban morphology, atmospheric conditions, and socioeconomic data.

The ensemble achieves an R² of 0.7940 and an RMSE of 1.233°C under stratified 5-fold cross-validation. Hyperparameters were tuned using Optuna with 60 trials per model. A 3-class severity classifier built on the same features reaches an accuracy of 0.7604. Spatial block cross-validation, with folds defined by KMeans geographic clusters, tests whether the model generalizes to unseen regions; R² holds at 0.72 to 0.74. We also validated the pipeline on Pune (2,382 points), where a model trained on pooled Bengaluru and Pune data reaches an R² of 0.7497.
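The idea behind spatial block cross-validation is to hold out one whole geographic cluster at a time, so test points are never immediate neighbours of training points. The sketch below uses a tiny hand-written k-means to stay dependency-free; in practice a library implementation (for example scikit-learn's `KMeans` with `GroupKFold`) would likely be used. The coordinates, the cluster count, and the function names are assumptions.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Tiny Lloyd's k-means on (x, y) pairs; returns a cluster id per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda c: (p[0] - centers[c][0]) ** 2 + (p[1] - centers[c][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return labels


def spatial_block_folds(labels):
    """Yield (train_idx, test_idx), holding out one cluster at a time."""
    for block in sorted(set(labels)):
        test = [i for i, lab in enumerate(labels) if lab == block]
        train = [i for i, lab in enumerate(labels) if lab != block]
        yield train, test


# Toy coordinates forming three loose groups.
points = [
    (0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
    (5.0, 5.0), (5.1, 5.0), (5.0, 5.1),
    (10.0, 0.0), (10.1, 0.0), (10.0, 0.1),
]
labels = kmeans(points, k=3)
folds = list(spatial_block_folds(labels))
for train_idx, test_idx in folds:
    print(len(train_idx), len(test_idx))
```

Because each fold removes a contiguous region, a drop in R² relative to ordinary k-fold gives a rough measure of how much the model relies on spatial proximity between training and test points.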

TreeSHAP analysis ranks the Relative Wealth Index as the third most important feature, above NDVI and population density. This suggests socioeconomic conditions are independent drivers of land surface temperature, not just proxies for population or built density. All data sources are free and open.


Vitamin Deficiency Detection

Arjun Singh Dev, Jyotiraditya Dyal, Shikhar Mattoo

CODE PDF

Vitamin deficiencies are frequently neglected because early symptoms are easy to overlook, and traditional biochemical blood panels require significant time, cost, and effort. To overcome this diagnostic barrier, our project introduces a non-invasive computer vision pre-screening tool analyzing visible dermatological symptoms. This solution democratizes early-stage healthcare, enabling rapid intervention before severe complications arise.

With a limited dataset of 6,843 clinical images, training a deep neural network from scratch would lead to severe overfitting. We therefore used transfer learning with a ResNet50 Convolutional Neural Network. We implemented progressive unfreezing to learn specific dermatological textures and programmatically consolidated the dataset into 8 distinct clinical classes, reducing the noise introduced by visually identical symptoms. We addressed severe class imbalance using Weighted Cross-Entropy Loss and mitigated overfitting with data augmentation, Dropout layers, and Early Stopping.
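A weighted cross-entropy loss scales each sample's contribution by a weight tied to its true class. The abstract does not say how the weights were chosen; the sketch below assumes the common inverse-frequency heuristic and uses a made-up three-class label list rather than the project's 8 classes.

```python
import math
from collections import Counter


def inverse_frequency_weights(labels, num_classes):
    """Weight each class by N / (C * count), so rare classes count more."""
    counts = Counter(labels)
    n = len(labels)
    return [
        n / (num_classes * counts[c]) if counts[c] else 0.0
        for c in range(num_classes)
    ]


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class), normalized by the sum of weights."""
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    return total / sum(weights[y] for y in labels)


# Made-up imbalanced labels: class 0 is common, class 2 is rare.
train_labels = [0, 0, 0, 0, 0, 0, 1, 1, 2]
weights = inverse_frequency_weights(train_labels, num_classes=3)
print(weights)  # [0.5, 1.5, 3.0]

# Two predictions with the same confidence in the true class.
probs = [[0.6, 0.2, 0.2], [0.2, 0.2, 0.6]]
loss = weighted_cross_entropy(probs, [0, 2], weights)
```

With these weights, an error on the rare class contributes six times as much to the loss numerator as the same error on the common class, which pushes the model away from simply predicting the majority class.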

Our final model achieved a test accuracy of 66.00% and a Macro F1-Score of 0.74. Crucially, it isolated unique visual features for pathognomonic conditions, achieving F1-scores between 0.96 and 0.99 for deficiencies of vitamins B2, C, E, and K. This lightweight model is deployable via the Plaksha student health portal for preliminary triage. Future scale-up will use a richer dataset to build region-specific classifiers (face, nails, skin) and to mitigate skin-tone bias, further improving diagnostic precision.
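Macro F1 averages per-class F1 scores with equal weight, so it can sit above overall accuracy when small classes are classified almost perfectly while larger classes are confused with each other. The sketch below illustrates this with made-up counts for four hypothetical classes; it is not a reconstruction of the project's confusion matrix.

```python
def per_class_f1(confusion):
    """confusion[i][j] = samples with true class i predicted as class j."""
    n = len(confusion)
    scores = []
    for c in range(n):
        tp = confusion[c][c]
        predicted = sum(confusion[r][c] for r in range(n))  # column sum
        actual = sum(confusion[c])  # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores


def macro_f1(confusion):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_f1(confusion)
    return sum(scores) / len(scores)


def accuracy(confusion):
    """Fraction of all samples on the diagonal."""
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return correct / sum(sum(row) for row in confusion)


# Made-up counts: two large, mutually confused classes and two small, clean ones.
cm = [
    [30, 20, 0, 0],
    [20, 30, 0, 0],
    [0, 0, 10, 0],
    [0, 0, 0, 10],
]
acc = accuracy(cm)
f1 = macro_f1(cm)
print(f"accuracy={acc:.3f} macro_f1={f1:.3f}")
```

Here accuracy is about 0.667 while macro F1 is 0.8, because the two small classes each contribute a perfect score to the unweighted average.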