Art forgery threatens cultural heritage preservation and causes substantial financial losses through counterfeit paintings. Existing authentication methods rely heavily on expert judgment, which is costly, subjective, and unscalable. Our project automates painting authenticity verification using the WikiArt dataset, with potential applications in museum cataloguing, auction house verification, and digital art provenance tracking. Our methodology combines supervised deep learning with probabilistic density estimation.
We fine-tuned a ResNet-50 backbone (pretrained on ImageNet) on paintings from 55 artists, balanced at 200 images each to eliminate class bias. Beyond classification, we extracted 4096-dimensional patch-aggregated embeddings – combining mean and max pooling over overlapping 224×224 patches – to capture both average and extreme stylistic signals. A Gaussian Mixture Model (GMM) was trained per artist to model authentic embeddings. Forgery detection operates as likelihood thresholding: images scoring below mean − 2.5σ of their best-matching artist's distribution are flagged as inauthentic.
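The likelihood-thresholding step can be sketched as follows: a minimal NumPy version that scores embeddings under an already-fitted diagonal-covariance GMM and flags anything below mean − 2.5σ of the authentic scores (function names are illustrative, not the project's):

```python
import numpy as np

def diag_gmm_logpdf(x, weights, means, variances):
    """Log-density of rows of x under a diagonal-covariance GMM.

    x: (n, d); weights: (k,); means, variances: (k, d).
    """
    diff = x[:, None, :] - means[None, :, :]                    # (n, k, d)
    log_comp = -0.5 * ((diff ** 2 / variances).sum(axis=2)
                       + np.log(2 * np.pi * variances).sum(axis=1))
    a = log_comp + np.log(weights)                              # (n, k)
    m = a.max(axis=1, keepdims=True)                            # log-sum-exp
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()

def flag_forgeries(train_loglik, query_loglik, k=2.5):
    """Flag queries scoring below mean - k*sigma of authentic log-likelihoods."""
    threshold = train_loglik.mean() - k * train_loglik.std()
    return query_loglik < threshold
```

In practice the per-artist mixture parameters would come from a fitted library GMM; the thresholding logic is the same either way.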
Key challenges included the absence of real forgery data, handled via one-class density modeling, and high-dimensional GMM fitting, addressed using diagonal covariance. Our classifier achieved 72.85% validation accuracy across 55 artists. The GMM-based rejection framework provides statistically grounded authenticity scores. These methods were chosen after a literature review of automated painter identification and patch-based feature extraction (Choudhury, 2021; Gencarelli et al., 2023; Afifi et al., 2025; Sabino et al., 2026), trial-and-error experimentation, and a review of course material in light of available compute. Our pipeline replaces subjective analysis with quantitative authenticity scores, offering a meaningful contribution to digital provenance tracking and the protection of global cultural heritage.
Data breaches represent one of the most critical threats to organizational security, with attackers increasingly using sophisticated exfiltration strategies (bulk dumps, targeted record theft, and slow incremental access) that evade traditional anomaly detectors. DecoyNet addresses this with a three-layer honeypot system that injects machine-learning-generated decoy records into real financial transaction datasets, triggering an alarm the moment stolen data contains a decoy. Applications include enterprise database protection, insider threat detection, and any system where the cost of undetected exfiltration is high.
We use the PaySim synthetic financial dataset (Lopez-Rojas et al., 2016), comprising 6.3 million mobile money transactions with fully interpretable features. A PCA and Gaussian Mixture Model pipeline (Weeks 3–4) generates statistically realistic decoy transactions, evaluated following the CTGAN framework of Xu et al. (2019): a Random Forest discriminator attempts to distinguish real from decoy records. Four injection strategies place decoys at decision boundaries, k-Means cluster centroids, high-value transaction zones, and random positions, with each decoy cryptographically hashed (SHA-256 with a secret salt) into a secure lookup table.
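The salted-hash lookup can be sketched as below; HMAC-SHA-256 is one standard realization of "SHA-256 with a secret salt" (the salt value and function names here are placeholders, not the project's):

```python
import hashlib
import hmac

SECRET_SALT = b"replace-with-secret-salt"  # placeholder; key management is out of scope

def decoy_fingerprint(record: str, salt: bytes = SECRET_SALT) -> str:
    # Keyed hash: without the secret salt, an attacker who sees the
    # lookup table cannot regenerate or test fingerprints.
    return hmac.new(salt, record.encode("utf-8"), hashlib.sha256).hexdigest()

def build_lookup(decoys):
    """Secure lookup table of fingerprints for every injected decoy."""
    return {decoy_fingerprint(r) for r in decoys}

def exfiltration_alarm(stolen_records, lookup) -> bool:
    # Alarm fires the moment any stolen record matches a decoy fingerprint.
    return any(decoy_fingerprint(r) in lookup for r in stolen_records)
```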
Four simulated attack types (bulk, targeted, mimicry, and slow theft) are tested across 20 trials each. The decoy discriminator achieved 57.3% accuracy (target ≤ 60%), confirming the decoys are realistic. Bulk and targeted attacks were detected in 100% of trials; slow incremental theft, which an Isolation Forest baseline missed entirely, was detected at 91%. The false positive rate remained below 2%, and downstream fraud-detection AUC-ROC degraded by less than 0.008 after injection, confirming that dataset integrity was preserved.
Children's speech recognition is significantly harder than adult ASR due to higher pitch, shorter vocal tracts, emerging phonetic patterns, and scarce annotated data. We built a machine learning model that predicts IPA phonetic symbols from children's audio. Applications include AI literacy tutors, automated pediatric speech therapy screening, and more inclusive voice assistants. The core impact is closing the character error rate (CER) gap between adult (4.2%) and child (28.5%) ASR in current commercial models.
Our solution treats the task as phonetic speech recognition for children’s speech. We used log-mel spectrograms as audio features, a CNN14-like acoustic encoder, CTC loss, beam-search decoding, and data augmentation. Log-mel features capture speech patterns over time and frequency. The CNN learns acoustic features, and CTC allows training without frame-level phoneme alignments by learning how audio frames map to phonetic transcripts. Beam search improves decoding by considering multiple possible phoneme sequences.
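The core CTC decoding rule can be illustrated with a minimal greedy collapse (the project itself uses beam search; this sketch shows only the merge-repeats-then-drop-blanks step that makes frame-level alignment unnecessary):

```python
def ctc_collapse(frame_ids, blank=0):
    """Greedy CTC decoding: merge repeated frame labels, then drop blanks."""
    out, prev = [], None
    for label in frame_ids:
        if label != prev and label != blank:
            out.append(label)
        prev = label
    return out
```

Note how a blank between two identical labels preserves a genuinely repeated phoneme, while consecutive duplicates collapse to one.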
We also used SpecAugment, noise, gain, and time-stretching to improve robustness. The model is configured to use a pretrained PANNs CNN14 audio checkpoint, which helps because children's speech data is limited. Validation PER was 0.7168 and CER was 0.7183. On the 922-sample test set, PER was 0.6131 and CER was 0.6152, improving on the baseline (PER 0.9341, CER 0.9423). We aim to extend this transfer-learning approach further.
Disruptions in circadian rhythms affect health, sleep quality, and daily productivity, yet accurately tracking internal body clocks in real-world settings remains difficult. Traditional methods such as DLMO testing or sleep lab studies are invasive, expensive, and impractical for continuous monitoring. This project aims to develop a non-invasive, scalable solution to predict a user’s Acrophase (their daily peak physiological time) using data from the NHANES (National Health and Nutrition Examination Survey) dataset.
Such a system could enable personalized health insights, improved sleep recommendations, and better scheduling for productivity and well-being. To address this, we preprocess wearable-style features such as activity levels, heart rate patterns, and sleep behavior, while encoding time cyclically using sine and cosine transformations to reflect its 24-hour nature. We implement and compare three machine learning models: K-Nearest Neighbors (KNN) as a baseline, LightGBM for efficient handling of complex feature interactions, and a Multi-Layer Perceptron (MLP) neural network for capturing nonlinear patterns.
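The cyclical time encoding can be sketched as follows (a minimal version; the mapping places each hour on the unit circle so that 23:00 and 01:00 are near neighbors):

```python
import math

def encode_hour(hour: float):
    """Map an hour-of-day onto the unit circle via sine/cosine features."""
    angle = 2 * math.pi * (hour % 24) / 24
    return math.sin(angle), math.cos(angle)
```

Unlike raw hour values, where 23 and 1 appear 22 units apart, the encoded points are close together, which is exactly the property a 24-hour phase model needs.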
Key challenges included limited directly labeled circadian data and handling cyclical time, which we addressed through feature engineering and circular transformations. Model performance is evaluated using Circular RMSE and Circular R², along with accuracy, precision, recall, and confusion matrices after grouping predictions into time-of-day categories. Results indicate reliable prediction of circadian phase, demonstrating feasibility for deployment in wearable-based health monitoring systems at scale, though challenges such as user variability and real-time integration remain.
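Circular RMSE can be computed by wrapping each error onto the 24-hour circle before averaging; this is one common formulation and assumes hour-valued phase predictions (the exact definition used in the project may differ):

```python
import math

def circular_rmse(pred_hours, true_hours, period=24.0):
    """RMSE on a circle: wrap each error into (-period/2, period/2]."""
    squared = []
    for p, t in zip(pred_hours, true_hours):
        err = (p - t + period / 2) % period - period / 2  # shortest signed distance
        squared.append(err * err)
    return math.sqrt(sum(squared) / len(squared))
```

With this wrapping, predicting 23:30 against a true acrophase of 00:30 counts as a 1-hour error rather than a 23-hour one.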
Respiratory diseases such as asthma, COPD, and COVID-19 collectively affect hundreds of millions of people globally, yet their overlapping symptoms make clinical differentiation difficult and subjective. Our project develops a machine learning classifier that analyzes acoustic features extracted from cough and vowel sounds to categorize patients into four classes: healthy, asthma, COPD, and COVID-19, enabling non-invasive, scalable respiratory screening. Potential applications include clinical pre-screening tools and early-detection pipelines in healthcare settings with limited diagnostic infrastructure. This study presents AcoustiHealth, a machine learning-based preliminary respiratory disease screening tool trained on over 78 hours of audio recordings.
After column harmonisation, audio preprocessing (Butterworth bandpass filtering, silence trimming, amplitude normalisation), and feature extraction, three parallel classifiers were trained: one on 66 cough features, one on vowel features, and one combining both. To address class imbalance, minority disease classes were oversampled and class weights were balanced during training. Seven models were evaluated - XGBoost, Random Forest, Histogram Gradient Boosting, Logistic Regression, SVM, Gradient Boosting and CNNs - spanning linear, ensemble, boosting and convolutional paradigms, selected because they perform well without the large data volumes deep learning demands. A 70/15/15 train-validation-test split was used.
Macro-averaged F1, along with recall, was prioritised, as it penalises per-class errors equally regardless of class size. The best-performing model, Histogram Gradient Boosting on cough features only, achieved 76% validation accuracy (macro F1: 0.73, recall: 0.71) and 63% test accuracy (macro F1: 0.59, recall: 0.58). The validation-test gap reflects limited data diversity. Deployment is most suited to hospital settings; at scale, improved performance would require substantially larger, balanced training corpora. With access to larger, more diverse and balanced datasets, AcoustiHealth can evolve into a reliable, low-cost tool that provides equitable and accessible preliminary respiratory disease detection.
This project addresses two core challenges in online thrifting: accurate personalization from noisy real-world images and limited trust caused by uncertain garment quality. These problems reduce user engagement, increase returns, and undermine the credibility of the Swipster platform. The solution has applications in e-commerce platforms, resale marketplaces, and automated moderation systems, with the potential to improve recommendation quality, enhance trust, and streamline quality control. For personalization, we developed a multimodal machine learning pipeline using EfficientNet image embeddings combined with captions, metadata, and attribute labels.
Models including Logistic Regression, LinearSVC, boosting, and soft-voting ensembles were used, as they handle high-dimensional features and improve robustness. CNN fine-tuning was also explored for feature refinement. For quality classification (SwipsterQC-v2), we used engineered image features and an ensemble model with probability calibration and edge-tear detection. Key challenges included class imbalance, noisy backgrounds, incomplete annotations, and limited defect datasets.
These were addressed using preprocessing, segmentation-aware inputs, dataset merging, and feature fusion. The classifier achieved ~86% accuracy, while the recommendation system reached a Recall@35 of 94.29%. The quality model achieved 91.03% accuracy with a macro-F1 of 0.909, indicating strong real-world performance. The system is deployable via a Streamlit interface with backend APIs and can scale with improved data pipelines, though challenges such as latency, dataset drift, and real-time updates remain.
Loan default prediction is a critical problem for financial institutions, particularly for individuals with limited or non-traditional credit histories. Poor risk assessment can either deny creditworthy applicants or increase financial losses due to defaults. This project aims to build a robust machine learning system to predict loan default risk using the Home Credit dataset, enabling fairer lending decisions and improved financial inclusion.
Our approach emphasizes extensive feature engineering, which is central to this competition. The model architecture uses a weighted ensemble of gradient boosting models: LightGBM, XGBoost, and CatBoost. Using 5-fold stratified cross-validation and strict out-of-fold validation, our current pipeline achieves a ROC-AUC of 0.7977.
Challenges included handling large-scale relational data, designing effective aggregations, and identifying meaningful features, which were addressed through modular pipeline design and incremental experimentation. The achieved ROC-AUC demonstrates strong predictive performance. While deployable as a scalable credit risk tool, challenges such as model generalization, fairness, and data drift must be addressed when scaling further.
Addressing the 2026 World Economic Forum’s ranking of "Misinformation & Disinformation" as a top global risk, this project tackles bot-generated reviews that cause up to 25% revenue drops for honest businesses. To address the "vocabulary overfit" observed in prior research, we implement Domain Adaptive Foundation Modeling: a two-stage process that trains on foundational review data before specializing in domain-specific datasets. This approach achieves 86%+ accuracy in Ternary Risk Stratification, classifying reviews into High, Moderate, or Low risk to provide real-time, scalable authenticity scores.
Deepfakes are not only a present-day media threat but also a prototype for the larger authenticity challenges to come. These artificial yet hyper-realistic videos, audio clips, and images created by algorithms are producing a “crisis of knowing,” blurring the line between truth and fabrication. Unlike traditional disinformation, they are highly convincing, and they are becoming increasingly scalable and accessible. The recent deepfake of Volodymyr Zelenskyy during the Ukraine conflict, which falsely depicted a call for surrender, exemplifies the manipulative potential of synthetic media in high-stakes environments. The deepfake detection model we are deploying has applications across multiple domains, including social media content moderation, detection of fake news and misinformation, digital forensics, and identity verification and authentication platforms to prevent impersonation.
The solution contributes to restoring trust in digital media by enabling early detection of manipulated content before it spreads. Our work proposes a multimodal deepfake detection framework that moves beyond visual appearance and interrogates behavior to assess the authenticity of media. The system analyzes markers such as eye-blink patterns, consistency across frames, and the synchronization of lip movement, audio, and expression. Audio-visual alignment is a significant contribution: it exposes mismatches that reveal whether content is synthetic. Facial features were extracted using EfficientNetB0, and MediaPipe Face Mesh was used to capture fine-grained behavioral cues. Temporal inconsistencies are analyzed through optical flow and frame-level feature variation, while audio patterns are encoded using MFCCs. These signals are fused into a unified representation and classified using LightGBM, which enables efficient learning from complex, high-dimensional data. To address the black-box nature of machine learning models, the framework incorporates SHAP, which provides feature-level explanations and reveals why a particular video is flagged, whether due to blinking, motion inconsistency, or desynchronized speech. Our system faced several practical and methodological challenges.
One of the most significant challenges was a pronounced class imbalance, with far fewer real videos than manipulated ones. To tackle this, we applied oversampling on the training feature matrix to balance the real and fake classes while keeping the test set untouched. Another issue was the computational intensity of the pipeline (face detection, landmark extraction, deep feature computation, optical flow, and audio processing), mitigated through checkpoint-based feature storage, enabling efficient reuse of intermediate results and recovery from runtime interruptions. Interpretability was incorporated using SHAP, allowing the model to justify its decisions through human-understandable cues. Our model was evaluated using accuracy, precision, F1-score, recall, balanced accuracy, and ROC-AUC. It achieved strong performance, with 99% accuracy, 0.9995 AUC-ROC, and 0.9725 balanced accuracy, alongside high class-wise F1-scores: 0.96 for real and 0.99 for fake videos. The near-perfect AUC-ROC and high balanced accuracy indicate robust discrimination between real and fake videos, while strong recall and F1-scores across both classes confirm reliable detection without bias toward the majority class. Our model could potentially be deployed at Plaksha; one possible pathway is a web-based tool where users upload videos, which are processed through the pipeline to output real/fake predictions along with interpretable explanations. It could also be used in research labs, student project verification, and cybersecurity demonstrations. Scalability may be limited by high computational cost, slow processing time, and the need for continuous retraining to keep pace with continuously evolving deepfake techniques.
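The training-set-only oversampling step can be sketched as simple random duplication of minority-class rows (one common variant; the project's exact oversampler may differ):

```python
import random

def oversample_minority(features, labels, seed=0):
    """Duplicate random minority-class rows until all classes are equal-sized.

    Applied to the training set only; the test set stays untouched.
    """
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(rows) for rows in by_class.values())
    new_features, new_labels = [], []
    for y, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(target - len(rows))]
        for x in rows + extra:
            new_features.append(x)
            new_labels.append(y)
    return new_features, new_labels
```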
Industrial machines often develop faults that first manifest as subtle changes in operating sounds. These anomalies frequently go undetected by human operators, resulting in unplanned downtime, costly repairs, and safety hazards across manufacturing, energy, and logistics sectors. Early automated detection of such sounds can enable predictive maintenance, reducing operational losses significantly.
We propose a two-stage unsupervised machine learning pipeline to address the domain shift problem – the key limitation identified across existing literature. A Convolutional Autoencoder is first pretrained on the MIMII 2019 dataset using only normal machine sounds, learning to reconstruct healthy acoustic patterns. Anomalies are flagged when reconstruction error exceeds an adaptive threshold. The model is then fine-tuned on the MIMII DG 2022 dataset, which contains recordings across varied environmental conditions, enabling robust generalization across deployment environments.
Features are extracted as Mel-Spectrograms and MFCCs using Librosa. Key challenges included the huge dataset size, which we managed using Google Colab with streaming access, and catastrophic forgetting during fine-tuning, mitigated by freezing encoder weights at a low learning rate. Performance is evaluated using AUC-ROC, F1-Score, Precision, and Recall, targeting above 80% AUC. The solution is deployable in any factory environment using low-cost microphones connected to edge devices, with the model running inference in real time.
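The reconstruction-error flagging can be sketched as follows; the `mean + k*std` rule is one plausible reading of "adaptive threshold" (the constant `k` and function names are illustrative):

```python
import numpy as np

def reconstruction_error(x, x_hat):
    """Mean squared error per spectrogram in a batch of shape (n, ...)."""
    return ((x - x_hat) ** 2).reshape(len(x), -1).mean(axis=1)

def adaptive_threshold(normal_errors, k=3.0):
    """Threshold calibrated on held-out errors from *normal* machine sounds."""
    return normal_errors.mean() + k * normal_errors.std()

def flag_anomalies(errors, threshold):
    """Flag clips whose reconstruction error exceeds the threshold."""
    return errors > threshold
```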
Brain tumors often go undetected in routine MRI scans where radiologists primarily look for the indicated pathology, leading to delayed diagnoses with severe clinical consequences. Supervised tumor segmentation requires expensive expert annotations and fails to generalize to rare or unseen pathologies. We address this through unsupervised anomaly detection (UAD), where models learn only from healthy brain MRI and flag any deviations as suspicious. The solution can serve as a second-reader screening tool in radiology workflows, surface incidental findings, and adapt to novel pathologies without retraining – particularly valuable in resource-constrained settings.
We extend Behrendt et al.'s conditioned diffusion model (cDDPM) framework by replacing its 2D SparK encoder with a 3D ResNet50 encoder pretrained via masked autoencoding on a subset of 691 healthy T2 scans from the FOMO-300K dataset, balanced across scanners and demographics. The pretrained 3D encoder produces a 128-dimensional conditioning vector via FiLM, guiding a 2D denoising UNet that reconstructs healthy brain anatomy. Test-time residuals between input and reconstruction yield voxel-wise anomaly maps. Key challenges included GPU memory constraints with the 3D ResNet50 (resolved via dynamic padding and batch-size tuning), training instabilities, and the architectural integration of dimension-mismatched encoder-decoder pairs.
Performance is measured via volume-level AUROC, AUPRC, and per-voxel Dice score on BraTS21 (brain tumor) test sets. The Behrendt et al. baseline achieves ~0.86 AUROC on BraTS21 with 2D encoding. We expect our 3D encoder variant to achieve comparable or improved performance due to enhanced volumetric context capture. The solution can be tested further and potentially deployed at Plaksha as a screening plugin for campus radiology partnerships or for pedagogical demonstrations. Scaling challenges include domain adaptation across scanner manufacturers, GPU hardware requirements, real-world validation, and processing throughput for high-volume hospitals.
Medical image forgery, such as concealing tumors or fabricating benign results in CT scan data, poses a serious threat to diagnostic integrity, insurance systems and patient safety. This project presents a deep learning-based approach for detecting tampering in lung CT scan images using a slice-level classification framework. The methodology is centered on a fine-tuned ResNet50V2 model, pretrained on ImageNet and adapted through transfer learning for medical image analysis.
Each CT slice is preprocessed by resizing and normalizing before being passed through the network, which extracts high-level features to classify slices as tampered or untampered. The dataset includes authentic and manipulated slices (false malignant and false benign), trained using an 80/20 split with data augmentation. Binary cross-entropy loss and an adaptive optimizer ensured stable convergence, while mixed precision enabled efficient training. Key challenges included limited dataset size, overfitting, subtle GAN-based manipulations, and hardware constraints.
These were addressed using transfer learning, augmentation, dropout, early stopping, and efficient pretrained architectures. The model achieved a test accuracy of 91.57%, with a precision of 0.97 and recall of 0.86 for the tampered class, and precision of 0.87 and recall of 0.97 for the untampered class, resulting in an overall F1-score of approximately 0.92. Despite strong performance, deployment may face challenges such as generalization across hospitals, data privacy, computational cost, evolving GAN techniques, and regulatory approval. Overall, this approach provides a scalable and effective solution for automated medical image authenticity verification, with future scope for incorporating inter-slice context and full volumetric analysis.
Vitamin deficiencies are frequently neglected because early symptoms are easy to overlook, and traditional blood tests require significant time, cost, and effort. To overcome this diagnostic barrier, our project introduces a non-invasive computer vision pre-screening tool that analyzes visible dermatological symptoms. This solution democratizes early-stage healthcare, allowing individuals to quickly check their health and seek rapid medical intervention before long-term complications arise. Given a limited dataset of 6,843 images, training a deep neural network from scratch causes overfitting.
Therefore, we used transfer learning with a pre-trained ResNet18 Convolutional Neural Network. We froze the feature-extraction layers and modified the final fully connected layer to output exactly 11 distinct deficiency classes. We tackled dataset noise programmatically using automated exclusion scripts and applied data augmentation to improve generalization. Our baseline model achieved a test accuracy of 58.73%.
Crucially, the model successfully isolated unique visual features, achieving F1-scores above 0.94 for Vitamins B2, C, E, and K. Lower performance in composite classes highlighted the difficulty of high intra-class variance. This lightweight model can be deployed via a Plaksha student health portal for preliminary screening. Scaling up will require training on diverse demographic data to mitigate skin-tone bias.
This project builds an integrated machine learning framework for two linked financial risk tasks: predicting corporate credit ratings and detecting financial fraud. Credit ratings from agencies such as Standard & Poor's and Moody's are expensive, infrequently updated, and unavailable for many firms, while fraudulent financial reporting often remains hidden for years. The framework uses public financial statements, macroeconomic indicators, SEC EDGAR filings, AAER labels, accrual patterns, earnings-manipulation signals, the Beneish M-score, and a derived rating gap to support credit analysis, audit prioritization, and investment screening.
The credit rating module predicts a firm's rating on a 22-point ordered scale using Ridge Regression, Random Forest, XGBoost, and Ordered Logistic Regression. The fraud detection module uses Isolation Forest, supervised XGBoost, Positive-Unlabeled Learning, and an ensemble meta-learner trained on 2,344 firm-year observations across 401 firms, including 86 confirmed fraud firm-years. Data leakage was corrected by ensuring that imputation, normalization, model fitting, and PU sampling used only training data, while class imbalance was handled through weighted learning and PU-style treatment of unlabeled firms.
The Random Forest rating model achieved an RMSE of 3.00 and MAE of 2.23, while Ordered Logistic Regression reached 41.5% within-one-notch accuracy and 60.9% within-two-notch accuracy. For fraud detection, PU Learning achieved an AUC-ROC of 0.76 and Average Precision of 0.38 after leakage correction, with precision of 0.25, recall of 0.81, and F1 of 0.38 at the selected threshold. The system can be deployed as a quarterly EDGAR-based dashboard at Plaksha or similar institutions, though scaling will require careful handling of EDGAR rate limits, incomplete XBRL coverage, historical-label bias, and the computational cost of PU Learning.
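The within-n-notch accuracy reported above can be computed as follows (a sketch; `within_notch_accuracy` is our name for it, with ratings as integers on the 22-point scale):

```python
def within_notch_accuracy(pred, true, n=1):
    """Share of predictions within n notches of the true ordinal rating."""
    hits = sum(abs(p - t) <= n for p, t in zip(pred, true))
    return hits / len(true)
```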
In Formula 1, raw telemetry records actual speed but lacks a theoretical "ideal speed" baseline, making it difficult to quantify intra-lap performance loss caused by aerodynamic "dirty air" and tire degradation. We address this by developing an interactive diagnostic tool that isolates vehicle limitations from driver errors, benefiting race strategists and junior motorsport programs. Methodologically, we interpolated 2024 Bahrain GP telemetry onto a 1-meter spatial grid.
We trained four distinct models to predict ideal speeds: a Transformer for long-range spatial context, a Bi-LSTM for bidirectional temporal dependencies, XGBoost for handling complex non-linearities, and Random Forest as a robust ensemble baseline. Key challenges included severe data leakage and CuDNN backend crashes during SHAP (SHapley Additive exPlanations) gradient calculations. We resolved these by strictly isolating aerodynamic features and offloading SHAP inference to the CPU.
The Bi-LSTM emerged as the optimal model, achieving a Mean Absolute Error (MAE) of 14.21 km/h and an R² of 0.8568, matching established telemetry sequence benchmarks. Combined with SHAP, it successfully pinpointed a 116.33 km/h speed loss anomaly. Future scaling challenges include processing live telemetry streams without latency bottlenecks and incorporating 3D track elevation data.
Stroke remains one of the leading causes of death and long-term disability worldwide, with approximately 11.9 million new cases annually. Current diagnostic methods such as CT and MRI scans are expensive, time-intensive, and often inaccessible in emergency or resource-limited settings, creating a critical gap in early detection. This project addresses three interconnected challenges: the reliance on single EEG metrics like the Brain Symmetry Index that fail to capture complex multi-electrode spatial patterns; the inconsistency and lack of standardisation across EEG-based stroke studies that limits generalization; and the predominant focus on temporal EEG models that overlooks the spatial distribution of brain activity.
To address these limitations, we develop a dual-pipeline classification system using EEG data from stroke patients and healthy controls. Classical machine learning models including SVM, Random Forest, and Logistic Regression are trained on frequency band power features to establish a performance baseline. A CNN-based pipeline then converts EEG signals into topographic scalp maps, enabling automated learning of hemispheric asymmetry and spatial abnormalities.
Dataset heterogeneity is handled through signal resampling, electrode alignment, and a standardised preprocessing pipeline comprising band-pass filtering and artifact removal. The proposed approach demonstrates strong performance, achieving an accuracy of 0.9983, F1-score of 0.9983, and recall of 0.9983, indicating highly reliable classification capability. All models are evaluated using accuracy, precision, recall, and confusion matrices, working toward a portable, automated, and clinically interpretable stroke screening tool.
This project focuses on developing an early warning system to detect mental health crises using longitudinal Reddit data. The main problem we address is the lack of timely identification of individuals at risk before critical events, such as posting in suicidal forums. Early detection is important because it allows preventive action and improves access to support. This system can be applied to online platforms for automated support alerts, mental health monitoring tools, and institutional well-being systems.
Our methodology follows a two-stage machine learning pipeline. First, we use a user-level XGBoost model to check whether behavioral and linguistic features contain predictive signals. Then, we build a temporal window-based model for early detection using transformer-based embeddings, linguistic features, and behavioral patterns. XGBoost is chosen because it handles mixed features well and manages class imbalance using weighted learning.
Major challenges included extreme class imbalance (~2.6% positive cases), noisy social media data, and computational limitations. We addressed these using class weighting, preprocessing, and efficient feature engineering. The model achieves an AUROC of ~0.88 and detects about 71% of pre-crisis cases. Although precision is lower (~11.8%), we prioritize recall since missing a crisis is more critical. We will create a dedicated Reddit forum page for Plaksha and deploy the system there to identify students who may require immediate help or support, while carefully managing challenges such as privacy, false positives, and scalability.
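Weighted learning for the ~2.6% positive rate can be set up by scaling the positive class, for example via the ratio XGBoost exposes as `scale_pos_weight` (a sketch under that assumption):

```python
def positive_class_weight(labels):
    """Weight for the rare positive class (pre-crisis cases), so that
    missed crises cost more: XGBoost-style n_negative / n_positive."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return negatives / positives
```

For a ~2.6% positive rate this yields a weight near 37, pushing the model toward the high-recall operating point the project prioritizes.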
This work addresses the problem of identifying and explaining smartphone addiction levels among college students, where lack of interpretability often limits actionable insights. We model addiction as a continuous score predicted using a shallow neural network, followed by classification into low, medium, and high addiction using predefined thresholds.
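The score-to-category step might look like the following; the 0.33/0.66 cutoffs are placeholders, since the project's predefined thresholds are not stated here:

```python
def addiction_category(score: float, low_cut: float = 0.33, high_cut: float = 0.66) -> str:
    """Bucket a continuous addiction score into low/medium/high.

    The cutoff values are illustrative, not the project's actual thresholds.
    """
    if score < low_cut:
        return "low"
    if score < high_cut:
        return "medium"
    return "high"
```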
To ensure interpretability, GradientSHAP is applied to extract feature-level contributions, enabling identification of the top three factors influencing addiction for each student, as well as global behavioral trends. The model is trained on a longitudinal dataset of over 200 students spanning four years, with class imbalance observed in higher addiction categories. A shallow neural network is preferred over regression to capture non-linear relationships between behavioral features and addiction patterns.
Feature selection is performed using Leave-One-Subject-Out cross-validation, retaining features with non-zero coefficients and selecting the top contributors to reduce noise and dimensionality. A key challenge was high feature dimensionality and domain relevance, addressed through systematic feature engineering and ranking. The system can be deployed within academic institutions to support student monitoring and enable targeted interventions based on personalized behavioral insights.
Modern video games typically rely on static difficulty systems that fail to adapt to a player’s emotional state, often leading to boredom or frustration. This project addresses the problem of dynamically adjusting gameplay by predicting a player’s affective state using behavioral telemetry such as aiming accuracy, mouse movement patterns, and interaction intensity.
Machine learning models – including Random Forest, Support Vector Machine, Logistic Regression, and XGBoost – were trained to classify player states such as Flow, Boredom, Enjoyment, and Negative High Arousal. The system then uses these predictions to modulate game difficulty in real time, aiming to maintain optimal engagement. The study demonstrates that gameplay behavior contains meaningful signals for inferring emotional states, though challenges such as overlapping behavioral patterns and noisy self-reported labels limit classification accuracy.
Techniques like feature analysis, label refinement, and SHAP-based interpretability were used to improve understanding of model behavior. The resulting system provides a foundation for adaptive gaming experiences, with potential applications in personalized entertainment, education, and human-computer interaction, ultimately contributing to more engaging and responsive interactive systems. Player affective states can be inferred from gameplay telemetry using machine learning, and these predictions can be used to create adaptive gaming experiences.
The project focuses on predicting Global Horizontal Irradiance (GHI) using pollution, weather, and solar geometry data, which is especially important in India where air pollution significantly affects how much sunlight actually reaches the ground. Accurate GHI prediction can help in better planning and optimization of solar energy systems, from choosing installation sites to forecasting energy generation. The solution combines CPCB pollution data, ERA5 weather data, and computed geometry features like solar zenith and azimuth, all aligned hourly using station location and timestamps.
A supervised learning approach was used, with XGBoost as the main model because it handles structured data well and captures complex non-linear relationships between environmental factors and solar radiation. One of the main challenges was dealing with large-scale data and ensuring proper alignment across different sources, especially with missing values and time inconsistencies, which were handled through preprocessing and resampling. The model was evaluated using RMSE, MAE, and R², and showed strong agreement between predicted and actual values, indicating good performance. This solution can be deployed for applications like solar planning and energy forecasting, even at a campus level like Plaksha, although scaling it further would require handling real-time data pipelines and computational efficiency.
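As a sketch of the solar geometry features, the zenith angle can be approximated from latitude, day of year, and local solar time with the standard declination and hour-angle formulas (a production pipeline would likely use a dedicated library such as pvlib; the constants below are the usual textbook approximations):

```python
import math

def solar_zenith_deg(lat_deg, day_of_year, solar_hour):
    """Approximate solar zenith angle in degrees.
    Declination via the cosine approximation; hour angle from solar noon."""
    decl = -23.44 * math.cos(math.radians(360.0 / 365.0 * (day_of_year + 10)))
    hour_angle = 15.0 * (solar_hour - 12.0)   # degrees from solar noon
    lat, dec, ha = map(math.radians, (lat_deg, decl, hour_angle))
    cos_zen = (math.sin(lat) * math.sin(dec)
               + math.cos(lat) * math.cos(dec) * math.cos(ha))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_zen))))
```

At the equator near the March equinox the sun is almost overhead at solar noon, and the zenith angle grows toward morning and evening, which is the geometric signal the GHI model consumes.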
In real-world performances, what matters isn’t just the emotion, but whether the intensity matches the intent behind the script. To address this, we built Expression Intensity Alignment with Dialogue Context. Going beyond basic emotion detection, our system evaluates the expected emotional intensity of a script against an actor's actual physical and vocal performance. It enables objective, data-driven feedback for rehearsals and self-tapes, helping actors align delivery with script intent.
Because human expression relies on complex cues across text, audio, and video, we implemented a Late Fusion architecture. We used BiLSTMs to track behavioral flow over time and Temporal Attention to highlight critical frames. Training was challenging as we hit severe mathematical failures like mode collapse and NaN saturation, alongside missing continuous labels. We overcame this by adjusting activation functions, restructuring how we combined the data and using knowledge distillation from RoBERTa and EmoBERTa ensemble onto our offline models.
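The temporal attention step, softmax-weighted pooling over per-frame features, can be sketched in plain Python; the frame features and attention scores below are hypothetical, and the real model computes scores with a learned layer:

```python
import math

def temporal_attention_pool(frame_features, scores):
    """Softmax the per-frame attention scores, then return the
    attention-weighted sum of the frame feature vectors, so that
    critical frames dominate the pooled representation."""
    exps = [math.exp(s - max(scores)) for s in scores]   # stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(frame_features[0])
    return [sum(w * f[i] for w, f in zip(weights, frame_features))
            for i in range(dim)]
```

With uniform scores this reduces to mean pooling; a high score on one frame pulls the summary vector toward that frame, which is how the model highlights critical moments.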
The resulting Tri-Modal ensemble proved highly reliable, achieving a Pearson Correlation (R) of 0.6462 and a Validation MAE of 0.13. We've deployed the solution at Plaksha as an offline Gradio dashboard, and the university drama club already uses it to refine performances successfully. A live demo is available at: https://huggingface.co/spaces/abhi-s/Multimodal_Emotion. Future scaling will require handling variable microphone quality, diverse lighting, and expressions outside our CMU-MOSEI dataset.
Predicting the next sequence of piano notes from raw piano audio remains a challenging problem due to simultaneous chords, sustain pedal effects, and complex acoustic overlaps. This project addresses the gap between frame-based CNN models (strong on timing, weak on musicality) and sequence-based Transformer models (strong on context, struggling with polyphonic overlaps and hallucinations). We developed a hybrid LSTM-based architecture trained on the MAESTRO dataset (1,184 performances, 200+ hours) to predict piano notes with both temporal precision and musical coherence. We implemented and compared three recurrent architectures: Vanilla RNN (1.4M parameters), GRU (4.1M parameters), and LSTM (5.5M parameters), alongside Random Forest and SVM baselines using sliding-window feature extraction.
The LSTM model achieved 94.91% test accuracy and 0.1082 validation loss, outperforming simpler approaches on long-range musical sequences while maintaining frame-level precision. Key challenges tackled included handling note offset ambiguity (acoustic decay after key release), preventing hallucination in polyphonic chords, and addressing class imbalance (19:1 silence-to-note ratio) through weighted loss functions. The solution is deployable at Plaksha as a real-time piano transcription tool in music education software, with scalability challenges including inference latency on long performances and memory constraints for songs exceeding 15 minutes.
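One simple form of the weighted loss for the 19:1 silence-to-note imbalance is binary cross-entropy with an up-weighted positive (note-on) class; the sketch below assumes `pos_weight` equal to the inverse class ratio, which may differ from the exact weighting used in training:

```python
import math

def weighted_bce(y_true, y_prob, pos_weight=19.0):
    """Binary cross-entropy where missing a note (positive class)
    costs pos_weight times more than a false alarm on silence."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)        # clamp to avoid log(0)
        total += -(pos_weight * y * math.log(p)
                   + (1 - y) * math.log(1 - p))
    return total / len(y_true)
```

Under this weighting, a badly missed note incurs roughly nineteen times the penalty of an equally confident false positive on a silent frame, counteracting the class skew.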
Flight delays are a massive headache for airlines and passengers alike. Our project tackles this by predicting if a delayed flight can make up time in the air, and exactly by how much. If solved, this allows for dynamic gate scheduling and smoother ground operations, ultimately reducing cascading airport delays.
We designed a "Two-Brains" XGBoost pipeline: a classifier acts as a gatekeeper to predict if a flight will recover, passing promising flights to a regressor to predict the exact minutes. We chose XGBoost for its power with complex physical data. Our biggest learning curve was overcoming data leakage: we accidentally let the model "cheat" using post-flight data. We fixed this by engineering realistic, pre-departure features like "Historical Median Flight Times." Because this is an ongoing project, our metrics are a work in progress.
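The gatekeeper-then-regressor flow can be sketched as below; the `clf` and `reg` stand-ins are hypothetical callables for illustration, not our trained XGBoost models:

```python
def predict_recovery(features, classifier, regressor, threshold=0.5):
    """Two-stage 'Two-Brains' sketch: the classifier decides whether any
    recovery is likely; only then does the regressor estimate minutes."""
    if classifier(features) < threshold:
        return 0.0                      # gatekeeper: no meaningful recovery
    return regressor(features)

# Hypothetical stand-ins for the trained models:
clf = lambda x: 0.9 if x["delay_min"] < 60 else 0.2   # P(recovery)
reg = lambda x: 0.3 * x["delay_min"]                  # minutes recovered
```

Routing through the classifier first keeps the regressor from being trained and scored on hopeless cases, which is the main design motivation for the split.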
Currently, our classifier achieves a 71.3% accuracy (with an excellent 98% recall capturing almost all recoveries), while our regressor sits at an 11.2-minute Mean Absolute Error. We plan to continuously tune and improve these baseline numbers. While commercial deployment is complex, Plaksha could deploy this exact predictive architecture to optimise campus shuttle routing. Scaling up will require integrating live weather APIs and handling real-time data streams without latency.
Dhemaji district in Assam, India, is among the most flood-prone regions in the country, experiencing severe seasonal floods due to Brahmaputra river overflow during monsoon months. Timely and accurate flood prediction at a granular spatial scale remains a critical challenge for disaster preparedness and response. This project develops a machine learning-based geospatial flood prediction system for Dhemaji district using a 500m × 500m grid framework, integrating multi-source satellite and environmental data.
We used Sentinel-1 SAR-derived flood labels as ground truth, which provided dynamic daily flood extent maps across six monsoon seasons from 2019 to 2024, overcoming the cloud cover limitations of optical satellite imagery. Features were derived from multiple sources including ERA5 land surface runoff, HydroSHEDS satellite-derived river network data, SRTM elevation and slope, and MODIS tree cover. A key contribution of this work was the derivation of distance to major rivers as a feature, computed from HydroSHEDS river order data, which emerged as the strongest predictor of flood occurrence.
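The distance-to-river feature reduces to a nearest-point great-circle distance per grid cell; a minimal sketch with hypothetical coordinates (the actual computation runs against the HydroSHEDS river network):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    r = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def distance_to_major_river(cell, river_points):
    """Minimum distance from a grid-cell centroid to any sampled point
    on the major-river network."""
    return min(haversine_km(cell[0], cell[1], p[0], p[1])
               for p in river_points)
```

Each 500m × 500m cell gets one such scalar, so proximity to the Brahmaputra and its tributaries enters the models directly.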
Eleven machine learning models were evaluated, including Random Forest, Gradient Boosting, XGBoost, LightGBM, CatBoost, Neural Networks, and others. Gradient Boosting achieved the best overall performance with a precision of 0.845, a recall of 0.821, and an F1 score of 0.833. The final model was trained on 1.09 million spatial-temporal observations and tested on 165,337 unseen observations from the 2024 monsoon season. The system can be deployed as a district-level flood risk assessment tool to support early warning systems, infrastructure planning, and emergency resource allocation in Brahmaputra floodplain regions.
This project addresses the problem of framing bias, where information is not falsified but subtly shaped through language, tone, and emphasis, influencing public perception. Traditional approaches treat bias as a binary classification (biased vs. unbiased), which fails to capture its nuanced and continuous nature. To overcome this limitation, we propose a system that predicts a bias severity score (1–10), enabling a more granular understanding of linguistic framing.
The solution has applications in editorial auditing, media literacy tools, and algorithmic transparency systems, helping platforms and readers identify and mitigate subtle bias, ultimately promoting more transparent and responsible journalism. Our methodology involves constructing a structured machine learning pipeline starting with an expert-annotated dataset, followed by data cleaning and feature extraction. We engineered both text-based features and handcrafted linguistic features, including bias indicators and subjectivity patterns.
Additionally, text embeddings were used to capture contextual meaning and analyze word importance. These features were combined into a unified representation and used to train models such as Random Forest and XGBoost for both classification and regression tasks. Performance was evaluated using MAE and RMSE, demonstrating strong alignment with expert judgments. The solution is deployable as a real-time analysis tool or browser extension, though challenges such as domain generalization and scalability may arise during large-scale deployment.
Infant health monitoring remains largely reactive, with caregivers unable to identify health issues until symptoms become visible. Research suggests physiological changes often manifest as shifts in cry acoustics hours or days before this point. ICLAS has direct applications in early infant health screening and smart baby monitor software, with the potential to reduce the window between physiological change and caregiver response across thousands of homes worldwide. By surfacing early-warning signals before symptoms appear, ICLAS has the potential to meaningfully improve infant health outcomes in home settings where continuous clinical monitoring is unavailable. ICLAS uses a fine-tuned ECAPA-TDNN backbone to build per-infant acoustic profiles from which longitudinal drift is measured.
A gated continual updating mechanism inspired by pyCLAD ensures illness-related changes are flagged rather than absorbed into the baseline. Each cry is encoded into a 192-dimensional embedding representing acoustic identity. An autoencoder anomaly detection layer then scores each new cry against the infant's personal baseline. Key challenges included catastrophic forgetting, microphone variability across consumer devices, poor performance in noisy environments, and false positives before a stable baseline is established. The backbone achieved an EER of 21.38%, approximately 4% better than the CryCeleb 2023 benchmark.
The gating classifier achieved 98.57% accuracy distinguishing infant cries from non-cry sounds including snoring, pink, white, and brown noise. The EER improvement confirms domain adaptation produces more discriminative per-infant embeddings. The gating accuracy confirms the system reliably filters which audio is eligible for baseline updates. ICLAS is designed for integration into existing smart baby cameras as a lightweight software layer requiring no additional hardware. Deployment at Plaksha is not currently feasible given the absence of infants on campus. Scaling challenges include edge hardware constraints for online learning, microphone variability across manufacturers, and the need for extended longitudinal data to validate clinical utility at scale.
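The gating-plus-drift logic can be sketched as follows; the thresholds, score scale, and exponential update rate are hypothetical illustrations of the mechanism, not the deployed values:

```python
def gated_update(baseline_mean, new_score, is_cry_prob,
                 gate_threshold=0.9, drift_threshold=2.0, alpha=0.05):
    """Gated continual update: a new sample refreshes the per-infant
    baseline only if the gate is confident it is a cry AND the anomaly
    score is not itself an alarm, so illness-related drift is flagged
    rather than absorbed."""
    if is_cry_prob < gate_threshold:
        return baseline_mean, "ignored"      # non-cry audio: never update
    if new_score > drift_threshold:
        return baseline_mean, "alarm"        # flagged, baseline untouched
    updated = (1 - alpha) * baseline_mean + alpha * new_score
    return updated, "updated"
```

Keeping alarming cries out of the baseline is what prevents the model from slowly "learning" an illness as the new normal.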
Initial Public Offerings (IPOs) often exhibit significant price volatility on the listing day, making investment decisions highly uncertain and speculative. Accurate prediction of IPO listing-day returns can improve investment efficiency, reduce information asymmetry, and enhance risk management in primary markets. This study aims to develop a machine learning–based predictive framework for forecasting IPO listing-day returns by integrating company fundamentals, subscription demand patterns, and market sentiment indicators.
The dataset consists of IPOs issued during the study period, with explanatory variables capturing firm-level financial characteristics, investor subscription behavior across different categories, and prevailing market sentiment indicators such as grey market premium and overall market conditions. Several machine learning models, including regression-based and tree-based algorithms, are employed to model the nonlinear relationships between these variables and IPO listing-day returns. Model performance is evaluated using standard metrics such as Mean Absolute Error, Root Mean Squared Error, and out-of-sample prediction accuracy.
The findings demonstrate that machine learning models outperform traditional linear approaches in capturing complex interactions between fundamentals and market sentiment. Subscription demand and sentiment indicators emerge as strong predictors of listing-day performance, highlighting the behavioral aspects of IPO pricing. This research contributes to the literature by illustrating the effectiveness of machine learning techniques in financial prediction and offers practical insights for investors, issuers, and policymakers seeking to improve decision-making and pricing efficiency in IPO markets.
There exist many archives with millions of images where the year timestamp is missing or only roughly estimated, so our image-year prediction model can help infer a more precise time period. It can be used to organize and verify large collections like the Library of Congress, detect inconsistencies in news or online media, and also be offered as a service to help individuals date old family photographs more accurately. We’ve implemented a pipeline that combines numerical features with high-level semantic representations. Specifically, we make use of CLIP embeddings to capture rich visual semantics, which are then used to train multiple models.
These include Random Forest, Linear SVM, an SGD classifier, and Logistic Regression, allowing us to compare performance and determine the most effective approach for our task. Our dataset consists of approximately one million images spanning seven decades, each labeled with its actual year of capture. Performance is evaluated primarily using mean absolute error (MAE), alongside classification accuracy.
We consider a prediction accurate if the predicted year is within ±5 years of the actual year. Currently, our models achieve an accuracy exceeding 60%, representing a significant improvement over existing approaches in the literature. These results suggest that the combination of semantic embeddings and traditional machine learning models is quite effective. The pipeline can be deployed at institutions like Plaksha by integrating it with digital archiving systems.
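The accuracy criterion can be stated precisely as a small metric function (years below are hypothetical examples):

```python
def within_tolerance_accuracy(predicted, actual, tolerance=5):
    """Fraction of predictions within ±tolerance years of the true year,
    the accuracy definition used alongside MAE."""
    hits = sum(1 for p, a in zip(predicted, actual)
               if abs(p - a) <= tolerance)
    return hits / len(actual)
```

For example, predictions of 1950, 1970, and 1999 against true years 1953, 1980, and 2000 score two hits out of three under the ±5-year rule.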
Standard helmet detection systems are usually trained on Western datasets, which makes them unreliable in Indian traffic environments where cultural headwear and regional garments are common. This creates false positives when Sikh riders wearing legally exempt pagdis are flagged as violations, and false negatives when bulky fabrics such as hijabs, dupattas, or burqas are misclassified as helmets. This project develops a culturally aware three-class detection pipeline for Helmet, Pagdi, and No-Helmet cases, with the goal of making automated enforcement more fair and reducing manual verification work for traffic authorities.
The solution frames the task as a three-class object detection problem and uses YOLOv8 because it supports real-time embedded deployment and uses Distribution Focal Loss for more precise bounding-box learning. Instead of treating box edges as rigid lines, the model represents them as probability distributions, which helps when textile boundaries are blurry or visually ambiguous. Region-specific traffic data, zero-shot auto-annotation with Grounding DINO, and LLaVA-assisted label curation are used to build a more context-aware dataset.
The main challenges were limited region-specific data, ambiguous annotations for cultural garments, and false positives under noisy traffic-camera conditions. These were addressed through targeted data sourcing, separate preprocessing for heterogeneous datasets, and confidence-thresholding logic at inference time. The resulting system is intended for scalable, fairer traffic enforcement in Indian road settings.
Modern computing systems generate continuous log streams that often contain subtle early indicators of impending failures. Without automated monitoring, these go undetected until costly outages occur. This project builds a binary classifier that predicts system failures from log event sequences before they escalate. Potential applications include data center monitoring, industrial IoT maintenance, and campus IT infrastructure such as Plaksha's servers.
The impact is significant: reduced downtime, lower maintenance costs, and a shift from reactive to proactive operations. A Bidirectional LSTM was selected because system logs carry temporal, ordered dependencies that standard classifiers cannot capture. The BiLSTM processes sequences in both directions through an embedding layer, bidirectional LSTM layers, and global max pooling to retain full-sequence context. Key challenges included severe class imbalance (failures are rare events), tackled using Focal Loss combined with weighted cross-entropy, and dynamic threshold tuning on validation data to maximize F1-score rather than defaulting to a fixed 0.5 cutoff.
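A minimal sketch of the α-balanced focal loss term is below; the γ and α values are typical defaults, not necessarily the ones used in training, and the production loss combines this with weighted cross-entropy:

```python
import math

def focal_loss(y_true, y_prob, gamma=2.0, alpha=0.75):
    """Binary focal loss: the (1 - p_t)^gamma factor down-weights easy,
    well-classified examples so rare failure events dominate training."""
    eps = 1e-12
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        pt = p if y == 1 else 1 - p          # probability of the true class
        weight = alpha if y == 1 else 1 - alpha
        total += -weight * (1 - pt) ** gamma * math.log(pt)
    return total / len(y_true)
```

A failure predicted confidently (p = 0.95) contributes almost nothing, while a missed failure (p = 0.5 or below) keeps a large gradient, which is exactly the behavior needed under severe imbalance.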
Early stopping and a learning rate scheduler prevented overfitting. The model achieved an F1-score of 0.8344, recall of 0.87, and precision of 0.79. High recall ensures most failures are caught; the F1-score confirms the balance between false alarms and missed detections is strong. At Plaksha, this pipeline could ingest live server logs via a streaming service and trigger maintenance alerts automatically. Scaling challenges include concept drift as log patterns evolve over time, retraining costs as new failure types emerge, and real-time latency requirements under high log volume.
Financial inclusion is often measured by account ownership, yet a substantial gap persists between access and actual usage of digital financial services. This project addresses the problem of “digital financial inactivity” among account holders, aiming to identify and rank the key barriers preventing active participation. Solving this problem has direct applications in policymaking, fintech design, and targeted interventions to improve financial engagement, particularly in developing economies.
The potential impact includes more effective inclusion strategies, reduced dormant accounts, and improved economic resilience. Using the 2025 Global Findex dataset (144,000 observations, 199 features), we restrict analysis to account holders in emerging economies and define inactivity as the target variable. After extensive preprocessing, including removing leakage, handling 65% missingness (feature dropping and median imputation), and addressing collinearity, we apply Logistic Regression as a baseline and XGBoost to capture non-linear patterns.
Models are validated using Leave-One-Country-Out Cross-Validation (LOCO) to ensure cross-country generalizability, with SHAP used for interpretability and barrier ranking. Key challenges included high missingness, feature leakage, and heterogeneity across countries, addressed through robust filtering and validation strategies. The model achieves a ROC-AUC of 0.82, indicating strong discriminatory power, supported by Precision-Recall performance. The solution is deployable at Plaksha as a policy analytics tool, though scaling may face challenges such as data availability, regional biases, and evolving digital behaviors.
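LOCO is grouped cross-validation with country as the group; a minimal sketch with hypothetical country codes (scikit-learn's `LeaveOneGroupOut` provides the same splits):

```python
def loco_splits(countries):
    """Leave-One-Country-Out splits: each unique country is held out once
    as the test fold while all other rows form the training fold."""
    splits = []
    for held_out in sorted(set(countries)):
        train_idx = [i for i, c in enumerate(countries) if c != held_out]
        test_idx = [i for i, c in enumerate(countries) if c == held_out]
        splits.append((held_out, train_idx, test_idx))
    return splits
```

Because no rows from the held-out country ever appear in training, the reported ROC-AUC reflects transfer to an entirely unseen country rather than within-country fit.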
Glioma, an aggressive primary brain tumor, presents significant clinical challenges due to its heterogeneous progression patterns and complex longitudinal behavior. Predicting post-treatment progression risk remains difficult, as traditional approaches fail to capture the spatiotemporal dynamics embedded across sequential MRI scans.
This project develops a deep learning survival analysis framework that integrates longitudinal multi-modal MRI data with clinical features to predict individualized progression risk for glioma patients using the MU-Glioma-Post dataset. The methodology combines a 3D Convolutional Neural Network (CNN) encoder, extracting tumor-centred spatial features from eight-channel MRI patches (T1, T1-contrast, T2, FLAIR, and four segmentation mask channels), with a Long Short-Term Memory (LSTM) network that aggregates features across up to six sequential timepoints. Risk scores are optimized using Cox Proportional Hazards loss, naturally handling censored patients who did not experience progression during follow-up.
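The Cox objective can be sketched in plain Python; this is the standard negative partial log-likelihood over event patients (with illustrative data), not the project's exact batched-tensor training code:

```python
import math

def cox_ph_loss(risk_scores, times, events):
    """Negative Cox partial log-likelihood. Each progression event
    contributes the log-softmax of its risk score over the at-risk set
    (patients still followed at that event time); censored patients
    appear only inside risk sets, which is how censoring is handled."""
    loss, n_events = 0.0, 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue                      # censored: no direct term
        at_risk = [r for r, t in zip(risk_scores, times) if t >= t_i]
        log_denom = math.log(sum(math.exp(r) for r in at_risk))
        loss -= risk_scores[i] - log_denom
        n_events += 1
    return loss / max(n_events, 1)
```

Assigning the highest risk score to the patient who progresses earliest lowers the loss, which is exactly the ranking behavior the C-index later measures.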
Clinical variables including genomic markers, treatment history, and demographic features were preprocessed and dimensionality-reduced via PCA before integration. The model is evaluated using Concordance Index (C-index), time-dependent AUC, and Integrated Brier Score across five-fold cross-validation with a held-out test set. A well-performing solution could be deployed at Plaksha as a clinical decision-support tool, helping oncologists stratify patients into risk groups and personalize treatment planning, though scaling challenges include data privacy, compute infrastructure, and prospective validation requirements.
Alzheimer's Disease (AD) is a progressive neurodegenerative disorder characterized by gradual cognitive decline, affecting millions worldwide. Early and accurate detection remains a critical clinical challenge, as structural changes visible on MRI often appear only after significant neurodegeneration has already occurred. Resting-state functional MRI (fMRI) offers a promising alternative by capturing functional connectivity (FC) patterns which signify the synchronization of neural activity between brain regions; these are sensitive to disease-related disruption before structural damage becomes apparent. Previous studies have demonstrated that brain-predicted age, derived from whole-brain FC matrices, is significantly elevated in symptomatic AD compared to cognitively normal controls, establishing the Brain Age Gap (BAG) as a sensitive neuroimaging biomarker.
We propose a pipeline that decomposes the FC matrix extracted from resting-state fMRIs into three anatomically motivated components: left intra-hemispheric, right intra-hemispheric, and inter-hemispheric connectivity. We then train three separate brain age regression models on the respective components using resting-state fMRI data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Our hypothesis is that inter-hemispheric disconnection, reflecting degeneration of corpus callosum pathways, is a particularly sensitive driver of accelerated brain aging in AD compared to within-hemisphere connectivity changes. Our methodology applies PCA for dimensionality reduction of high-dimensional FC feature vectors, followed by SVR trained exclusively on cognitively normal (CN) subjects to establish a healthy aging baseline. After training, the models are applied to CN and AD subjects to predict brain age and then compute per-subject Brain Age Gap (BAG) scores for each hemispheric component.
Group differences in BAG are assessed using t-tests and ANOVA, and BAG scores are also correlated with Mini-Mental State Examination (MMSE) scores to validate clinical relevance. An SVM classifier then uses the hemispheric BAG scores as features to discriminate between diagnostic groups, evaluated using confusion matrices, ROC curves, and F1 scores under stratified cross-validation. Our results identify which hemispheric connectivity component contributes most to brain age acceleration in AD, offering a biologically interpretable decomposition of the brain age signal that extends beyond prior whole-brain approaches. The pipeline culminates in a per-subject report card combining predicted brain age, the hemispheric BAG breakdown, cognitive score correlation, and diagnostic classification. Ultimately, our project aims to distill complex fMRI data into interpretable diagnostic results that rely primarily on brain age as a biomarker.
Alzheimer's disease affects millions of people globally, with numbers projected to triple by 2050. Despite this scale, predicting how quickly a patient's condition will worsen remains difficult: clinical visits are infrequent, MRI data is high-dimensional and requires expert interpretation, and most patients are diagnosed only once cognitive decline is already well underway. Our project uses machine learning to forecast Alzheimer's progression, specifically predicting a patient's Clinical Dementia Rating (CDR) score over time using longitudinal MRI features and clinical data such as MMSE scores, brain volumes, and education levels. Prior approaches, including standard LSTMs, assume regular intervals between patient visits, a poor fit for real-world clinical data where gaps between appointments vary widely.
We trained a Time-Aware LSTM (T-LSTM) on the OASIS longitudinal dataset, a cohort of 150 patients. The T-LSTM incorporates a time-decay mechanism that discounts the influence of older visits when large gaps exist between them, making it better suited to irregular medical records. Three technical problems shaped most of our development. First, each visit produced 1,536 MRI features from a CNN feature extractor, far too many for 150 patients; we applied PCA to compress these to 32 dimensions and used aggressive dropout and L2 regularization. Second, the dataset skewed heavily toward healthy or mildly affected patients; we built a custom inverse-frequency weighted loss function that penalized errors on rare, severe cases more heavily.
Third, raw time gaps in days saturated activation functions, so we standardized them into years. We evaluated performance using MAE, RMSE, and a custom Time-Aware MAE. The final model achieved a Test MAE of 0.2356 and RMSE of 0.2962. Since the CDR scale runs from 0 to 3 in half-point increments, an MAE under 0.25 means the model's predictions are consistently within a quarter-point of a clinician's actual diagnosis. The Time-Aware MAE confirmed the model performs well specifically within the 1–2 year forecasting window most relevant for care planning.
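The decay mechanism can be sketched as follows; the 1/log(e + Δt) form is the heuristic proposed in the original T-LSTM paper, and the year conversion reflects the gap standardization described above (the exact in-model formulation may differ):

```python
import math

def time_decay(gap_days):
    """T-LSTM-style decay factor applied to short-term memory:
    larger gaps between visits (standardized to years so values
    stay in a well-behaved range) discount older information more."""
    gap_years = gap_days / 365.25
    return 1.0 / math.log(math.e + gap_years)
```

A visit one month after the previous one is weighted almost fully, while a two-year gap shrinks the carried-over short-term memory substantially.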
Traditional road safety analysis often focuses only on conflict events, ignoring exposure factors such as traffic volume. Without considering Annual Average Daily Traffic (AADT), we cannot accurately distinguish between a low-volume road with high crash risk and a high-volume road where the frequency of interaction magnifies danger. To address this, our project integrates crash data with AADT to derive a more representative safety risk index for intersections. This approach can support city authorities in prioritizing infrastructure investments, assist urban planners in safety simulations, and enable insurers to improve risk profiling.
For the classification task, we adopted a deep learning approach using a Convolutional Neural Network, specifically a pretrained ResNet model. The final fully connected layer was modified to suit our classification objective. Ground truth labels were generated by combining crash data with AADT values, merging two datasets to assign a safety score to each intersection. A key challenge was acquiring the dataset, requiring extensive outreach to institutions.
Additionally, integrating datasets was complex due to the absence of geolocation data in AADT, necessitating matching via street names. Image collection also required automated scripting. Model performance was evaluated using accuracy due to a balanced dataset, with current results at 60%. However, recall for the high-risk class is prioritized to minimize misclassification of unsafe intersections. A confusion matrix is also used to assess performance.
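The prioritized recall for the high-risk class falls straight out of the confusion matrix; a minimal sketch with hypothetical counts:

```python
def class_recall(confusion, cls):
    """Recall for one class from a confusion matrix laid out as
    confusion[true_label][predicted_label]: the fraction of truly
    high-risk intersections the model actually catches."""
    row = confusion[cls]
    return row[cls] / sum(row) if sum(row) else 0.0

# Hypothetical 2x2 matrix: rows = true (0 low-risk, 1 high-risk)
cm = [[50, 10],
      [20, 20]]
```

Here overall accuracy is 70%, yet high-risk recall is only 50%, which illustrates why recall on the unsafe class, not accuracy alone, is the metric to optimize.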
Object detection on Indian roads presents unique challenges that Western-validated datasets and preprocessing pipelines do not adequately address. This project systematically evaluates whether standard data-side techniques, including augmentation strategies and small object handling, actually improve detection performance when applied to the IDD-95k dataset, which reflects the real heterogeneity of Indian traffic: autorickshaws, two-wheelers, dense crowds, and irregular lane behavior across roughly 75,000 images spanning 11 object classes.
Three architectures are compared under controlled configurations: YOLOv11s and YOLOv26s (single-stage, COCO-pretrained) and Faster R-CNN with ResNet-50-FPN backbone (two-stage, COCO-pretrained). Each runs under a strict no-augmentation baseline and a full augmentation configuration including mosaic, mixup, HSV shifts, geometric shear with exact bounding box transforms, AutoContrast, and Gaussian noise. A third configuration adds multi-scale small object handling.
All evaluation uses pycocotools uniformly, computing mAP@0.5, mAP@0.5:0.95, AP by object size, per-class AP, and a custom COCO-style confusion matrix with background class. Validated performance improvements under augmentation and small object handling would confirm that these techniques generalize beyond Western benchmarks to Indian road conditions, providing a defensible empirical basis for their adoption in ADAS and traffic monitoring pipelines deployed in India. Deployment at Plaksha would face challenges including campus-specific distribution shift, night and monsoon degradation, and multi-camera inference cost at scale.
Rice leaf diseases significantly impact crop yield, with losses reaching up to 30% annually in South and Southeast Asia. Early-stage detection remains largely manual and dependent on expert knowledge, making it inaccessible for many farmers. This project aims to develop an automated image-based classification system to detect rice leaf diseases and estimate their severity. The solution can support precision agriculture, mobile-based diagnostics, and scalable crop monitoring systems, ultimately improving decision-making and reducing economic losses.
The proposed methodology uses a hybrid feature extraction approach. Deep features are obtained from a pretrained MobileNetV2 model, capturing high-level spatial patterns, while handcrafted features derived from GLCM (texture) and HSV statistics (color) provide domain-specific insights. Data augmentation techniques improve generalization. These features are concatenated, standardized, and used to train a Random Forest classifier with class balancing.
Additionally, an HSV-based masking method estimates disease severity across four stages. Challenges such as class imbalance and variability between dataset and real-world images were addressed using augmentation and balanced learning. The model achieved ~97% accuracy with strong precision, recall, and F1-scores, indicating reliable performance. Its lightweight design enables deployment as a mobile or web-based diagnostic tool at Plaksha. However, scalability challenges include sensitivity to lighting variations, device differences, and extension to additional disease classes.
Gender inequality in markets is often reduced to average wage gaps, which overlook deeper structural disparities in career progression and job quality. This project develops a machine learning framework to analyze gender parity across three dimensions: pay, seniority, and employment risk. Using the American Community Survey (ACS) dataset, we construct a job-level hierarchy from occupational codes and model the probability of individuals reaching senior roles using an XGBoost classifier.
To isolate the effect of gender, we implement a counterfactual approach that compares predicted outcomes for the same individual under different gender assignments while holding other characteristics constant. In addition to seniority, we design a composite risk score to capture precarious employment, incorporating factors such as low income, reduced working hours, informal employment, and industry instability. This enables a more nuanced understanding of economic vulnerability beyond traditional income measures.
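A minimal sketch of the counterfactual comparison, using a hand-coded logistic model with illustrative coefficients in place of the trained XGBoost classifier:

```python
import numpy as np

# Toy logistic model for P(senior role); coefficients are illustrative only.
# Features: [experience_years, weekly_hours, gender (1 = male, 0 = female)]
coefs = np.array([0.15, 0.04, 0.60])
bias = -4.0

def p_senior(x):
    return 1.0 / (1.0 + np.exp(-(x @ coefs + bias)))

x = np.array([10.0, 40.0, 0.0])  # a female employee
x_cf = x.copy()
x_cf[2] = 1.0                    # counterfactual: same person, gender flipped

# Gender effect implied by the model, all else held constant
gap = p_senior(x_cf) - p_senior(x)
```

With a fitted model, aggregating this gap over individuals within each sector gives the sector-specific disparity estimates described above.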
Our findings reveal that gender disparities are highly sector-specific. Sector-level insights from our model can help policymakers and women identify industries where gender disparities are most pronounced, enabling more targeted efforts toward achieving pay parity. By combining predictive modeling with explainability techniques, this framework provides actionable insights into where and how inequality manifests in the labor market. The results have implications for policymakers, organizations, and institutions seeking targeted interventions to improve gender equity in workforce participation and career advancement.
The challenge in exoplanet detection involves distinguishing true planetary transits from astrophysical false positives that mimic these signals. By utilizing the Kepler Object of Interest (KOI), TESS Object of Interest (TOI), and K2 datasets, we developed a classification system to compare physics-aware classical machine learning against purely data-driven models. Our methodology employed Random Forest and Histogram-based Gradient Boosting to process features anchored in four theoretical physical principles: transit depth consistency based on the planet-to-star area ratio, duration consistency assuming circular orbits, impact parameter squared, and thermal consistency derived from the Stefan-Boltzmann Law.
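Two of these physical anchors can be sketched directly; the helper names and the 0.3 albedo are illustrative assumptions, but an Earth-Sun sanity check recovers the familiar ~255 K equilibrium temperature:

```python
import numpy as np

R_SUN = 6.957e8    # m
R_EARTH = 6.371e6  # m
AU = 1.496e11      # m

def depth_consistency(observed_depth_ppm, r_planet_re, r_star_rs):
    """Observed transit depth vs. the geometric expectation (Rp/Rs)^2."""
    expected = (r_planet_re * R_EARTH / (r_star_rs * R_SUN)) ** 2 * 1e6  # ppm
    return observed_depth_ppm / expected

def equilibrium_temp(t_star, r_star_rs, a_au, albedo=0.3):
    """Stefan-Boltzmann equilibrium temperature for a circular orbit."""
    return t_star * np.sqrt(r_star_rs * R_SUN / (2 * a_au * AU)) \
        * (1 - albedo) ** 0.25

t_eq = equilibrium_temp(5778, 1.0, 1.0)       # Earth-like case
ratio = depth_consistency(84.0, 1.0, 1.0)     # Earth transit depth ~84 ppm
```

A depth-consistency ratio far from 1 signals a candidate whose depth disagrees with the reported planet and star radii, a common false-positive symptom.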
While physics-aware models reached an accuracy of 0.9550 and a PR-AUC of 0.9913 on the high-quality KOI dataset, performance varied across satellite missions. In highly noisy environments like K2, purely data-driven models were preferred because they could more effectively fit the complex systematic noise that often violates idealized physical assumptions. Conversely, on the TESS dataset, physics-aware features improved the precision-recall balance, suggesting that physical anchoring can outperform purely statistical models despite that mission's noise profile. This automated solution could be deployed at Plaksha to accelerate the discovery of new worlds, though scaling will require addressing the increased algorithmic complexity and the varying sensitivity of physical anchors across noisy data sources.
This project addresses the detection of property cliffs, where structurally similar molecules exhibit large differences in property values such as the HOMO-LUMO gap. Because traditional machine learning models assume a smooth structure-property relationship, they are unreliable at detecting cliffs. Accurate detection of molecular property cliffs matters for applications such as drug design and materials design, where unexpected molecular behavior leads to expensive experimental errors.
To tackle this challenge, we developed a graph neural network (GNN)-based pairwise classification approach, since molecules are naturally graph-structured. We used the QM9 database, converting molecules into graphs with RDKit and forming pairs of structurally similar molecules using Morgan fingerprints and Tanimoto similarity. The model uses graph convolutional layers (GINEConv) with message passing, followed by a multilayer perceptron for classification.
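The pairing step can be sketched with plain bit vectors; in the actual pipeline the fingerprints come from RDKit's Morgan generator, whereas here random arrays stand in and the 0.6 threshold is illustrative:

```python
import numpy as np

def tanimoto(a, b):
    """Tanimoto similarity between two binary fingerprints."""
    inter = np.sum(a & b)
    union = np.sum(a | b)
    return inter / union if union else 0.0

rng = np.random.default_rng(1)
fps = rng.integers(0, 2, size=(20, 128)).astype(bool)  # stand-in fingerprints
fps[1] = fps[0].copy()
fps[1, :8] = ~fps[1, :8]  # molecule 1 is a near-duplicate of molecule 0

# Keep only pairs of structurally similar molecules
pairs = [(i, j) for i in range(len(fps)) for j in range(i + 1, len(fps))
         if tanimoto(fps[i], fps[j]) >= 0.6]
```

Each retained pair is then labeled as cliff or non-cliff from the property difference and fed to the pairwise classifier.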
We used class weighting and threshold tuning to overcome class imbalance and the limited diversity of the dataset. Our model achieved a PR-AUC of 0.9397, ROC-AUC of 0.9921, an F1 score of 0.8626, a precision of 0.8071, and a recall of 0.9262, demonstrating effective detection of rare cliff cases. At scale, the main challenges will be generalization to unseen molecules and computational cost.
Retinal fundus images provide a non-invasive view of systemic vascular and brain health. While deep learning models can estimate Retinal Age, they often act as black boxes. This limits clinical trust, as it is unclear which vascular factors contribute to the Retinal Age Gap (RAG). We use RETFound, a Vision Transformer (ViT)-based model, to predict biological age from retinal images resized to 224×224.
To introduce interpretability, vessel segmentation is performed using CLAHE and Frangi filtering. From the extracted vascular network, biomarkers such as vessel density, length, tortuosity, branching patterns, and thickness are computed using Python Vessel Biomarker Measurement (PVBM). SHAP (SHapley Additive exPlanations) is then applied to quantify the contribution of each feature to the predicted age, enabling feature-level explanation. The model achieves a Mean Absolute Error (MAE) of ~6.99 years, indicating reasonable prediction accuracy.
The system can be deployed at Plaksha via a web-based interface integrated with fundus imaging devices for screening and research. Key challenges included variability in dataset quality, noisy vessel segmentation, and diffuse transformer attention maps that did not align with anatomical structures. These were addressed through preprocessing, parameter tuning, and adopting a feature-based explanation approach. Scaling challenges include handling diverse imaging conditions and optimizing the computational cost of SHAP.
Movie Success Prediction
Predicting a movie's commercial outcome before its release is a critical challenge for producers and investors seeking to mitigate financial risk and optimize resource allocation. This project develops a data-driven solution to forecast movie success (Hit, Average, or Flop) by analyzing key metadata, including genre, budget, and cast. Our methodology involved building custom datasets for Indian and Hollywood films via API scraping from TMDB and Wikipedia to ensure data quality and control.
A core novelty is the integration of ROI-based engineered features, capturing the historical "star power" and success patterns of directors, actors, and production companies. We addressed significant challenges such as missing budget data, inconsistent JSON formats, and contributor frequency issues through rigorous manual verification and seasonal grouping. Multiple supervised machine learning models–Logistic Regression, Random Forest, and XGBoost–were trained and evaluated using metrics like Accuracy, F1-score, and ROC-AUC. While the system demonstrates strong predictive reliability, future scaling faces hurdles like data drift and evolving audience preferences, necessitating regular retraining to maintain accuracy in a dynamic global film market.
This project develops a city-wise housing price prediction framework for India using large-scale scraped real estate listings with geospatial, infrastructural, and socioeconomic context. Beyond conventional listing attributes such as price, area, and BHK configuration, the pipeline integrates coarser-than-standard spatial indexing for better merge quality, road accessibility measures, satellite-derived land and built-environment indicators, and district-level socioeconomic proxies to better capture urban spatial structure and neighborhood effects.
A central focus of the work is data reliability and spatial consistency. The system emphasizes improving merge quality, geocoding precision, locality resolution, and spatial coverage so that external contextual features are incorporated only when spatial alignment is sufficiently trustworthy. To reduce overly optimistic evaluation commonly caused by spatial leakage, the modeling framework employs city-wise and spatially aware train-test splits instead of naive random sampling.
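A minimal sketch of the spatially aware split, holding out whole cities so that no listings from a test city appear in training (city names and record fields are placeholders):

```python
listings = [{"city": c, "price": i} for i, c in enumerate(
    ["Delhi"] * 5 + ["Mumbai"] * 5 + ["Pune"] * 5)]

def citywise_split(rows, held_out_cities):
    """Hold out entire cities so no spatial leakage crosses train/test."""
    train = [r for r in rows if r["city"] not in held_out_cities]
    test = [r for r in rows if r["city"] in held_out_cities]
    return train, test

train, test = citywise_split(listings, {"Pune"})
train_cities = {r["city"] for r in train}
test_cities = {r["city"] for r in test}
```

Compared with naive random sampling, this split prevents a model from scoring well simply by memorizing a city's price level.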
Agricultural production in India is highly influenced by weather conditions such as temperature, rainfall, humidity, and solar radiation. However, accurately predicting crop production remains a significant challenge for farmers, planners, and policymakers, limiting effective agricultural planning and resource management. This study develops a machine learning-based crop production prediction system for major crops in India by integrating historical weather and agricultural production data. The model predicts production for the top five major crops grown in India, each for a specific district in its leading producing state:
1. Rice: Bankura, West Bengal
2. Wheat: Bathinda, Punjab
3. Sugarcane: Lakhimpur Kheri, Uttar Pradesh
4. Cotton: Amreli, Gujarat
5. Maize: Chitradurga, Karnataka
A diverse set of weather-related predictive features was constructed using seasonal climate data, including:
• Temperature
• Rainfall
• Rainfall count (number of rainy days)
• Humidity
• Solar radiation
• Wind speed
These variables were selected because of their direct influence on crop growth, photosynthesis, water availability, transpiration, pollination, and plant stress.
The dataset was standardized to improve model performance and ensure consistent feature scaling across regions and seasons. Several machine learning approaches were considered for crop production prediction, including Ridge Regression, Elastic Net, Random Forest, Gradient Boosting, SVR (RBF), and a Multi-layer Perceptron neural network. The best-performing model was Ridge Regression, with an R² of 0.5969 and an MAE of 34.70%. The proposed system provides a scalable framework for weather-driven agricultural production forecasting, with potential applications in crop planning, irrigation management, food supply forecasting, and agricultural decision-making. By enabling earlier and more accurate production estimates, the model can help reduce uncertainty for farmers and contribute toward improved food security and resource management across India.
Pharmaceutical shipments to developing countries face delays driven by operational inefficiencies, structural barriers such as poor customs infrastructure, and environmental disruptions. Existing models predict delays but fail to explain why, limiting actionable response. Our solution creates a novel dataset, vertically combining shipment data, World Bank logistics scores, and disaster records – capturing both operational and macro-environmental conditions at the destination country level – to decompose each predicted delay into feature-level contributions.
This enables targeted interventions like disaster-responsive rerouting and vendor accountability, ultimately reducing life-critical medicine stockouts in vulnerable regions. We employed XGBoost as our primary model, selected for its native handling of class imbalance via scale_pos_weight, with SHAP decomposition attributing each prediction to operational, structural, or environmental causes. Alternative models were benchmarked but deprioritized, as XGBoost maximized F1-score while maintaining high recall, which is critical since missing a delayed shipment is costlier than a false alarm.
Key challenges included 88/12 class imbalance (addressed via SMOTE), 18.7% missing disaster data (resolved through hierarchical median imputation) and temporal leakage (prevented using walk-forward temporal validation). Our model achieves 79.63% balanced accuracy, 73% recall, 52.34% F1-score, and 88.8% ROC-AUC. Scaling challenges include maintaining real-time disaster data feeds, adapting to regions with sparse logistics ratings, and ensuring prediction latency remains low for operational decisions.
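A minimal SMOTE sketch under stated assumptions: synthetic minority samples are interpolated toward nearby minority neighbours (the real pipeline presumably uses a library implementation such as imbalanced-learn; shapes and counts here are illustrative):

```python
import numpy as np

def smote(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE: each synthetic sample lies on a segment between a
    minority point and one of its k nearest minority neighbours."""
    if rng is None:
        rng = np.random.default_rng(0)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        nbrs = np.argsort(d)[1:k + 1]  # skip the point itself
        j = rng.choice(nbrs)
        lam = rng.random()
        out.append(X_min[i] + lam * (X_min[j] - X_min[i]))
    return np.array(out)

rng = np.random.default_rng(0)
X_minority = rng.normal(size=(12, 4))           # the 12% "delayed" class
X_synth = smote(X_minority, n_new=76, rng=rng)  # rebalance toward 50/50
```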
Sleep onset latency is a key indicator of sleep readiness, but most consumer wearables still provide only retrospective sleep summaries rather than actionable predictions before bedtime. This project addresses the problem of predicting whether a person will fall asleep within 20 minutes, enabling more timely and personalized sleep guidance.
We developed a machine learning pipeline using wearable physiological signals and contextual features from the DREAMT dataset, and also evaluated the approach across multiple datasets, including MESA, to test generalizability. The methods explored include XGBoost for structured feature-based learning and an RNN-LSTM model to capture temporal patterns in pre-sleep physiological signals. These models were designed to learn from features such as heart rate, heart rate variability, accelerometer-based activity, skin temperature, and time-based context.
The project is useful for applications such as bedtime decision support, sleep coaching, and personalized sleep optimization. Its broader impact lies in helping users make better sleep decisions using objective physiological data. Although dataset differences and limited sample sizes posed challenges, the work demonstrates a practical framework for real-time sleep onset prediction using wearable data.
Urban Heat Islands (UHI) raise urban temperatures several degrees above rural surroundings, increasing heat stress and energy demand in tropical Indian cities. Existing studies focus on Western contexts with limited feature sets. Our project predicts UHI severity in Bengaluru using a multimodal feature pipeline integrating satellite, climate, and urban morphology data, supporting climate-aware urban planning.
We construct a 2,433-point dataset combining Landsat and ERA5 multispectral indices (NDVI, NDBI, EVI, Albedo) from Google Earth Engine, urban morphology features (building density, street density, sky view factor) from OSMnx, and Relative Wealth Index data. After KNN imputation and feature scaling, we train Random Forest, XGBoost, KNN, and SVM models for both LST regression and four-class UHI classification using the mean-SD method (Liu and Zhang, 2011). Models are validated with stratified 5-fold and spatial-block cross-validation, tuned via Optuna, and tested for cross-city generalization on Pune through per-city feature standardization.
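One common variant of the mean-SD classification can be sketched as follows; the 0.5 scale factor and the simulated LST values are illustrative, not the cited paper's exact thresholds:

```python
import numpy as np

def uhi_classes(lst, a=0.5):
    """Mean-SD thresholding (after Liu and Zhang, 2011): bucket land surface
    temperature into four severity classes around the scene mean."""
    mu, sd = lst.mean(), lst.std()
    bins = [mu - a * sd, mu, mu + a * sd]
    return np.digitize(lst, bins)  # 0 = coolest ... 3 = most severe UHI

rng = np.random.default_rng(0)
lst = rng.normal(303.0, 2.0, size=2433)  # simulated LST grid, in kelvin
labels = uhi_classes(lst)
```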
The main challenge was unifying heterogeneous geospatial sources, addressed via spatial joining and imputation. Our best XGBoost model achieves an R² of 0.79 for LST regression and an 86.7% ROC-AUC for four-class severity classification. The pipeline can be deployed at Plaksha to flag high-risk thermal zones in Mohali and inform campus cooling interventions, with scaling challenges including seasonal variation and the cost of repeated satellite queries.
Diabetic retinopathy (DR) is one of the leading causes of preventable blindness globally, with a rapidly increasing burden, particularly in low-resource settings where access to ophthalmic screening is limited. Although deep learning-based models have shown high accuracy in DR detection, most are trained and evaluated on curated, high-quality datasets that do not reflect real-world variability in imaging conditions, leading to performance degradation during deployment. This gap between benchmark performance and clinical reliability highlights the need for robust and generalizable screening systems.
This study proposes a convolutional neural network-based framework for multi-class DR classification using transfer learning. The model is built on EfficientNet-B0, pretrained on ImageNet, and fine-tuned on retinal fundus image datasets to leverage transferable visual representations while adapting to domain-specific features. A two-stage transfer learning strategy is employed, where the classification head is trained initially with frozen backbone layers, followed by selective unfreezing for fine-tuning. To enhance lesion sensitivity, a Convolutional Block Attention Module (CBAM) is integrated to complement the inherent Squeeze-and-Excitation attention in EfficientNet by incorporating spatial attention for improved localization of pathological regions.
Class imbalance is addressed using a combination of class-weighted cross-entropy and focal loss. To evaluate robustness, the framework introduces controlled image degradations such as variations in brightness, contrast, blur, and noise, enabling systematic assessment of model stability under real-world-like conditions. Additionally, Grad-CAM is used to analyze shifts in model attention across degraded inputs. The proposed approach aims not only to improve classification performance but also to study generalization and interpretability under clinically realistic imaging variability.
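A NumPy sketch of the focal-loss term (the project trains with a framework implementation; this only shows how the (1 − p_t)^γ factor down-weights easy, confident examples relative to hard ones):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=None):
    """Multi-class focal loss on predicted probabilities p of shape (n, C)
    and integer labels y; alpha optionally reweights classes."""
    pt = p[np.arange(len(y)), y]  # probability assigned to the true class
    w = 1.0 if alpha is None else np.asarray(alpha)[y]
    return np.mean(-w * (1 - pt) ** gamma * np.log(pt))

p = np.array([[0.9, 0.05, 0.05],   # confident, correct prediction
              [0.3, 0.6, 0.1]])    # less confident prediction
y = np.array([0, 1])
easy = focal_loss(p[:1], y[:1])
hard = focal_loss(p[1:], y[1:])
```

The easy example contributes far less loss than the hard one, which is the mechanism that keeps the majority class from dominating training.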
Stubble burning in the northern Indian state of Punjab during the post-monsoon months of October and November releases an estimated 35 million tonnes of crop residue emissions annually, contributing substantially to the deterioration of air quality across the National Capital Region and posing significant public health risks. Current monitoring approaches are largely retrospective, relying on satellite-based fire detection bulletins that report incidents after their occurrence. This project develops a machine learning framework for one-week-ahead forecasting of fire activity at a 7km spatial resolution, enabling district-level authorities to anticipate high-risk regions and allocate enforcement resources accordingly.
A grid-based feature table was constructed at the resolution of grid cell × week × year, encompassing 1,036 active cells across Punjab over six burning seasons from 2018 to 2023. The dataset integrates multiple sources: active-fire detections from the NASA FIRMS programme (MODIS Collection 6.1 and VIIRS NOAA-20 Collection 2), vegetation indices derived from the MODIS MOD13Q1 product, daily climate reanalysis variables from ERA5-Land (including air temperature, soil moisture, wind components, and vapour pressure deficit), and a curated set of Punjab-specific policy indicators such as the National Green Tribunal enforcement intensity, ex-gratia compensation schemes, Super Seeder availability, and central Crop Residue Management funding allocations. Engineered features include spatial-temporal lag terms to capture autocorrelation, NDVI anomalies to isolate harvest-stage signals, and composite fire-weather indices to represent physical fire-risk drivers. Five models were evaluated under a strict temporal split, with training on data from 2018 to 2021, validation on 2022, and held-out testing on 2023.
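The strict temporal split can be sketched as a simple year-based partition (the rows and fields below are placeholders for the grid-cell × week × year feature table):

```python
# Toy feature table: 6 burning seasons, 8 weeks each
rows = [{"year": y, "week": w} for y in range(2018, 2024) for w in range(1, 9)]

def temporal_split(data):
    """Strict temporal split: train on 2018-2021, validate on 2022,
    hold out 2023 for testing. No future rows ever reach training."""
    train = [r for r in data if r["year"] <= 2021]
    val = [r for r in data if r["year"] == 2022]
    test = [r for r in data if r["year"] == 2023]
    return train, val, test

train, val, test = temporal_split(rows)
```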
The models comprised a persistence baseline, logistic regression, random forest, XGBoost with a Tweedie objective, and a two-stage hurdle model. The XGBoost-Tweedie configuration achieved a test-set PR-AUC of 0.894 and a mean absolute error of 2.04, representing a 21 percent reduction in mean absolute error relative to the persistence baseline. Feature importance analysis using SHAP values indicated that five spatial-temporal and seasonal features each contributed between 10 and 20 percent of total predictive importance, with weather and policy variables incorporated to address residual error in high-intensity burning cells where the satellite-only baseline exhibited reduced accuracy. This work demonstrates that integrating remote sensing observations, physical climate variables, and human-system policy indicators produces a forecasting framework with potential for deployment as a weekly district-level risk-assessment tool to support evidence-based intervention against stubble burning.
Accurate subsurface ocean information is essential for forecasting, navigation, and climate analysis, yet direct measurements of temperature and salinity profiles remain sparse and irregular because in situ sampling is expensive and limited in coverage. This project addresses the problem of reconstructing subsurface temperature T(z) and salinity S(z) fields from widely available satellite surface observations–sea surface temperature (SST), sea surface salinity (SSS), and sea surface height (SSH)–while leveraging Argo float profiles as observational anchors. A reliable surface-to-subsurface reconstruction system can support applications such as upper-ocean state estimation for operational oceanography, improved initialization and validation for numerical ocean models, better monitoring of mesoscale features (fronts and eddies), and enhanced decision-making for fisheries, maritime routing, and coastal/environmental management.
The expected impact is a more scalable and cost-effective pathway to produce higher-resolution, observation-consistent subsurface estimates in regions where dense in situ sampling is not feasible, thereby improving situational awareness of ocean dynamics and strengthening downstream forecasting and climate-relevant analyses. We implement a supervised deep learning approach based on a 2D U-Net architecture trained to map gridded surface fields [SST, SSS, SSH] to multi-depth subsurface outputs [T̂(x,y,z), Ŝ(x,y,z)]. U-Net is selected because its encoder–decoder structure captures both large-scale context and fine spatial details via skip connections, making it well-suited for geophysical maps that contain multi-scale patterns.
The encoder progressively compresses the input to learn robust features, while the decoder upsamples these features back to full resolution; skip connections transfer high-resolution information to preserve sharp gradients associated with fronts and eddies. Training uses a baseline dense loss against gridded reference fields to provide stable learning over the full domain, and additionally incorporates an Argo-based loss computed at profile locations and depths to anchor predictions to real observations. This combined objective improves physical realism by reducing reliance on any single gridded product and ensuring that predicted vertical structures align with measured profiles where available.
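The combined objective can be sketched as a dense grid term plus a sparse Argo term; the weights, field shapes, and profile locations below are illustrative:

```python
import numpy as np

def combined_loss(pred, grid_ref, argo_idx, argo_vals, w_argo=0.5):
    """Dense MSE against the gridded reference field, plus an Argo anchor
    term evaluated only at locations/depths where float profiles exist."""
    dense = np.mean((pred - grid_ref) ** 2)
    argo = np.mean((pred[argo_idx] - argo_vals) ** 2)
    return dense + w_argo * argo

rng = np.random.default_rng(0)
pred = rng.normal(size=(8, 8, 10))          # predicted T(x, y, z)
ref = pred + rng.normal(0.1, 0.05, pred.shape)  # gridded reference product
idx = (np.array([1, 3]), np.array([2, 5]), np.array([0, 4]))  # profile points
vals = pred[idx] + 0.2                      # "observed" Argo values
loss = combined_loss(pred, ref, idx, vals)
```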
Indian multilingual OCR systems depend critically on accurate script identification – a single misclassification cascades through every downstream module and renders output unintelligible. Using the Bharat Scene Text Dataset (BSTD), we quantify this bottleneck: supplying correct script labels alone doubles the Word Recognition Rate from 36% to 71%, without any other change. Our project tackles the hardest sub-problem within this pipeline – distinguishing four visually similar Indian scripts (Assamese, Bengali, Hindi, and Marathi) from cropped real-world word images, using 7,200 training and 1,912 test images from BSTD.
We conducted a systematic three-phase investigation. Phase 1 established CNN baselines with ResNet-18 and ResNet-50, revealing that overfitting – not insufficient model capacity – was the primary bottleneck. Phase 2 explored ViT-B/16, MobileNetV3-Large, EfficientNet-B0, a dual-input attention-fusion model, and hand-crafted preprocessing techniques (Sobel, Laplacian, CLAHE), finding that ViT training collapsed under data scarcity and all preprocessing degraded performance relative to raw RGB input.
Phase 3 applied targeted training improvements – stronger data augmentation, label smoothing, and cosine annealing – to the best baseline model. Our optimized ResNet-18 achieved 74.01% accuracy and a macro F1 of 0.74, a 6.33% gain over baseline. Critically, a consistent accuracy ranking across all five tested architectures reveals a fundamental data-level ceiling for Hindi–Marathi separation, as both share identical Devanagari letterforms. This module is directly deployable as a pre-processing component in any Indian OCR pipeline, improving multilingual text recognition from signboards, campus notices, and administrative documents.
Autonomous vehicle perception systems trained on daylight imagery degrade significantly at night due to domain shift from illumination changes, noise, and visual artifacts. This project addresses robust vehicle classification across day–night domains while minimizing reliance on labeled nighttime data, with applications in surveillance, traffic monitoring, and campus mobility. We implement a Domain Adversarial Neural Network (DANN) with an EfficientNet backbone to learn domain-invariant features without requiring extensive target labels.
A 51k-image dataset was curated from BDD100K using domain-based splitting, class balancing (4:2:1), and filtering of low-visibility noise. The architecture combines a feature extractor, label classifier, and domain classifier via a gradient reversal layer. Training challenges–class imbalance, noisy nighttime samples, and limited GPU memory–were handled using weighted sampling, area-based filtering, gradient accumulation, and mixed precision.
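The gradient reversal layer at the heart of DANN is easy to state: identity on the forward pass, sign-flipped and scaled gradient on the backward pass, so the feature extractor is trained to confuse the domain classifier. A framework-free sketch, with λ illustrative:

```python
import numpy as np

def grl_forward(x):
    """Gradient reversal layer: identity on the forward pass."""
    return x

def grl_backward(grad, lam=1.0):
    """Backward pass: multiply the incoming gradient by -lambda, so the
    feature extractor ascends the domain-classification loss."""
    return -lam * grad

feat = np.array([0.2, -0.4])
out = grl_forward(feat)                   # features pass through unchanged
g_domain = np.array([0.1, 0.3])           # gradient from the domain head
g_feat = grl_backward(g_domain, lam=0.5)  # what the feature extractor sees
```

In the actual model this sits between the EfficientNet feature extractor and the domain classifier, implemented via the framework's autograd.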
Across three seeds, the model achieves 86.81% nighttime accuracy, outperforming supervised finetuning. The key contribution is label sweep analysis: strong gains at just 5% labeled data, with performance saturating beyond 75% (only +0.05% up to 100%). This shows most adaptation benefits come from minimal labeling, enabling practical deployment. EfficientNet-B0 supports real-time deployment in campus-scale systems such as Plaksha.
Humans rely heavily on speech to communicate content and emotion simultaneously. In developing speech systems, knowledge of emotion should therefore be used, as it allows systems such as speaker recognition, synthesis, and language identification to understand the speaker's state of mind and reactions.
While Speech Emotion Recognition (SER) systems have come a long way, little work has been done to build such systems for Indian languages. Moreover, most systems focus only on detecting emotions rather than helping speakers improve their speech. In this project we propose a system that not only classifies emotions but also helps speakers understand how to speak to express a specific emotion.
Acoustic and spectral features are extracted from speech and are used in an SVM to first detect which emotion is being conveyed. After that a neural network-based regression model is trained to learn the difference between acoustic feature distributions of different emotions. Given a user’s speech features and a desired target emotion, the model predicts adjustments indicating how pitch, energy, and articulation should change.
Traditional financial risk models, such as GARCH and EWMA, predominantly rely on historical price movements to forecast market behavior, often failing to capture volatility triggered by information shocks before prices adjust. Previous studies have primarily focused on predicting directional stock returns based on news or using purely price-based volatility models, lacking a profound integration of unstructured sentiment data for predicting diversified fund risk. To address this gap, this project introduces a machine learning methodology to predict the 21-day rolling volatility of Indian mutual funds by integrating historical financial data with NLP insights.
We collected and processed over 200,000 Indian financial news headlines, using FinBERT to extract sentiment probabilities and construct a daily sentiment volatility index. These metrics were combined with conventional technical indicators (such as MACD, RSI, and rolling beta) derived from the top 80% weighted stock holdings of three major Indian mutual funds: HDFC Large Cap, Nippon India Large Cap, and SBI Bluechip. The fully engineered and merged feature space used for model training is compiled in the final_mf_dataset.csv file.
We employed advanced tree-based algorithms, primarily XGBoost and Random Forest, formulating the task as an out-of-sample time-series forecasting problem to identify non-linear relationships between sentiment shocks and future fund risk. Algorithm choices were driven by their ability to handle complex, non-linear relationships and interactions between firm-level and global sentiment. Metrics such as RMSE, MAE, and R-squared are used to benchmark the models against Historical Volatility, EWMA, and GARCH baselines. The significance of this project lies in demonstrating that sentiment volatility provides predictive power beyond traditional models, offering an actionable, early-warning framework for portfolio risk mitigation in asset management.
Students and early-career candidates often apply to jobs without knowing which skills are actually missing from their resumes. Existing tools usually compress resume-job fit into a single semantic similarity score, which can make a polished resume look ready even when important requirements are absent.
This project builds a resume skill-gap analyzer that identifies missing skills, mismatched categories, and actionable areas for improvement. The system trains Logistic Regression and Gradient Boosting models on 9,495 weakly labeled resume-job pairs constructed from 2,484 resumes across 24 categories and 853 job descriptions. Features include SBERT semantic similarity, TF-IDF overlap, skill counts, and skill-imbalance signals, all built with leakage-safe train-only fitting.
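A toy version of the skill-gap features, with a tiny hypothetical skill vocabulary standing in for the project's regex-normalized skill lists:

```python
import re

SKILLS = {"python", "sql", "pytorch", "excel", "docker"}  # illustrative vocabulary

def extract_skills(text):
    """Regex-normalized skill extraction against a known vocabulary."""
    tokens = set(re.findall(r"[a-z+#]+", text.lower()))
    return tokens & SKILLS

def gap_features(resume, job):
    """Missing skills, overlap ratio, and a skill-imbalance count."""
    r, j = extract_skills(resume), extract_skills(job)
    return {"missing": j - r,
            "overlap": len(r & j) / len(j) if j else 1.0,
            "imbalance": len(j) - len(r & j)}

feats = gap_features("Built dashboards in Python and Excel.",
                     "Requires Python, SQL and Docker.")
```

These interpretable signals are what let the analyzer report *which* requirements are absent rather than a single opaque match score.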
Weak supervision, a 100-pair gold standard, strict preprocessing boundaries, and regex-based normalization are used to handle the absence of high-quality labels and reduce noisy skill extraction. The solution can support student placement cells, career services, and individual applicants by providing more interpretable feedback than a single match score. It is lightweight enough to deploy as a web tool, though scaling will require better labeled data, broader job-domain coverage, and continual updates as job descriptions and skill vocabularies evolve.
Music is not just heard, it is felt, yet most systems still struggle to understand its emotional essence. Understanding the emotional content of music remains a challenging problem, especially for Hindi/Hinglish songs where multiple styles and narratives often overlap. Traditional genre-based classification fails to capture this emotional nuance. This project focuses on automated mood classification of songs into four categories: happy, sad, excited, and romantic, to enable more meaningful music organization.
The solution has applications in recommendation systems, voice assistants, and mental wellness platforms, with the potential to improve personalization and enhance representation of Indian music in AI systems. We develop a supervised learning pipeline where audio features such as MFCC, chroma, spectral, rhythm, and energy descriptors are extracted using Librosa. These features capture the physical characteristics of sound, including timbre, harmony, frequency distribution, and rhythmic structure. They are complemented with high-level emotional features using the Essentia library, which estimates valence and arousal directly from audio, enabling mapping into a continuous emotional space.
We experimented with multiple models including Random Forest, Support Vector Machines, XGBoost, K-Nearest Neighbors, and LightGBM, evaluating them using accuracy and F1-score, and selected the best-performing model for final deployment. Key challenges included manual dataset creation of around 1600 songs due to API limitations, subjective labeling, and feature overlap, which were addressed through preprocessing and feature engineering. The final model demonstrates consistent improvement across classes after incorporating emotional features. The system can be deployed as a backend service for real-time mood-based playlist generation, with scalability challenges including linguistic diversity, label ambiguity, and computational constraints.
The rise of AI-assisted interview platforms that provide live scripted responses has endangered the integrity of online hiring, since scripted and authentic answers are hard to tell apart. This project tackles the issue by developing a classification algorithm that detects whether speech is read aloud or spoken spontaneously, using the linguistic distinctions between the two modes of speech as a basis. The algorithm could be integrated into interview integrity programs or large-scale proctoring systems.
One challenge was the mismatch between clean LibriSpeech audio and noisy real-life spontaneous speech, which could lead the classifier to learn recording conditions instead of linguistic distinctions. This was addressed with a series of normalization operations: filtering, loudness matching, and the addition of environmental noise. Several acoustic features (eGeMAPSv02) and speaking rate were extracted to produce roughly 98-dimensional feature vectors.
A Support Vector Machine with an RBF kernel was chosen for its robustness to high-dimensional inputs and its relative interpretability. Evaluation used AUC (>0.85) and per-class F1-scores, showing strong discriminative capability. Deployment is straightforward as a relatively lightweight, low-latency backend service, though cross-language generalization, domain shift, and possible adversarial behavior remain open concerns.
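A minimal version of the classifier and its evaluation is sketched below, with synthetic data standing in for the ~98-dimensional feature vectors; feature standardization before the RBF SVM is an assumption of the sketch, not a detail confirmed above.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.metrics import roc_auc_score, f1_score

# Stand-in for the ~98-dim eGeMAPS + speaking-rate vectors (read vs spontaneous).
X, y = make_classification(n_samples=600, n_features=98, n_informative=15,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

# Scaling matters for RBF kernels, whose distances are unit-sensitive.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True,
                                          random_state=0))
clf.fit(X_tr, y_tr)

auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
per_class_f1 = f1_score(y_te, clf.predict(X_te), average=None)
```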
Transliteration converts text between scripts while preserving pronunciation, such as mapping 'namaste' to 'नमस्ते'. Existing systems usually learn direct script-pair mappings, requiring O(N²) parallel datasets across 21 Indic languages, which is difficult to scale. This project proposes a shared phonetic space that supports cross-lingual search, named-entity transfer, multilingual keyboard input, and low-resource transliteration without needing pairwise datasets for every language pair.
The system uses a character-level Transformer encoder shared across 21 languages, InfoNCE contrastive loss with τ = 0.07, mean pooling, and L2 normalization to produce a 128-dimensional unit vector. Aksharantar is used because it provides 26 million pairs across 21 languages, offering broader coverage than Dakshina or smaller Neural MT datasets. The encoder maps Unicode characters through transformer layers, pools the sequence, and learns to pull matching native-Roman pairs together while pushing unrelated pairs apart.
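The contrastive objective can be sketched in NumPy. This is an illustrative single-direction InfoNCE over in-batch negatives, not the actual training code, and the random vectors stand in for pooled encoder outputs.

```python
import numpy as np

def l2_normalize(x):
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)

def info_nce(native_emb, roman_emb, tau=0.07):
    """InfoNCE over a batch: row i's positive is roman_emb[i];
    every other row in the batch serves as an in-batch negative."""
    z_n, z_r = l2_normalize(native_emb), l2_normalize(roman_emb)
    logits = (z_n @ z_r.T) / tau                     # cosine similarity / temperature
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))        # -log p(correct pair)

rng = np.random.default_rng(0)
native = rng.standard_normal((32, 128))
aligned = native + 0.01 * rng.standard_normal((32, 128))  # well-matched pairs
shuffled = np.roll(aligned, 1, axis=0)                    # positives misassigned
loss_aligned = info_nce(native, aligned)
loss_shuffled = info_nce(native, shuffled)
```

Minimizing this loss is what pulls matching native-Roman pairs together and pushes unrelated pairs apart: well-aligned batches score a much lower loss than misassigned ones.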
The learned phonetic space achieved a mean positive similarity of 0.9054, mean negative similarity of −0.0708, a discrimination gap of 0.9761, R@1 of 81.6%, R@5 of 97.8%, and R@10 of 98.7%. These results show that the space cleanly separates matching cross-script pairs from unrelated ones. Scaling challenges include weaker performance on low-resource languages such as Kashmiri, real-world spelling noise absent from Aksharantar, and ambiguity in IPA-to-vector conversion.
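The R@k retrieval metric reported above can be computed as in the sketch below; `recall_at_k` is an illustrative helper, and the randomly generated embeddings stand in for encoder outputs.

```python
import numpy as np

def recall_at_k(query_vecs, gallery_vecs, k):
    """Fraction of queries whose true match (same row index) appears in
    the top-k gallery items ranked by cosine similarity."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    g = gallery_vecs / np.linalg.norm(gallery_vecs, axis=1, keepdims=True)
    sims = q @ g.T
    topk = np.argsort(-sims, axis=1)[:, :k]
    hits = [i in topk[i] for i in range(len(q))]
    return float(np.mean(hits))

rng = np.random.default_rng(0)
roman = rng.standard_normal((100, 128))
native = roman + 0.05 * rng.standard_normal((100, 128))  # a well-aligned encoder
r1 = recall_at_k(roman, native, 1)
r5 = recall_at_k(roman, native, 5)
```

With tightly aligned pairs, as in the learned space described above, R@1 approaches 1 and R@5 can only be higher, mirroring the 81.6% → 97.8% jump reported.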
Sophisticated phishing attacks increasingly use polymorphic techniques and visual spoofing to bypass traditional, single-modality security filters, leading to severe credential theft. This project addresses this by developing a robust Multimodal Late-Fusion Ensemble to detect zero-day phishing threats by cross-analyzing three distinct webpage dimensions. Potential applications include enterprise proxy gateways and browser extensions, impacting cybersecurity by neutralizing breaches before user interaction.
We engineered a "Wide-Stacking" architecture deploying nine specialized machine-learning experts, matched to their modalities because different data types call for different mathematical treatment: tree-based models (XGBoost, LightGBM, Random Forest) evaluate 56 deterministic URL lexical features via non-linear cutoffs; distance and neural models (SVM, MLP) classify high-dimensional MPNet-encoded semantic HTML text; and linear classifiers evaluate ResNet50 visual embeddings to limit overfitting. The experts assess webpage artifacts independently, preventing visual features from overshadowing subtle lexical cues, before a Logistic Regression meta-classifier synthesizes the final prediction.
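The late-fusion stacking idea can be sketched as follows, with synthetic feature blocks standing in for the three modalities and three scikit-learn models standing in for the nine experts; a production setup would use out-of-fold predictions for the meta-features to avoid the leakage this simplified version allows.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=800, n_features=30, n_informative=12,
                           random_state=0)
# Stand-ins for the three modalities: URL lexical, HTML text, visual embeddings.
blocks = [X[:, :10], X[:, 10:20], X[:, 20:]]
idx_tr, idx_te = train_test_split(np.arange(len(y)), test_size=0.3,
                                  stratify=y, random_state=0)

experts = [RandomForestClassifier(n_estimators=100, random_state=0),
           SVC(kernel="rbf", probability=True, random_state=0),
           LogisticRegression(max_iter=1000)]

meta_tr, meta_te = [], []
for block, model in zip(blocks, experts):
    model.fit(block[idx_tr], y[idx_tr])        # each expert sees only its modality
    meta_tr.append(model.predict_proba(block[idx_tr])[:, 1])
    meta_te.append(model.predict_proba(block[idx_te])[:, 1])

# Late fusion: the meta-classifier sees only per-expert probabilities,
# so no single modality can mathematically overshadow the others.
meta = LogisticRegression().fit(np.column_stack(meta_tr), y[idx_tr])
acc = accuracy_score(y[idx_te], meta.predict(np.column_stack(meta_te)))
```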
Challenges included hardware memory faults during concurrent training and feature overshadowing across modalities, tackled via constrained single-core execution and the Late-Fusion design. The system achieved 98.20% accuracy, 0.9843 recall, and a 0.9820 F1-score, indicating that it catches active threats (high recall) while keeping operational false alarms low (high F1). At Plaksha, it could be deployed at the network perimeter, though scaling requires mitigating deep-learning inference latency during high traffic.