Machine Learning and Pattern Recognition

AI3011 | Spring 2024


Monsoon 2023


A Data-Driven Approach to Optimize Content Strategies and User Engagement for Plaksha's Communication Team

Hriday Rathi, Saumya Choudhary, Shreevardhan Shah

CODE PDF

In the digital era, universities recognize the critical role of social media in connecting with students, alumni, and the wider community. However, optimizing engagement on platforms remains a challenge.

Previous studies, including Ng et al. (2023), predominantly focused on basic statistical analyses of social media data, lacking a profound integration of content analysis and real-time adaptability. To address this gap, we collected historical data from Plaksha University’s official Instagram account, incorporating posts, comments, likes, shares, and follower details.

After rigorous EDA, trend identification, and handling of missing values, we employed a combination of TF-IDF with SVM for content analysis and Random Forest Regression for engagement prediction. These algorithms were chosen for their ability to handle both linear and non-linear relationships across textual and numeric data.

Performance metrics such as MAE, RMSE, accuracy, precision, recall, and F1-score were used to assess model effectiveness. Lower MAE and RMSE values indicated accurate user engagement predictions, while higher F1-scores reflected a better understanding of effective content types.
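
A minimal scikit-learn sketch of how such a pipeline could be wired up, assuming a scraped posts table with hypothetical column names (caption, content_type, likes, and a few numeric post features); this is illustrative, not the team's code.

```python
# Sketch only: TF-IDF + SVM for content-type classification and a Random Forest
# regressor for engagement prediction, scored with the metrics mentioned above.
import numpy as np
import pandas as pd
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, mean_absolute_error, mean_squared_error

posts = pd.read_csv("instagram_posts.csv")  # hypothetical export of the scraped data

# Content analysis: classify caption text into content categories.
X_txt_tr, X_txt_te, y_cat_tr, y_cat_te = train_test_split(
    posts["caption"], posts["content_type"], test_size=0.2, random_state=42)
clf = Pipeline([("tfidf", TfidfVectorizer(max_features=5000, stop_words="english")),
                ("svm", LinearSVC())])
clf.fit(X_txt_tr, y_cat_tr)
print("F1 (macro):", f1_score(y_cat_te, clf.predict(X_txt_te), average="macro"))

# Engagement prediction: regress likes on numeric post features.
features = posts[["hour_posted", "weekday", "hashtag_count", "caption_length"]]
X_tr, X_te, y_tr, y_te = train_test_split(features, posts["likes"],
                                          test_size=0.2, random_state=42)
reg = RandomForestRegressor(n_estimators=300, random_state=42).fit(X_tr, y_tr)
pred = reg.predict(X_te)
print("MAE:", mean_absolute_error(y_te, pred))
print("RMSE:", np.sqrt(mean_squared_error(y_te, pred)))
```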

This project is aimed at contributing to tailoring effective content strategies in real-time, leading to enhanced engagement and a more vibrant social media presence for Plaksha.


A Futuristic Approach to Campus Localization and Information

Kaustubh Singh, Shivam Kumar, Vikas Kumar

CODE PDF

This project addresses the demand for an innovative solution to enhance campus tours for visitors using machine learning and computer vision. The objective is to enable users to explore the university environment effortlessly by leveraging a Convolutional Neural Network (CNN) with hidden layers. The CNN is designed to identify and classify various campus objects, including artworks and buildings. To augment the dataset and enhance model robustness, image tilting, blur, and brightness adjustments were applied.

In contrast to prior studies, this project introduces a unique approach by incorporating object localization alongside classification. The CNN not only predicts the category of an object but is also used on mobile phones to detect objects of interest, providing users with a visual indication of the identified entities. Additionally, a text component generates descriptions of the recognized art pieces or landmarks.

Data preprocessing involves standardization and augmentation techniques, enhancing the network's ability to generalize across diverse campus scenes. The choice of 3 convolutional layers with max-pooling between them, followed by 2 fully connected layers, is driven by the balance between model complexity and efficient feature extraction.
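
A rough PyTorch sketch of the stated layout (three convolutional layers with max-pooling in between, then two fully connected layers); the input resolution, channel widths, and class count are assumptions rather than the authors' values.

```python
import torch
import torch.nn as nn

class CampusCNN(nn.Module):
    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 128), nn.ReLU(),   # 128x128 input -> 16x16 after 3 poolings
            nn.Linear(128, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

model = CampusCNN(num_classes=10)            # number of campus landmarks is hypothetical
logits = model(torch.randn(4, 3, 128, 128))  # dummy batch to sanity-check shapes
print(logits.shape)                          # torch.Size([4, 10])
```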

Results demonstrate strong performance metrics, including 95% testing accuracy, validating the efficacy of the proposed approach. The results are also validated through a confusion matrix. The significance lies in automating and personalizing campus tours, offering visitors an interactive and informative experience. This project contributes to advancing the state-of-the-art in campus object recognition, paving the way for more intelligent and user-friendly exploration of university environments.


CalorieScan

Divita Jain, Spandan Panda

CODE PDF

College students often encounter challenges in maintaining a balanced diet, especially when depending on cafeteria food, due to insufficient nutritional information. This can lead to health issues such as fatigue, adversely affecting their academic performance and well-being. Unlike existing studies like “FoodieCal”, a unique advantage of CalorieScan lies in its adaptation to the specific plate design used in the university mess, which features six fixed sections for food placement. This knowledge has been leveraged to ensure that the model's accuracy in identifying food items is exceptionally high.

Data for this project was meticulously collected by manually capturing images of mess plates post-serving, ensuring the creation of a diverse and realistic dataset. These images underwent a two-step processing approach, with a custom-trained YOLOv8 model for object detection. Subsequently, the images were cropped based on predefined sections of the plates identified by YOLOv8. The cropped images were then subjected to classification using a fine-tuned RESNET18 model, with emphasis placed on fine-tuning the last layer of RESNET18 for optimal performance. The selection of YOLOv8 and RESNET18 was driven by their efficiency in object detection and computational balance, respectively.
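
A hedged sketch of the described two-step pipeline: a custom YOLOv8 detector finds the plate sections, and each crop is then classified by a ResNet18 whose final layer alone is replaced and trained. The weight path, class count, and image path below are placeholders.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms
from ultralytics import YOLO
from PIL import Image

detector = YOLO("plate_sections_yolov8.pt")   # custom-trained detector (placeholder path)

# ResNet18 with all layers frozen except a new final classification head.
classifier = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
for p in classifier.parameters():
    p.requires_grad = False
classifier.fc = nn.Linear(classifier.fc.in_features, 25)   # 25 food classes is an assumption
classifier.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

image = Image.open("plate.jpg").convert("RGB")
result = detector(image)[0]                   # detections for the plate sections
for box in result.boxes.xyxy.tolist():        # [x1, y1, x2, y2] per detected section
    crop = image.crop(tuple(int(v) for v in box))
    with torch.no_grad():
        food_class = classifier(preprocess(crop).unsqueeze(0)).argmax(1).item()
    print(box, "->", food_class)
```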

We evaluated CalorieScan using accuracy as our primary metric. This choice was driven by our main goal of accurately predicting meal calorie content, and accuracy provides a clear measure of overall correctness. Our dataset featured a uniform distribution of calorie levels, ensuring a dependable reflection of the model's performance. The overall accuracy achieved in our evaluation was 78.7%.


Classroom Attention Detection using Computer Vision Techniques

Nandan Mandal, Priyanshu Singhal, Vaishnavi Rathi

CODE PDF

Our project aims to build a real-time Classroom Attention Detection Model, addressing the essential need for an objective and dynamic assessment of student engagement. This model represents a significant advancement over previous methods for attention mapping in the classroom setting, which lacked real-time analysis and the ability to scale effectively.

Traditional student engagement measures are often subjective and static, failing to capture real-time dynamics in the classroom. Our project differentiates itself by using advanced algorithms to provide a continuous, quantifiable measure of attention levels on a scale from 1 (unattentive) through 3 (moderately attentive) to 5 (extremely attentive), with these scores normalized between 0 and 1.

Our approach combines YOLOv8 for precise individual detection with MediaPipe for landmark detection, enabling posture and head pose estimation. These features are then fed into various classification algorithms, including Support Vector Machines (SVMs), XGBoost, and a Neural Network. We selected these algorithms for their proven accuracy and efficiency in object detection and facial analysis.
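
An illustrative sketch of the landmark-extraction step feeding a downstream classifier. It uses flattened MediaPipe pose landmarks rather than the project's derived posture and head-pose features, and assumes per-student crops with manually assigned attention labels; the file paths and label encoding are hypothetical.

```python
import cv2
import numpy as np
import mediapipe as mp
from xgboost import XGBClassifier

mp_pose = mp.solutions.pose

def pose_features(bgr_image: np.ndarray) -> np.ndarray:
    """Return a flat (33 * 3,) array of x, y, visibility per landmark, or zeros."""
    with mp_pose.Pose(static_image_mode=True) as pose:
        result = pose.process(cv2.cvtColor(bgr_image, cv2.COLOR_BGR2RGB))
    if result.pose_landmarks is None:
        return np.zeros(33 * 3)
    return np.array([[lm.x, lm.y, lm.visibility]
                     for lm in result.pose_landmarks.landmark]).ravel()

# Hypothetical training step: crops of individual students with attention labels.
# X = np.stack([pose_features(cv2.imread(p)) for p in crop_paths])
# y = attention_labels  # e.g. 0, 1, 2 after mapping the 1/3/5 scale
# XGBClassifier(n_estimators=200).fit(X, y)
```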

Our model offers real-time attention assessments and historical trend analysis, allowing educators to make data-driven decisions. With a best accuracy of 71%, it enables personalized teaching approaches that enhance student engagement. Educators can intervene early, adapt teaching methods, and improve learning outcomes, creating a more dynamic and effective classroom environment.


Email Summarizer and Prioritizer

Bhavi Bhavi, Rishi Laddha, Shaurya Mann

CODE PDF

The influx of emails in professional settings can often lead to cluttered inboxes, making it difficult to identify and act on the most critical messages. Our project aims to alleviate this issue by developing a model that prioritizes and summarizes emails efficiently. We integrated BERT for summarizing email content, ensuring that users can quickly grasp the essence of their messages. For the email prioritization, we explored several deep learning and classical ML techniques, including sentiment analysis and a multi-class classification system that labels emails as URGENT, MODERATE, or LOW priority. Our comparative analysis of priority ranking algorithms employed SVM, CNN, and LSTM networks. Performance metrics such as F1 score, accuracy, and precision were utilized to evaluate and select the most effective neural network model. The optimal model based on our performance criteria was the LSTM model which was then deployed to a cloud service to expose an API endpoint. The final outcome provides users with a daily summary of their emails, ranked by priority, which supports better email management and ensures attention to the most pressing communications.


Footfall Prediction for Reducing Food Wastage

Alli Ajagbe, Divith Narendra, Soham Petkar

CODE PDF

Our project aims to optimize inventory at Plaksha University’s mess using machine learning. While existing literature relies on conventional models and computer vision to address the problem from a supply-side outlook, it cannot account for uncorrelated surges in demand. We instead take a demand-side approach, providing recommendations based on footfall demand forecasting while accommodating varying preferences.

Inferring recommendations from an unstructured dataset collected from secondary sources (SmartQ, Forbes, and the Plaksha Academic Calendar) required manual cleaning and normalization against the university population to maintain data integrity and account for unexplained spikes in demand. This pre-processing also allowed us to identify potential features that could contribute to higher model accuracy.

To illustrate the model's impact with an example: for a population of around 350, if the true footfall on a given day were 247, our model would, in the worst case, predict a value as low as 212 or as high as 282. How did we achieve that?

We experimented with a host of models and finalized a combination of two non-linear approaches, Poisson Regression and Gradient Boosting, to predict our target variable, footfall. These models achieved mean absolute error (MAE) values of 35.2 and 34.73 respectively, while explaining around 80% of the variability in footfall at Plaksha. They were developed on a condensed dataset of around 800 data points with 8 curated features, and would help minimize waste generation across campus messes.
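
A hedged sketch of the two finalized model families, a Poisson regressor and a gradient-boosted regressor on roughly eight engineered features, scored with MAE; the file name, feature names, and split are assumptions for illustration.

```python
import pandas as pd
from sklearn.linear_model import PoissonRegressor
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error, r2_score

data = pd.read_csv("footfall.csv")            # hypothetical cleaned dataset
X = data[["is_holiday", "day_of_week", "menu_score", "exam_week",
          "weather_index", "event_on_campus", "month", "prev_day_footfall"]]
y = data["footfall"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, model in [("Poisson", PoissonRegressor(alpha=1e-3, max_iter=1000)),
                    ("GradientBoosting", GradientBoostingRegressor(random_state=0))]:
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(name, "MAE:", mean_absolute_error(y_te, pred), "R2:", r2_score(y_te, pred))
```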


Gym Posture Correction

Anit Upadhyaya, Tejasvi Birdh, Vishal Paudel

CODE PDF

Our project revolutionizes gym safety through an advanced real-time posture detection system employing cutting-edge methodologies. To enhance model adaptability, torso-length-based normalization is incorporated during pre-processing. In addition to exploring fundamental ML models such as DecisionTree, Random Forest, SVM Classifier, and XGBoost, our research reveals that the LSTM model excels, capitalizing on its temporal dependency understanding. To further refine our approach, we introduce time series anomaly detection, isolating key features crucial for distinguishing correct from incorrect squat postures. Our novel data augmentation algorithm, grouping frames into sets of 30, significantly enriches the training dataset, bolstering model robustness. At the core of our system is the many-to-one LSTM architecture for binary segmentation, processing 30 frames with 33 landmark features each. This meticulous design yields an exceptional F1-score of 0.87, attesting to its proficiency in precise posture detection. In summary, our innovative fusion of torso-based normalization, time series anomaly detection, and a specialized LSTM architecture establishes a pioneering framework for safer and more effective gym exercises, advancing the field of comprehensive fitness monitoring.
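
A rough Keras sketch of the many-to-one LSTM described above (sequences of 30 frames with 33 landmark features each, one sigmoid output for correct vs. incorrect posture); the hidden size and training data here are placeholders.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Input, LSTM, Dense

model = Sequential([
    Input(shape=(30, 33)),         # 30 frames x 33 landmark features, as described
    LSTM(64),                      # many-to-one: only the final hidden state is used
    Dense(1, activation="sigmoid")
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Dummy batch to confirm shapes; real input would be the augmented 30-frame groups.
X_dummy = np.random.rand(8, 30, 33).astype("float32")
y_dummy = np.random.randint(0, 2, size=(8,))
model.fit(X_dummy, y_dummy, epochs=1, verbose=0)
print(model.predict(X_dummy, verbose=0).shape)   # (8, 1)
```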


Health Forecasting and Medical Inventory Optimisation

Khushi Goel, Maanal Gauri, Prachi Parakh

CODE PDF

In our project, we aim to develop a health forecast that predicts the onset and prevalence of health symptoms at various times of the year, to enhance preventative health measures and facilitate medicine inventory management. Previous studies have applied machine learning and time series analysis for health forecasting at a larger scale, typically for serious diseases or pandemic outbreaks. Our project applies time series machine learning techniques to health forecasting and is, to our knowledge, the first to do so in a university setting with a relatively small student population. Furthermore, we use these forecasts to help optimize medical inventory stock.

Utilizing anonymized time series data obtained from the healthcare center spanning two years from November 2021 to November 2023, with September-November 2023 as the holdout, we performed exploratory data analysis to characterize our dataset. For feature engineering, we applied agglomerative clustering with Levenshtein distance to correct medicine names, regression to fill missing values, and k-means clustering to impute missing symptom values based on associated medicines. With the number of visits as the target variable, we categorized the symptoms into correlated groups and ran predictions with various models (such as XGBoost, LightGBM, Random Forest, and SARIMA) to select the best one based on performance. So far, we have achieved the best results with LightGBM, with an average RMSE of 1.5 and an average mean absolute error of 0.7, both of which indicate that the model's predictions are sufficiently accurate given the small-scale dataset. The model's accuracy has high potential to improve with a larger dataset.
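
An illustrative LightGBM sketch of the forecasting step with a time-based holdout mirroring the September-November 2023 split; the file name and feature names are hypothetical.

```python
import numpy as np
import pandas as pd
import lightgbm as lgb
from sklearn.metrics import mean_absolute_error, mean_squared_error

visits = pd.read_csv("weekly_symptom_visits.csv", parse_dates=["week"])
features = ["week_of_year", "month", "lag_1", "lag_2", "rolling_mean_4"]
train = visits[visits["week"] < "2023-09-01"]
test = visits[visits["week"] >= "2023-09-01"]        # holdout period

model = lgb.LGBMRegressor(n_estimators=400, learning_rate=0.05)
model.fit(train[features], train["num_visits"])
pred = model.predict(test[features])
print("RMSE:", np.sqrt(mean_squared_error(test["num_visits"], pred)))
print("MAE:", mean_absolute_error(test["num_visits"], pred))
```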


Job Recommendation System for Plaksha Students

Amol Harsh, Aryaman Khandelwal, Rishi Vijaywargiya

CODE PDF

This project presents a pioneering career recommendation system for students in India, setting itself apart from conventional systems that depend on collaborative and content-based filtering. By harnessing a dataset of top-tier LinkedIn profiles across a variety of job roles, our model offers students personalised guidance to navigate and excel in the competitive job landscape. We intend to help students break a broad field of interest into finer-grained job profiles, since many roles involve complex nuances and are closely interrelated at a high level. Helping students decipher this space and see the value in specifics should add to their readiness for industry.

Data collection involved Selenium for dynamic scraping, obtaining nuanced data beyond the reach of traditional static scraping tools. Our pre-processing pipeline made use of the ChatGPT API for advanced keyword extraction and NLTK models for text refinement, including lemmatisation and punctuation removal. For the ML component, we utilised a TF-IDF matrix for textual data vectorisation and TruncatedSVD to reduce features from 5000 to 100, thus mitigating the risk of overfitting. An XGBClassifier with 100 decision trees was chosen for its superior performance in single-label classification scenarios. We also developed our own clustering implementation, clustering the cumulative job profiles across the vector space, and used cosine similarity to measure how closely a target user matches the corpus. This gives us the degree of consistency between the user's resume and a comprehensive data corpus of experiences, skills, and education levels.
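
A hedged sketch of the vectorization path described above: TF-IDF features reduced from 5000 to 100 dimensions with TruncatedSVD, an XGBClassifier with 100 trees for role prediction, and cosine similarity between a resume and the profile corpus. The data loading and column names are hypothetical.

```python
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity
from xgboost import XGBClassifier

profiles = pd.read_csv("linkedin_profiles.csv")      # assumed columns: text, role_id

tfidf = TfidfVectorizer(max_features=5000)
svd = TruncatedSVD(n_components=100, random_state=0)
X = svd.fit_transform(tfidf.fit_transform(profiles["text"]))

clf = XGBClassifier(n_estimators=100)
clf.fit(X, profiles["role_id"])

resume_text = ["machine learning internship, python, data pipelines"]  # toy resume
resume_vec = svd.transform(tfidf.transform(resume_text))
print("Predicted role id:", clf.predict(resume_vec)[0])
print("Top similarity to corpus:", cosine_similarity(resume_vec, X).max())
```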

The model showcased commendable accuracy at 95%; and balanced precision, recall, and F1-scores of around 90% across various job roles. It excels not merely in pinpointing a student's most likely career path but also in identifying other roles with similar skill applications, providing a multi-dimensional perspective on their employability. This holistic approach to classification enables students to explore a range of potential careers, significantly enhancing their decision-making process by illuminating paths to becoming industry front-runners.


Mentor Recommender System

Avishi Rajgarhia, Shubham Jain, Siddhant Deshpande

CODE PDF

Academic mentorship, crucial for nurturing scholarly growth, is often hindered by traditional recommender systems and time-consuming manual searching and filtering, which inadequately capture complex academic interests for mentor-mentee pairing. Unlike prior studies focusing on basic collaborative and content-based filtering, our project addresses this gap by employing a more nuanced approach.

Our solution leverages the BERT (Bidirectional Encoder Representations from Transformers) model, known for its effectiveness in understanding complex language structures. The process begins with a dataset acquisition process facilitated by a custom scraper. This tool extracts academic profiles, encompassing bios and research interests, forming the foundation for our study. The 'bert-base-uncased' model was fine-tuned on this dataset, enabling it to grasp the specific nuances of academic disciplines. This process involved tokenization, masked language modeling, and fine-tuning BERT on masked input IDs, focusing on semantic relationships within academic fields.

The system's efficacy was gauged using accuracy, calculated via a truth table and user feedback. Our fine-tuned BERT model outperformed traditional methods (such as K-means, GMM, and TF-IDF), demonstrating its ability to accurately match mentors and mentees. By implementing cosine similarity for ranking faculty based on user input, our system finely tunes the recommendation process, aligning detailed academic interests more effectively. Additionally, user feedback played a crucial role in assessing real-world applicability. We implemented a user-centric approach by inviting individuals to test our model and provide subjective scores on the quality of mentor recommendations. This dual assessment, combining objective metrics and user satisfaction, reinforced the efficacy of our fine-tuned BERT model.
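
A minimal sketch (not the authors' pipeline) of the ranking step only: mean-pooled 'bert-base-uncased' embeddings for faculty bios and for a student's stated interests, ranked by cosine similarity. The fine-tuning stage is omitted and the bios below are toy examples.

```python
import torch
from transformers import AutoTokenizer, AutoModel
from torch.nn.functional import cosine_similarity

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased").eval()

def embed(texts):
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state          # (B, T, 768)
    mask = batch["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)            # mean pooling over tokens

faculty_bios = ["Research on reinforcement learning and robotics.",
                "Works on computational biology and genomics."]   # toy examples
query = embed(["I want a mentor for deep reinforcement learning projects."])
scores = cosine_similarity(query, embed(faculty_bios))
print(scores.argsort(descending=True))                     # ranked faculty indices
```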


Optimising Parcel Handling Efficiency

Shivank Joshi, Suhani Agarwal, Tushar Goyal

CODE PDF

This project addresses the issue of time wastage in parcel collection at Plaksha due to manual searching in a large register. By automating the process of identifying recipients from shipping labels, we aim to significantly reduce the 10-minute delay per parcel. Our approach builds upon previous studies like Suh et al.'s work on shipping label recognition and validation, which uses deep neural networks to localize address areas and employs OCR for text extraction. However, our project distinguishes itself by creating an open-source dataset of Indian courier companies' shipping labels and training a model for detecting addresses and barcodes. We've used Twitter and Google to scrape shipping labels and manually annotated them. Synthetic labels were created to add diversity to our dataset. Our methodology involves fine-tuning YOLOv8 for efficient address and barcode detection, coupled with a fine-tuned GPT-4 for recognizing Indian names and addresses. The choice of YOLOv8 was driven by its object detection proficiency, and GPT-4 was selected for its large language model capabilities, allowing inference with timely, relevant outputs. Our results are measured using accuracy metrics and similarity scores, ensuring the correct association of labels with recipients and minimizing unintended matches. This approach not only enhances efficiency in parcel collection but also sets a precedent for the application of AI in logistics and recipient identification.


Optimizing Footfall, Sales and Allocation of Resources at the North Cafeteria

Atharva Sawant, Tanisha Saraf, Udhav Shankar

CODE PDF

The North Cafeteria grapples with inventory and staffing challenges, impacting financials and services. This project proposes a tailored solution for small-scale cafes, emphasizing practical applicability. Employing Machine Learning techniques, the methodology encompasses outlier identification, categorization, and pricing analysis. Initial findings reveal North Cafeteria's marginally higher pricing strategy, with a focal point on main courses and beverages. Noteworthy is the establishment's extensive menu variety and commitment to utilizing quality ingredients. The analysis reveals high-performing items, pricing correlations, and the pivotal role of time series data. The obtained results strive to optimize inventory management, elevate customer satisfaction, and institute a data-driven operational framework. Multiple machine learning models, including the Multi-Layer Perceptron (MLP), Gradient Boosting, and Random Forest, were applied, with the MLP yielding the most favorable outcome (R-squared: 0.99) and demonstrating near-accurate future predictions. Subsequent endeavors will focus on data augmentation, rigorous model validation, and the preliminary deployment of the developed solution within North Cafeteria, with a trajectory towards wider implementation across other university cafes.


PicForPos

Aman Sa, Anshika Srivastava, Anushka Desai

CODE PDF

We aim to develop an indoor navigation model using image recognition to overcome GPS limitations in indoor environments, particularly useful for guiding visitors in complex settings like universities. Unlike traditional methods that rely on landmark recognition or feature extraction to identify final coordinates, our approach simplifies navigation by integrating graph theory. We segment indoor spaces into 'zones', represented as nodes in a graph, and use graph-based algorithms for efficient pathfinding.

We collected data from the Makerspace lab using both Android and iOS smartphones, capturing videos at various times to account for differing lighting conditions. The preprocessing phase included converting these videos into frames using OpenCV at custom intervals, followed by image augmentation through PyTorch. This involved extensive use of transformations; one notable transformation handled human occlusion through synthetic silhouette placement. We utilized transfer learning with ResNet, a model chosen for its efficacy in image classification and adaptability to our requirements. Additionally, we implemented the Breadth-First Search (BFS) algorithm using NetworkX to navigate the graph representation of indoor spaces, served through a React-based GUI connected to a backend GPU server.
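
A small sketch of the graph-navigation idea: zones as nodes, walkable connections as edges, and an unweighted shortest path (BFS under the hood in NetworkX) from the recognized zone to the destination; the zone names are placeholders.

```python
import networkx as nx

campus = nx.Graph()
campus.add_edges_from([
    ("Entrance", "Lobby"), ("Lobby", "Makerspace"),
    ("Lobby", "Stairwell"), ("Stairwell", "Lab-2"), ("Makerspace", "Lab-2"),
])

current_zone = "Entrance"      # output of the image classifier on the camera frame
destination = "Lab-2"
route = nx.shortest_path(campus, source=current_zone, target=destination)  # BFS on an unweighted graph
print(" -> ".join(route))      # shortest zone sequence to follow
```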

We tested the model against campus tour videos to validate its accuracy and reliability. With the expansion of the dataset, the best model achieved 96% Validation accuracy, showing its effectiveness in precise indoor location identification.


Plaksha Happiness Index

Shaurya Sighadia, Suhani Jain

CODE PDF

The Plaksha Happiness Index Project addresses the need for a tailored approach to measuring and enhancing well-being at Plaksha University. Unlike previous studies, which broadly applied machine learning to predict happiness, this project specifically targets the university's unique environment, utilizing a combination of sentiment analysis on WhatsApp chats and structured surveys to gather data.

The methodology involves an intricate data collection process from various university sources, including administrative and academic records. Key parameters like exam schedules, facility availability, and environmental conditions were converted into numerical formats for analysis. For data pre-processing, we used normalization and feature scaling techniques alongside NLP algorithms to process textual data. Our machine learning approach integrates both regression and classification models, namely Random Forest, Gradient Boosting, and XGBoost, selected for their balance of accuracy and interpretability, considering computational constraints.

Results from this project report the performance metrics Mean Squared Error, Mean Absolute Error, and Root Mean Squared Error, indicating the models' effectiveness in predicting and understanding happiness levels within the university context. These findings are crucial for developing strategies to improve the overall campus environment, highlighting the importance of factors like academic workload, physical infrastructure, and social dynamics in influencing student and staff well-being. The project's innovative approach and significant results pave the way for more personalized and effective well-being initiatives in educational settings.


Plaksha Library Recommendation System

Nainika Gupta, Nandana N, Niranjani A

CODE PDF

The project is a Library Recommendation System for Plaksha University. It seeks to enhance academic reading among students by optimizing the library's collection through personalized book recommendations. Previous studies have explored standard recommendation systems, but our approach involves a novel Hybrid Model combining Content Filtering and Collaborative Filtering, tailored to students’ past preferences as well as the activity and history of other students.

This dataset, obtained from the Plaksha library, comprises records of books issued and returned by all the batches of UG & TLP students from December 2021 to September 2023. The data privacy statement, approved by the Library Committee and the Office of Research, ensures ethical handling of the tabular dataset, mitigating potential privacy concerns. The data was non-identifiable and required minimal pre-processing. Feature extraction was performed using web-scraping, generating features such as ‘Genre’ and ‘Rating’ of the books. KNN was performed to give insights into which other users closely resemble a test user. Further, matrix factorization was employed to provide personalized recommendations.

Memory-based filtering computes similarities between users (or items) to make a prediction. A typical approach is a neighborhood-based algorithm: a similarity measure identifies the users most similar to the target user, or the items most similar to the user's already-rated items. The predicted rating for an item is then calculated by pooling the collected ratings, possibly weighting each by its similarity value.
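
A worked numeric sketch of that neighborhood idea on a toy ratings matrix: user-user cosine similarity, then a similarity-weighted average of the neighbors' ratings for an unrated book.

```python
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Rows = users, columns = books; 0 means "not rated". Values are toy data.
R = np.array([[5, 4, 0, 1],
              [4, 5, 3, 0],
              [1, 0, 5, 4]], dtype=float)

sims = cosine_similarity(R)          # user-user similarity matrix
target_user, target_book = 0, 2      # predict user 0's rating for book 2

neighbors = [u for u in range(R.shape[0]) if u != target_user and R[u, target_book] > 0]
weights = np.array([sims[target_user, u] for u in neighbors])
ratings = np.array([R[u, target_book] for u in neighbors])
prediction = np.dot(weights, ratings) / weights.sum()
print(round(prediction, 2))          # similarity-weighted pooled rating
```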

Root mean square error is calculated for every model to judge its effectiveness, and other performance metrics are also computed. Interpreting these metrics provides insights into the system's ability to generate informed book recommendations based on academic needs and individual interests.

The hybrid recommendation system, combining Content Filtering and Collaborative Filtering, has demonstrated effectiveness in optimizing academic reading at Plaksha University. Ethical handling of non-identifiable data, web scraping for feature extraction, and the integration of KNN and matrix factorization contribute to its success. Memory-based filtering, particularly in neighborhood-based algorithms, enhances user and item similarity, ensuring accurate predictions. Evaluation metrics, including RMSE, affirm the system's proficiency. The hybrid system's personalized recommendations hold immense potential to significantly enhance academic reading, fostering a culture of enriched learning at Plaksha University.


Predicting Startup Fundings via Twitter and Market Sentiment

Anshul Rana, Nikita Thomas, Sauhard Sharma

CODE PDF

Only 1 in 12 startups obtains funding; this competitive financing landscape creates a need for startups to know the likelihood of receiving funding based on their Twitter activity. Studies have observed the effect of social media presence on a startup's success, yet they fail to account for changes in the market. We fill this gap by predicting which industries are experiencing a boom and adjusting the likelihood accordingly.

Our data is threefold: for Twitter, we scraped tweets and engagement metrics, and we used Tracxn for funding data from 300+ startups. The pre-processing phase involved lemmatisation using NLTK, identifying links, mentions, and emojis, currency conversion, and handling missing data. Using a 5-layer sequential Keras model, we calculated a sentiment score for each tweet. This deep learning model achieved an accuracy of 86.3%, with hyper-parameters such as dropout rate and L2 regularization tuned to prevent overfitting. For the first funding occurrence of a startup, we identified the factors that most affect a startup's success (e.g., sentiment scores and engagement rates) using Random Forest and LDA. Since our data is imbalanced, we use the F1 score to evaluate the model's ability to predict the likelihood of a startup receiving funding. We achieved an overall accuracy of 80%, with an F1 score of about 0.86 for the funded companies.


Predicting Taxi Cab Availability at Plaksha and Optimizing Based on User Preferences

Ankit Kumar, Noyonica Chatterjee, Surabhi Tannu

CODE PDF

Addressing the challenges of high cab fares and unpredictable journey times faced by students in urban areas, this project introduces an innovative, machine-learning-based solution. Our study fills a crucial research gap in predicting cab route times and wait times. We collected a comprehensive dataset from a leading cab service provider, comprising 400,000 entries with variables like cab type, origin, destination, departure times, and prices. This dataset underwent advanced data pre-processing using techniques such as "ColumnTransformer" and "OneHotEncoder," laying the groundwork for our machine-learning analysis.

Our methodology included three models for cab price prediction: Linear Regression, Random Forest, and KNN. The performance metrics were as follows:
- Linear Regression: MSE of 3958.30, R2 of 0.78
- Random Forest Regressor: MSE of 568.21, R2 of 0.92
- KNN Regressor: MSE of 1992.32, R2 of 0.88

The Random Forest model proved most accurate, achieving the lowest MSE and highest R2 score, demonstrating its superior ability in discerning complex pricing patterns. Although Linear Regression and KNN exhibited reasonable R2 scores, their higher MSE values indicated less precision.
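
An illustrative scikit-learn sketch of the preprocessing and three-model comparison described above, with the categorical trip fields one-hot encoded via ColumnTransformer; the CSV name and column names are assumptions based on the variables listed.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
from sklearn.pipeline import Pipeline
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error, r2_score

rides = pd.read_csv("cab_rides.csv")
categorical = ["cab_type", "origin", "destination"]
numeric = ["departure_hour", "distance_km"]
pre = ColumnTransformer([("cat", OneHotEncoder(handle_unknown="ignore"), categorical)],
                        remainder="passthrough")

X = rides[categorical + numeric]
y = rides["price"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

for name, reg in [("Linear Regression", LinearRegression()),
                  ("Random Forest", RandomForestRegressor(random_state=1)),
                  ("KNN", KNeighborsRegressor(n_neighbors=10))]:
    pipe = Pipeline([("pre", pre), ("reg", reg)]).fit(X_tr, y_tr)
    pred = pipe.predict(X_te)
    print(name, "MSE:", mean_squared_error(y_te, pred), "R2:", r2_score(y_te, pred))
```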

This project enhances the reliability and affordability of urban cab services for students, leveraging AI to predict more accurate and cost-effective travel options. Our application features a selection tool for choosing an hour and timeframe (1-3 hours), predicting times and prices at 10-minute intervals within the chosen timeframe.


PrepAI - Interview Analysis

Aditya Tyagi, Malhaar Arora, Suhani Shrivastava

CODE PDF

This project addresses the critical need for a comprehensive interview analysis tool to provide constructive feedback on various facets of performance. Leveraging the combination of a dataset of 138 mock interviews from MIT and several self-collected videos, our approach combines lexical, prosodic, and facial features for a nuanced evaluation. Previous studies have explored similar domains, yet our differentiating factor lies in deploying multiple models to predict diverse labels, including but not limited to friendliness, excitement, engagement, structured answers and calmness.

Data collection involved acoustic analysis with PRAAT for prosodic features, pre-trained CNNs for facial emotion detection, and linguistic inquiry word count for lexical features. A random forest regression was applied to the combined feature set, and the predicted values were fed into a classifier for final label classification. Additionally, an LSTM algorithm was employed on the transcript for specific labels, and a CNN+LSTM model analyzed video data for other classifications. The selection of these models was the outcome of meticulous research, involving a thorough comparison of their performance against various alternatives.

Initial results showcase high ROC AUC scores (>0.8) for predicted labels, demonstrating the efficacy of our multi-modal approach. The importance of these results lies in offering holistic feedback to interviewees through a combination of different models, enhancing the tool's accuracy.


Providing Fashion Recommendations Based on Personalized Characteristics and Fashionable Predilections

Arnab Dey, Chandan Yadav, Rishi Sharma

CODE PDF

Diverse body shapes play a pivotal role in clothing recommendations, yet existing methods often adopt a generic approach, overlooking the nuances of individual body types. Inspired by Facebook AI Research's "ViBE: Dressing for Diverse Body Shapes," our project takes on the task of implementing parts of the paper. ViBE captures clothing's affinity with different body shapes, addressing the limitations of body-agnostic vision methods. We used our limited dataset to create embeddings from four different objects. The lack of SMPLify examples barred further progress. The dataset is representative of diverse body shapes but lacks varied quantifiable parameters for clothing.

Methodologically, we leverage ViBE's principles, adapting them to our context. We learn the embedding from a smaller online catalog showcasing diverse body shapes, providing positive pairings of models and clothing. Our approach considers the complexities of clothing selection, prioritizing fit as a crucial factor in boosting wearer confidence and comfort. Through the exploration of ViBE's strategies, we aim to redefine the fashion recommendation landscape, promoting personalized and body-aware suggestions for users with diverse body shapes. This project extends beyond conventional body-agnostic methods and acts as a stepping stone for those working in fashion recommendation, offering a promising avenue for reshaping the future of inclusive and personalized e-commerce experiences.


Quantifying Classroom Learning Using a Facial Expression-Based Attention Model

Arshia Sangwan, Rahath Malladi

CODE PDF

With technology revolutionizing every sector, this project aims to support educators by providing an approximate analysis of student attention levels at various time intervals, based on facial expressions alone, thereby bridging a significant gap in real-time classroom assessment. Existing research focuses primarily on basic emotion detection using conventional methods like SVMs and CNNs, lacking the specificity and real-time applicability needed for classroom engagement. Distinctively, our project combines OpenFace for precise facial action unit recognition with deep CNNs to classify attention-specific expressions, a unique approach to attention quantification.

The model is trained on well-known datasets such as FER-2013, MMI, and UIBVFED (all sourced ethically). Further, SMOTE is applied to balance the data across emotions, and dimensionality reduction is employed to shrink the feature space. For testing, we collected on-ground, in-person classroom videos of students at Plaksha University; due consent was obtained and all privacy regulations were followed. The data was annotated using responses to a short questionnaire shared with students after each class.
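
A minimal sketch of the class-balancing step named above, using imbalanced-learn's SMOTE to oversample the minority emotion classes before training; the arrays are random stand-ins for the extracted facial features.

```python
import numpy as np
from collections import Counter
from imblearn.over_sampling import SMOTE

X = np.random.rand(200, 35)                      # e.g. facial action unit features
y = np.array([0] * 150 + [1] * 30 + [2] * 20)    # imbalanced emotion labels

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))          # each class oversampled to 150
```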

The solution pipeline integrates OpenFace for swift and accurate extraction of facial action units in classrooms, while a 16-layer deep CNN deciphers these expressions to identify the underlying emotions, which are finally translated into attention levels. The resulting pipeline is robust and designed to support better classroom learning.


Question and Answer Assistant for Plaksha's Website

Mangesh Singh, Siddharth Sahu, Tushar Garg

CODE PDF

This project develops a Q&A assistant for Plaksha University's website, employing the Retrieval-Augmented Generation (RAG) architecture to swiftly and precisely address user inquiries. The assistant ensures accurate and current responses by leveraging a knowledge store. This approach adds transparency by revealing the sources used, thereby enhancing user trust and credibility.

The initiative aims to provide instantaneous, specific information, eliminating the need for extensive website navigation and saving users time. Proficient at repetitive tasks, the assistant also contributes efficiently to data collection.

The development involves systematic scraping of the university's website, organizing data into topic-specific chunks for precise retrieval. Using embeddings from Google's PaLM and BGE-large-en from the Beijing Academy of Artificial Intelligence, these chunks are transformed into vectors capturing semantic meaning. Semantic search, employing vector similarity matching, extracts the pertinent information.
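
A hedged sketch of the retrieval step under these assumptions: website chunks are embedded once with a BGE-family encoder ('BAAI/bge-large-en'), a question is embedded at query time, and the most similar chunk is returned by cosine similarity; the chunks below are placeholders.

```python
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("BAAI/bge-large-en")

chunks = [
    "Plaksha University offers a B.Tech. program with four majors.",
    "The admissions portal opens in October each year.",
    "The campus is located in Mohali, Punjab.",
]
chunk_vecs = encoder.encode(chunks, convert_to_tensor=True)

question = "When do applications open?"
query_vec = encoder.encode(question, convert_to_tensor=True)
scores = util.cos_sim(query_vec, chunk_vecs)[0]
best = scores.argmax().item()
print(chunks[best])            # retrieved context passed to the generator
```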

Exploring diverse retrieval strategies, including self-querying and hybrid systems, the project evaluates performance using ROUGE-1 scores against 'gold responses', as well as semantic similarity. This approach ensures a streamlined, transparent, and efficient Q&A system, advancing user interaction and data retrieval.


Smart Whatsapp Chatbot for timely Campus Problems Redressal

Asmi Gulati, Bhuvi Jain, Tanya Sravan

CODE PDF

In response to the challenges faced during our university's expansion and the lack of defined channels for issue resolution, this system categorizes student-reported issues and guides them towards resolution. Based on several studies of CNNs, RNNs, and Transformers in this domain, we employed a Transformer encoder classifier for its proficiency in processing sequential data and contextual understanding. This approach benefits from the encoder's ability to model long-range dependencies in text and to process sequences concurrently. Our methodology involves a two-phase problem-solving approach: detecting texts related to campus issues, and then enabling the chatbot to respond with domain-specific solutions and reporting channels based on a knowledge base and conversation context.

Data collection involved extracting and preprocessing text from university chat logs, followed by labeling using Google's generative AI package to discern campus-related problems. Our Transformer Encoder Classifier, a deep neural network with six layers, utilizes multi-head attention and an embedding dimension of 300. This setup, including a feed-forward network with a hidden layer size of 50 and dropout regularization, aims to reduce overfitting while recognizing complex patterns.
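
A rough PyTorch sketch of the described classifier: a six-layer Transformer encoder with embedding dimension 300, feed-forward width 50, and dropout, followed by mean pooling and a classification head. The head count, vocabulary size, and binary label set are assumptions.

```python
import torch
import torch.nn as nn

class IssueClassifier(nn.Module):
    def __init__(self, vocab_size=20000, d_model=300, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=6,
                                           dim_feedforward=50, dropout=0.1,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=6)
        self.head = nn.Linear(d_model, num_classes)

    def forward(self, token_ids):                 # (batch, seq_len) of token ids
        x = self.encoder(self.embed(token_ids))   # (batch, seq_len, d_model)
        return self.head(x.mean(dim=1))           # mean-pool tokens, then classify

model = IssueClassifier()
logits = model(torch.randint(0, 20000, (4, 32)))  # dummy batch of 4 messages
print(logits.shape)                               # torch.Size([4, 2])
```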

Our results demonstrate an accuracy of 85.4% in identifying relevant campus issues, highlighting the chatbot's potential in automated issue detection and classification. This project stands to enhance the responsiveness of campus authorities and improve the overall student experience.


Social Media Simulator to Optimise Digital Marketing Campaigns: An Integrated Approach Using ML and Large Language Models

Devesh Shah, Vedika Agarwal, Yash Shrivastav

CODE PDF

In our project, we introduce a dynamic Social Media Simulator for organizations like Plaksha University to predict responses to posts. Unlike traditional methods using limited real user data or exclusive Large Language Model-based artificial users, our approach combines scraped tweets for authentic user personas. These personas form the basis for a diverse user network created using LLM techniques.

Our methodology uses both LSTM and Random Forest classifiers to predict user actions (likes and replies) based on personas, enhancing predictive accuracy. The dataset, compiled by scraping tweets from Twitter, undergoes NLP pre-processing, including tokenization, stemming, and stop-word removal. User personas are crafted based on average sentiments, follower statistics, common interaction topics, and profile descriptions.

The classifiers are trained to predict actions, while an LLM predicts potential comments, analyzing their sentiments. This dynamic, data-driven approach advances simulating and comprehending social media interactions. Currently, our classifiers boast 85% and 89% accuracies in predicting user responses, showcasing our methodology's effectiveness. This accuracy demonstrates how publicly available user data can precisely predict user actions and potential comments, empowering companies to assess and optimize their social media presence.


Utilizing Computer Vision for Building a Basketball Shooting Pose Classifier

Arvind Kumar, Eshwar SK, Vikas Kumar

CODE PDF

Biomechanical inconsistencies in basketball shooting forms contribute to erratic results among players, both novice and seasoned. Addressing this issue, we propose a Computer Vision (CV)-based pose classifier to analyze and enhance shooting techniques. Drawing inspiration from previous works on sports analytics and computer vision, we exclusively leverage computer vision and machine learning to categorize and offer personalized recommendations for refining shooting styles. Existing models often focus on shot quality without a specific biomechanical diagnosis. Our approach employs MediaPipe Pose Detection, a computer vision library, to precisely locate key body points during a basketball shot. Key features, such as shooting hand angle, release point height, alignment of shoulders, hips, and feet, elevation, balance, and elbow position, are extracted for comprehensive analysis. The dataset is normalized, and features are not reduced, as all identified components are deemed equally crucial to shooting motion. We're actively implementing algorithms like K-means Clustering for comparative analysis, and CNNs for image-based classification. We're also exploring RNNs and LSTMs for sequential data. Our approach combines proven methods with ongoing experimentation for improved predictive models. In our basketball shot classification experiments: SVM achieves 0.73 accuracy, 0.76 precision, and an F1 score of 0.85; Random Forest attains 0.85 accuracy, 0.86 precision, and an F1 score of 0.90. This research contributes to advancing the understanding and enhancement of basketball shooting techniques through a comprehensive biomechanical analysis facilitated by computer vision and machine learning methodologies.
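
A small worked sketch of one feature named above, the shooting-hand elbow angle, computed from three MediaPipe Pose landmarks (right shoulder, elbow, wrist); the coordinates are dummy values.

```python
import numpy as np

def joint_angle(a, b, c):
    """Angle at point b (degrees) formed by segments b->a and b->c."""
    a, b, c = map(np.asarray, (a, b, c))
    ba, bc = a - b, c - b
    cos = np.dot(ba, bc) / (np.linalg.norm(ba) * np.linalg.norm(bc))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Right shoulder (index 12), right elbow (14), right wrist (16) in normalized coords.
shoulder, elbow, wrist = (0.55, 0.40), (0.60, 0.55), (0.58, 0.70)
print(round(joint_angle(shoulder, elbow, wrist), 1))   # elbow angle in degrees
```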


Utilizing Multi-Modal Cues for Detection of Mental Depression

Aanya Patil, Sanidhya Singh, Sarthak Sachdev

CODE PDF

The prevalence of depression as a widespread societal concern necessitates advanced approaches for effective diagnosis and treatment. Previous research primarily utilized deep learning to detect depression, focusing on facial expressions or voice. In this study, we address the critical need for reliable and efficient depression assessment by developing a deep learning model that integrates text, video, and voice signals to enhance the accuracy of depression detection beyond existing methodologies.

The DAIC-WOZ dataset, which includes audio and video recordings and questionnaire responses, was acquired from the University of Southern California by requesting access. For audio data, the COVAREP toolbox extracts features such as fundamental frequency, voice quality metrics like NAQ and QOQ, and spectral features including MCEP and HMPDM; Librosa was also used for manual audio feature extraction attempts. These tools were chosen for their effectiveness in capturing key audio characteristics. The textual data underwent stop-word removal to discard unnecessary words and vectorisation so it could be fed into the neural network. For video data, the Constrained Local Neural Fields framework tracks and extracts facial landmarks, action units, head pose, and gaze direction. SVM (RBF kernel), Random Forest, XGBoost, and CatBoost were applied to the three modalities separately, followed by weight optimization to arrive at a final consolidated model combining Random Forest and XGBoost. These algorithms were employed for their effectiveness in capturing complex decision boundaries in high-dimensional feature spaces.
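
A hedged sketch of the final fusion idea: per-modality probabilities from a Random Forest and an XGBoost model combined with a weight that would be tuned on held-out data. The features, labels, and weight here are synthetic stand-ins, not the study's values.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(0)
X_audio, X_video = rng.random((120, 20)), rng.random((120, 30))
y = rng.integers(0, 2, 120)                       # toy depressed / not-depressed labels

rf = RandomForestClassifier(random_state=0).fit(X_audio[:100], y[:100])
xgb = XGBClassifier().fit(X_video[:100], y[:100])

# Weighted soft vote; w would be chosen on a validation split rather than fixed.
w = 0.6
proba = w * rf.predict_proba(X_audio[100:])[:, 1] + (1 - w) * xgb.predict_proba(X_video[100:])[:, 1]
print("F1:", f1_score(y[100:], (proba > 0.5).astype(int)))
```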

With this approach, the final model achieves an accuracy of 0.77, precision of 0.82, recall of 0.85, and F1-score of 0.84, indicating reasonable performance. An LSTM with gating (sentence-level) yielded an F1 score of 0.68 with an accuracy of 0.66 for our use case; these metrics are not yet good enough to be trustworthy for real-world deployment.


Utilizing Smartphone Sensor Data for Student's Life Analysis

Nishant Mahajan, Prashant Tiwari, Vansh Gupta

CODE PDF

This project addresses the need to predict day-to-day activities using smartphone sensor data, including the accelerometer and gyroscope, while also assessing individual discipline, consistency, and time management. Existing studies have predominantly focused on forecasting physical activities and overlooked these nuanced behavioural aspects. Differing from prior research, this endeavour aims to predict diverse activities and infer individual attributes such as discipline, consistency, dedication, and activity-specific time allocation through sensor data analysis. Data from 20 students, encompassing walking, running, and relaxation, underwent preprocessing for noise reduction and feature extraction, complemented by a Kaggle dataset for activity prediction. We evaluated several models, including logistic regression, LSTM, and XGBoost, and achieved the best result with XGBoost at 94% accuracy on the test set. This project highlights the potential of utilizing sensor data to discern not only activities but also crucial behavioral attributes, holding profound implications for understanding and analyzing human behavior and time management through smartphone sensors.


WattFlow

Rachit Gupta, Samhitha Samishetty, Shreyas Kannan

CODE PDF

This project aims to optimize energy consumption in sewage treatment plants (STPs), a critical operational metric affecting costs and environmental impact, aligning with SDG goals of sustainable infrastructure through advanced ML forecasting methods. Recent academic enquiries into STP energy forecasting have employed many machine learning techniques, focusing on features such as Biological Oxygen Demand (BOD) and Chemical Oxygen Demand (COD) to enhance predictive accuracy, operational efficiency, and environmental sustainability, as evidenced by performance metrics such as R² and RMSE.

However, very little literature is currently available on energy optimization studies in Indian WWTPs and STPs. Through this study, the authors aim to pioneer an ML approach for the sustainable operation of STPs specific to the Indian ecosystem, where instances of up to 72% of sewage being discharged untreated have been reported. The data from Plaksha University's STP, procured from the Projects team, was originally recorded in logbooks; it was photographed and manually converted into usable digital data. The ML component employed Random Forest among other methods, chosen for their robust handling of mixed data types and efficiency with smaller datasets. Model performance is evaluated across combinations of ensemble methods to screen for the methods with the highest relative accuracy on the testing data.


YogaPal

Mayank Mor, Priyansh Desai, Rushiraj Gadhvi

CODE PDF

The project addresses the prevalent challenges of incorrect yoga postures due to a lack of accessible guidance, leading to sub-optimal results amidst busy schedules. Prior studies focused on classifying yoga poses or assessing user form quality, but failed to integrate both features and suggest corrections. This project distinguishes itself by not only classifying poses but also offering corrective suggestions, a unique approach unseen in previous studies. Data collection involved utilizing OpenSource Datasets and capturing in-house data through a Raspberry Pi with 4 cameras, enhancing model robustness from diverse visual perspectives. Pre-processing involved landmark selection, normalization, and visibility filtering, removing unnecessary landmarks and ensuring pose invariance to size and distance. Feature reduction using PCA was also implemented. The choice of these algorithms stemmed from their ability to refine pose landmarks, normalize coordinates, and enhance model reliability. The ML component utilized a 3-layer model. The results demonstrated promising precision, recall, and F1-scores across various yoga poses, showcasing the model's efficacy in identifying and correcting postures. These metrics signify the solution's potential in addressing the identified need by empowering users to perform yoga effectively without formal instruction, ensuring correct poses, and improving outcomes amidst busy lifestyles.