This class involves a project that serves several purposes. First, it gives you the opportunity to explore in depth
the multidimensional (pun intended) facets of the Machine Learning and Pattern Recognition course.
Second, it supports the development of your critical thinking and hands-on application skills; in my opinion,
this is one of the primary goals of university education.
Students will form groups of two or three (Remember: one is a maverick, two is a pair, three is a team, and
four is a crowd) to undertake a project. The project must be inspired by a real-world problem that could
potentially be solved by developing a machine learning solution; the instructor will help the students identify
the problem. Students are expected to work with the instructor during the first three weeks to identify a problem
for each group, the hardware and/or software resources that would be needed, and the methodologies that may be
required to work on it. Students are also encouraged to approach other faculty members to access data from their
research for this project.
Midterm Project Evaluation Criteria
Problem Statement
What is the problem you are trying to solve and why? What are the potential applications that will come out
of the developed solution? What will be the potential impact of the solution?
Literature Survey
Survey what other research groups, labs, and industry have done to solve the above problem. What solutions
came out of those studies? How do you plan to take these solutions further, fix their shortcomings,
extend them to your application, or improve their performance?
Dataset and Features Preprocessing
- If you collected the dataset yourself: what considerations did you take into account during data collection,
how was the data collected, were there any ethical concerns (if so, how did you address them), how many features
and data points are there in your dataset, etc.
- If you did not collect the dataset yourself: what is the nature of the dataset and why did you choose that
particular dataset, how was the data collected by its authors, were there any ethical concerns (if so,
how did they address them), how many features and data points are there in that dataset, etc.
In both cases, how did you pre-process the features? For example, were there any missing feature values, and
how did you impute them (e.g., by regression)? Did your dataset require dimensionality reduction using PCA,
LDA, etc.? Did you use any algorithm to determine which features are more important than others,
or whether you will need to collect/procure more data to obtain more valuable features?
Please use plenty of plots, other visualizations, and quantitative results in this section
of your presentation. This will give the instructor an idea of the depth of your work so far and of your thinking.
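As one illustration of the preprocessing questions above, the following is a minimal sketch, assuming NumPy is available, of counting missing values, filling them in, and reducing dimensionality with PCA. It uses column-mean imputation as a simple baseline rather than the regression-based imputation mentioned above, and the toy array `X_raw` and the helper names `impute_column_means` and `pca` are hypothetical. Your own project may well call for different choices.

```python
import numpy as np

def impute_column_means(X):
    """Replace each NaN with the mean of the observed values in its column."""
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)        # per-feature mean, ignoring NaNs
    rows, cols = np.where(np.isnan(X))       # positions of the missing entries
    X[rows, cols] = col_means[cols]
    return X

def pca(X, n_components):
    """Project X onto its top principal components via SVD.

    Returns the projected data and the explained-variance ratio of each
    retained component.
    """
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance_ratio = (S ** 2) / np.sum(S ** 2)
    return X_centered @ Vt[:n_components].T, explained_variance_ratio[:n_components]

# Hypothetical toy data: 5 samples, 3 features, two missing entries.
X_raw = np.array([
    [1.0, 2.0, np.nan],
    [2.0, 4.1, 0.5],
    [3.0, np.nan, 0.7],
    [4.0, 8.2, 0.2],
    [5.0, 9.9, 0.6],
])

print("Missing values per feature:", np.isnan(X_raw).sum(axis=0))
X_filled = impute_column_means(X_raw)
X_reduced, ratios = pca(X_filled, n_components=2)
print("Reduced shape:", X_reduced.shape)
print("Explained variance ratio:", ratios)
```

Reporting the per-feature missing counts and the explained-variance ratios is the kind of quantitative evidence this section asks for. Note that features on very different scales would normally be standardized before PCA, which this sketch omits.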
Proposed ML Methodology
How are you planning to use ML on your project between the midterm and the endterm? Which algorithms do you think
would be apt for tackling the problem statement of your project and why? Has any work already been done to this end
and, if so, are there any preliminary results? What are the possible challenges that you foresee in your project
(hardware/software, dataset availability, algorithmic complexity, etc.) and how do you think you will tackle them?
Note: All members of a project group must be present during the group presentation, which should be delivered jointly.
Additionally, your midterm presentation must not exceed 8 minutes, and we will allow a maximum of 4 minutes for Q&A.
Endterm Project Evaluation Criteria
Problem Statement
What is the problem you are trying to solve and why? What are the potential applications that will come out of the developed
solution? What will be the potential impact of the solution?
Literature Survey
Survey what other research groups, labs, and industry have done to solve the above problem. What solutions
came out of those studies? How do you plan to take these solutions further, fix their shortcomings,
extend them to your application, or improve their performance?
Dataset and Features Preprocessing
- If you collected the dataset yourself: what considerations did you take into account during data collection, how was
the data collected, were there any ethical concerns (if so, how did you address them), how many features and data points
are there in your dataset, etc.
- If you did not collect the dataset yourself: what is the nature of the dataset and why did you choose that particular
dataset, how was the data collected by its authors, were there any ethical concerns (if so, how did they address them),
how many features and data points are there in that dataset, etc.
In both cases, how did you pre-process the features? For example, were there any missing feature values, and how did you
impute them (e.g., by regression)? Did your dataset require dimensionality reduction using PCA, LDA, etc.?
Did you use any algorithm to determine which features are more important than others, or whether you would need to
collect/procure more data to obtain more valuable features?
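To make the question about feature importance concrete, here is a minimal sketch, assuming NumPy is available, that ranks features by the absolute Pearson correlation between each feature and the target. This is only one simple, illustrative choice: it captures linear relationships only, and the synthetic data and the helper name `rank_features_by_correlation` are hypothetical.

```python
import numpy as np

def rank_features_by_correlation(X, y, names):
    """Return (name, |Pearson r|) pairs sorted from strongest to weakest."""
    scores = []
    for j, name in enumerate(names):
        r = np.corrcoef(X[:, j], y)[0, 1]    # correlation of feature j with the target
        scores.append((name, abs(float(r))))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)

# Hypothetical synthetic data: the target depends on "signal" but not on "noise".
rng = np.random.default_rng(0)
n_samples = 200
signal = rng.normal(size=n_samples)
noise = rng.normal(size=n_samples)
X = np.column_stack([signal, noise])
y = 3.0 * signal + 0.1 * rng.normal(size=n_samples)

ranking = rank_features_by_correlation(X, y, ["signal", "noise"])
for name, score in ranking:
    print(f"{name}: |r| = {score:.3f}")
```

A ranking like this can support an argument for dropping weak features or for collecting more informative ones, but nonlinear dependencies would need a different measure.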
ML Methodology
Which ML methods did you use to work on the above problem statement and why? How do these models work? What were the
challenges that you faced in your project (hardware/software, dataset availability, algorithmic complexity, etc.) and
how did you tackle them?
Performance Metrics and Deployability of the ML solution
What performance metrics did you use and what values did you achieve? How do these performance metrics show that your
solution works? Can the solution be deployed at Plaksha to solve the problem you have chosen? If so, how? What challenges
might the deployed solution face as it scales up?
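As an example of reporting performance metrics, the sketch below computes accuracy, precision, recall, and F1 for a binary classifier from first principles. It assumes a binary classification task, which may not match your project, and the labels in `y_true` and `y_pred` are made-up illustrative values, not results.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall, and F1 as a dictionary."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Made-up labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

metrics = binary_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

Reporting several metrics together matters because accuracy alone can look good on an imbalanced dataset even when the minority class is rarely predicted correctly.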
Note: All members of a project group must be present during the group presentation, which should be delivered jointly.
Additionally, your endterm presentation must not exceed 10 minutes, and we will allow a maximum of 5 minutes for Q&A.
Please understand that since this is a group project, every member of the group is expected to know about every aspect of the
project when the jury asks questions. Please do not say later that, for example, “I was only in charge of data analysis and
someone else was in charge of the literature survey.”