NOC | Scalable Data Science

Course abstract

One is interested in computing summary statistics (word count distributions) for sets of words that occur in the same document across the entire Wikipedia collection (5 million documents). Naïve techniques will run out of main memory on most computers.

One needs to train an SVM classifier for text categorization, with unigram features, for hundreds of classes. Storing the uncompressed model parameters would exhaust main memory.

One is interested in learning a supervised model or finding unsupervised patterns, but the data is distributed over multiple machines. Since communication is the bottleneck, naïve adaptations of existing algorithms to such a distributed setting may perform extremely poorly.

In all the above situations, a simple data mining or machine learning task has become more complicated because of the large scale of the input data, the output results, or both. In this course, we discuss algorithmic techniques as well as software paradigms that allow one to develop scalable algorithms and systems for common data science tasks.

Course Instructor

Prof. Anirban Dasgupta

Anirban Dasgupta is currently an Associate Professor of Computer Science & Engineering at IIT Gandhinagar. Prior to this, he was a Senior Scientist at Yahoo! Labs Sunnyvale. Anirban works on algorithmic problems for massive data sets, large scale machine learning, analysis of large social networks and randomized algorithms in general. He did his undergraduate studies at IIT Kharagpur and doctoral studies at Cornell University. He has also received the Google Faculty Research Award (2015), the Cisco University grant (2016), and the ICDT Best Newcomer Award (2016).

Prof. Sourangshu Bhattacharya

Sourangshu Bhattacharya is an Assistant Professor in the Department of Computer Science and Engineering, IIT Kharagpur. He was a Scientist at Yahoo! Labs from 2008 to 2013, where he worked on click-through rate prediction, ad targeting, and related problems on the Rightmedia display ads exchange. He was a visiting scholar at the Helsinki University of Technology from January to May 2008. He received a B.Tech. in Civil Engineering from I.I.T. Roorkee in 2001, an M.Tech. in Computer Science from I.S.I. Kolkata in 2003, and a Ph.D. in Computer Science from the Department of Computer Science & Automation, IISc Bangalore, in 2008. He has many publications in top conferences and journals, including ICML, NIPS, WWW, ICDM, and CIKM. His current research interests include modeling influence in social networks, distributed machine learning, and representation learning.

Teaching Assistant(s)

No teaching assistant data is available for this course yet.

Course Duration: Sep-Nov 2020


Enrollment: 20-May-2020 to 21-Sep-2020

Exam registration: 14-Sep-2020 to 02-Nov-2020

Exam Date: 18-Dec-2020

Legend

Certificate eligibility: average assignment score >= 10/25 AND exam score >= 30/75 AND final score >= 40.
Based on the final score, certificates are awarded as follows:
>= 90 - Elite + Gold
75-89 - Elite + Silver
60-74 - Elite
40-59 - Successfully Completed
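The eligibility rule and certificate bands above can be expressed as a small classification function. This is a minimal sketch, not NPTEL's own code: it assumes the assignment and exam scores are already scaled to /25 and /75 as written in the rule, that the final score is their sum (out of 100), and that no rounding is applied, so a non-integer score such as 89.5 falls into the band given by its lower bound. The function name `certificate_category` and the "Not eligible" label are hypothetical.

```python
def certificate_category(assignment: float, exam: float) -> str:
    """Classify a result under the certificate criteria (illustrative sketch).

    assignment: average assignment score, assumed already scaled to /25
    exam: exam score, assumed already scaled to /75
    """
    # Assumption: final score (out of 100) is the sum of the two scaled parts.
    final = assignment + exam

    # Eligibility: all three thresholds must hold.
    if assignment < 10 or exam < 30 or final < 40:
        return "Not eligible"  # hypothetical label, not from the course page

    # Bands are checked from the top down, using lower bounds only (no rounding).
    if final >= 90:
        return "Elite + Gold"
    if final >= 75:
        return "Elite + Silver"
    if final >= 60:
        return "Elite"
    return "Successfully Completed"
```

For example, 18.75/25 on assignments and 45/75 on the exam give a final score of 63.75, which this sketch places in the "Elite" band.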

Final Score Calculation Logic

Assignment Score = Average of best 6 out of 8 assignments.
Final Score (Score on Certificate) = 75% of Exam Score + 25% of Assignment Score
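The two formulas above can be sketched in Python as follows. This is an illustrative reading, not the course's official implementation: it assumes each of the 8 assignment scores and the exam score are on a 0-100 scale, that a missed assignment is entered as 0, and that no rounding is applied. The function names `assignment_score` and `final_score` are hypothetical.

```python
def assignment_score(scores: list[float], best: int = 6) -> float:
    """Average of the best `best` assignment scores (each assumed to be on 0-100)."""
    # Sort descending and keep the top `best` scores; dividing by `best`
    # (rather than by how many scores exist) treats any shortfall as 0.
    top = sorted(scores, reverse=True)[:best]
    return sum(top) / best


def final_score(assignment_scores: list[float], exam: float) -> float:
    """Final Score = 75% of Exam Score + 25% of Assignment Score (both assumed 0-100)."""
    return 0.75 * exam + 0.25 * assignment_score(assignment_scores)


if __name__ == "__main__":
    scores = [80, 90, 70, 60, 100, 50, 40, 0]  # hypothetical scores for 8 assignments
    print(assignment_score(scores))  # best six: 100, 90, 80, 70, 60, 50 -> 75.0
    print(final_score(scores, 60))   # 0.75 * 60 + 0.25 * 75.0 -> 63.75
```

Under this reading, the two weighted parts are exactly the /75 exam and /25 assignment quantities used in the certificate eligibility rule.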

Scalable Data Science - Toppers list

SOHOM CHAKRABORTY 63%

M AKASH KUMAR 63%

Indian Institute of Technology, Madras

BREYOLIN A S FEBI 63%

MEPCO SCHLENK ENGINEERING COLLEGE

Enrollment Statistics

Total Enrollment: 2702

Registration Statistics

Total Registration: 26

Assignment Statistics

Score Distribution Graph - Legend

Assignment Score: Distribution of average scores garnered by students per assignment.

Exam Score: Distribution of the final exam score of students.

Final Score : Distribution of the combined score of assignments and final exam, based on the score logic.