Course Name: Scalable Data Science

Course abstract

Consider three scenarios:

  • One is interested in computing summary statistics (word count distributions) for a set of words that occur in the same document in the entire Wikipedia collection (5 million documents). Naive techniques will run out of main memory on most computers.
  • One needs to train an SVM classifier for text categorization, with unigram features for hundreds of classes. Storing the uncompressed model parameters would exhaust main memory.
  • One is interested in learning a supervised model or finding unsupervised patterns, but the data is distributed over multiple machines. With communication as the bottleneck, naive methods of adapting existing algorithms to such a distributed setting might perform extremely poorly.

In all of the above situations, a simple data mining or machine learning task has been made more complicated by the large scale of the input data, the output results, or both. In this course, we discuss algorithmic techniques as well as software paradigms that allow one to develop scalable algorithms and systems for common data science tasks.


Course Instructor


Prof. Anirban Dasgupta

Anirban Dasgupta is currently an Associate Professor of Computer Science & Engineering at IIT Gandhinagar. Prior to this, he was a Senior Scientist at Yahoo! Labs Sunnyvale. Anirban works on algorithmic problems for massive data sets, large scale machine learning, analysis of large social networks and randomized algorithms in general. He did his undergraduate studies at IIT Kharagpur and doctoral studies at Cornell University. He has also received the Google Faculty Research Award (2015), the Cisco University grant (2016), and the ICDT Best Newcomer Award (2016).

Prof. Sourangshu Bhattacharya

Sourangshu Bhattacharya is an Assistant Professor in the Department of Computer Science and Engineering, IIT Kharagpur. He was a Scientist at Yahoo! Labs from 2008 to 2013, where he worked on problems such as click-through rate prediction and ad targeting on the Rightmedia display ads exchange. He was a visiting scholar at the Helsinki University of Technology from January to May 2008. He received a B.Tech. in Civil Engineering from I.I.T. Roorkee in 2001, an M.Tech. in Computer Science from I.S.I. Kolkata in 2003, and a Ph.D. in Computer Science from the Department of Computer Science & Automation, IISc Bangalore in 2008. He has many publications in top conferences and journals, including ICML, NIPS, WWW, ICDM, and CIKM. His current research interests include modeling influence in social networks, distributed machine learning, and representation learning.

Teaching Assistant(s)

No teaching assistant data available for this course yet
Course Duration: Jul-Sep 2021

Enrollment: 20-May-2021 to 02-Aug-2021

Exam registration: 17-Jun-2021 to 20-Aug-2021

Exam Date: 26-Sep-2021

Enrolled: 1753

Registered: 42

Certificate Eligible: 16

Certified Category Count

  • Gold: 0
  • Silver: 1
  • Elite: 3
  • Successfully completed: 12
  • Participation: 17

Legend

Certificate eligibility: AVERAGE ASSIGNMENT SCORE >= 10/25 AND EXAM SCORE >= 30/75 AND FINAL SCORE >= 40
Based on the final score, the certificate criteria are as below:
>=90 - Elite + Gold
75-89 - Elite + Silver
60-74 - Elite
40-59 - Successfully Completed
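
The eligibility rule and score bands above can be sketched as a small function. This is a minimal sketch, not the platform's actual implementation: the function name `certificate_category`, the choice to return `None` for a learner who does not meet the eligibility rule, and the reading of the bands as half-open intervals (so a fractional score such as 89.5 would fall under Elite + Silver) are all assumptions made for illustration.

```python
from typing import Optional


def certificate_category(assignment_score: float,
                         exam_score: float,
                         final_score: float) -> Optional[str]:
    """Map scores to a certificate category (illustrative sketch).

    assignment_score: average assignment score on a 0-25 scale.
    exam_score: exam score on a 0-75 scale.
    final_score: final score on a 0-100 scale.
    """
    # Eligibility: all three thresholds from the legend must hold.
    eligible = (assignment_score >= 10
                and exam_score >= 30
                and final_score >= 40)
    if not eligible:
        # Assumption: the page does not name this case, so return None.
        return None

    # Bands are checked from the highest down, so each check only
    # needs a lower bound.
    if final_score >= 90:
        return "Elite + Gold"
    if final_score >= 75:
        return "Elite + Silver"
    if final_score >= 60:
        return "Elite"
    return "Successfully Completed"
```

Note that a high final score alone is not enough under this reading: a learner with a final score of 65 but an average assignment score below 10/25 would not be eligible.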

Final Score Calculation Logic

  • Assignment Score = Average of the best 6 out of 8 assignments.
  • Final Score (Score on Certificate) = 75% of Exam Score + 25% of Assignment Score
    Note: We have taken the best assignment score from both the July 2020 and July 2021 courses.
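
The calculation above can be sketched in a few lines. This is a hedged illustration rather than the platform's own code: the function names are hypothetical, all scores are assumed to be percentages on a 0-100 scale, and unsubmitted assignments are assumed to be entered as 0.

```python
from typing import Sequence


def assignment_score(scores: Sequence[float], best_n: int = 6) -> float:
    """Average of the best `best_n` assignment scores (0-100 scale)."""
    if len(scores) < best_n:
        # Assumption: missing assignments should be passed in as 0.
        raise ValueError(f"expected at least {best_n} scores")
    best = sorted(scores, reverse=True)[:best_n]
    return sum(best) / best_n


def final_score(assignment_scores: Sequence[float],
                exam_score: float) -> float:
    """75% of the exam score plus 25% of the assignment score."""
    return 0.75 * exam_score + 0.25 * assignment_score(assignment_scores)


# Eight assignments, two of them not submitted (entered as 0).
scores = [100, 90, 80, 70, 60, 50, 0, 0]
print(assignment_score(scores))   # average of the best six
print(final_score(scores, 80))    # weighted final score
```

Because only the best 6 of 8 count, the two zeros in the example do not lower the assignment score.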

Scalable Data Science - Toppers list

SOHOM CHAKRABORTY 76%

DIPESH TANDEL 68%

INDIAN INSTITUTE OF TECHNOLOGY,MADRAS

Assignment Statistics

[Score distribution graphs: Assignment score, Exam score, Final score]

Score Distribution Graph - Legend

Assignment Score: Distribution of average scores garnered by students per assignment.
Exam Score: Distribution of the final exam score of students.
Final Score: Distribution of the combined score of assignments and final exam, based on the score logic.