NOC | Scalable Data Science

Course abstract

Consider three situations. First, one is interested in computing summary statistics (word count distributions) for sets of words that occur in the same document across the entire Wikipedia collection (5 million documents). Naive techniques will run out of main memory on most computers. Second, one needs to train an SVM classifier for text categorization, with unigram features for hundreds of classes. One would run out of main memory if the uncompressed model parameters were stored there. Third, one is interested in learning a supervised model or finding unsupervised patterns, but the data is distributed over multiple machines. Since communication is the bottleneck, naïve methods of adapting existing algorithms to such a distributed setting might perform extremely poorly.

In all of the above situations, a simple data mining or machine learning task has been made more complicated by the large scale of the input data, the output results, or both. In this course, we discuss algorithmic techniques as well as software paradigms that allow one to develop scalable algorithms and systems for common data science tasks.

Course Instructor

Anirban Dasgupta

Anirban Dasgupta is currently an Associate Professor of Computer Science & Engineering at IIT Gandhinagar. Prior to this, he was a Senior Scientist at Yahoo! Labs Sunnyvale. Anirban works on algorithmic problems for massive data sets, large scale machine learning, analysis of large social networks and randomized algorithms in general. He did his undergraduate studies at IIT Kharagpur and doctoral studies at Cornell University. He has also received the Google Faculty Research Award (2015), the Cisco University grant (2016), and the ICDT Best Newcomer Award (2016).

Sourangshu Bhattacharya

Sourangshu Bhattacharya is an Assistant Professor in the Department of Computer Science and Engineering, IIT Kharagpur. He was a Scientist at Yahoo! Labs from 2008 to 2013, where he worked on click-through rate prediction, ad targeting to customers, etc., on the Rightmedia display ads exchange. He was a visiting scholar at the Helsinki University of Technology from January to May 2008. He received a B.Tech. in Civil Engineering from I.I.T. Roorkee in 2001, an M.Tech. in Computer Science from I.S.I. Kolkata in 2003, and a Ph.D. in Computer Science from the Department of Computer Science & Automation, IISc Bangalore in 2008. He has many publications in top conferences and journals, including ICML, NIPS, WWW, ICDM, and CIKM. His current research interests include modeling influence in social networks, distributed machine learning, and representation learning.

Teaching Assistant(s)

Soumi Das

M.Sc Computer Science, Banaras Hindu University

IITKGP

Jayesh Choudhari

PhD Computer Science

IITKGP

Course Duration : Aug-Sep 2018

Enrollment : 18-Apr-2018 to 06-Aug-2018

Exam registration : 25-Jun-2018 to 28-Aug-2018

Exam Date : 07-Oct-2018

Enrolled : 5266

Registered : 311

Certificate Eligible : 166

Certified Category Count

Gold

Silver

Elite

Successfully completed

118

Legend

>=90 - Elite + Gold
60-89 - Elite
40-59 - Successfully Completed
<40 - No Certificate

Final Score Calculation Logic

Assignment Score = Average of best 6 out of 8 assignments.
Final Score (Score on Certificate) = 75% of Exam Score + 25% of Assignment Score.
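
The scoring logic and the legend above can be sketched in a few lines of Python. This is only an illustrative sketch, not the portal's actual implementation: the function names are hypothetical, all scores are assumed to be percentages on a 0-100 scale, and fractional scores are assumed to fall into the band whose lower bound they meet (for example, 89.5 is treated as Elite).

```python
def assignment_score(scores):
    """Average of the best 6 out of 8 assignment scores (assumed 0-100 each)."""
    if len(scores) != 8:
        raise ValueError("expected exactly 8 assignment scores")
    best_six = sorted(scores, reverse=True)[:6]
    return sum(best_six) / 6


def final_score(exam, assignments):
    """Final score = 75% of exam score + 25% of assignment score."""
    return 0.75 * exam + 0.25 * assignment_score(assignments)


def certificate_category(score):
    """Map a final score to the legend's certificate category.

    Assumption: each band is applied as a lower bound (>= 90, >= 60, >= 40).
    """
    if score >= 90:
        return "Elite + Gold"
    if score >= 60:
        return "Elite"
    if score >= 40:
        return "Successfully Completed"
    return "No Certificate"


if __name__ == "__main__":
    # Hypothetical candidate: two missed assignments are dropped by best-6-of-8.
    assignments = [100, 90, 80, 70, 60, 50, 0, 0]
    score = final_score(exam=80, assignments=assignments)
    print(score, certificate_category(score))
```

Because only the best six assignments count, a candidate can miss up to two assignments without lowering the assignment component.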

Scalable Data Science - Toppers list

Top 1 % of Certified Candidates

PARTHKUMAR BHANUBHAI TRIVEDI 94%

GOVERNMENT ENGINEERING COLLEGE, BHAVNAGAR

DIPJYOTI BISHARAD 85%

NOKIA

Top 2 % of Certified Candidates

KURMA SAI SREE BHARGAV 84%

INDIAN INSTITUTE OF TECHNOLOGY TIRUPATI

Top 5 % of Certified Candidates

PARTH PATWA 81%

INDIAN INSTITUTE OF INFORMATION TECHNOLOGY, SRI CITY

MIHIR SHAH 77%

DHIRUBHAI AMBANI INSTITUTE OF INFORMATION AND COMMUNICATION TECHNOLOGY

NANDURI NAGA RAMANA SAI SRI LAKSHMAN TARUN 77%

PRAGATI ENGINEERING COLLEGE

USHARANI 77%

KL UNIVERSITY

MAHIMA RAO 77%

M S RAMAIAH INSTITUTE OF TECHNOLOGY

Registration Statistics

Total Registration : 311

Score Distribution Graph - Legend

Assignment Score: Distribution of average scores garnered by students per assignment.

Exam Score : Distribution of the final exam score of students.

Final Score : Distribution of the combined score of assignments and final exam, based on the score logic.