Big Data Algorithms and Data Structures

Instructor:
Hu Fu fuhu@mail.shufe.edu.cn
Office: 504 School of Information Management and Engineering

Teaching Assistant:
Qun Hu email: 2019212804@163.sufe.edu.cn

Lectures:
Friday 6-8:35pm, 703 Third Lecture Hall

Syllabus:
This course is a theoretical introduction to the basic data structures and algorithms used to handle large-scale data. Randomization plays a crucial role in these techniques, and a large part of this course focuses on randomized algorithms and data structures. Topics include a review of discrete probability theory, hashing, search trees, concentration inequalities, skip lists, dimensionality reduction, and streaming algorithms.

Prerequisites:
Familiarity with basic data structures and algorithms will be assumed. Discrete probability theory will be used throughout the course; a quick review will be provided at the beginning of the course.

Texts:
There is no required textbook.
Supplementary readings are occasionally provided here. Optional readings will be marked as such.

Course Work:
Grades are determined by problem sets/written assignments (40%) + project (20%) + final (40%). The final is a take-home exam. For the course project, students will form groups of up to 4 people and survey a topic related to data structures or algorithms for big data. The instructor will suggest candidate topics, but students are encouraged to explore topics that interest them. Each group will give an in-class presentation in the last lecture and submit a written survey. The survey can be in either Chinese or English, with an expected length of two pages. Projects will be evaluated on the quality of the presentation and the written survey.

Suggested Project Topics:
Here are some keywords that may help you find topics for the final presentation. Search online for resources on big data algorithms, and you will find plenty of other topics and resources. See, e.g., the list of papers compiled here by Chandra Chekuri.