CS 4622/5622 – Applied Data Science with Python

Course Syllabus


Course Description

The course introduces students to Python tools and libraries commonly used by organizations for managing the phases in the life cycle of Data Science projects. The content is divided into four main themes. The first theme reviews the fundamentals of Python programming. The second theme focuses on data engineering and explores Python tools for data collection, exploration, and visualization. The third theme covers model engineering and includes topics related to model design, selection, and evaluation for image processing, natural language processing, and time series analysis. This theme also introduces recent advances in large language models, multi-modal models, and agentic AI systems. The fourth theme focuses on Data Science Operations (DSOps) and encompasses techniques for model serving, performance monitoring, diagnosis, and reproducibility of data science projects deployed in production. Throughout the course, students will gain hands-on experience with various Python libraries for Data Science workflow management. Additional work is required for graduate credit.

Textbooks

There are no required textbooks for this course.

Learning Outcomes

Upon completion of the course, students should be able to:

  1. Attain proficiency with commonly used Python frameworks for managing the life cycle of Data Science projects.

  2. Develop pipelines for integrating data from multiple sources, designing predictive models, and deploying the models.

  3. Apply Python tools for data collection, analysis, and visualization, such as NumPy, Pandas, Matplotlib, and Seaborn, to real-world datasets.

  4. Implement machine learning algorithms for image processing, natural language processing, and time series analysis using Python-based frameworks, such as Scikit-Learn, Keras, TensorFlow, and PyTorch.

  5. Understand the principles of model selection and evaluation, including hyperparameter tuning, cross-validation, and regularization.

  6. Design and implement advanced AI systems using large language models, vision-language models, and agentic AI integration.

Prerequisites

The course requires basic programming skills in Python. Prior knowledge of data science methods is beneficial but not required.

Grading

Student assessment is based on 6 homework assignments (worth 45 points), 6 quizzes (worth 45 points), and class participation and engagement (worth 10 points).

Lectures

Theme 2 - Data Engineering Pipelines

Theme 3 - Model Engineering Pipelines