CS 4622/5622 – Applied Data Science with Python
Course Syllabus
Course Description
Textbooks
There are no required textbooks for this course.
Learning Outcomes
Upon completion of the course, students should be able to:
- Attain proficiency with commonly used Python frameworks for managing the life cycle of Data Science projects.
- Develop pipelines for integrating data from multiple sources, designing predictive models, and deploying the models.
- Apply Python tools for data collection, analysis, and visualization, such as NumPy, Pandas, Matplotlib, and Seaborn, to real-world datasets.
- Implement machine learning algorithms for image processing, natural language processing, and time series analysis using Python-based frameworks, such as Scikit-Learn, Keras, TensorFlow, and PyTorch.
- Understand the principles of model selection and evaluation, including hyperparameter tuning, cross-validation, and regularization.
- Design and implement advanced AI systems using large language models, vision-language models, and agentic AI integration.
Prerequisites
The course requires basic programming skills in Python. Prior knowledge of data science methods is beneficial but not required.
Grading
Student assessment is based on 6 homework assignments (worth 45 points), 6 quizzes (worth 45 points), and class participation and engagement (worth 10 points).
Lectures
Lecture 1 - Short History of AI
Theme 1 - Python Programming
Theme 2 - Data Engineering Pipelines
- Lecture 6 - NumPy for Array Operations
- Lecture 7 - Data Manipulation with pandas
- 7.1 Introduction to pandas
- 7.2 Importing Data and Summary Statistics
- 7.3 Rename, Index, and Slice
- 7.4 Creating New Columns, Reordering
- 7.5 Removing Columns and Rows
- 7.6 Merging DataFrames
- 7.7 Calculating Unique and Missing Values
- 7.8 Dealing With Missing Values: Boolean Indexing
- 7.9 Exporting a DataFrame to CSV
- References
- Lecture 8 - Data Visualization with Matplotlib
- Lecture 9 - Data Visualization with Seaborn
- Lecture 10 - Databases and SQL
- 10.1 Introduction to SQL
- 10.2 Using SQLite with Python
- 10.3 Create a New Table
- 10.4 Database Example
- 10.5 Querying Databases with SELECT
- 10.6 Sorting Data with ORDER BY
- 10.7 Filtering Data
- 10.8 Conditional Expressions
- 10.9 Joining Multiple Tables
- 10.10 Return Data Statistics
- 10.11 Grouping Data
- 10.12 Modifying Data
- 10.13 Working with Tables
- 10.14 Constraints
- 10.15 Subqueries
- 10.16 Connect to an Existing Database
- References
- Lecture 11 - Data Exploration and Preprocessing
Theme 3 - Model Engineering Pipelines
- Lecture 12 - Scikit-Learn Library for Data Science
- 12.1 Introduction to Scikit-Learn
- 12.2 Supervised Learning: Classification
- 12.3 Supervised Learning: Regression
- 12.4 Unsupervised Learning: Clustering
- 12.5 Hyperparameter Tuning
- 12.6 Cross-Validation
- 12.7 Performance Metrics
- 12.8 Model Pipelines
- 12.9 Flow Chart: How to Choose an Estimator
- Appendix
- References
- Lecture 13 - Ensemble Methods
- Lecture 14 - Artificial Neural Networks with Keras-TensorFlow
- Lecture 15 - Convolutional Neural Networks with Keras-TensorFlow
- Lecture 16 - Model Selection, Hyperparameter Tuning
- Lecture 17 - Artificial Neural Networks with PyTorch
- Lecture 18 - Natural Language Processing
- Lecture 19 - Transformer Networks
- Lecture 20 - NLP with Hugging Face
- Lecture 21 - Large Language Models
- 21.1 Introduction to LLMs
- 21.2 Creating LLMs
- 21.3 Finetuning LLMs
- 21.4 Finetuning Example: Finetuning Llama-2 7B
- 21.5 Chat Templates for Formatting LLM Data
- 21.6 LLM Evaluation
- 21.7 Prompt Engineering
- 21.8 Foundation Models
- 21.9 Limitations and Ethical Considerations of LLMs
- Appendix: Unsloth Library for LLM Training and Inference
- References
- Lecture 22 - Large Language Models (Part 2)
- Lecture 23 - Reasoning Models
- Lecture 24 - Agentic AI
Theme 4 - Model Deployment Pipelines
Tutorials
- Tutorial 1 - Working with Jupyter Notebooks
- Tutorial 2 - Python IDEs, VS Code
- Tutorial 3 - Terminal and Command Line
- Tutorial 4 - Virtual Environments
- Tutorial 5 - Google Colab
- Tutorial 6 - Image Processing with Python
- Tutorial 7 - TensorFlow, TensorFlow DataSets
- Tutorial 8 - PyTorch
- Tutorial 9 - GitHub
- Tutorial 10 - Docker Containers