CSE 470 C Machine Learning (3 credits)
Offered Fall 2019
Instructor: Philippe Giabbanelli
Catalog description:
This course introduces the process, methods, and computing tools fundamental to machine learning. Students will work on large real-world datasets to accomplish tasks such as predicting outcomes, discovering associations, and identifying similar groups. Students will complete a term project showcasing the different steps of the machine learning process, from data cleaning to the extraction of accurate models and the visualization of results.
Prerequisites:
CSE 274 and MTH 231, or permission of instructor
Required topics (approximate weeks allocated):
- Introduction to the course, including logistics and syllabus (0.5)
- Setting up the technologies used in this course, including Anaconda and Jupyter notebooks (0.5)
- Principles of Python programming in a professional environment (2)
  - Control flow in Python (loops, functions, conditionals)
  - Handling of data through files or in-memory structures (lists, associative arrays)
  - Use of the Pandas library for mapping and filtering
- Essentials of data cleaning and transformation (2.5)
  - Detecting outliers
  - Filling in missing values
  - Feature engineering
  - Dimensionality reduction
  - Data balancing
- Overview of key machine learning tasks, e.g., classification and clustering (0.5)
- Standard techniques for classification (3)
  - Decision trees
  - Support vector machines
  - Ensemble learning (e.g., random forests)
- Overview of possible course projects (0.5)
- Unsupervised learning (3)
  - Artificial neural networks on TensorFlow
  - Clustering
- Visualizing machine learning models (1.5)
  - Principles of scientific visualization applied to machine learning
  - Programming visualizations within a machine learning workflow
Learning Outcomes:
Students will be able to...
- Describe how to create accurate and generalizable models from large and messy datasets.
  - Students can describe how to employ methods for classification or clustering.
  - Students can identify which specific implementations of these methods would be most suitable given the characteristics of a dataset.
- Implement Python code to clean data and derive a model using an appropriate machine learning algorithm.
  - Students can use modern programming environments and libraries (e.g., Jupyter Notebooks, scikit-learn, TensorFlow) to create end-to-end machine learning applications that clean the data and extract models.
  - Students can describe key concepts in data cleaning, such as handling outliers, filling in missing values, and selecting features.
- Present solutions to stakeholders using visualizations and professional machine learning workflows.
  - Students can describe how to visualize a machine learning model, using principles from scientific visualization.
  - Students can use modern visualization libraries (e.g., seaborn, matplotlib) to produce visualizations that stakeholders can use for decision-making.