CSE 470 C Machine Learning (3 credits)

Offered Fall 2019
Instructor: Philippe Giabbanelli

Catalog description

This course introduces the process, methods, and computing tools fundamental to machine learning. Students will work on large real-world datasets to accomplish tasks such as predicting outcomes, discovering associations, and identifying similar groups. Students will complete a term project showcasing the different steps of the machine learning process, from data cleaning to the extraction of accurate models and the visualization of results.

Prerequisites:

CSE 274 and MTH 231, or permission of instructor

Required topics (approximate weeks allocated):

  • Introduction to the course, including logistics and syllabus (0.5)
  • Setting up technologies used in this course, including Anaconda and Jupyter notebooks (0.5)
  • Principles of Python programming in a professional environment (2)
    • Control flows in Python (loops, functions, conditionals)
    • Handling of data through files or in-memory structures (lists, associative arrays)
    • Use of the Pandas library for mapping and filtering
  • Essentials of data cleaning and transformation (2.5)
    • Detecting outliers
    • Filling in missing values
    • Feature engineering
    • Dimensionality reduction
    • Data balancing
  • Overview of key machine learning tasks, e.g., classification, clustering (0.5)
  • Standard techniques for classification (3)
    • Decision trees
    • Support vector machines
    • Ensemble learning (e.g. random forests)
  • Overview of possible course projects (0.5)
  • Unsupervised learning (3)
    • Artificial neural networks on TensorFlow
    • Clustering
  • Visualizing machine learning models (1.5)
    • Principles of scientific visualization applied to machine learning
    • Programming visualizations within a machine learning workflow

Learning Outcomes:

Students will be able to...

  1. Describe how to create accurate and generalizable models from large and messy datasets.
    1. Students can describe how to employ methods for classification or clustering.
    2. Students can identify which specific implementations for these methods would be most suitable given the characteristics of a dataset.
  2. Implement Python code to clean data and derive a model using an appropriate machine learning algorithm.
    1. Students can use modern programming environments (e.g., Jupyter Notebooks, scikit-learn, TensorFlow) to create end-to-end machine learning applications that clean the data and extract models.
    2. Students can describe key concepts in data cleaning, such as handling outliers, filling in missing values, and selecting features.
  3. Present solutions to stakeholders using visualizations and professional machine learning workflows.
    1. Students can describe how to visualize a machine learning model, using principles from scientific visualization.
    2. Students can use modern programming environments (e.g., seaborn, matplotlib) to produce visualizations that stakeholders can use for decision-making.