Intro to Machine Learning with Scikit-learn
Overview
This talk will give a tour of the scikit-learn project, an open-source Python platform for doing most types of modern machine learning. Scikit-learn is used in all corners of the machine learning world, from early prototypes to production models, both in academia and industry. This powerful toolkit has been carefully maintained to provide high-quality, thoroughly documented code and a consistent developer experience. The website, scikit-learn.org, provides a wealth of introductory material and tutorials, development guides and code documentation, and links to primary research.
In this talk, we’ll cover the following topics:
Supervised Machine Learning
- Building a basic machine learning classifier
- Demonstrating how easy it is to try additional classifiers with minimal code changes
- A brief discussion of how to compare the efficacy of different classifiers
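As a rough illustration of the supervised workflow outlined above, the sketch below fits two classifiers through the same `fit`/`score` interface and compares them by held-out and cross-validated accuracy. The iris dataset, the two estimators, and every parameter value are assumptions chosen for this example, not the code used in the talk.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Assumed example data: the iris dataset bundled with scikit-learn.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Swapping classifiers only means changing the estimator object;
# the fit/score calls below stay the same.
classifiers = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

test_accuracy = {}
cv_accuracy = {}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)
    # Accuracy on the held-out test split.
    test_accuracy[name] = clf.score(X_test, y_test)
    # Mean 5-fold cross-validated accuracy on the training split.
    cv_accuracy[name] = cross_val_score(clf, X_train, y_train, cv=5).mean()
    print(f"{name}: test={test_accuracy[name]:.3f} cv={cv_accuracy[name]:.3f}")
```

Accuracy is only one possible yardstick; on imbalanced data a practitioner would more likely compare precision, recall, or ROC AUC.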
Unsupervised Machine Learning
- Running a basic data clustering algorithm
- Again demonstrating how easy it is to swap algorithms
- Comparing the results of different algorithms, and discussing how a practitioner might choose between them
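The unsupervised topics above might look something like the following sketch, which runs two clustering algorithms on the same data and compares them with the silhouette score. The synthetic blobs, the choice of KMeans and agglomerative clustering, and the cluster count are all assumptions made for illustration.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Assumed example data: three synthetic, well-separated blobs.
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.8, random_state=0)

# As with classifiers, swapping algorithms only changes the estimator object.
clusterers = {
    "kmeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "agglomerative": AgglomerativeClustering(n_clusters=3),
}

cluster_labels = {}
silhouette = {}
for name, model in clusterers.items():
    # fit_predict returns one cluster label per sample.
    cluster_labels[name] = model.fit_predict(X)
    # Silhouette score lies in [-1, 1]; higher means tighter, better-separated clusters.
    silhouette[name] = silhouette_score(X, cluster_labels[name])
    print(f"{name}: silhouette={silhouette[name]:.3f}")
```

The silhouette score needs no ground-truth labels, which is why it is used here; when true labels exist, a measure such as `adjusted_rand_score` is another option.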
Data Processing
- A brief overview of the data preprocessing tools available in the scikit-learn platform
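As one example of the preprocessing tools mentioned above, the sketch below standardizes features with `StandardScaler` and then chains the scaler and a classifier in a pipeline. The wine dataset and the k-nearest-neighbors classifier are assumptions picked for this illustration.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Assumed example data: the wine dataset, whose features have very different scales.
X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Standalone use: learn per-feature mean and variance on the training data only.
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
print("max |mean| after scaling:", np.abs(X_train_scaled.mean(axis=0)).max())

# Pipeline use: scaling is refit inside fit() and reapplied inside score(),
# so test-set statistics never leak into training.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5))
model.fit(X_train, y_train)
pipeline_accuracy = model.score(X_test, y_test)
print(f"pipeline test accuracy: {pipeline_accuracy:.3f}")
```

Wrapping preprocessing in a pipeline also lets the whole chain be passed to utilities such as `cross_val_score` as if it were a single estimator.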
Participants will gain the knowledge they need to begin their own course of study in practical machine learning, and will be introduced to helpful resources to lean on during the learning process. The code used in the talk will be available on GitHub afterward.
Presenters
Hailey Buckingham, Cylance
Hailey is a Data Scientist at Cylance (Portland, OR office). She develops machine learning models for detecting malware and malicious process behavior, and specializes in automated data and ML pipelines and microservices. Much of her work is cross-functional, done in collaboration with teams outside data science.