I wanted to know more about how to deploy ML algorithms in a web app, and that prompted me to enroll in this course.
The course assumes a certain prior foundation and focuses on how to use AWS SageMaker.
SageMaker is a development platform for ML practitioners. Its main selling points:
Ease of deployment: it supports both real-time inference and batch transform jobs.
Variety: it offers both high- and low-level APIs, as well as pre-built models on the AWS Marketplace. SageMaker Autopilot automatically builds, trains, and tunes models for tabular data.
Auto-scaling: models run on container clusters, which enables high availability.
The environment is akin to a Jupyter notebook, with a somewhat different workflow and syntax. Data is typically serialized and read in from S3. One can also tune a model by targeting a certain recall or by specifying how class imbalance is treated.
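To give a flavor of the serialization step, here is a minimal stdlib-only sketch of turning tabular records into the CSV bytes one would then upload to S3 (the upload itself is omitted; the function name and sample rows are illustrative, not from the course):

```python
import csv
import io

def to_csv_bytes(rows):
    """Serialize rows (lists of values) to CSV bytes.
    SageMaker's built-in algorithms commonly expect CSV with the
    label in the first column and no header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")

# Example: two training rows, label in the first column.
payload = to_csv_bytes([[1, 0.5, 3.2], [0, 1.1, 0.7]])
```

From there, the bytes can be written to a file and uploaded with the session's upload helper or boto3.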
One can either load a pre-built model or create a training object called an Estimator. After training, one deploys the model to an endpoint and runs inference on it through a predictor. It is also important to delete the endpoint once done, since endpoints are billed for the time they are running.
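The train–deploy–delete loop looks roughly like this. This is a sketch only, not runnable without an AWS account; the training image, IAM role, and S3 paths are placeholders:

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
estimator = Estimator(
    image_uri="<training-image>",        # placeholder
    role="<execution-role-arn>",         # placeholder
    instance_count=1,
    instance_type="ml.m5.large",
    output_path="s3://<bucket>/output",
    sagemaker_session=session,
)
estimator.fit({"train": "s3://<bucket>/train.csv"})

# Deploy the trained model as a real-time endpoint.
predictor = estimator.deploy(initial_instance_count=1,
                             instance_type="ml.t2.medium")
result = predictor.predict(payload)

predictor.delete_endpoint()  # endpoints bill while they are running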
The syllabus is very well structured. Here are the main topics covered and the related projects I built.
How data scientists can leverage DevOps principles. It has a section covering how to make a Python package, including testing, logging, and uploading to PyPI. This came in handy later on when I built a clustering package with a statistician teammate. Unlike KMeans, the package does not require pre-specifying the number of clusters.
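As a taste of the packaging material, here is a minimal module with logging plus a unittest-style check of the kind one would place under tests/. The package and function names are mine, not from the course or my package:

```python
import logging
import unittest

logger = logging.getLogger("clusterpkg")  # hypothetical package name

def euclidean(a, b):
    """Distance helper of the kind a clustering package might expose."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    d = sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    logger.debug("euclidean(%s, %s) = %s", a, b, d)
    return d

class TestEuclidean(unittest.TestCase):
    def test_3_4_5(self):
        self.assertAlmostEqual(euclidean([0, 0], [3, 4]), 5.0)
```

Pair this with a pyproject.toml and the package is ready to build and upload to PyPI.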
How to deploy models as API endpoints. The corresponding project I completed is a plagiarism detector, trained and deployed on SageMaker and invoked via Lambda and API Gateway.
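The Lambda side of that pipeline is short. A sketch of the handler, not runnable outside AWS; the endpoint name is a placeholder:

```python
import boto3

runtime = boto3.client("sagemaker-runtime")

def lambda_handler(event, context):
    # API Gateway passes the request body through event["body"].
    response = runtime.invoke_endpoint(
        EndpointName="<endpoint-name>",   # placeholder
        ContentType="text/plain",
        Body=event["body"],
    )
    return {
        "statusCode": 200,
        "body": response["Body"].read().decode("utf-8"),
    }
```

API Gateway then exposes this function as a public HTTP endpoint, so the web app never talks to SageMaker directly.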
How to plan a project end-to-end. The program contains a capstone with a few recommended options and a customizable option. One needs to write a project proposal before proceeding, and then submit the project with a report. I found the report's rubric detailed and insightful. The project I did is a dog breed classifier, with a dataset containing 133 dog breeds. It could be a good starting point for other image-related applications. Taking a spin on the original Kaggle competition, if supplied an image of a human face (as detected by OpenCV), it will also identify the resembling dog breed.
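The face-detection branch of that project is simple to sketch. This uses OpenCV's bundled Haar cascade and is not runnable without opencv-python installed; the function name is mine:

```python
import cv2

# Haar-cascade face detector that ships with OpenCV. If it fires,
# the app reports the resembling breed instead of a plain prediction.
_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def contains_human_face(image_path):
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    faces = _cascade.detectMultiScale(gray)
    return len(faces) > 0
```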
Overall I had a positive experience. After submitting a project, one receives quick and detailed feedback. There are also forums where one can discuss with TAs or help each other out.
What stood out is that they also provide the rare offering of GitHub and LinkedIn reviews. I received some helpful feedback on GitHub usage, including a suggestion to write commit messages in the style of:
feat: a new feature
fix: a bug fix
docs: changes to documentation
style: formatting, missing semicolons, etc.; no code change
refactor: refactoring production code
test: adding or refactoring tests; no production code change
chore: updating build tasks, package manager configs, etc.; no production code change
While I like the style, I think it can be further expanded for the data science context. Common code chunks such as feature selection or model interpretation don't fit neatly into either feat or chore.
What’s unique about Udacity is that it is very pertinent to the industry and the materials are continuously updated. The videos are clearly explained and highly digestible, with extra readings and resources, and they involve significantly less mathematical derivation. All the projects are hands-on.
Usually university courses on machine learning don't cover deployment, so this is more like a mixture of a machine learning course and an AWS cloud computing course, with a small dose of software development.
Another difference is that its programs typically have a product tie-in. The MLE program, for example, is tied to commercial software from AWS. In a classroom setting, on the other hand, courses tend to be more vendor-agnostic and may cover AWS, GCP, and Azure across the board.
These products, after all, are only a means to an end. Tools change and techniques evolve, but the thought process and knowledge stay. It is more important to understand the field on both theoretical and practical grounds than to simply master the tools of the field. The tie-in might not necessarily be a downside, as long as one can transfer the learning onto other cloud platforms.
The program is for someone who has already taken a first course in ML, has intermediate Python skills, and wants to learn deployment/production. I did it during the free month and appreciate the fact that they offer such a program.
I found it helpful for understanding the entire ML workflow, which may expand the kinds of personal projects I can do.