Machine Learning Concepts
- Apply AWS ML to problems for which you already have examples with known answers.
- For example, to predict whether a new email is spam, you need to collect examples of both spam and non-spam emails.
- Binary classification (true / false)
- Examples: is an email spam or not? Will a customer churn? Will a customer accept a campaign offer?
- Multiclass classification (one of more than two outcomes)
- Regression (a numeric value)
- Building a Machine Learning Application
- Frame the core ML problems
- Collect, clean and prepare data
- Derive features from the raw data
- Feed to learning algorithm to build models
- Use the model to generate predictions for new data
Linear Models
- The learning process computes one weight for each feature to form a model that can predict the target value
- For example, estimated target = 0.2 + 5 * age + 0.00003 * income
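To make the formula above concrete, here is a minimal Python sketch that evaluates it for one record. The weights are copied from the example; the function name `estimate_target` and the sample customer are made up for illustration and are not part of Amazon ML.

```python
# Intercept and per-feature weights copied from the example formula above.
INTERCEPT = 0.2
WEIGHTS = {"age": 5.0, "income": 0.00003}


def estimate_target(features):
    """Return intercept + sum(weight * value) over the known features."""
    return INTERCEPT + sum(WEIGHTS[name] * features[name] for name in WEIGHTS)


# Hypothetical record, used only to show the arithmetic.
customer = {"age": 30, "income": 50000}
print(estimate_target(customer))  # 0.2 + 5 * 30 + 0.00003 * 50000
```

Note how the raw scale of a feature matters: income is numerically large, so its weight is tiny, while age is small and gets a larger weight.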
Learning Algorithm
- Learn the weights of the model
- Loss function: the penalty incurred when the target estimated by the model does not equal the actual target.
- Optimization technique: minimizes the loss. Amazon ML uses Stochastic Gradient Descent (SGD), which makes passes over the training data and, during each pass, updates the feature weights one example at a time, aiming to approach the optimal weights that minimize the loss.
- For binary classification, Amazon ML uses logistic regression (logistic loss function + SGD).
- For multiclass classification, Amazon ML uses multinomial logistic regression (multinomial logistic loss + SGD).
- For regression, Amazon ML uses linear regression (squared loss function + SGD)
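The combination of logistic loss and SGD described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Amazon ML's actual implementation: the function names, the learning rate, the number of passes, and the toy data are all assumptions chosen for the example.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def logistic_loss(weights, bias, data):
    """Average logistic loss over (features, label) pairs, label in {0, 1}."""
    eps = 1e-12  # keeps log() away from zero
    total = 0.0
    for x, y in data:
        p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(data)


def sgd_train(data, passes=20, learning_rate=0.1, seed=0):
    """Update the weights one example at a time over several passes."""
    rng = random.Random(seed)
    weights = [0.0] * len(data[0][0])
    bias = 0.0
    examples = list(data)
    for _ in range(passes):
        rng.shuffle(examples)
        for x, y in examples:
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            error = p - y  # gradient of the logistic loss w.r.t. the linear score
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


# Toy data (made up): the label is 1 when the single feature is large.
toy = [([0.1], 0), ([0.2], 0), ([0.3], 0), ([0.7], 1), ([0.8], 1), ([0.9], 1)]
w, b = sgd_train(toy, passes=200, learning_rate=0.5)
print("loss before:", logistic_loss([0.0], 0.0, toy))
print("loss after: ", logistic_loss(w, b, toy))
```

Swapping the logistic loss for a squared loss (and dropping the sigmoid) would turn the same loop into the linear regression case.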
Evaluate Model Accuracy
- 70% of the data is used to train the model, and the remaining 30% is held out for evaluation
- For binary classification, an AUC score close to 0.5 means the model does about as well as random guessing
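The 70/30 split and the 0.5 baseline can be illustrated with a small sketch, assuming the 0.5 figure refers to AUC (area under the ROC curve), the metric Amazon ML reports for binary models. The helper names `split_70_30` and `auc` are hypothetical, and shuffling before the split is a choice made in this sketch, not necessarily what the service does.

```python
import random


def split_70_30(records, seed=0):
    """Shuffle a copy of the records and return (training 70%, evaluation 30%)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.7)
    return shuffled[:cut], shuffled[cut:]


def auc(labels, scores):
    """Chance that a random positive example is scored above a random negative one.

    Ties count as half a win, so a model that cannot tell the classes
    apart scores 0.5 and a perfect ranking scores 1.0.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


train, evaluation = split_70_30(range(10))
print(len(train), len(evaluation))
print(auc([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9]))  # perfectly ranked scores
print(auc([0, 1, 0, 1], [0.5, 0.5, 0.5, 0.5]))  # uninformative scores
```

The pairwise loop is quadratic in the number of examples, which is fine for a demonstration but would be replaced by a rank-based computation on real data.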
Workshop
- Download the samples from http://bit.ly/john-2017ml-labdata, create an S3 bucket, and upload the 3 CSV files into that bucket.
- churn_new.csv => create data source from S3 file link => create model => use custom recipe
- The file has 3334 records with the columns "State,Account Length,Area Code,Phone,Intl Plan, VMail Plan, VMail Message, Day Mins,Day Calls,Day Charge,Eve Mins,Eve Calls,Eve Charge,Night Mins,Night Calls,Night Charge,Intl Mins,Intl Calls,Intl Charge,CustServ Calls,Churn?". Once you import it into AWS ML, a model is created for you automatically that predicts whether a customer will leave or continue the subscription.
- 70% of the imported data is used to train the model, and 30% is used to evaluate its accuracy.
- banking.csv => create data source from S3 file link => create model => use default
- banking-batch.csv => create batch prediction from model above
Thoughts
- This 3-hour workshop is easy and gives you a basic understanding of how to use the AWS Machine Learning service to automatically create a model, evaluate it, and call the API for predictions.
- Prepare your data in CSV format and upload it to S3; AWS then handles the modeling and produces the evaluation results for you.
- You can also import real production data from other sources such as RDS and Redshift.
- The visualizations make it easy to evaluate the model.
- There are APIs for generating predictions from the models you have created:
- Batch prediction
- Real time prediction
- The hardest part is "How do you prepare your data and derive features from the raw data?"
- The AWS Machine Learning documentation is worth reading! It gives you a basic understanding of machine learning concepts and of how AWS does things internally.
References
- AWS Machine Learning Concepts
- Tutorial: Using Amazon ML to Predict Responses to a Marketing Offer
- [slide share] Getting Started with Amazon Machine Learning