More about Rules of Machine Learning


Google recently released an extremely helpful guide of doing machine learning in industry - Rules of Machine Learning. However, the writing is bit messy; therefore, I spent some time understanding these rules. Here are some of my thoughts.


Rules of Machine Learning

Before Machine Learning

  • Rule 1: Don’t be afraid to launch a product without machine learning.
  • Rule 2: First, design and implement metrics.
  • Rule 3: Choose machine learning over a complex heuristic.

ML Phase I: Your First Pipeline

  • Rule 4: Keep the first model simple and get the infrastructure right.
  • Rule 5: Test the infrastructure independently from the machine learning.
  • Rule 6: Be careful about dropped data when copying pipelines.
  • Rule 7: Turn heuristics into features, or handle them externally.

Monitoring

  • Rule 8: Know the freshness requirements of your system.
  • Rule 9: Detect problems before exporting models.
  • Rule 10: Watch for silent failures.
  • Rule 11: Give feature column owners and documentation.

Your First Objective

  • Rule 12: Don’t overthink which objective you choose to directly optimize.
  • Rule 13: Choose a simple, observable and attributable metric for your first objective.
  • Rule 14: Starting with an interpretable model makes debugging easier.
  • Rule 15: Separate Spam Filtering and Quality Ranking in a Policy Layer.

ML Phase II: Feature Engineering

  • Rule 16: Plan to launch and iterate.
  • Rule 17: Start with directly observed and reported features as opposed to learned features.
  • Rule 18: Explore with features of content that generalize across contexts.
  • Rule 19: Use very specific features when you can.
  • Rule 20: Combine and modify existing features to create new features in human­-understandable ways.
  • Rule 21: The number of feature weights you can learn in a linear model is roughly proportional to the amount of data you have.
  • Rule 22: Clean up features you are no longer using.

Human Analysis of the System

  • Rule 23: You are not a typical end user.
  • Rule 24: Measure the delta between models.
  • Rule 25: When choosing models, utilitarian performance trumps predictive power.
  • Rule 26: Look for patterns in the measured errors, and create new features.
  • Rule 27: Try to quantify observed undesirable behavior.
  • Rule 28: Be aware that identical short­-term behavior does not imply identical long­term behavior.

Training­-Serving Skew

  • Rule 29: The best way to make sure that you train like you serve is to save the set of features used at serving time, and then pipe those features to a log to use them at training time.
  • Rule 30: Importance weight sampled data, don’t arbitrarily drop it!
  • Rule 31: Beware that if you join data from a table at training and serving time, the data in the table may change.
  • Rule 32: Re­use code between your training pipeline and your serving pipeline whenever possible.
  • Rule 33: If you produce a model based on the data until January 5th, test the model on the data from January 6th and after.
  • Rule 34: In binary classification for filtering, make small short-­term sacrifices in performance for very clean data.
  • Rule 35: Beware of the inherent skew in ranking problems.
  • Rule 36: Avoid feedback loops with positional features.
  • Rule 37: Measure Training/Serving Skew.

ML Phase III: Slowed Growth, Optimization Refinement, and Complex Models

  • Rule 38: Don’t waste time on new features if unaligned objectives have become the issue.
  • Rule 39: Launch decisions are a proxy for long ­term product goals.
  • Rule 40: Keep ensembles simple.
  • Rule 41: When performance plateaus, look for qualitatively new sources of information to add rather than refining existing signals.
  • Rule 42: Don’t expect diversity, personalization, or relevance to be as correlated with popularity as you think they are.
  • Rule 43: Your friends tend to be the same across different products. Your interests tend not to be.

My understandings

To be honest, these 43 rules are extremely helpful, especially for those junior machine learning engineers. Imagine if you need gain this first-hand experience from the projects you built, how many projects do you need involve? If one project takes one quarter (at minimum) to finish, then how many years do you need to gain this insightful experience? Even for some experienced machine learning engineers, these are also meaningful, because not every engineer would have so much exposure with machine learning projects in such scale or such frequency at Google.

Before Machine Learning

Rule 1

Don’t be afraid to launch a product without machine learning.

Machine learning is just a toolbox of stats algorithms, but there seems a trend that using machine learning is a must to every problem. This is not necessarily true. Following the Occam’s razor, machine learning approaches are not always the simplest approach to solve problems, and they also need much more effort to build, and they must be used based on certain assumptions. Therefore, when we have a problem to solve, we shall consider the cost and gain of applying the machine learning approach, and make a tradeoff between them.

Rules 2

First, design and implement metrics.

Building a machine learning system takes much more effort than other projects, which means the feedback loop is also longer. Therefore, building robust and meaningful metrics are vital. The feedback provided by these metrics may shorten the loop, and save time for us.

Rule 3

Choose machine learning over a complex heuristic.

Only if the system based on heuristics is too complex, will we consider transferring it to a machine learning project. One example, we may use regular expression to build a rule-based systems to do ETL jobs. To take care of outliers, we tend to add more and more regex rules, which will become very hard to maintain. In such a situation, it is better to consider applying machine learning algorithms to take care this problem.

ML Phase I: Your First Pipeline

Rule 4

Keep the first model simple and get the infrastructure right.

It is easy to dive deeply into model building/training while ignoring the infrastructure. Suppose if the first trial failed, without a proper infrastructure, we may need build everything from scratch again. This is quite time consuming. Although building the infrastructure takes much time, in long term, this saves much time and effort. Therefore, as suggested in Rule 4, we need keep the first model simple, and build the related infrastructure. This can save a lot of headaches for us.

Rule 5

Test the infrastructure independently from the machine learning.

To a complete machine learning project, the model building/training only occupies a certain percent of effort and time, and the rest includes problem defining, research surveying, data collecting, etc. Similarly, to a complete machine learning system, the model training/serving may only takes a certain part of the whole system; in addition, data collecting, data cleaning, feedback collecting and other components are not less important. Therefore, a machine learning project becomes a complex system, which might be messy to develop and maintain. Model training/serving might be the most complex component in this complex system, which should be separated to develop. Keeping other components developed and tested may reduce the complexity of the project.

Rule 7

Turn heuristics into features, or handle them externally.

Developing a rule-based system and transferring it a learning-based system seems a common practice to some certain problems. The rule-based system may have high precision but low recall, and it might be hard to maintain. Therefore it is needed to transfer to a learning-based model to gain recall and reduce maintenance cost. In this transferring, the heuristics gained in the rule-based system is extremely helpful: they can either be converted to features, or as filters in preprocessing or post processing.

Monitoring

Rule 8

Know the freshness requirements of your system.

Different products have various requirements of the temporal sensitiveness of machine learning system, and different input data may also affect the temporal sensitiveness as well. Thus, the architecture of a machine learning system varies accordingly. For example, in Facebook Newsfeed, an online learning system is included to consider the features from fresh data, and such a design is much more complex than a single machine system. If such a product requirement is decided, we need plan the required architecture at the beginning of the task.

Rule 9

Detect problems before exporting models.

Building a machine learning algorithm also needs extensive tests, for example, performance test, which should cover at least two aspects: processing speed and model metrics. A machine learning system usually involves heavy computing, so using code refactoring to improve processing speed is necessary. Also, how model performs is considered. Including tests from these perspectives can provide a better understanding for yourself and as well as your clients.

Rule 10

Watch for silent failures.

Eventually, a machine learning model is an estimation based on a training set; however, when we serve the model, the data might change or have a different distribution, and the change may expand gradually. Therefore, the machine learning system might cannot predict with features in the trained model. To prevent such failures, we need continuously monitor the data to be predicted, and retrain model with updated training set if necessary.

Rule 11

Give feature column owners and documentation.

Similar to code development, we need carefully document the selected features in a machine learning system. Most algorithms are open-sourced, then the value of a machine learning system lies in the curated features. To maintain or handover a machine learning project, the curated features must be well documented.


TBC

Posted with :

Related Posts