General thoughts
- If a simple algorithm can solve the problem, do we need a more complex one?
- Most companies use multiple models to solve a single problem.
- Similar to software development, algorithm development needs to iterate as well: model updates and feature engineering.
- When the scale grows, we may need to build a platform to support more algorithm development.
- The evaluation of online inference may differ from that of offline testing. We need to align the two.
- Split online training and offline training
- Do not adopt micro-services for their own sake. Implement them at different granularities.
Sessions
Session 1 by Prof. Hui WEI from Fudan
Theme: AI
Takeaway: We need to treat AI carefully.
Limited definitions of AI:
- based on code
- can be defined literally
- can be defined by algorithms
- can be computed
Why can the game of Go be solved by algorithms?
- clean data
- limited rules
- huge amount of tagged data
Unsolved question: how does knowledge guide our action without programmed routines?
Possible direction: a lightweight solution for inference
Readings:
David Kirsh, Foundations of AI: The Big Issues. Artificial Intelligence. (1991). http://adrenaline.ucsd.edu/kirsh/Articles/BigIssues/big-issues.pdf
Session 2 by Zaiqing NIE from Alibaba
Theme: AliGene
Takeaway: An easy-to-use NLP platform is a new trend.
An HCI platform based on NLP
Request > Intent detection > Slot filling (knowledge base, user profiling) > dialogue management (predefined) > Response
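The pipeline above can be sketched end to end. This is a toy illustration: the intents, keywords, slot values, and replies are all invented, and a real system would back slot filling with a knowledge base and user profile.

```python
# Toy NLU pipeline: request -> intent detection -> slot filling
# -> (predefined) dialogue management -> response.
# Intents, slot patterns, and replies below are illustrative only.

INTENT_KEYWORDS = {
    "weather": ["weather", "rain", "sunny"],
    "music":   ["play", "song", "music"],
}

def detect_intent(request):
    words = request.lower().split()
    for intent, keywords in INTENT_KEYWORDS.items():
        if any(k in words for k in keywords):
            return intent
    return "unknown"

def fill_slots(request, intent):
    # A real system consults a knowledge base and user profile;
    # here we just look for a known city name.
    slots = {}
    if intent == "weather":
        for city in ("beijing", "shanghai", "hangzhou"):
            if city in request.lower():
                slots["city"] = city
    return slots

def manage_dialogue(intent, slots):
    # Predefined dialogue policy: answer if slots are complete,
    # otherwise ask a clarifying question.
    if intent == "weather":
        if "city" in slots:
            return f"Checking the weather in {slots['city']}."
        return "Which city do you mean?"
    if intent == "music":
        return "Playing some music."
    return "Sorry, I did not understand."

def respond(request):
    intent = detect_intent(request)
    slots = fill_slots(request, intent)
    return manage_dialogue(intent, slots)
```

Asking a clarifying question when a slot is missing is also where active learning (below) can plug in: the question reduces the system's uncertainty.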
Challenges:
- Ambiguity of natural language
- limited tagged data
- high expectation of precision
- limited NLP developers
Solutions:
- Cold start: rules
- Warm start: deep learning
- Reinforcement learning
- Active learning (use questions to increase certainty)
Readings:
Agichtein, E., & Gravano, L. (2000, June). Snowball: Extracting relations from large plain-text collections. In Proceedings of the fifth ACM conference on Digital libraries (pp. 85-94). ACM. http://www.mathcs.emory.edu/%7Eeugene/papers/dl00.pdf
Session 3 by Hui Wang from Paypal
Theme: Risk Management by Using AI
Takeaway: A single problem might need multiple models and rules
For AI, risk management has no limits on data dimensionality or volume.
Data sources:
- account behaviors
- sessions (visiting history)
- negative trading history
- operational data
- external data
Detection order:
- transactions
- account level data
- multi account level (related accounts)
Story-based approach to decrease false positives
Run multiple models and rules before making decisions on transactions - all within milliseconds.
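That decision flow might look like the following sketch. The rules, models, feature names, and the 0.7 threshold are all invented stand-ins for a real risk system's components.

```python
# Sketch of combining hard rules with multiple model scores before
# deciding on a transaction. Rules, models, and thresholds are invented.

def rule_blocklisted(txn):
    return txn["account"] in {"acct_fraud_1"}

def rule_amount_limit(txn):
    return txn["amount"] > 10000

def model_velocity_score(txn):
    # Stand-in for a trained model: more recent activity -> higher risk.
    return min(1.0, txn["txns_last_hour"] / 20.0)

def model_geo_score(txn):
    return 0.9 if txn["country"] != txn["home_country"] else 0.1

RULES = [rule_blocklisted, rule_amount_limit]
MODELS = [model_velocity_score, model_geo_score]

def decide(txn, threshold=0.7):
    # Hard rules veto immediately; otherwise average the model scores.
    if any(rule(txn) for rule in RULES):
        return "decline"
    avg = sum(m(txn) for m in MODELS) / len(MODELS)
    return "decline" if avg >= threshold else "approve"
```

In production every rule and model in this chain would have to fit inside the millisecond budget mentioned above, so they typically run in parallel.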
Session 4 by Xiongjie LIAO from TalkingData
Theme: Monitoring Micro-service
Takeaway: Monitoring only the key metrics is more effective and efficient.
Micro-service split-up:
- horizontal
- vertical
- hybrid
Problems of micro-service:
- deployment management
- complicated workflow
- complicated monitoring
Approaches to monitoring:
- focusing on performance (lightweight)
- focusing on workflow (currently the mainstream)
Philosophy of TalkingData’s solution:
- performance monitoring first
- only monitor possible bottlenecks (events VS workflows)
- locate the specific code that causes bottlenecks
Key metrics to monitor:
- API response time
- throughput
- API response components
- network IO
- API workflow time (can use dubbo to monitor)
- slow request stacks
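A minimal sketch of the "performance monitoring first" idea: record just response time and call count per API with a decorator, instead of tracing whole workflows. The metric fields and the `get_user` endpoint are invented.

```python
# Record key per-API metrics (calls, total/max response time) with a
# lightweight decorator rather than full workflow tracing.
import time
from collections import defaultdict

METRICS = defaultdict(lambda: {"calls": 0, "total_ms": 0.0, "max_ms": 0.0})

def monitored(name):
    def wrap(fn):
        def inner(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                ms = (time.perf_counter() - start) * 1000
                m = METRICS[name]
                m["calls"] += 1            # throughput
                m["total_ms"] += ms        # for mean response time
                m["max_ms"] = max(m["max_ms"], ms)  # worst case
        return inner
    return wrap

@monitored("get_user")
def get_user(uid):
    return {"id": uid}
```

Slow-request stacks would then only be captured for calls whose recorded time exceeds a threshold, keeping the overhead low.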
Session 6 by Yuefeng ZHOU from Google Brain
Theme: Tensorflow as a Computing Framework
Takeaway: We can use machine learning to optimize models / algorithms / computation.
New features:
- Eager Execution
- 2nd TPU
- Tensorflow APIs (Keras 2.0)
- Estimator
- Tensorflow Lite 1. on mobile device 2. offline inference 3. APM-optimised
- New Input Pipeline
1. input pipeline = lazy lists
2. map + filter
3. tf.data
- Dataset > functional programming
- Iterator > sequential input
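The "input pipeline = lazy lists" idea can be sketched with plain Python generators. This is only an analogy: tf.data.Dataset provides the real map/filter/batch API with prefetching and parallelism.

```python
# Lazy-list input pipeline sketched with generators: nothing is
# computed until the pipeline is iterated, mirroring tf.data's model.

def dataset(records):
    yield from records

def map_fn(fn, ds):
    for x in ds:
        yield fn(x)

def filter_fn(pred, ds):
    for x in ds:
        if pred(x):
            yield x

def batch(n, ds):
    buf = []
    for x in ds:
        buf.append(x)
        if len(buf) == n:
            yield buf
            buf = []
    if buf:
        yield buf

# Compose transformations lazily, then iterate once at training time.
pipeline = batch(2, filter_fn(lambda x: x % 2 == 0,
                              map_fn(lambda x: x * 3, dataset(range(6)))))
```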
Learn2learn algorithms for selecting the best models
- based on RNN
- evolution algorithms 1. select the best as the parent 2. pick two randomly 3. drop the lower one 4. copy and mutate the parent
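A sketch of that tournament-style evolution loop, with a toy numeric "model" and an invented fitness function standing in for trained-model accuracy:

```python
# Tournament evolution sketch: repeatedly pick two candidates at
# random, drop the worse one, and replace it with a mutated copy of
# the better one (the parent). "Model" here is just a number.
import random

def fitness(model):
    return -abs(model - 42)  # toy objective: get close to 42

def mutate(model, rng):
    return model + rng.choice([-1, 1])

def evolve(population, steps, seed=0):
    rng = random.Random(seed)
    pop = list(population)
    for _ in range(steps):
        i, j = rng.sample(range(len(pop)), 2)  # pick two at random
        if fitness(pop[i]) < fitness(pop[j]):
            i, j = j, i                        # i is now the parent
        pop[j] = mutate(pop[i], rng)           # drop loser, copy + mutate
    return max(pop, key=fitness)
```

Because the best member can only be replaced by something at least as fit, the best fitness in the population never decreases.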
Reinforcement learning for device placement
- many-device training (distributed)
- design for bigger models
- design for larger batch size
Session 7 by Yunsong GUO from Pinterest
Theme: Homefeed Recommendation in Pinterest
Takeaway:
- Feature engineering may bring higher returns than applying new models.
- Iterating models / features is a possible way to increase performance.
- A simple algorithm may perform better than a complicated model.
- Iterate quickly.
- Online inference may be evaluated differently from offline training.
Features to use:
- users
- pins
- interactions
Workflow:
- candidate generation (reduce data amount) 1. collaborative filtering (use different pins from the same boards) 2. random walking
- machine learning algorithms
1. time based
2. logistic regressions
- LR
- SVM (sofia-ml)
- feature engineering
- the best feature may increase relevance by 4%
- age is not a useful feature in this case
- GBDT
- use with an ensembling model
- split sparse and non-sparse features first
- similar effect as word-embedding
- feed generation (may reduce relevance)
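The collaborative-filtering step of candidate generation (surface pins that share a board with pins the user saved) might be sketched like this; the boards and pins are invented.

```python
# Board-based candidate generation sketch: score each unseen pin by
# how often it co-occurs on a board with one of the user's saved pins.
from collections import Counter

BOARDS = {
    "b1": ["pin1", "pin2", "pin3"],
    "b2": ["pin2", "pin4"],
    "b3": ["pin5", "pin6"],
}

def candidates(saved_pins, k=3):
    scores = Counter()
    for pins in BOARDS.values():
        overlap = [p for p in pins if p in saved_pins]
        if overlap:
            for p in pins:
                if p not in saved_pins:
                    scores[p] += len(overlap)
    return [p for p, _ in scores.most_common(k)]
```

The ranking models (LR, GBDT) then only have to score this much smaller candidate set.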
Feature engineering
- specialized < needs to train internal domain experts
- high failure rate < be patient, iterate faster, reduce deployment period
- low transparency < record experiments
- high labor cost < build a platform to support
- high return < higher than new models
Cold start problem:
- ask questions
- use data from similar users
Evaluation differences between online inference and offline training:
- find the correlation between online and offline standards
- map the offline standard to the online one
- reduce experiment period
- filter out low-return experiments
Diversity problem:
- use recommendations from different categories
- drop some relevance for diversity
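One way to trade a little relevance for diversity is a greedy re-rank that penalizes categories already shown; the penalty value and the items below are invented.

```python
# Greedy diversity re-ranking sketch: at each position pick the item
# with the highest relevance minus a penalty per repeat of its category.

def diversify(items, penalty=0.5):
    # items: list of (id, category, relevance)
    chosen, seen = [], {}
    pool = list(items)
    while pool:
        best = max(pool, key=lambda it: it[2] - penalty * seen.get(it[1], 0))
        chosen.append(best[0])
        seen[best[1]] = seen.get(best[1], 0) + 1
        pool.remove(best)
    return chosen
```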
Session 8 by Lou YIN from Airbnb
Theme: GB Decision Table
Takeaway: Data structure may have a huge impact on performance.
Best practices:
- set depth to a limited number
- set shrinkage 1. default is 1 2. better to use a smaller one, e.g. 0.01
- sub-sampling 1. similar to SGD
- tradeoff of model size 1. accuracy ↔ scoring time
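To make the shrinkage point concrete, here is a toy 1-D gradient-boosting loop with depth-1 stumps; the data and settings are invented. A smaller shrinkage means each round contributes only a fraction of the fitted residual, which is why more rounds (a bigger model) are needed.

```python
# Toy 1-D gradient boosting with decision stumps (depth limited to 1)
# showing the shrinkage parameter at work.

def fit_stump(xs, residuals):
    # Best single-threshold split minimizing squared error.
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lm) ** 2 for r in left)
               + sum((r - rm) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lm, rm)
    return best[1:]

def boost(xs, ys, rounds=200, shrinkage=0.1):
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lm, rm = fit_stump(xs, residuals)
        stumps.append((t, lm, rm))
        preds = [p + shrinkage * (lm if x <= t else rm)
                 for x, p in zip(xs, preds)]
    return base, shrinkage, stumps

def predict(model, x):
    base, shrinkage, stumps = model
    return base + sum(shrinkage * (lm if x <= t else rm)
                      for t, lm, rm in stumps)
```

Sub-sampling (fitting each stump on a random subset, similar to SGD) would slot into the loop right before `fit_stump`.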
Decision table VS decision tree
- freedom: decision table < decision tree
- speed: decision table > decision tree (50% gain)
- variance and bias: decision table < decision tree
Data structure of decision table
Optimization of decision table in gradient boosting
- backfitting:
- cyclic (recommended)
- random (recommended)
- greedy (biggest gain, but slow)
- speed up scoring
- avoid repeated tests
- sort the table by values
- online inference 1. pre-set user features
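The scoring trick behind decision tables can be sketched as follows: the outcomes of d (feature, threshold) tests form a d-bit index into a flat array of 2^d values, so scoring is d comparisons and one lookup with no tree traversal. The table contents below are invented.

```python
# Decision-table scoring sketch: d boolean tests -> d-bit index ->
# direct lookup, avoiding repeated tests and branchy tree traversal.

def score(table, x):
    tests, values = table  # tests: [(feature_idx, threshold)]
    idx = 0
    for feature, threshold in tests:
        idx = (idx << 1) | (x[feature] <= threshold)
    return values[idx]

# A depth-2 table over features 0 and 1; the 4 leaf values are invented.
table = ([(0, 5.0), (1, 2.0)], [0.1, 0.2, 0.3, 0.4])
```

Because every input runs the same fixed sequence of tests, sorting the table by threshold values and batching lookups is straightforward, which is where the ~50% scoring speedup comes from.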
Feature engineering of GBDT
- feature discretisation
- reduce outliers
- equi-width
- equi-frequency
- not good for linear features (pre-compute a linear model)
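Equi-width vs equi-frequency binning in a small sketch; the sample values are invented, and the outlier (100) shows why equi-width edges need outlier reduction first.

```python
# Feature discretisation sketch: equi-width bins split the value range
# evenly; equi-frequency bins put (roughly) equal counts in each bin.

def equi_width_edges(xs, bins):
    lo, hi = min(xs), max(xs)
    step = (hi - lo) / bins
    return [lo + i * step for i in range(1, bins)]

def equi_freq_edges(xs, bins):
    xs = sorted(xs)
    return [xs[len(xs) * i // bins] for i in range(1, bins)]

def discretise(x, edges):
    return sum(x > e for e in edges)  # bin index
```

With the outlier present, equi-width puts almost everything in bin 0, while equi-frequency still separates the bulk of the data.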
Readings:
Lou, Y., & Obukhov, M. (2017). BDT: Gradient Boosted Decision Tables for High Accuracy and Scoring Efficiency (pp. 1893–1901). Presented at the 23rd ACM SIGKDD International Conference, New York, NY, USA: ACM Press. https://yinlou.github.io/papers/lou-kdd17.pdf
Session 9 by Eric KIM from LinkedIn
Theme: Helix and Nuage
Takeaway:
- Ease of use and operability should be planned in advance.
- Applying algorithms in engineering is a new trend.
Helix: data storing platform
Nuage: data consuming platform
Invisible:
- ultra-simple to use
- elastic and scale easily
- no single point of failure
- highly operable
challenges:
- fault-tolerance
- data replication
- job management
- load balancing
- high performance
- failure fix
capacity detection:
- pessimistic model
- ARIMA
Session 12 by Zhiyuan ZONG from QIY
Theme: Risk Management in QIY
Takeaway: Always clarify and focus on the needs.
- High precision, low recall
- Split account behavior from textual data
- Use LSTM to predict user behaviors (sequence prediction)
Session 13 by Ming HUANG from Tencent
Theme: Spark on Angel
Takeaway: Modify a tool if needed.
Needs: solve big models on big data
- driver is the main bottleneck
- needs to reduce dimension
- executors need to wait (PST mode)
Architecture
- mutable layer on immutable layer
- PS model
- mutable
- no new memory allocated
- operate PS model in order to operate on remote servers
- server: flexible
- model layer: virtual
- client: replaceable with DL / spark
Angel API
- simplify writing procedure
- a unified API
- vector (inherit from Spark: Breeze PS, Cached PS)
- MLLib in Spark is based on Breeze
- easy to migrate to Angel
- matrix
Session 14 by Chao Can from Amazon
Theme: Micro-service and Serverless
Takeaway
- Refactoring to micro-services is an option.
- Consider data structures when refactoring.
- Micro-services come in different granularities; choose accordingly.
The proper team size at Amazon is about the number of people who can share two pizzas.
Refactoring:
- make code as modules
- interface
- make facade
- split database
- build local mock
Auto-scaling is not at run time, because it needs time to load.
Moving to serverless
- benchmark everything
- functions should be stateless
- use AWS Step Functions
- be careful with FAAS (function as a service)
- keep it natural: use the CLI to control Lambda
Readings: Domain-Driven Design; The Art of Scalability
Session 15 by Youlin LI from Facebook
Theme: Real-time training of Newsfeed
Takeaway: Follow the defined goal to develop and iterate.
Slogan of Facebook:
- Focus on impact
- Move fast
Daily deployment
FB Joiner captures all events for a given story/session within a window, and outputs the joined result.
Newsfeed:
- rule-based ranking -> machine-learning-based ranking
- use feedback to retrain model
- real-time joining
- time window: 3 min (invalid if a session is longer than 3 mins)
- tolerate value losses (0.5% loss)
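The real-time join described above might be sketched like this: an impression waits up to the window for a matching click and is joined as a positive; impressions whose window expires are emitted as negatives. The event tuples and policy are simplified, and a small fraction of lost events is tolerated as the notes say.

```python
# Windowed stream-join sketch for (impression, click) training labels.

WINDOW = 180  # seconds; the talk mentions a 3-minute window

def join(events):
    # events: (timestamp, kind, story_id) with kind "show" or "click"
    pending, out = {}, []
    for ts, kind, sid in sorted(events):
        # Flush impressions whose window has expired -> negative label.
        for old_sid, old_ts in list(pending.items()):
            if ts - old_ts > WINDOW:
                out.append((old_sid, 0))
                del pending[old_sid]
        if kind == "show":
            pending[sid] = ts
        elif kind == "click" and sid in pending:
            out.append((sid, 1))  # joined within the window -> positive
            del pending[sid]
    for sid in pending:  # end of stream: emit remaining as negatives
        out.append((sid, 0))
    return out
```

The joined (story, label) pairs are what feed the real-time retraining of the ranking model.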